Ciur is a scraper layer based on a DSL for extracting data

Ciur

Ciur is a scraper layer for use in code development.

Ciur is a library, because a library has less black magic than a framework.

It moves all scraper-related code into a separate layer.

If you are annoyed by spaghetti code, SQL inside PHP, and inline CSS inside HTML, then you are probably also annoyed by XPath/CSS code inside a crawler.

Ciur gives code the taste of lasagna by enforcing encapsulation of the scraping layer.

For more information visit the documentation.

Nutshell

Ciur uses its own DSL. Here is a small example, an example.org.ciur query:

root `/html/body` +1
    name `.//h1/text()` +1
    paragraph `.//p/text()` +1

This command

$ ciur -p http://example.org -r https://bitbucket.org/ada/python-ciur/raw/HEAD/docs/docker/example.org.ciur

will produce the following JSON:

{
    "root": {
        "name": "Example Domain",
        "paragraph": "This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission."
    }
}
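
To make the shape of that result concrete, here is a minimal Python sketch of what the rule expresses. It is not Ciur's implementation: it uses only the standard library, runs on a small inline stand-in for the page (the HTML constant is an assumption), and treats `+1` as "exactly one match", which is also an assumption. The helper names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Well-formed inline stand-in for the page (hypothetical, not fetched from the network).
HTML = """<html><body><div>
<h1>Example Domain</h1>
<p>This domain is established to be used for illustrative examples in documents.</p>
</div></body></html>"""


def exactly_one(matches, name):
    """Mimic the assumed meaning of `+1`: require exactly one match."""
    if len(matches) != 1:
        raise ValueError(f"{name}: expected 1 match, got {len(matches)}")
    return matches[0]


def extract(markup):
    """Build the nested dict that the example rule describes."""
    document = ET.fromstring(markup)  # the parsed root element is <html>
    root = exactly_one(document.findall("./body"), "root")
    # ElementTree has no text() step, so select the element and read .text.
    name = exactly_one(root.findall(".//h1"), "name")
    paragraph = exactly_one(root.findall(".//p"), "paragraph")
    return {"root": {"name": name.text, "paragraph": paragraph.text}}


result = extract(HTML)
print(json.dumps(result, indent=4))
```

The nesting of the rule (two fields indented under `root`) maps directly onto the nesting of the output, which is the part of the DSL this sketch is meant to illustrate.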

Installation

The recommended way to install Ciur is in a Python virtual environment.
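
One possible way to set up such an environment from Python itself is sketched below with the standard-library `venv` module. The function names and any directory passed in are hypothetical; only the package name `ciur` comes from this page.

```python
import os
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # Windows places the interpreter under Scripts, other platforms under bin.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(python_path, package="ciur"):
    """Build the argv that installs a package with the environment's own pip."""
    return [str(python_path), "-m", "pip", "install", package]
```

Passing the list returned by `pip_install_command` to `subprocess.run(..., check=True)` would perform the actual install inside the new environment.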

Install via Docker

$ docker run -it python:3.7 bash
root@e4d327153f2f:/# pip install ciur
root@e4d327153f2f:/# ciur --help
usage: ciur [-h] -p PARSE -r RULE [-w] [-v]

*Ciur is a scrapper layer based on DSL for extracting data*

*Ciur is a lib because it has less black magic than a framework*

If you are annoyed by `Spaghetti code` than we can taste `Lasagna code`
with help of Ciur

https://bitbucket.org/ada/python-ciur

optional arguments:
  -h, --help            show this help message and exit
  -p PARSE, --parse PARSE
                        url or local file path required document for html, xml, pdf. (f.e. http://example.org or /tmp/example.org.html)
  -r RULE, --rule RULE  url or local file path file with parsing dsl rule (f.e. /tmp/example.org.ciur or http:/host/example.org.ciur)
  -w, --ignore_warn     suppress python warning warnings and ciur warnings hints
  -v, --version         show program's version number and exit
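
The command line can also be driven from a Python script. The sketch below uses only the flags listed in the help output (`-p`, `-r`, `-w`). The function names are hypothetical, and it assumes that `ciur` is on the PATH and prints its JSON result to standard output.

```python
import json
import subprocess


def build_ciur_command(document, rule, ignore_warn=False):
    """Build an argv list from the flags listed in the help output."""
    command = ["ciur", "-p", document, "-r", rule]
    if ignore_warn:
        command.append("-w")
    return command


def run_ciur(document, rule, ignore_warn=False):
    """Run ciur and decode its output (assumes JSON is printed to stdout)."""
    completed = subprocess.run(
        build_ciur_command(document, rule, ignore_warn),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)
```

Keeping the argv construction separate from the call makes the command easy to inspect or log before anything is executed.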

Ciur uses the MIT License

This means that code may be included in proprietary code without any additional restrictions.

Please see LICENSE.

Contribution

Ciur was conceived in 2012 and remains under continued development.

All contributions are welcome and should be made via Bitbucket (pull requests, issues).

As an exception (for example, if Bitbucket is unavailable), contributions can also be sent by email to ciur[mail symbol].asta-s.eu
