Ciur is a scraper layer based on a DSL for extracting data

Project description

Ciur

Ciur is a scraper layer for use in code development.

Ciur is a lib because it has less black magic than a framework.

It moves all scraping-related code into a separate layer.

If you are annoyed by spaghetti code, SQL inside PHP, and inline CSS inside HTML, then you are probably also annoyed by XPath/CSS code inside a crawler.

Ciur gives a taste of Lasagna code, generally by enforcing encapsulation of the scraping layer.

For more information visit the documentation.

Nutshell

Ciur uses its own DSL. Here is a small example, the example.org.ciur query:

root `/html/body` +1
    name `.//h1/text()` +1
    paragraph `.//p/text()` +1

Running this command:

$ ciur -p http://example.org -r https://bitbucket.org/ada/python-ciur/raw/HEAD/docs/docker/example.org.ciur

produces the following JSON:

{
    "root": {
        "name": "Example Domain",
        "paragraph": "This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission."
    }
}
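To make the shape of the rule concrete, here is a minimal Python sketch of what the three DSL lines express. It is not how Ciur is implemented and does not use Ciur's API. It relies only on the standard library, so `text()` is emulated by reading `.text`. The `PAGE` snippet is a made-up stand-in for the real page, the helper names are hypothetical, and reading `+1` as "exactly one match expected" is an assumption that the text above does not spell out.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the example.org page.
PAGE = """<html><body><div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples.</p>
</div></body></html>"""


def exactly_one(matches, name):
    """Return the single match (assumed meaning of `+1`), else raise."""
    if len(matches) != 1:
        raise ValueError(f"{name}: expected 1 match, got {len(matches)}")
    return matches[0]


def extract(page):
    html = ET.fromstring(page)  # the parsed root element is <html>
    # root `/html/body` +1
    body = exactly_one(html.findall("./body"), "root")
    # name `.//h1/text()` +1  (ElementTree has no text(), so read .text)
    name = exactly_one(body.findall(".//h1"), "name").text
    # paragraph `.//p/text()` +1
    paragraph = exactly_one(body.findall(".//p"), "paragraph").text
    return {"root": {"name": name, "paragraph": paragraph}}


result = extract(PAGE)
print(json.dumps(result, indent=4))
```

The nesting of the returned dictionary mirrors the indentation of the rule file: `name` and `paragraph` are evaluated relative to the node matched by `root`.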

Installation

pip install ciur
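One way to confirm that the install succeeded, sketched with the standard library only. The helper name `installed_version` is hypothetical; `importlib.metadata` reads the metadata of installed distributions and does not import Ciur itself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="ciur"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"ciur {found}" if found else "ciur is not installed; run `pip install ciur`")
```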

Install via Docker

$ docker run -it python:3.9 bash
root@e4d327153f2f:/# pip install ciur
root@e4d327153f2f:/# ciur --help
usage: ciur [-h] -p PARSE -r RULE [-w] [-v]

*Ciur is a scrapper layer based on DSL for extracting data*

*Ciur is a lib because it has less black magic than a framework*

If you are annoyed by `Spaghetti code` than we can taste `Lasagna code`
with help of Ciur

https://bitbucket.org/ada/python-ciur

optional arguments:
  -h, --help            show this help message and exit
  -p PARSE, --parse PARSE
                        url or local file path required document for html, xml, pdf. (f.e. http://example.org or /tmp/example.org.html)
  -r RULE, --rule RULE  url or local file path file with parsing dsl rule (f.e. /tmp/example.org.ciur or http:/host/example.org.ciur)
  -w, --ignore_warn     suppress python warning warnings and ciur warnings hints
  -v, --version         show program's version number and exit
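The command-line interface can also be driven from a script. The sketch below uses only the `-p`, `-r` and `-w` flags listed in the help text. It assumes that the command writes its JSON result to standard output, which is suggested by the example earlier but not guaranteed here. The function names `build_ciur_command` and `run_ciur` are hypothetical.

```python
import json
import shutil
import subprocess


def build_ciur_command(parse, rule, ignore_warn=False):
    """Assemble the argument list using the flags from `ciur --help`."""
    command = ["ciur", "-p", parse, "-r", rule]
    if ignore_warn:
        command.append("-w")
    return command


def run_ciur(parse, rule, ignore_warn=False):
    """Run ciur and decode its output (assumes JSON on standard output)."""
    if shutil.which("ciur") is None:
        raise RuntimeError("ciur executable not found; try `pip install ciur`")
    completed = subprocess.run(
        build_ciur_command(parse, rule, ignore_warn),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)


# Example call (needs network access and an installed ciur):
# data = run_ciur("http://example.org", "/tmp/example.org.ciur")
```

Passing the arguments as a list avoids shell quoting problems when the document or rule location contains spaces.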

Ciur uses the MIT License

This means that code may be included in proprietary code without any additional restrictions.

Please see LICENSE.

Contribution

Ciur was conceived in 2012, and development is ongoing.

All contributions are welcome and should be done via Bitbucket (Pull Request, Issues).

As a fallback (for example, if Bitbucket is not available), contributions can be sent by email to ciur[mail symbol]asta-s.eu.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ciur-0.2.0.tar.gz (22.5 kB view details)

Uploaded Source

Built Distribution

ciur-0.2.0-py3-none-any.whl (26.3 kB view details)

Uploaded Python 3

File details

Details for the file ciur-0.2.0.tar.gz.

File metadata

  • Download URL: ciur-0.2.0.tar.gz
  • Upload date:
  • Size: 22.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for ciur-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       b0b94fb3d0a8d14233a2c17564daf53d31db2ed99d4c38000c22108071d63e67
MD5          67dda96cbebb0ff773bb1347d3d92c28
BLAKE2b-256  ba30899d8e47815512d25dbd92ab850f71b6753da195c25269c6982a3ac9d49e

See more details on using hashes here.
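A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using `hashlib`; the function names and the local file path in the usage comment are hypothetical.

```python
import hashlib

# SHA256 digest listed above for ciur-0.2.0.tar.gz
EXPECTED_SHA256 = "b0b94fb3d0a8d14233a2c17564daf53d31db2ed99d4c38000c22108071d63e67"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Example call (hypothetical local path):
# print(matches_published_hash("ciur-0.2.0.tar.gz"))
```

The same check applies to the wheel by passing its digest as `expected`.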

File details

Details for the file ciur-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: ciur-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 26.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for ciur-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       3b1331e3f36867460bfaf21ee70482bd1aa6081db946fcde6f078268af6d0612
MD5          6155e7356e39dbeb9fb1db17c829df53
BLAKE2b-256  848b1052af86a29e1b1a2191f4df432c7b777cfea59375f551c3f9a0608bf4ba

See more details on using hashes here.
