Ciur is a scraper layer based on a DSL for extracting data
Project description
Ciur is a scraper layer for use in code development.
Ciur is a library rather than a framework because it has less black magic.
It moves all scraper-related code into a separate layer.
If you are annoyed by spaghetti code, SQL inside PHP, and inline CSS inside HTML, then you are probably also annoyed by XPath/CSS code inside a crawler.
Ciur gives a taste of lasagna code by enforcing encapsulation of the scraping layer.
For more information visit the documentation.
Nutshell
Ciur uses its own DSL. Here is a small example of an example.org.ciur query:
root `/html/body` +1
    name `.//h1/text()` +1
    paragraph `.//p/text()` +1
This command:
$ ciur -p http://example.org -r https://bitbucket.org/ada/python-ciur/raw/HEAD/docs/docker/example.org.ciur
produces the following JSON:
{
    "root": {
        "name": "Example Domain",
        "paragraph": "This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission."
    }
}
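To make the nesting in the rule concrete, here is a rough standard-library sketch of what the three rule lines express: select the body, then read the heading and paragraph text relative to it. This is only an illustration and not how Ciur is implemented. The inline document, the `extract` helper, and the use of `xml.etree.ElementTree` are all assumptions; ElementTree supports only a subset of XPath, so `text()` is replaced by the `.text` attribute, and the `+1` suffix is not modeled.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page at http://example.org.
DOCUMENT = """<html>
  <body>
    <div>
      <h1>Example Domain</h1>
      <p>This domain is established to be used for illustrative examples.</p>
    </div>
  </body>
</html>"""


def extract(document: str) -> dict:
    """Mimic the shape of the example rule with plain ElementTree calls."""
    html = ET.fromstring(document)
    # root `/html/body`: the parent node for the nested rules.
    body = html.find("body")
    # name `.//h1/text()` and paragraph `.//p/text()`, relative to body.
    return {
        "root": {
            "name": body.find(".//h1").text,
            "paragraph": body.find(".//p").text,
        }
    }


result = extract(DOCUMENT)
print(json.dumps(result, indent=4))
```

The point of the comparison is that the selectors live in one small rule file instead of being scattered through crawler code like the `find` calls above.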
Installation
$ pip install ciur
Install via docker
$ docker run -it python:3.9 bash
root@e4d327153f2f:/# pip install ciur
root@e4d327153f2f:/# ciur --help
usage: ciur [-h] -p PARSE -r RULE [-w] [-v]
*Ciur is a scrapper layer based on DSL for extracting data*
*Ciur is a lib because it has less black magic than a framework*
If you are annoyed by `Spaghetti code` than we can taste `Lasagna code`
with help of Ciur
https://bitbucket.org/ada/python-ciur
optional arguments:
  -h, --help            show this help message and exit
  -p PARSE, --parse PARSE
                        url or local file path required document for html, xml, pdf. (f.e. http://example.org or /tmp/example.org.html)
  -r RULE, --rule RULE  url or local file path file with parsing dsl rule (f.e. /tmp/example.org.ciur or http:/host/example.org.ciur)
  -w, --ignore_warn     suppress python warning warnings and ciur warnings hints
  -v, --version         show program's version number and exit
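The flags listed in the help output can also be driven from a script. The following is a minimal sketch that assembles the `ciur` command line and decodes its result. The helper names `build_command` and `run_ciur` are hypothetical, and the sketch assumes that `ciur` is installed and on `PATH` and that it prints the JSON result on standard output, as in the Nutshell example.

```python
import json
import subprocess


def build_command(parse: str, rule: str, ignore_warn: bool = False) -> list:
    """Assemble the argument list using only flags shown in `ciur --help`."""
    command = ["ciur", "-p", parse, "-r", rule]
    if ignore_warn:
        command.append("-w")
    return command


def run_ciur(parse: str, rule: str, ignore_warn: bool = False) -> dict:
    """Run ciur and decode its output (assumes JSON on standard output)."""
    completed = subprocess.run(
        build_command(parse, rule, ignore_warn),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)
```

With the local paths mentioned in the help text, a call might look like `run_ciur("/tmp/example.org.html", "/tmp/example.org.ciur")`. Passing the arguments as a list avoids shell quoting problems with URLs.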
Ciur uses the MIT License.
This means that code may be included in proprietary code without any additional restrictions.
Please see LICENSE.
Contribution
Ciur was conceived in 2012, and its development continues.
All contributions are welcome and should be done via Bitbucket (Pull Request, Issues).
As a fallback (for example, if Bitbucket is not available), contributions can be sent by email to ciur[mail symbol]asta-s.eu.