Ciur is a scraper layer for your codebase.
Ciur is a library rather than a framework, because it has less black magic than a framework.
It moves all scraper-related code into a separate layer.
If you are annoyed by spaghetti code, SQL inside PHP, and inline CSS inside HTML, then you are probably also annoyed by XPath/CSS selectors inside a crawler.
Ciur gives a taste of lasagna code by enforcing encapsulation of the scraping layer.
For more information, visit the documentation.
In a nutshell
Ciur uses its own DSL. Here is a small example query, example.org.ciur:
root `/html/body` +1
    name `.//h1/text()` +1
    paragraph `.//p/text()` +1
This command:
$ ciur -p https://example.org -r https://bitbucket.org/ada/python-ciur/raw/HEAD/docs/docker/example.org.ciur
will produce the following JSON:
{
    "root": {
        "name": "Example Domain",
        "paragraph": "This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission."
    }
}
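To make the rule's intent concrete, here is a rough emulation of example.org.ciur using only the standard library's xml.etree.ElementTree. It is not how Ciur works internally, and it makes several assumptions: the page is a trimmed, well-formed stand-in for https://example.org (ElementTree cannot parse arbitrary HTML), `+1` is read as "exactly one match is required" (check the Ciur documentation for the precise size semantics), and the helpers `exactly_one` and `apply_example_rule` are hypothetical names used only for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for https://example.org (assumption: real
# pages are HTML, which ElementTree cannot always parse).
PAGE = """<html><body>
<h1>Example Domain</h1>
<p>This domain is established to be used for illustrative examples in documents.</p>
</body></html>"""


def exactly_one(matches, name):
    """Hypothetical check for the assumed meaning of `+1`: exactly one match."""
    if len(matches) != 1:
        raise ValueError(f"{name}: expected exactly 1 match, got {len(matches)}")
    return matches[0]


def apply_example_rule(page):
    """Hypothetical emulation of the three-line example.org.ciur rule."""
    html = ET.fromstring(page)
    # root `/html/body` +1  (ElementTree paths are relative to <html>)
    body = exactly_one(html.findall("./body"), "root")
    # name `.//h1/text()` +1  (.text is the element's leading text node)
    name = exactly_one(body.findall(".//h1"), "name").text
    # paragraph `.//p/text()` +1
    paragraph = exactly_one(body.findall(".//p"), "paragraph").text
    # Child rules nest under their parent, mirroring the indentation in the DSL.
    return {"root": {"name": name, "paragraph": paragraph}}


print(json.dumps(apply_example_rule(PAGE), indent=4))
```

The nesting of the returned dictionary follows the indentation of the rule file, which is why `name` and `paragraph` appear inside `root` in the JSON output.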
Installation
Ensure that the OS-level dependencies of lxml and cryptography are installed, then install Ciur with pip:
pip install ciur
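As a quick sanity check after installation, the sketch below tries to import lxml and cryptography in the current interpreter. The helper name `check_imports` is hypothetical; the check only shows whether the modules load (a compiled extension whose shared library is missing also fails with ImportError), it does not list which OS packages are needed.

```python
import importlib


def check_imports(names=("lxml", "cryptography")):
    """Try importing each module; map name -> None on success, else the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # includes ModuleNotFoundError and failed native loads
            results[name] = str(exc)
    return results
```

For example, `check_imports()` returning `{"lxml": None, "cryptography": None}` means both modules imported successfully.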
Install via Docker
$ docker run -it python:3.13.2 bash
root@e4d327153f2f:/# pip install ciur
root@e4d327153f2f:/# ciur --help
usage: ciur [-h] -p PARSE -r RULE [-w] [-v]
*Ciur is a scrapper layer based on DSL for extracting data*
*Ciur is a lib because it has less black magic than a framework*
If you are annoyed by `Spaghetti code` than we can taste `Lasagna code`
with help of Ciur
https://bitbucket.org/ada/python-ciur
optional arguments:
  -h, --help            show this help message and exit
  -p PARSE, --parse PARSE
                        url or local file path required document for html, xml, pdf. (f.e. https://example.org or /tmp/example.org.html)
  -r RULE, --rule RULE  url or local file path file with parsing dsl rule (f.e. /tmp/example.org.ciur or https:/host/example.org.ciur)
  -w, --ignore_warn     suppress python warning warnings and ciur warnings hints
  -v, --version         show program's version number and exit
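If you want to call the command-line tool from a Python script, a minimal wrapper around the documented `-p`, `-r` and `-w` options might look like this. It is a sketch, not part of Ciur's API: `ciur_command` and `run_ciur` are hypothetical helpers, and it assumes `ciur` is on PATH and writes only the JSON result to stdout.

```python
import json
import subprocess


def ciur_command(parse, rule, ignore_warn=False):
    """Build argv for the documented CLI: ciur -p PARSE -r RULE [-w]."""
    argv = ["ciur", "-p", parse, "-r", rule]
    if ignore_warn:
        argv.append("-w")  # suppress Python and Ciur warnings
    return argv


def run_ciur(parse, rule, ignore_warn=False):
    """Run ciur and decode stdout as JSON (assumes stdout contains only JSON)."""
    completed = subprocess.run(
        ciur_command(parse, rule, ignore_warn),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

With the example from the top of this page, `run_ciur("https://example.org", "/tmp/example.org.ciur", ignore_warn=True)["root"]["name"]` should give the extracted page title. Building the argument list separately keeps it easy to inspect and avoids going through a shell.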
Ciur uses the MIT License.
This means that the code may be included in proprietary software, provided that the copyright and license notice are retained.
Please see LICENSE.
Contribution
Ciur was conceived in 2012, and its development will continue.
All contributions are welcome and should be submitted via Bitbucket (pull requests, issues).
As an alternative (for example, if Bitbucket is unavailable), contributions can be sent by email to ciur[mail symbol]asta-s.eu.
File details
Details for the file ciur-0.2.3-py3-none-any.whl.
File metadata
- Download URL: ciur-0.2.3-py3-none-any.whl
- Upload date:
- Size: 29.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 868b268c618fb0d88be1a15cdd958bb7d8ce4c8a6d85808f39b6965cb1fdb7a9 |
| MD5 | fe924432c2d7a64bb2646be0c410f1ef |
| BLAKE2b-256 | 71fe6f5e27d1e3aede6a7143ff6a39a63c909235fc7a0f1e0741eef97d39c19f |
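To check a downloaded copy of the wheel against the published SHA256 digest, a short standard-library sketch is enough. The helper names `sha256_of` and `wheel_matches` are hypothetical; the expected digest is the SHA256 value listed for ciur-0.2.3-py3-none-any.whl.

```python
import hashlib
from pathlib import Path

# SHA256 published for ciur-0.2.3-py3-none-any.whl.
EXPECTED_SHA256 = "868b268c618fb0d88be1a15cdd958bb7d8ce4c8a6d85808f39b6965cb1fdb7a9"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wheel_matches(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected.lower()
```

For example, `wheel_matches("ciur-0.2.3-py3-none-any.whl")` should return `True` for an untampered download. SHA256 is preferable to the MD5 value for integrity checks, since MD5 is no longer collision-resistant.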
Provenance
The following attestation bundles were made for ciur-0.2.3-py3-none-any.whl:
Publisher: actions.yml on a-da/python-ciur
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ciur-0.2.3-py3-none-any.whl
- Subject digest: 868b268c618fb0d88be1a15cdd958bb7d8ce4c8a6d85808f39b6965cb1fdb7a9
- Sigstore transparency entry: 233502782
- Sigstore integration time:
- Permalink: a-da/python-ciur@00f7d4f15e63f0fd9f486dacc033c865975a8bf5
- Branch / Tag: refs/heads/release/0/2/latest
- Owner: https://github.com/a-da
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: actions.yml@00f7d4f15e63f0fd9f486dacc033c865975a8bf5
- Trigger Event: pull_request
Statement type: