Ciur

Ciur is a scraping layer for your code.

Ciur is a library rather than a framework because it has less black magic.

It moves all scraping-related code into a separate layer.

If you are annoyed by Spaghetti code, SQL inside PHP, and inline CSS inside HTML, then you are probably also annoyed by XPath/CSS selectors inside crawler code.

Ciur gives you a taste of Lasagna code by enforcing encapsulation of the scraping layer.

For more information visit the documentation.

Nutshell

Ciur uses its own DSL. Here is a small example of a query file, example.org.ciur:

root `/html/body` +1
    name `.//h1/text()` +1
    paragraph `.//p/text()` +1

Running this command

$ ciur -p https://example.org -r https://bitbucket.org/ada/python-ciur/raw/HEAD/docs/docker/example.org.ciur

produces the following JSON:

{
    "root": {
        "name": "Example Domain",
        "paragraph": "This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission."
    }
}
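For comparison, here is a minimal sketch of the same extraction written by hand, which is the kind of inline XPath code Ciur is meant to move out of the crawler. It is not how Ciur works internally. It assumes an inline, well-formed stand-in page instead of the live example.org, uses only the standard library's xml.etree.ElementTree, and does not interpret the `+1` suffix from the rule file. The names `PAGE` and `extract` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: a small well-formed stand-in for the page, not the real markup.
PAGE = """<html><body><div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples.</p>
</div></body></html>"""


def extract(page: str) -> dict:
    """Hand-written equivalent of the three-line rule file (hypothetical helper)."""
    html = ET.fromstring(page)   # root element, i.e. <html>
    body = html.find("body")     # roughly the rule's `/html/body`
    return {
        "root": {
            "name": body.findtext(".//h1"),      # roughly `.//h1/text()`
            "paragraph": body.findtext(".//p"),  # roughly `.//p/text()`
        }
    }


if __name__ == "__main__":
    print(json.dumps(extract(PAGE), indent=4))
```

Every selector here lives inside Python code; with Ciur the same three selectors sit in the separate .ciur file.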

Installation

Ensure that the OS-level dependencies of lxml and cryptography are installed.

pip install ciur

Install via docker

$ docker run -it python:3.13.2 bash
root@e4d327153f2f:/# pip install ciur
root@e4d327153f2f:/# ciur --help
usage: ciur [-h] -p PARSE -r RULE [-w] [-v]

*Ciur is a scrapper layer based on DSL for extracting data*

*Ciur is a lib because it has less black magic than a framework*

If you are annoyed by `Spaghetti code` than we can taste `Lasagna code`
with help of Ciur

https://bitbucket.org/ada/python-ciur

optional arguments:
  -h, --help            show this help message and exit
  -p PARSE, --parse PARSE
                        url or local file path required document for html, xml, pdf. (f.e. https://example.org or /tmp/example.org.html)
  -r RULE, --rule RULE  url or local file path file with parsing dsl rule (f.e. /tmp/example.org.ciur or https:/host/example.org.ciur)
  -w, --ignore_warn     suppress python warning warnings and ciur warnings hints
  -v, --version         show program's version number and exit
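The command line can also be driven from a script. The sketch below uses only the `-p` and `-r` options listed in the help text. It assumes that `ciur` is on the PATH and that the JSON result is written to standard output, which the help text does not state. The helpers `build_ciur_command` and `run_ciur` are hypothetical names, not part of Ciur.

```python
import json
import subprocess


def build_ciur_command(page: str, rule: str) -> list[str]:
    """Build the argv list for the documented -p/-r options (hypothetical helper)."""
    return ["ciur", "-p", page, "-r", rule]


def run_ciur(page: str, rule: str) -> dict:
    """Run ciur and parse its output.

    Assumption: the JSON result is printed to stdout.
    """
    completed = subprocess.run(
        build_ciur_command(page, rule),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # strict=False tolerates raw line breaks inside JSON string values.
    return json.loads(completed.stdout, strict=False)


if __name__ == "__main__":
    data = run_ciur("https://example.org", "/tmp/example.org.ciur")
    print(data)
```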

Ciur uses the MIT License

This means that the code may be included in proprietary software without any additional restrictions.

Please see LICENSE.

Contribution

Ciur was conceived in 2012 and is still under development.

All contributions are welcome and should be submitted via Bitbucket (pull requests, issues).

As a fallback, for example if Bitbucket is not available, contributions can be sent by email to ciur[mail symbol]asta-s.eu.
