Ciur is a scraper layer based on a DSL for extracting data

Ciur

Ciur is a scraper layer under development.

Ciur is a library rather than a framework, because a library has less black magic.

It moves all scraper-related code into a separate layer.

If you are annoyed by spaghetti code, SQL inside PHP, and inline CSS inside HTML, then you are also annoyed by XPath/CSS code inside a crawler.

Ciur gives a taste of lasagna code by enforcing encapsulation of the scraping layer.

For more information visit the documentation.

Nutshell

Ciur uses its own DSL. Here is a small example of an example.org.ciur query:

root `/html/body` +1
    name `.//h1/text()` +1
    paragraph `.//p/text()` +1

This command

$ ciur --url "http://example.org" --rule "example.org.ciur"

produces the following JSON:

{
    "root": {
        "name": "Example Domain",
        "paragraph": "This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission."
    }
}
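
To make the rule file more concrete, here is a minimal sketch of what the three rule lines express, written by hand with Python's standard library. This is not how Ciur is implemented and it does not use Ciur's API. The `SAMPLE_PAGE` string, the `exactly_one` helper and the `extract` function are hypothetical names introduced for illustration, and reading `+1` as "exactly one match expected" is an assumption.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page at http://example.org.
SAMPLE_PAGE = """<html>
  <head><title>Example Domain</title></head>
  <body>
    <div>
      <h1>Example Domain</h1>
      <p>This domain is established to be used for illustrative examples in documents.</p>
    </div>
  </body>
</html>"""


def exactly_one(parent, path):
    """Return the single element matching path, or raise.

    Assumption: this mirrors what `+1` means in the rule file.
    """
    matches = parent.findall(path)
    if len(matches) != 1:
        raise ValueError(f"expected exactly 1 match for {path!r}, got {len(matches)}")
    return matches[0]


def extract(page):
    """Hand-written equivalent of the three rule lines."""
    html = ET.fromstring(page)
    # root `/html/body` +1
    body = exactly_one(html, "body")
    return {
        "root": {
            # name `.//h1/text()` +1
            "name": exactly_one(body, ".//h1").text,
            # paragraph `.//p/text()` +1
            "paragraph": exactly_one(body, ".//p").text,
        }
    }


result = extract(SAMPLE_PAGE)
print(json.dumps(result, indent=4))
```

The printed structure has the same shape as the JSON above. With Ciur, the selectors live in the `.ciur` file instead of being mixed into the crawler code like this.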

Installation

The recommended way to install Ciur is inside a Python virtual environment.

Ciur uses the MIT License

This means that the code may be included in proprietary software without any additional restrictions.

Please see LICENSE.

Download files

Source Distribution

ciur-0.1.5.tar.gz (20.4 kB)

Uploaded Source

Built Distribution

ciur-0.1.5-py3-none-any.whl (25.4 kB)

Uploaded Python 3