Collect data from various sources

Rollet

Rollet collects, standardizes, and completes data from various sources.

Installation

PyPI

The safest way to install rollet is through pip:

python -m pip install rollet

How to use?

Command script

usage: rollet {extract-txt,extract-csv,extract-json} path
              [-h] [-o [OUTFILE]] [-l [LINK]] [-f [FIELDS]] [--start [START]]
              [--size [SIZE]] [-t [TIMESLEEP]]

positional arguments:
  {extract-txt,extract-csv,extract-json} Choose file type option extraction
  path                                   file path

optional arguments:
  -h, --help                   show this help message and exit
  -o [OUTFILE], --outfile      output file path
  -l [LINK], --link            link field if csv or json
  -f [FIELDS], --fields        fields to keep separated by comma
  --start [START]              number of rows to skip
  --size  [SIZE]               max number of rows to keep
  -t [TIMESLEEP], --timesleep  sleep time in seconds between two pulling
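
As an illustration of how the options above fit together, here is a minimal sketch that writes a small CSV file and assembles a matching `extract-csv` command. The file names `links.csv` and `results.out`, the `url` column name, and the field names passed to `-f` are assumptions made for this example; the usage text does not specify them. The command is only executed if a `rollet` executable is found on the PATH.

```python
import csv
import shlex
import shutil
import subprocess
from pathlib import Path

# Hypothetical input file: one row per link, with the link in a "url" column.
input_path = Path("links.csv")
with input_path.open("w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(["id", "url"])
    writer.writerow(["1", "https://example.url.com/content-id"])

# Build the command from the options listed in the usage text.
command = [
    "rollet", "extract-csv", str(input_path),
    "-o", "results.out",          # hypothetical output path
    "-l", "url",                  # field that holds the link
    "-f", "title,abstract,lang",  # assumed field names, separated by commas
    "--size", "10",               # keep at most 10 rows
    "-t", "1",                    # sleep 1 second between two pulls
]
print(shlex.join(command))

# Run only if the rollet command is actually installed on this machine.
if shutil.which("rollet"):
    subprocess.run(command, check=False)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths or field names contain spaces.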

Python

Basic usage

from rollet import get_content
from rollet.extractor import BaseExtractor

url = 'https://example.url.com/content-id'

content_dict = get_content(url)

content_object = BaseExtractor(url)
content_object.title            # Title
content_object.abstract         # Abstract
content_object.lang             # Language
content_object.content_type     # Type (pdf, json, html, ...)
content_object.to_dict()        # Same as get_content
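
When several URLs need to be processed, a small wrapper around `get_content` can help. The sketch below is one possible approach, not part of rollet: `collect_contents`, its `pause_seconds` parameter, and the added `source_url` key are hypothetical names introduced for illustration. It assumes only that the fetch function takes a URL and returns a dict, as `get_content` does in the basic usage above.

```python
import time
from typing import Callable, Dict, Iterable, List


def collect_contents(
    urls: Iterable[str],
    fetch: Callable[[str], dict],
    pause_seconds: float = 0.0,
) -> List[Dict]:
    """Call fetch on each URL and keep the results that succeed."""
    results = []
    for index, url in enumerate(urls):
        # Pause between requests, but not before the first one.
        if index and pause_seconds:
            time.sleep(pause_seconds)
        try:
            record = dict(fetch(url))
        except Exception as error:  # broad on purpose: failure modes are not documented
            print(f"skipped {url}: {error}")
            continue
        record["source_url"] = url  # hypothetical extra key, added by this sketch
        results.append(record)
    return results


# Intended use with rollet (requires the package and network access):
#     from rollet import get_content
#     rows = collect_contents(["https://example.url.com/content-id"], get_content, 1.0)
```

Passing the fetch function as an argument keeps the helper testable without network access, since any callable with the same shape can stand in for `get_content`.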

Custom extractors

from rollet.extractor import BaseExtractor

class CustomExtractor(BaseExtractor):

    @property
    def title(self):
        # Override how the title is extracted from the page
        return self._page.find('title')
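
The property override pattern used above can be tried without rollet installed. The following sketch uses a hypothetical stand-in class, `StandInExtractor`, which is not rollet's `BaseExtractor` and makes no claim about its internals; here `_page` is a plain dict, and `fields` is an invented attribute. It only shows how a subclass property changes what `to_dict()` returns, assuming `to_dict()` reads each field through its property.

```python
class StandInExtractor:
    """Hypothetical stand-in for illustration; NOT rollet's BaseExtractor."""

    fields = ("title", "lang")

    def __init__(self, page: dict):
        self._page = page  # a plain dict in this sketch

    @property
    def title(self):
        return self._page.get("title")

    @property
    def lang(self):
        return self._page.get("lang")

    def to_dict(self):
        # Read each field through its property, so subclass overrides take effect.
        return {name: getattr(self, name) for name in self.fields}


class CleanTitleExtractor(StandInExtractor):

    @property
    def title(self):
        # Reuse the parent's value, then collapse runs of whitespace.
        raw = super().title
        return " ".join(raw.split()) if raw else None


page = {"title": "  Rollet \n demo  ", "lang": "en"}
print(CleanTitleExtractor(page).to_dict())
# prints {'title': 'Rollet demo', 'lang': 'en'}
```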

And More!

Download files

Source Distribution

rollet-0.0.1a8.tar.gz (59.7 kB)

Built Distribution

rollet-0.0.1a8-py3-none-any.whl (59.6 kB)
