Collect data from various sources
Rollet
Rollet collects, standardizes and completes data from various sources.
Installation
PyPI
The safest way to install rollet is to go through pip:
python -m pip install rollet
How to use?
Command script
usage: rollet {extract-txt,extract-csv,extract-json} path
              [-h] [-o [OUTFILE]] [-l [LINK]] [-f [FIELDS]] [--start [START]]
              [--size [SIZE]] [-t [TIMESLEEP]]

positional arguments:
  {extract-txt,extract-csv,extract-json}
                        Choose file type option extraction
  path                  file path

optional arguments:
  -h, --help            show this help message and exit
  -o [OUTFILE], --outfile
                        output file path
  -l [LINK], --link     link field if csv or json
  -f [FIELDS], --fields
                        fields to keep separated by comma
  --start [START]       number of rows to skip
  --size [SIZE]         max number of rows to keep
  -t [TIMESLEEP], --timesleep
                        sleep time in seconds between two pulling
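The options above can also be assembled from Python before handing them to the command script. The following is a minimal sketch, not part of rollet: `build_rollet_command` is a hypothetical helper, and the file names and the `url` link column are placeholders. It only uses the subcommands and flags listed in the usage text.

```python
import shlex

# Subcommands listed in the usage text above.
SUBCOMMANDS = ('extract-txt', 'extract-csv', 'extract-json')


def build_rollet_command(subcommand, path, outfile=None, link=None,
                         fields=None, start=None, size=None, timesleep=None):
    """Return the argument list for a rollet CLI call (hypothetical helper)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f'unknown subcommand: {subcommand!r}')
    args = ['rollet', subcommand, str(path)]
    if outfile is not None:
        args += ['-o', str(outfile)]
    if link is not None:
        args += ['-l', link]
    if fields:
        # The help text says fields are separated by commas.
        args += ['-f', ','.join(fields)]
    if start is not None:
        args += ['--start', str(start)]
    if size is not None:
        args += ['--size', str(size)]
    if timesleep is not None:
        args += ['-t', str(timesleep)]
    return args


# Placeholder file names and column name, for illustration only.
command = build_rollet_command(
    'extract-csv', 'articles.csv',
    outfile='articles_out.csv', link='url',
    fields=['title', 'abstract'], start=10, size=100, timesleep=2,
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once rollet is installed. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.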
Python
Basic usage
from rollet import get_content
from rollet.extractor import BaseExtractor

url = 'https://example.url.com/content-id'

content_dict = get_content(url)
content_object = BaseExtractor(url)

content_object.title         # Title
content_object.abstract      # Abstract
content_object.lang          # Language
content_object.content_type  # Type (pdf, json, html, ...)
content_object.to_dict()     # Same as get_content
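When several URLs need to be processed, the same call can be wrapped in a loop that pauses between requests, much like the `--timesleep` option of the command script. This is a minimal sketch under stated assumptions: `collect_contents` is a hypothetical helper, not a rollet function, and it takes the fetching function as a parameter so that `get_content` (or any other callable) can be plugged in.

```python
import time


def collect_contents(urls, fetch, timesleep=0.0, sleep=time.sleep):
    """Fetch each URL with `fetch`, pausing `timesleep` seconds between calls.

    Hypothetical helper: `fetch` is any callable taking a URL and returning
    a dictionary, for example rollet's get_content.
    """
    results = []
    for index, url in enumerate(urls):
        # Pause before every call except the first one.
        if index > 0 and timesleep > 0:
            sleep(timesleep)
        try:
            results.append({'url': url, 'content': fetch(url), 'error': None})
        except Exception as exc:
            # Record the failure and continue with the remaining URLs.
            results.append({'url': url, 'content': None, 'error': str(exc)})
    return results
```

With rollet installed, a call could look like `collect_contents(urls, get_content, timesleep=1)`. Failed URLs stay in the result list with their error message, so one unreachable source does not stop the whole run.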
Custom extractors
class CustomExtractor(BaseExtractor):
    @property
    def title(self):
        return self._page.find('title')
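The property-override pattern shown above can be tried out without a network connection. The sketch below uses a stand-in base class, not rollet's `BaseExtractor`: `StandInExtractor` and `CleanTitleExtractor` are hypothetical names, and the assumption that `_page` holds a parsed document with a `find` method is taken only from the snippet above. Here the page is parsed with the standard library's `xml.etree.ElementTree`, which may differ from what rollet uses internally.

```python
import xml.etree.ElementTree as ET


class StandInExtractor:
    """Hypothetical stand-in for rollet's BaseExtractor (not the real class)."""

    def __init__(self, markup):
        # Assumption: the parsed page is stored in `_page`.
        self._page = ET.fromstring(markup)

    @property
    def title(self):
        # Default behaviour: return the raw <title> element, or None.
        return self._page.find('.//title')

    def to_dict(self):
        return {'title': self.title}


class CleanTitleExtractor(StandInExtractor):
    @property
    def title(self):
        # Override: return stripped text instead of the element, falling
        # back to the first <h1> when <title> is missing or empty.
        for tag in ('title', 'h1'):
            node = self._page.find(f'.//{tag}')
            if node is not None and node.text and node.text.strip():
                return node.text.strip()
        return None


page = ('<html><head><title> Example page </title></head>'
        '<body><h1>Heading</h1></body></html>')
print(CleanTitleExtractor(page).to_dict())
```

Because `to_dict` reads `self.title`, overriding the property in the subclass is enough to change the exported dictionary in this stand-in; whether rollet's own `to_dict` behaves the same way is not stated in this README.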
And More!