Project description

Exporters provide a flexible way to export data from multiple sources to multiple destinations, allowing filtering and transforming the data.

This GitHub repository serves as the project's central repository.

Getting Started

Install exporters

First, we recommend creating a virtualenv:

virtualenv exporters
source exporters/bin/activate

Exporters can be cloned from its GitHub repository:

git clone git@github.com:scrapinghub/exporters.git

Then, we install the requirements:

cd exporters
pip install -r requirements.txt

Creating a configuration

Next, we create our first configuration object and store it in a file called config.json.

This configuration reads records from an S3 bucket and writes them to the local filesystem, exporting only the records whose country field matches United States:

{
     "reader": {
         "name": "exporters.readers.s3_reader.S3Reader",
         "options": {
             "bucket": "YOUR_BUCKET",
             "aws_access_key_id": "YOUR_ACCESS_KEY",
             "aws_secret_access_key": "YOUR_SECRET_KEY",
             "prefix": "exporters-tutorial/sample-dataset"
         }
     },
     "filter": {
         "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
         "options": {
             "keys": [
                 {"name": "country", "value": "United States"}
             ]
         }
     },
     "writer": {
         "name": "exporters.writers.fs_writer.FSWriter",
         "options": {
             "filebase": "/tmp/output/"
         }
     }
}
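
Keeping AWS credentials in a hand-edited file makes them easy to leak, so it can be convenient to generate config.json from Python instead. The following is a minimal sketch, not part of exporters: the helpers `build_config` and `write_config` are hypothetical, and reading the credentials from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables is an assumption of this example. The class paths and option keys are copied from the configuration above.

```python
import json
import os


def build_config(bucket, prefix, country_regex, output_dir):
    """Return the tutorial configuration as a plain dict.

    Hypothetical helper, not part of exporters. Credentials are read from
    environment variables (an assumption of this sketch) and fall back to
    the placeholder strings used in the README example.
    """
    return {
        "reader": {
            "name": "exporters.readers.s3_reader.S3Reader",
            "options": {
                "bucket": bucket,
                "aws_access_key_id": os.environ.get(
                    "AWS_ACCESS_KEY_ID", "YOUR_ACCESS_KEY"),
                "aws_secret_access_key": os.environ.get(
                    "AWS_SECRET_ACCESS_KEY", "YOUR_SECRET_KEY"),
                "prefix": prefix,
            },
        },
        "filter": {
            "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
            "options": {
                "keys": [{"name": "country", "value": country_regex}],
            },
        },
        "writer": {
            "name": "exporters.writers.fs_writer.FSWriter",
            "options": {"filebase": output_dir},
        },
    }


def write_config(config, path="config.json"):
    """Serialize the configuration dict to a JSON file."""
    with open(path, "w") as f:
        json.dump(config, f, indent=4)


if __name__ == "__main__":
    write_config(build_config(
        "YOUR_BUCKET",
        "exporters-tutorial/sample-dataset",
        "United States",
        "/tmp/output/",
    ))
```

Running the module writes a config.json equivalent to the one shown above, with the credentials taken from the environment when they are set.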

Export with script

We can use the provided script to run this export:

python bin/export.py --config config.json

Use it as a library

The export can also be run from Python, using exporters as a library:

from exporters.export_managers.basic_exporter import BasicExporter

exporter = BasicExporter.from_file_configuration('config.json')
exporter.export()
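
It can help to check config.json before starting an export. The sketch below does a quick structural check and then runs the export. `check_config` and `run_export` are hypothetical helpers, and the rule that `reader` and `writer` must each carry a `name` is an assumption drawn from the shape of the sample configuration, not from the library's own validation. Only `BasicExporter.from_file_configuration` and `export()` come from the snippet above.

```python
import json


def check_config(path):
    """Return a list of problems found in the config file (empty if none).

    Hypothetical pre-flight check; the required sections are an assumption
    based on the sample configuration in this README.
    """
    with open(path) as f:
        config = json.load(f)
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    problems = []
    for section in ("reader", "writer"):
        module = config.get(section)
        if not isinstance(module, dict) or "name" not in module:
            problems.append("missing '%s' section or its 'name'" % section)
    return problems


def run_export(path="config.json"):
    """Validate the config, then run the export as shown in the README."""
    problems = check_config(path)
    if problems:
        raise ValueError("; ".join(problems))
    # Imported here so check_config stays usable without exporters installed.
    from exporters.export_managers.basic_exporter import BasicExporter
    exporter = BasicExporter.from_file_configuration(path)
    exporter.export()
```

Calling `run_export('config.json')` raises `ValueError` for a configuration that lacks a reader or writer, and otherwise behaves like the two-line snippet above.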

Resuming an export job

Suppose we have a pickle file left by a previously failed export job. To resume that job, we run the export script with the --resume option:

python bin/export.py --resume pickle://pickle-file.pickle
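
When exports are launched from another Python process, the two invocations shown in this README can be assembled as argument lists. This is only a sketch: `export_command` and `run` are hypothetical helpers, they use nothing beyond the `--config` and `--resume` flags and the `pickle://` prefix that appear above, and they assume the working directory is the repository root, where `bin/export.py` lives.

```python
import subprocess
import sys


def export_command(config=None, resume_pickle=None, script="bin/export.py"):
    """Build the argument list for a fresh or a resumed export.

    Hypothetical helper. Exactly one of `config` (path to a JSON config)
    or `resume_pickle` (path to a pickle file) must be given.
    """
    if (config is None) == (resume_pickle is None):
        raise ValueError("pass exactly one of config or resume_pickle")
    if config is not None:
        return [sys.executable, script, "--config", config]
    return [sys.executable, script, "--resume", "pickle://" + resume_pickle]


def run(command):
    """Run the command and return its exit code."""
    return subprocess.call(command)
```

For example, `run(export_command(resume_pickle="pickle-file.pickle"))` issues the same resume command as above, using the interpreter of the active virtualenv.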

Download files

Source Distribution: exporters-0.4.12.tar.gz (56.5 kB)

Built Distribution: exporters-0.4.12-py2-none-any.whl (98.0 kB, Python 2)