Scrapy exporter for Big Data formats

Overview

scrapy-contrib-bigexporters provides additional exporters for the web crawling and scraping framework Scrapy (https://scrapy.org).

The following big data formats are supported:

  • Avro

  • ORC

  • Parquet

Requirements

  • Python 3.6+

  • Scrapy 2.4+

  • Works on Linux, Windows, macOS, BSD

  • Parquet export requires fastparquet 0.4.1+

  • Avro export requires fastavro 1.1.0+

  • ORC export requires pyorc 0.4.0+

Install

The quick way (pip):

pip install scrapy-contrib-bigexporters

Alternatively, you can install it from conda-forge:

conda install -c conda-forge scrapy-contrib-bigexporters

Depending on which format you want to use, you need to install one or more of the following libraries:

Avro:

pip install fastavro

ORC:

pip install pyorc

Parquet:

pip install fastparquet

Additional libraries may be needed for specific compression algorithms. See “Use”.
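
Before configuring an exporter, it can help to confirm which of the optional libraries are present in the current environment. The following is a minimal sketch, not part of the library: the helper name `installed_optional_libs` is hypothetical, and it relies on `importlib.metadata` from the standard library, which assumes Python 3.8 or newer.

```python
from importlib import metadata

# PyPI distribution names from the "Install" section, mapped to the format they enable.
OPTIONAL_LIBS = {"fastavro": "Avro", "pyorc": "ORC", "fastparquet": "Parquet"}


def installed_optional_libs():
    """Map each export format to the installed library version, or None if missing.

    Hypothetical helper for illustration only.
    """
    found = {}
    for dist, fmt in OPTIONAL_LIBS.items():
        try:
            found[fmt] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[fmt] = None
    return found


if __name__ == "__main__":
    for fmt, version in installed_optional_libs().items():
        print(f"{fmt}: {version or 'not installed'}")
```

The printed versions can then be compared by hand with the minimum versions listed under "Requirements".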

Use

Using the library is simple. Install it alongside your Scrapy project as described above. You then only need to configure the exporter in the Scrapy settings and run your scraper; the data will be exported in your desired format. No additional development is needed.
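
As an illustration of what such a configuration might look like, here is a hedged sketch of a `settings.py` fragment for Parquet export. `FEED_EXPORTERS` and `FEEDS` are standard Scrapy settings, and `%(name)s` / `%(time)s` are Scrapy's feed URI placeholders. The exporter class path and the output file name are assumptions that should be checked against the project's own documentation. Exporter-specific options are deliberately left empty here.

```python
# settings.py of a Scrapy project (sketch, not an authoritative configuration)

# Register the exporter under a format name.
# ASSUMPTION: verify this class path against the project's documentation.
FEED_EXPORTERS = {
    "parquet": "zuinnote.scrapy.contrib.bigexporters.ParquetItemExporter",
}

# Define where the feed is written and which registered format to use.
# The file name pattern is illustrative only.
FEEDS = {
    "data-%(name)s-%(time)s.parquet": {
        "format": "parquet",       # must match a key in FEED_EXPORTERS
        "encoding": "utf8",
        "store_empty": False,
        "item_export_kwargs": {},  # exporter-specific options; see the project's docs
    },
}
```

With settings along these lines in place, running the spider as usual (for example with `scrapy crawl <spider name>`) should write the scraped items to the configured feed.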

See here for configuring the exporter in settings:

Source

The source is available at:

Download files

Source Distribution

scrapy-contrib-bigexporters-0.4.0.tar.gz (23.1 kB)

Algorithm    Hash digest
SHA256       977324c20c554f66cdcec178950cee2ede2dede920d64de82f032e571a0efb81
MD5          1b62bc7211d672b8c37095b4bc87b724
BLAKE2b-256  8d8a2d50874376cfd39f44d8ec28705ecd906589a97dd21ae91976bda813629f

Built Distribution

scrapy_contrib_bigexporters-0.4.0-py3-none-any.whl (11.1 kB, Python 3)

Algorithm    Hash digest
SHA256       7c16836d4a4271b304182db4cd3b46a8735da789d1bb6257488886c4a4bef82f
MD5          e8376ea69b480ff8d79b458399153054
BLAKE2b-256  56fb27fb38b9d43ecd672234bbeacaa5b249c0ad22e0a056c1f1cf37623893db
