
Scrapy exporter for Big Data formats

Project description

Overview

scrapy-contrib-bigexporters provides additional exporters for the web crawling and scraping framework Scrapy (https://scrapy.org).

The following big data formats are supported:

  • Avro

  • ORC

  • Parquet

Requirements

  • Python 3.11+

  • Scrapy 2.11+

  • Works on Linux, Windows, macOS, BSD

  • Parquet export requires fastparquet 2024.02+

  • Avro export requires fastavro 1.9+

  • ORC export requires pyorc 0.9+
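You can check the minimum versions above before running a crawl. The sketch below is an illustration and is not part of the package. It reads installed versions with the standard-library `importlib.metadata` and compares only the leading numeric components. That simplification is an assumption: it handles plain release strings such as `2024.2.0` but not every PEP 440 version, so use `packaging.version` if you need full correctness. The helper names `version_tuple`, `meets_minimum` and `check_requirements` are hypothetical.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the requirements list above,
# keyed by distribution name as published on PyPI.
MINIMUMS = {
    "scrapy": "2.11",
    "fastparquet": "2024.02",
    "fastavro": "1.9",
    "pyorc": "0.9",
}


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn the leading dotted numbers of a version string into a tuple.

    Simplification (assumption): suffixes such as 'rc1' or '.post1' are
    ignored, so '2024.2.0' -> (2024, 2, 0) and '2024.02' -> (2024, 2).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions component-wise, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements() -> dict[str, str]:
    """Return a readable status for Python and each listed package."""
    status = {"python": "ok" if sys.version_info >= (3, 11) else "too old"}
    for dist, minimum in MINIMUMS.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            # Format backends are optional, so a missing one is not an error.
            status[dist] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            status[dist] = "ok"
        else:
            status[dist] = f"{installed} < {minimum}"
    return status


if __name__ == "__main__":
    for name, result in check_requirements().items():
        print(f"{name}: {result}")
```

A missing format library shows up as "not installed" rather than as a failure, because you only need the backend for the format you actually export.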

Install

The quick way (pip):

pip install scrapy-contrib-bigexporters

Alternatively, you can install it from conda-forge:

conda install -c conda-forge scrapy-contrib-bigexporters

Depending on which format you want to use, you need to install one or more of the following libraries.

Avro:

pip install fastavro

ORC:

pip install pyorc

Parquet:

pip install fastparquet

Additional libraries may be needed for specific compression algorithms. See “Use”.

Use

Using the library is simple. Install it in your Scrapy project as described above, configure the exporter in the Scrapy settings, and run your scraper; the data will be exported in your desired format. No additional development is needed.
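Here is a minimal sketch of such a configuration. The `settings.py` fragment below registers the exporters through Scrapy's `FEED_EXPORTERS` setting and writes items through the `FEEDS` setting. Both settings are standard Scrapy. The exporter class paths (`scrapy_contrib_bigexporters.bigexporters.ParquetItemExporter` and its siblings) and the format keys `parquet`, `avro` and `orc` are assumptions for illustration. Confirm them, along with any exporter-specific options, against the project's configuration documentation.

```python
# settings.py (fragment of a Scrapy project)

# Map feed format names to exporter classes. The class paths are
# assumptions; verify them against the scrapy-contrib-bigexporters docs.
FEED_EXPORTERS = {
    "parquet": "scrapy_contrib_bigexporters.bigexporters.ParquetItemExporter",
    "avro": "scrapy_contrib_bigexporters.bigexporters.AvroItemExporter",
    "orc": "scrapy_contrib_bigexporters.bigexporters.OrcItemExporter",
}

# Write every scraped item to a Parquet file in the working directory.
# "format" must match one of the keys registered in FEED_EXPORTERS.
FEEDS = {
    "items.parquet": {
        "format": "parquet",
        # Replace the file on each run; Scrapy otherwise appends to local
        # files, which would not produce a valid binary file.
        "overwrite": True,
    },
}
```

With these settings in place, running `scrapy crawl` followed by your spider's name writes the feed as usual. The matching backend library (here fastparquet) must be installed as described under Install.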

See here for configuring the exporter in settings:

Source

The source is available at:

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapy_contrib_bigexporters-0.5.0.tar.gz (22.7 kB)

Built Distribution

scrapy_contrib_bigexporters-0.5.0-py3-none-any.whl

File details

Details for the file scrapy_contrib_bigexporters-0.5.0.tar.gz.

File hashes

Hashes for scrapy_contrib_bigexporters-0.5.0.tar.gz
Algorithm    Hash digest
SHA256       e39bf629058dc41d97fc7af786cd79044c98f9aa964fb307e301354c1c117c14
MD5          4db3cd04c5fedb2c0f787d03b351ca11
BLAKE2b-256  6ac234631c676f713c280aaf5545f9ca342fe3188be594d0ba9bdd2d7e6347fe
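To check a downloaded archive against the SHA256 digest listed above, you can hash the file locally. The sketch below uses only the standard-library `hashlib`. The helper name `sha256_of` is hypothetical, and the script assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for the source distribution (listed above).
EXPECTED_SHA256 = "e39bf629058dc41d97fc7af786cd79044c98f9aa964fb307e301354c1c117c14"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("scrapy_contrib_bigexporters-0.5.0.tar.gz")  # assumed download location
    actual = sha256_of(archive)
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
```

The same function works for the wheel; compare the result with the SHA256 value listed for that file instead.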

Provenance

The following attestation bundles were made for scrapy_contrib_bigexporters-0.5.0.tar.gz:

Publisher: publish_pypi.yml on ZuInnoTe/scrapy-contrib-bigexporters

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file scrapy_contrib_bigexporters-0.5.0-py3-none-any.whl.

File hashes

Hashes for scrapy_contrib_bigexporters-0.5.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       1e60a92af869b61df098226bf285f49ebd634d262ba9db3d28751f39b79651d3
MD5          8e0475c1af39a24048b90e368aef7c9c
BLAKE2b-256  cb8849c02694f1fd55283fcaf37fff0836c02b3bca3db05c57cc444748eb42da

Provenance

The following attestation bundles were made for scrapy_contrib_bigexporters-0.5.0-py3-none-any.whl:

Publisher: publish_pypi.yml on ZuInnoTe/scrapy-contrib-bigexporters

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.