scraping code to use with sciop-coordinated scrapes

Project description

sciop-scraping

A (yet-to-be-named) tool for distributing the scraping of very large datasets across multiple volunteers, then reassembling the results as dataset parts on sciop.

NB: this is currently a work in progress, and it depends on planned features in sciop that are not yet stable. If you'd like to contribute, experience of (or interest in) web scraping, Python CLI tools and/or REST APIs would be very helpful.

We absolutely want this to be as easy to use as possible, so as soon as we can we'll be adding detailed documentation and putting out a call for wider testing. Watch this space!

In the meantime, please subscribe to the Safeguarding Research & Data forum for pointers to datasets that need saving, along with help and advice on collecting them, preparing them for upload and creating torrents.

Download files

Download the file for your platform.

Source Distribution

sciop_scraping-0.1.9.tar.gz (22.0 kB)

Uploaded Source

Built Distribution

sciop_scraping-0.1.9-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file sciop_scraping-0.1.9.tar.gz.

File metadata

  • Download URL: sciop_scraping-0.1.9.tar.gz
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.25.3 CPython/3.13.5 Linux/6.12.12+bpo-amd64

File hashes

Hashes for sciop_scraping-0.1.9.tar.gz
Algorithm    Hash digest
SHA256       a05cf838f9078d4b82368a73c36ddb6e0f8f8b168dfdd1bdafb46cc87cba28b7
MD5          9d3aec79fc3694281a8a6b5ddb6c577f
BLAKE2b-256  21db6a4461a3e3bc3e93f998456dafa929bd4f23009e84d1aa46d92309a4c4ec
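
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only Python's standard library; the helper names `sha256_of` and `matches_published_hash` are hypothetical and not part of sciop-scraping, and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest listed above for sciop_scraping-0.1.9.tar.gz
EXPECTED_SHA256 = "a05cf838f9078d4b82368a73c36ddb6e0f8f8b168dfdd1bdafb46cc87cba28b7"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected` (hypothetical helper)."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("sciop_scraping-0.1.9.tar.gz")` should return `True` for an intact copy of the source distribution saved in the current directory. The wheel can be checked the same way by passing its own SHA256 digest as `expected`.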

File details

Details for the file sciop_scraping-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: sciop_scraping-0.1.9-py3-none-any.whl
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.25.3 CPython/3.13.5 Linux/6.12.12+bpo-amd64

File hashes

Hashes for sciop_scraping-0.1.9-py3-none-any.whl
Algorithm    Hash digest
SHA256       31e6526a44cbd9a47386ad7d79f89865a605ebf84622656cedee4abff2ce10d8
MD5          c98b0debb7e3bf5db511ae1dff9352c8
BLAKE2b-256  49c14ba1cf6c506b42ec79641f99e6403beb9df2f3a885ca175bb96a8ee7796a
