
Scraping code for use with sciop-coordinated scrapes

Project description

sciop-scraping

A (yet-to-be-named) tool for distributing the scraping of very large datasets across multiple volunteers, with the results then reassembled as dataset parts on sciop.
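The core idea of splitting one large scrape into parts that different volunteers handle, then stitching the parts back together, can be illustrated with a small sketch. This is not sciop-scraping's actual method or API. `split_into_parts` and `reassemble` are hypothetical helpers, and the contiguous, near-equal split is just one simple assignment scheme assumed for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_parts(items: Sequence[T], n_parts: int) -> List[List[T]]:
    """Hypothetical helper: split work items into n contiguous, near-equal parts.

    The first (len(items) % n_parts) parts each get one extra item, so part
    sizes differ by at most one.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(items), n_parts)
    parts: List[List[T]] = []
    start = 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


def reassemble(parts: Sequence[Sequence[T]]) -> List[T]:
    """Hypothetical helper: concatenate parts back into the original order."""
    return [item for part in parts for item in part]


if __name__ == "__main__":
    # e.g. ten URLs (here just integers) shared among three volunteers
    parts = split_into_parts(list(range(10)), 3)
    print(parts)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(reassemble(parts) == list(range(10)))  # True
```

Keeping each part contiguous means a part maps naturally onto a single uploaded dataset part, and reassembly is a plain concatenation in part order.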

NB: this is currently a work in progress, and it depends on planned features in sciop that are not yet stabilised. If you're interested in contributing, experience of or interest in web scraping, Python CLI tools and/or REST APIs would be very helpful.

We absolutely want this to be as easy to use as possible, so as soon as we can we'll be adding detailed documentation and putting out a call for wider testing. Watch this space!

In the meantime, please subscribe to the Safeguarding Research & Data forum for pointers to datasets that need saving along with help & advice with collecting them, preparing for upload and creating torrents.



Download files

Download the file for your platform. If you're not sure which to choose, see the Python Packaging User Guide on installing packages.

Source Distribution

sciop_scraping-0.1.7.tar.gz (21.6 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, see the wheel file name convention in the binary distribution format specification.

sciop_scraping-0.1.7-py3-none-any.whl (25.3 kB)

Uploaded Python 3
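A wheel file name encodes its compatibility tags as `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. So `py3-none-any` means the wheel works on any Python 3 interpreter, needs no particular ABI, and runs on any platform. The minimal parser below is an illustrative sketch that only splits on hyphens. For real use, the `packaging` library provides `packaging.utils.parse_wheel_filename`, which also validates and normalises the parts.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its components (no validation of tag values)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present
        dist, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, version, build, py_tag, abi_tag, plat_tag)


if __name__ == "__main__":
    print(parse_wheel_filename("sciop_scraping-0.1.7-py3-none-any.whl"))
    # WheelName(distribution='sciop_scraping', version='0.1.7', build=None,
    #           python_tag='py3', abi_tag='none', platform_tag='any')
```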

File details

Details for the file sciop_scraping-0.1.7.tar.gz.

File metadata

  • Download URL: sciop_scraping-0.1.7.tar.gz
  • Upload date:
  • Size: 21.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.25.3 CPython/3.13.5 Linux/6.12.12+bpo-amd64

File hashes

Hashes for sciop_scraping-0.1.7.tar.gz
Algorithm     Hash digest
SHA256        392326115031a1766b04e4849e8e0a066f6191e84140253ef490d161c57636f7
MD5           480c8c60c806d17415d6bf9a95ab4b51
BLAKE2b-256   0a6d84db55834b62d75f7e4d643cf325f83606690647fefe4bef3ea347ed2f2b
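To check that a downloaded file matches the published digests, hash it locally and compare. The sketch below uses only the standard library. BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest, so it is built with `hashlib.blake2b(digest_size=32)` rather than `hashlib.new`. The function names `file_digest` and `verify_file` are illustrative, not part of sciop-scraping.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Union

# Published SHA256 digest for sciop_scraping-0.1.7.tar.gz (copied from the table above)
SDIST_SHA256 = "392326115031a1766b04e4849e8e0a066f6191e84140253ef490d161c57636f7"


def file_digest(path: Union[str, Path], algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in 64 KiB chunks."""
    if algorithm.lower() == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm.lower())  # e.g. "sha256", "md5"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: Union[str, Path], algorithm: str, expected: str) -> bool:
    """Compare the file's digest with an expected hex string in constant time."""
    return hmac.compare_digest(file_digest(path, algorithm), expected.strip().lower())


if __name__ == "__main__":
    sdist = Path("sciop_scraping-0.1.7.tar.gz")
    if sdist.exists():
        print("SHA256 matches:", verify_file(sdist, "sha256", SDIST_SHA256))
```

Prefer SHA256 or BLAKE2b-256 for integrity checks; MD5 is listed for compatibility but is not collision-resistant.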

See the pip documentation for more details on using hashes.

File details

Details for the file sciop_scraping-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: sciop_scraping-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 25.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.25.3 CPython/3.13.5 Linux/6.12.12+bpo-amd64

File hashes

Hashes for sciop_scraping-0.1.7-py3-none-any.whl
Algorithm     Hash digest
SHA256        175db905bd5fb1a23ea36e4fd4094b7906511741c38a25441b13c79286efa73c
MD5           8f4a56b52b8927964e8ecca3a9f2da75
BLAKE2b-256   62140097bf3c6a49c34654d0f6cba7d27d97014e537880a7622f68cf7fe1e5ca

