
scraping code for use with sciop-coordinated scrapes

Project description

sciop-scraping

A (yet-to-be-named) tool for distributing the scraping of very large datasets across multiple volunteers, with the results then reassembled as dataset parts on sciop.
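This page doesn't say how a scrape is split up. Purely as a hypothetical illustration of the general idea, and not sciop-scraping's actual mechanism or API, a large list of work items (for example URLs) can be divided deterministically into parts, so that each volunteer scrapes one part and the finished parts can be recombined afterwards. The function names `assign_part`, `partition` and `reassemble` are invented for this sketch.

```python
import hashlib
from collections import defaultdict


def assign_part(item: str, n_parts: int) -> int:
    """Map a work item (e.g. a URL) to a part index in [0, n_parts).

    Uses SHA-256 rather than Python's built-in hash(), because hash() on
    strings is randomised per process and so would not give every
    volunteer the same assignment.
    """
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_parts


def partition(items: list[str], n_parts: int) -> dict[int, list[str]]:
    """Group work items into parts; each volunteer would take one part (hypothetical)."""
    parts: dict[int, list[str]] = defaultdict(list)
    for item in items:
        parts[assign_part(item, n_parts)].append(item)
    return dict(parts)


def reassemble(parts: dict[int, list[str]]) -> list[str]:
    """Merge completed parts back into one sorted list of items."""
    return sorted(item for part in parts.values() for item in part)


if __name__ == "__main__":
    urls = [f"https://example.org/item/{i}" for i in range(10)]
    parts = partition(urls, 3)
    for index, part in sorted(parts.items()):
        print(index, len(part))
    assert reassemble(parts) == sorted(urls)
```

Hashing each item, rather than slicing the list by position, means anyone holding the same item list and part count computes the same assignment independently, without a central coordinator handing out ranges.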

NB: This is currently a work in progress, and it depends on planned features in sciop that have not yet stabilised. If you're interested in contributing, experience with (or interest in) web scraping, Python CLI tools and/or REST APIs would be very helpful.

We absolutely want this to be as easy to use as possible, so as soon as we can we'll be adding detailed documentation and putting out a call for wider testing. Watch this space!

In the meantime, please subscribe to the Safeguarding Research & Data forum for pointers to datasets that need saving, along with help and advice on collecting them, preparing them for upload and creating torrents.

Download files

Download the file for your platform.

Source Distribution

sciop_scraping-0.1.10.tar.gz (22.2 kB)

Uploaded Source

Built Distribution


sciop_scraping-0.1.10-py3-none-any.whl (26.1 kB)

Uploaded Python 3
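The wheel's file name follows the standard binary distribution format, {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, so py3-none-any marks a pure-Python wheel: any Python 3 interpreter, no ABI dependency, any platform. That is why a single built distribution serves every platform. The sketch below splits such a name into its fields using only the standard library; `split_wheel_name` and `WheelName` are our own illustrative names, and for real tooling the `packaging` library's `packaging.utils.parse_wheel_filename` is the more thorough option.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]  # optional build tag; None when absent
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated fields.

    Wheel names escape '-' inside the distribution name as '_', so
    splitting on '-' yields 5 fields, or 6 when a build tag is present.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    fields = filename[: -len(".whl")].split("-")
    if len(fields) == 5:
        dist, version, py_tag, abi_tag, plat_tag = fields
        build = None
    elif len(fields) == 6:
        dist, version, build, py_tag, abi_tag, plat_tag = fields
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py_tag, abi_tag, plat_tag)


if __name__ == "__main__":
    print(split_wheel_name("sciop_scraping-0.1.10-py3-none-any.whl"))
```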

File details

Details for the file sciop_scraping-0.1.10.tar.gz.

File metadata

  • Download URL: sciop_scraping-0.1.10.tar.gz
  • Upload date:
  • Size: 22.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.25.4 CPython/3.13.5 Linux/6.12.12+bpo-amd64

File hashes

Hashes for sciop_scraping-0.1.10.tar.gz
Algorithm    Hash digest
SHA256       49d2543f469c8a3cb738a8ab300cd7ad8dc7b48b93c7ad17c87a269d2b5873f7
MD5          cb2f72b1e437d0879cfe2b3b2b2f4ba4
BLAKE2b-256  04ca3c148fdb96c465dd14795ee3fe986b9d31d52e8f2b1b8c1bb8d76cb7fbd2
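To check that a downloaded file matches the published digest, compute its SHA-256 locally and compare it with the value listed here. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify_sha256` are our own and are not part of sciop-scraping.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA-256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("sciop_scraping-0.1.10.tar.gz")
    expected = "49d2543f469c8a3cb738a8ab300cd7ad8dc7b48b93c7ad17c87a269d2b5873f7"
    if sdist.exists():
        print("OK" if verify_sha256(sdist, expected) else "MISMATCH")
```

The same check applies to the wheel using its own SHA256 digest. pip can also enforce this at install time in hash-checking mode, via `--hash=sha256:...` entries in a requirements file together with `--require-hashes`.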


File details

Details for the file sciop_scraping-0.1.10-py3-none-any.whl.

File metadata

  • Download URL: sciop_scraping-0.1.10-py3-none-any.whl
  • Upload date:
  • Size: 26.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.25.4 CPython/3.13.5 Linux/6.12.12+bpo-amd64

File hashes

Hashes for sciop_scraping-0.1.10-py3-none-any.whl
Algorithm    Hash digest
SHA256       eb879e96976bb1b9401be90141c8b016ecea08a4de5a74ea393b91601990f058
MD5          75576a1aacc092f6b613bc7b1a8a4a5c
BLAKE2b-256  915151af669b1d3e335f1851c91e2a4e54bef85a90cc3c3101f5adf9eafdb80f

