Scraping code for use with sciop-coordinated scrapes

Project description

sciop-scraping

A (yet-to-be-named) tool that lets the scraping of very large datasets be distributed across multiple volunteers, with the results then reassembled as dataset parts on sciop.

NB: this is currently a work in progress, and it depends on planned features in sciop that have not yet stabilised. If you're interested in contributing, experience of or interest in web scraping, Python CLI tools and/or REST APIs would be very helpful.

We absolutely want this to be as easy to use as possible, so as soon as we can we'll be adding detailed documentation and putting out a call for wider testing. Watch this space!

In the meantime, please subscribe to the Safeguarding Research & Data forum for pointers to datasets that need saving, along with help & advice on collecting them, preparing them for upload and creating torrents.

Download files

Source Distribution

sciop_scraping-0.1.4.tar.gz (17.3 kB view details)

Uploaded Source

Built Distribution

sciop_scraping-0.1.4-py3-none-any.whl (20.9 kB view details)

Uploaded Python 3

File details

Details for the file sciop_scraping-0.1.4.tar.gz.

File metadata

  • Download URL: sciop_scraping-0.1.4.tar.gz
  • Upload date:
  • Size: 17.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.25.2 CPython/3.13.5 Linux/6.12.12+bpo-amd64

File hashes

Hashes for sciop_scraping-0.1.4.tar.gz
Algorithm Hash digest
SHA256 94f0011a446f52d985ce44e9fff40c8ab8b45322e941eaeb04d21ab8f4ff1c61
MD5 61c7f6a4504e58abb7bb2b0fb65b8ed4
BLAKE2b-256 ba10bd47324afcf8d0b88056ca5a8e303828db58b622d75d49d55a13853d6dc1
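The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the local file path is an assumption about where the archive was saved, and the helper names `sha256_of` and `matches_published_hash` are hypothetical, not part of sciop-scraping.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for sciop_scraping-0.1.4.tar.gz
EXPECTED_SHA256 = "94f0011a446f52d985ce44e9fff40c8ab8b45322e941eaeb04d21ab8f4ff1c61"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 digest with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive was downloaded into the current directory.
    archive = Path("sciop_scraping-0.1.4.tar.gz")
    if archive.exists():
        print("OK" if matches_published_hash(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check applies to the wheel by passing its file name and the SHA256 digest listed for it as `expected`.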

See more details on using hashes here.

File details

Details for the file sciop_scraping-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: sciop_scraping-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 20.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.25.2 CPython/3.13.5 Linux/6.12.12+bpo-amd64

File hashes

Hashes for sciop_scraping-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 71a6892162bd8da791086bfde8f5cf5cff4737bfcea6881308b1e3874901544d
MD5 9ae8c7bf56aa66f57ef709e73cb6be64
BLAKE2b-256 adffd5d2254cf91049f2dc6bfb5ab86fc412daa5dfe4370565c0be51a04220c9