Scraping code for use with sciop-coordinated scrapes
Project description
sciop-scraping
A (yet-to-be-named) tool that lets the scraping of very large datasets be distributed across multiple volunteers and then reassembled as dataset parts on sciop.
NB: this is currently a work in progress and depends on planned sciop features that are not yet stabilised. If you're interested in contributing, experience of, or interest in, web scraping, Python CLI tools and/or REST APIs would be very helpful.
We want this to be as easy to use as possible, so we'll add detailed documentation and put out a call for wider testing as soon as we can. Watch this space!
In the meantime, please subscribe to the Safeguarding Research & Data forum for pointers to datasets that need saving, along with help and advice on collecting them, preparing them for upload and creating torrents.
File details
Details for the file sciop_scraping-0.1.4.tar.gz.
File metadata
- Download URL: sciop_scraping-0.1.4.tar.gz
- Upload date:
- Size: 17.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.25.2 CPython/3.13.5 Linux/6.12.12+bpo-amd64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 94f0011a446f52d985ce44e9fff40c8ab8b45322e941eaeb04d21ab8f4ff1c61 |
| MD5 | 61c7f6a4504e58abb7bb2b0fb65b8ed4 |
| BLAKE2b-256 | ba10bd47324afcf8d0b88056ca5a8e303828db58b622d75d49d55a13853d6dc1 |
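A downloaded file can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of` and `matches_expected`, and the assumption that the archive sits in the current directory, are illustrative and not part of sciop-scraping.

```python
import hashlib
from pathlib import Path

# Digest copied from the SHA256 row for sciop_scraping-0.1.4.tar.gz.
EXPECTED_SHA256 = "94f0011a446f52d985ce44e9fff40c8ab8b45322e941eaeb04d21ab8f4ff1c61"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected one, ignoring hex letter case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive was downloaded into the current directory.
    archive = Path("sciop_scraping-0.1.4.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check works for the wheel by swapping in its file name and its SHA256 digest.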
File details
Details for the file sciop_scraping-0.1.4-py3-none-any.whl.
File metadata
- Download URL: sciop_scraping-0.1.4-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.25.2 CPython/3.13.5 Linux/6.12.12+bpo-amd64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71a6892162bd8da791086bfde8f5cf5cff4737bfcea6881308b1e3874901544d |
| MD5 | 9ae8c7bf56aa66f57ef709e73cb6be64 |
| BLAKE2b-256 | adffd5d2254cf91049f2dc6bfb5ab86fc412daa5dfe4370565c0be51a04220c9 |