CORD 19 tools and utilities
COVID-19 Data Tools
Tools for making COVID-19 data slightly easier to work with, for everyone!
The Paperset class
This is a class for lazily loading papers from the CORD-19 dataset. Here are the instructions for use:
- Download a dataset in tar.gz form from the Download Here section, or use the download bash script in this repository (which automatically completes step 2 for you).
- Extract it into a directory of your choice, for example:
tar -xvzf comm_use_subset.tar.gz
(This is version 0.0.1; support for working with the tarballs directly, without extracting them, or online may be added later.)
- Load it into Python!
import cotools
from pprint import pprint
# path to the extracted directory (no trailing `/`, please!)
data = cotools.Paperset("data/comm_use_subset")
# index with ints
pprint(data[0])
# and slices!
pprint(data[:2])
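If you would rather do step 2 from Python instead of the shell, the standard-library `tarfile` module can do the same job as `tar -xvzf`. This is a minimal sketch, not part of cotools: the helper name `extract_tarball` is hypothetical, and it assumes the archive's top-level folder is `comm_use_subset`, so that extracting into `data` produces the `data/comm_use_subset` path used above.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_tarball(archive: str, dest: str = "data") -> List[str]:
    """Hypothetical helper: extract a .tar.gz into dest and return member names."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter rejects absolute paths,
            # ".." traversal, device files and similar unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

Usage: `extract_tarball("comm_use_subset.tar.gz", "data")`, after which `cotools.Paperset("data/comm_use_subset")` should find the files, assuming the folder layout described above.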
Let's talk for a bit about how it works, and why it doesn't take a gigantic amount of memory. The files are not actually loaded into Python until the data is indexed. Upon indexing, only the files at those indexes are read into Python, resulting in a list of dictionaries.
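The lazy-loading idea can be sketched with a small class that keeps only file names in memory and parses a JSON file when it is indexed. This is a hypothetical illustration, not cotools' actual `Paperset` code: the class name `LazyPaperset` is made up, it assumes the papers are `.json` files sitting directly in the given directory, and here an int returns one dictionary while a slice returns a list (the real class may behave differently).

```python
import json
import os
from typing import Any, Dict, List, Union


class LazyPaperset:
    """Hypothetical sketch of a lazily loaded paper collection."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        # Only the file *names* are held in memory up front, never the contents.
        self.filenames = sorted(
            name for name in os.listdir(directory) if name.endswith(".json")
        )

    def __len__(self) -> int:
        return len(self.filenames)

    def _load(self, filename: str) -> Dict[str, Any]:
        # A file is opened and parsed only at the moment it is requested.
        with open(os.path.join(self.directory, filename), encoding="utf-8") as f:
            return json.load(f)

    def __getitem__(
        self, index: Union[int, slice]
    ) -> Union[Dict[str, Any], List[Dict[str, Any]]]:
        if isinstance(index, slice):
            # Slicing the name list is cheap; only the selected files are read.
            return [self._load(name) for name in self.filenames[index]]
        return self._load(self.filenames[index])
```

Because `__getitem__` receives a `slice` object for `data[:2]`, the same method handles both int and slice indexing, and memory use grows with how much you index rather than with the size of the dataset.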
Hashes for cord_19_tools-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f99fcf201f8c7a55487e05ce459481cce7f19aa5fbef5ca45660b2934c9cef6c
MD5 | 24e7211aac417a350cab6c1f0a236a03
BLAKE2b-256 | 7c60ce6197d7bab5819125caa326acb8104f55ebf2d22f1aecc98731ab3cde42
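To check a downloaded wheel against the digests listed for cord_19_tools-0.0.1-py3-none-any.whl, a short standard-library script is enough. This is a sketch: the helper name `file_digests` is hypothetical, BLAKE2b-256 is computed as `hashlib.blake2b` with a 32-byte digest, and the file is read in chunks so large files need not fit in memory.

```python
import hashlib
from typing import Dict

# SHA256 digest copied from the table for cord_19_tools-0.0.1-py3-none-any.whl
EXPECTED_SHA256 = "f99fcf201f8c7a55487e05ce459481cce7f19aa5fbef5ca45660b2934c9cef6c"


def file_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Hypothetical helper: SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as f:
        # Feed the same chunk to every hasher so the file is read only once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Usage: `file_digests("cord_19_tools-0.0.1-py3-none-any.whl")["SHA256"] == EXPECTED_SHA256` should be `True` for an untampered download.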