Tool to download mySociety datasets
mysoc-dataset
A python package and CLI to download mySociety datasets.
The best example of one of these datasets is: https://mysociety.github.io/uk_local_authority_names_and_codes/
These datasets are versioned frictionless datapackages. A repository may contain multiple data packages, and each data package may contain multiple versions. A version of a datapackage contains multiple resources (.csv files) and some composite files built from those resources (.xlsx or .sqlite).
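The hierarchy described above (repository, data package, version, files) can be pictured with a small data model. This is only an illustrative sketch: the class names, the version number "1.0.0" and the composite file names are assumptions made for the example, not part of mysoc-dataset or of any real repository.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    """One version of a data package (hypothetical model)."""
    name: str
    resources: list[str] = field(default_factory=list)   # individual .csv files
    composites: list[str] = field(default_factory=list)  # .xlsx / .sqlite bundles


@dataclass
class DataPackage:
    """A data package holding one or more versions."""
    name: str
    versions: dict[str, Version] = field(default_factory=dict)


@dataclass
class Repository:
    """A repository holding one or more data packages."""
    name: str
    packages: dict[str, DataPackage] = field(default_factory=dict)

    def files(self, package: str, version: str) -> list[str]:
        """List every file (resources first, then composites) in one version."""
        v = self.packages[package].versions[version]
        return v.resources + v.composites


repo = Repository(
    name="uk_local_authority_names_and_codes",
    packages={
        "uk_la_future": DataPackage(
            name="uk_la_future",
            versions={
                # "1.0.0" and the composite file names are made up for illustration.
                "1.0.0": Version(
                    name="1.0.0",
                    resources=["uk_local_authorities_future.csv"],
                    composites=["uk_la_future.xlsx", "uk_la_future.sqlite"],
                )
            },
        )
    },
)

print(repo.files("uk_la_future", "1.0.0"))
```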
Installation
It can be installed with:
python -m pip install mysoc-dataset
or if using poetry:
poetry add mysoc-dataset
Usage
As a package
The package has two basic functions (with helpful error messages): get_dataset_url returns the URL of a resource, and get_dataset_df returns the resource as a pandas dataframe.
from mysoc_dataset import get_dataset_url, get_dataset_df

# get the URL of the resource
url = get_dataset_url(
    repo_name="uk_local_authority_names_and_codes",
    package_name="uk_la_future",
    version_name="latest",
    file_name="uk_local_authorities_future.csv",
)

# get a pandas dataframe
df = get_dataset_df(
    repo_name="uk_local_authority_names_and_codes",
    package_name="uk_la_future",
    version_name="latest",
    file_name="uk_local_authorities_future.csv",
)
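Because both functions take the same four keyword arguments, a script that needs several files from one package might wrap them in a small helper. The sketch below is one possible approach; load_resources is a hypothetical name, not something mysoc-dataset provides, and the fetching function is passed in so that either get_dataset_url or get_dataset_df can be used.

```python
from typing import Any, Callable


def load_resources(
    fetch: Callable[..., Any],
    repo_name: str,
    package_name: str,
    version_name: str,
    file_names: list[str],
) -> dict[str, Any]:
    """Call `fetch` once per file and key the results by file name.

    `fetch` is expected to accept the keyword arguments shown in the
    examples above (repo_name, package_name, version_name, file_name).
    """
    return {
        file_name: fetch(
            repo_name=repo_name,
            package_name=package_name,
            version_name=version_name,
            file_name=file_name,
        )
        for file_name in file_names
    }


# Usage (needs mysoc-dataset installed and network access):
#
#   from mysoc_dataset import get_dataset_df
#   frames = load_resources(
#       get_dataset_df,
#       repo_name="uk_local_authority_names_and_codes",
#       package_name="uk_la_future",
#       version_name="latest",
#       file_names=["uk_local_authorities_future.csv"],
#   )
```

Passing the function in, rather than importing it inside the helper, keeps the helper easy to test with a stand-in that does not touch the network.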
As a CLI
The CLI can be used to explore available data using the 'list' command, get the frictionless datapackage that describes the repo using 'detail', fetch the URL with the 'url' command, or download the file using 'download'.
This can be used to source files or pipe the URLs into other functions without writing Python scripts.
The CLI can be run either as 'python -m mysoc_dataset' or as 'mysoc-dataset'.
For instance, the following will print the datapackage.json that describes the underlying contents:
mysoc-dataset detail --repo uk_local_authority_names_and_codes --version latest --package uk_la_future
And the following will get the URL of the resource, pinned to major version 1:
mysoc-dataset url --repo uk_local_authority_names_and_codes --version 1 --package uk_la_future --file uk_local_authorities_future.csv
If the dataset has since had a major version change, a warning will indicate that this is no longer the latest version. The URL stays on major version 1, so breaking changes to headers are not introduced until the script itself is changed.
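The pinning behaviour can be illustrated with a short sketch. This is not the package's actual resolution code: resolve_pin and parse are hypothetical names, and the version numbers in the usage note are invented. It only shows the idea of picking the newest version under a pinned major version and warning when a newer major version exists.

```python
import warnings


def parse(version: str) -> tuple[int, ...]:
    """Turn "1.2.0" into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_pin(pin: str, available: list[str]) -> str:
    """Return the newest available version matching `pin`.

    `pin` is either "latest" or a version prefix such as "1" or "1.2".
    """
    if not available:
        raise ValueError("No versions available")
    ordered = sorted(available, key=parse)
    newest = ordered[-1]
    if pin == "latest":
        return newest

    prefix = parse(pin)
    matches = [v for v in ordered if parse(v)[: len(prefix)] == prefix]
    if not matches:
        raise ValueError(f"No version matches {pin!r}; available: {ordered}")

    chosen = matches[-1]
    # Warn, but still return the pinned version, if a newer major version exists.
    if parse(newest)[0] > parse(chosen)[0]:
        warnings.warn(f"{chosen} is not the latest version ({newest} is available)")
    return chosen
```

With the invented versions 1.0.0, 1.2.0 and 2.0.0, the pin "1" would resolve to 1.2.0 and emit a warning, while "latest" would resolve to 2.0.0.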
Use 'mysoc-dataset --help' for more instructions.
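Where a build step has to go through the CLI rather than import the package, the 'url' command can be called from a script. The sketch below assumes that the command prints the URL on standard output and nothing else there; that is an assumption about the CLI's output, not something stated above. build_url_command and fetch_url are hypothetical helper names.

```python
import subprocess
import sys


def build_url_command(repo: str, package: str, version: str, file: str) -> list[str]:
    """Build the argument list for `python -m mysoc_dataset url ...`."""
    return [
        sys.executable, "-m", "mysoc_dataset", "url",
        "--repo", repo,
        "--version", version,
        "--package", package,
        "--file", file,
    ]


def fetch_url(repo: str, package: str, version: str, file: str) -> str:
    """Run the CLI and return whatever it prints on stdout, stripped.

    Assumes the URL is the only thing written to stdout. Raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_url_command(repo, package, version, file),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Usage (needs mysoc-dataset installed and network access):
#
#   url = fetch_url(
#       repo="uk_local_authority_names_and_codes",
#       package="uk_la_future",
#       version="1",
#       file="uk_local_authorities_future.csv",
#   )
```

Using sys.executable with '-m mysoc_dataset' runs the CLI from the same environment as the calling script, which avoids depending on the 'mysoc-dataset' entry point being on the PATH.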
If using the CLI for a dataset, please fill out a survey of what you are using it for to help us explain the value of the data to funders. You can get a URL to the survey page using the 'survey' command.
mysoc-dataset survey --repo uk_local_authority_names_and_codes --version latest --package uk_la_future --file uk_local_authorities_future.csv
Maintenance
If the repo has a valid PYPI_TOKEN secret, the GitHub Action will automatically publish on push to the main branch, provided the poetry version has been bumped and all tests pass.
File details
Details for the file mysoc_dataset-0.3.0.tar.gz.
File metadata
- Download URL: mysoc_dataset-0.3.0.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.2 Linux/5.15.0-1031-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 59157721c255dbfd2425c1d70e4f4060caf8e211d6950eec3cbf9f2f7beaef5f
MD5 | b04794a009c511f7a4538440689f4aea
BLAKE2b-256 | f9b29ac9c70322388a7e2f3ccfa64bf637488efdb2326bdc0a074c021e243e8c
File details
Details for the file mysoc_dataset-0.3.0-py3-none-any.whl.
File metadata
- Download URL: mysoc_dataset-0.3.0-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.2 Linux/5.15.0-1031-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3613f1babec9ead5b3e568b1f0a3faad3f73efe893f0e99b153c858d1d2b67d8
MD5 | 4ef5f19ee5ba6de45f7dd562aec0a81d
BLAKE2b-256 | c9c6f07d4fac5ecab6b0c86a0330a86f3ca0b8b2295b47dcf2e8917134d35c41