Tool to download mySociety datasets

mysoc-dataset

A Python package and CLI for downloading mySociety datasets.

The best example of one of these datasets is: https://mysociety.github.io/uk_local_authority_names_and_codes/

These datasets are versioned frictionless datapackages. A repository may contain multiple datapackages, and each datapackage may contain multiple versions. A version of a datapackage contains multiple resources (.csv files) and some composite files made up of those resources (.xlsx or .sqlite).

Installation

It can be installed with:

python -m pip install mysoc-dataset

or if using poetry:

poetry add mysoc-dataset

Usage

As a package

The package has two basic functions (with helpful error messages): one returns the URL of a resource, the other loads the resource as a pandas dataframe.

from mysoc_dataset import get_dataset_url, get_dataset_df

# get the URL of the resource
url = get_dataset_url(
    repo_name="uk_local_authority_names_and_codes",
    package_name="uk_la_future",
    version_name="latest",
    file_name="uk_local_authorities_future.csv",
)

# get a pandas dataframe
df = get_dataset_df(
    repo_name="uk_local_authority_names_and_codes",
    package_name="uk_la_future",
    version_name="latest",
    file_name="uk_local_authorities_future.csv",
)
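Both functions take the same four keyword arguments, so a resource reference can be kept in one place and reused. The sketch below is illustrative only: `ResourceRef` and `as_kwargs` are hypothetical names and not part of mysoc-dataset. Only the keyword names are taken from the example above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ResourceRef:
    """Hypothetical helper (not part of mysoc-dataset) that names one resource."""

    repo_name: str
    package_name: str
    version_name: str
    file_name: str

    def as_kwargs(self) -> dict:
        # Field names match the keyword arguments used in the example above.
        return asdict(self)


future_las = ResourceRef(
    repo_name="uk_local_authority_names_and_codes",
    package_name="uk_la_future",
    version_name="latest",
    file_name="uk_local_authorities_future.csv",
)

# With mysoc-dataset installed, the same reference could feed both functions:
#   url = get_dataset_url(**future_las.as_kwargs())
#   df = get_dataset_df(**future_las.as_kwargs())
print(future_las.as_kwargs())
```

Keeping the reference frozen means a pinned version cannot be changed by accident elsewhere in a script.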

As a CLI

The CLI can be used to explore available data with the list command, get the frictionless datapackage that describes the repo with detail, fetch the URL with the url command, or download the file with download.

This can be used to source files or pipe the URLs into other functions without writing Python scripts.

The CLI can either be run as python -m mysoc_dataset or mysoc-dataset.

For instance, the following will print the datapackage.json that describes the underlying contents.

mysoc-dataset detail --repo uk_local_authority_names_and_codes --version latest --package uk_la_future

And the following will get the URL of the resource, pinned to major version 1:

mysoc-dataset url --repo uk_local_authority_names_and_codes --version 1 --package uk_la_future --file uk_local_authorities_future.csv

If the dataset has since had a major change, a warning will indicate that version 1 is no longer the latest version. The URL keeps pointing at version 1, so breaking changes to headers are not introduced until the script itself is changed.
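The idea of pinning to a major version can be sketched as follows. This is not the package's implementation: `parse_version` and `resolve_major` are hypothetical names, the version list is made up, and the sketch assumes purely numeric dotted version strings.

```python
import warnings


def parse_version(version: str) -> tuple:
    # "1.2.0" -> (1, 2, 0); assumes purely numeric dotted versions.
    return tuple(int(part) for part in version.split("."))


def resolve_major(pin: str, available: list) -> str:
    """Return the newest available version within the pinned major version."""
    major = int(pin)
    candidates = [v for v in available if parse_version(v)[0] == major]
    if not candidates:
        raise ValueError(f"No version found for major version {pin}")
    chosen = max(candidates, key=parse_version)
    newest = max(available, key=parse_version)
    if parse_version(newest)[0] > major:
        # The pin still resolves, but a newer major version exists.
        warnings.warn(f"{chosen} is not the latest version ({newest} is available)")
    return chosen


# Made-up version list, for illustration only.
versions = ["1.0.0", "1.2.0", "2.0.0"]
print(resolve_major("1", versions))
```

Minor and patch releases within the pinned major version are picked up automatically, while a new major version only produces a warning.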

Use mysoc-dataset --help for more instructions.

If using the CLI for a dataset, please fill out a survey of what you are using it for to help us explain the value of the data to funders. You can get a URL to the survey page using the 'survey' command.

mysoc-dataset survey --repo uk_local_authority_names_and_codes --version latest --package uk_la_future --file uk_local_authorities_future.csv
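When the CLI is called from a script, the argument list can be built in one place. The sketch below is a hypothetical helper, not part of mysoc-dataset. It only uses the commands and flags shown in the examples above (detail, url and survey with --repo, --version, --package and --file) and does not run anything itself.

```python
import shlex


def build_command(command: str, repo: str, version: str, package: str, file: str = "") -> list:
    """Build the argument list for a mysoc-dataset call (hypothetical helper)."""
    argv = [
        "mysoc-dataset", command,
        "--repo", repo,
        "--version", version,
        "--package", package,
    ]
    if file:
        # --file is only added when a specific resource is requested.
        argv += ["--file", file]
    return argv


argv = build_command(
    "url",
    repo="uk_local_authority_names_and_codes",
    version="1",
    package="uk_la_future",
    file="uk_local_authorities_future.csv",
)
print(shlex.join(argv))

# With the CLI installed, the command could then be run with, for example:
#   subprocess.run(argv, capture_output=True, text=True, check=True).stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems if a file name ever contains spaces.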

Maintenance

If the repo has a valid PYPI_TOKEN secret, the GitHub Action will automatically publish on push to the main branch, provided the poetry version has been bumped and all tests pass.

Download files

Source Distribution

mysoc_dataset-0.3.0.tar.gz (7.9 kB)

Uploaded Source

Built Distribution

mysoc_dataset-0.3.0-py3-none-any.whl (9.3 kB)

Uploaded Python 3

File details

Details for the file mysoc_dataset-0.3.0.tar.gz.

File metadata

  • Download URL: mysoc_dataset-0.3.0.tar.gz
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.10.2 Linux/5.15.0-1031-azure

File hashes

Hashes for mysoc_dataset-0.3.0.tar.gz
Algorithm Hash digest
SHA256 59157721c255dbfd2425c1d70e4f4060caf8e211d6950eec3cbf9f2f7beaef5f
MD5 b04794a009c511f7a4538440689f4aea
BLAKE2b-256 f9b29ac9c70322388a7e2f3ccfa64bf637488efdb2326bdc0a074c021e243e8c
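A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The function names and the local file path are assumptions, and the check is skipped if the file is not present.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for mysoc_dataset-0.3.0.tar.gz.
EXPECTED_SHA256 = "59157721c255dbfd2425c1d70e4f4060caf8e211d6950eec3cbf9f2f7beaef5f"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    return sha256_of(path) == expected.lower()


# Assumed local download location; adjust to wherever the file was saved.
archive = Path("mysoc_dataset-0.3.0.tar.gz")
if archive.exists():
    print("hash ok" if matches_expected(archive) else "hash mismatch")
```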

File details

Details for the file mysoc_dataset-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: mysoc_dataset-0.3.0-py3-none-any.whl
  • Size: 9.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.10.2 Linux/5.15.0-1031-azure

File hashes

Hashes for mysoc_dataset-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3613f1babec9ead5b3e568b1f0a3faad3f73efe893f0e99b153c858d1d2b67d8
MD5 4ef5f19ee5ba6de45f7dd562aec0a81d
BLAKE2b-256 c9c6f07d4fac5ecab6b0c86a0330a86f3ca0b8b2295b47dcf2e8917134d35c41
