Access DAX datasets.


PyDAX is a Python API that enables data consumers and distributors to easily use and share datasets, and establishes a standard for exchanging data assets. It enables:

  • a data scientist to have a simpler and more unified way to begin working with a wide range of datasets, and

  • a data distributor to have a consistent, safe, and open source way to share datasets with interested communities.

Install the Package & its Dependencies

To install the latest version of PyDAX, run

$ pip install pydax

Alternatively, if you have downloaded the source, switch to the source directory (the directory that contains this README file, for example with cd /path/to/pydax-source) and run

$ pip install -U .
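To confirm that either install command worked, you can ask Python which version of the package it sees. The following is a minimal sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, not part of PyDAX.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if PyDAX is installed, otherwise None.
print(installed_version('pydax'))
```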

Quick Start

Import the package and load a dataset. PyDAX downloads the WikiText-103 dataset (version 1.0.1) if it has not already been downloaded, and then loads it.

import pydax
wikitext103_data = pydax.load_dataset('wikitext103')

View available PyDAX datasets and their versions.

>>> pydax.list_all_datasets()
{'claim_sentences_search': ('1.0.2',), ..., 'wikitext103': ('1.0.1',)}
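As the output above shows, the result maps each dataset name to a tuple of version strings. A minimal sketch of picking the newest version per dataset follows, assuming versions are dotted integers. The `available` dict is a stand-in copied from the output above (the elided entries are left out), and `latest_version` is a hypothetical helper, not part of PyDAX.

```python
from typing import Dict, Tuple


def latest_version(versions: Tuple[str, ...]) -> str:
    """Return the highest version, comparing dotted components numerically."""
    return max(versions, key=lambda v: tuple(int(part) for part in v.split('.')))


# Stand-in for the result of pydax.list_all_datasets(); only the two
# entries visible in the output above are reproduced here.
available: Dict[str, Tuple[str, ...]] = {
    'claim_sentences_search': ('1.0.2',),
    'wikitext103': ('1.0.1',),
}

latest = {name: latest_version(versions) for name, versions in available.items()}
print(latest)
```

Comparing the components as integers rather than as strings keeps '1.0.10' ahead of '1.0.2'.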

To view your globally set configs for PyDAX, such as your default data directory, use pydax.get_config.

>>> pydax.get_config()
Config(DATADIR=PosixPath('dir/to/download/load/from'), ..., DATASET_SCHEMA_FILE_URL='file/to/load/datasets/from')

By default, pydax.load_dataset downloads to and loads from ~/.pydax/data/<dataset-name>/<dataset-version>/. To change the default data directory, use pydax.init.

pydax.init(DATADIR='new/dir/to/download/load/from')
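The directory layout described above, <data-dir>/<dataset-name>/<dataset-version>/, can be sketched with pathlib. This only illustrates how the path is composed. `dataset_dir` is a hypothetical helper, not a PyDAX function, and it does not touch PyDAX's configuration.

```python
from pathlib import Path


def dataset_dir(data_dir, name, version):
    """Compose <data_dir>/<name>/<version>, expanding a leading ~ if present."""
    return Path(data_dir).expanduser() / name / version


# Default data directory, as stated in the text above.
default_dir = dataset_dir('~/.pydax/data', 'wikitext103', '1.0.1')

# The same dataset after the data directory has been changed.
custom_dir = dataset_dir('new/dir/to/download/load/from', 'wikitext103', '1.0.1')

print(default_dir)
print(custom_dir)
```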

Load a previously downloaded dataset using pydax.load_dataset. With the new default data dir set, PyDAX now searches for the Groningen Meaning Bank dataset (version 1.0.2) in new/dir/to/download/load/from/gmb/1.0.2/.

gmb_data = pydax.load_dataset('gmb', version='1.0.2', download=False)  # assumes the GMB dataset was already downloaded
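Because download=False relies on the files already being in place, it can help to check the expected directory first. The sketch below is an assumption-laden pre-check, not PyDAX behavior: `is_downloaded` is a hypothetical helper that only tests whether the version directory exists and is non-empty, and the demo uses a temporary directory with a placeholder file instead of a real dataset.

```python
import tempfile
from pathlib import Path


def is_downloaded(data_dir, name, version):
    """Return True if <data_dir>/<name>/<version>/ exists and contains at least one entry."""
    target = Path(data_dir).expanduser() / name / version
    return target.is_dir() and any(target.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    # Nothing has been placed in the temporary data directory yet.
    before = is_downloaded(tmp, 'gmb', '1.0.2')

    # Simulate a previously downloaded dataset with a placeholder file.
    gmb_dir = Path(tmp) / 'gmb' / '1.0.2'
    gmb_dir.mkdir(parents=True)
    (gmb_dir / 'placeholder.txt').write_text('stand-in for dataset files')
    after = is_downloaded(tmp, 'gmb', '1.0.2')

print(before, after)
```

A check like this says nothing about whether the files are complete or valid; it only catches the case where the directory is missing or empty.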

To learn more about PyDAX, check out the documentation and the tutorial.

