
Access DAX datasets.

Caution: PyDAX is in BETA phase and still under development. Do NOT use it in production.

PyDAX is a Python API that enables data consumers and distributors to easily use and share datasets, and establishes a standard for exchanging data assets. It enables:

  • a data scientist to have a simpler and more unified way to begin working with a wide range of datasets, and

  • a data distributor to have a consistent, safe, and open source way to share datasets with interested communities.

Install the Package & its Dependencies

To install the latest version of PyDAX, run

$ pip install pydax

Alternatively, if you have downloaded the source, switch to the source directory (same directory as this README file, cd /path/to/pydax-source) and run

$ pip install -U .

Quick Start

Import the package and load a dataset. PyDAX downloads the WikiText-103 dataset (version 1.0.1) if it is not already downloaded, and then loads it.

import pydax
wikitext103_data = pydax.load_dataset('wikitext103')

View available PyDAX datasets and their versions.

>>> pydax.list_all_datasets()
{'claim_sentences_search': ('1.0.2',), ..., 'wikitext103': ('1.0.1',)}
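Because the listing maps each dataset name to a tuple of version strings, a small helper can pick the newest version before loading. The following is a minimal sketch, not part of PyDAX: `latest_version` is a hypothetical helper, the `catalog` literal stands in for the value returned by `pydax.list_all_datasets()`, and it assumes every version is a purely numeric dotted string such as `1.0.2`.

```python
from typing import Mapping, Tuple


def latest_version(catalog: Mapping[str, Tuple[str, ...]], name: str) -> str:
    """Return the highest version listed for `name` (hypothetical helper).

    Assumes versions are numeric dotted strings, so '1.0.10' sorts after '1.0.2'.
    """
    versions = catalog[name]
    return max(versions, key=lambda v: tuple(int(part) for part in v.split('.')))


# Literal stand-in for the mapping returned by pydax.list_all_datasets().
catalog = {'claim_sentences_search': ('1.0.2',), 'wikitext103': ('1.0.1',)}

print(latest_version(catalog, 'wikitext103'))  # prints 1.0.1
```

The result could then be passed as the `version` argument of `pydax.load_dataset`.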

To view your globally set configs for PyDAX, such as your default data directory, use pydax.get_config.

>>> pydax.get_config()
Config(DATADIR=PosixPath('dir/to/download/load/from'), ..., DATASET_SCHEMA_URL='file/to/load/datasets/from')

By default, pydax.load_dataset downloads to and loads from ~/.pydax/data/<dataset-name>/<dataset-version>/. To change the default data directory, use pydax.init.

pydax.init(DATADIR='new/dir/to/download/load/from')

Load a previously downloaded dataset using pydax.load_dataset. With the new default data directory set, PyDAX now searches for the Groningen Meaning Bank dataset (version 1.0.2) in new/dir/to/download/load/from/gmb/1.0.2/.

gmb_data = pydax.load_dataset('gmb', version='1.0.2', download=False)  # assuming the GMB dataset was already downloaded
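The directory layout described above (`<data-dir>/<dataset-name>/<dataset-version>/`) can be reproduced with `pathlib` to check whether a dataset is already on disk before loading it with `download=False`. This is a rough sketch under that assumption only: `dataset_dir` is a hypothetical helper, not a PyDAX function, and `my/data/dir` is a placeholder path.

```python
from pathlib import Path


def dataset_dir(name, version, datadir=None):
    """Build <datadir>/<name>/<version>, mirroring the layout described above.

    Hypothetical helper: falls back to ~/.pydax/data when no datadir is given.
    """
    base = Path(datadir) if datadir is not None else Path.home() / '.pydax' / 'data'
    return base / name / version


gmb_dir = dataset_dir('gmb', '1.0.2', datadir='my/data/dir')
print(gmb_dir.as_posix())  # prints my/data/dir/gmb/1.0.2

if gmb_dir.is_dir():
    # Safe to call pydax.load_dataset('gmb', version='1.0.2', download=False) here.
    print('GMB 1.0.2 appears to be downloaded already')
else:
    print('GMB 1.0.2 not found; load it with downloading enabled instead')
```

The check only confirms that the directory exists, not that its contents are complete or uncorrupted.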

Create a Dataset Schema File

Information about a dataset is stored in a schema file. To create a schema file for your dataset, check out the examples in our default repository. (The format of the schema file is not yet documented.)

Notebooks

For a more extensive look at PyDAX functionality, check out these notebooks:


