adtl – another data transformation language

Python 3.9+

adtl is a data transformation language (DTL) used by some applications in Global.health, notably the ISARIC clinical data pipeline at globaldothealth/isaric and the InsightBoard project dashboard at globaldothealth/InsightBoard.

Documentation: ReadTheDocs

Installation

You can install this package using either pipx or pip. Installing via pipx is preferable if you only want to use the adtl tool standalone from the command line, as it isolates the package's dependencies in a virtual environment. By contrast, pip installs packages into the current Python environment (the global one unless a virtual environment is active), which is generally not recommended as it can interfere with other packages on your system.

  • Installation via pipx:

    pipx install adtl
    
  • Installation via pip:

    python3 -m pip install adtl
    

If you are writing code that depends on adtl (instead of using the command-line program), it is best to declare adtl as a dependency in your Python build tool of choice.

To use the development version, replace adtl with the full GitHub URL:

pip install git+https://github.com/globaldothealth/adtl
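
After installing, one way to confirm which version of adtl the current Python environment sees is to query the package metadata. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of adtl.

```python
from importlib import metadata


def installed_version(package="adtl"):
    """Return the installed version of `package`, or None if it is missing.

    Hypothetical helper for illustration; it is not provided by adtl.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment
        return None


print(installed_version())
```

If this prints None in an environment where you installed adtl with pipx, that is expected: pipx keeps the package in its own isolated environment.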

Rationale

Most existing data transformation languages are XML dialects, though there are recent variations in other file formats. In addition, many DTLs use a custom domain-specific language. The primary utility of this DTL is to provide an easy-to-use Python library for basic data transformations, which are specified in a JSON or TOML file. It is not meant to be comprehensive, and adtl can be used as one step within a larger data processing pipeline.

Usage

adtl can be used from the command line or as a Python library.

As a CLI:

adtl parse specification-file input-file

Here specification-file is the parser specification (as TOML or JSON) and input-file is the data file (not the data dictionary) that adtl will transform using the instructions in the specification.

If adtl is not in your PATH, this may give an error. Either add the location where the adtl script is installed to your PATH, or try running adtl as a module:

python3 -m adtl parse specification-file input-file

Running adtl creates output files in the current working directory. Each file is named after the parser and suffixed with a table name.

Before transforming your data, you can check that your specification file matches the format adtl expects, and find fields that may have been misspelled or omitted from the mapping, by running:

adtl check specification-file input-file
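
When driving the CLI from a script, the PATH fallback and the check-before-parse order described above can be sketched as follows. The helper names adtl_command and check_then_parse are hypothetical. The sketch also assumes that adtl exits with a non-zero status when a command fails, which is not stated here. It uses sys.executable rather than python3 so that the module form runs in the same environment as the calling script.

```python
import shutil
import subprocess
import sys


def adtl_command(subcommand, spec_file, input_file):
    """Build the argument list for `adtl <subcommand> spec-file input-file`."""
    if shutil.which("adtl"):
        prefix = ["adtl"]  # the adtl script is on PATH
    else:
        # Fall back to running adtl as a module with the current interpreter
        prefix = [sys.executable, "-m", "adtl"]
    return prefix + [subcommand, str(spec_file), str(input_file)]


def check_then_parse(spec_file, input_file):
    """Run `adtl check`, then `adtl parse` (hypothetical helper).

    Assumption: adtl signals failure with a non-zero exit status, in which
    case check=True raises subprocess.CalledProcessError and parse is skipped.
    """
    for subcommand in ("check", "parse"):
        subprocess.run(adtl_command(subcommand, spec_file, input_file), check=True)
```

Calling check_then_parse("specification-file", "input-file") would then run both commands in order.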

As a Python library:

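import adtl

# `specification` is the parser specification (see specification-file above)
parser = adtl.Parser(specification)
print(parser.tables)  # list of tables created

# `table` is the name of one of the tables listed in parser.tables
for row in parser.parse().read_table(table):
    print(row)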

Alternatively, to get the output as CSV, as with the CLI:

import adtl

data = adtl.parse("specification-file", "input-file")

Here, data is returned as a dictionary of pandas DataFrames, one for each table.
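
Because the result is a dictionary keyed by table name, each table can be handled separately. The following is a minimal sketch of writing every table to its own CSV file, assuming each value behaves like a pandas DataFrame (that is, it has a to_csv method). The helper save_tables and the prefix-table.csv naming pattern are illustrative assumptions, not adtl's own naming.

```python
from pathlib import Path


def save_tables(tables, out_dir, prefix="output"):
    """Write each table in `tables` to `<out_dir>/<prefix>-<table>.csv`.

    `tables` is expected to map table names to pandas DataFrames, such as
    the dictionary described above. The file naming is a hypothetical choice.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, frame in tables.items():
        path = out_dir / f"{prefix}-{name}.csv"
        frame.to_csv(path, index=False)  # omit the DataFrame index column
        paths[name] = path
    return paths
```

For example, save_tables(data, "out") would write one file per table into the out directory and return the paths it wrote.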

Development

Install pre-commit and set up the pre-commit hooks (pre-commit install), which run linting checks before each commit.
