Another data transformation language
adtl – another data transformation language
adtl is a data transformation language (DTL) used by some applications in Global.health, notably the ISARIC clinical data pipeline at globaldothealth/isaric and the InsightBoard project dashboard at globaldothealth/InsightBoard.
Documentation: ReadTheDocs
Installation
You can install this package using either pipx
or pip. Installing via pipx offers advantages if you just want to use the
adtl tool standalone from the command line, as it isolates the Python
package dependencies in a virtual environment. On the other hand, pip installs
packages to the global environment, which is generally not recommended as it
can interfere with other packages on your system.
- Installation via pipx: pipx install adtl
- Installation via pip: python3 -m pip install adtl
If you are writing code which depends on adtl (instead of using the
command-line program), then it is best to add a dependency on adtl to your
Python build tool of choice.
To use the development version, replace adtl with the full GitHub URL:
pip install git+https://github.com/globaldothealth/adtl
Rationale
Most existing data transformation languages are written in an XML dialect, though there are recent variations in other file formats. In addition, many DTLs use a custom domain-specific language. The primary utility of this DTL is to provide an easy-to-use Python library for basic data transformations, which are specified in a JSON or TOML file. It is not meant to be comprehensive, and adtl can be used as a step within a larger data processing pipeline.
Usage
adtl can be used from the command line or as a Python library.
As a CLI:
adtl parse specification-file input-file
Here specification-file is the parser specification (as TOML or JSON) and input-file is the data file (not the data dictionary) that adtl will transform using the instructions in the specification.
If adtl is not in your PATH, this may give an error. Either add the location where the adtl script is installed to your PATH, or try running adtl as a module:
python3 -m adtl parse specification-file input-file
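When adtl is called from another script, the same fallback can be automated. The following is a minimal sketch using only the Python standard library; the helper names adtl_command and parse_command are hypothetical and not part of adtl. It prefers the adtl script when it is found on PATH and otherwise runs the module with the current interpreter.

```python
import shutil
import sys


def adtl_command(which=shutil.which):
    """Return the argument prefix used to invoke adtl.

    Hypothetical helper, not part of adtl. Prefers the adtl script when it
    is on PATH, otherwise falls back to running the module with the
    interpreter that is executing this code.
    """
    if which("adtl") is not None:
        return ["adtl"]
    return [sys.executable, "-m", "adtl"]


def parse_command(specification_file, input_file, which=shutil.which):
    """Build the full `adtl parse` command line as a list of arguments."""
    return adtl_command(which) + ["parse", str(specification_file), str(input_file)]
```

The resulting list can be passed to subprocess.run. The which parameter exists only so that the lookup can be replaced in tests.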
Running adtl creates output files in the current working directory, named after the parser and suffixed with the table names.
Before trying to transform your data, you can check that your specification file matches the format adtl expects, and look for fields that may have been misspelled or missed out during the mapping, by using:
adtl check specification-file input-file
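The two commands can be chained so that the transformation only runs once the check has passed. This is a sketch under one assumption that the text above does not confirm: that adtl check exits with a non-zero status when it finds a problem. The function names adtl_args and check_then_parse are hypothetical.

```python
import subprocess
import sys


def adtl_args(subcommand, specification_file, input_file):
    """Build an `adtl <subcommand>` command line, run as a Python module."""
    return [sys.executable, "-m", "adtl", subcommand,
            str(specification_file), str(input_file)]


def check_then_parse(specification_file, input_file, run=subprocess.run):
    """Run `adtl check`, then `adtl parse` only if the check succeeded.

    Assumption: a failed check is reported through a non-zero exit status.
    Returns the name of the step that failed, or None if both succeeded.
    """
    for subcommand in ("check", "parse"):
        completed = run(adtl_args(subcommand, specification_file, input_file))
        if completed.returncode != 0:
            return subcommand
    return None
```

Calling check_then_parse("specification-file", "input-file") returns None when both steps succeed, or "check" or "parse" to indicate the step that stopped the run.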
Python library:
import adtl
parser = adtl.Parser(specification)
print(parser.tables) # list of tables created
for row in parser.parse().read_table(table):
    print(row)
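To keep the rows of every table instead of printing them, the same calls can be wrapped in a small helper. This sketch relies only on the attributes used in the snippet above (tables, parse and read_table); collect_tables is a hypothetical name, and any arguments that parse needs are passed through unchanged.

```python
def collect_tables(parser, *parse_args):
    """Return {table name: list of rows} for every table the parser defines.

    `parser` is expected to behave like the adtl.Parser object used above:
    it exposes `tables` and `parse(...).read_table(name)`.
    """
    parsed = parser.parse(*parse_args)  # parse once, then read each table
    return {table: list(parsed.read_table(table)) for table in parser.tables}
```

With a parser created as above, collect_tables(parser) gives a dictionary keyed by the names in parser.tables.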
Alternatively, to get an output file as a CSV, similarly to the CLI:
import adtl
data = adtl.parse("specification-file", "input-file")
Here data is a dictionary of pandas DataFrames, one for each table.
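If the tables need to be written somewhere other than the default location, the returned dictionary can be saved explicitly. The sketch below assumes only that each value has a pandas-style to_csv method. The helper write_tables and the file name pattern prefix-table.csv are illustrative choices and may not match the names the CLI produces.

```python
from pathlib import Path


def write_tables(data, output_dir, prefix="output"):
    """Write each table in `data` to <output_dir>/<prefix>-<table>.csv.

    `data` maps table names to objects with a pandas-style to_csv method.
    The naming pattern is an illustrative choice, not adtl's own.
    Returns the list of paths written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for table, frame in data.items():
        path = output_dir / f"{prefix}-{table}.csv"
        frame.to_csv(path, index=False)  # leave out the DataFrame index column
        written.append(path)
    return written
```

For example, write_tables(data, "out", prefix="isaric") would create one CSV file per table inside the out directory.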
Development
Install pre-commit and set up the pre-commit hooks
(pre-commit install), which run linting checks before each commit.
File details
Details for the file adtl-0.13.0.tar.gz.
File metadata
- Download URL: adtl-0.13.0.tar.gz
- Upload date:
- Size: 52.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00c2f6603874cbe2e110a5694b735f86ac344d422cb7c6666b9e63ced7eae9c6 |
| MD5 | 13cf2b218db3e92907dfd0dcb6ed4a78 |
| BLAKE2b-256 | 1a8a9c50e31a62bb86a968b12b94801074eede8819de4c1a0c2f36e1466211b0 |
Provenance
The following attestation bundles were made for adtl-0.13.0.tar.gz:
Publisher: publish.yml on globaldothealth/adtl
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: adtl-0.13.0.tar.gz
  - Subject digest: 00c2f6603874cbe2e110a5694b735f86ac344d422cb7c6666b9e63ced7eae9c6
- Sigstore transparency entry: 855127534
- Sigstore integration time:
- Permalink: globaldothealth/adtl@5bdda112de8a6ca88acfbad2efabfd42763176dc
- Branch / Tag: refs/tags/0.13.0
- Owner: https://github.com/globaldothealth
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5bdda112de8a6ca88acfbad2efabfd42763176dc
- Trigger Event: release
File details
Details for the file adtl-0.13.0-py3-none-any.whl.
File metadata
- Download URL: adtl-0.13.0-py3-none-any.whl
- Upload date:
- Size: 63.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7278f44fed9b8d5cc6f9a73f6bc8d460d9ff23e82ae9d9d428cbd04165cc62c3 |
| MD5 | 989342e6f861bc783eec8614b4150c06 |
| BLAKE2b-256 | d8e444ae3f1c10cbc0631c0d77f46f535e4f889adb5ddefdbdf083f69398f470 |
Provenance
The following attestation bundles were made for adtl-0.13.0-py3-none-any.whl:
Publisher: publish.yml on globaldothealth/adtl
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: adtl-0.13.0-py3-none-any.whl
  - Subject digest: 7278f44fed9b8d5cc6f9a73f6bc8d460d9ff23e82ae9d9d428cbd04165cc62c3
- Sigstore transparency entry: 855127537
- Sigstore integration time:
- Permalink: globaldothealth/adtl@5bdda112de8a6ca88acfbad2efabfd42763176dc
- Branch / Tag: refs/tags/0.13.0
- Owner: https://github.com/globaldothealth
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5bdda112de8a6ca88acfbad2efabfd42763176dc
- Trigger Event: release