This repository contains the implementation of a dataset loader for Parquet files.
aircheckdata: AIRCHECK Parquet Dataset Loader
A lightweight Python package and CLI tool for listing and loading AIRCHECK datasets, with built-in support for column selection, progress tracking, and automatic local caching. It provides programmatic access to the same datasets that are available for download from the AIRCHECK website. Before using any dataset, please ensure that you have read and agreed to the HitGen End User License Agreement (EULA).
✅ Best Practices
- Use virtual environments to avoid dependency conflicts:
  python -m venv .venv
  source .venv/bin/activate  # On Windows use .venv\Scripts\activate
- Always validate that your code respects data privacy and licensing terms.
- Avoid storing large datasets in version control. Let aircheckdata handle caching.
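One quick way to confirm that a virtual environment is active before installing anything is to compare `sys.prefix` with `sys.base_prefix`. The helper name `in_virtualenv` below is illustrative and not part of aircheckdata; this is a minimal sketch that assumes an environment created with `venv` or a recent `virtualenv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
```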
📦 Installation
You can install the package from PyPI:
pip install aircheckdata
For development and testing (optional):
pip install -e ".[dev]"
Installation verification (optional)
Verify that the installation was successful by running the unit tests:
pytest tests/
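As a lighter check than running the test suite, the installed distribution can be looked up by name with the standard library. The sketch below assumes the distribution name `aircheckdata` from the `pip install` command above; the helper `installed_version` is hypothetical. Note that `importlib.metadata` requires Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string when the package is installed, otherwise None.
print(installed_version("aircheckdata"))
```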
🔧 Usage in a Python Project (Virtual Environment)
aircheckdata can be used directly from your Python environment to:
- List pre-configured datasets
- View available columns and metadata
- Load datasets with optional filtering and progress indicators
Quick Start
List Datasets
from aircheckdata import list_datasets
datasets = list_datasets()
for name, desc in datasets.items():
    print(f"{name}: {desc}")
View Available Columns
from aircheckdata import get_columns
columns = get_columns('HitGen','WDR91')
for col, desc in columns.items():
    print(f"{col}: {desc}")
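Since the column listing is a mapping of column name to description, it can be used to check a requested column list before loading. The following is a sketch only: `missing_columns` is a hypothetical helper, and the `available` dictionary is a stand-in with placeholder descriptions rather than the package's real metadata.

```python
from typing import Iterable, List, Mapping


def missing_columns(requested: Iterable[str], available: Mapping[str, str]) -> List[str]:
    """Return the requested column names that are absent from `available`."""
    return [name for name in requested if name not in available]


# Hypothetical stand-in for the mapping returned by get_columns('HitGen', 'WDR91');
# the descriptions are placeholders, not the package's real metadata.
available = {"ECFP4": "placeholder", "ECFP6": "placeholder", "LABEL": "placeholder"}

print(missing_columns(["ECFP6", "LABEL"], available))  # → []
print(missing_columns(["ECFP6", "label"], available))  # → ['label']
```

Column names are compared case-sensitively here, which is an assumption about how the loader treats them.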
Load dataset
from aircheckdata import load_dataset
# Load the specified columns without a progress bar
df = load_dataset('HitGen', 'WDR91', columns=['ECFP6', 'ECFP4', 'LABEL'], show_progress=False)
# Or load the specified columns with a progress bar
df = load_dataset('HitGen', 'WDR91', columns=['ECFP6', 'ECFP4', 'LABEL'])
# Or call with no arguments: downloads once, then caches locally (loads the WDR91 target by default)
df = load_dataset()
print(df.head())
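The "download once, then cache locally" behaviour can be illustrated with a generic pattern. This is not aircheckdata's actual implementation: `cached_fetch`, `fake_download`, and the file name `WDR91.parquet` are all hypothetical, and the network download is replaced by a stub so the sketch runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(name: str, cache_dir: Path, fetch: Callable[[], bytes]) -> Path:
    """Return the local path for `name`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary file first so an interrupted download
        # never leaves a truncated file at the final cache path.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(fetch())
        tmp.replace(target)
    return target


calls = []


def fake_download() -> bytes:
    # Stand-in for a real network download; records each invocation.
    calls.append(1)
    return b"parquet-bytes-placeholder"


with tempfile.TemporaryDirectory() as d:
    first = cached_fetch("WDR91.parquet", Path(d), fake_download)
    second = cached_fetch("WDR91.parquet", Path(d), fake_download)
    print(first == second, len(calls))  # → True 1
```

The second call finds the file already on disk and skips the download, which is why the stub is invoked only once.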
Advanced Usage
# Load only selected columns
df = load_dataset('HitGen', 'WDR91', columns=['ECFP6', 'ECFP4', 'LABEL'])
# Show progress while loading
df = load_dataset('HitGen', 'WDR91', show_progress=True)
💻 CLI Usage
The aircheckdata CLI provides quick access to datasets from the command line:
aircheckdata --help
Options and Examples
| Option | Description |
|---|---|
| list | List all available datasets |
| load name | Load a specific dataset (defaults to WDR91) |
| --columns col1,col2 | Select columns to load or list columns of a dataset |
Examples
# List datasets
aircheckdata list
# Load the default dataset (WDR91)
aircheckdata --load
# Load a dataset with selected columns
aircheckdata --load "WDR91" --columns ECFP6,ECFP4
# View available columns (defaults to WDR91 if no name is given)
aircheckdata columns
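To show how a comma-separated value such as `--columns ECFP6,ECFP4` can be turned into a Python list, here is a small `argparse` sketch. It is a hypothetical parser for illustration only and does not reproduce the package's real CLI definition; the program name `columns-demo` and the helper `parse_columns` are assumptions.

```python
import argparse
from typing import List


def parse_columns(value: str) -> List[str]:
    """Split 'col1,col2' into ['col1', 'col2'], dropping blanks and whitespace."""
    return [part.strip() for part in value.split(",") if part.strip()]


# Hypothetical parser for illustration only; not the package's real CLI.
parser = argparse.ArgumentParser(prog="columns-demo")
parser.add_argument(
    "--columns",
    type=parse_columns,
    default=None,
    help="comma-separated column names, e.g. ECFP6,ECFP4",
)

args = parser.parse_args(["--columns", "ECFP6,ECFP4"])
print(args.columns)  # → ['ECFP6', 'ECFP4']
```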
📜 License and Terms of Use
This package is distributed under the MIT License. However, the datasets it provides access to are subject to the HitGen End User License Agreement (EULA).
⚠️ By using any dataset accessed via aircheckdata, you agree to abide by the HitGen EULA. Please refer to the full license terms and conditions here: 👉 https://www.aircheck.ai/docs/HitGen.pdf
📚 Pre-configured Datasets
Currently available datasets include:
WDR91: A curated Parquet dataset provided by HitGen
🛠 Requirements
- Python 3.7+