A Python package for asynchronously enhancing tabular files via APIs.
Tabular-Enhancement-Tool
WARNING: this project is still in its early stages, and the code is written primarily by an AI coding agent. Please use with caution.
A Python package for asynchronously enhancing tabular files (CSV, Excel, TSV, TXT, Parquet) by calling external APIs for each row.
Why
In modern data lake architectures, raw tabular data (e.g., event logs, daily exports, customer records) often arrives in formats like CSV, Excel, or TSV. To make this data actionable, it frequently needs to be enriched with information residing in other systems—such as CRM details, geolocation data, or legacy internal services—accessible only via REST APIs.
The Tabular Enhancement Tool (tet) is designed to streamline this enrichment process:
- Multi-source enhancement: Fetches data from external JSON-based REST APIs.
- High Performance via Multi-threading: Instead of sequential processing, which can take hours for large files, this tool utilizes a thread pool to handle hundreds of rows concurrently.
- Data Integrity and Precision: The tool instructs Pandas to treat all inputs as strings, ensuring that original data—like ZIP codes with leading zeros or numeric IDs—is retained exactly as it appeared in the source.
- Append-Only Enhancement: Your original columns are never modified. The responses are appended as new columns, allowing you to preserve the lineage of the raw data while adding new value.
- Response Flattening: By default, the tool expands API response objects into individual columns, making the data immediately available for analysis. For REST APIs, the tool automatically extracts the `data` field from the JSON response if present, focusing on the core payload. This behavior can be disabled if a single nested object is preferred.
- Strict Order Preservation: Even with parallel execution, the output rows are guaranteed to match the order of the input file, making it safe for downstream processes that rely on stable indexing.
- Flexible field mapping: Map DataFrame columns to API payload fields. Supports nested dictionaries and lists for complex JSON payloads.
- HTTP GET and POST support: Choose the appropriate method for your API, with support for URL templating and query parameters.
- REST API Authentication: Supports Basic Auth, Bearer Token, and API Key authentication schemes.
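To make the ideas in the list above concrete, here is a minimal sketch of string-preserving input, ordered thread-pool execution, response flattening, and append-only output. It is not the tool's actual implementation or API: `fake_lookup`, `flatten`, `enhance`, the `api_` column prefix, and the dotted key naming are all hypothetical names chosen for illustration. It also uses the standard-library `csv` module instead of Pandas so it runs without dependencies, and it replaces the HTTP call with a local stand-in function.

```python
import csv
import io
import time
from concurrent.futures import ThreadPoolExecutor

SAMPLE_CSV = "id,zip\n1,02134\n2,10001\n3,00501\n"


def fake_lookup(row):
    """Hypothetical stand-in for a REST call; returns a JSON-like dict."""
    # Earlier rows sleep longer, so they finish last.
    time.sleep(0.03 / int(row["id"]))
    return {"data": {"zip_echo": row["zip"], "geo": {"region": "demo"}}}


def flatten(obj, prefix=""):
    """Expand nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def enhance(rows, call_api, max_workers=8):
    """Call call_api once per row concurrently; output keeps input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, regardless of
        # which call completes first.
        responses = list(pool.map(call_api, rows))

    enhanced = []
    for row, response in zip(rows, responses):
        # Unwrap the "data" field when it is present and is an object.
        data = response.get("data")
        payload = data if isinstance(data, dict) else response
        new_row = dict(row)  # original columns are copied, never changed
        for key, value in flatten(payload).items():
            new_row[f"api_{key}"] = value  # responses become new columns
        enhanced.append(new_row)
    return enhanced


# csv.DictReader yields every value as a string, so "02134" keeps its zero.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = enhance(rows, fake_lookup)
for line in result:
    print(line)
```

Collecting results with `Executor.map` is one simple way to keep row order stable; an implementation that uses `as_completed` instead would need to track each row's index and reorder the results afterwards.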
Installation
You can install the package from PyPI:
pip install tabular-enhancement-tool
This will automatically install the required dependencies (pandas, requests, openpyxl) and provide the tet command.
Usage
License
Distributed under the MIT License. See LICENSE for more information.
Development and CI/CD
- Linting & Formatting: Ruff is used to maintain high code quality and consistent style.
- Documentation: Managed by Sphinx and hosted on Read the Docs.
- Testing: Tests run with pytest, with coverage reported through Codecov.
Credits
This tool was authored by Christopher Boyd and co-authored/developed by Junie, an autonomous programmer developed by JetBrains.