Tabular-Enhancement-Tool

WARNING: this project is still in its early stages, and the code is written primarily by an AI coding agent. Please use with caution.

A Python package for asynchronously enhancing tabular files (CSV, Excel, TSV, TXT, Parquet) by calling external APIs for each row.

Why

In modern data lake architectures, raw tabular data (e.g., event logs, daily exports, customer records) often arrives in formats like CSV, Excel, or TSV. To make this data actionable, it frequently needs to be enriched with information residing in other systems—such as CRM details, geolocation data, or legacy internal services—accessible only via REST APIs.

The Tabular Enhancement Tool (tet) is designed to streamline this enrichment process:

  • Multi-source enhancement: Fetches data from external JSON-based REST APIs or SQLAlchemy-compatible databases.
  • High Performance via Multi-threading: Instead of sequential processing, which can take hours for large files, this tool utilizes a thread pool to handle hundreds of rows concurrently.
  • Data Integrity and Precision: The tool instructs Pandas to treat all inputs as strings, ensuring that original data—like ZIP codes with leading zeros or numeric IDs—is retained exactly as it appeared in the source.
  • Append-Only Enhancement: Your original columns are never modified. The responses are appended as new columns, allowing you to preserve the lineage of the raw data while adding new value.
  • Response Flattening: By default, the tool expands API/Database response objects into individual columns, making the data immediately available for analysis. For REST APIs, the tool automatically extracts the data field from the JSON response if present, focusing on the core payload. This behavior can be disabled if a single nested object is preferred.
  • Strict Order Preservation: Even with parallel execution, the output rows are guaranteed to match the order of the input file, making it safe for downstream processes that rely on stable indexing.
  • Flexible field mapping: Map DataFrame columns to API payload fields or database query filters.
  • HTTP GET and POST support: Choose the appropriate method for your API, with support for URL templating and query parameters.
  • REST API Authentication: Supports Basic Auth, Bearer Token, and API Key authentication schemes.
  • SQLAlchemy Integration: Supports any database with a SQLAlchemy dialect (PostgreSQL, MySQL, SQLite, Oracle, SQL Server, etc.).
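Several of the properties listed above (string-only input, a thread pool, order preservation, append-only columns, and response flattening) can be illustrated with plain pandas and the standard library. The following is a minimal sketch of the general technique, not the tool's actual implementation: the `enrich` function, its `fetch` callable, and the `api.` column prefix are hypothetical names chosen for illustration, and a stub stands in for the real REST call.

```python
from concurrent.futures import ThreadPoolExecutor
import io

import pandas as pd


def enrich(df, fetch, max_workers=8, prefix="api."):
    """Return a copy of df with flattened fetch() results appended as new columns.

    `fetch` is a hypothetical callable: it takes one row as a dict and
    returns a dict (for example, a decoded JSON response).
    """
    rows = df.to_dict(orient="records")

    # Executor.map yields results in input order, even though the calls
    # themselves run concurrently and may finish in any order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        responses = list(pool.map(fetch, rows))

    # Unwrap a top-level "data" field when the response has one.
    payloads = [r["data"] if isinstance(r, dict) and "data" in r else r
                for r in responses]

    # Flatten nested objects into dotted column names, e.g. "geo.city".
    flat = pd.json_normalize(payloads).add_prefix(prefix)
    flat.index = df.index

    # Append only: the original columns are left untouched.
    return pd.concat([df, flat], axis=1)


if __name__ == "__main__":
    # dtype=str keeps leading zeros in IDs and ZIP codes.
    csv_text = "id,zip\n001,02134\n002,10001\n"
    frame = pd.read_csv(io.StringIO(csv_text), dtype=str)

    def fake_fetch(row):
        # Stand-in for an HTTP request; returns a nested payload.
        return {"data": {"geo": {"zip_seen": row["zip"]}, "ok": True}}

    print(enrich(frame, fake_fetch))
```

The prefix is one way to avoid collisions between response fields and existing column names; the tool itself may handle this differently.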

Installation

You can install the package from PyPI:

pip install tabular-enhancement-tool

This will automatically install the required dependencies (pandas, requests, openpyxl) and provide the tet command.
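To confirm that the installation succeeded, one option is to query the installed distribution metadata and look for the `tet` executable on the `PATH`. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and it makes no assumptions about the options the `tet` command accepts.

```python
from importlib.metadata import PackageNotFoundError, version
import shutil


def installed_version(dist_name="tabular-enhancement-tool"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("package version:", installed_version())
    # shutil.which returns the full path of the executable, or None if not found.
    print("tet command:", shutil.which("tet"))
```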

Usage

See the documentation on Read the Docs.

License

Distributed under the MIT License. See LICENSE for more information.

Development and CI/CD

  • Linting & Formatting: Ruff is used to maintain high code quality and consistent style.
  • Documentation: Managed by Sphinx and hosted on Read the Docs.
  • Testing: Tests run with pytest, with coverage reported via Codecov.

Credits

This tool was authored by Christopher Boyd and co-authored/developed by Junie, an autonomous programmer developed by JetBrains.

