Tabular-Enhancement-Tool
A Python package for asynchronously enhancing tabular files (CSV, Excel, TSV, TXT, Parquet) by calling external APIs for each row.
Why Tabular-Enhancement-Tool?
In modern data lake architectures, raw tabular data (e.g., event logs, daily exports, customer records) often arrives in formats like CSV, Excel, or TSV. To make this data actionable, it frequently needs to be enriched with information residing in other systems—such as CRM details, geolocation data, or legacy internal services—accessible only via REST APIs.
The Tabular-Enhancement-Tool (tet) is designed to streamline this enrichment process:
- Multi-source enhancement: Fetches data from external JSON-based REST APIs or SQLAlchemy-compatible databases.
- High Performance via Multi-threading: Instead of sequential processing, which can take hours for large files, this tool utilizes a thread pool to handle hundreds of rows concurrently.
- Data Integrity and Precision: The tool instructs Pandas to treat all inputs as strings, ensuring that original data—like ZIP codes with leading zeros or numeric IDs—is retained exactly as it appeared in the source.
- Append-Only Enhancement: Your original columns are never modified. The responses are appended as new columns, allowing you to preserve the lineage of the raw data while adding new value.
- Response Flattening: By default, the tool expands API/database response objects into individual columns, making the data immediately available for analysis. For REST APIs, the tool automatically extracts the `data` field from the JSON response if present, focusing on the core payload. This behavior can be disabled if a single nested object is preferred.
- Strict Order Preservation: Even with parallel execution, the output rows are guaranteed to match the order of the input file, making it safe for downstream processes that rely on stable indexing.
- Flexible field mapping: Map DataFrame columns to API payload fields or database query filters.
- HTTP GET and POST support: Choose the appropriate method for your API, with support for URL templating and query parameters.
- REST API Authentication: Supports Basic Auth, Bearer Token, and API Key authentication schemes.
- SQLAlchemy Integration: Supports any database with a SQLAlchemy dialect (PostgreSQL, MySQL, SQLite, Oracle, SQL Server, etc.).
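The combination of concurrency and stable ordering described above can be sketched with the standard library alone. This is an illustrative pattern, not the tool's internal code; `call_api` is a hypothetical stand-in for a real HTTP request:

```python
from concurrent.futures import ThreadPoolExecutor

def call_api(row):
    # Placeholder for a real API call; returns a dict of new columns.
    return {"enriched": row["id"] * 2}

rows = [{"id": 1}, {"id": 2}, {"id": 3}]

# executor.map runs calls concurrently across the pool but yields results
# in input order, which is how row ordering can survive parallel execution.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(call_api, rows))

print(results)  # [{'enriched': 2}, {'enriched': 4}, {'enriched': 6}]
```

Because `map` preserves input order, no post-hoc sorting is needed even when individual calls finish out of order.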
Installation
You can install the package directly from the source directory:
pip install .
This will automatically install the required dependencies (pandas, requests, openpyxl) and provide the tabular-enhancer command.
Usage
Command Line Interface (CLI)
After installation, you can run the tool using the tabular-enhancer command:
tabular-enhancer input_data.csv \
--api_url "https://api.example.com/process" \
--mapping '{"api_field_1": "csv_column_a", "api_field_2": "csv_column_b"}' \
--max_workers 10
Arguments:
- input_file: Path to your CSV, Excel, TSV, TXT, or Parquet file.
- --max_workers: (Optional) Number of concurrent threads (default: 5).
- --no_flatten: (Optional) Do not expand response objects into individual columns.
API Options:
- --api_url: The endpoint where the request will be sent.
- --mapping: A JSON string mapping API payload keys to your file's column names, e.g. '{"api_field": "csv_column"}'.
- --method: (Optional) HTTP method to use (POST or GET, default: POST).
- --auth_type: (Optional) Authentication type (basic, bearer, or apikey).
- --auth_user: (Optional) Username for basic auth.
- --auth_pass: (Optional) Password for basic auth.
- --auth_token: (Optional) Token for bearer or apikey auth.
- --auth_header: (Optional) Custom header for apikey auth (default: X-API-Key).
SQLAlchemy Options:
- --db_url: The SQLAlchemy connection URL for the target database.
- --table_name: Name of the table to query for enhancement.
- --mapping: A JSON list of column names in your file to be used as filters (WHERE clause) for the query, e.g. '["email_address"]'.
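How a filter mapping turns into a WHERE clause can be sketched with plain SQLAlchemy Core. This is an assumption about the mechanism for illustration only (the table, columns, and in-memory SQLite database here are invented, not part of the tool):

```python
import sqlalchemy as sa

# In-memory SQLite stands in for any SQLAlchemy-supported database.
engine = sa.create_engine("sqlite:///:memory:")
meta = sa.MetaData()
employees = sa.Table(
    "employees", meta,
    sa.Column("email", sa.String),
    sa.Column("department", sa.String),
)
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(employees.insert(), [
        {"email": "a@example.com", "department": "Sales"},
    ])

# A mapping like '["email"]' becomes one equality filter per listed
# column, using that row's values from the input file.
row = {"email": "a@example.com"}
mapping = ["email"]
stmt = sa.select(employees).where(
    *[employees.c[col] == row[col] for col in mapping]
)
with engine.connect() as conn:
    match = conn.execute(stmt).mappings().first()

print(dict(match))  # {'email': 'a@example.com', 'department': 'Sales'}
```

Each matched row's remaining columns are what the tool appends to the output.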
CLI Usage Examples:
# REST API Enhancement
tabular-enhancer input.csv \
--api_url "https://api.example.com/process" \
--mapping '{"user_id": "id"}'
# SQLAlchemy Database Enhancement
tabular-enhancer data.xlsx \
--db_url "postgresql://user:pass@localhost/dbname" \
--table_name "users" \
--mapping '["email_address"]'
CLI Authentication Examples:
# Basic Auth
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type basic --auth_user "admin" --auth_pass "secret"
# Bearer Token
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type bearer --auth_token "your_token"
# API Key
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type apikey --auth_token "your_api_key"
# GET request with URL templating
tabular-enhancer data.csv --api_url "https://api.weather.gov/points/{lat},{lon}" --mapping '{"lat": "latitude", "lon": "longitude"}' --method GET
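URL templating substitutes mapped column values into the `{placeholders}` before each GET request is issued. A minimal sketch of that substitution, assuming `str.format`-style placeholders (the coordinate values below are invented for illustration):

```python
# Placeholders in the URL template map to column names in the file.
api_url = "https://api.weather.gov/points/{lat},{lon}"
mapping = {"lat": "latitude", "lon": "longitude"}  # placeholder -> column

# One input row, as read from the file (all values kept as strings).
row = {"latitude": "39.7456", "longitude": "-97.0892"}

# Resolve each placeholder from the mapped column, then format the URL.
params = {key: row[col] for key, col in mapping.items()}
url = api_url.format(**params)

print(url)  # https://api.weather.gov/points/39.7456,-97.0892
```

With GET requests, any mapped fields not consumed by the template would typically be sent as query parameters instead.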
REST API Enhancement
import pandas as pd
import tabular_enhancement_tool as tet
# Load data
df = tet.read_tabular_file("my_data.xlsx")
# API Configuration
api_url = "https://api.example.com/v1/enrich"
mapping = {"user_id": "ID"}
enhancer = tet.TabularEnhancer(api_url, mapping)
# Process
df_enhanced = enhancer.process_dataframe(df)
# Save
tet.save_tabular_file(df_enhanced, "my_data.xlsx")
REST API Enhancement (POST Example)
The following example demonstrates how to use the httpbin.org public API to simulate posting data from a CSV file.
import tabular_enhancement_tool as tet
# Load data
df = tet.read_tabular_file("examples/posts_data.csv")
# HTTPBin API configuration
api_url = "https://httpbin.org/post"
mapping = {
"title": "title",
"body": "body",
"userId": "userId"
}
enhancer = tet.TabularEnhancer(api_url, mapping, method="POST")
# Process
df_enhanced = enhancer.process_dataframe(df)
# Save
tet.save_tabular_file(df_enhanced, "examples/posts_data.csv", suffix="_enhanced")
REST API Enhancement (GET with URL Templating)
import tabular_enhancement_tool as tet
# Load data with coordinates
df = tet.read_tabular_file("cities.csv")
# NWS API example
api_url = "https://api.weather.gov/points/{lat},{lon}"
mapping = {"lat": "lat", "lon": "lon"}
headers = {"User-Agent": "(myweatherapp.com, contact@example.com)"}
enhancer = tet.TabularEnhancer(api_url, mapping, method="GET", headers=headers)
# Process
df_enhanced = enhancer.process_dataframe(df)
SQLAlchemy Database Enhancement (Core)
import pandas as pd
import tabular_enhancement_tool as tet
# Load data
df = tet.read_tabular_file("data.csv")
# SQLAlchemy Configuration
db_url = "postgresql://user:pass@localhost/dbname"
mapping = ["ID"]
enhancer = tet.ODBCEnhancer(db_url, mapping, table_name="orders")
# Process
df_enhanced = enhancer.process_dataframe(df)
# Save
tet.save_tabular_file(df_enhanced, "data.csv")
SQLAlchemy Database Enhancement (ORM)
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy import Column, Integer, String
import tabular_enhancement_tool as tet
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True)
name = Column(String)
role = Column(String)
# Load data
df = tet.read_tabular_file("data.csv")
# Process using ORM model
enhancer = tet.ODBCEnhancer("sqlite:///mydb.db", mapping=["id"], model=User)
df_enhanced = enhancer.process_dataframe(df)
SQLAlchemy Database Enhancement (SQLite Example)
This example shows how to use the ODBCEnhancer with a SQLite database to enrich a CSV file.
import tabular_enhancement_tool as tet
# Load data
df = tet.read_tabular_file("users.csv")
# SQLite connection URL
db_url = "sqlite:///company_data.db"
# Match by 'email' and fetch related columns from the 'employees' table
enhancer = tet.ODBCEnhancer(
connection_url=db_url,
mapping=["email"],
table_name="employees"
)
# Process and save
df_enhanced = enhancer.process_dataframe(df)
tet.save_tabular_file(df_enhanced, "users.csv", suffix="_enriched")
License
Distributed under the MIT License. See LICENSE for more information.
Development and CI/CD
- Linting & Formatting: Ruff is used to maintain high code quality and consistent style.
- Documentation: Managed by Sphinx and hosted on Read the Docs. For more detailed examples and API documentation, please visit the official documentation site.
- Publishing: The main branch is automatically built and published to PyPI on every push. Note: remember to bump the version in setup.py and tabular_enhancement_tool/__init__.py before pushing to main.
Credits
This tool was authored by Christopher Boyd and co-authored/developed by Junie, an autonomous programmer developed by JetBrains.