
Tabular-Enhancement-Tool


A Python package for asynchronously enhancing tabular files (CSV, Excel, TSV, TXT, Parquet) by calling external APIs for each row.

Why Tabular-Enhancement-Tool?

In modern data lake architectures, raw tabular data (e.g., event logs, daily exports, customer records) often arrives in formats like CSV, Excel, or TSV. To make this data actionable, it frequently needs to be enriched with information residing in other systems—such as CRM details, geolocation data, or legacy internal services—accessible only via REST APIs.

The Tabular-Enhancement-Tool (tet) is designed to streamline this enrichment process:

  • Multi-Source Enhancement: Fetches data from external JSON-based REST APIs or SQLAlchemy-compatible databases.
  • High Performance via Multi-Threading: Instead of sequential processing, which can take hours for large files, the tool uses a thread pool to handle hundreds of rows concurrently.
  • Data Integrity and Precision: The tool instructs Pandas to treat all inputs as strings, so original values (such as ZIP codes with leading zeros or long numeric IDs) are retained exactly as they appeared in the source.
  • Append-Only Enhancement: Your original columns are never modified. Responses are appended as new columns, preserving the lineage of the raw data while adding new value.
  • Response Flattening: By default, the tool expands API/database response objects into individual columns, making the data immediately available for analysis. For REST APIs, the tool automatically extracts the data field from the JSON response if present, focusing on the core payload. This behavior can be disabled if a single nested object is preferred.
  • Strict Order Preservation: Even with parallel execution, output rows are guaranteed to match the order of the input file, making it safe for downstream processes that rely on stable indexing.
  • Flexible Field Mapping: Map DataFrame columns to API payload fields or database query filters.
  • HTTP GET and POST Support: Choose the appropriate method for your API, with support for URL templating and query parameters.
  • REST API Authentication: Supports Basic Auth, Bearer Token, and API Key authentication schemes.
  • SQLAlchemy Integration: Supports any database with a SQLAlchemy dialect (PostgreSQL, MySQL, SQLite, Oracle, SQL Server, etc.).
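
The concurrency and ordering guarantees above can be illustrated with Python's standard library: a thread pool runs calls concurrently, while executor.map yields results in input order regardless of completion order. This is a sketch of the general technique, not the tool's actual internals:

```python
from concurrent.futures import ThreadPoolExecutor
import time

rows = [{"id": i} for i in range(8)]

def call_api(row):
    # Simulate an API call whose latency varies per row; later rows may
    # finish before earlier ones.
    time.sleep(0.01 * (8 - row["id"]))
    return {**row, "enriched": row["id"] * 10}

with ThreadPoolExecutor(max_workers=4) as pool:
    # executor.map yields results in the order of the input iterable,
    # even though the calls run concurrently.
    results = list(pool.map(call_api, rows))

print([r["id"] for r in results])  # [0, 1, 2, 3, 4, 5, 6, 7]
```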

Installation

You can install the package from PyPI:

pip install tabular-enhancement-tool

or directly from the source directory:

pip install .

Either way, the required dependencies (pandas, requests, openpyxl) are installed automatically and the tabular-enhancer command becomes available.

Usage

Command Line Interface (CLI)

After installation, you can run the tool using the tabular-enhancer command:

tabular-enhancer input_data.csv \
    --api_url "https://api.example.com/process" \
    --mapping '{"api_field_1": "csv_column_a", "api_field_2": "csv_column_b"}' \
    --max_workers 10

Arguments:

  • input_file: Path to your CSV, Excel, TSV, TXT, or Parquet file.
  • --max_workers: (Optional) Number of concurrent threads (default: 5).
  • --no_flatten: (Optional) Do not expand response objects into individual columns.
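
The default flattening that --no_flatten disables behaves much like pandas.json_normalize: nested response objects become individual dot-separated columns, and for REST APIs the "data" field is extracted first when present. A minimal sketch with made-up responses, not the tool's actual implementation:

```python
import pandas as pd

# Hypothetical API responses, one per input row. The "data" field is the
# core payload that the tool extracts by default.
responses = [
    {"data": {"name": "Alice", "geo": {"lat": 51.5, "lon": -0.1}}},
    {"data": {"name": "Bob", "geo": {"lat": 40.7, "lon": -74.0}}},
]

# Extract "data" when present, then expand nested objects into columns --
# roughly what the default flattening does before appending to your file.
payloads = [r.get("data", r) for r in responses]
flat = pd.json_normalize(payloads)
print(list(flat.columns))  # ['name', 'geo.lat', 'geo.lon']
```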

API Options:

  • --api_url: The endpoint where the request will be sent.
  • --mapping: A JSON string mapping API payload keys to your file's column names. e.g. '{"api_field": "csv_column"}'.
  • --method: (Optional) HTTP method to use (POST or GET, default: POST).
  • --auth_type: (Optional) Authentication type (basic, bearer, or apikey).
  • --auth_user: (Optional) Username for basic auth.
  • --auth_pass: (Optional) Password for basic auth.
  • --auth_token: (Optional) Token for bearer or apikey auth.
  • --auth_header: (Optional) Custom header for apikey auth (default: X-API-Key).
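
The three auth types map onto standard requests idioms roughly as follows. This helper is a sketch mirroring the CLI options above, not the tool's internal code:

```python
def build_auth(auth_type, user=None, password=None, token=None,
               header="X-API-Key"):
    """Return an (auth, headers) pair for use with requests.
    Sketch only -- the tool's actual internals may differ."""
    if auth_type == "basic":
        # requests builds the Authorization: Basic header from this tuple
        return (user, password), {}
    if auth_type == "bearer":
        return None, {"Authorization": f"Bearer {token}"}
    if auth_type == "apikey":
        return None, {header: token}
    return None, {}

auth, headers = build_auth("bearer", token="your_token")
# requests.post(api_url, json=payload, auth=auth, headers=headers)
```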

SQLAlchemy Options:

  • --db_url: The SQLAlchemy connection URL to the target database.
  • --table_name: Name of the table to query for enhancement.
  • --mapping: A JSON list of column names in your file to be used as filters (WHERE clause) for the query. e.g. '["email_address"]'.
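
Conceptually, each column listed in --mapping becomes an equality filter in the generated WHERE clause. The following sketch shows the idea with SQLAlchemy Core and an in-memory SQLite database; the table and column names are invented for illustration:

```python
import sqlalchemy as sa

# Set up a throwaway table to query against.
engine = sa.create_engine("sqlite:///:memory:")
meta = sa.MetaData()
employees = sa.Table(
    "employees", meta,
    sa.Column("email_address", sa.String),
    sa.Column("department", sa.String),
)
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(employees.insert(), [
        {"email_address": "a@example.com", "department": "sales"},
    ])

mapping = ["email_address"]               # columns used as filters
row = {"email_address": "a@example.com"}  # one row from the input file

# Build SELECT * FROM employees WHERE email_address = :value
stmt = sa.select(employees).where(
    *[employees.c[col] == row[col] for col in mapping]
)
with engine.connect() as conn:
    match = conn.execute(stmt).mappings().first()

print(dict(match))
```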

CLI Usage Examples:

# REST API Enhancement
tabular-enhancer input.csv \
    --api_url "https://api.example.com/process" \
    --mapping '{"user_id": "id"}'

# SQLAlchemy Database Enhancement
tabular-enhancer data.xlsx \
    --db_url "postgresql://user:pass@localhost/dbname" \
    --table_name "users" \
    --mapping '["email_address"]'

CLI Authentication Examples:

# Basic Auth
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type basic --auth_user "admin" --auth_pass "secret"

# Bearer Token
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type bearer --auth_token "your_token"

# API Key
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type apikey --auth_token "your_api_key"

# GET request with URL templating
tabular-enhancer data.csv --api_url "https://api.weather.gov/points/{lat},{lon}" --mapping '{"lat": "latitude", "lon": "longitude"}' --method GET
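
URL templating in the GET example above works like Python's str.format: the mapping routes column values into the {placeholders} before each request. A sketch of the substitution step (not the tool's internal code):

```python
url_template = "https://api.weather.gov/points/{lat},{lon}"
mapping = {"lat": "latitude", "lon": "longitude"}  # api field -> file column
row = {"latitude": "39.7456", "longitude": "-97.0892"}  # one input row

# Look up each mapped column's value, then fill the placeholders.
params = {api_field: row[column] for api_field, column in mapping.items()}
url = url_template.format(**params)
print(url)  # https://api.weather.gov/points/39.7456,-97.0892
```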

REST API Enhancement

import tabular_enhancement_tool as tet

# Load data
df = tet.read_tabular_file("my_data.xlsx")

# API Configuration
api_url = "https://api.example.com/v1/enrich"
mapping = {"user_id": "ID"}
enhancer = tet.TabularEnhancer(api_url, mapping)

# Process
df_enhanced = enhancer.process_dataframe(df)

# Save
tet.save_tabular_file(df_enhanced, "my_data.xlsx")

REST API Enhancement (POST Example)

The following example demonstrates how to use the httpbin.org public API to simulate posting data from a CSV file.

import tabular_enhancement_tool as tet

# Load data
df = tet.read_tabular_file("examples/posts_data.csv")

# HTTPBin API configuration
api_url = "https://httpbin.org/post"
mapping = {
    "title": "title",
    "body": "body",
    "userId": "userId"
}

enhancer = tet.TabularEnhancer(api_url, mapping, method="POST")

# Process
df_enhanced = enhancer.process_dataframe(df)

# Save
tet.save_tabular_file(df_enhanced, "examples/posts_data.csv", suffix="_enhanced")

REST API Enhancement (GET with URL Templating)

import tabular_enhancement_tool as tet

# Load data with coordinates
df = tet.read_tabular_file("cities.csv")

# NWS API example
api_url = "https://api.weather.gov/points/{lat},{lon}"
mapping = {"lat": "lat", "lon": "lon"}
headers = {"User-Agent": "(myweatherapp.com, contact@example.com)"}

enhancer = tet.TabularEnhancer(api_url, mapping, method="GET", headers=headers)

# Process
df_enhanced = enhancer.process_dataframe(df)

SQLAlchemy Database Enhancement (Core)

import tabular_enhancement_tool as tet

# Load data
df = tet.read_tabular_file("data.csv")

# SQLAlchemy Configuration
db_url = "postgresql://user:pass@localhost/dbname"
mapping = ["ID"]
enhancer = tet.ODBCEnhancer(db_url, mapping, table_name="orders")

# Process
df_enhanced = enhancer.process_dataframe(df)

# Save
tet.save_tabular_file(df_enhanced, "data.csv")

SQLAlchemy Database Enhancement (ORM)

from sqlalchemy.orm import DeclarativeBase
from sqlalchemy import Column, Integer, String
import tabular_enhancement_tool as tet

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    role = Column(String)

# Load data
df = tet.read_tabular_file("data.csv")

# Process using ORM model
enhancer = tet.ODBCEnhancer("sqlite:///mydb.db", mapping=["id"], model=User)
df_enhanced = enhancer.process_dataframe(df)

SQLAlchemy Database Enhancement (SQLite Example)

This example shows how to use the ODBCEnhancer with a SQLite database to enrich a CSV file.

import tabular_enhancement_tool as tet

# Load data
df = tet.read_tabular_file("users.csv")

# SQLite connection URL
db_url = "sqlite:///company_data.db"

# Match by 'email' and fetch related columns from the 'employees' table
enhancer = tet.ODBCEnhancer(
    connection_url=db_url,
    mapping=["email"],
    table_name="employees"
)

# Process and save
df_enhanced = enhancer.process_dataframe(df)
tet.save_tabular_file(df_enhanced, "users.csv", suffix="_enriched")

License

Distributed under the MIT License. See LICENSE for more information.

Development and CI/CD

  • Linting & Formatting: Ruff is used to maintain high code quality and consistent style.
  • Documentation: Managed by Sphinx and hosted on Read the Docs. For more detailed examples and API documentation, please visit the official documentation site.
  • Publishing: The main branch is automatically built and published to PyPI on every push. Note: Remember to bump the version in setup.py and tabular_enhancement_tool/__init__.py before pushing to main.

Credits

This tool was authored by Christopher Boyd and co-authored/developed by Junie, an autonomous programmer developed by JetBrains.
