Skip to main content

A Python package for asynchronously enhancing tabular files via APIs.

Project description

Tabular-Enhancement-Tool

Documentation Status PyPI version Python Tests codecov Code style: ruff

A Python package for asynchronously enhancing tabular files (CSV, Excel, TSV, TXT, Parquet) by calling external APIs for each row.

Why Tabular-Enhancement-Tool?

In modern data lake architectures, raw tabular data (e.g., event logs, daily exports, customer records) often arrives in formats like CSV, Excel, or TSV. To make this data actionable, it frequently needs to be enriched with information residing in other systems—such as CRM details, geolocation data, or legacy internal services—accessible only via REST APIs.

The Tabular-Enhancement-Tool (tet) is designed to streamline this enrichment process:

  • Multi-source enhancement: Fetches data from external JSON-based REST APIs or SQLAlchemy-compatible databases.
  • High Performance via Multi-threading: Instead of sequential processing, which can take hours for large files, this tool utilizes a thread pool to handle hundreds of rows concurrently.
  • Data Integrity and Precision: The tool instructs Pandas to treat all inputs as strings, ensuring that original data—like ZIP codes with leading zeros or numeric IDs—is retained exactly as it appeared in the source.
  • Append-Only Enhancement: Your original columns are never modified. The responses are appended as new columns, allowing you to preserve the lineage of the raw data while adding new value.
  • Response Flattening: By default, the tool expands API/Database response objects into individual columns, making the data immediately available for analysis. For REST APIs, the tool automatically extracts the data field from the JSON response if present, focusing on the core payload. This behavior can be disabled if a single nested object is preferred.
  • Strict Order Preservation: Even with parallel execution, the output rows are guaranteed to match the order of the input file, making it safe for downstream processes that rely on stable indexing.
  • Flexible field mapping: Map DataFrame columns to API payload fields or database query filters.
  • HTTP GET and POST support: Choose the appropriate method for your API, with support for URL templating and query parameters.
  • REST API Authentication: Supports Basic Auth, Bearer Token, and API Key authentication schemes.
  • SQLAlchemy Integration: Supports any database with a SQLAlchemy dialect (PostgreSQL, MySQL, SQLite, Oracle, SQL Server, etc.).

Installation

You can install the package directly from the source directory:

pip install .

This will automatically install the required dependencies (pandas, requests, openpyxl) and provide the tabular-enhancer command.

Usage

Command Line Interface (CLI)

After installation, you can run the tool using the tabular-enhancer command:

tabular-enhancer input_data.csv \
    --api_url "https://api.example.com/process" \
    --mapping '{"api_field_1": "csv_column_a", "api_field_2": "csv_column_b"}' \
    --max_workers 10

Arguments:

  • input_file: Path to your CSV, Excel, TSV, TXT, or Parquet file.
  • --max_workers: (Optional) Number of concurrent threads (default: 5).
  • --no_flatten: (Optional) Do not expand response objects into individual columns.

API Options:

  • --api_url: The endpoint where the POST request will be sent.
  • --mapping: A JSON string mapping API payload keys to your file's column names. e.g. '{"api_field": "csv_column"}'.
  • --method: (Optional) HTTP method to use (POST or GET, default: POST).
  • --auth_type: (Optional) Authentication type (basic, bearer, or apikey).
  • --auth_user: (Optional) Username for basic auth.
  • --auth_pass: (Optional) Password for basic auth.
  • --auth_token: (Optional) Token for bearer or apikey auth.
  • --auth_header: (Optional) Custom header for apikey auth (default: X-API-Key).

SQLAlchemy Options:

  • --db_url: The SQLAlchemy connection URL to the target database.
  • --table_name: Name of the table to query for enhancement.
  • --mapping: A JSON list of column names in your file to be used as filters (WHERE clause) for the query. e.g. '["email_address"]'.

CLI Usage Examples:

# REST API Enhancement
tabular-enhancer input.csv \
    --api_url "https://api.example.com/process" \
    --mapping '{"user_id": "id"}'

# SQLAlchemy Database Enhancement
tabular-enhancer data.xlsx \
    --db_url "postgresql://user:pass@localhost/dbname" \
    --table_name "users" \
    --mapping '["email_address"]'

CLI Authentication Examples:

# Basic Auth
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type basic --auth_user "admin" --auth_pass "secret"

# Bearer Token
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type bearer --auth_token "your_token"

# API Key
tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type apikey --auth_token "your_api_key"

# GET request with URL templating
tabular-enhancer data.csv --api_url "https://api.weather.gov/points/{lat},{lon}" --mapping '{"lat": "latitude", "lon": "longitude"}' --method GET

REST API Enhancement

import pandas as pd
import tabular_enhancement_tool as tet

# Load data
df = tet.read_tabular_file("my_data.xlsx")

# API Configuration
api_url = "https://api.example.com/v1/enrich"
mapping = {"user_id": "ID"}
enhancer = tet.TabularEnhancer(api_url, mapping)

# Process
df_enhanced = enhancer.process_dataframe(df)

# Save
tet.save_tabular_file(df_enhanced, "my_data.xlsx")

REST API Enhancement (POST Example)

The following example demonstrates how to use the httpbin.org public API to simulate posting data from a CSV file.

import tabular_enhancement_tool as tet

# Load data
df = tet.read_tabular_file("examples/posts_data.csv")

# HTTPBin API configuration
api_url = "https://httpbin.org/post"
mapping = {
    "title": "title",
    "body": "body",
    "userId": "userId"
}

enhancer = tet.TabularEnhancer(api_url, mapping, method="POST")

# Process
df_enhanced = enhancer.process_dataframe(df)

# Save
tet.save_tabular_file(df_enhanced, "examples/posts_data.csv", suffix="_enhanced")

REST API Enhancement (GET with URL Templating)

import tabular_enhancement_tool as tet

# Load data with coordinates
df = tet.read_tabular_file("cities.csv")

# NWS API example
api_url = "https://api.weather.gov/points/{lat},{lon}"
mapping = {"lat": "lat", "lon": "lon"}
headers = {"User-Agent": "(myweatherapp.com, contact@example.com)"}

enhancer = tet.TabularEnhancer(api_url, mapping, method="GET", headers=headers)

# Process
df_enhanced = enhancer.process_dataframe(df)

SQLAlchemy Database Enhancement (Core)

import pandas as pd
import tabular_enhancement_tool as tet

# Load data
df = tet.read_tabular_file("data.csv")

# SQLAlchemy Configuration
db_url = "postgresql://user:pass@localhost/dbname"
mapping = ["ID"]
enhancer = tet.ODBCEnhancer(db_url, mapping, table_name="orders")

# Process
df_enhanced = enhancer.process_dataframe(df)

# Save
tet.save_tabular_file(df_enhanced, "data.csv")

SQLAlchemy Database Enhancement (ORM)

from sqlalchemy.orm import DeclarativeBase
from sqlalchemy import Column, Integer, String
import tabular_enhancement_tool as tet

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    role = Column(String)

# Load data
df = tet.read_tabular_file("data.csv")

# Process using ORM model
enhancer = tet.ODBCEnhancer("sqlite:///mydb.db", mapping=["id"], model=User)
df_enhanced = enhancer.process_dataframe(df)

SQLAlchemy Database Enhancement (SQLite Example)

This example shows how to use the ODBCEnhancer with a SQLite database to enrich a CSV file.

import tabular_enhancement_tool as tet

# Load data
df = tet.read_tabular_file("users.csv")

# SQLite connection URL
db_url = "sqlite:///company_data.db"

# Match by 'email' and fetch related columns from the 'employees' table
enhancer = tet.ODBCEnhancer(
    connection_url=db_url,
    mapping=["email"],
    table_name="employees"
)

# Process and save
df_enhanced = enhancer.process_dataframe(df)
tet.save_tabular_file(df_enhanced, "users.csv", suffix="_enriched")

License

Distributed under the MIT License. See LICENSE for more information.

Development and CI/CD

  • Linting & Formatting: Ruff is used to maintain high code quality and consistent style.
  • Documentation: Managed by Sphinx and hosted on Read the Docs. For more detailed examples and API documentation, please visit the official documentation site.
  • Publishing: The main branch is automatically built and published to PyPI on every push. Note: Remember to bump the version in setup.py and tabular_enhancement_tool/__init__.py before pushing to main.

Credits

This tool was authored by Christopher Boyd and co-authored/developed by Junie, an autonomous programmer developed by JetBrains.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tabular_enhancement_tool-0.1.2.tar.gz (17.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tabular_enhancement_tool-0.1.2-py3-none-any.whl (20.8 kB view details)

Uploaded Python 3

File details

Details for the file tabular_enhancement_tool-0.1.2.tar.gz.

File metadata

  • Download URL: tabular_enhancement_tool-0.1.2.tar.gz
  • Upload date:
  • Size: 17.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tabular_enhancement_tool-0.1.2.tar.gz
Algorithm Hash digest
SHA256 2e103e01dfbede147d99f216ac903979f124d4b8ee746337cf1c411f5229d615
MD5 4d3815879996b5206619504a837f69be
BLAKE2b-256 07661219488e673dfed480eeb72339b810f977a7812a75422d797d51f472ea9d

See more details on using hashes here.

Provenance

The following attestation bundles were made for tabular_enhancement_tool-0.1.2.tar.gz:

Publisher: publish.yml on Mikuana/tabular-enhancement-tool

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tabular_enhancement_tool-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for tabular_enhancement_tool-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7e74b72b9c5280ab3e24982a92e0f8d3275de14bdbf5529dfcb60d0b8bc734ee
MD5 8a9f388db876018b2edd49b9b4e541fb
BLAKE2b-256 aece9b04285f7ae367e0f7c0fea2b354c2c2534effbac6e55a231e5ec1eed926

See more details on using hashes here.

Provenance

The following attestation bundles were made for tabular_enhancement_tool-0.1.2-py3-none-any.whl:

Publisher: publish.yml on Mikuana/tabular-enhancement-tool

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page