🚀 Incorporator (v1.0.7)

A schema-free data mapper that turns JSON, XML, or CSV into a unified Python object graph with dot-notation access at runtime.

License: MIT

✨ Highlights

  • Works with unpredictable JSON APIs—and effortlessly digests XML, CSV, NDJSON, and SQLite—without writing a single line of schema.
  • Turns raw data into native Python objects instantly, bypassing the need for manual model definitions or brittle classes.
  • Handles changing JSON structures at runtime, absorbing missing keys or mutating data types without throwing validation errors.
  • Harnesses Pydantic and HTTPX under the hood without forcing you to write data classes, connection poolers, or pagination while loops.

🎯 Use this when:

  • You are working with evolving, undocumented, or heavily nested JSON APIs.
  • You need a universal bridge to instantly map legacy XML or flat CSVs into the exact same Python object graph.
  • You are exhausted by writing boilerplate models and validation logic just to explore a new data source.
  • You need to extract deeply nested web data, transform it, and pivot it straight into a local SQL database seamlessly.

🛠️ How it Works: Zero-Schema Ingestion

Imagine receiving this spacecraft telemetry JSON. Notice how the nested "st" dictionary has a different structure for every subsystem (pos, sig, bat, and lvl). A model with a fixed schema would reject most of these records.

The Input (telemetry.json):

[
  {"id":"NAV", "st":{"pos":[12,44], "ok":1}},
  {"id":"COM", "st":{"sig":78, "ok":1}},
  {"id":"PWR", "st":{"bat":92, "ok":1}},
  {"id":"THR", "st":{"lvl":63, "ok":0}}
]

The Incorporator Way: Feed it the unpredictable JSON. Incorporator dynamically unifies the changing structures into a single object graph and gives you instant dot-notation access.

import asyncio
from incorporator import Incorporator

async def main():
    # 1. Parse unpredictable JSON directly into Python objects. No models defined!
    systems = await Incorporator.incorp(
        inc_file="telemetry.json",
        inc_code="id" # Sets 'id' as the O(1) Memory Registry lookup key
    )

    # 2. Instantly access the unified Python object graph via dot-notation
    print(f"Navigation Position: {systems.inc_dict['NAV'].st.pos}")   # Output: [12, 44]
    print(f"Power Battery Level: {systems.inc_dict['PWR'].st.bat}%")  # Output: 92%
    
    # 3. Interpret and manipulate data effortlessly at runtime
    thr = systems.inc_dict["THR"]
    if not thr.st.ok:
        print(f"⚠️ THRUST FAILURE! Efficiency dropped to {thr.st.lvl}")
        
asyncio.run(main())
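
To build intuition for what dot-notation access over schema-free JSON means, here is a minimal standard-library sketch. It is not how Incorporator is implemented (the library builds on Pydantic); `to_namespace` and `registry` are illustrative names only.

```python
import json
from types import SimpleNamespace

RAW = """
[
  {"id":"NAV", "st":{"pos":[12,44], "ok":1}},
  {"id":"COM", "st":{"sig":78, "ok":1}},
  {"id":"PWR", "st":{"bat":92, "ok":1}},
  {"id":"THR", "st":{"lvl":63, "ok":0}}
]
"""

def to_namespace(text: str) -> list:
    """Parse JSON so every object becomes an attribute-accessible namespace."""
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

records = to_namespace(RAW)
# Index the records by their "id" value for constant-time lookup.
registry = {rec.id: rec for rec in records}

print(registry["NAV"].st.pos)                    # [12, 44]
print(getattr(registry["COM"].st, "bat", None))  # None: key absent on this record
```

Each record keeps only the keys it actually has, so code that reads an optional key needs a default, as the `getattr` call shows.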

🤷‍♂️ Wait, what if my data isn't JSON?

It doesn't matter. Incorporator automatically infers the format from the URL or file extension. The syntax never changes.

We natively support JSON, NDJSON (JSON Lines), CSV, TSV, PSV, XML, and SQLite, with optional support for binary Apache Avro streams.

If that exact same telemetry data comes from a legacy system as XML or CSV:

# The syntax doesn't change for XML...
systems_xml = await Incorporator.incorp(inc_file="telemetry.xml", inc_code="id")
print(systems_xml.inc_dict["NAV"].st.pos) # Output: ['12', '44'] (XML text values arrive as strings)

# ...and it works instantly for CSV, TSV, or streaming NDJSON logs!
systems_csv = await Incorporator.incorp(inc_file="telemetry.csv", inc_code="id")
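
As a rough illustration of extension-based format inference, the sketch below maps a path or URL to a format name. The lookup tables and the `infer_format` helper are assumptions made for this example; Incorporator's own detection logic may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical lookup tables; the real library's tables may differ.
FORMATS = {".json": "json", ".ndjson": "ndjson", ".jsonl": "ndjson",
           ".csv": "csv", ".tsv": "tsv", ".psv": "psv",
           ".xml": "xml", ".db": "sqlite", ".sqlite": "sqlite"}
COMPRESSION = {".gz", ".bz2", ".xz", ".zip"}

def infer_format(source: str) -> str:
    """Guess the data format from a file path or URL, ignoring query strings."""
    path = PurePosixPath(urlparse(source).path)
    suffixes = [s.lower() for s in path.suffixes]
    # Skip trailing compression suffixes such as ".zip" in "dump.json.zip".
    while suffixes and suffixes[-1] in COMPRESSION:
        suffixes.pop()
    if not suffixes or suffixes[-1] not in FORMATS:
        raise ValueError(f"cannot infer format of {source!r}")
    return FORMATS[suffixes[-1]]

print(infer_format("telemetry.xml"))                          # xml
print(infer_format("https://example.com/v1/users.csv?p=2"))   # csv
```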

📦 Installation

Built on the Python standard library, Pydantic v2 metaprogramming, HTTPX, and Tenacity.

pip install incorporator

Dependencies: pydantic (>=2.0), httpx, tenacity.

For large data streams, faster multithreaded parsing, and Rust-backed compression, install the optional extras (quoted so that shells such as zsh do not expand the brackets):

pip install "incorporator[speedups]" # Unlocks GIL-free orjson & lxml parsing
pip install "incorporator[cramjam]"  # Unlocks zstd, lz4, snappy, brotli compression
pip install "incorporator[avro]"     # Unlocks Apache Avro binary streams
pip install "incorporator[all]"      # Installs the complete Enterprise Big Data suite

⛪ The "Holy Trinity" API

Manage your entire data lifecycle with just three @classmethod factories. Everything Incorporator does stems from these three commands:

  1. incorp(): Extract & Transform. Fetch unknown data, clean it dynamically, and build the Python object graph.
  2. refresh(): Stateful Updates. Pass existing objects back in to fetch live updates and hydrate your memory registries.
  3. export(): Load. Serialize your deeply nested Python objects out to clean CSV, XML, SQLite, or JSON files.

🕵️‍♂️ The DX Inspector: .test()

Don't know the shape of an API? Don't open Postman. Don't write a schema. Let Incorporator write your code for you.

When exploring a new endpoint, simply swap .incorp() for .test() to trigger the Just-In-Time (JIT) API Profiler. It safely fetches a single page, analyzes the data tree using regex-based value scoring, and prints exactly what kwargs you need to write.

import asyncio
from incorporator import Incorporator

class User(Incorporator): pass

# 1. Hit an unknown API
asyncio.run(User.test(inc_url="https://api.unknown.com/v1/users"))

The Console Output: Instantly, Incorporator prints a complete mapping of the API directly to your terminal:

======================================================================
🕵️‍♂️  INCORPORATOR DX INSPECTOR
======================================================================

📦 1. PAYLOAD STRUCTURE:
   ├── metadata (dict)
   │   ├── count: int = 1500
   │   └── page: int = 1
   └── results (list, len=1500)
       ├── user_uuid: str = a1b2c3d4-e5f6...
       ├── full_name: str = Jimmy Jenkins
       ├── status: bool = True
       ├── created_at: str = 2026-05-12T14:32:00Z
       └── address (dict)

   ⚠️  WARNING: The root object is a dictionary, but it contains arrays.
   💡 SUGGESTION: You probably want to add `rec_path='results'` to your incorp() call.

🔑 2. IDENTITY MAPPING:
   Recommended kwargs for O(1) Memory Registry:
   ✅ inc_code='user_uuid'
   ✅ inc_name='full_name'

🛠️  3. ETL / TYPE CASTING SUGGESTIONS:
   💡 We detected string-based timestamps. Consider passing:
      conv_dict={
          'created_at': inc(datetime),
      }
======================================================================
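
The kind of analysis the inspector performs can be approximated in a few lines. The sketch below is an assumption-laden illustration, not the profiler's actual code: `describe`, `timestamp_keys`, and the `ISO_TS` pattern are hypothetical names invented for this example.

```python
import re

# A simple ISO-8601 date-time pattern, e.g. 2026-05-12T14:32:00Z.
ISO_TS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})?$")

def describe(node: dict, indent: int = 0) -> list[str]:
    """Return one line per key, showing its type and a sample value."""
    pad = "  " * indent
    lines = []
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key} (dict)")
            lines.extend(describe(value, indent + 1))
        elif isinstance(value, list):
            lines.append(f"{pad}{key} (list, len={len(value)})")
            # Profile only the first element, assuming records are homogeneous.
            if value and isinstance(value[0], dict):
                lines.extend(describe(value[0], indent + 1))
        else:
            lines.append(f"{pad}{key}: {type(value).__name__} = {value}")
    return lines

def timestamp_keys(record: dict) -> list[str]:
    """List keys whose string values look like ISO-8601 timestamps."""
    return [k for k, v in record.items() if isinstance(v, str) and ISO_TS.match(v)]

payload = {
    "metadata": {"count": 1500, "page": 1},
    "results": [{"full_name": "Jimmy Jenkins", "status": True,
                 "created_at": "2026-05-12T14:32:00Z"}],
}
print("\n".join(describe(payload)))
print(timestamp_keys(payload["results"][0]))  # ['created_at']
```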

⚡️ Core Superpowers

1. The 1-Liner: Pagination, Cleaning, & Type Casting

Example: Fetching upcoming launches from The Space Devs.

You don't need a while loop to paginate, and you don't need to define a massive schema to drill into nested data.

import asyncio
from datetime import datetime
from incorporator import Incorporator, NextUrlPaginator, inc

class Launch(Incorporator): pass

async def main():
    launches = await Launch.incorp(
        inc_url="https://ll.thespacedevs.com/2.2.0/launch/upcoming/",
        rec_path="results",                   # Drill past the metadata wrapper
        inc_page=NextUrlPaginator("next"),    # Auto-paginate using the 'next' JSON key
        call_lim=2,                           # Cap the run at 2 pages
        excl_lst=["image", "vid_urls"],       # Drop heavy, unneeded keys
        conv_dict={
            "net": inc(datetime)              # Cast ISO-8601 strings to datetime objects
        }
    )

    # Access deeply nested, typed attributes with no schema definition
    print(f"🚀 {launches[0].name}")
    print(f"⏰ {launches[0].net.strftime('%B %d, %Y')}")
    print(f"📍 {launches[0].pad.location.name}") # Dot-notation straight through nested dicts!

# 'await' only works inside a coroutine, so run the example through asyncio
asyncio.run(main())
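
For comparison, this is roughly the manual loop that a next-URL paginator replaces. It is a standard-library sketch with a fake in-memory `fetch` standing in for HTTP; `follow_next` and `next_key` are hypothetical names, and `rec_path` and `call_lim` are borrowed from the call above purely as parameter names.

```python
from typing import Callable, Optional

def follow_next(fetch: Callable[[str], dict], start_url: str,
                next_key: str = "next", rec_path: str = "results",
                call_lim: Optional[int] = None) -> list:
    """Collect records page by page until the next link is empty or the cap is hit."""
    records, url, calls = [], start_url, 0
    while url and (call_lim is None or calls < call_lim):
        page = fetch(url)
        records.extend(page.get(rec_path, []))
        url = page.get(next_key)  # None or "" ends the loop
        calls += 1
    return records

# Stand-in for an HTTP client: three fake pages keyed by URL.
PAGES = {
    "p1": {"results": [1, 2], "next": "p2"},
    "p2": {"results": [3, 4], "next": "p3"},
    "p3": {"results": [5], "next": None},
}
print(follow_next(PAGES.__getitem__, "p1", call_lim=2))  # [1, 2, 3, 4]
print(follow_next(PAGES.__getitem__, "p1"))              # [1, 2, 3, 4, 5]
```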

2. Deep Enrichment & Array Reduction

Example: Discovering Pokémon and flattening their stats.

When APIs return heavily nested arrays, Incorporator lets you intercept them using calc(), run a custom Python reduction function, and flatten them into simple native types.

from incorporator.methods.converters import calc

def calculate_bst(stats_array) -> int:
    """Reduces a nested JSON array into a single integer."""
    return sum(stat.get("base_stat", 0) for stat in stats_array if isinstance(stat, dict))

# 1. Shallow Discovery (Fetches URLs)
pokemon_nav = await Nav.incorp(..., inc_child="url") 

# 2. Deep Enrichment (Spawns concurrent requests to all discovered URLs seamlessly)
enriched_pokemon = await Pokemon.incorp(
    inc_parent=pokemon_nav,  # Routes the parent list directly into the network engine!
    inc_code="id",
    conv_dict={
        # Intercepts the raw JSON array, calculates the total, and saves it as an integer!
        "stats": calc(calculate_bst, "stats", default=0, target_type=int),
    },
    name_chg=[("stats", "base_stat_total")] # Rename the key dynamically
)
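
To see what the reducer does on its own, independent of any network call, it can be run against a hand-written sample. The sample values below are made up for illustration and are not real API output.

```python
def calculate_bst(stats_array) -> int:
    """Reduce a list of stat entries to the sum of their base_stat values."""
    return sum(stat.get("base_stat", 0) for stat in stats_array if isinstance(stat, dict))

# Illustrative sample; the values are invented for this example.
sample_stats = [
    {"base_stat": 45, "stat": {"name": "hp"}},
    {"base_stat": 49, "stat": {"name": "attack"}},
    {"stat": {"name": "defense"}},   # missing base_stat counts as 0
    None,                            # non-dict entries are skipped
]
print(calculate_bst(sample_stats))  # 94
```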

3. Multi-API Graph Fusion

Example: Fusing CoinGecko assets with Binance Live Order Books.

Stop writing manual matching loops or dumping data into SQL just to join it. Incorporator lets you bind independent APIs together natively using link_to.

from typing import Optional

from incorporator.methods.converters import link_to, calc

# 1. Define a clean, null-safe formatting function (No lambdas!)
def to_usdt(sym: Optional[str]) -> Optional[str]:
    return f"{str(sym).upper()}USDT" if sym else None

# 2. Fetch Binance Order Books (Instantly becomes an O(1) in-memory registry)
binance_books = await BinanceBook.incorp(
    inc_url="https://api.binance.us/.../bookTicker", 
    inc_code="symbol"
)

# 3. Fetch CoinGecko Assets and fuse them dynamically
assets = await CryptoAsset.incorp(
    inc_url="https://api.coingecko.com/...",
    inc_code="id",
    conv_dict={
        # We pass our named formatting function cleanly into the extractor
        "live_book": calc(link_to(binance_books, extractor=to_usdt), "symbol")
    }
)

# Traverse the unified multi-API graph natively
print(f"{assets[0].name} Live Bid: {assets[0].live_book.bidPrice}")
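
Conceptually, this kind of fusion is a keyed lookup: transform a value from one dataset into a key, then fetch the matching entry from the other dataset's registry. The sketch below shows that idea with plain dictionaries. `make_linker` is a hypothetical helper and does not reproduce the behavior of `link_to`; the order-book entries are invented.

```python
from typing import Any, Callable, Mapping, Optional

def make_linker(registry: Mapping[str, Any],
                extractor: Callable[[Any], Optional[str]]) -> Callable[[Any], Any]:
    """Build a function that maps a raw value to the matching registry entry."""
    def link(raw: Any) -> Any:
        key = extractor(raw)
        return registry.get(key) if key is not None else None
    return link

def to_usdt(sym: Optional[str]) -> Optional[str]:
    return f"{str(sym).upper()}USDT" if sym else None

# Made-up order-book entries keyed by trading symbol.
books = {"BTCUSDT": {"bidPrice": "1.00"}, "ETHUSDT": {"bidPrice": "2.00"}}
link = make_linker(books, to_usdt)

print(link("btc"))   # {'bidPrice': '1.00'}
print(link("doge"))  # None: no matching book
print(link(None))    # None: extractor returned no key
```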

4. XML Ingestion & Declarative Bulk POSTs

Example: Auditing a local XML ledger against a Federal Database.

Need to send a batch POST request based on dynamically extracted XML data? Pass a parent object and use the magical join_all() token to automatically concatenate parent IDs across a Bulk POST payload.

from incorporator.methods.converters import join_all

# 1. Ingest a local XML file
invoices = await Invoice.incorp(
    inc_file="jimmy_ledger.xml",
    rec_path="Dealership.AuditFile.Invoices.Invoice",
    inc_child="Vehicle.VIN" # Extract the VINs from the XML
)

# 2. Declarative Bulk POST using the XML data!
govt_specs = await NHTSASpec.incorp(
    inc_url="https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/",
    inc_parent=invoices,
    http_method="POST",
    payload_type="form",
    form_payload={
        "format": "json",
        "data": join_all(";") # Magically joins all XML VINs by a semicolon!
    },
    rec_path="Results",
    inc_code="VIN"
)
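
The request body that results from joining the parent IDs can be pictured with the standard library alone. This sketch assumes three placeholder VINs (not real vehicles) and only shows how a semicolon-joined value ends up in a form-encoded payload; it says nothing about how `join_all` is implemented.

```python
from urllib.parse import urlencode

# Hypothetical VINs standing in for values extracted from the XML ledger.
vins = ["VIN0000000000001", "VIN0000000000002", "VIN0000000000003"]

form_payload = {"format": "json", "data": ";".join(vins)}
body = urlencode(form_payload)  # form encoding escapes ";" as %3B

print(form_payload["data"])
print(body)
```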

5. The Local Database Pivot (JSON ➡️ SQLite)

Example: Moving a JSON API directly into a local SQL database.

Incorporator reads and writes SQLite database files natively. You don't need to write CREATE TABLE statements or loop through rows: Incorporator inspects the Python types, generates the SQL schema, and performs the inserts in bulk.

# 1. Fetch JSON API data
users = await User.incorp(inc_url="https://api.domain.com/v1/users")

# 2. Dump directly to a local SQLite database! 
# Incorporator automatically creates the 'user' table and maps the schema.
await User.export(users, "local_warehouse.db")

# 3. Read it back using a native SQL query!
active_users = await User.incorp(
    inc_file="local_warehouse.db", 
    sql_query="SELECT * FROM user WHERE is_active = 1"
)
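
Inferring a table schema from Python types and inserting in bulk can be sketched with the built-in sqlite3 module. The `SQL_TYPES` table and `export_rows` helper are assumptions for illustration; Incorporator's actual type mapping is not documented here.

```python
import sqlite3

# Hypothetical Python-to-SQLite type mapping; unknown types fall back to TEXT.
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT", bool: "INTEGER"}

def export_rows(conn: sqlite3.Connection, table: str, rows: list[dict]) -> None:
    """Create a table from the first row's Python types, then bulk-insert all rows."""
    columns = list(rows[0])
    col_defs = ", ".join(
        f'"{c}" {SQL_TYPES.get(type(rows[0][c]), "TEXT")}' for c in columns
    )
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
export_rows(conn, "user", [
    {"id": 1, "name": "Ada", "is_active": True},
    {"id": 2, "name": "Bob", "is_active": False},
])
print(conn.execute("SELECT name FROM user WHERE is_active = 1").fetchall())  # [('Ada',)]
```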

🛠 Enterprise Resilience & Features

🚀 GIL-Free Hyperthreading

Incorporator performs all disk I/O and format parsing on background threads. When installed with [speedups], the framework lazy-loads Rust and C extensions (orjson, lxml) that release the Python GIL, so multi-gigabyte data sources can be parsed across available CPU cores without stalling your async event loop.

🗜️ Invisible Archiving & Compression

Stop writing zipfile extraction logic for compressed API payloads. Incorporator natively detects, intercepts, and decompresses gzip, bz2, lzma, zip, and tar archives in the background—without changing a single line of your parsing code.

# Automatically finds, extracts, and parses the JSON hidden inside the ZIP archive!
sales = await Sales.incorp(inc_url="https://api.system.com/dump/sales_2026.json.zip")

# Export to a flat CSV, then seamlessly compress it to GZIP in a background thread
await Sales.export(sales, "cleaned_sales.csv", compression="gz")
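
One common way to detect compression without trusting the file name is to check the leading "magic" bytes. The sketch below covers gzip, bz2, and xz streams using only the standard library. It is an assumption about how such detection can work, not a description of Incorporator's internals, and `maybe_decompress` is a hypothetical name.

```python
import bz2
import gzip
import json
import lzma

# Leading "magic" bytes that identify each stream format.
MAGIC = [
    (b"\x1f\x8b", gzip.decompress),
    (b"BZh", bz2.decompress),
    (b"\xfd7zXZ\x00", lzma.decompress),
]

def maybe_decompress(blob: bytes) -> bytes:
    """Return decompressed bytes if a known signature matches, else the input."""
    for signature, decompress in MAGIC:
        if blob.startswith(signature):
            return decompress(blob)
    return blob

original = json.dumps([{"sku": "A1", "qty": 3}]).encode()
for packed in (gzip.compress(original), bz2.compress(original),
               lzma.compress(original), original):
    # Every variant parses to the same records after the transparent step.
    print(json.loads(maybe_decompress(packed)))
```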

📡 Invisible Networking & DLQs

You never have to manage httpx.AsyncClient contexts: Incorporator manages shared connection pools for you and retries failed requests with exponential backoff via Tenacity. If a URL keeps failing (for example with HTTP 429 Too Many Requests), Incorporator skips it and records it in a Dead Letter Queue (DLQ).

if launches.failed_sources:
    print(f"DLQ Alert: Programmatically retry these {len(launches.failed_sources)} URLs.")
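
The retry-then-park pattern can be shown without any networking library. This sketch uses a plain loop instead of Tenacity and a fake `fetch` function. `fetch_all` and its parameters are hypothetical, and the delay defaults to zero so the example runs instantly.

```python
import time
from typing import Callable

def fetch_all(urls: list[str], fetch: Callable[[str], dict],
              attempts: int = 3, base_delay: float = 0.0):
    """Fetch each URL with exponential backoff; collect persistent failures."""
    results, failed_sources = [], []
    for url in urls:
        for attempt in range(attempts):
            try:
                results.append(fetch(url))
                break
            except ConnectionError:
                if attempt < attempts - 1:
                    # Wait base_delay, then 2x, then 4x, ... before the next try.
                    time.sleep(base_delay * (2 ** attempt))
        else:
            # Every attempt raised: park the URL instead of aborting the batch.
            failed_sources.append(url)
    return results, failed_sources

def fake_fetch(url: str) -> dict:
    """Stand-in for an HTTP call; one URL always fails."""
    if url == "bad":
        raise ConnectionError("simulated HTTP 429")
    return {"url": url}

results, failed = fetch_all(["a", "bad", "b"], fake_fetch)
print(results)  # [{'url': 'a'}, {'url': 'b'}]
print(failed)   # ['bad']
```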

🧠 Zero-OOM Memory Management

When you fetch hundreds of thousands of records, holding them all in plain Python lists of dicts can exhaust memory. Incorporator wraps results in an IncorporatorList, and every instance registers itself in its class-level inc_dict, which is backed by a weakref.WeakValueDictionary. You get O(1) lookups by key, and because the registry holds only weak references, it never keeps an object alive once the rest of your program has released it.
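
The behavior of a weak-value registry is easy to demonstrate in isolation. The `Record` class below is a hypothetical stand-in, not an Incorporator class; only `weakref.WeakValueDictionary` itself comes from the text above.

```python
import gc
import weakref

class Record:
    # Class-level registry holding weak references only.
    inc_dict = weakref.WeakValueDictionary()

    def __init__(self, code: str):
        self.code = code
        Record.inc_dict[code] = self  # self-register on creation

nav = Record("NAV")
pwr = Record("PWR")
print(Record.inc_dict["NAV"] is nav)  # True

del pwr          # drop the only strong reference
gc.collect()     # make collection deterministic on non-refcounting interpreters
print("PWR" in Record.inc_dict)       # False
```

The registry entry disappears as soon as the object itself is collected, so the lookup table never pins records in memory.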

🗄️ Non-Blocking Observability

Swap your base class to LoggedIncorporator and set enable_logging=True. Incorporator spins up QueueHandler background threads to write auto-rotating JSON-line logs (api.log, error.log, debug.log) so disk I/O never blocks your asyncio event loop.
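
Queue-based logging is available in the standard library, and the sketch below shows the general pattern: the caller only enqueues records, while a listener thread does the file writes. The formatter, logger name, and file location are assumptions for this example and do not describe LoggedIncorporator's configuration.

```python
import json
import logging
import logging.handlers
import queue
import tempfile
from pathlib import Path

class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({"level": record.levelname, "msg": record.getMessage()})

log_path = Path(tempfile.mkdtemp()) / "api.log"
file_handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1_000_000, backupCount=3
)
file_handler.setFormatter(JsonLineFormatter())

# The listener thread drains the queue and performs the blocking disk writes.
log_queue: queue.Queue = queue.Queue()
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

logger = logging.getLogger("incorporator_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))  # enqueue only

logger.info("fetched %d records", 42)

listener.stop()        # flushes everything still in the queue
file_handler.close()
print(log_path.read_text())
```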

🔄 Stateful Updates & Cross-Format Exports

Fetch XML, interact with it as clean Python objects, and dump it to CSV instantly.

# Update state in memory, then serialize to disk safely without boilerplate
await Incorporator.refresh(launches)
await Incorporator.export(launches, "upcoming_launches.csv", format_type="csv")
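
Writing nested objects to a flat format such as CSV requires some flattening rule. One simple convention is dotted column names, sketched below with the standard library. The `flatten` helper and the sample rows are invented for illustration, and Incorporator's export may use a different layout.

```python
import csv
import io

def flatten(record: dict, prefix: str = "") -> dict:
    """Collapse nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        else:
            flat[name] = value
    return flat

# Made-up records shaped like the nested launch data above.
rows = [flatten(r) for r in [
    {"name": "Demo-1", "pad": {"location": {"name": "Site A"}}},
    {"name": "Demo-2", "pad": {"location": {"name": "Site B"}}},
]]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```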

📚 Documentation & Examples

The best way to learn Incorporator is through our deeply documented API references and Guided Tutorials.

Check out the /examples directory for runnable code.


🤝 Philosophy & Contributing

Incorporator is built on strict OOP principles, non-blocking observability, and a forgiving metaprogramming shield. We trap low-level exceptions (json.JSONDecodeError, httpx.HTTPStatusError) and recast them as domain errors. Your event loop is safe with us.
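
Recasting a low-level exception as a domain error usually looks like the sketch below. `IncorporatorError` is a hypothetical name chosen for this example; the library's real exception classes may be named differently.

```python
import json

class IncorporatorError(Exception):
    """Hypothetical domain error; the library's real exception names may differ."""

def parse_payload(text: str):
    """Parse JSON, converting decoder failures into a domain-level error."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # 'from exc' keeps the original traceback available as __cause__.
        raise IncorporatorError(f"payload is not valid JSON: {exc.msg}") from exc

try:
    parse_payload("{not json")
except IncorporatorError as err:
    print(type(err.__cause__).__name__)  # JSONDecodeError
```

Callers then need to catch only one exception family, while the original error stays attached for debugging.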

Contributions are welcome!
