Data infrastructure for the Boti ecosystem

Project description

boti-data

boti-data is the data access and data transformation layer of the Boti ecosystem.

It builds on top of boti and gives teams a reusable interface for working with structured data across databases, parquet datasets, schema-controlled transformations, and distributed or partitioned loading workflows.

What boti-data is for

Many teams have the same recurring problem: business logic depends on data that lives in multiple places, arrives in slightly different shapes, and is loaded through a mix of notebooks, scripts, ad hoc SQL, and one-off helpers.

boti-data helps turn that into a more coherent data access layer.

It is designed for codebases that need to:

  • connect to named data sources consistently
  • reflect or model database tables without hand-writing everything up front
  • load data through a gateway instead of bespoke query snippets everywhere
  • normalise and validate schemas before downstream use
  • combine parquet and database workflows in one library
  • scale from simple local reads to partitioned or distributed loading
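To make the first and third points concrete, here is a minimal standard-library sketch of named sources behind a single loading function. It is not boti-data's API: `SourceCatalog` and `load_rows` are hypothetical names invented for this illustration, and SQLite stands in for whatever database you actually use.

```python
from __future__ import annotations

import sqlite3


class SourceCatalog:
    """Hypothetical catalogue: maps a source name to a SQLite database path."""

    def __init__(self) -> None:
        self._sources: dict[str, str] = {}

    def register(self, name: str, database: str) -> None:
        self._sources[name] = database

    def connect(self, name: str) -> sqlite3.Connection:
        if name not in self._sources:
            raise KeyError(f"unknown data source: {name!r}")
        return sqlite3.connect(self._sources[name])


def load_rows(
    catalog: SourceCatalog, source: str, table: str, columns: list[str]
) -> list[dict]:
    """Load the given columns of a table from a named source as dictionaries."""
    # Identifiers cannot be bound as SQL parameters, so reject anything
    # that is not a plain identifier before building the query text.
    for identifier in (table, *columns):
        if not identifier.isidentifier():
            raise ValueError(f"invalid identifier: {identifier!r}")
    query = f"SELECT {', '.join(columns)} FROM {table}"
    connection = catalog.connect(source)
    try:
        cursor = connection.execute(query)
        names = [description[0] for description in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]
    finally:
        connection.close()
```

Callers only refer to a source by name, for example `load_rows(catalog, "sales", "orders", ["id", "amount"])`, so connection details stay in one place.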

Problems boti-data solves

boti-data is useful when data code suffers from issues such as:

  • repeated connection boilerplate across notebooks and services
  • slow, fragile query code copied from place to place
  • inconsistent schema assumptions between producers and consumers
  • difficult transitions from exploratory analysis to reusable pipelines
  • manual join and field-mapping logic repeated in many modules
  • no common abstraction for loading data from SQL and parquet sources

By centralising those patterns, boti-data reduces duplicated plumbing and makes transformations easier to reason about.
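As an illustration of what "centralising" field mapping and join logic can mean, the sketch below keeps both in two small reusable functions. It uses plain dictionaries and the standard library only. `apply_field_map` and `inner_join` are hypothetical helpers written for this example and do not reflect how boti-data's `FieldMap` or join helpers are implemented.

```python
from __future__ import annotations


def apply_field_map(rows: list[dict], field_map: dict[str, str]) -> list[dict]:
    """Rename keys according to field_map; unmapped keys are kept as they are."""
    return [
        {field_map.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


def inner_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner-join two row lists on a shared key using a hash index."""
    index: dict[object, list[dict]] = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            # On a name clash, the value from the left row wins.
            joined.append({**right_row, **left_row})
    return joined
```

With helpers like these, a producer that calls a column `cust_id` and a consumer that expects `customer_id` can be reconciled once, in the mapping, rather than in every module that joins the two datasets.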

Why boti-data can make a huge difference

The biggest benefit of boti-data is that it creates a shared data interface between infrastructure and business logic.

That means teams can spend less time rewriting access code and more time working on actual transformations, validation rules, and downstream decisions.

The payoff tends to be largest when:

  • analysts and engineers share the same source systems
  • a notebook prototype needs to become production code
  • multiple data products depend on the same tables or parquet layouts
  • schema drift is a recurring source of errors
  • large extracts need partitioning or distributed execution
  • teams want a clean boundary between connection details and transformation logic

Domain areas where it is especially valuable

boti-data is intentionally general-purpose, but it is especially strong in domains where structured operational data must be transformed into reliable analytical or decision-ready datasets.

Examples include:

  • analytics engineering: building reusable source loaders, schema maps, and standardised transformations
  • business operations: consolidating data from transactional systems, planning tools, and operational databases
  • finance and controlling: reconciling structured data with explicit schema expectations and repeatable joins
  • risk, compliance, and audit: validating input shape, tracing transformations, and standardising access patterns
  • customer and product analytics: joining behavioural and operational datasets with less custom plumbing
  • supply chain and logistics: unifying inventory, movement, order, and status data from several systems
  • data platform and internal tooling: giving teams a common gateway layer instead of ad hoc connectors
  • ML feature preparation: building reliable dataset assembly steps from SQL and parquet sources

In those settings, the gains are not just convenience. They show up as better reuse, fewer integration bugs, and faster movement from exploration to production.

Core capabilities

  • SQL database resources
  • async and sync database access helpers
  • SQLAlchemy model reflection and registries
  • connection catalogues
  • parquet resources and readers
  • gateway-style loading APIs
  • filter expressions
  • schema normalisation and validation helpers
  • field mapping and join helpers
  • partitioned and distributed data workflows
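To show what a partitioned load looks like in principle, here is a standard-library sketch that splits an integer key range into chunks and reads each chunk with its own range filter. It is an assumption-laden illustration rather than boti-data's implementation: `partition_bounds` and `load_partitioned` are hypothetical names, SQLite is used as the backend, and the chunks are read sequentially where a real workflow might hand them to workers.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator


def partition_bounds(lower: int, upper: int, partitions: int) -> list[tuple[int, int]]:
    """Split the inclusive range [lower, upper] into at most `partitions`
    contiguous half-open chunks [start, stop)."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if upper < lower:
        return []
    total = upper - lower + 1
    size = -(-total // partitions)  # ceiling division
    return [
        (start, min(start + size, upper + 1))
        for start in range(lower, upper + 1, size)
    ]


def load_partitioned(
    connection: sqlite3.Connection, table: str, key: str, partitions: int
) -> Iterator[list[tuple]]:
    """Yield the rows of `table` one key-range partition at a time."""
    if not (table.isidentifier() and key.isidentifier()):
        raise ValueError("table and key must be plain identifiers")
    lower, upper = connection.execute(
        f"SELECT MIN({key}), MAX({key}) FROM {table}"
    ).fetchone()
    if lower is None:  # empty table: nothing to yield
        return
    for start, stop in partition_bounds(lower, upper, partitions):
        cursor = connection.execute(
            f"SELECT * FROM {table} WHERE {key} >= ? AND {key} < ? ORDER BY {key}",
            (start, stop),
        )
        yield cursor.fetchall()
```

Because each partition is defined only by its `(start, stop)` bounds, the same bounds could be dispatched to threads, processes, or a distributed scheduler without changing the query.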

Installation

Install directly:

pip install boti-data

Or install through the core package extra:

pip install "boti[data]"
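To confirm that the distribution is visible to the current interpreter, a small check with the standard library's `importlib.metadata` can help. `installed_version` is a helper name chosen for this example, not something shipped by boti-data.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("boti-data")
    if found is None:
        print("boti-data is not installed in this environment")
    else:
        print(f"boti-data {found} is installed")
```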

Imports

boti-data uses the top-level Python package boti_data:

from boti_data import (
    ConnectionCatalog,
    DataGateway,
    DataHelper,
    FieldMap,
    ParquetDataConfig,
    ParquetDataResource,
    SqlAlchemyModelBuilder,
    SqlDatabaseConfig,
    SqlDatabaseResource,
)

Lower-level modules are also available:

from boti_data.db import SqlDatabaseConfig, SqlDatabaseResource
from boti_data.gateway import DataGateway
from boti_data.parquet import ParquetDataConfig, ParquetDataResource
from boti_data.schema import validate_schema
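The signature of `validate_schema` is not shown here, so the sketch below does not call it. Instead it illustrates, with the standard library only, the general idea behind schema normalisation and validation: normalise column names first, then check each row against expected fields and types. `normalise_columns` and `find_schema_problems` are hypothetical names made up for this example.

```python
from __future__ import annotations


def normalise_columns(row: dict) -> dict:
    """Strip whitespace from column names and lower-case them."""
    return {str(name).strip().lower(): value for name, value in row.items()}


def find_schema_problems(rows: list[dict], schema: dict[str, type]) -> list[str]:
    """Return human-readable problems; an empty list means every row conforms."""
    problems = []
    for position, row in enumerate(rows):
        for name, expected in schema.items():
            if name not in row:
                problems.append(f"row {position}: missing field {name!r}")
            elif not isinstance(row[name], expected):
                actual = type(row[name]).__name__
                problems.append(
                    f"row {position}: field {name!r} expected "
                    f"{expected.__name__}, got {actual}"
                )
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report all schema drift in a batch at once; a stricter caller can still raise if the list is non-empty.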

Examples

SQL resource

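from boti_data import SqlDatabaseConfig, SqlDatabaseResource

# Describe the connection once, separately from the code that uses it.
config = SqlDatabaseConfig(connection_url="sqlite:///example.db", query_only=True)

# Both the resource and the session are context managers,
# so they are released when their blocks exit.
with SqlDatabaseResource(config) as db:
    with db.session() as session:
        # The ... is a placeholder: pass the statement you want to run.
        rows = session.execute(...)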

Gateway

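from boti_data import DataGateway, SqlDatabaseConfig

# Build a gateway over the SQLAlchemy backend, reusing the same
# configuration type as the SQL resource example.
gateway = DataGateway(
    backend="sqlalchemy",
    config=SqlDatabaseConfig(connection_url="sqlite:///example.db", query_only=True),
)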

Relationship to boti

boti-data depends on boti and reuses its:

  • logging
  • resource lifecycle
  • secure I/O helpers
  • project/environment utilities

If you only need the runtime primitives, install boti. If you need a stronger data access and transformation layer, install boti-data or boti[data].

Development & Deployment

See docs/DEPLOYMENT.md for publishing instructions.

Download files

Source Distribution: boti_data-0.1.1.tar.gz (79.0 kB)

Built Distribution: boti_data-0.1.1-py3-none-any.whl (92.1 kB, Python 3)

File details

boti_data-0.1.1.tar.gz

  • Size: 79.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18

Hashes for boti_data-0.1.1.tar.gz

Algorithm      Hash digest
SHA256         0ad74ac19472096aa1e664bcfd0af0487010311353e412f46336e7f0df0f36b2
MD5            aa4311ca756d172b49b77c7aaf054669
BLAKE2b-256    36e81d82db34d852ccb8df466701e1da48b2063cb575d207c2c752e1afc3349a

boti_data-0.1.1-py3-none-any.whl

  • Size: 92.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18

Hashes for boti_data-0.1.1-py3-none-any.whl

Algorithm      Hash digest
SHA256         61dc788d9d566edccdc354f0d62c0de118537a738b093ffe6c82c9bed86806a6
MD5            77390759366313852a678a7803642ff4
BLAKE2b-256    f05aab4b68cde3c00e68e72f98ad2fdd6b3feba6df2b3b71fe398d04bb333d0c