Data infrastructure for the Boti ecosystem
boti-data
boti-data is the data access and data transformation layer of the Boti ecosystem.
It builds on boti and gives teams a reusable interface for working with structured data across databases, Parquet datasets, schema-controlled transformations, and distributed or partitioned loading workflows.
What boti-data is for
Many teams have the same recurring problem: business logic depends on data that lives in multiple places, arrives in slightly different shapes, and is loaded through a mix of notebooks, scripts, ad hoc SQL, and one-off helpers.
boti-data helps consolidate those scattered access paths into one coherent data access layer.
It is designed for codebases that need to:
- connect to named data sources consistently
- reflect or model database tables without hand-writing everything up front
- load data through a gateway instead of bespoke query snippets everywhere
- normalise and validate schemas before downstream use
- combine Parquet and database workflows in one library
- scale from simple local reads to partitioned or distributed loading
Problems boti-data solves
boti-data is useful when data code suffers from issues like:
- repeated connection boilerplate across notebooks and services
- slow, fragile query code copied from place to place
- inconsistent schema assumptions between producers and consumers
- difficult transitions from exploratory analysis to reusable pipelines
- manual join and field-mapping logic repeated in many modules
- no common abstraction for loading data from SQL and Parquet sources
By centralising those patterns, boti-data reduces duplicated plumbing and makes transformations easier to reason about.
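To make the schema and field-mapping points concrete, here is a minimal standard-library sketch of the kind of check a shared layer can centralise. It does not use boti-data's own helpers, whose signatures are not shown in this README; `FIELD_MAP`, `EXPECTED_SCHEMA`, `normalise_record`, and `validate_record` are hypothetical names chosen for illustration only.

```python
from typing import Any

# Hypothetical mapping from producer field names to the names consumers expect.
FIELD_MAP = {"CustID": "customer_id", "Amt": "amount"}

# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA = {"customer_id": int, "amount": float}


def normalise_record(record: dict[str, Any]) -> dict[str, Any]:
    """Rename mapped fields and lower-case every other field name."""
    return {FIELD_MAP.get(key, key.lower()): value for key, value in record.items()}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; the list is empty when the record is valid."""
    problems = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            problems.append(f"{column}: expected {expected_type.__name__}")
    return problems


raw = {"CustID": 42, "Amt": "19.99", "Region": "EU"}
clean = normalise_record(raw)
print(clean)
print(validate_record(clean))
```

Here the producer sent `Amt` as a string, so validation reports a type problem for `amount` before any downstream code touches the record. Keeping the mapping and the expected schema in one place is what lets producers and consumers agree on a single set of assumptions.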
Why boti-data can make a huge difference
The biggest benefit of boti-data is that it creates a shared data interface between infrastructure and business logic.
That means teams can spend less time rewriting access code and more time working on actual transformations, validation rules, and downstream decisions.
It can make a major difference when:
- analysts and engineers share the same source systems
- a notebook prototype needs to become production code
- multiple data products depend on the same tables or Parquet layouts
- schema drift is a recurring source of errors
- large extracts need partitioning or distributed execution
- teams want a clean boundary between connection details and transformation logic
Domain areas where it is especially valuable
boti-data is intentionally general-purpose, but it is especially strong in domains where structured operational data must be transformed into reliable analytical or decision-ready datasets.
Examples include:
- analytics engineering: building reusable source loaders, schema maps, and standardised transformations
- business operations: consolidating data from transactional systems, planning tools, and operational databases
- finance and controlling: reconciling structured data with explicit schema expectations and repeatable joins
- risk, compliance, and audit: validating input shape, tracing transformations, and standardising access patterns
- customer and product analytics: joining behavioural and operational datasets with less custom plumbing
- supply chain and logistics: unifying inventory, movement, order, and status data from several systems
- data platform and internal tooling: giving teams a common gateway layer instead of ad hoc connectors
- ML feature preparation: building reliable dataset assembly steps from SQL and Parquet sources
In those settings, the gains are not just convenience. They show up as better reuse, fewer integration bugs, and faster movement from exploration to production.
Core capabilities
- SQL database resources
- async and sync database access helpers
- SQLAlchemy model reflection and registries
- connection catalogues
- Parquet resources and readers
- gateway-style loading APIs
- filter expressions
- schema normalisation and validation helpers
- field mapping and join helpers
- partitioned and distributed data workflows
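As an illustration of what a partitioned loading workflow involves, the sketch below splits a key range into fixed-size chunks and loads each chunk with its own query. It uses only the standard-library `sqlite3` module, not boti-data's partitioning API (which this README does not document); the `events` table and the `partition_bounds` and `load_partitioned` functions are hypothetical.

```python
import sqlite3
from typing import Iterator


def partition_bounds(lo: int, hi: int, size: int) -> Iterator[tuple[int, int]]:
    """Yield half-open [start, end) ranges that together cover [lo, hi)."""
    for start in range(lo, hi, size):
        yield start, min(start + size, hi)


def load_partitioned(conn: sqlite3.Connection, size: int) -> Iterator[list[tuple]]:
    """Yield rows of the hypothetical `events` table one id range at a time."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) + 1 FROM events").fetchone()
    if lo is None:  # empty table: nothing to load
        return
    for start, end in partition_bounds(lo, hi, size):
        yield conn.execute(
            "SELECT id, payload FROM events WHERE id >= ? AND id < ? ORDER BY id",
            (start, end),
        ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(i, f"e{i}") for i in range(1, 11)]
)

chunks = list(load_partitioned(conn, size=4))
print([len(chunk) for chunk in chunks])
```

Because each range is queried independently, the same bounds could in principle be handed to separate workers, which is the step from partitioned to distributed loading.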
Installation
Install directly:
pip install boti-data
Or install through the core package extra:
pip install "boti[data]"
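To confirm which version ended up in the environment, a small check with the standard-library `importlib.metadata` module can help. This is a generic sketch rather than anything boti-data provides; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string when boti-data is installed, otherwise None.
print(installed_version("boti-data"))
```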
Imports
boti-data uses the top-level Python package boti_data:
from boti_data import (
    ConnectionCatalog,
    DataGateway,
    DataHelper,
    FieldMap,
    ParquetDataConfig,
    ParquetDataResource,
    SqlAlchemyModelBuilder,
    SqlDatabaseConfig,
    SqlDatabaseResource,
)
Lower-level modules are also available:
from boti_data.db import SqlDatabaseConfig, SqlDatabaseResource
from boti_data.gateway import DataGateway
from boti_data.parquet import ParquetDataConfig, ParquetDataResource
from boti_data.schema import validate_schema
Examples
SQL resource
from boti_data import SqlDatabaseConfig, SqlDatabaseResource

config = SqlDatabaseConfig(connection_url="sqlite:///example.db", query_only=True)

# Open the resource, then a session; both are released when their blocks exit.
with SqlDatabaseResource(config) as db:
    with db.session() as session:
        rows = session.execute(...)  # replace ... with your statement
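The `query_only=True` flag suggests a connection meant for reads only. How boti-data enforces that is not described in this README, so the following is an assumption-laden illustration of the general idea rather than the library's mechanism: SQLite itself offers `PRAGMA query_only`, shown here with the standard-library `sqlite3` module and a hypothetical `items` table.

```python
import sqlite3

# Autocommit mode keeps the example free of implicit transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items (name) VALUES ('widget')")

# From here on, SQLite rejects statements that would modify the database.
conn.execute("PRAGMA query_only = ON")

# Reads still work as usual.
rows = conn.execute("SELECT id, name FROM items").fetchall()
print(rows)

# Writes now fail with an OperationalError instead of changing data.
try:
    conn.execute("INSERT INTO items (name) VALUES ('gadget')")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
print(write_blocked)
```

A guard like this is useful for analytical code paths, where an accidental `UPDATE` or `DELETE` against a source system is far more costly than a failed query.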
Gateway
from boti_data import DataGateway, SqlDatabaseConfig

gateway = DataGateway(
    backend="sqlalchemy",
    config=SqlDatabaseConfig(connection_url="sqlite:///example.db", query_only=True),
)
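The README does not show which loading methods `DataGateway` exposes, so the sketch below illustrates the gateway pattern in general, not the boti-data API. `MiniGateway`, its `load` method, and the `orders` table are all hypothetical, and the backend is the standard-library `sqlite3` module.

```python
import sqlite3
from typing import Any


class MiniGateway:
    """Hypothetical stand-in for a gateway; not the boti-data DataGateway API."""

    def __init__(self, conn: sqlite3.Connection, allowed_tables: set[str]) -> None:
        self._conn = conn
        self._allowed = allowed_tables  # only these table names may be queried

    def load(self, table: str, **equals: Any) -> list[dict[str, Any]]:
        """Load rows from `table`, keeping rows where column == value per filter."""
        if table not in self._allowed:
            raise ValueError(f"unknown table: {table}")
        # Read the real column names so filter names can be checked before use.
        columns = [row[1] for row in self._conn.execute(f"PRAGMA table_info({table})")]
        unknown = set(equals) - set(columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        where = " AND ".join(f"{name} = ?" for name in equals) or "1 = 1"
        cursor = self._conn.execute(
            f"SELECT * FROM {table} WHERE {where}", tuple(equals.values())
        )
        return [dict(zip(columns, row)) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "open", 10.0), (2, "closed", 25.5), (3, "open", 7.25)],
)

gateway = MiniGateway(conn, allowed_tables={"orders"})
open_orders = gateway.load("orders", status="open")
print(open_orders)
```

Callers name a table and a few filters; the SQL, parameter binding, and validation live in one place, which is the point of loading through a gateway instead of scattering query snippets.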
Relationship to boti
boti-data depends on boti and reuses its:
- logging
- resource lifecycle
- secure I/O helpers
- project/environment utilities
If you only need the runtime primitives, install boti.
If you also need the data access and transformation layer, install boti-data or boti[data].
Development & Deployment
See docs/DEPLOYMENT.md for publishing instructions.
Download files
File details
Details for the file boti_data-0.1.1.tar.gz.
File metadata
- Download URL: boti_data-0.1.1.tar.gz
- Upload date:
- Size: 79.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.18 (publish subcommand, macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ad74ac19472096aa1e664bcfd0af0487010311353e412f46336e7f0df0f36b2 |
| MD5 | aa4311ca756d172b49b77c7aaf054669 |
| BLAKE2b-256 | 36e81d82db34d852ccb8df466701e1da48b2063cb575d207c2c752e1afc3349a |
File details
Details for the file boti_data-0.1.1-py3-none-any.whl.
File metadata
- Download URL: boti_data-0.1.1-py3-none-any.whl
- Upload date:
- Size: 92.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.18 (publish subcommand, macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61dc788d9d566edccdc354f0d62c0de118537a738b093ffe6c82c9bed86806a6 |
| MD5 | 77390759366313852a678a7803642ff4 |
| BLAKE2b-256 | f05aab4b68cde3c00e68e72f98ad2fdd6b3feba6df2b3b71fe398d04bb333d0c |