boti
Core infrastructure for the Boti ecosystem.
boti stands for Base Object Transformation Interface.
It is a Python library for building reliable, reusable transformation-oriented software: scripts, services, data pipelines, batch jobs, notebook helpers, and internal tooling that all need the same operational foundations.
At its core, boti is about giving transformation code a consistent runtime model:
- how resources are opened and closed
- how file access is constrained and validated
- how projects discover their root and runtime configuration
- how logs are emitted in a predictable way
The repository also contains the companion package boti-data, which extends that foundation with SQL, parquet, schema, gateway, and distributed data capabilities.
What problem boti solves
A lot of data and automation code starts small and quickly becomes operationally messy:
- ad hoc setup and teardown logic
- duplicated path and file handling
- environment loading spread across scripts and notebooks
- inconsistent logging and diagnostics
- brittle assumptions about where code is running from
That usually leads to code that works in one notebook or on one machine but becomes fragile when reused in pipelines, packaged services, shared libraries, or scheduled jobs.
boti gives those projects a small set of opinionated runtime primitives so the same code can move more cleanly between local development, automation, and production workflows.
Why boti is useful
boti is useful when you want transformation code to behave like a real software component instead of a collection of one-off scripts.
It helps by:
- standardising resource lifecycle with ManagedResource
- making constrained file access explicit with SecureResource
- centralising project-root and environment discovery with ProjectService
- giving the codebase a shared logging model with Logger
This is especially valuable when multiple teams or notebooks interact with the same codebase, because it reduces hidden assumptions and makes behaviour more predictable.
What boti-data adds
boti-data is the data layer for the Boti ecosystem.
Where boti solves the runtime and application-structure problems, boti-data solves the data access and data movement problems that appear once teams need to work across databases, parquet files, schemas, and distributed workloads.
It provides:
- SQL database resources and session management
- SQLAlchemy model reflection and model registries
- connection catalogues for named data sources
- gateway-style loading APIs
- parquet resources and readers
- schema normalisation, validation, and field mapping
- filter expressions and join helpers
- partitioned and distributed loading workflows
In practice, it helps teams replace repetitive, hand-rolled access code with a consistent interface for loading, validating, shaping, and moving data.
Where boti-data can make a big difference
boti-data is useful anywhere teams need to bridge operational systems and analytical workflows without rewriting the same infrastructure over and over.
It can be especially impactful in domains such as:
- analytics engineering: consistent loading from source systems into analysis-ready frames
- business intelligence: reusable connection catalogues, filters, and schema handling across reports
- operations and supply chain: joining transactional data from multiple systems with safer loading patterns
- finance and risk: explicit schemas, reproducible transformations, and controlled access to structured data
- customer, product, and growth analytics: repeatable extraction and normalisation across many upstream tables
- ML and feature pipelines: partitioned loads, parquet workflows, and predictable resource management
- research and notebook-heavy teams: moving from exploratory code to reusable library code without losing speed
The value is largest when data work sits in the gap between raw infrastructure and business logic: not just querying tables, but building maintainable, reusable data interfaces.
Packages
Core package
pip install boti
Core imports:
from boti import Logger, ManagedResource, ProjectService, SecureResource
from boti.core import is_secure_path
You can also import from boti.core directly:
from boti.core import Logger, ManagedResource, ProjectService, SecureResource
Core + data package
pip install "boti[data]"
or:
pip install boti-data
Data imports live under the separate top-level package:
from boti_data import DataGateway, DataHelper, SqlDatabaseConfig, SqlDatabaseResource
Quick start
Managed resource
from boti import ManagedResource

class MyResource(ManagedResource):
    def _cleanup(self) -> None:
        print("cleaning up")

with MyResource() as resource:
    print(resource.closed)  # False
Filesystem configuration
FilesystemConfig provides a typed way to describe where a resource should read and write data. It uses fsspec underneath, so boti can work with the local filesystem, S3-compatible object storage, and any other backend supported by your installed fsspec drivers.
Local files
from boti.core.filesystem import FilesystemConfig, create_filesystem
config = FilesystemConfig(
    fs_type="file",
    fs_path="/srv/boti/data",
)
fs = create_filesystem(config)

with fs.open("/srv/boti/data/example.txt", "w") as handle:
    handle.write("hello")
S3 server connections
Use this pattern when connecting to AWS S3 or to an S3-compatible server such as MinIO, Ceph, or another internal object-storage endpoint.
from boti.core.filesystem import FilesystemConfig, FilesystemAdapter
config = FilesystemConfig(
    fs_type="s3",
    fs_path="analytics-bucket/raw/events",
    fs_key="ACCESS_KEY",
    fs_secret="SECRET_KEY",
    fs_endpoint="https://minio.internal.example",
    fs_region="eu-west-1",
)
adapter = FilesystemAdapter(config)
fs = adapter.get_filesystem()

with fs.open("analytics-bucket/raw/events/2026-04-15.json", "rb") as handle:
    payload = handle.read()
fs_endpoint points at the S3 server, while fs_path identifies the bucket and prefix you want to work with.
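As a concrete illustration of how an fs_path like the one above decomposes (assumed semantics, following the common fsspec convention of bucket followed by key prefix):

```python
# "analytics-bucket/raw/events" = bucket name + key prefix inside that bucket
fs_path = "analytics-bucket/raw/events"
bucket, _, prefix = fs_path.partition("/")
```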
Other supported filesystems
Any backend recognised by the installed fsspec stack can be used through fs_type. Common examples include:
- memory for tests and ephemeral workflows
- gcs for Google Cloud Storage
- az or abfs for Azure storage
- ftp, sftp, or http where the relevant driver is installed
from boti.core.filesystem import FilesystemConfig
memory_config = FilesystemConfig(fs_type="memory", fs_path="scratch")
gcs_config = FilesystemConfig(fs_type="gcs", fs_path="my-bucket/datasets")
azure_config = FilesystemConfig(fs_type="az", fs_path="container/path")
Project service
from boti import ProjectService
project_root = ProjectService.detect_project_root()
env_file = ProjectService.setup_environment(project_root)
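Project-root detection along these lines can be sketched with the standard library alone. This is an illustrative walk-up search, not boti's actual implementation; the marker filenames are assumptions:

```python
from pathlib import Path


def detect_project_root(start: Path, markers: tuple[str, ...] = ("pyproject.toml", ".git")) -> Path:
    """Walk upwards from `start` until a directory containing a marker file is found."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    # No marker found anywhere up the tree: fall back to the starting directory.
    return current
```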
Secure file access
SecureResource wraps file operations in a sandbox. By default it allows paths under the detected project root and the system temporary directory, and you can add extra allowlisted paths explicitly.
from pathlib import Path
from boti import SecureResource
from boti.core.models import ResourceConfig
config = ResourceConfig(project_root=Path.cwd())
with SecureResource(config=config) as resource:
    contents = resource.read_text_secure("README.md")
Allow an additional trusted directory
from pathlib import Path
from boti import SecureResource
from boti.core.models import ResourceConfig
config = ResourceConfig(
    project_root=Path("/workspace/project"),
    extra_allowed_paths=[Path("/srv/shared/reference-data")],
)

with SecureResource(config=config) as resource:
    reference = resource.read_text_secure("/srv/shared/reference-data/lookup.csv")
Block unsafe paths
from pathlib import Path
from boti import SecureResource
from boti.core.models import ResourceConfig
config = ResourceConfig(project_root=Path("/workspace/project"))

with SecureResource(config=config) as resource:
    try:
        resource.read_text_secure("/etc/passwd")
    except PermissionError:
        print("outside the configured sandbox roots")
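The sandbox behaviour shown above can be approximated with pathlib alone. This is an illustrative sketch of an allowlist check, not boti's actual implementation:

```python
from pathlib import Path


def is_path_allowed(path: Path, allowed_roots: list[Path]) -> bool:
    """Return True if the fully resolved path sits under one of the allowed roots.

    Resolving first defeats `..` traversal and symlink tricks before comparing.
    """
    resolved = path.resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)
```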
Logger
Logger provides a thread-safe, non-blocking logging layer with secure file handling and sensitive-data redaction.
Quick logger
from pathlib import Path
from boti import Logger
logger = Logger.default_logger(
    logger_name="daily_job",
    log_file="daily_job",
    base_dir=Path("/workspace/project"),
)

logger.info("starting extraction")
logger.warning("retrying after transient error")
Explicit logger configuration
from pathlib import Path
from boti.core.logger import Logger
from boti.core.models import LoggerConfig
config = LoggerConfig(
    log_dir=Path("/workspace/project/logs"),
    logger_name="etl.pipeline",
    log_file="etl_pipeline",
    verbose=True,
)
logger = Logger(config)
logger.set_level(Logger.INFO)
logger.info("rows loaded=%s", 1200)
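The sensitive-data redaction mentioned above can be approximated with the standard library's logging filters. This sketch masks credential-looking key=value pairs before a record is emitted; the patterns are assumptions, not boti's actual rules:

```python
import logging
import re

SECRET_PATTERN = re.compile(r"(password|token|secret)=\S+", re.IGNORECASE)


class RedactingFilter(logging.Filter):
    """Mask credential-looking key=value pairs in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so %-style arguments are also inspected.
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub(r"\1=***", message)
        record.args = ()
        return True
```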
Subclassing ManagedResource
ManagedResource supports both synchronous and asynchronous cleanup patterns, so custom resources can expose the same lifecycle contract whether they wrap filesystems, clients, sockets, or other runtime state.
Synchronous resource
from boti import ManagedResource
class FilesystemResource(ManagedResource):
    def write_text(self, path: str, content: str) -> None:
        fs = self.require_fs()
        with fs.open(path, "w", encoding="utf-8") as handle:
            handle.write(content)

    def read_text(self, path: str) -> str:
        fs = self.require_fs()
        with fs.open(path, "r", encoding="utf-8") as handle:
            return handle.read()

    def _cleanup(self) -> None:
        if self._owns_fs and self.fs is not None:
            self.fs = None

import fsspec

resource = FilesystemResource(fs_factory=lambda: fsspec.filesystem("memory"))
with resource:
    resource.write_text("memory://example.txt", "hello from fsspec")
    print(resource.read_text("memory://example.txt"))
Asynchronous resource
import asyncio
from boti import ManagedResource
class AsyncClientResource(ManagedResource):
    def __init__(self, client) -> None:
        super().__init__()
        self.client = client

    async def _acleanup(self) -> None:
        await self.client.aclose()

async def main(client) -> None:
    async with AsyncClientResource(client) as resource:
        await asyncio.sleep(0)
If a subclass only implements _cleanup(), await resource.aclose() will fall back to running the synchronous cleanup safely.
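That fallback can be sketched as a small base class. This is an illustrative pattern with assumed method names (_cleanup, _acleanup, aclose), not boti's actual implementation:

```python
import asyncio


class Resource:
    """Minimal lifecycle base: async close falls back to sync cleanup."""

    def _cleanup(self) -> None:  # optional synchronous hook
        pass

    async def _acleanup(self) -> None:  # optional asynchronous hook
        pass

    async def aclose(self) -> None:
        # Prefer the async hook when a subclass overrides it...
        if type(self)._acleanup is not Resource._acleanup:
            await self._acleanup()
        else:
            # ...otherwise run the sync hook in a worker thread so slow
            # teardown never blocks the event loop.
            await asyncio.to_thread(self._cleanup)


class SyncOnly(Resource):
    """Subclass that only implements the synchronous hook."""

    def __init__(self) -> None:
        self.closed = False

    def _cleanup(self) -> None:
        self.closed = True
```

Dispatching on whether the subclass overrode _acleanup keeps one public aclose() entry point while letting subclasses implement whichever hook fits their client.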
More package-specific docs
Development
Run tests with the project interpreter:
PYTHONPATH=src python -m pytest -q