
Core infrastructure for the Boti ecosystem


boti

boti stands for Base Object Transformation Interface.

It is a Python library for building reliable, reusable transformation-oriented software: scripts, services, data pipelines, batch jobs, notebook helpers, and internal tooling that all need the same operational foundations.

At its core, boti is about giving transformation code a consistent runtime model:

  • how resources are opened and closed
  • how file access is constrained and validated
  • how projects discover their root and runtime configuration
  • how logs are emitted in a predictable way

The repository also contains the companion package boti-data, which extends that foundation with SQL, parquet, schema, gateway, and distributed data capabilities.

What problem boti solves

A lot of data and automation code starts small and quickly becomes operationally messy:

  • ad hoc setup and teardown logic
  • duplicated path and file handling
  • environment loading spread across scripts and notebooks
  • inconsistent logging and diagnostics
  • brittle assumptions about where code is running from

That usually leads to code that works in one notebook or on one machine but becomes fragile when reused in pipelines, packaged services, shared libraries, or scheduled jobs.

boti gives those projects a small set of opinionated runtime primitives so the same code can move more cleanly between local development, automation, and production workflows.

Why boti is useful

boti is useful when you want transformation code to behave like a real software component instead of a collection of one-off scripts.

It helps by:

  • standardising resource lifecycle with ManagedResource
  • making constrained file access explicit with SecureResource
  • centralising project-root and environment discovery with ProjectService
  • giving the codebase a shared logging model with Logger

This is especially valuable when multiple teams or notebooks interact with the same codebase, because it reduces hidden assumptions and makes behaviour more predictable.

What boti-data adds

boti-data is the data layer for the Boti ecosystem.

Where boti solves the runtime and application-structure problems, boti-data solves the data access and data movement problems that appear once teams need to work across databases, parquet files, schemas, and distributed workloads.

It provides:

  • SQL database resources and session management
  • SQLAlchemy model reflection and model registries
  • connection catalogues for named data sources
  • gateway-style loading APIs
  • parquet resources and readers
  • schema normalisation, validation, and field mapping
  • filter expressions and join helpers
  • partitioned and distributed loading workflows

In practice, it helps teams replace repetitive, hand-rolled access code with a consistent interface for loading, validating, shaping, and moving data.

Where boti-data can make a big difference

boti-data is useful anywhere teams need to bridge operational systems and analytical workflows without rewriting the same infrastructure over and over.

It can be especially impactful in domains such as:

  • analytics engineering: consistent loading from source systems into analysis-ready frames
  • business intelligence: reusable connection catalogues, filters, and schema handling across reports
  • operations and supply chain: joining transactional data from multiple systems with safer loading patterns
  • finance and risk: explicit schemas, reproducible transformations, and controlled access to structured data
  • customer, product, and growth analytics: repeatable extraction and normalisation across many upstream tables
  • ML and feature pipelines: partitioned loads, parquet workflows, and predictable resource management
  • research and notebook-heavy teams: moving from exploratory code to reusable library code without losing speed

The value is largest when data work sits in the gap between raw infrastructure and business logic: not just querying tables, but building maintainable, reusable data interfaces.

Packages

Core package

pip install boti

Core imports:

from boti import Logger, ManagedResource, ProjectService, SecureResource
from boti.core import is_secure_path

You can also import from boti.core directly:

from boti.core import Logger, ManagedResource, ProjectService, SecureResource

Core + data package

pip install "boti[data]"

or:

pip install boti-data

Data imports live under the separate top-level package:

from boti_data import DataGateway, DataHelper, SqlDatabaseConfig, SqlDatabaseResource

Quick start

Managed resource

from boti import ManagedResource


class MyResource(ManagedResource):
    def _cleanup(self) -> None:
        print("cleaning up")


with MyResource() as resource:
    print(resource.closed)  # False

Filesystem configuration

FilesystemConfig provides a typed way to describe where a resource should read and write data. It uses fsspec underneath, so boti can work with the local filesystem, S3-compatible object storage, and any other backend supported by your installed fsspec drivers.

Local files

from boti.core.filesystem import FilesystemConfig, create_filesystem

config = FilesystemConfig(
    fs_type="file",
    fs_path="/srv/boti/data",
)

fs = create_filesystem(config)
with fs.open("/srv/boti/data/example.txt", "w") as handle:
    handle.write("hello")

S3 server connections

Use this pattern when connecting to AWS S3 or to an S3-compatible server such as MinIO, Ceph, or another internal object-storage endpoint.

from boti.core.filesystem import FilesystemConfig, FilesystemAdapter

config = FilesystemConfig(
    fs_type="s3",
    fs_path="analytics-bucket/raw/events",
    fs_key="ACCESS_KEY",
    fs_secret="SECRET_KEY",
    fs_endpoint="https://minio.internal.example",
    fs_region="eu-west-1",
)

adapter = FilesystemAdapter(config)
fs = adapter.get_filesystem()

with fs.open("analytics-bucket/raw/events/2026-04-15.json", "rb") as handle:
    payload = handle.read()

fs_endpoint points at the S3 server, while fs_path identifies the bucket and prefix you want to work with.
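
To make the bucket-and-prefix convention concrete, here is a minimal stdlib-only sketch of how a value such as fs_path could be split. The helper name split_bucket_path is hypothetical and is not part of boti; it only illustrates the "bucket/prefix" layout described above.

```python
def split_bucket_path(fs_path: str) -> tuple[str, str]:
    """Split 'bucket/some/prefix' into ('bucket', 'some/prefix')."""
    # Drop leading and trailing slashes so "/bucket/prefix/" is handled too.
    bucket, _, prefix = fs_path.strip("/").partition("/")
    if not bucket:
        raise ValueError("fs_path must start with a bucket name")
    return bucket, prefix


bucket, prefix = split_bucket_path("analytics-bucket/raw/events")
print(bucket)  # analytics-bucket
print(prefix)  # raw/events
```

The endpoint never appears in this value: it identifies the server, while the path identifies a location on it.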

Other supported filesystems

Any backend recognised by the installed fsspec stack can be used through fs_type. Common examples include:

  • memory for tests and ephemeral workflows
  • gcs for Google Cloud Storage
  • az or abfs for Azure storage
  • ftp, sftp, or http where the relevant driver is installed

from boti.core.filesystem import FilesystemConfig

memory_config = FilesystemConfig(fs_type="memory", fs_path="scratch")
gcs_config = FilesystemConfig(fs_type="gcs", fs_path="my-bucket/datasets")
azure_config = FilesystemConfig(fs_type="az", fs_path="container/path")

Project service

from boti import ProjectService

project_root = ProjectService.detect_project_root()
env_file = ProjectService.setup_environment(project_root)
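
The README does not describe how the root is detected. A common approach, sketched here with the standard library only, is to walk upward from a starting directory until a marker file is found. The function find_project_root and the marker names are assumptions for illustration and may not match what ProjectService actually does.

```python
from pathlib import Path
from typing import Optional

# Hypothetical marker names; boti's real detection rules may differ.
ROOT_MARKERS = ("pyproject.toml", ".git")


def find_project_root(start: Path, markers=ROOT_MARKERS) -> Optional[Path]:
    """Return the nearest directory at or above `start` that holds a marker."""
    current = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    return None


print(find_project_root(Path.cwd()))
```

Because the search starts from a directory rather than from the location of a script, the same call behaves consistently in notebooks, scheduled jobs, and packaged services.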

Secure file access

SecureResource wraps file operations in a sandbox. By default it allows paths under the detected project root and the system temporary directory, and you can add extra allowlisted paths explicitly.

from pathlib import Path

from boti import SecureResource
from boti.core.models import ResourceConfig

config = ResourceConfig(project_root=Path.cwd())

with SecureResource(config=config) as resource:
    contents = resource.read_text_secure("README.md")

Allow an additional trusted directory

from pathlib import Path

from boti import SecureResource
from boti.core.models import ResourceConfig

config = ResourceConfig(
    project_root=Path("/workspace/project"),
    extra_allowed_paths=[Path("/srv/shared/reference-data")],
)

with SecureResource(config=config) as resource:
    reference = resource.read_text_secure("/srv/shared/reference-data/lookup.csv")

Block unsafe paths

from pathlib import Path

from boti import SecureResource
from boti.core.models import ResourceConfig

config = ResourceConfig(project_root=Path("/workspace/project"))

with SecureResource(config=config) as resource:
    try:
        resource.read_text_secure("/etc/passwd")
    except PermissionError:
        print("outside the configured sandbox roots")
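
The allowlist behaviour shown in these examples can be approximated with a containment check on resolved paths. The sketch below uses only pathlib (Python 3.9+ for is_relative_to); the helper is_inside_roots is hypothetical and is not boti's is_secure_path or SecureResource implementation.

```python
import tempfile
from pathlib import Path


def is_inside_roots(candidate: str, roots: list[Path]) -> bool:
    """Return True if `candidate` resolves to a location under any root."""
    # resolve() collapses ".." segments and follows symlinks, so a path such
    # as "<root>/../elsewhere" is judged by where it really points.
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in roots)


# Mirror the defaults described above: a project root plus the temp directory.
roots = [Path.cwd(), Path(tempfile.gettempdir())]
print(is_inside_roots("README.md", roots))  # True: relative paths resolve against the cwd
print(is_inside_roots(str(Path.cwd() / ".." / "outside.txt"), roots))
```

Resolving before comparing is the important step: a plain string-prefix test would accept paths that escape the root through ".." or a symlink.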

Logger

Logger provides a thread-safe, non-blocking logging layer with secure file handling and sensitive-data redaction.

Quick logger

from pathlib import Path

from boti import Logger

logger = Logger.default_logger(
    logger_name="daily_job",
    log_file="daily_job",
    base_dir=Path("/workspace/project"),
)

logger.info("starting extraction")
logger.warning("retrying after transient error")

Explicit logger configuration

from pathlib import Path

from boti.core.logger import Logger
from boti.core.models import LoggerConfig

config = LoggerConfig(
    log_dir=Path("/workspace/project/logs"),
    logger_name="etl.pipeline",
    log_file="etl_pipeline",
    verbose=True,
)

logger = Logger(config)
logger.set_level(Logger.INFO)
logger.info("rows loaded=%s", 1200)
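
For readers unfamiliar with log redaction, the following sketch shows one way it can be done with the standard logging module. It is not boti's Logger: the RedactingFilter class and the SECRET_PATTERN regular expression are assumptions chosen for illustration, and the sketch covers redaction only, not the non-blocking or file-handling behaviour.

```python
import logging
import re

# Hypothetical pattern: masks whatever follows "password=" or "token=".
SECRET_PATTERN = re.compile(r"(?i)\b(password|token)=\S+")


class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so %-style arguments are covered too.
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub(r"\1=***", message)
        record.args = ()  # arguments are already merged into msg
        return True  # never drop the record, only rewrite it


logger = logging.getLogger("redaction_sketch")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("connecting with token=%s", "abc123")  # emits: connecting with token=***
```

Attaching the filter to the handler means every record written through that handler is rewritten before formatting, regardless of which module logged it.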

Subclassing ManagedResource

ManagedResource supports both synchronous and asynchronous cleanup patterns, so custom resources can expose the same lifecycle contract whether they wrap filesystems, clients, sockets, or other runtime state.

Synchronous resource

from boti import ManagedResource


class FilesystemResource(ManagedResource):
    def write_text(self, path: str, content: str) -> None:
        fs = self.require_fs()
        with fs.open(path, "w", encoding="utf-8") as handle:
            handle.write(content)

    def read_text(self, path: str) -> str:
        fs = self.require_fs()
        with fs.open(path, "r", encoding="utf-8") as handle:
            return handle.read()

    def _cleanup(self) -> None:
        if self._owns_fs and self.fs is not None:
            self.fs = None

import fsspec

resource = FilesystemResource(fs_factory=lambda: fsspec.filesystem("memory"))

with resource:
    resource.write_text("memory://example.txt", "hello from fsspec")
    print(resource.read_text("memory://example.txt"))

Asynchronous resource

import asyncio

from boti import ManagedResource


class AsyncClientResource(ManagedResource):
    def __init__(self, client) -> None:
        super().__init__()
        self.client = client

    async def _acleanup(self) -> None:
        await self.client.aclose()


async def main(client) -> None:
    async with AsyncClientResource(client) as resource:
        await asyncio.sleep(0)

If a subclass only implements _cleanup(), await resource.aclose() will fall back to running the synchronous cleanup safely.
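
One way such a fallback can be structured is sketched below with the standard library only (asyncio.to_thread requires Python 3.9+). SketchResource is an illustrative stand-in, not boti's ManagedResource, and the real class may implement the fallback differently.

```python
import asyncio


class SketchResource:
    """Illustrative stand-in; not boti's ManagedResource."""

    def __init__(self) -> None:
        self.closed = False

    def _cleanup(self) -> None:
        """Synchronous hook; subclasses override it."""

    async def _acleanup(self) -> None:
        # Default async hook: run the synchronous cleanup in a worker thread
        # so a slow _cleanup() does not block the event loop.
        await asyncio.to_thread(self._cleanup)

    async def aclose(self) -> None:
        if self.closed:  # closing twice is a no-op
            return
        await self._acleanup()
        self.closed = True


class SyncOnly(SketchResource):
    def _cleanup(self) -> None:
        print("sync cleanup ran")


resource = SyncOnly()
asyncio.run(resource.aclose())
print(resource.closed)  # True
```

A subclass that needs real asynchronous teardown overrides _acleanup() instead, and callers use the same aclose() entry point in both cases.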

More package-specific docs

Development

Run tests with the project interpreter:

PYTHONPATH=src python -m pytest -q

