stitch-proj

Some tools for building a Translator BigKG.
This software project is experimental and under active development.


Installation

From PyPI

pip install stitch-proj

For development

pip install "stitch-proj[dev]"

From source

git clone https://github.com/Translator-CATRAX/stitch-proj.git
cd stitch-proj
pip install -e ".[dev]"

Overview

There are two primary intended users of stitch-proj:

  1. Ingester: a developer who wants to ingest the Babel concept identifier normalization database into a local SQLite database.

  2. Querier: a developer building an application (e.g., a BigKG build system) who wants to programmatically query a local Babel SQLite database.


Package Structure

This project uses a src/ layout:

src/
  stitch_proj/
    ingest_babel.py
    local_babel.py
    row_counts.py
    stitchutils.py

Import the package as:

import stitch_proj

Tools

  • stitch_proj.ingest_babel: downloads and ingests the Babel database into a local SQLite database.

  • stitch_proj.local_babel: provides functions for querying a local Babel SQLite database.

  • stitch_proj.row_counts: prints table row counts for a local Babel SQLite database.
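
To make the idea of "table row counts" concrete, here is a minimal sketch that uses only the standard-library sqlite3 module. It is not the implementation of stitch_proj.row_counts, and the function name table_row_counts is hypothetical; it only illustrates the kind of report such a tool produces.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file.

    Hypothetical helper for illustration; not part of stitch_proj.
    """
    # Open read-only through a file: URI so that a mistyped path raises an
    # error instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Table names cannot be bound as SQL parameters, so quote them
            # as identifiers (doubling any embedded double quotes).
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling table_row_counts("db/babel.sqlite") would return one entry per table. On a database of the size described in this README, COUNT(*) may take a long time for each table.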


Running the Ingest

After installation, the console script is available:

ingest-babel --help

Or invoke via module:

python -m stitch_proj.ingest_babel --help

A full ingest requires:

  • CPython 3.12
  • At least 32 GiB RAM
  • ~600 GiB temporary disk space
  • ~200 GiB for the final SQLite database

A full ingest may take 30–40 hours depending on hardware.
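
Before committing to a run of that length, it may be worth checking the machine against the requirements listed above. The sketch below is one possible pre-flight check, not something shipped with stitch-proj; the function names are hypothetical. It assumes the temporary files and the final database share one filesystem, so the two disk figures are added together, and it reads "CPython 3.12" strictly as version 3.12. RAM is not checked because the standard library has no portable call for it.

```python
import shutil
import sys

GIB = 1024 ** 3

# Figures copied from the requirements list in this README.
TEMP_GIB = 600
FINAL_GIB = 200


def python_ok(version_info=sys.version_info,
              implementation=sys.implementation.name) -> bool:
    """True if the interpreter is CPython 3.12 (strict reading of the README)."""
    return implementation == "cpython" and tuple(version_info[:2]) == (3, 12)


def disk_ok(path: str = ".", needed_gib: float = TEMP_GIB + FINAL_GIB) -> bool:
    """True if the filesystem holding `path` has at least `needed_gib` GiB free.

    Assumption: temporary files and the final database live on the same
    filesystem, hence the default of 600 + 200 GiB.
    """
    free_gib = shutil.disk_usage(path).free / GIB
    return free_gib >= needed_gib


def preflight(path: str = ".") -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not python_ok():
        problems.append("interpreter is not CPython 3.12")
    if not disk_ok(path):
        problems.append(f"less than {TEMP_GIB + FINAL_GIB} GiB free at {path!r}")
    return problems
```

If the temporary directory and the output database are on different volumes, disk_ok can be called once per volume with the matching figure.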


Downloading a Pre-Built Babel Database

A pre-built SQLite file is available from S3:

https://rtx-kg2-public.s3.us-west-2.amazonaws.com/babel-20250331-p1.sqlite

Save it to a local path such as:

db/babel.sqlite
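
Any HTTP client can fetch the file. As one possibility, the sketch below streams it to disk with the standard library so the whole file is never held in memory. The download helper is hypothetical and not part of stitch-proj; it does not resume interrupted transfers or verify checksums. The file is presumably large (the ingest section cites ~200 GiB for a final database), so check free disk space first.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from this README.
BABEL_URL = (
    "https://rtx-kg2-public.s3.us-west-2.amazonaws.com/"
    "babel-20250331-p1.sqlite"
)


def download(url: str, dest: str, chunk_size: int = 16 * 1024 * 1024) -> Path:
    """Stream `url` to `dest` in chunks and return the destination path.

    Hypothetical helper for illustration only.
    """
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a ".part" file first so an interrupted download never leaves
    # a truncated file under the final name.
    part_path = dest_path.with_name(dest_path.name + ".part")
    with urllib.request.urlopen(url) as response, open(part_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    part_path.replace(dest_path)
    return dest_path


# Example call (not executed here because of the file size):
# download(BABEL_URL, "db/babel.sqlite")
```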

You can then use stitch_proj.local_babel to query it.


Running Tests

Ensure a valid babel.sqlite file exists locally, then run:

pytest -v

Some tests require internet connectivity.
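
A quick way to confirm that a file is at least plausibly a SQLite database before starting the test suite is to check its 16-byte header, which the SQLite file format defines as the string "SQLite format 3" followed by a NUL byte. The sketch below does only that; the function name is hypothetical, the path db/babel.sqlite in the usage note is an assumption based on the download section, and passing the check does not prove the file is a complete Babel database.

```python
from pathlib import Path

# Every SQLite 3 database file begins with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path: str) -> bool:
    """True if `path` is a regular file that starts with the SQLite 3 header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

For example, looks_like_sqlite("db/babel.sqlite") returning False suggests the download was interrupted or the path is wrong.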


Systems Tested

ingest_babel.py has been tested on:

  • Ubuntu 24.04 (x86_64, Intel Xeon)
  • Ubuntu 24.04 (ARM64, AWS Graviton3)
  • macOS 14 (Apple Silicon)

The package is pure Python and platform-independent, but large ingests require substantial memory and storage.


Development Workflow

First install the development dependencies:

pip install -e ".[dev]"

Then run the tests, linter, and type checker:

pytest
ruff check .
mypy src

License

MIT License. See LICENSE.


Citation

Please see the Babel project's CITATION.cff:

https://github.com/TranslatorSRI/Babel/blob/master/CITATION.cff
