QALITA Platform core library for common functions used in packs
Project description
QALITA Core
QALITA Core is a lightweight helper library used by QALITA packs to load data from multiple sources, materialize it to Parquet in deterministic chunks, and share common utilities (sanitization and aggregation helpers).
Key features
- Unified data access via a simple `DataSource` abstraction and factory
- File, database, and object storage loaders with streaming to Parquet
- Deterministic, size-bounded Parquet chunking with stable filenames
- Safe Parquet writing for pandas DataFrames (automatic sanitization)
- Shared aggregators for completeness, outliers, duplicates, and timeliness
- Minimal pack runtime with JSON config loading and simple asset persistence
Supported sources
- Files: CSV (`.csv`), Excel (`.xlsx`), JSON, Parquet (pass-through)
- Databases: PostgreSQL, MySQL, Oracle, MS SQL Server, SQLite
- Object storage: Amazon S3, Google Cloud Storage, Azure Blob (via `abfs`), HDFS
Notes:
- The Folder and MongoDB classes exist as placeholders; MongoDB support is not yet implemented.
- SQLite is supported through the generic `DatabaseSource` when selected via `type: "sqlite"`.
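As an illustration of the SQLite case, a minimal `source_conf.json` might look like the sketch below. The exact keys (`name`, `type`, `config`, `database`, `table`) are an assumption for illustration; consult your pack's reference configurations for the authoritative schema.

```python
import json

# Hypothetical source_conf.json for a SQLite source (keys are assumed,
# not the authoritative schema).
source_conf = {
    "name": "my_sqlite_source",
    "type": "sqlite",
    "config": {"database": "./data/app.db", "table": "items"},
}

# Persist it where the pack expects to find it.
with open("source_conf.json", "w", encoding="utf-8") as f:
    json.dump(source_conf, f, indent=2)
```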
Installation
Prerequisites: Python 3.10–3.12 and uv.
Install dependencies and set up your environment:

```shell
pip install uv
uv sync
```

Open a uv shell when developing:

```shell
uv shell
```
Quickstart
Use within a Pack
A `Pack` loads four JSON configuration files by default (paths are overridable) and provides `load_data()` for source or target triggers.
```python
from qalita_core.pack import Pack

pack = Pack(configs={
    "pack_conf": "./pack_conf.json",
    "source_conf": "./source_conf.json",
    "target_conf": "./target_conf.json",
    "agent_file": "~/.qalita/.worker",
})

# Ensure chunking/output are set (can be in pack_conf["job"] too)
pack.pack_config.setdefault("job", {})
pack.pack_config["job"]["parquet_output_dir"] = "./parquet"
pack.pack_config["job"]["chunk_rows"] = 100_000

# Load source
source_paths = pack.load_data("source")

# Load target (optional)
target_paths = pack.load_data("target")

# Persist custom metrics/recommendations/schemas to JSON files
pack.metrics.data.append({
    "key": "score",
    "value": "0.95",
    "scope": {"perimeter": "dataset", "value": "my_dataset"},
})
pack.metrics.save()          # writes metrics.json
pack.recommendations.save()  # writes recommendations.json
pack.schemas.save()          # writes schemas.json
```
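Conceptually, `pack.metrics.save()` dumps the accumulated metric entries to a plain `metrics.json` file. A rough stdlib equivalent is sketched below; the exact serialization used by qalita_core (indentation, ordering) is an assumption here.

```python
import json

# Metric entries in the shape shown in the quickstart above.
metrics = [
    {
        "key": "score",
        "value": "0.95",
        "scope": {"perimeter": "dataset", "value": "my_dataset"},
    },
]

# Roughly what metrics.save() does: write the list as JSON.
with open("metrics.json", "w", encoding="utf-8") as f:
    json.dump(metrics, f, indent=2)
```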
Parquet chunking and filenames
- CSV/JSON/Excel files are streamed with `chunksize` into multiple Parquet files.
- Databases are read with chunked SQL via SQLAlchemy / `pandas.read_sql`.
- Filenames use a stable pattern: `<source>_<object>_part_<k>.parquet`, where:
  - `<source>` is a slug of the source type (e.g. `file`, `sqlite`, `postgresql`).
  - `<object>` is a slug of the table name, query label, or file stem.
- Examples: `file_testdata_part_1.parquet`, `sqlite_items_part_3.parquet`, `sqlite_query_part_2.parquet`.
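The naming scheme above can be sketched as a small helper. This is a re-implementation for illustration, not the library's actual code; in particular the exact slug rules are an assumption.

```python
import re

def slugify(text: str) -> str:
    """Lower-case and collapse non-alphanumeric runs to underscores
    (an assumption about the exact slug rules)."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

def chunk_filename(source_type: str, obj: str, part: int) -> str:
    """Build <source>_<object>_part_<k>.parquet as described above."""
    return f"{slugify(source_type)}_{slugify(obj)}_part_{part}.parquet"

print(chunk_filename("file", "TestData", 1))  # file_testdata_part_1.parquet
```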
Configure output and size via pack_config:
- `parquet_output_dir` (default: `./parquet`)
- `chunk_rows` (default: `100000`)
- Optional `job.source.skiprows`, applied to CSV/Excel
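As a worked example of what `chunk_rows` implies: with the default of 100000 rows per chunk, a 250000-row table yields three part files. A small arithmetic sketch (not library code):

```python
import math

def expected_parts(total_rows: int, chunk_rows: int = 100_000) -> int:
    """Number of Parquet part files for a given row count,
    assuming size-bounded chunking as described above."""
    return max(1, math.ceil(total_rows / chunk_rows))

print(expected_parts(250_000))  # 3 part files
print(expected_parts(50_000))   # 1 part file
```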
Safe Parquet writing for pandas
On import, QALITA Core installs a small monkeypatch so that `DataFrame.to_parquet`:
- Ensures column names are strings
- Decodes bytes to UTF-8 strings when present
- Normalizes mixed-type object columns and categoricals
- Defaults to `engine="pyarrow"`
You can also call the sanitizer explicitly:

```python
from qalita_core import sanitize_dataframe_for_parquet

clean_df = sanitize_dataframe_for_parquet(df)
```
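The first two sanitization rules can be illustrated on a plain dict of columns. This is a stdlib-only sketch of the behavior described above, not the real implementation, which operates on pandas DataFrames and also handles mixed-type and categorical columns.

```python
def sanitize_columns(columns: dict) -> dict:
    """Approximate two of the rules above on a dict mapping
    column name -> list of values (illustrative only)."""
    clean = {}
    for name, values in columns.items():
        key = str(name)  # column names become strings
        clean[key] = [
            v.decode("utf-8") if isinstance(v, bytes) else v  # bytes -> str
            for v in values
        ]
    return clean

raw = {0: [b"abc", "def"], "n": [1, 2]}
print(sanitize_columns(raw))  # {'0': ['abc', 'def'], 'n': [1, 2]}
```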
Aggregation helpers (for packs)
Helpers centralize common result/metric aggregation logic:
```python
from qalita_core import (
    detect_chunked_from_items,
    normalize_and_dedupe_recommendations,
    CompletenessAggregator,
    OutlierAggregator,
    DuplicateAggregator,
    TimelinessAggregator,
)
```
- `CompletenessAggregator`: column/dataset completeness and schema extraction
- `OutlierAggregator`: per-column and dataset outlier/normality metrics
- `DuplicateAggregator`: duplicate counts and dataset-level score using key columns
- `TimelinessAggregator`: dates/years coverage and recency scoring
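As a rough illustration of the kind of metric these aggregators produce, column completeness is the share of non-null values. The function below is a hypothetical re-implementation for explanation; the aggregators' actual APIs and exact formulas may differ.

```python
def column_completeness(values: list) -> float:
    """Fraction of non-null entries in a column
    (illustrative; not the CompletenessAggregator API)."""
    if not values:
        return 0.0
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)

print(column_completeness(["a", None, "b", "c"]))  # 0.75
```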
Development
- Tests: `uv run pytest`
- Formatting: `uv run black .`
- Linting: `uv run flake8` and `uv run pylint <module>`
- Editable install while debugging: `uv sync` followed by `uv pip install -e .`
Documentation
Additional material can be found in the online documentation: https://doc.qalita.io/.
Project details
File details
Details for the file qalita_core-1.4.0.tar.gz.
File metadata
- Download URL: qalita_core-1.4.0.tar.gz
- Upload date:
- Size: 1.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7d2b629f259efb577b6430e8c0db378434f38efb544a54fed3ccbf2309f8ab34` |
| MD5 | `0a78f00a3e8dbd4d8e23eda819c6c8f3` |
| BLAKE2b-256 | `3aef39fcd8d1ebf2c36a2c5154ab1900fc8ff6fca97927adb06d5436397f7a88` |
Provenance
The following attestation bundles were made for qalita_core-1.4.0.tar.gz:
Publisher: `ci.yml` on `qalita-io/qalita-core`
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: `qalita_core-1.4.0.tar.gz`
  - Subject digest: `7d2b629f259efb577b6430e8c0db378434f38efb544a54fed3ccbf2309f8ab34`
- Sigstore transparency entry: 821976553
- Sigstore integration time:
- Permalink: `qalita-io/qalita-core@08ab167ab3098c725995c7953cbd1eec020670cb`
- Branch / Tag: `refs/tags/1.4.0`
- Owner: https://github.com/qalita-io
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `ci.yml@08ab167ab3098c725995c7953cbd1eec020670cb`
- Trigger Event: push
File details
Details for the file qalita_core-1.4.0-py3-none-any.whl.
File metadata
- Download URL: qalita_core-1.4.0-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `883117623cb439b1492b4750c424fa9838f430b99ac5b5d3ead34a8df5e9001f` |
| MD5 | `8d2f6c0161f9ce034a18cc854d2d13f5` |
| BLAKE2b-256 | `56326a845c49c0d63bbf52a036509ba379bd88a9e65e41862a2418a4c59056e1` |
Provenance
The following attestation bundles were made for qalita_core-1.4.0-py3-none-any.whl:
Publisher: `ci.yml` on `qalita-io/qalita-core`
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: `qalita_core-1.4.0-py3-none-any.whl`
  - Subject digest: `883117623cb439b1492b4750c424fa9838f430b99ac5b5d3ead34a8df5e9001f`
- Sigstore transparency entry: 821976609
- Sigstore integration time:
- Permalink: `qalita-io/qalita-core@08ab167ab3098c725995c7953cbd1eec020670cb`
- Branch / Tag: `refs/tags/1.4.0`
- Owner: https://github.com/qalita-io
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `ci.yml@08ab167ab3098c725995c7953cbd1eec020670cb`
- Trigger Event: push