A library for managing data contracts and quality control in behavioral datasets.

contraqctor


⚠️ Caution:
This repository is currently under active development and is subject to frequent changes. Features and APIs may evolve without prior notice.

Installing and Upgrading

If you have cloned the repository, install the package by running the following command from the repository root:

pip install .

Otherwise, install the latest release from PyPI:

pip install contraqctor
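
To confirm which version ended up in the active environment, a quick check with the standard library can help. This is a minimal sketch and not part of contraqctor; installed_version is a hypothetical helper name introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("contraqctor")
    print(found if found is not None else "contraqctor is not installed")
```

Because the API may change between releases, recording this version alongside any analysis output is one way to keep results traceable.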

Getting started and API usage

The library provides two main features: data contracts for standardized data loading, and quality control tools for data validation.

Creating and Using Data Contracts

Data contracts provide a standard way to access and load data from various sources. Here's a simple example:

from pathlib import Path
from contraqctor.contract import Dataset, DataStreamCollection
from contraqctor.contract.csv import Csv
from contraqctor.contract.text import Text

# Define the dataset structure
dataset_root = Path("path/to/dataset")
my_dataset = Dataset(
    name="my_dataset",
    version="1.0.0",
    description="Example dataset",
    data_streams=[
        DataStreamCollection(
            name="Behavior",
            description="Behavior data",
            data_streams=[
                Csv(
                    "Position",
                    description="Animal position data",
                    reader_params=Csv.make_params(
                        path=dataset_root / "behavior/position.csv",
                    ),
                ),
                Text(
                    name="Log",
                    description="Session log file",
                    reader_params=Text.make_params(
                        path=dataset_root / "behavior/session.log",
                    ),
                ),
            ],
        ),
    ],
)

# Load a specific stream
position_data = my_dataset["Behavior"]["Position"].load().data
print(f"Position data shape: {position_data.shape}")

# Load all streams and handle errors
my_dataset.load_all()

Quality Control of Primary Data

The QC module helps validate your data to ensure it meets specific requirements:

import contraqctor.qc as qc

# Using the dataset created above
data_stream = my_dataset["Behavior"]["Position"]

# Create and run test suites
runner = qc.Runner()

# Add test suites for different data types
runner.add_suite(qc.csv.CsvTestSuite(data_stream))

# Or create your own custom test suite
class MyCustomTestSuite(qc.Suite):
    def __init__(self, data_stream):
        self.data_stream = data_stream
        
    def test_has_expected_columns(self):
        """Check if data has required columns."""
        expected_cols = {"timestamp", "x", "y", "speed"}
        if not expected_cols.issubset(self.data_stream.data.columns):
            missing = expected_cols - set(self.data_stream.data.columns)
            return self.fail_test(None, f"Missing columns: {missing}")
        return self.pass_test(None, "All required columns present")

runner.add_suite(MyCustomTestSuite(data_stream))

# Run all tests and display results
results = runner.run_all_with_progress()
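
The column check inside the custom suite is a plain set difference, so it can be tried on its own before wiring it into a suite. The following is a standalone sketch that does not use contraqctor; missing_columns is a hypothetical helper name.

```python
from typing import Iterable, List


def missing_columns(columns: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the expected column names absent from `columns`, sorted."""
    return sorted(set(expected) - set(columns))


if __name__ == "__main__":
    expected = {"timestamp", "x", "y", "speed"}
    print(missing_columns(["timestamp", "x", "y"], expected))  # ['speed']
    print(missing_columns(["timestamp", "x", "y", "speed"], expected))  # []
```

An empty result corresponds to the passing branch of test_has_expected_columns, and a non-empty one to the failing branch.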

For more detailed examples, please check the Examples folder.


Contributors

Contributions to this repository are welcome! However, please ensure that your code adheres to the recommended DevOps practices below:

Linting

We use ruff as our primary linting tool.

Testing

Please add tests whenever you add a new feature. To run the existing tests, execute uv run pytest from the root of the repository.

Lock files

We use uv to manage our lock files and therefore encourage everyone to use uv as a package manager as well.
