contraqctor

A library for managing data contracts and quality control in behavioral datasets.

⚠️ Caution:
This repository is currently under active development and is subject to frequent changes. Features and APIs may evolve without prior notice.

Installing and Upgrading

If you have cloned the repository, install the package by running the following command from the repository root:

pip install .

Otherwise, install the latest release from PyPI with pip:

pip install contraqctor

To upgrade an existing installation to the latest release:

pip install --upgrade contraqctor
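Because the APIs change frequently (see the caution above), it can help to check which version of contraqctor is installed before following the examples. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and is not part of contraqctor.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "contraqctor") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("contraqctor is not installed; run: pip install contraqctor")
    else:
        print(f"contraqctor {found} is installed")
```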

Getting started and API usage

The library provides two main features: data contracts, which standardize how data is loaded, and quality control (QC) tools, which validate the loaded data.

Creating and Using Data Contracts

Data contracts provide a standard way to access and load data from various sources. Here's a simple example:

from pathlib import Path
from contraqctor.contract import Dataset, DataStreamCollection
from contraqctor.contract.csv import Csv
from contraqctor.contract.text import Text

# Define the dataset structure
dataset_root = Path("path/to/dataset")
my_dataset = Dataset(
    name="my_dataset",
    version="1.0.0",
    description="Example dataset",
    data_streams=[
        DataStreamCollection(
            name="Behavior",
            description="Behavior data",
            data_streams=[
                Csv(
                    name="Position",
                    description="Animal position data",
                    reader_params=Csv.make_params(
                        path=dataset_root / "behavior/position.csv",
                    ),
                ),
                Text(
                    name="Log",
                    description="Session log file",
                    reader_params=Text.make_params(
                        path=dataset_root / "behavior/session.log",
                    ),
                ),
            ],
        ),
    ],
)

# Load a specific stream
position_data = my_dataset["Behavior"]["Position"].load().data
print(f"Position data shape: {position_data.shape}")

# Load every stream in the dataset at once
my_dataset.load_all()
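The idea behind the example above (streams declared up front with their reader parameters, addressed by name, and read only when load() is called) can be sketched in plain Python. Stream, Collection, missing_log, and the error-collecting load_all below are hypothetical stand-ins written for illustration; they are not contraqctor's classes, and contraqctor's own load_all may report errors differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple, Union


@dataclass
class Stream:
    """Hypothetical stand-in for a data stream: a name plus a deferred reader."""

    name: str
    reader: Callable[[], Any]
    _data: Any = field(default=None, init=False, repr=False)
    _loaded: bool = field(default=False, init=False, repr=False)

    def load(self) -> "Stream":
        # Nothing is read until load() is called; the result is stored on the stream.
        self._data = self.reader()
        self._loaded = True
        return self  # returning self allows stream.load().data chaining

    @property
    def data(self) -> Any:
        if not self._loaded:
            raise RuntimeError(f"Stream {self.name!r} has not been loaded")
        return self._data


@dataclass
class Collection:
    """Hypothetical stand-in for a named group of streams and sub-collections."""

    name: str
    items: List[Union["Stream", "Collection"]]

    def __getitem__(self, key: str) -> Union[Stream, "Collection"]:
        # Look children up by name, mirroring my_dataset["Behavior"]["Position"].
        for item in self.items:
            if item.name == key:
                return item
        raise KeyError(key)

    def load_all(self) -> List[Tuple[str, Exception]]:
        """Load every stream recursively, collecting (name, error) pairs instead of stopping."""
        errors: List[Tuple[str, Exception]] = []
        for item in self.items:
            if isinstance(item, Collection):
                errors.extend(item.load_all())
                continue
            try:
                item.load()
            except Exception as exc:  # record the failure and keep going
                errors.append((item.name, exc))
        return errors


def missing_log() -> str:
    # Hypothetical reader that simulates a missing file.
    raise FileNotFoundError("behavior/session.log")


dataset = Collection("my_dataset", [
    Collection("Behavior", [
        Stream("Position", lambda: [(0.0, 1.5, 2.5)]),
        Stream("Log", missing_log),
    ]),
])
print(dataset["Behavior"]["Position"].load().data)  # [(0.0, 1.5, 2.5)]
print(dataset.load_all())  # one ("Log", FileNotFoundError) pair
```

Collecting failures rather than raising on the first one lets a single bad file be reported without hiding the state of every other stream.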

Quality Control of Primary Data

The QC module validates data against specific requirements. Checks are grouped into test suites, which a runner executes and reports on:

import contraqctor.qc as qc

# Using the dataset created above
data_stream = my_dataset["Behavior"]["Position"]

# Create and run test suites
runner = qc.Runner()

# Add test suites for different data types
runner.add_suite(qc.csv.CsvTestSuite(data_stream))

# Or create your own custom test suite
class MyCustomTestSuite(qc.Suite):
    def __init__(self, data_stream):
        self.data_stream = data_stream

    def test_has_expected_columns(self):
        """Check that the loaded data has all required columns."""
        expected_cols = {"timestamp", "x", "y", "speed"}
        # Assumes the stream was loaded earlier (e.g. via load() above)
        missing = expected_cols - set(self.data_stream.data.columns)
        if missing:
            return self.fail_test(None, f"Missing columns: {missing}")
        return self.pass_test(None, "All required columns present")

runner.add_suite(MyCustomTestSuite(data_stream))

# Run all tests and display results
results = runner.run_all_with_progress()

For more detailed examples, see the examples folder in the repository.


Contributing

Contributions are welcome! Please make sure your code follows the development practices below:

Linting

We use ruff as our primary linting tool.

Testing

Please add tests alongside any new feature. To run the existing test suite, run uv run pytest from the repository root.
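A new test can be a plain function whose name starts with test_, placed in a file named test_*.py so that pytest collects it. The sketch below tests a hypothetical missing_columns helper (not part of contraqctor) that mirrors the column check in the custom suite above; the file path in the first comment is likewise only an example.

```python
# tests/test_columns.py (example path; pytest collects files named test_*.py)
from typing import Iterable, Set

REQUIRED = {"timestamp", "x", "y", "speed"}


def missing_columns(columns: Iterable[str], required: Set[str] = REQUIRED) -> Set[str]:
    """Hypothetical helper: return the required column names absent from `columns`."""
    return set(required) - set(columns)


def test_all_columns_present():
    # Extra columns are allowed; only missing ones are reported.
    assert missing_columns(["timestamp", "x", "y", "speed", "extra"]) == set()


def test_reports_missing_column():
    assert missing_columns(["timestamp", "x", "y"]) == {"speed"}
```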

Lock files

We use uv to manage our lock files and therefore encourage everyone to use uv as a package manager as well.


Download files

Source distribution: contraqctor-0.5.7.tar.gz (55.9 kB)
Built distribution: contraqctor-0.5.7-py3-none-any.whl (69.2 kB, Python 3)

File hashes

contraqctor-0.5.7.tar.gz
  SHA256: 04c7540fb6d7f8c4f20d14e8e0f301cd80cff218254e1692b4bcfd8a13a48a7b
  MD5: 85e05a176df284d0836dea4f16419797
  BLAKE2b-256: 9fdb462e9992cf98faf9b5a7eb8ff9906d7812266df377a0d4790c968610fd40

contraqctor-0.5.7-py3-none-any.whl
  SHA256: 78b40003a79f1bf8071f89af1c970e856ea2ad10d93baffc8e3a5cd078a2cc62
  MD5: 3c0a369536a1de4538dcf47e25b1c862
  BLAKE2b-256: daca7037a0bcbf4431fcce48f6b1033df0c8172e047c35531c0b78438aecb2ff
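A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch; sha256_of and verify are illustrative helper names, and the wheel path assumes the file was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    wheel = Path("contraqctor-0.5.7-py3-none-any.whl")  # assumed download location
    expected = "78b40003a79f1bf8071f89af1c970e856ea2ad10d93baffc8e3a5cd078a2cc62"
    if not wheel.exists():
        print(f"File not found: {wheel}")
    else:
        print("OK" if verify(wheel, expected) else "Hash mismatch")
```

pip can also enforce this at install time through its hash-checking mode, using --hash=sha256:... entries in a requirements file together with --require-hashes.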
