Skip to main content

A library for managing data contracts and quality control in behavioral datasets.

Project description

contraqctor

contraqctor CI PyPI - Version License ruff uv

contraqctor

A library for managing data contracts and quality control in behavioral datasets.

⚠️ Caution:
This repository is currently under active development and is subject to frequent changes. Features and APIs may evolve without prior notice.

Installing and Upgrading

If you choose to clone the repository, you can install the package by running the following command from the root directory of the repository:

pip install .

Otherwise, you can use pip:

pip install contraqctor

Getting started and API usage

The library provides two main functionalities: data contracts for standardized data loading and quality control tools for data validation.

Creating and Using Data Contracts

Data contracts provide a standard way to access and load data from various sources. Here's a simple example:

from pathlib import Path
from contraqctor.contract import Dataset, DataStreamCollection
from contraqctor.contract.csv import Csv
from contraqctor.contract.text import Text

# Define the dataset structure
dataset_root = Path("path/to/dataset")
my_dataset = Dataset(
    name="my_dataset",
    version="1.0.0",
    description="Example dataset",
    data_streams=[
        DataStreamCollection(
            name="Behavior",
            description="Behavior data",
            data_streams=[
                Csv(
                    "Position",
                    description="Animal position data",
                    reader_params=Csv.make_params(
                        path=dataset_root / "behavior/position.csv",
                    ),
                ),
                Text(
                    name="Log",
                    description="Session log file",
                    reader_params=Text.make_params(
                        path=dataset_root / "behavior/session.log",
                    ),
                ),
            ],
        ),
    ],
)

# Load a specific stream
position_data = my_dataset["Behavior"]["Position"].load().data
print(f"Position data shape: {position_data.shape}")

# Load all streams and handle errors
my_dataset.load_all()

Quality Control of Primary Data

The QC module helps validate your data to ensure it meets specific requirements:

import contraqctor.qc as qc

# Using the dataset created above
data_stream = my_dataset["Behavior"]["Position"]

# Create and run test suites
runner = qc.Runner()

# Add test suites for different data types
runner.add_suite(qc.csv.CsvTestSuite(data_stream))

# Or create your own custom test suite
class MyCustomTestSuite(qc.Suite):
    def __init__(self, data_stream):
        self.data_stream = data_stream
        
    def test_has_expected_columns(self):
        """Check if data has required columns."""
        expected_cols = {"timestamp", "x", "y", "speed"}
        if not expected_cols.issubset(self.data_stream.data.columns):
            missing = expected_cols - set(self.data_stream.data.columns)
            return self.fail_test(None, f"Missing columns: {missing}")
        return self.pass_test(None, "All required columns present")

runner.add_suite(MyCustomTestSuite(data_stream))

# Run all tests and display results
results = runner.run_all_with_progress()

For more detailed examples, please check the Examples folder.


Contributors

Contributions to this repository are welcome! However, please ensure that your code adheres to the recommended DevOps practices below:

Linting

We use ruff as our primary linting tool.

Testing

Attempt to add tests when new features are added. To run the currently available tests, run uv run pytest from the root of the repository.

Lock files

We use uv to manage our lock files and therefore encourage everyone to use uv as a package manager as well.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

contraqctor-0.4.9.tar.gz (179.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

contraqctor-0.4.9-py3-none-any.whl (55.8 kB view details)

Uploaded Python 3

File details

Details for the file contraqctor-0.4.9.tar.gz.

File metadata

  • Download URL: contraqctor-0.4.9.tar.gz
  • Upload date:
  • Size: 179.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.18

File hashes

Hashes for contraqctor-0.4.9.tar.gz
Algorithm Hash digest
SHA256 2ab5ba8973ed4f0dc275830601cfa3c760872073e58c11a4433615f090c99d9f
MD5 df5695de166253c964ec9364546791ea
BLAKE2b-256 ba9e457b8bf87aac1b71e311519ee07fb6d643237e5b3f265a688eded522069f

See more details on using hashes here.

File details

Details for the file contraqctor-0.4.9-py3-none-any.whl.

File metadata

File hashes

Hashes for contraqctor-0.4.9-py3-none-any.whl
Algorithm Hash digest
SHA256 56216c57f355d2866f1ea4a70d46320d66ad154fa13fa10a005b543ca1c47b9a
MD5 01f18a2bd1067975d0265045a62d10ef
BLAKE2b-256 25377d24feaec06e6562fa4574018a6fa1d6cdd262bc2654d939d341a37576c2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page