Skip to main content

A library for managing data contracts and quality control in behavioral datasets.

Project description

contraqctor

contraqctor CI PyPI - Version License ruff uv

contraqctor

A library for managing data contracts and quality control in behavioral datasets.

⚠️ Caution:
This repository is currently under active development and is subject to frequent changes. Features and APIs may evolve without prior notice.

Installing and Upgrading

If you choose to clone the repository, you can install the package by running the following command from the root directory of the repository:

pip install .

Otherwise, you can use pip:

pip install contraqctor

Getting started and API usage

The library provides two main functionalities: data contracts for standardized data loading and quality control tools for data validation.

Creating and Using Data Contracts

Data contracts provide a standard way to access and load data from various sources. Here's a simple example:

from pathlib import Path
from contraqctor.contract import Dataset, DataStreamCollection
from contraqctor.contract.csv import Csv
from contraqctor.contract.text import Text

# Define the dataset structure
dataset_root = Path("path/to/dataset")
my_dataset = Dataset(
    name="my_dataset",
    version="1.0.0",
    description="Example dataset",
    data_streams=[
        DataStreamCollection(
            name="Behavior",
            description="Behavior data",
            data_streams=[
                Csv(
                    "Position",
                    description="Animal position data",
                    reader_params=Csv.make_params(
                        path=dataset_root / "behavior/position.csv",
                    ),
                ),
                Text(
                    name="Log",
                    description="Session log file",
                    reader_params=Text.make_params(
                        path=dataset_root / "behavior/session.log",
                    ),
                ),
            ],
        ),
    ],
)

# Load a specific stream
position_data = my_dataset["Behavior"]["Position"].load().data
print(f"Position data shape: {position_data.shape}")

# Load all streams and handle errors
my_dataset.load_all()

Quality Control of Primary Data

The QC module helps validate your data to ensure it meets specific requirements:

import contraqctor.qc as qc

# Using the dataset created above
data_stream = my_dataset["Behavior"]["Position"]

# Create and run test suites
runner = qc.Runner()

# Add test suites for different data types
runner.add_suite(qc.csv.CsvTestSuite(data_stream))

# Or create your own custom test suite
class MyCustomTestSuite(qc.Suite):
    def __init__(self, data_stream):
        self.data_stream = data_stream
        
    def test_has_expected_columns(self):
        """Check if data has required columns."""
        expected_cols = {"timestamp", "x", "y", "speed"}
        if not expected_cols.issubset(self.data_stream.data.columns):
            missing = expected_cols - set(self.data_stream.data.columns)
            return self.fail_test(None, f"Missing columns: {missing}")
        return self.pass_test(None, "All required columns present")

runner.add_suite(MyCustomTestSuite(data_stream))

# Run all tests and display results
results = runner.run_all_with_progress()

For more detailed examples, please check the Examples folder.


Contributors

Contributions to this repository are welcome! However, please ensure that your code adheres to the recommended DevOps practices below:

Linting

We use ruff as our primary linting tool.

Testing

Attempt to add tests when new features are added. To run the currently available tests, run uv run pytest from the root of the repository.

Lock files

We use uv to manage our lock files and therefore encourage everyone to use uv as a package manager as well.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

contraqctor-0.4.7.tar.gz (173.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

contraqctor-0.4.7-py3-none-any.whl (49.3 kB view details)

Uploaded Python 3

File details

Details for the file contraqctor-0.4.7.tar.gz.

File metadata

  • Download URL: contraqctor-0.4.7.tar.gz
  • Upload date:
  • Size: 173.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.14

File hashes

Hashes for contraqctor-0.4.7.tar.gz
Algorithm Hash digest
SHA256 7867be507026cba8d88ad55f2287c56b73ca638352934f29070ca4d83ea4108c
MD5 efa55b508f47d9ed9678127190ba7f51
BLAKE2b-256 17baaa09e2a6d816531abdfd1d345618f60f51ba68f43b9e1469d8fdfb90fddf

See more details on using hashes here.

File details

Details for the file contraqctor-0.4.7-py3-none-any.whl.

File metadata

File hashes

Hashes for contraqctor-0.4.7-py3-none-any.whl
Algorithm Hash digest
SHA256 bcb4282fc0e919e82865020b45c256c7cd02d8625698853ba3b5b7d4cadb9aa6
MD5 cfc318b04df1e5d98fd24f07855b9dab
BLAKE2b-256 bf883ca556f3c2f9c887f54898a32869543e51298ce324ea0cd0c44aa02b217d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page