
Data management and scoring tools for the M2C2 project

Project description

Mobile Monitoring of Cognitive Change (M2C2) Platform

📘 M2C2 DataKit (m2datakit): Universal Loading, Assurance, and Scoring

This is the documentation for the M2C2 DataKit Python package 🐍, which is part of the M2C2 Platform. The M2C2 Platform is a system for collecting, processing, and analyzing mobile cognitive data (also known as ambulatory cognitive assessments, cognitive activities, or brain games).

🚀 A set of R, Python, and NPM packages for scoring M2C2kit Data! 🚀


Documentation

See here for documentation

Notebooks

Notebooks are included in the source distribution (sdist). To access them:

pip download --no-deps --no-binary :all: m2datakit
# unpack the .tar.gz, then open notebooks/
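
If you would rather not unpack the whole archive by hand, the step above can be sketched in Python with the standard library. This is only an illustration: `extract_notebooks` is a hypothetical helper, not part of m2datakit, and it assumes the notebooks sit in a `notebooks/` folder somewhere inside the downloaded sdist.

```python
import tarfile
from pathlib import Path


def extract_notebooks(sdist_path, dest_dir):
    """Copy only the files under a notebooks/ folder out of an sdist archive.

    Hypothetical helper for illustration; returns the sorted member names
    that were extracted.
    """
    dest = Path(dest_dir)
    extracted = []
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            parts = Path(member.name).parts
            # Skip anything that is not a regular file, and refuse absolute
            # or parent-relative paths so nothing is written outside dest.
            if not member.isfile() or ".." in parts or Path(member.name).is_absolute():
                continue
            # sdists unpack as <name>-<version>/...; keep files that live
            # below a directory called "notebooks".
            if "notebooks" in parts[:-1]:
                target = dest.joinpath(*parts)
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(archive.extractfile(member).read())
                extracted.append(member.name)
    return sorted(extracted)
```

Call it with the path to the `.tar.gz` that `pip download` saved and a destination folder, then open the extracted notebooks from there.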

🔧 Installation

pip install m2datakit

Optional: Jupyter kernel (if using notebooks)

python -m ipykernel install --user --name="m2datakit" --display-name "Python (m2datakit)"

🛠️ Setup for Developers of this Package

# Create a local venv (uv) and install the package in editable mode
uv venv
uv pip install -e .

# (optional) install dev tools you use (isort/ruff/flake8, etc.)


Changelog

Source: https://github.com/m2c2-project/datakit

See CHANGELOG.md


🗺️ Roadmap

  • Clear API docs (MkDocs + mkdocstrings)
  • Reliable error handling (no silent exceptions, explicit strict/lenient modes)
  • Robust logging policy (opt-in, no import-time side effects)
  • Typed source configs + schema validation
  • CI + golden-data tests per task
  • Performance optimization (vectorization / fewer copies)

🎯 Purpose

Enable researchers to plug in data from varied sources (e.g., MongoDB, UAS, MetricWire, CSV bundles) and apply a consistent pipeline for:

  • Input validation
  • Scoring via predefined rules
  • Inspection and summarization
  • Tidy export and codebook generation

🧠 L.A.S.S.I.E. Pipeline Summary

| Step | Method | Purpose |
|------|--------|---------|
| L | LASSIE.load() | Load raw data from a supported source (e.g., MongoDB, UAS, MetricWire). |
| A | LASSIE.assure() | Validate that required columns exist before processing. |
| S | LASSIE.score() | Apply scoring logic based on predefined or custom rules. |
| S | LASSIE.summarize() | Aggregate scored data by participant, session, or custom groups. |
| I | LASSIE.inspect() | Visualize distributions or pairwise plots for quality checks. |
| E | LASSIE.export() | Save scored and summarized data to tidy files and optionally metadata. |
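
To make the flow of the stages concrete, here is a minimal standard-library sketch of the same idea. It does not use m2datakit and does not reflect the real `LASSIE` method signatures; the function names, the column names (`participant_id`, `response_time_ms`, `correct`), and the scoring rule are all assumptions made for illustration. The inspect (plotting) step is left out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names, chosen only for this illustration.
REQUIRED_COLUMNS = {"participant_id", "response_time_ms", "correct"}


def load(csv_text):
    """Load: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def assure(rows, required=REQUIRED_COLUMNS):
    """Assure: fail loudly if there is no data or a required column is absent."""
    if not rows:
        raise ValueError("no rows to validate")
    missing = set(required) - set(rows[0])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return rows


def score(rows):
    """Score: convert raw strings into typed trial-level values."""
    return [
        {
            "participant_id": row["participant_id"],
            "rt_ms": float(row["response_time_ms"]),
            "is_correct": row["correct"] == "1",
        }
        for row in rows
    ]


def summarize(scored):
    """Summarize: aggregate trial-level scores per participant."""
    by_participant = defaultdict(list)
    for trial in scored:
        by_participant[trial["participant_id"]].append(trial)
    summary = []
    for pid in sorted(by_participant):
        trials = by_participant[pid]
        summary.append({
            "participant_id": pid,
            "n_trials": len(trials),
            "accuracy": sum(t["is_correct"] for t in trials) / len(trials),
            "mean_rt_ms": sum(t["rt_ms"] for t in trials) / len(trials),
        })
    return summary


def export(summary):
    """Export: serialize the summary rows as tidy CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(summary[0]))
    writer.writeheader()
    writer.writerows(summary)
    return buffer.getvalue()


raw = "participant_id,response_time_ms,correct\np1,500,1\np1,700,0\np2,600,1\n"
print(export(summarize(score(assure(load(raw))))))
```

Each stage takes the previous stage's output, so a validation failure in `assure` stops the run before any scoring happens.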

🔌 Supported Sources

You may have used M2C2kit tasks through one of our integrations, including those listed below. Each integration has its own loader class, which reads the data and converts it into a format that m2datakit can process. You are responsible for ensuring that the data is in the correct format for the loader class you use.

In the future, we anticipate adding loaders that download data via API.

| Source Type | Loader Class | Key Arguments | Notes |
|-------------|--------------|---------------|-------|
| mongodb | MongoDBImporter | source_path (URL to JSON) | Expects flat or nested JSON documents. |
| multicsv | MultiCSVImporter | source_map (dict of CSV paths) | Each activity type is its own file. |
| metricwire | MetricWireImporter | source_path (glob pattern or default) | Processes JSON files from unzipped export. |
| qualtrics | QualtricsImporter | source_path (URL to CSV) | Each activity's trial saves data to a new column. |
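
As an illustration of what a `source_map` for the multicsv source might look like, the sketch below reads one CSV per activity and tags each row with its activity name. It uses only the standard library; `read_source_map`, the activity keys, and the added `activity` column are assumptions for this example and are not taken from `MultiCSVImporter` itself.

```python
import csv
from pathlib import Path


def read_source_map(source_map):
    """Read one CSV file per activity and tag each row with its activity.

    `source_map` maps an activity name (hypothetical keys) to a CSV path.
    """
    combined = []
    for activity, path in source_map.items():
        with Path(path).open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Put the activity name first so it is easy to group on later.
                combined.append({"activity": activity, **row})
    return combined
```

A call might look like `read_source_map({"symbol_search": "symbol_search.csv", "grid_memory": "grid_memory.csv"})`, where both the keys and the file names are placeholders.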

🧪 Example: Full Pipeline

For a full pipeline example, see our datakit-notebooks repo.


💡 Contributions Welcome!

📌 Have ideas? Found a bug? Want to improve the package? Open an issue!

📜 Code of Conduct - Please be respectful and follow community guidelines.


Acknowledgements

The development of m2datakit was made possible with support from NIA (1U2CAG060408-01).


🌎 More Resources:

📌 M2C2 Official Website

📌 M2C2kit Official Documentation Website

📌 Pushing to PyPI

📌 What is JSON?


What is What? 🧠 Summary

| Thing | Type | Description |
|-------|------|-------------|
| m2datakit | Library/Package | Top-level Python package |
| core/, loaders/, tasks/ | Subpackages | Contain logically grouped modules |
| log.py, export.py, etc. | Modules | Individual Python files |
| __init__.py | Special Module | Marks the directory as a package |
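
The terms in this table can be seen in action with a throwaway package built in a temporary folder. The sketch below uses a made-up package called `toykit`; its layout (a `core` subpackage holding a `log.py` module) is an assumption for illustration and does not describe where modules actually live inside m2datakit.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_toy_package(root):
    """Create toykit/ (package) > core/ (subpackage) > log.py (module)."""
    pkg = Path(root) / "toykit"
    core = pkg / "core"
    core.mkdir(parents=True)
    # Empty __init__.py files mark each directory as a package.
    (pkg / "__init__.py").write_text("")
    (core / "__init__.py").write_text("")
    (core / "log.py").write_text(
        "def describe():\n    return 'module inside a subpackage'\n"
    )
    return pkg


root = tempfile.mkdtemp()
build_toy_package(root)
sys.path.insert(0, root)          # make the temporary folder importable
importlib.invalidate_caches()     # ensure the new files are discovered

# The dotted path mirrors the folder layout: package.subpackage.module
log = importlib.import_module("toykit.core.log")
print(log.__name__)
print(log.describe())
```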

🎬 Inspired by:

Inspiration for Package, Lassie Movie

🚀 Let's go study some brains!

Download files

Source Distribution

m2datakit-0.1.97.tar.gz (82.8 kB)

Built Distribution

m2datakit-0.1.97-py3-none-any.whl (79.9 kB)
