Data management and scoring tools for the M2C2 project

Mobile Monitoring of Cognitive Change (M2C2) Platform

📘 M2C2 DataKit (m2c2-datakit): Universal Loading, Assurance, and Scoring with LASSIE

This is the documentation for the M2C2 DataKit Python package 🐍, which is part of the M2C2 Platform. The M2C2 Platform is a comprehensive system designed to facilitate the collection, processing, and analysis of mobile cognitive data (also known as ambulatory cognitive assessments, cognitive activities, and brain games).

🚀 A set of R, Python, and NPM packages for scoring M2C2kit Data! 🚀

🔧 Installation

```shell
pip install m2c2-datakit
# or
pip3 install m2c2-datakit
```

### 🛠️ Setup for Developers of this Package

```shell
make clean
make dev-install
```

---

Developers: 
- [Dr. Nelson Roque](https://www.linkedin.com/in/nelsonroque/) | ORCID: https://orcid.org/0000-0003-1184-202X
- [Dr. Scott Yabiku](https://www.linkedin.com/in/scottyabiku) | ORCID: [Coming soon!]

---

## Changelog

[Source: https://github.com/m2c2-project/datakit](https://github.com/m2c2-project/datakit)

See [CHANGELOG.md](CHANGELOG.md)

---

## 🎯 Purpose

Enable researchers to plug in data from varied sources (e.g., MongoDB, UAS, MetricWire, CSV bundles) and apply a consistent pipeline for:

- Input validation

- Scoring via predefined rules

- Inspection and summarization

- Tidy export and codebook generation

---

## 🧠 L.A.S.S.I.E. Pipeline Summary

| Step | Method           | Purpose                                                                 |
|------|------------------|-------------------------------------------------------------------------|
| L    | `LASSIE.load()`         | Load raw data from a supported source (e.g., MongoDB, UAS, MetricWire). |
| A    | `LASSIE.assure()`       | Validate that required columns exist before processing.                 |
| S    | `LASSIE.score()`        | Apply scoring logic based on predefined or custom rules.                |
| S    | `LASSIE.summarize()`    | Aggregate scored data by participant, session, or custom groups.        |
| I    | `LASSIE.inspect()`      | Visualize distributions or pairwise plots for quality checks.           |
| E    | `LASSIE.export()`       | Save scored and summarized data to tidy files and optionally metadata.  |
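The steps in the table chain fluently, since each method returns the pipeline object. The toy class below illustrates that chaining pattern only; it is not the actual m2c2-datakit implementation, and the scoring rule is a made-up stand-in:

```python
# Toy illustration of the fluent L.A.S.S.I.E. pattern -- NOT the real
# m2c2-datakit implementation, just the chaining idea it is built on.
class ToyLassie:
    def __init__(self):
        self.data = []

    def load(self, records):
        # L: ingest raw records (here, a list of dicts)
        self.data = list(records)
        return self  # returning self is what makes chaining work

    def assure(self, required_columns):
        # A: fail fast if any required column is missing
        for row in self.data:
            missing = [c for c in required_columns if c not in row]
            if missing:
                raise ValueError(f"missing columns: {missing}")
        return self

    def score(self):
        # S: derive a score per record (toy rule: correctness as 0/1)
        for row in self.data:
            row["score"] = 1 if row["correct"] else 0
        return self

    def summarize(self):
        # S: aggregate scores across records
        n = len(self.data)
        self.summary = {"n": n, "mean_score": sum(r["score"] for r in self.data) / n}
        return self


records = [
    {"participant_id": "p1", "correct": True},
    {"participant_id": "p1", "correct": False},
]
pipe = ToyLassie().load(records).assure(["participant_id", "correct"]).score().summarize()
print(pipe.summary)  # {'n': 2, 'mean_score': 0.5}
```

Because every step returns `self`, the whole pipeline reads left to right in L-A-S-S-I-E order, which is the same usage style shown in the full examples below.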

---

## 🔌 Supported Sources

| Source Type   | Loader Class          | Key Arguments                            | Notes                                 |
|---------------|------------------------|-------------------------------------------|----------------------------------------|
| `mongodb`     | `MongoDBImporter`      | `source_path` (JSON)                      | Expects flat or nested JSON documents. |
| `uas`         | `UASImporter`          | `source_path` (URL)                       | Parses newline-delimited JSON.         |
| `metricwire`  | `MetricWireImporter`   | `source_path` (glob pattern or default)   | Processes JSON files from unzipped export. |
| `multicsv`    | `MultiCSVImporter`     | `source_map` (dict of CSV paths)          | Each activity type is its own file.    |
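A common way to back a string-valued `source_name` like the one in the table is a registry that maps names to loader classes. The sketch below shows that generic dispatch pattern under stated assumptions; the class bodies are placeholders, not m2c2-datakit's real importer internals:

```python
# Generic registry/dispatch sketch: map a source_name string to a loader
# class. Class bodies are placeholders, not m2c2-datakit's real importers.
class MongoDBImporter:
    def __init__(self, source_path):
        self.source_path = source_path

class MultiCSVImporter:
    def __init__(self, source_map):
        self.source_map = source_map

LOADERS = {
    "mongodb": MongoDBImporter,
    "multicsv": MultiCSVImporter,
}

def make_loader(source_name, **kwargs):
    """Look up the loader class for source_name and construct it."""
    try:
        cls = LOADERS[source_name.lower()]  # case-insensitive lookup
    except KeyError:
        raise ValueError(f"unsupported source: {source_name!r}") from None
    return cls(**kwargs)

loader = make_loader("mongodb", source_path="export.json")
print(type(loader).__name__)  # MongoDBImporter
```

The registry keeps adding a new source type cheap: write one importer class and add one dictionary entry, with no changes to calling code.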

---

## 🧪 Example: Full Pipeline

### MetricWire
```python
import m2c2_datakit as m2c2  # top-level package; aliased as `m2c2` in all examples below

mw = m2c2.core.pipeline.LASSIE().load(source_name="metricwire", source_path="data/metricwire/unzipped/*/*/*.json")
mw.assure(required_columns=m2c2.core.config.settings.STANDARD_GROUPING_FOR_AGGREGATION_METRICWIRE)
mw_scored = mw.score()
mw.inspect()
mw.export(file_basename="metricwire", directory="tidy/metricwire_scored")
mw.export_codebook(filename="codebook_metricwire.md", directory="tidy/metricwire_scored")
```

---

### MongoDB

```python
mdb = m2c2.core.pipeline.LASSIE().load(source_name="mongodb", source_path="data/production-mongo-export/data_exported_120424_1010am.json")
mdb.assure(required_columns=m2c2.core.config.settings.STANDARD_GROUPING_FOR_AGGREGATION)
mdb.score()
mdb.inspect()
mdb.export(file_basename="mongodb_export", directory="tidy/mongodb_scored")
mdb.export_codebook(filename="codebook_mongo.md", directory="tidy/mongodb_scored")
```

---

### Understanding America Study (UAS) Datasets

```python
uas = m2c2.core.pipeline.LASSIE().load(source_name="uas", source_path="https://uas.usc.edu/survey/uas/m2c2_ess/admin/export_m2c2.php?k=<INSERT KEY HERE>")
uas.assure(required_columns=m2c2.core.config.settings.STANDARD_GROUPING_FOR_AGGREGATION)
uas.score()
uas.inspect()
uas.export(file_basename="uas_export", directory="tidy/uas_scored")
uas.export_codebook(filename="codebook_uas.md", directory="tidy/uas_scored")
```

---

### MultiCSV

```python
source_map = {
    "Symbol Search": "data/reboot/m2c2kit_manualmerge_symbol_search_all_ts-20250402_151939.csv",
    "Grid Memory": "data/reboot/m2c2kit_manualmerge_grid_memory_all_ts-20250402_151940.csv"
}

mcsv = m2c2.core.pipeline.LASSIE().load(source_name="multicsv", source_map=source_map)
mcsv.assure(required_columns=m2c2.core.config.settings.STANDARD_GROUPING_FOR_AGGREGATION)
mcsv.score()
mcsv.inspect()
mcsv.export(file_basename="multicsv_export", directory="tidy/multicsv_scored")
mcsv.export_codebook(filename="codebook_multicsv.md", directory="tidy/multicsv_scored")
```

## 💡 Contributions Welcome!

📌 Have ideas? Found a bug? Want to improve the package? Open an issue!

📜 Code of Conduct - please be respectful and follow community guidelines.


## Acknowledgements

The development of m2c2-datakit was made possible with support from the National Institute on Aging (NIA; grant 1U2CAG060408-01).


## 🌎 More Resources

📌 M2C2 Official Website

📌 M2C2kit Official Documentation Website

📌 Pushing to PyPI

📌 What is JSON?


## What is What? 🧠 Summary

| Thing | Type | Description |
|-------|------|-------------|
| `m2c2_datakit` | Library/Package | Top-level Python package |
| `core/`, `loaders/`, `tasks/` | Subpackages | Contain logically grouped modules |
| `log.py`, `export.py`, etc. | Modules | Individual Python files |
| `__init__.py` | Special Module | Marks the directory as a package |

## 🎬 Inspired by

The Lassie movie (inspiration for the package name).

🚀 Let's go study some brains!
