A simple, intuitive, pandas-based database.

datanest

Perfect for handling data such as time series, images, or any Python objects alongside their metadata. This tool encapsulates a pandas DataFrame containing metadata and Python objects, and provides an intuitive keyword-argument syntax for retrieving both data and metadata.


Installation

pip install datanest

Usage

datanest.Database is the core class that wraps a pandas.DataFrame object. Even before adding any data fields using the add_data_field method, the database can already be used to query rows from the encapsulated DataFrame with an intuitive keyword argument syntax.

import datanest

# Load example DataFrame with columns: 
# participant_id (int), age (float), surgery_performed (bool), notes (str)
db = datanest.get_example_database()

# Retrieve all metadata
db()

# Retrieve metadata for participant 3
db(participant_id=3)

# Retrieve metadata for participants aged 50 to 60 who have not had surgery
db(age_lim=(50, 60), surgery_performed=False)

# Retrieve metadata for participants where the notes string contains the word interesting
db(notes_has='interesting')
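
To make the suffix conventions concrete, here is a minimal sketch of how `_lim` and `_has` keywords could be turned into row filters. It is not datanest's implementation: `ROWS`, `row_matches`, and `select` are hypothetical names, the rows are plain dicts rather than a DataFrame, and treating `_lim` bounds as inclusive is an assumption.

```python
# Hypothetical stand-in for the encapsulated DataFrame: a list of row dicts.
ROWS = [
    {"participant_id": 1, "age": 45.0, "surgery_performed": False, "notes": "baseline only"},
    {"participant_id": 2, "age": 52.0, "surgery_performed": False, "notes": "interesting rhythm"},
    {"participant_id": 3, "age": 58.0, "surgery_performed": True, "notes": "follow-up needed"},
    {"participant_id": 4, "age": 63.0, "surgery_performed": False, "notes": "interesting recovery"},
]


def row_matches(row, **query):
    """Return True if the row satisfies every keyword in the query."""
    for key, wanted in query.items():
        if key.endswith("_lim"):
            # <column>_lim=(low, high): range test (inclusive bounds assumed here)
            low, high = wanted
            if not low <= row[key[:-4]] <= high:
                return False
        elif key.endswith("_has"):
            # <column>_has='text': substring test on a string column
            if wanted not in row[key[:-4]]:
                return False
        elif row[key] != wanted:
            # plain <column>=value: equality test
            return False
    return True


def select(rows, **query):
    """Return the rows matching all keyword filters; no keywords returns every row."""
    return [row for row in rows if row_matches(row, **query)]


ids = [row["participant_id"] for row in select(ROWS, age_lim=(50, 60), surgery_performed=False)]
```

Multiple keywords combine with a logical AND in this sketch, which matches how the combined query above is described.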

The add_data_field method adds arbitrary Python objects to the database, and the relevant data entries can then be retrieved using the same keyword argument syntax.

# Add heart rate data to the database, indexed by participant_id
db.add_data_field('heart_rate', datanest.get_example_data(), 'participant_id')

# Retrieve all heart rate time series data
db.heart_rate()

# Retrieve heart rate time series data for participant 3
db.heart_rate(participant_id=3)

# Retrieve heart rate time series for participants aged 50 to 60
db.heart_rate(age_lim=(50, 60))

# Retrieve heart rate time series for participants where the notes string contains the word interesting
db.heart_rate(notes_has='interesting')
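
Conceptually, a data field query is a metadata filter followed by a lookup in a payload dict keyed by the index column. The sketch below illustrates that idea only; `field_lookup`, `ROWS`, and `HEART_RATE` are hypothetical names, the filter is passed as a plain predicate for brevity, and none of this is taken from datanest's source.

```python
# Hypothetical metadata rows and payloads; the payload values are never inspected.
ROWS = [
    {"participant_id": 1, "age": 45.0},
    {"participant_id": 2, "age": 52.0},
    {"participant_id": 3, "age": 58.0},
]
HEART_RATE = {
    1: [61, 63, 62],
    2: [70, 72, 71],
    3: [55, 54, 56],
}


def field_lookup(rows, payloads, key_column, predicate=lambda row: True):
    """Return {key: payload} for rows that pass the predicate and have a payload."""
    result = {}
    for row in rows:
        key = row[key_column]
        if predicate(row) and key in payloads:
            result[key] = payloads[key]
    return result


everyone = field_lookup(ROWS, HEART_RATE, "participant_id")
fifties = field_lookup(
    ROWS, HEART_RATE, "participant_id", lambda row: 50 <= row["age"] <= 60
)
```

Because only the keys are touched, the same lookup would work unchanged if the lists were replaced by arrays, images, or any other objects.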

Payload types — anything goes

The value side of add_data_field is unconstrained: datanest does not inspect the objects you attach, only the keys that index them. So a payload dict can hold NumPy arrays, images, custom dataclasses, pysampled.Data signals — anything that makes sense for your project.

import numpy as np

db = datanest.get_example_database()
db.add_data_field(
    "eeg",
    {pid: np.random.randn(1000) for pid in db()["participant_id"]},
    "participant_id",
)
db.eeg(age_lim=(50, 60))   # {participant_id: ndarray} for the matching rows

In the lab where datanest originated, time-series payloads are typically pysampled.Data objects, but this is a usage convention, not a hard dependency: datanest does not import pysampled and works with whatever payload type you choose.

Hierarchical data: DatabaseContainer

When metadata lives at multiple levels (e.g. subject → trial → action), wrap a set of Database instances in a DatabaseContainer. Each level is added with a key-derivation function that maps a child id to its parent id. The container provides the same keyword-argument query syntax as Database, resolving the right level automatically.

import datanest

dbc = datanest.DatabaseContainer()
dbc.add("subject", subject_db)
dbc.add("trial", trial_db, "subject", lambda trial_id: trial_id[:2])
dbc.add("action", action_db, "trial", lambda action_id: action_id[:3])

# Query at any level — subject metadata filters trials and actions too
dbc(subject=3)                       # all subject-3 trials/actions
dbc(action_phase='extension')        # subset of action rows
dbc.heart_rate(age_lim=(50, 60))     # data field added at any level

DatabaseContainer uses the same _lim / _has / _any suffix conventions as Database. Add data fields to the child databases directly (trial_db.add_data_field(...)); the container makes them queryable at any level via dbc.<field>(...).
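
The role of the key-derivation functions can be illustrated without datanest. In the sketch below, the id scheme is an assumption made for the example (string ids where the first three characters name the subject and the first six name the trial), and all names are hypothetical. It shows how a filter on subject metadata can select action rows by walking each action id up to its subject.

```python
# Hypothetical ids: "s03" (subject), "s03t01" (trial), "s03t01a1" (action).
SUBJECT_AGE = {"s03": 55, "s07": 41}
ACTION_IDS = ["s03t01a1", "s03t01a2", "s03t02a1", "s07t01a1"]


def trial_to_subject(trial_id):
    """Key derivation: map a trial id to its parent subject id."""
    return trial_id[:3]


def action_to_trial(action_id):
    """Key derivation: map an action id to its parent trial id."""
    return action_id[:6]


def actions_for_subjects(subject_ids):
    """Return the action ids whose ancestor subject is in subject_ids."""
    wanted = set(subject_ids)
    return [
        action_id
        for action_id in ACTION_IDS
        if trial_to_subject(action_to_trial(action_id)) in wanted
    ]


# Subject-level filter (age between 50 and 60) applied to action-level rows.
older_subjects = [sid for sid, age in SUBJECT_AGE.items() if 50 <= age <= 60]
matching_actions = actions_for_subjects(older_subjects)
```

Composing the two derivation functions is what lets a subject-level condition reach rows two levels down.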

Caching expensive computations

datanest.cache_me_if_you_can and cache_me_if_you_can_incremental are dill-backed file-cache decorators. Use them to skip recomputation when loading or summarizing data fields is expensive. Both accept an optional suffix callable that receives the wrapped function's (*args, **kwargs) and returns a string inserted between the cache file's stem and extension — handy for one cache file per input.

from datanest import cache_me_if_you_can, cache_me_if_you_can_incremental

# One cache file per subject (e.g. heart_rate_s03.pkl)
@cache_me_if_you_can("heart_rate.pkl", suffix=lambda subject_id: f"_s{subject_id:02d}")
def load_heart_rate(subject_id):
    return expensive_load(subject_id)            # runs once per subject_id

# Build up a {trial_id: metrics} dict across many calls, persisting on disk
@cache_me_if_you_can_incremental("trial_metrics.pkl", return_name="ret", return_default={})
def summarize_trial(trial_id, ret=None):
    if trial_id not in ret:
        ret[trial_id] = compute_metrics(trial_id)
    return ret
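
For readers curious about the mechanism, the following is a minimal sketch of a file-cache decorator with a suffix callable. It is an illustration, not datanest's code: `file_cache` is a hypothetical name, it uses the standard-library pickle module in place of dill, and it writes to a temporary directory so the example cleans up easily.

```python
import functools
import pickle
import tempfile
from pathlib import Path


def file_cache(path, suffix=None):
    """Cache a function's return value in a pickle file (hypothetical helper)."""
    base = Path(path)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The suffix callable sees the same arguments as the wrapped function.
            extra = suffix(*args, **kwargs) if suffix is not None else ""
            # Insert the suffix between the file's stem and its extension.
            cache_file = base.with_name(base.stem + extra + base.suffix)
            if cache_file.exists():
                with cache_file.open("rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with cache_file.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


CACHE_DIR = Path(tempfile.mkdtemp())
CALLS = []  # records every real (uncached) computation


@file_cache(CACHE_DIR / "heart_rate.pkl", suffix=lambda subject_id: f"_s{subject_id:02d}")
def load_heart_rate(subject_id):
    CALLS.append(subject_id)
    return [60 + subject_id, 61 + subject_id]


first = load_heart_rate(3)   # computed, then written to heart_rate_s03.pkl
second = load_heart_rate(3)  # read back from the cache file
```

Note that a cache of this kind never invalidates itself: if the wrapped function changes, the stale file has to be deleted by hand.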

License

datanest is distributed under the terms of the MIT license.

Contact

Praneeth Namburi

Project Link: https://github.com/praneethnamburi/datanest

Acknowledgments

This tool was developed as part of the ImmersionToolbox initiative at the MIT.nano Immersion Lab. Thanks to NCSOFT for supporting this initiative.
