datanest

A simple, intuitive, pandas-based database.

Perfect for handling data such as time series, images, or any Python objects alongside their metadata. This tool encapsulates a pandas DataFrame containing metadata and Python objects, and provides an intuitive keyword-argument syntax for retrieving both data and metadata.


Installation

pip install datanest

Usage

datanest.Database is the core class that wraps a pandas.DataFrame object. Even before adding any data fields using the add_data_field method, the database can already be used to query rows from the encapsulated DataFrame with an intuitive keyword argument syntax.

import datanest

# Load example DataFrame with columns: 
# participant_id (int), age (float), surgery_performed (bool), notes (str)
db = datanest.get_example_database()

# Retrieve all metadata
db()

# Retrieve metadata for participant 3
db(participant_id=3)

# Retrieve metadata for participants aged 50 to 60 who have not had surgery
db(age_lim=(50, 60), surgery_performed=False)

# Retrieve metadata for participants where the notes string contains the word interesting
db(notes_has='interesting')
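
To make the suffix conventions concrete, here is a minimal pure-Python sketch of how keyword filters of this shape could be interpreted. It is not datanest's implementation: the query helper, the sample rows, and the assumption that _lim bounds are inclusive are all hypothetical and only illustrate the idea.

```python
def query(rows, **kwargs):
    """Return the rows that satisfy every keyword filter.

    Hypothetical helper, not part of datanest. Assumed semantics:
      <col>=value        -> equality
      <col>_lim=(lo, hi) -> lo <= value <= hi (inclusive bounds are an assumption)
      <col>_has=text     -> substring match on a string column
    """
    def matches(row):
        for key, wanted in kwargs.items():
            if key.endswith("_lim"):
                lo, hi = wanted
                if not lo <= row[key[:-4]] <= hi:
                    return False
            elif key.endswith("_has"):
                if wanted not in row[key[:-4]]:
                    return False
            elif row[key] != wanted:
                return False
        return True

    return [row for row in rows if matches(row)]


# Made-up metadata rows with the same columns as the example database
rows = [
    {"participant_id": 1, "age": 45.0, "surgery_performed": False, "notes": "baseline"},
    {"participant_id": 2, "age": 55.0, "surgery_performed": False, "notes": "interesting gait"},
    {"participant_id": 3, "age": 58.0, "surgery_performed": True, "notes": "follow-up"},
]

in_range_no_surgery = query(rows, age_lim=(50, 60), surgery_performed=False)
flagged = query(rows, notes_has="interesting")
```

Multiple keywords combine with a logical AND in this sketch, which matches how the combined age and surgery query reads.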

The add_data_field method attaches arbitrary Python objects to the database, and the same keyword-argument syntax retrieves the matching data entries.

# Add heart rate data to the database, indexed by participant_id
db.add_data_field('heart_rate', datanest.get_example_data(), 'participant_id')

# Retrieve all heart rate time series data
db.heart_rate()

# Retrieve heart rate time series data for participant 3
db.heart_rate(participant_id=3)

# Retrieve heart rate time series for participants aged 50 to 60
db.heart_rate(age_lim=(50, 60))

# Retrieve heart rate time series for participants where the notes string contains the word interesting
db.heart_rate(notes_has='interesting')
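
Conceptually, a data-field query filters the metadata rows first and then looks up the payloads by the indexing column. The sketch below shows that two-step idea in plain Python; select_payloads, the sample rows, and the heart-rate values are hypothetical and are not taken from datanest.

```python
def select_payloads(rows, payloads, key_column, predicate):
    """Return {key: payload} for the metadata rows that satisfy predicate.

    Hypothetical helper, not part of datanest. Payloads are never inspected;
    only the keys taken from key_column are used for the lookup.
    """
    keys = [row[key_column] for row in rows if predicate(row)]
    return {key: payloads[key] for key in keys if key in payloads}


# Made-up metadata and payloads, indexed by participant_id
rows = [
    {"participant_id": 1, "age": 45.0},
    {"participant_id": 2, "age": 55.0},
    {"participant_id": 3, "age": 58.0},
]
heart_rate = {1: [62, 64, 63], 2: [71, 70, 73], 3: [58, 59, 57]}

selected = select_payloads(
    rows, heart_rate, "participant_id", lambda row: 50 <= row["age"] <= 60
)
```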

Payload types — anything goes

The value side of add_data_field is unconstrained: datanest does not inspect the objects you attach, only the keys that index them. So a payload dict can hold NumPy arrays, images, custom dataclasses, pysampled.Data signals — anything that makes sense for your project.

import numpy as np

db = datanest.get_example_database()
db.add_data_field(
    "eeg",
    {pid: np.random.randn(1000) for pid in db()["participant_id"]},
    "participant_id",
)
db.eeg(age_lim=(50, 60))   # {participant_id: ndarray} for the matching rows

In the lab where datanest originated, time-series payloads are typically pysampled.Data objects, but this is a use convention, not a hard dependency — datanest does not import pysampled and works with whatever payload type you choose.

Hierarchical data: DatabaseContainer

When metadata lives at multiple levels (e.g. subject → trial → action), wrap a set of Database instances in a DatabaseContainer. Each level is added with a key-derivation function that maps a child id to its parent id. The container provides the same keyword-argument query syntax as Database, resolving the right level automatically.

import datanest

dbc = datanest.DatabaseContainer()
dbc.add("subject", subject_db)
dbc.add("trial", trial_db, "subject", lambda trial_id: trial_id[:2])
dbc.add("action", action_db, "trial", lambda action_id: action_id[:3])

# Query at any level — subject metadata filters trials and actions too
dbc(subject=3)                       # all subject-3 trials/actions
dbc(action_phase='extension')        # subset of action rows
dbc.heart_rate(age_lim=(50, 60))     # data field added at any level

DatabaseContainer uses the same _lim / _has / _any suffix conventions as Database. Add data fields to the child databases directly (trial_db.add_data_field(...)); the container makes them queryable at any level via dbc.<field>(...).
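
The key-derivation functions are easiest to picture with ids that grow by one element per level. The sketch below assumes such a tuple layout, which is a hypothetical choice and not something datanest requires, and walks from an action id up to its subject so that a subject-level property can filter actions. The names parent_of and ancestors are illustrative and are not datanest API.

```python
# Hypothetical id layout: each level appends one element to its parent's id.
#   subject id: ("lab", 3)
#   trial id:   ("lab", 3, 1)
#   action id:  ("lab", 3, 1, "reach")
parent_of = {
    "trial": ("subject", lambda trial_id: trial_id[:2]),
    "action": ("trial", lambda action_id: action_id[:3]),
}


def ancestors(level, item_id):
    """Return {level: id} for item_id and every level above it."""
    chain = {level: item_id}
    while level in parent_of:
        parent_level, derive = parent_of[level]
        item_id = derive(item_id)
        level = parent_level
        chain[level] = item_id
    return chain


# Made-up subject metadata and action ids
subject_meta = {("lab", 3): {"age": 55.0}, ("lab", 4): {"age": 42.0}}
action_ids = [
    ("lab", 3, 1, "reach"),
    ("lab", 3, 2, "grasp"),
    ("lab", 4, 1, "reach"),
]

# Keep only the actions whose subject is aged 50 to 60
selected = [
    action_id
    for action_id in action_ids
    if 50 <= subject_meta[ancestors("action", action_id)["subject"]]["age"] <= 60
]
```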

Caching expensive computations

datanest.cache_me_if_you_can and cache_me_if_you_can_incremental are dill-backed file-cache decorators. Use them to skip recomputation when loading or summarizing data fields is expensive. Both accept an optional suffix callable that receives the wrapped function's (*args, **kwargs) and returns a string inserted between the cache file's stem and extension — handy for one cache file per input.

from datanest import cache_me_if_you_can, cache_me_if_you_can_incremental

# One cache file per subject (e.g. heart_rate_s03.pkl)
@cache_me_if_you_can("heart_rate.pkl", suffix=lambda subject_id: f"_s{subject_id:02d}")
def load_heart_rate(subject_id):
    return expensive_load(subject_id)            # runs once per subject_id

# Build up a {trial_id: metrics} dict across many calls, persisting on disk
@cache_me_if_you_can_incremental("trial_metrics.pkl", return_name="ret", return_default={})
def summarize_trial(trial_id, ret=None):
    if trial_id not in ret:
        ret[trial_id] = compute_metrics(trial_id)
    return ret
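
For readers curious about the mechanics, here is a minimal sketch of a file-cache decorator with a suffix callable. It is a stand-in written for illustration, not datanest's code: it uses the standard-library pickle module instead of dill, and file_cache and the temporary cache directory are hypothetical names.

```python
import functools
import pickle
import tempfile
from pathlib import Path


def file_cache(filename, suffix=None):
    """Cache a function's return value in a pickle file.

    Hypothetical stand-in, not datanest's implementation. When suffix is
    given, it is called with the wrapped function's arguments and its result
    is inserted between the cache file's stem and extension.
    """
    base = Path(filename)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            extra = suffix(*args, **kwargs) if suffix else ""
            path = base.with_name(f"{base.stem}{extra}{base.suffix}")
            if path.exists():
                # Cache hit: load the stored result instead of recomputing
                with path.open("rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with path.open("wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


cache_dir = Path(tempfile.mkdtemp())
calls = []  # records which subject ids were actually computed


@file_cache(cache_dir / "heart_rate.pkl", suffix=lambda subject_id: f"_s{subject_id:02d}")
def load_heart_rate(subject_id):
    calls.append(subject_id)
    return [60 + subject_id] * 3


first = load_heart_rate(3)
second = load_heart_rate(3)  # served from heart_rate_s03.pkl
```

Without a suffix, every call in this sketch shares one file, so a call with different arguments would return the first cached result. That is the situation the per-input suffix is meant to avoid.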

License

datanest is distributed under the terms of the MIT license.

Contact

Praneeth Namburi

Project Link: https://github.com/praneethnamburi/datanest

Acknowledgments

This tool was developed as part of the ImmersionToolbox initiative at the MIT.nano Immersion Lab. Thanks to NCSOFT for supporting this initiative.
