datanest
A simple, intuitive, pandas-based database.
datanest is suited to data such as time series, images, or any other Python objects that travel with metadata. It encapsulates a pandas DataFrame holding the metadata alongside the Python objects, and provides an intuitive keyword-argument syntax for retrieving both.
Installation
pip install datanest
Usage
datanest.Database is the core class that wraps a pandas.DataFrame object. Even before adding any data fields using the add_data_field method, the database can already be used to query rows from the encapsulated DataFrame with an intuitive keyword argument syntax.
import datanest
# Load example DataFrame with columns:
# participant_id (int), age (float), surgery_performed (bool), notes (str)
db = datanest.get_example_database()
# Retrieve all metadata
db()
# Retrieve metadata for participant 3
db(participant_id=3)
# Retrieve metadata for participants aged 50 to 60 who have not had surgery
db(age_lim=(50, 60), surgery_performed=False)
# Retrieve metadata for participants where the notes string contains the word interesting
db(notes_has='interesting')
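To make the suffix convention concrete, here is a minimal pure-Python sketch of how keyword arguments such as age_lim and notes_has could be translated into row filters. It is not datanest's implementation: the helper names matches and select, the sample rows, and the choices that _lim is an inclusive range and _has is a case-sensitive substring test are all assumptions made for illustration.

```python
def matches(row, **query):
    """Return True if a metadata row (a dict) satisfies every keyword filter."""
    for key, wanted in query.items():
        if key.endswith("_lim"):
            # Assumption: <column>_lim is an inclusive (low, high) range.
            low, high = wanted
            if not (low <= row[key[:-4]] <= high):
                return False
        elif key.endswith("_has"):
            # Assumption: <column>_has is a case-sensitive substring test.
            if wanted not in row[key[:-4]]:
                return False
        elif row[key] != wanted:
            # A plain column name means an exact match.
            return False
    return True


def select(rows, **query):
    """Return the rows that satisfy all keyword filters (all rows if none given)."""
    return [row for row in rows if matches(row, **query)]


# Hypothetical metadata rows mirroring the columns of the example database.
rows = [
    {"participant_id": 1, "age": 45.0, "surgery_performed": False, "notes": "baseline only"},
    {"participant_id": 2, "age": 55.0, "surgery_performed": False, "notes": "interesting gait"},
    {"participant_id": 3, "age": 58.0, "surgery_performed": True, "notes": "follow-up due"},
]

no_surgery_50s = select(rows, age_lim=(50, 60), surgery_performed=False)
print([row["participant_id"] for row in no_surgery_50s])
```

Every keyword narrows the selection, so combining filters behaves like a logical AND over the conditions.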
The add_data_field method adds arbitrary Python objects to the database, and the relevant data entries can then be retrieved with the same keyword argument syntax.
# Add heart rate data to the database, indexed by participant_id
db.add_data_field('heart_rate', datanest.get_example_data(), 'participant_id')
# Retrieve all heart rate time series data
db.heart_rate()
# Retrieve heart rate time series data for participant 3
db.heart_rate(participant_id=3)
# Retrieve heart rate time series for participants aged 50 to 60
db.heart_rate(age_lim=(50, 60))
# Retrieve heart rate time series for participants where the notes string contains the word interesting
db.heart_rate(notes_has='interesting')
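Conceptually, a data-field query first selects metadata rows and then looks up the payloads indexed by the matching keys. The sketch below illustrates that two-step idea in plain Python. The function field_query, the sample values, and the dict return shape are assumptions for illustration, not a description of datanest internals.

```python
# Hypothetical metadata rows and a payload dict keyed by participant_id.
metadata = [
    {"participant_id": 1, "age": 45.0},
    {"participant_id": 2, "age": 55.0},
    {"participant_id": 3, "age": 58.0},
]
heart_rate = {1: [62, 64, 63], 2: [71, 70, 72], 3: [80, 79, 81]}


def field_query(metadata, payloads, key_column, predicate):
    """Return {key: payload} for the metadata rows accepted by predicate."""
    # Step 1: filter the metadata and collect the index keys of matching rows.
    keys = [row[key_column] for row in metadata if predicate(row)]
    # Step 2: fetch the payload attached to each matching key.
    return {key: payloads[key] for key in keys if key in payloads}


# Payloads for participants aged 50 to 60 (inclusive bounds assumed).
selected = field_query(
    metadata, heart_rate, "participant_id", lambda row: 50 <= row["age"] <= 60
)
print(selected)
```

The payloads themselves are never inspected. Only the metadata and the index keys take part in the selection.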
Payload types — anything goes
The value side of add_data_field is unconstrained: datanest does not
inspect the objects you attach, only the keys that index them. So a
payload dict can hold NumPy arrays, images, custom dataclasses,
pysampled.Data signals — anything that makes sense for
your project.
import numpy as np
db = datanest.get_example_database()
db.add_data_field(
    "eeg",
    {pid: np.random.randn(1000) for pid in db()["participant_id"]},
    "participant_id",
)
db.eeg(age_lim=(50, 60)) # {participant_id: ndarray} for the matching rows
In the lab where datanest originated, time-series payloads are
typically pysampled.Data objects, but this is a use
convention, not a hard dependency — datanest does not import
pysampled and works with whatever payload type you choose.
Hierarchical data: DatabaseContainer
When metadata lives at multiple levels (e.g. subject → trial → action), wrap a set of Database instances in a DatabaseContainer. Each level is added with a key-derivation function that maps a child id to its parent id. The container provides the same keyword-argument query syntax as Database, resolving the right level automatically.
import datanest
dbc = datanest.DatabaseContainer()
# subject_db, trial_db, and action_db are existing datanest.Database instances
dbc.add("subject", subject_db)
dbc.add("trial", trial_db, "subject", lambda trial_id: trial_id[:2])
dbc.add("action", action_db, "trial", lambda action_id: action_id[:3])
# Query at any level — subject metadata filters trials and actions too
dbc(subject=3) # all subject-3 trials/actions
dbc(action_phase='extension') # subset of action rows
dbc.heart_rate(age_lim=(50, 60)) # data field added at any level
DatabaseContainer uses the same _lim / _has / _any suffix conventions as Database. Add data fields to the child databases directly (trial_db.add_data_field(...)); the container makes them queryable at any level via dbc.<field>(...).
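The role of the key-derivation functions can be sketched in plain Python: each child id is mapped to its parent id, so a row at a lower level can inherit metadata from every level above it. Everything in this sketch is hypothetical, including the tuple-based id scheme, the tables, and the helper names. It illustrates the idea only and does not reproduce how DatabaseContainer is implemented.

```python
# Hypothetical metadata tables, one per level, keyed by row id.
subject_meta = {"s03": {"age": 57.0}, "s07": {"age": 41.0}}
trial_meta = {("s03", 1): {"condition": "rest"}, ("s07", 1): {"condition": "rest"}}
action_meta = {
    ("s03", 1, "a"): {"action_phase": "extension"},
    ("s03", 1, "b"): {"action_phase": "flexion"},
    ("s07", 1, "a"): {"action_phase": "extension"},
}
tables = {"subject": subject_meta, "trial": trial_meta, "action": action_meta}

# child level -> (parent level, function mapping a child id to its parent id)
parents = {
    "trial": ("subject", lambda trial_id: trial_id[0]),
    "action": ("trial", lambda action_id: action_id[:2]),
}


def inherited_metadata(level, row_id):
    """Merge a row's own metadata with the metadata of all its ancestors."""
    merged = dict(tables[level][row_id])
    while level in parents:
        parent_level, to_parent = parents[level]
        row_id = to_parent(row_id)
        # Child values win if a column name is repeated across levels.
        merged = {**tables[parent_level][row_id], **merged}
        level = parent_level
    return merged


def actions_where(**query):
    """Return action ids whose own or inherited metadata equals every filter."""
    return [
        action_id
        for action_id in action_meta
        if all(
            inherited_metadata("action", action_id).get(column) == value
            for column, value in query.items()
        )
    ]


print(actions_where(age=57.0, action_phase="extension"))
```

Because each action row sees its trial and subject columns, a subject-level filter such as age narrows the action rows as well.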
Caching expensive computations
datanest.cache_me_if_you_can and cache_me_if_you_can_incremental are
dill-backed file-cache decorators. Use them to skip recomputation when
loading or summarizing data fields is expensive. Both accept an optional
suffix callable that receives the wrapped function's (*args, **kwargs)
and returns a string inserted between the cache file's stem and extension —
handy for one cache file per input.
from datanest import cache_me_if_you_can, cache_me_if_you_can_incremental
# One cache file per subject (e.g. heart_rate_s03.pkl)
@cache_me_if_you_can("heart_rate.pkl", suffix=lambda subject_id: f"_s{subject_id:02d}")
def load_heart_rate(subject_id):
    return expensive_load(subject_id)  # runs once per subject_id
# Build up a {trial_id: metrics} dict across many calls, persisting on disk
@cache_me_if_you_can_incremental("trial_metrics.pkl", return_name="ret", return_default={})
def summarize_trial(trial_id, ret=None):
    if trial_id not in ret:
        ret[trial_id] = compute_metrics(trial_id)
    return ret
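For readers curious about the general pattern, the following is a minimal sketch of a file-cache decorator with a suffix callable. It uses the standard-library pickle module instead of dill so that it runs without extra dependencies. The name file_cache and all of its behavior are assumptions for illustration and should not be read as the source of cache_me_if_you_can.

```python
import functools
import pickle
import tempfile
from pathlib import Path


def file_cache(filename, suffix=None):
    """Cache a function's return value in a pickle file (hypothetical helper)."""
    base = Path(filename)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            path = base
            if suffix is not None:
                # Insert the suffix between the file's stem and its extension.
                path = base.with_name(base.stem + suffix(*args, **kwargs) + base.suffix)
            if path.exists():
                # Cache hit: load the stored result instead of recomputing.
                with path.open("rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with path.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


calls = []  # records every real computation, to show when the cache is used
cache_dir = Path(tempfile.mkdtemp())


@file_cache(cache_dir / "heart_rate.pkl", suffix=lambda subject_id: f"_s{subject_id:02d}")
def load_heart_rate(subject_id):
    calls.append(subject_id)
    return [60 + subject_id] * 3  # stand-in for an expensive load


load_heart_rate(3)
load_heart_rate(3)  # read back from heart_rate_s03.pkl, no recomputation
load_heart_rate(4)  # different suffix, so a separate cache file
print(calls)
```

A sketch like this never invalidates its cache, so a stale file has to be deleted by hand when the underlying data changes.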
License
datanest is distributed under the terms of the MIT license.
Contact
Project Link: https://github.com/praneethnamburi/datanest
Acknowledgments
This tool was developed as part of the ImmersionToolbox initiative at the MIT.nano Immersion Lab. Thanks to NCSOFT for supporting this initiative.