Skip to main content

Generated from aind-library-template

Project description

ZOMBIE Squirrel

License Code Style semantic-release: angular Interrogate Coverage Python

Logo (image from ChatGPT)

zombie-squirrel is a set of one-line functions that handle the entire process of caching and retrieving data (and metadata) from AIND data assets.

In the background, the ZOMBIE squirrel repackages data/metadata into dataframes and stores them on S3 in versioned folders (data-asset-cache/zs-v{version}/), or in memory for testing. Each release writes to its own versioned folder, so older versions of the website remain accessible while new versions are deployed. A top-level data-asset-cache/zombie-squirrels.json index lists all available version folders.

Installation

pip install zombie-squirrel

Usage

Set backend

export FOREST_TYPE='S3'

Options are 'S3', 'MEMORY'.

Scurry (fetch) data

from zombie_squirrel import unique_project_names

project_names = unique_project_names()

Acorns

get_squirrel_info returns the following information about all available acorns. Paths are versioned — {version} is the installed zombie-squirrel package version (e.g. 0.27.3).

Acorn Description Location Type Partitioned Columns
unique_project_names Unique project names across all assets s3://allen-data-views/data-asset-cache/zs-v{version}/unique_project_names.pqt metadata False project_name
unique_subject_ids Unique subject_ids across all assets s3://allen-data-views/data-asset-cache/zs-v{version}/unique_subject_ids.pqt metadata False subject_id
unique_genotypes Unique genotypes across all assets where subject.subject_details.genotype is present s3://allen-data-views/data-asset-cache/zs-v{version}/unique_genotypes.pqt metadata False genotype
asset_basics Commonly used asset metadata, one row per data asset s3://allen-data-views/data-asset-cache/zs-v{version}/asset_basics.pqt metadata False _id, _last_modified, modalities, project_name, data_level, subject_id, acquisition_start_time, acquisition_end_time, code_ocean, process_date, genotype, age, acquisition_type, location, name, experimenters, experimenters_normalized, instrument_id, instrument_id_normalized, investigators, investigators_normalized
source_data Mapping from derived asset names to their source raw asset names s3://allen-data-views/data-asset-cache/zs-v{version}/source_data.pqt metadata False name, source_data, pipeline_name, processing_time
quality_control Quality control table with one row per QC metric, partitioned by subject_id s3://allen-data-views/data-asset-cache/zs-v{version}/qc/ asset True (by subject_id) name, stage, modality, value, status, asset_name
platform_qc Tag-level QC statuses aggregated per platform, one row per asset/tag combination s3://allen-data-views/data-asset-cache/zs-v{version}/platform_qc/ platform True (by platform) asset_name, tag, status, timestamp, instrument_id_normalized, experimenters_normalized
assets_smartspim SmartSPIM assets with processing status and neuroglancer links, one row per (asset, channel) s3://allen-data-views/data-asset-cache/zs-v{version}/assets_smartspim.pqt metadata False name, raw_name, processed, institution, processing_end_time, stitched_link, raw_link, channel, segmentation_link, quantification_link, alignment_link
platform_fib Fiber photometry assets in long form, one row per asset/fiber/channel combination s3://allen-data-views/data-asset-cache/zs-v{version}/platform_fib.pqt metadata False asset_name, fiber, patch_cord, channel, intended_measurement, targeted_structure
foraging_sessions Foraging behavior sessions with key performance metrics, one row per session s3://allen-data-views/data-asset-cache/zs-v{version}/foraging_sessions.pqt metadata False subject_id, session_date, session, nwb_suffix, rig, trainer, trainer_normalized, task, curriculum_name, curriculum_version, current_stage_actual, foraging_eff, foraging_eff_random_seed, finished_trials, finished_rate, total_trials, bias_naive
behavior_curriculum Behavior assets with curriculum name and stage, one row per behavior asset s3://allen-data-views/data-asset-cache/zs-v{version}/behavior_curriculum.pqt asset False asset_name, curriculum_name, stage_name, stage_node_id

Hive-partitioned acorns use key=value directory segments, enabling DuckDB queries like:

import duckdb
duckdb.query("""
    SELECT * FROM read_parquet(
        's3://allen-data-views/data-asset-cache/zs-v0.27.3/qc/**',
        hive_partitioning=true,
        union_by_name=true
    )
""")

The squirrel.json registry lives at s3://allen-data-views/data-asset-cache/zs-v{version}/squirrel.json. The top-level s3://allen-data-views/data-asset-cache/zombie-squirrels.json lists all available version folders as a JSON array.

The raw_to_derived function is not a table stored in S3, instead it is used by passing an asset_name (or list of asset names) and a modality. The function returns the latest derived asset matching the requested pattern.

Custom acorn

The custom function allows you to store and retrieve your own user-defined DataFrames in the cache by name. This requires write authentication to the active backend.

from zombie_squirrel import custom
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3]})
custom("my_data", df)

retrieved_df = custom("my_data")

Hide all the acorns

We run a nightly capsule on Code Ocean with this code to hide all acorns (not the custom ones).

from zombie_squirrel.sync import hide_acorns
hide_acorns()

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zombie_squirrel-0.28.4.tar.gz (36.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

zombie_squirrel-0.28.4-py3-none-any.whl (40.6 kB view details)

Uploaded Python 3

File details

Details for the file zombie_squirrel-0.28.4.tar.gz.

File metadata

  • Download URL: zombie_squirrel-0.28.4.tar.gz
  • Upload date:
  • Size: 36.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for zombie_squirrel-0.28.4.tar.gz
Algorithm Hash digest
SHA256 0674c899d79b22f480a55ebf58204eb9bfa71787540ae97325d4057ab836e73d
MD5 a9de391d92c65b2c11f89ad44945cda3
BLAKE2b-256 48bf5a3e5f64f5e0f31208d3c362e1d90057395947c63c46ded35fc986660c52

See more details on using hashes here.

File details

Details for the file zombie_squirrel-0.28.4-py3-none-any.whl.

File metadata

File hashes

Hashes for zombie_squirrel-0.28.4-py3-none-any.whl
Algorithm Hash digest
SHA256 dd225b4b0f43382950ec50fd561bbd553b855fe6eba6832f0582b6ce209242c2
MD5 551edffd5ee69c17b7a5672ac83b5b0d
BLAKE2b-256 9398026e471522ceaf0461d56e049dee904b050d1faa9b31a2d3d9756ca082a3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page