Python library to batch scripts on a file-system workspace.

Sinagot

Sinagot is a Python library to batch multiple scripts on a file-system dataset with a simple API. Parallelization of data processing is made possible by Dask.distributed.

Installation

Sinagot is available on PyPI:

pip install sinagot

Full Documentation

https://sinagot.readthedocs.io

Concept

Sinagot is built around the sinagot.Workspace class. To create an instance, you must provide three paths:

  • A configuration file in .toml format.
  • A data folder.
  • A scripts folder.

The dataset is structured as a collection of records. A record is identified by a unique ID, but many files can be generated for a single record. Those files are processed by scripts, which generate other files as results.
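
As a rough illustration of the record concept (not Sinagot's own code), the sketch below groups the files of a temporary data folder by record ID. The file names, the `REC-` pattern and the `group_files_by_record` helper are assumptions made for this example; in a real workspace the layout is defined by the configuration file.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical ID pattern, modeled on the IDs used later in this README.
RECORD_ID = re.compile(r"REC-\d{8}")


def group_files_by_record(data_folder):
    """Map each record ID found in a file name to the names of its files."""
    groups = defaultdict(list)
    for path in sorted(Path(data_folder).rglob("*")):
        match = RECORD_ID.search(path.name)
        if path.is_file() and match:
            groups[match.group()].append(path.name)
    return dict(groups)


with tempfile.TemporaryDirectory() as data_folder:
    # Hypothetical raw and result files for two records.
    for name in (
        "REC-20200601-raw.csv",
        "REC-20200601-count.csv",
        "REC-20200602-raw.csv",
    ):
        (Path(data_folder) / name).write_text("")
    files_by_record = group_files_by_record(data_folder)

print(files_by_record)
```

One record ID can map to several files, which is the relationship described above: raw files plus the result files that scripts produce from them.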

Basic example

Harbor workspace

The "example" folder of the git repository contains the harbor workspace, which has one record per day of harbor occupancy. In this example, a record is created each day to count the boats that occupy the harbor. The record ID includes a timestamp for the day of recording.

In a Unix environment, you can type this to get the workspace:

wget -qO- https://github.com/YannBeauxis/sinagot/raw/master/example/harbor.tar.gz | tar xvz

To create the workspace instance:

>>> from sinagot import Workspace
>>> ws = Workspace('/path/to/harbor/workspace/folder')
>>> ws
<Workspace instance>

Explore records

You can list all record IDs:

>>> list(ws.records.iter_ids())
['REC-20200602', 'REC-20200603', 'REC-20200601']

To create a Record instance for a specific record:

>>> ws.records.get('REC-20200603')
<Record instance | id: REC-20200603>

Or the first record found:

>>> ws.records.first()
<Record instance | id: REC-20200602>

Records are not sorted by their IDs.
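
If an ordered listing is needed, the IDs can be sorted after the fact. The sketch below uses the ID list from the output above as a literal instead of a live workspace, and assumes that the part after `REC-` is the recording date in YYYYMMDD form, so that plain string sorting is also chronological.

```python
from datetime import datetime

# IDs copied from the listing above, in the order they were returned.
ids = ['REC-20200602', 'REC-20200603', 'REC-20200601']

# Lexicographic order matches date order for zero-padded YYYYMMDD stamps.
sorted_ids = sorted(ids)

# Assumption: the part after "REC-" is the recording date.
dates = [
    datetime.strptime(record_id.split('-', 1)[1], '%Y%m%d').date()
    for record_id in sorted_ids
]

print(sorted_ids)
print(dates[0].isoformat())
```

With a workspace at hand, the same idea would be `sorted(ws.records.iter_ids())`.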

Run scripts

You can run all scripts for each record of the dataset:

>>> ws.steps.run()
REC-20200602 | 2020-08-20 11:19:11,530 | count : Init run
REC-20200602 | 2020-08-20 11:19:11,531 | count : Processing run
REC-20200602 | 2020-08-20 11:19:11,556 | count : Run finished
...
REC-20200601 | 2020-08-20 11:19:11,625 | mean : Init run
REC-20200601 | 2020-08-20 11:19:11,626 | mean : Processing run
REC-20200601 | 2020-08-20 11:19:11,634 | mean : Run finished

Or for a single record:

>>> ws.records.get('REC-20200603').steps.run()
REC-20200603 | 2020-08-20 11:28:32,588 | count : Init run
REC-20200603 | 2020-08-20 11:28:32,590 | count : Processing run
REC-20200603 | 2020-08-20 11:28:32,616 | count : Run finished
REC-20200603 | 2020-08-20 11:28:32,619 | mean : Init run
REC-20200603 | 2020-08-20 11:28:32,621 | mean : Processing run
REC-20200603 | 2020-08-20 11:28:32,637 | mean : Run finished
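
The logs above show each step going through "Init run", "Processing run" and "Run finished". The sketch below imitates that sequence with plain functions, using a standard-library thread pool as a stand-in for Dask.distributed. It is not Sinagot's implementation: `run_step`, `run_record` and the idea that records can be processed independently are assumptions, and only the step names and log phases come from the output above.

```python
from concurrent.futures import ThreadPoolExecutor

STEPS = ("count", "mean")  # step names taken from the log output above
PHASES = ("Init run", "Processing run", "Run finished")


def run_step(record_id, step):
    """Return the log lines of one step (the actual processing is omitted)."""
    return [f"{record_id} | {step} : {phase}" for phase in PHASES]


def run_record(record_id):
    """Run every step of one record in order and collect its log lines."""
    lines = []
    for step in STEPS:
        lines.extend(run_step(record_id, step))
    return lines


record_ids = ["REC-20200601", "REC-20200602", "REC-20200603"]

# Assumption: records do not depend on each other, so they can run in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    logs = dict(zip(record_ids, pool.map(run_record, record_ids)))

for line in logs["REC-20200603"]:
    print(line)
```

Within a record the steps stay sequential ("count" before "mean"), while separate records are handed to different workers.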

More complex dataset

You can handle more complex dataset structures with the task and modality concepts. During a recording session for a single record, data can be generated for different tasks, and each task can generate different kinds of data, called modalities.

SoNeTAA use case

The idea for Sinagot emerged from the data management of an EEG platform called SoNeTAA: https://research.pasteur.fr/en/project/sonetaa/

For documentation purposes, the SoNeTAA workspace structure is used as an example.

On SoNeTAA, a record has an ID with timestamp information in the format REC-[YYMMDD]-[A-Z], for example "REC-200331-A".
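
A minimal sketch of how such an ID could be validated and split with the standard library. `parse_record_id` is a hypothetical helper written for this README, not part of Sinagot, and reading the six digits as a two-digit year, month and day follows the format stated above.

```python
import re
from datetime import datetime

# Pattern for the REC-[YYMMDD]-[A-Z] format described above.
SONETAA_ID = re.compile(r"REC-(\d{6})-([A-Z])")


def parse_record_id(record_id):
    """Return (recording day, letter suffix), or raise ValueError."""
    match = SONETAA_ID.fullmatch(record_id)
    if match is None:
        raise ValueError(f"not a SoNeTAA record ID: {record_id!r}")
    stamp, suffix = match.groups()
    return datetime.strptime(stamp, "%y%m%d").date(), suffix


day, suffix = parse_record_id("REC-200331-A")
print(day.isoformat(), suffix)
```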

For a record, 3 tasks are performed:

  • "RS" for Resting State
  • "MMN" for MisMatch Negativity
  • "HDC" for Human Dynamic Clamp.

Three modalities handle data depending on the task:

  • For each task, the "EEG" modality creates data from the electroencephalogram.
  • A "behavior" modality creates data only for the HDC task.
  • A "clinical" modality handles data used for every task.
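
The lists above can be written down as a small mapping to enumerate the valid task and modality pairs. This is only an illustration in plain Python, not the Sinagot configuration format; treating "clinical" as attached to each of the three tasks is an assumption based on "used for every task".

```python
# Tasks and modalities as listed above.
TASKS = ("RS", "MMN", "HDC")
MODALITY_TASKS = {
    "EEG": TASKS,          # EEG data is created for each task
    "behavior": ("HDC",),  # behavior data exists only for the HDC task
    "clinical": TASKS,     # assumption: clinical data is attached to every task
}

# Every valid (task, modality) pair.
units = [
    (task, modality)
    for modality, tasks in MODALITY_TASKS.items()
    for task in tasks
]

for unit in units:
    print(unit)
```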

Explore by task or modality

Each record collection or single record has subscopes corresponding to its tasks and modalities, accessible as attributes.

For example, to select only the RS task of the dataset:

>>> ws.RS
<RecordCollection instance | task: RS, modality: None>

A dataset subscope is a RecordCollection.

Or the EEG modality of a record:

>>> rec = ws.records.get('REC-200331-A')
>>> rec.EEG
<Record instance | id: REC-200331-A, task: None, modality: EEG>

You can select a specific pair of task and modality (called a unit):

>>> ws.RS.EEG
<RecordCollection instance | task: RS, modality: EEG>
>>> ws.EEG.RS
<RecordCollection instance | task: RS, modality: EEG>
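
To make the order-independent selection more concrete, here is a toy model of attribute-based subscopes built on `__getattr__`. The `Scope` class is hypothetical and says nothing about how Sinagot implements this internally; it only reproduces the behavior shown above, where `RS.EEG` and `EEG.RS` lead to the same unit.

```python
class Scope:
    """Toy scope that narrows by task or modality through attribute access."""

    TASKS = ("RS", "MMN", "HDC")
    MODALITIES = ("EEG", "behavior", "clinical")

    def __init__(self, task=None, modality=None):
        self.task = task
        self.modality = modality

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in self.TASKS and self.task is None:
            return Scope(task=name, modality=self.modality)
        if name in self.MODALITIES and self.modality is None:
            return Scope(task=self.task, modality=name)
        raise AttributeError(name)

    def __repr__(self):
        return f"<Scope | task: {self.task}, modality: {self.modality}>"


root = Scope()
print(root.RS)
print(root.RS.EEG)
print(root.EEG.RS)
```

Each attribute access returns a new, narrower scope and leaves the original untouched, so a scope that already has a task rejects a second one with `AttributeError`.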
