
eons Sample Analysis and Manager


Generalized framework for scientific data analysis.

Design in short: Self-registering functors with reflection to and from json for use with arbitrary data structures.

Suppose you would like to design an analysis pipeline to share with your colleagues. All you have to do is create the functors and have your colleagues place them in their respective folders (no code changes necessary on their part, since new files are picked up automatically). You can then pass your data back and forth as json, each of you potentially creating your own analysis steps, report outputs, etc., all of which can be shared later or kept as personalized as you'd like.

Built with eons using the eons build system

Installation

pip install esam

Usage

Quickstart: just go copy the example folder somewhere and run esam from that directory; then start hacking!

To use esam (or your own custom variant), you must first invent the universe. Once that's done and you've installed the program on your computer, you'll need to create a workspace. A workspace is any folder you'd like to store your data in, which also contains a sam folder. In the sam folder should be the following sub-folders:

  • analysis
  • data
  • format/input
  • format/output

These folders will then be populated by your own data structures (Datum), parsers (InputFormatFunctor), report templates (OutputFormatFunctor), and analysis steps (AnalysisFunctor).
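
For reference, a workspace might end up looking something like this (the specific file names are illustrative):

my_workspace/
├── raw_scan_1.txt            <- raw input files you point -i at
├── saved_ms-data.json        <- reusable saves produced with -s
└── sam/
    ├── analysis/             <- AnalysisFunctor classes
    ├── data/                 <- Datum classes
    └── format/
        ├── input/            <- InputFormatFunctor classes (e.g. in_ms)
        └── output/           <- OutputFormatFunctor classes (e.g. out_excel)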

NOTE: you don't need to do anything besides place your files in these directories to use them. See below for more info on the design. (Technically, it doesn't matter which file goes in which folder, but this organization will help keep things consistent when publishing or sharing your work.)

Example

Let's break down this command: esam -v -i *.txt -if in_ms -f mass --ignore 86 -s saved_ms-data.json -o out_ms-data_1.xlsx -of out_excel

The start is straightforward. esam -v runs this program in verbose mode, which will print debug messages.
The next part might be a bit arcane. This is intended to be run on a unix system where * will be expanded to all matches. So, in this case, -i *.txt will take all files in the current directory that end in ".txt". That trick will be invaluable when working with datasets containing lots of raw files.
Continuing with the inputs, we have -if in_ms. This is the esam magic. What we are doing is telling esam that all of our .txt files are in a format which can be parsed by the "in_ms" object, which lives in the format/input folder and self-registers with our program when you run the command. Thus, this translates to "the input format for my input files is the in_ms format class". When working with your own data, you might want to take a look at in_ms and write your own version.
Next we use -f mass --ignore 86 to filter our data by removing all entries with a mass of 86. Similar to how -if is used to find the right class to parse our inputs, -f is used to find the right field in our data to filter by. Specifying -f is necessary if you want to use --only or --ignore. Filtered data will not be saved or output. If you wish to consolidate inputs into a single dataset and then filter, run esam twice: once to generate a saved file with all data, and again to load that saved file, filter, and generate another saved file with only the data you want (see the example below). You can also do this with analysis steps so that you don't lose progress along the way.
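
That two-pass workflow might look something like the following (only flags shown elsewhere in this document are used; the saved file names are illustrative):

# First pass: parse all raw files and save everything, unfiltered.
esam -i *.txt -if in_ms -s all_ms-data.json
# Second pass: load the consolidated save, drop entries with mass 86, and save the result.
esam -l all_ms-data.json -f mass --ignore 86 -s filtered_ms-data.json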
You might have guessed what we do next. We save the file! Using -s saved_ms-data.json generates a reusable json output. If you wanted to add data to this saved file, you could run this same command on different *.txt files and add -l saved_ms-data.json, to load the previously saved data and then add to it.
Lastly, we generate our desired output with -o out_ms-data_1.xlsx -of out_excel. This is exactly analogous to input parsing, except instead of reading in a file and populating data structures from the contents, we take data structures and write their fields in a way that shows us usable information.

Hopefully, that's enough to get you started. If you need more help, start with esam --help and if you're still having trouble, reach out to us at support@eons.dev. We'll try our best to get back to you quickly :)

NOTE: out_excel requires that you pip install pandas openpyxl and give your output file a .xlsx extension (e.g. out_ms-data_1.xlsx).

If you're curious, all the file names here follow the eons naming scheme.

Design

Saving and Loading

In addition to the self-registering functors provided by eons, esam provides reflection between python and json. Saving files thus allows you to retain everything from your original data, no matter how complex the initial analysis was. As long as your Data and Functors (the classes you derive from esam.Datum and eons.UserFunctor or their children) have been placed in the proper folders, you'll be able to save, load, and thus work with your data through json.

Saving and loading is handled by esam, rather than the downstream application. Saved files will always be .json (unless you fork this repository and change the ESAM base class).

Currently, jsonpickle is used for json reflection.
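
esam drives this for you, but for intuition, here is roughly what jsonpickle reflection looks like on an ordinary Python object (the Sample class below is just a stand-in, not an esam type):

import jsonpickle

class Sample:
    """Stand-in for a Datum-like data structure."""
    def __init__(self, name, mass):
        self.name = name
        self.mass = mass

original = Sample("caffeine", 194.19)
frozen = jsonpickle.encode(original)   # a json string capturing the object's fields
thawed = jsonpickle.decode(frozen)     # a Sample instance rebuilt from that string
print(thawed.name, thawed.mass)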

Functors

Functors are classes (objects) with an invocable () operator, which allows you to treat them like functions. esam uses functors to provide input, analysis, and output functionalities, which are made simple by classical inheritance.

The primary ways functors are used are:

  1. To digest input and store the contents of a file as workable data structures.
  2. To mutate stored data and do analytical work.
  3. To output stored data into a user-friendly report format.

Functors are also used to provide save and load functionality.

For extensibility, all functors take a **kwargs argument. This allows you to provide arbitrary keyword arguments (e.g. key="value") to your objects.
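
If the pattern is unfamiliar, here is a minimal, framework-free sketch of a functor in plain Python; esam's actual base classes (eons.UserFunctor and its children) add self-registration and much more on top of this idea:

class Doubler:
    """A bare-bones functor: an object you can call like a function."""
    def __call__(self, data, **kwargs):
        # **kwargs lets callers pass arbitrary options, e.g. verbose=True.
        if kwargs.get("verbose"):
            print(f"Doubling {len(data)} values")
        return [value * 2 for value in data]

double = Doubler()
print(double([1, 2, 3], verbose=True))  # [2, 4, 6]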

Self Registration

Normally, you have to import the files you create into your "main" file in order to use them. That does not apply when using esam. Instead, you simply derive from an appropriate base class and then call eons.SelfRegistering.RegisterAllClassesInDirectory(...) (which is done for you on the folder paths detailed above). Given a directory as its only argument, this call essentially imports all files in that directory and makes their classes instantiable via eons.SelfRegistering("ClassName").

Example

For example, in some MyDatum.py in a MyData directory, you might have:

import logging
from esam import Datum

class MyDatum(Datum): # Datum is a useful child of SelfRegistering
    def __init__(self, name="only relevant during direct instantiation"):
        logging.info("init MyDatum")
        super().__init__()

From our main.py, we can then use eons to call:

import os
from eons import SelfRegistering

SelfRegistering.RegisterAllClassesInDirectory(os.path.join(os.path.dirname(os.path.abspath(__file__)), "MyData"))

Here, we use os.path to make the file path relative to the project folder rather than the current working directory.
Then, from main, etc. we can call:

myDatum = SelfRegistering("MyDatum")

and we will get a MyDatum object, derived from esam.Datum, fully instantiated.
