eons sample analyzer and manager
Project description
eons Sample Analysis and Manager
Generalized framework for scientific data analysis.
Design in short: Self-registering functors with reflection to and from json for use with arbitrary data structures.
Consider if you would like to design an analysis pipeline to share with your colleagues. All you have to do is create the functors and have your colleagues place them in their respective folders (no code change necessary on their part, since the new files will be automatically picked up). You can then pass your data as json between each other, potentially creating your own analysis steps, report outputs, etc., all of which could be shared later or kept as personalized as you'd like.
Built with eons using the eons build system
Installation
pip install esam
Usage
Quickstart: just go copy the example folder somewhere and run esam from that directory; then start hacking!
To use esam (or your own custom variant), you must first invent the universe.
Once that's done and you've installed the program on your computer, you'll need to create a workspace.
A workspace is any folder you'd like to store your data in, which also contains a sam
folder.
In the sam
folder should be the following sub-folders:
- analysis
- data
- format/input
- format/output
These folders will then be populated by your own data structures (Datum
), parsers (InputFormatFunctor
), report templates (OutputFormatFunctor
), and analysis steps (AnalysisFunctor
).
NOTE: it is not necessary to do anything besides place your files in these directories to use them. See below for more info on design (and technically, it doesn't matter which folder what file is in but the organization will help keep things consistent when publishing or sharing your work)
Example
Let's break down this command: esam -v -i *.txt -if in_ms -f mass --ignore 86 -s saved_ms-data.json -o out_ms-data_1.xlsx -of out_excel
The start is straightforward. esam -v
runs this program in verbose mode, which will print debug messages.
The next part might be a bit arcane. This is intended to be run on a unix system where *
will be expanded to all matches. So, in this case, -i *.txt
will take all files in the current directory that end in ".txt". That trick will be invaluable when working with datasets containing lots of raw files.
Continuing with the inputs, we have -if in_ms
. This is the esam magic. What we are doing is telling esam that all of our .txt files are in a format which can be parsed by the "in_ms" object, which is located in the format/inputs
folder and self-registers with our program when you run the command. Thus, this translates to "the input format for my input files is the in_ms format class". When working with your own data, you might want to take a look at in_ms and write your own version.
Next we use -f mass --ignore 86
to filter our data, by removing all entries with a mass of 86. Similar to how -if
is used to find the right class to parse our inputs, -f
must be used to find the right field in our data to filter by. Specifying -f
is necessary if you want to use --only
or --ignore
. Filtered data will not be saved or output. If you wish to consolidate inputs into a single dataset then filter, run esam twice: once to generate a saved file with all data and again to load that saved file, filter, and generate another saved file with the data you want. You can also do this with analysis steps so that you don't lose progress along the way.
You might have guessed what we do next. We save the file! Using -s saved_ms-data.json
generates a reusable json output. If you wanted to add data to this saved file, you could run this same command on different *.txt files and add -l saved_ms-data.json
, to load the previously saved data and then add to it.
Lastly, we generate our desired output with -o out_ms-data_1.xlsx -of out_excel
. This is exactly analogous to input parsing, except instead of reading in a file and populating data structures from the contents, we take data structures and write their fields in a way that shows us usable information.
Hopefully, that's enough to get you started. If you need more help, start with esam --help
and if you're still having trouble, reach out to us at support@eons.dev. We'll try our best to get back to you quickly :)
NOTE: out_excel requires that you pip install pandas openpyxl
and name your output file " SOMETHING .xlsx " (no spaces)
If you're curious, all the files names here follow the eons naming scheme
Design
Saving and Loading
In addition to having self-registering functors, provided by eons, esam provides reflection between python and json. Saving files thus allows you to retain everything from your original data, no matter how complex the initial analysis was.
As long as your Data
and Functors
(the classes you derive from esam.Datum
and eons.UserFunctor
or their children), have been placed in the proper folders, you'll be able to save, load, and thus, work with your data through json.
Saving and loading is handled by esam, rather than the downstream application. Saved files will always be .json (unless you fork this repository and change the ESAM base class).
Currently, jsonpickle is used for json reflection.
Functors
Functors are classes (objects) that have an invokable ()
operator, which allows you to treat them like functions.
esam uses functors to provide input, analysis, and output functionalities, which are made simple by classical inheritance.
The primary ways functors are used are:
- To digest input and store the contents of a file as workable data structures.
- To mutate stored data and do analytical work.
- To output stored data into a user-friendly report format.
Functors are also used to provide save and load functionality.
For extensibility, all functors take a **kwargs
argument. This allows you to provide arbitrary key word arguments (e.g. key="value") to your objects.
Self Registration
Normally, one has to import
the files they create into their "main" file in order to use them. That does not apply when using esam. Instead, you simply have to derive from an appropriate base class and then call eons.SelfRegistering.RegisterAllClassesInDirectory(...)
(which is done for you on the folder paths detailed above). Providing the directory of the file as the only argument, this will essentially import
all files in that directory and make them instantiable via eons.SelfRegistering("ClassName")
.
Example
For example, in some MyDatum.py
in a MyData
directory, you might have:
import logging
from esam import Datum
class MyDatum(Datum): #Datum is a useful child of SelfRegistering
def __init__(self, name="only relevant during direct instantiation"):
logging.info(f"init MyDatum")
super().__init__()
From our main.py, we can then use eons
to call:
import sys, os
from eons import SelfRegistering
SelfRegistering.RegisterAllClassesInDirectory(os.path.join(os.path.dirname(os.path.abspath(__file__)), "MyData"))
Here, we use os.path
to make the file path relevant to the project folder and not the current working directory.
Then, from main, etc. we can call:
myDatum = eons.SelfRegistering("MyDatum")
and we will get a MyDatum
object, derived from esam.Datum
, fully instantiated.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file esam-1.1.2.tar.gz
.
File metadata
- Download URL: esam-1.1.2.tar.gz
- Upload date:
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8c3d21c09fbd9964cd10f0e01203031a4fd94ec5de3ef48ab128f49248f2d176 |
|
MD5 | eab6a43362666bec7b6592de2957d895 |
|
BLAKE2b-256 | e29ec4704e415645c2149d8e5709f8c680eb3faa6d7b633365dda55c2d7a497d |
File details
Details for the file esam-1.1.2-py3-none-any.whl
.
File metadata
- Download URL: esam-1.1.2-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0a747247aa1a780851b16ab3fdc78bb8b92cbb6b11047730cc149caaec83a693 |
|
MD5 | 6c1c967b9cab71076a474d3b4fa92308 |
|
BLAKE2b-256 | 4a00f69b1217e8b15fdb3efade439e4dd568a49b44d345c695ef0e2a03029691 |