Skip to main content

Command-line client tools for EPICS-CA data processing

Project description

About CAmagick / CAspy

pipeline status coverage report Latest Release

[TOC]

CAspy is the Swiss Army-Knife in your SCADA toolbox; your silver bullet intended for Monday-morning Integration, i.e. rapid, ad-hoc, last-minute installation of features and equipment into an existing setup:

  • 3rd-party measurement devices
  • superficial data processing and enhancements
  • live visualizaion and storage options.

It is focused primarily on interaction with the EPICS Channel-Access protocol, but there is nothing fundamentally tying CAspy to EPICS. In fact, it's rapidly expanding to support transfer protocols like Tango and various data backends like HDF5, Zarr, Tiled etc.

The intended audience are beamline scientists, engineers, researchers tasked with operating and maintaining experimental endstations at synchrotrons, accelerators, universities, reactors and other research facilities.

CAmagick is the name of the Python package of which the CAspy application is part of.

Quick Introduction

Download & First Start

Either through PyPI:

$ pip install camagick[all]
$ caspy -h

Or by downloading the Docker/Podman image:

podman run -ti --rm registry.gitlab.com/kmc3-xpp/camagick:latest -h

This would produce an output like the following:

usage: caspy [options] <pipe>

Swiss Army Knife for scientific data collection.

options:
  -h, --help  show this help message and exit
  -version    prints version information and exits
  -list-all   list available data flow and processing pipe modules
  -save SAVE  don't execute the pipeline, just dump command line arguments as YAML to stdout
  -yaml YAML  load processing pipeline from YAML file

Try `-list-all` for a list of processors.

CAspy has an extensive (if yet a bit unpolished) built-in help system. Feel free to explore, starting with its entry point, the -list-all option.

Quick And Simple Examples to Try Out

CAspy needs data, for instance from EPICS process variables (PVs) available in your local network. If you don't have any EPICS PVs readily available, explore the Ad-Hoc Random Data IOC chapter of this document for information on how to rapidly spin up an IOC with test values.

In these examples, we'll use the following variables for demonstration (feel free to replace by your own PVs):

  • EXAMPLE:iffy - a floating-point scalar value, e.g. as you'd receive from a sensor or a single-value detector,

  • EXAMPLE:jedda - an array of 2048 floating-point values, e.g. as you'd receive them from an image detector, an oscilloscpoe or any other of array-based detector.

Visualization of EPICS Data

  • Ploting the array data:

    caspy --from-epics EXAMPLE:jedda --to-plot
    

    (Press Ctrl+C in the terminal to make it stop.)

  • Storing the evolution of the scalar value in a loop-buffered array and plotting:

    caspy --from-epics iffy prefix=EXAMPLE: --stack --to-plot
    
  • Reshaping array data as a 64x32 image and 2D plotting:

    caspy --from-epics EXAMPLE:jedda --reshape 64 32 --to-plot
    

    The --reshape filter expects as many unnamed arguments as dimensions the dataset should be reshaped to.

  • Screenshots for the above

    Ploting Jedda{width=30%} Jedda as Image{width=30%} Plottin Iffy Evolution{width=30%}

Data Processing and Filtering

  • You've already been implicitly introduced to numeric data processing with numpy in an earlier example. Here we're showing more examples of using numpy functions, cortesy of the --npfun processing filter. The hard rules are:

    • only functions in the main numpy. Python namespace are allowed,
    • they must accept one array as their first argument and deliver one array as their return value,
    • they may accept other optional arguments (named or direct). For instance, summing over an array:
    $ caspy --from-epics EXAMPLE:jedda --npfun sum --to-summary
    EXAMPLE_jedda: 1037.1276589787378
    EXAMPLE_jedda: 1049.6264480584964
    ...
    

    Or summing over one particular axis of a 2D array:

    $ caspy --from-epics EXAMPLE:jedda --reshape 64 32 --npfun sum axis=1 --to-summary
    EXAMPLE_jedda: shape=(64,) <class 'float'>
    EXAMPLE_jedda: shape=(64,) <class 'float'>
    ...
    

    Or a different operation -- say, building a gradient:

    $ caspy --from-epics EXAMPLE:jedda --npfun gradient --to-summary
    EXAMPLE_jedda: shape=(2048,) <class 'float'>
    EXAMPLE_jedda: shape=(2048,) <class 'float'>
    EXAMPLE_jedda: shape=(2048,) <class 'float'...
    

    Note that in the last example, numpy.gradient might return a single numpy.ndarray, or a tuple of arrays -- depending on the dimensionality. Using the --npfun gradient filter on a multi-dimensional array is undefined behavior in CAspy.

    By the way, here we've introduced one of the simplest, yet most useful, built-in debugging tools of CAspy -- the --to-summary data sink. The name gives it away: it produces a summary of each of the data containers within the processing pipeline. For scalar values, it prints the value itself. For arrays, it prints quick info on the shape and data type.

  • Array slicing: calculating region of interests (ROIs).

    Consider this:

    $ caspy --from-epics EXAMPLE:jedda --reshape 64 32 --slice 0 --to-summary
    EXAMPLE_jedda: shape=(32,) <class 'float'>
    EXAMPLE_jedda: shape=(32,) <class 'float'> ...
    

    Or this:

    $ caspy --from-epics EXAMPLE:jedda --reshape 64 32 --slice 0:10 3:8 --to-summary
    EXAMPLE_jedda: shape=(10, 3) <class 'float'>
    EXAMPLE_jedda: shape=(10, 3) <class 'float'> ...
    

    What happened here is we selected a subset of the 64x32 "jedda" image, and this results in corresponding 1D or 2D sub-arrays -- or "regions of interest" (ROIs), as physicists love to call them.

    Adding an --npfun sum filter...

    $ caspy --from-epics EXAMPLE:jedda \
            --reshape 64 32 \
            --slice 0:10 3:8 \
            --npfun sum 
            --to-summary
    

    actually produces the total intensity value within the respective ROI:

    EXAMPLE_jedda: 14.627766643611942
    EXAMPLE_jedda: 13.962308251634408
    EXAMPLE_jedda: 14.93059507458529
    ...
    

At the end of this subsection we'd like to take a moment and emphasize how CAspy is working internally: it is repeating the processing pipe specified on the command line, over and over again. See also this systematic approach for details. In all the examples -- the ones we've seen and the ones that still follow -- we've eventually

Data Storage to Various Formats

  • Saving to HDF5
    $ caspy --from-epics EXAMPLE:iffy EXAMPLE:jedda \
            --to-hdf5 my-data.h5#group/{tag} \
            --to-summary
    

The trailing --to-summary parameter is optional. But most output pipes, including the --to-hdf5 sink, produce the full path (outside and inside the HDF5 file) as a string, so the --to-summary will produce some output, showing that and where the data is going to:

./my-data.h5#group/EXAMPLE_jedda[None]  ./my-data.h5#group/EXAMPLE_iffy[None]
./my-data.h5#group/EXAMPLE_jedda[None]  ./my-data.h5#group/EXAMPLE_iffy[None]
...

You can inspect the HDF5 file for what's inside:

$ h5ls -r my-data.h5
/                        Group
/group                   Group
/group/EXAMPLE_iffy      Dataset {83/Inf}
/group/EXAMPLE_jedda     Dataset {83/Inf, 2048}
  • Saving to Zarr, analogously to the HDF5 example:

    $ caspy --from-epics EXAMPLE:iffy EXAMPLE:jedda \
            --to-zarr my-data.zarr#group/{tag}
    
  • Selecting a ROI and printing scalar values in a table:

    $ caspy --from-epics EXAMPLE:jedda \
            --reshape 64 32 \
            --slice 16:48 8:24 \
            --npfun sum \
            --to-summary mode=table
    

    Similarly to the ROI example this will integrate over a designated areay (this timeroughly 500 pixels) in the middle of the 64x32 image, producing numbers randomly fluctuating around the value 250:

    # EXAMPLE_jedda
    260.61935388471227
    246.3464424926541
    262.830143753871
    ...
    

    Remember that --to-summary mode=table is just a poor man's tabublar output. When used in lieu of a data storage backend, it will deliver expected results for scalar data only. For arrays, it will still just print a hint as to the shape and data type of the array, no actual data.

    See also the selective data processing and the more complex calculating ROIs examples.

Ad-Hoc-IOC: Receiving and Broadcasting Results

"Integration" only begins to be really fun when we can inject our own data into the EPICS network of your beamline.

  • Broadcasting ROIs of images as new PVs -- revisiting the previous example:

    $ caspy --from-epics EXAMPLE:jedda \
            --reshape 64 32 \
            --slice 16:48 8:24 \
            --npfun sum \
            --to-ioc prefix=RESULT:
    

    This will create an ad-hoc IOC, exporting RESULT:EXAMPLE_jedda as a new EPCIS variable, as we can easily check in another terminal:

    $ camonitor RESULT:EXAMPLE_jedda
    RESULT:EXAMPLE_jedda           2025-05-18 12:20:52.138180 248.042
    RESULT:EXAMPLE_jedda           2025-05-18 12:20:53.165549 243.405
    RESULT:EXAMPLE_jedda           2025-05-18 12:20:54.089684 245.534
    
  • Receiving specific data from an EPICS client is just as much fun, but it involves spinning up an ad-hoc IOC as a data source:

    $ caspy --from-ioc var="ene(0.0)" prefix=EXAMPLE: --to-summary
    var: 0.0
    

    The output is owing to the --to-summary data sink; but the IOC itself, exporting EXAMPLE:ene as a variable with initial value 0.0, exists with or without --to-summary to testify. As a next step, we can now interact with the IOC using any EPICS client, e.g. from another terminal:

    $ caget 
    $ caput EXAMPLE:ene 3.14
    $ caput EXAMPLE:ene 6.28
    

    All the while, the original caspy instance we used to spawn the IOC will testify to data entering the processing pipe:

    ...
    var: 3.14
    var: 6.28
    
  • A common pattern is receiving data e.g. as session or "scan" ID information, and then using it to build storage paths:

    $ caspy --from-ioc scan="scan(0)" prefix=EXAMPLE: hang=False \
            --from-epics EXAMPLE:jedda \
            --demote scan \
            --to-hdf5 my-data.h5#scan-{scan:03d}/data
    

    The IOC starts running with initial scan=0 input data, in addition to retrieving EPICS data from EXAMPLE:jedda. The hang=False parameter here instructs CAspy to not wait for a new value on each run of pipeline. Instead, as in the examples before, it's still only fresh values of EXAMPLE:jedda that trigger a new run through the pipeline (and thus a saving operation).

    After a brief while we can check the contents of my-data.h5. In our example, we'll realize that the /scan-000/data dataset now has 21 data points:

    $ h5ls -r my-data.h5
    /                        Group
    /scan-000                Group
    /scan-000/data           Dataset {21/Inf, 2048}
    

    A real-life scenario would now involve starting new scans every once in a while. We simulate this by incrementing the scan value every 10 seconds by virtue of a small Bash loop (in reality this would be done by your experiment orchestration system, e.g. Blyesky or Spec):

    $ for ((s=1; ;s++)); do caput $s; sleep 10; done
    

    If we check the the contents of the HDF5 target file later, we'll find a number of "scans", each wihin its own HDF5 group, with each dataset containing 10 data points (because our EXAMPLE:jedda spits out a new data point every second) -- except for the most recent "scan", that is, which will be somewhere in the middle of the acquisition process:

    $ h5ls -r my-data.h5
    /                        Group
    /scan-000                Group
    /scan-000/data           Dataset {381/Inf, 2048}
    /scan-001                Group
    /scan-001/data           Dataset {10/Inf, 2048}
    /scan-002
    /scan-002/data           Dataset {10/Inf, 2048}
    /scan-003
    /scan-003/data           Dataset {7/Inf, 2048}
    

Other Data Sources

  • Receiving data from Tango devices: CAspy implements a (as of now very simple, polling-based) Tango data source, using pytango. This in turn requires TANGO_HOST to be set to the IP and port of a central "Tango server" to connect to:

    $ export TANGO_HOST=10.128.14.84:1000
    $ caspy --from-tango devices/keithley/1 cur=current vol=voltage
    

    In this example, a device called "devices/keithley/1" is exposed by the Tango server at 10.128.14.84, and it offers the channels "voltage" and "current". These channels will be known as "vol" and "cur" in CAspy.

  • Receiving data from UNIX pipes: (WORK-IN-PROGRESS -- the idea here is to receive data via stdin, or from other processes; this would hugely enhance integration potential, as any device that can be read or written to with e.g. a simple shell command could immediately start to participate in the local EPICS network!)

  • Ophyd signals: CAspy can import and use Ophyd classes in subscription mode:

    $ caspy --from-ophyd ophyd.signal:EpicsSignalRO EXAMPLE:jedda --to-summary
    EXAMPLE_jedda: shape=(2048,) <class 'float'>
    EXAMPLE_jedda: shape=(2048,) <class 'float'>
    EXAMPLE_jedda: shape=(2048,) <class 'float'>
    ...
    

    Currently, only Ophyd EpicsSignal classes are supported simply because they are the only ones that offer a data event to begin with.

    (Ophyd Device classes are a WORK-IN-PROGRESS, and definitely fairly high up on the wish list, because the ability to support a cleanly written Ophyd device out of the box for data acquisition bears tremendous integration potential. If you have a use case, we're interested in hearing from you!)

Complex-Flow Examples

This was already a lot of fun! But up until this point, all data processing flow was linear: retrieve data, push it through various transformation pipes, sink it somewhere into storage. But the more interesting use cases involve splitting up the processing pipeline, by various criteria.

Out of the box CAspy supports two flow elements: the fanout alignment { ... } and the chaining [ ... ] alignment.

Chaining is what we've been doing all along: all elements in a chain are executed sequentially, with the output of the previous element being fed as input to the current one. The typical caspy --from-epicss ... --to-summary is, in fact, just a short version of:

$ caspy [ --from-epics ... --to-summary ]

Fanning out is parallel processing: it involves making N copies of its input data, and sending each of those into its own processing element; then, at the end of the fanout, all results are being reunited.

Parallel Operations on the Same Dataset

Multiple operations can be executed on the same input data by invoking CAspy like this:

$ caspy --from-epics ... { --npfun sum --npfun min --npfun max } ...

...would calculate all of sum, minimum and maximum value of whatever input data it receives.

Of course, chains and fanouts can -- and usually have -- to be combined:

$ caspy --from-epics EXAMPLE:jedda \
		{ [ --npfun sum --rename {}_s ] \
		  [ --npfun min --rename {}_m ] \
		  { --npfun max --rename {}_M ] } \
		--to-summary mode=table

This will generate the following output. Note the renaming of each variable, facilitated by the --rename filter, locally bound to its own subchain:

# EXAMPLE_jedda_s       EXAMPLE_jedda_m EXAMPLE_jedda_M
1046.1680891731307      0.0016799877355430093   0.9994537778760137
1043.0111802458446      0.0006056954049933339   0.9999756262387742
1015.1539282981603      1.4248315970077918e-05  0.9992211875197418
...

Selective Data Processing

Using fanout / chain combinations with appropriate filters, we can restrict processing steps to a subset of the input data:

$ caspy --from-epics EXAMPLE:jedda EXAMPLE:iffy \
        {                                \
          [ --only .*jedda --npfun sum ] \
          --only .*iffy                  \
        }                                \
		--to-summary mode=table
# EXAMPLE_jedda EXAMPLE_iffy
1033.2037471761835      0.932708565100116
1037.5297482365136      0.5966679885948641
1019.5874831187512      0.3284404328958187
...

Note that the 2nd branch of the fanout (--only .*iffy), being just a single pipe, doesn't need to be enclosed in chain [ ... ] brackets. (It wouldn't hurt if you did, though.)

Calculating and Broadcasting ROIs Alongside Other Values

An example close to a real-world application is receiving a data array, selecting a sub-region (a.k.a. ROI), publishing the ROI sum as an EPICS variable, and saving everything to a HDF5 file, while showing a plot of the data:

$ caspy --from-epics EXAMPLE:jedda     \
        {                              \
          [                            \
            --reshape 64 32            \
            --slice 16:48 8:24         \
            --npfun sum                \
            --rename {}_roi            \
          ]                            \
          --idem                       \
        }                              \
        {                              \
          [                            \
            --to-hdf5 my-data.h5#{tag} \
          ]                            \
          [                            \
            {                          \
              [ --only .*roi --stack ] \
              [ --exclude .*roi ]      \
            }                          \
            --to-plot                  \
          ]                            \
          [                            \
            --only .*roi               \
            --to-ioc prefix=EXAMPLE:   \
          ]                            \
        }

There are various tricks in the setup above. For instance:

  • the use of --idem, the "pass everythong through" filter, which in combination with { ... } (fanout) actually generates a copy of the data
  • the use of --stack to transform the ROI sum ino a a plottable 1D array
  • the strategy of breaking the whole operation in two consecutive "high-level" fanout structures: one that "sources" data, the other one that sinks it to two separate channels.

You can try to camonitor EXAMPLE:jedda_roi and convince yourself that the data is, indeed, getting published.

This would almost be a real-world example if it wasn't for one minor detail: typically, not all data needs to end up in a HDF5 file. In reality, we'd only want to save data when "going live", i.e. actually measuring data for posterity. But many operations that transmit data through EPICS are only ephemeral -- e.g. calibrating the equipment. This is demonstrated further below.

An Ad-Hoc Random Data IOC

CAspy can be a good tool to quickly spin up complete IOCs, from scratch, in nothing but a few lines of Bash code. For such an example, we'll define the following requirements:

  • we need 2 measurement signals ("iffy" and "jedda" -- why yes, the same ones we've been using in our example all along! :-)
  • for the sake of demonstration we'll use randomly generated data, but keep in mind that we could've received this data by any means supported by CAspy -- e.g. through Tango, a Unix pipe, or just read it from a text file via stdin
  • we want a "busy" marker PV, which should be set to 1 while the system is "working", and blink at least once to 0 once the system has published a new set of (randomly generated) data
  • we want this whole IOC to be written "in CAspy".

Consider this:

$ caspy --from-scratch random iffy=1 jedda=2048  \
        --to-ioc prefix=EXAMPLE:                 \
        --from-ioc prefix=EXAMPLE: mark="mark"   \
        --from-scratch zeros idle                \
        --to-epics idle=EXAMPLE:mark             \
        --pace Hz=1.0                            \
        --from-scratch ones busy                 \
        --to-epics busy=EXAMPLE:mark             \

This the output that is produced, for instance, when reading the EXAMPLE:iffy and EXAMPLE:mark PVs. We notice a clean "busy-data-idle" sequence repeating itself about once per second:

$ camonitor EXAMPLE:iffy EXAMPLE:mark
EXAMPLE:mark                   2025-05-18 20:43:38.685545 1
EXAMPLE:iffy                   2025-05-18 20:43:38.697051 0.464235
EXAMPLE:mark                   2025-05-18 20:43:38.708127 0
EXAMPLE:mark                   2025-05-18 20:43:39.687370 1
EXAMPLE:iffy                   2025-05-18 20:43:39.698541 0.194035
EXAMPLE:mark                   2025-05-18 20:43:39.709579 0
...

Let's understand how this works:

As to the other components, this is what makes the ad-hoc IOC tick:

  • The first two lines --from-scratch ... --to-ioc ... generate and publish the data payload. The --from-scratch source could actually have been replaced by a real data source, e.g. --from-pipe ....
  • The two lines acattered throughout, generating --from-scratch zeros idle, respectively --from-scratch ones busy, prepare our data for the :mark variable. You see, CAspy doesn't have the notion of "constants", "variables" or "operations"; it only has a data stream, and processors to work on it. The 1 and 0 to mark "busy" or "idle" need to come from somewhere -- this is where they come from! (Their placement within the installment is strategic, immediately before the --to-epics sink they're needed in. to avoid having to use selective filters.)
  • The 3rd line --from-ioc ... mark="mark" is how we're hosting the EXAMPLE:mark variable: as an IOC data source. The reason why it's not something else (e.g. an IOC sink) is because the way we write the busy-signal to it: we're sinking data into it while we pretend to be an EPICS client, during the final 3 lines (--to-epics ...).
  • The --pace Hz=1.0 is designed to slow down the loop to about 1/second. This is for demonstration purposes. The fact that it's placed exactly between the "idle" and "busy" write operations is mostly cosmetics.

There are also other possibilities of arranging the commands, in particular if making use of fanout { ... } ,chain [ ... ], and filtering (--only ..., --exclude ...) commands.

The file ./examples/ad-hoc.spy implements exactly this IOC. You can also start it with caspy -y ./examples/ad-hoc.spy :-)

Real-World Examples

Two-Way EPICS-Tango Bridge

Consider a Tango server on 10.128.14.84 port 10000, exposing a device device/keithley/1, with the following properties voltage, current and output. The first two are floating-point numbers we wish to be able to read; additionally, we want to control voltage from our EPICS-based experimental control system; and finally output is to toggle the main operation switch on/off, depending on the value it receives (1 or 0).

In CAspy, we'd implement this in two steps:

  • the reading part (Tango to EPICS):
    $ TANGO_HOST=10.128.14.84:10000 \
      caspy --from-tango devices/keithley/1 cur=current vol=voltage \
      	  --to-ioc prefix="KE:keith:" suffix=".RBV" 
    
  • the writing part, in a different terminal (EPICS to Tango):
    $ TANGO_HOST=10.128.14.84:10000 \
      caspy --from-ioc prefix="KE:keith:" vol="vol(0.0)" output="output(0)" \
            --to-tango dev="devices/keithley/1"  vol="voltage" output="output"
    

This allows us to interact with the Tango device, via EPICS, by reading the variables KE:keith:vol.RBV and KE:keith:cur.RBV, or by writing to KE:keith:output, respectively ...:vol or ...:cur:

$ caput KE:keith:output 0
Old : KE:keith:output                1
New : KE:keith:output                0

$ camonitor KE:keith:vol KE:keith:vol.RBV KE:keith:cur.RBV
KE:keith:vol                   2025-05-16 16:12:16.999676 0
KE:keith:vol.RBV               2025-05-16 16:12:18.427048 1.51175e-06
KE:keith:cur.RBV               2025-05-16 16:12:18.426978 -5.67967e-09
KE:keith:vol                   2025-05-16 16:49:49.793313 0.05
KE:keith:cur.RBV               2025-05-16 16:49:51.058931 -9.45977e-09
KE:keith:vol.RBV               2025-05-16 16:49:51.059145 -2.08196e-07
KE:keith:vol                   2025-05-16 16:56:41.466383 0.1
KE:keith:cur.RBV               2025-05-16 16:56:42.897010 -1.3475e-08
KE:keith:vol.RBV               2025-05-16 16:56:42.897155 2.11221e-08
...

The important part here is that caspy (currently) cannot do this in one single instance. This is owing to the nature of the problem: CAspy is intended to process its entire pipeline, over and over again. But the problem requires frequent periodic runs while retrieving Tango data, and only sporadic runs when setting parameters.

Storing Images and ROIs in "Scan-Mode" Only

We're revisiting the ROI calculating / IOC example from earlier. As demonstrated it is useful for live visualization purposes, but saving data in a production scenario is foiled by the fact that every image -- "true" measurements and alignment-only runs alike -- would end up in the HDF5 file.

To avoid this, we're extending the script by an ad-hoc IOC feature, expecting two variables as input:

  • scan: this is a numerical scan ID (simetimes also called "run", or "measurement plan"), as demonstrated earlier
  • frame: this is another numerical ID, only not for the scan, but for the data point (some people also call this "index") The idea is now that before every run, we need to update through EPICS the ...:scan PV; and before every single data point we acquire, we need to increment ...:frame. If frame is not set, the pipeline will not advance, and the data will not end up in the HDF5 file.

In addition to requesting our fake detector signal EXAMPLE:jedda, we'll wait for the EXAMPLE:mark value to jump to 0 before retrieving data. This will ensure that we don't prematurely save bogus data when frame is written to.

Let's look at the CAspy script:

$ caspy --from-ioc 'scan=scan(1)' 'frame=frame(0)' prefix=EXAMPLE:              \
        --demote scan frame                                                     \
        --from-epics EXAMPLE:jedda when=\"EXAMPLE:mark==0\"                     \
        --reshape 64 32                                                         \
		--to-hdf5 my-data.h5#scan-{scan:03d}/{tag} index=frame mode=a+ strict=0 \
		--to-summary mode=table

On the first run, this will generate the following output:

# EXAMPLE_jedda
my-data.h5#scan-001/EXAMPLE_jedda[0]

This means that the first available image has been written to disk. Now subsequent calls to caput EXAMPLE:frame <nr>, with increasing scan indices (1, 2, 3, ...), will let CAspy continue writing as frames pour in:

...
my-data.h5#scan-001/EXAMPLE_jedda[1]
my-data.h5#scan-001/EXAMPLE_jedda[2]
my-data.h5#scan-001/EXAMPLE_jedda[3]
...

If we change the scan number: caput EXAMPLE:scan 2, CAspy will continue writing to the same "frame index", but to a different data set:

my-data.h5#scan-002/EXAMPLE_jedda[3]
...

If we attempt to reset the frame index to a value that would overwrite existing data (e.g. caput EXAMPLE:frame 0), in the configuration above CAspy will cowardly refuse to do it, but will not crash (as it was instructed not to crash):

ERROR:camagick.sink.hdf5:msg="Overwrite not permitted" index=0 file=my-data.h5 frames=3 dataset="/scan-002/EXAMPLE_jedda" file=""
ERROR:camagick.sink.hdf5:msg="Ignoring frame index error as instructed" tag=None

Most of this functionality is owing to the specifics of the HDF5 saving module. Familiarize yourself with that -- it has many features that will happily load the gun for you if you wish to shoot yourself in the foot :-) But was designed with what we believed were "useful defaults" in mind.

Now that the CAspy command saves data as it's supposed to, we can extend it to generate a ROI, too, by inserting a suitably-placed fanout in the pipeline: { [ --slice 16:48 8:26 --npfun sum --rename {}_roi ] [ ] }

The full command then looks like this:

$ caspy --from-ioc 'scan=scan(1)' 'frame=frame(0)' prefix=EXAMPLE:              \
        --demote scan frame                                                     \
        --from-epics EXAMPLE:jedda when=\"EXAMPLE:mark==0\"                     \
        --reshape 64 32                                                         \
        { [ --slice 16:48 8:26 --npfun sum --rename {}_roi ]  [ ] }             \
        --to-hdf5 my-data.h5#scan-{scan:03d}/{tag} index=frame mode=a+ strict=0 \
        --to-summary mode=table

An empty chain [ ] works exacly like --idem: just passes data through, effectively creating a copy if placed alone in a fanout branch.

Concept and Design Principles

CLI Usage

  • command line, ... possibly GUI..., built-in documentation

Data Model

Processing Model

Command Line Structure

YAML Recipes

API Based Operation

A Simple Example

CAmagick can equally well be used inside Python scripts, by directly invoking its API. The API structure mimics the command line structure fairly directly, i.e. there are Python objects for:

  • data sources, by default in camagick.source.*
  • data sinks, by default in camagick.sink.*
  • data processing nodes, by default in camagick.pipe.*
  • flow-control nodes, by default in camagick.flow.*.

Every submodule in the respective category (e.g. camagick.source.epics, or camagick.pipe.slice) corresponds to a processing module available on the command line or the YAML specification, as described in earlier chapters (e.g. --from-epics, or --slice).

Within every module there is a class called Processor. After it is instantiated with the same options documented for its command line usage, they can be used out of the box. Here is a "Hello, World!"-class example:

#!/usr/bin/python3

from camagick.source.ioc import Processor as IocSource
from camagick.sink.summary import Processor as SummarySink
from camagick.flow.chain import Processor as ChainFlow
from camagick.flow.fanout import Processor as FanoutFlow

import asyncio

async def main():
    chain = ChainFlow(
        IocSource(foo="foo(0)", prefix="TEST:"),
        SummarySink()
    )

    await chain.startup()
	while True:
	    await chain(data={}, context={})
	await chain.shutdown()

asyncio.run(main())

In this example, we've demonstrated the main Processor features directly, namely:

  • initialization of every processing and/and flow control element, with the necessary parameters,
  • a .startup() async method, called only once for preparatory work (the flow control processors will recursively call the startup methods of their children nodes)
  • a .__call__() method, called once per loop, intendend to do the data-based work as advertised,
  • a .shutdown() method, to be called once before application shutdown.

A Slightly More Complex Example

The CAmagick API also offers a general-purpose object camagick.executor.PipeExecutor, which implements a very similar of this startup-call-shutdown loop -- just with more precautions and adequate error handling. Here's an example on how to use it, together with a slightly more complex processing chain, involving other flow elements:

#!/usr/bin/python3

from camagick.source.scratch import Processor as ScratchSource
from camagick.sink.plot import Processor as PlotSink
from camagick.pipe.reshape import Processor as ReshapePipe
from camagick.pipe.npfun import Processor as NpfunPipe
from camagick.pipe.stack import Processor as StackPipe
from camagick.pipe.slice import Processor as SlicePipe
from camagick.pipe.rename import Processor as RenamePipe
from camagick.flow.chain import Processor as ChainFlow
from camagick.flow.fanout import Processor as FanoutFlow

from camagick.executor import PipeExecutor

import asyncio

async def main():
    chain = ChainFlow(
        ScratchSource("random", foo=2048),
        FanoutFlow(
            RenamePipe("{}_original"),
            ChainFlow(
                ReshapePipe(64, 32),
                RenamePipe("{}_image")
            ),
            ChainFlow(
                SlicePipe(":"),
                NpfunPipe("sum"),
                RenamePipe("{}_roi"),
                StackPipe(),
            ),
        ),
        PlotSink()
    )

    await PipeExecutor(chain).run()

asyncio.run(main())

This would result in an application that resembles some version of our initial examples, only this time with a graphical Matplotlib output similar to the following:

API Example Screenshot{width=50%}

Internal Customization

The real beauty of the API usage comes into play when one starts defining their own sourcing, sinking or processing nodes. The easiest way to do that is by virtue of an async function, e.g. like this:

#!/usr/bin/python3

from camagick.sink.plot import Processor as PlotSink
from camagick.flow.chain import Processor as ChainFlow
from camagick.executor import PipeExecutor

import asyncio, time, numpy

## Some oscillation parameters
om = 0.5 * numpy.pi*2
kx = numpy.arange(0, 5*numpy.pi, 0.01)

async def my_source(data=None, context=None):
    return {
        'custom': numpy.sin(kx + time.time()*om)
    }

async def main():
    chain = ChainFlow(
        my_source,
        PlotSink()
    )
    await PipeExecutor(chain).run()

asyncio.run(main())

Here we've used a customized data source my_source. Normally this would be defined by subclassing camagick.processor.ProcessorBase, but in its most simple incarnation, this can also be defined as a simple async function. It is expected to return a dictionary of data.

Note that the custom source is listed in chain by its name only -- it is not called. The calling will be done by the PipeExecutor during its main loop behind .run(), repeatedly.

Here, it generates a time-dependent sinus with hard-coded parameters.

The output of the program should look like this, with the sinus plot gently sliding from right to left, with a speed of 2 seconds per period:

Sliding Sinus Screenshot{width=50%}

Bluesky Callbacks

[FIXME: to be completed, missing examples!]

A good application scenario for CAmagick's API use is to rapidly build Bluesky Callbacks that do more sophisticated data processing -- e.g. take live measurement data and publish rapid processing results and information to the EPICS network, or show rapid previews.

Miscellaneous

What is "Monday-Morning Integration"?

Everyone will agree: cutting-edge research is messy, from a technical point of view.

Yes, the typical experimental setup is in a shape that could be roughly describen as "stable, mostly". But the accepted reality of research is that minor adjustments are required constantly: more often than not, new equipment and measurement methods need to be (re-)introduced, adjusted, adapted, in order to accomodate new ideas, methods, and guest fellow scientists. Of course, all of this is for good reason -- it simply is a time-proven concept, as far as good science is concerned.

Also accepted as a reality of research is that, as a consequence of necessary ad-hoc modifications of an existing experimental setup, two circumstances are largely inevitable:

  • that integration of new equipment and measurement details often takes up many days at the beginning of a new experimental session -- the bulk of a typical week-long beamtime;
  • that afterwards the scientist needs to spend a considerable amount of their productive time piecing together their beamtime spoils: data from various .txt, .ini, .png files, scattered all over /tmp, ~/data or ~/user folders.

But we'd like to argue otherwise.

Planning and performing good science is daunting enough already. Purely technical challenges should be solved by engineers, not scientists. In particular:

  • new features and equipment should be "integrated" into the existing setup: notwithstanding the ad-hoc character, they should be 1st-class citizens with respect to control, experimental orchestration, and data storage;

  • for integration to be the norm rather than the exception, it should be finished swiftly on "monday-morning": not only "faster" than before, but actually on the first day of your endeavor; actually, even before your coffee gets cold.

The tongue-in-cheek "before coffee gets cold" remark can obviously be understood as a bit of an exaggeration... but can it, really? And why should it?

This is the question that drives us.

Obviously, there will be limits to how complex a method can be dealt with in an ad-hoc way. But in reality, many adaptations are very simple, boiling down to usage of a standard piece of equipment, in a standard way, or an additional step of simple post-processing yielding data that needs to be saved along with the main measurement signal. These are the low-hanging fruit that CAspy is trying to pluck.

Its usage paradigm, and its data & and processing models, are carefully designed to make simple tasks easy while keeping complex operations possible.

Limits

a.k.a. "what CAspy is not"

  • visualization framework
  • EPICS GUI / panel designer
  • experimental control or orchestration interface

Bugs & Caveats

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

camagick-0.5.1.tar.gz (738.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

camagick-0.5.1-py3-none-any.whl (131.4 kB view details)

Uploaded Python 3

File details

Details for the file camagick-0.5.1.tar.gz.

File metadata

  • Download URL: camagick-0.5.1.tar.gz
  • Upload date:
  • Size: 738.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for camagick-0.5.1.tar.gz
Algorithm Hash digest
SHA256 3fb60dfae1f8bda38a19b1280e6d5fcd6d35cd658ed5f487e45d5a930f1ba834
MD5 87e6138bd41ca28859942d60aeb36d16
BLAKE2b-256 4411d6e46acad96388f8f6657a3b747361ea46a84c9dd321713769535e7f81f5

See more details on using hashes here.

File details

Details for the file camagick-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: camagick-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 131.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for camagick-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1f760955fd80250ffd1d1cf971246103accf0ea079cf814d0d17d5a2b0879a82
MD5 e52cb890ec3028c2715c6c622cafde03
BLAKE2b-256 d257daee12828735c244d6a033954ea7ee5fde6dbf270f0115edf087fe8dfc59

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page