Scientific instruments produce proprietary binary data that contains a multitude of primary data and metadata. This project aims to create software that supports domain scientists in deciphering this data and metadata and in taking full advantage of all instrument measurements.

Documentation on software for deciphering proprietary binary data-files

MARBLE is free and open software and can be found in a repository.

Contributors

  • Steffen Brinckmann (IEK-2, FZJ) [Principal investigator]
  • Volker Hofmann (IAS-9 and HMC, FZJ)
  • Fiona D.Mello (IAS-9 and HMC, FZJ)

Introduction to proprietary binary data-files and MARBLE

All data in a file is stored sequentially, similar to a book in which the chapters follow one another. In proprietary binary files, the sections can have very different lengths: one section might contain only the name of the operator, while another contains thousands of temperature values. These files are called binary because they are not human-readable but consist of a sequence of 1s and 0s, and they are called proprietary because the instrument vendor designed the format specifically for its own company or even for a single instrument. As such, these files cannot easily be deciphered manually, and MARBLE supports the scientist in this task.
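The idea of sequential sections with different lengths can be sketched in a few lines of Python. The layout below (a 16-byte operator name, a 4-byte count, then a list of 8-byte floating-point temperatures) is invented for illustration only and does not describe any real instrument or the way MARBLE works internally.

```python
import struct

# Hypothetical layout, invented for this example:
#   bytes 0-15 : operator name, ASCII, padded with zero bytes
#   bytes 16-19: number of temperature values (little-endian unsigned 32-bit integer)
#   bytes 20-  : temperature values (little-endian 64-bit floats)


def build_example() -> bytes:
    """Create a small binary blob that follows the hypothetical layout."""
    name = b"Peter Smith".ljust(16, b"\x00")
    temperatures = [20.5, 21.0, 21.5]
    header = struct.pack("<I", len(temperatures))
    body = struct.pack(f"<{len(temperatures)}d", *temperatures)
    return name + header + body


def read_sections(blob: bytes):
    """Walk through the blob section by section, like chapters in a book."""
    offset = 0
    name = blob[offset:offset + 16].rstrip(b"\x00").decode("ascii")
    offset += 16
    (count,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    temperatures = list(struct.unpack_from(f"<{count}d", blob, offset))
    return name, temperatures


operator, temperatures = read_sections(build_example())
print(operator, temperatures)
```

In a real proprietary file, the section boundaries and data types are unknown at the start, which is exactly the information that has to be worked out during deciphering.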

MARBLE reads a proprietary binary file and, with the help of the scientist, outputs a Python converter. This converter can then be used to translate all proprietary binary files from this instrument into the HDF5 file format, which can be read easily from most programming languages. The Python converter also acts as a verification tool: if a binary file A can be converted by this specific converter, then file A comes from this instrument. This verification ability is helpful in finding files from a particular instrument.
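The verification idea can be illustrated with a minimal sketch: a converter only succeeds if the file has the structure it expects. The signature `DEMO` and the layout used here are hypothetical, and the converters generated by MARBLE may perform their checks differently.

```python
import struct

# Hypothetical marker at the start of the file; not taken from any real instrument.
EXPECTED_SIGNATURE = b"DEMO"


def can_convert(blob: bytes) -> bool:
    """Return True if the blob matches the hypothetical layout:
    4-byte signature, 4-byte value count, then count 64-bit floats."""
    if len(blob) < 8 or blob[:4] != EXPECTED_SIGNATURE:
        return False
    (count,) = struct.unpack_from("<I", blob, 4)
    # The file length must agree with the number of values announced in the header.
    return len(blob) == 8 + 8 * count


good = EXPECTED_SIGNATURE + struct.pack("<I", 2) + struct.pack("<2d", 1.0, 2.0)
bad = b"XXXX" + good[4:]

print(can_convert(good), can_convert(bad))
```

A file from a different instrument would typically fail one of these structural checks, which is what makes a converter usable as a filter for files of one particular origin.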

In MARBLE, data in proprietary binary files is grouped into classes:

  • Metadata is data that describes the experiment. Examples are the name of the operator, the instrument vendor, and the start time of the experiment. This metadata is commonly stored in key-value pairs. For instance, "operator" is the key and "Peter Smith" is the value; both form an inseparable pair. Generally, the first parts of proprietary binary files contain lots of metadata.
  • Primary data is the data that the operator wanted to measure, and it has the form of a list. Let's say we want to measure the temperature at our house every minute and store this information; then temperature is primary data and is stored in a long list. Generally, the instrument also saves the time elapsed since the start of the measurement and stores these time increments in a separate list, which is also primary data. Primary data can be of two types: floating-point numbers with normal precision or with high precision.
  • Undefined sections are those sections of the file which the scientist and MARBLE have not identified yet. These sections might be important or unimportant. Unimportant sections are those in which the programmer at the instrument vendor included garbage or empty space. They might also be linked to the specific programming language the instrument vendor used.
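The three classes can be pictured with a small sketch. The `Section` class and the `decode_primary` helper are hypothetical names for this example, and it is an assumption that "normal" and "high" precision correspond to 4-byte and 8-byte IEEE 754 floating-point numbers.

```python
import struct
from dataclasses import dataclass


@dataclass
class Section:
    kind: str      # "metadata", "primary" or "undefined"
    key: str       # label given by the scientist, empty if unknown
    value: object  # string, list of floats, or raw bytes


def decode_primary(raw: bytes, high_precision: bool) -> list:
    """Decode raw bytes as a list of little-endian floats.
    Assumption: normal precision = 4 bytes ("f"), high precision = 8 bytes ("d")."""
    code, size = ("d", 8) if high_precision else ("f", 4)
    count = len(raw) // size
    return list(struct.unpack(f"<{count}{code}", raw[:count * size]))


sections = [
    Section("metadata", "operator", "Peter Smith"),
    Section("primary", "temperature",
            decode_primary(struct.pack("<3f", 20.5, 21.0, 21.5), high_precision=False)),
    Section("primary", "time",
            decode_primary(struct.pack("<3d", 0.0, 60.0, 120.0), high_precision=True)),
    Section("undefined", "", b"\x00" * 12),
]

for section in sections:
    print(section.kind, section.key, section.value)
```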

How to install MARBLE

Selected versions of MARBLE are uploaded to the PyPI repository, and one can install MARBLE with

pip install pymarble

Alternatively, since MARBLE is under active development, one can install the latest version by following these steps in a terminal:

cd <new directory>
git clone https://jugit.fz-juelich.de/marble/software.git
cd software
poetry shell

To start the graphical user interface (GUI), run the following command in the terminal:

marble-gui

How to use the GUI

  1. Open a file by using the first button. Opening takes some time for large files because the file content is analysed automatically.
  2. After the automatic analysis, there are lots of undefined sections. It is good to filter these sections out by pressing the filter button (the second-to-last button).
  3. Now go through the sections and label them by entering a "key" and a "unit" where applicable.
    • Use the "draw" button for primary data, i.e. lists, because it helps you to identify them.
    • If you want the converter to use certain data, ensure that the "important" checkbox is ticked.
    • To keep track of your own progress, you can use the "certainty" traffic light: red if you are unsure, yellow if you are moderately sure, and green if you are very sure.
  4. For primary data in particular, you can move the beginning and end of a section by clicking the up-down button and then changing the start and length of the section. This dialog is aware of the binary structure and helps the user make sensible changes.
  5. Once you are done, click the save button (the last one) to save a Python converter into the directory of the proprietary binary file that you analysed.
  6. Go into the directory and use the Python converter to convert all files from this instrument by executing the command "python <converter.py> <proprietary_binary_file.dat>". You can add the "-v" option to the command to make the converter more verbose during translation. If all conversions are successful, you have deciphered this proprietary binary file format successfully.
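The last step can be automated for a whole directory. The following is a minimal sketch, not part of MARBLE: the function names, the default converter name `converter.py`, the `*.dat` pattern, the position of the "-v" option, and the assumption that a failed conversion ends with a non-zero exit code are all hypothetical and should be adapted to the actual converter.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(directory, converter="converter.py", pattern="*.dat", verbose=False):
    """Build one 'python <converter.py> <file>' command per matching file."""
    commands = []
    for data_file in sorted(Path(directory).glob(pattern)):
        command = [sys.executable, converter]
        if verbose:
            command.append("-v")  # assumed position of the verbose option
        command.append(data_file.name)  # relative name; commands run inside the directory
        commands.append(command)
    return commands


def convert_all(directory, **kwargs):
    """Run the converter on every file and return the names of files that failed.
    Assumption: the converter exits with a non-zero code when it cannot convert a file."""
    failed = []
    for command in build_commands(directory, **kwargs):
        result = subprocess.run(command, cwd=directory)
        if result.returncode != 0:
            failed.append(command[-1])
    return failed
```

Calling `convert_all("path/to/data", converter="my_converter.py")` would then return an empty list if every file in the directory was converted, which corresponds to the success criterion described in step 6.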

How to use the command line interface (CLI)

To start the CLI, run the following command in the terminal:

marble-cli

There are a number of tutorials for the CLI. You can read them and follow the commands. However, you can also simply execute them with the argument "m", without the quotation marks. All of these tutorials are in the form of Linux scripts, which are also used to verify the code at each development step.

Future features

  1. Add a GUI indicator that a primary section has a count and shape property
  2. Change the GUI to allow multiple tests in one file
  3. GUI for file-type metadata: instrument, software
    • More metadata from the user (key-values beyond instrument and software)
  4. Change the GUI and backend to identify and check images in files
  5. Directly push the translator to a repository
  6. Use the comparison of files to identify more metadata
    • Ensure that the Python import works by doing a diff of present and imported data
