A GW data manager package and more
Project description
This package aims to provide a unified, easy-to-use interface for accessing Gravitational Wave (GW) data and producing well-organised datasets, ready to be used for Machine Learning projects or Data Analysis purposes (source properties, noise studies, etc.).
Data Preparation Workflow
The typical use case of this package is data acquisition and preparation, which can be seen, in a pipeline, as the preliminary stage for Data Analysis. Although it is primarily meant for GW data, it is built to be sufficiently generic to handle any data type. The basic workflow that can be implemented with this package is the following:
Data acquisition: Data from GW detectors can be fetched from different sources, such as local storage or, remotely, the Gravitational Wave Open Science Center (GWOSC). Local data can be read natively in GW frame file format (gwf) or in hdf5 format. Datasets preprocessed by GWdama are saved in the latter format. When combined with other Python packages, for example Pandas, other kinds of data can be read and manipulated with GWdama. In practice, everything that can be mapped to a NumPy type is a valid data format;
Organisation into groups: Raw and processed data from various acquisition channels can be organised hierarchically into groups and subgroups, each containing its own metadata and methods;
Data visualisation and pre-processing: This package includes some functions commonly used to visualise and pre-process data, such as filtering and spectral analysis methods. These operations can be performed in the “data preparation” stage of a pipeline, before storing the final dataset and/or forwarding it to the subsequent Data Analysis. The processed data is conveniently organised into new groups, and the “raw” ones can be removed to save memory;
Reading and Writing: Once the dataset for a specific task has been created, it can be saved to disk in hdf5 format, preserving the whole hierarchical structure of groups and sub-groups together with the metadata. The file can then be read back by GWdama for further data manipulation and preparation.
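As an illustration of the kind of spectral analysis mentioned in the pre-processing step above, the following is a minimal sketch of a windowed periodogram written with plain NumPy. It is not GWdama's own implementation: the function name periodogram and the synthetic 50 Hz sine wave are assumptions made for this example only.

```python
import numpy as np


def periodogram(x, fs):
    """One-sided power spectral density estimate with a Hann window.

    Hypothetical helper for illustration; not part of GWdama.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    window = np.hanning(n)
    # Remove the mean and taper the edges to limit spectral leakage
    xw = (x - x.mean()) * window
    spectrum = np.fft.rfft(xw)
    # Normalise by the sampling frequency and the window power
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))
    # Fold negative frequencies into the one-sided estimate
    if n % 2 == 0:
        psd[1:-1] *= 2  # DC and Nyquist bins are not doubled
    else:
        psd[1:] *= 2    # only the DC bin is not doubled
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


fs = 1000.0                           # sampling frequency in Hz (assumed)
t = np.arange(1000) / fs              # one second of samples
x = np.sin(2 * np.pi * 50.0 * t)      # synthetic 50 Hz tone
freqs, psd = periodogram(x, fs)
peak = freqs[np.argmax(psd)]          # frequency of the strongest bin
print(peak)
```

For this synthetic input the strongest bin is expected at the frequency of the injected tone. For real detector data an averaged estimate (for example Welch's method) is usually preferable to a single periodogram.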
GW data manager package overview
GWdama currently comprises the main class GwDataManager, which behaves as a multi-purpose and multi-format container for data. It is based on the h5py.File class, with additional methods and attributes to import and manipulate GW data. Unlike typical h5py.File objects, a GwDataManager instance by default occupies only a temporary file or some space in RAM, which is automatically deleted by Python once the program is closed. Refer to the full documentation for further details.
Inside GwDataManager objects, data is stored in Dataset objects, organised into a hierarchical structure of h5py.Groups and sub-groups. These Datasets are created within an instance of GwDataManager with the usual h5py method: create_dataset(name, shape, dtype). They contain data, typically of numeric type, and some attributes (or metadata). For example, for GW data, and for time series in general, it is important to record when the data was acquired and at which sampling frequency. A name and a unit are also useful. These can be conveniently added and customised. A GwDataManager object also has attributes of its own.
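Since GwDataManager builds on h5py.File, the layout described above can be previewed with plain h5py. The sketch below is an illustration under stated assumptions, not GWdama code: the group name raw, the dataset name strain and the attribute names t0, sample_rate and unit are hypothetical choices, and the in-memory core driver stands in for the volatile storage mentioned earlier.

```python
import h5py
import numpy as np

# driver="core" with backing_store=False keeps the file in RAM only,
# so nothing is written to disk when the file is closed.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("raw")                         # hypothetical group
    dset = grp.create_dataset("strain", data=np.zeros(40))

    # Metadata attached to the dataset (names are assumptions)
    dset.attrs["t0"] = 0             # start time, e.g. in GPS seconds
    dset.attrs["sample_rate"] = 10   # sampling frequency in Hz
    dset.attrs["unit"] = "dimensionless"

    # Metadata attached to the file itself
    f.attrs["owner"] = "example"

    # Collect the names of all groups and datasets in the hierarchy
    names = []
    f.visit(names.append)

    # Duration in seconds follows from the length and the sampling rate
    duration = dset.shape[0] / dset.attrs["sample_rate"]

print(names)
print(duration)
```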
Installation
GWdama can be installed via pip:
$ pip install gwdama
and requires Python 3.6.0 or higher. The previous command automatically installs all the required dependencies (such as numpy and matplotlib), so you are ready to start generating datasets and making plots.
Further details can be found in the documentation.
Alternatively, it can be installed via Conda, specifying fdirenzo as the channel (it will soon be available via conda-forge):
$ conda install -c fdirenzo gwdama
Quick start
A dataset of, say, random numbers can be readily created with the aid of numpy.random routines:
>>> from gwdama.io import GwDataManager
>>> import numpy as np
>>> dama = GwDataManager("my_dama")
>>> dama.create_dataset('random', data=np.random.normal(0, 1, (10,)))
The string representation of the GwDataManager class provides a quick look at its structure and its attributes:
>>> print(dama)
my_dama:
  └── random

  Attributes:
     dama_name : my_dama
    time_stamp : 20-07-28_19h36m47s
Other attributes can be added to both the GwDataManager object and the Datasets therein:
>>> dama.attrs['owner'] = 'Francesco'  # The new attribute "owner" is added with value "Francesco"
>>> dama.show_attrs
my_dama:
  └── random

  Attributes:
     dama_name : my_dama
         owner : Francesco
    time_stamp : 20-07-28_19h36m47s
Datasets can be accessed from their keys, as reported in the structure shown above, with a syntax similar to that for Python dictionaries:
>>> dset = dama['random']       # 'random' is the dataset key
>>> dset.attrs['t0'] = 0        # It is convenient to use GPS times
>>> dset.attrs['fsample'] = 10  # measured in Hz
>>> dset.show_attrs
fsample : 10
     t0 : 0
To get the data contained in this dataset, call its attribute data:
>>> dset.data
array([-0.73796689, -1.34206706, -0.97898291, -0.19846702, -0.85056961,
        0.20206334,  0.84720009,  0.19527366, -0.9246727 , -0.04808732])
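With t0 and fsample stored as attributes, the time of the n-th sample follows from t_n = t0 + n / fsample. The following is a minimal NumPy sketch using the values set above and a stand-in array in place of a real Dataset. The helper name time_axis is hypothetical and not part of GWdama.

```python
import numpy as np


def time_axis(n_samples, t0, fsample):
    """Return the time of each sample, in the same units as t0.

    Hypothetical helper for illustration; not part of GWdama.
    """
    return t0 + np.arange(n_samples) / fsample


data = np.zeros(10)                   # stand-in for the 10 random samples
t0, fsample = 0, 10                   # same values as the attributes above
times = time_axis(data.size, t0, fsample)
duration = data.size / fsample        # length of the series in seconds

print(times)
print(duration)
```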
Writing and reading datasets
So far, data is stored in temporary or volatile memory. To save it to disk, we can call the write_gwdama method of our GwDataManager object:
>>> out_f = 'out_dataset.h5'
>>> dama.write_gwdama(out_f)
Then remember to close your previous file before leaving the session:
>>> dama.close()
>>> del dama  # Redundant...
To read back the data:
>>> new_dama = GwDataManager(out_f)  # Same file name as above
Reading dama
>>> print(new_dama)
my_dama:
  └── random

  Attributes:
     dama_name : my_dama
         owner : Francesco
    time_stamp : 20-07-30_12h19m32s
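The same write and read-back cycle can be expressed with plain h5py, which may help when checking that metadata survives the round trip. This sketch writes its own file with h5py rather than with write_gwdama, so it says nothing about any extra structure GWdama may add. The temporary directory and the stand-in data are assumptions of the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Write to a throwaway directory so the example leaves the working dir clean
path = os.path.join(tempfile.mkdtemp(), "out_dataset.h5")

# Write a dataset with its attributes, plus one file-level attribute
with h5py.File(path, "w") as f:
    dset = f.create_dataset("random", data=np.arange(10.0))
    dset.attrs["t0"] = 0
    dset.attrs["fsample"] = 10
    f.attrs["owner"] = "Francesco"

# Read everything back from disk
with h5py.File(path, "r") as f:
    owner = f.attrs["owner"]
    fsample = f["random"].attrs["fsample"]
    values = f["random"][...]          # load the data into a NumPy array

print(owner, fsample, values.shape)
```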
Read open data
Open data can be accessed either online or from local virtual disks provided by CVMFS.
From online GWOSC
GW strain data can be read by means of the .read_gwdata() method. It takes as input a time interval, whose start and end can be given either as floats in GPS units or as UTC strings in a human-readable format (see the examples below), together with the label of the detector (H1, L1 or V1):
>>> event_gps = 1186746618  # GW170814
>>> dama = GwDataManager()  # Default name 'mydama' assigned
>>> dama.read_gwdata(event_gps - 50, event_gps + 10, ifo='L1',  # Required params
...                  data_source="gwosc-online",                # data source (optional, already implicit)
...                  dts_key='online')                          # group key (optional, but useful)
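Because the interval can be given in GPS seconds or as a UTC string, it can be handy to convert between the two by hand. The sketch below uses only the standard library and is not how GWdama performs the conversion. The function names are hypothetical, and the fixed offset of 18 leap seconds is an assumption that holds for 2017 data but not for dates before 2017-01-01.

```python
from datetime import datetime, timedelta, timezone

# GPS time starts at 1980-01-06 00:00:00 UTC and does not include leap seconds
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
LEAP_SECONDS = 18  # GPS minus UTC offset in effect during 2017 (assumption)


def utc_to_gps(utc_string, leap_seconds=LEAP_SECONDS):
    """Convert a 'YYYY-MM-DD HH:MM:SS' UTC string to GPS seconds."""
    utc = datetime.strptime(utc_string, "%Y-%m-%d %H:%M:%S")
    utc = utc.replace(tzinfo=timezone.utc)
    return (utc - GPS_EPOCH).total_seconds() + leap_seconds


def gps_to_utc(gps, leap_seconds=LEAP_SECONDS):
    """Convert GPS seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps - leap_seconds)


gps = utc_to_gps("2017-06-08 01:00:00")
back = gps_to_utc(gps)
print(gps, back.strftime("%Y-%m-%d %H:%M:%S"))
```

For production use, a maintained leap-second table (as provided by dedicated astronomy or GW packages) is safer than a hard-coded offset.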
From local CVMFS
CernVM-FS must be installed and configured on your computer. Refer to its description on the GWOSC website or to this Quick start guide.
Assuming your data are stored at the following path (you can always modify it by passing it as a parameter to read_gwdata()):
cvmfs_path = '/data2/cvmfs/gwosc.osgstorage.org/gwdata/'
data can be read with:
>>> start = '2017-06-08 01:00:00'  # starting UTC time as a string
>>> end = '2017-06-08 02:00:00'    # ending time as a string
>>> ifo = 'H1'                     # interferometer tag
>>> rate = '4k'                    # sample rate: 4k or 16k
>>> frmt = 'hdf5'                  # format of the data: gwf or hdf5
>>> dama.read_gwdata(start, end, data_source="gwosc-cvmfs", ifo=ifo, data_format=frmt)
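Before reading a long stretch of data, it can be useful to estimate how many samples the request will return. The sketch below assumes that the 4k and 16k labels correspond to 4096 Hz and 16384 Hz, as is usual for GWOSC strain files, and that samples are stored as 8-byte floats. The helper name expected_samples is hypothetical and not part of GWdama.

```python
from datetime import datetime

# Assumed mapping between the rate labels and sampling frequencies in Hz
SAMPLE_RATES_HZ = {"4k": 4096, "16k": 16384}


def expected_samples(start, end, rate):
    """Number of samples between two 'YYYY-MM-DD HH:MM:SS' UTC strings."""
    fmt = "%Y-%m-%d %H:%M:%S"
    span = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(span.total_seconds() * SAMPLE_RATES_HZ[rate])


n = expected_samples("2017-06-08 01:00:00", "2017-06-08 02:00:00", "4k")
size_mib = n * 8 / 2**20   # memory footprint assuming float64 samples

print(n, size_mib)
```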
Changelog
0.5.2
Added the optional return_output parameter to .read_gwdata(...) which, if True, makes the method return a Dataset or a Group. The corresponding data is added to the GwDataManager object in any case.
0.5.1
.plot() method for Dataset class. Mainly aimed at time series data, with t0 and sample_rate attributes;
0.5.0
If one of the ffl_spec, ffl_path or gwf_path parameters is passed to read_gwdata, then data_source is automatically set to local;
Some parameter names have been slightly simplified. E.g.: m_data_source -> data_source;
The hist method of Datasets (https://gwnoisehunt.gitlab.io/gwdama/dataset.html) now has an ax parameter to specify an existing matplotlib axes.
0.4.5
Added interface with GWpy;
Multi-Taper Method.
0.4.1
Methods: hist, duration;
Attributes: groups;
Preprocessing functions: PSD, whiten, taper.
0.4.0
Implemented support for data on Virgo Farm.