object-oriented N-dimensional data processing with notebook functionality
Project description
To learn more about pyspecdata, you can head over to the documentation.
If you already know that you want to install, and you are using Anaconda, you should see conda_upgrade.md.
Please note that this package is heavily utilized by three other packages that our lab manages on GitHub.
We have recently added fast compiled Fortran functions for tasks like 2D ILT (Tikhonov regularization with basis-set compression) for NMR (Nuclear Magnetic Resonance), so please read the installation instructions carefully!
pySpecData
Object-oriented Python package for processing spectral data – or, in general, n-dimensional data with labeled axes (i.e. N-dimensional gridded data, or “nddata”). It depends on numpy, which provides very fast manipulation of N-dimensional gridded arrays (“ndarray”). This package has some overlap with xarray, but it doesn’t attempt to mimic pandas notation, aiming instead for very compact notation for natural slicing, etc. It mainly focuses on making it easier to quickly write good code for processing spectroscopy data. In particular, it takes care of various features related to Fourier transformation, error propagation, and direct products in multidimensional data with little to no interaction from the user.
If you are working in a lab developing new spectroscopic methodologies, then this package is definitely for you. If you deal with multi-dimensional data of some other form, it is likely useful for you as well.
Features
Labeled axes allow one to manipulate datasets (potentially with different dimensions) without having to explicitly keep track of what the different dimensions correspond to. Code becomes more legible, and tiling, direct-product, and gridding functions become unnecessary.
Fourier transformation with automatic manipulation of axes.
Automatic error propagation.
Commands like plot(data) will generate a plot with automatically labeled axes, errors, and units. All of this information is also written to HDF5 when the data is saved.
Simplified curve fitting that takes advantage of labeled axes and Python’s symbolic algebra package (sympy).
The code is written so that it can be integrated into a nicely formatted PDF lab notebook.
The same code can be run on the command line (to generate pop-up plot windows) and embedded into a LaTeX document.
Extension to other output formats, such as HTML or markdown, should be relatively straightforward.
In a multimedia environment like jupyter, you don’t need a separate plot command. The code can automatically choose a plotting style appropriate to the code (eventually, the general preferences for this can just be configured at the beginning of the jupyter notebook).
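To make the labeled-axes idea concrete, here is a minimal sketch in plain numpy (this is not the pyspecdata API – the `aligned` helper and all names are hypothetical) of how matching dimensions by name turns tiling and direct products into simple broadcasting:

```python
import numpy as np

# Conceptual sketch: when dimensions carry *names*, a direct product is just
# broadcasting after aligning those names, with no manual tiling or reshaping.
def aligned(data, dims, all_dims):
    """Reshape `data` (whose dimensions are named by `dims`) so that it
    broadcasts against any array whose dimensions are ordered as `all_dims`."""
    shape = [data.shape[dims.index(d)] if d in dims else 1 for d in all_dims]
    return data.reshape(shape)

t1 = np.linspace(0.0, 1.0, 5)   # a dataset along a "t1" dimension
t2 = np.linspace(0.0, 1.0, 3)   # a dataset along a "t2" dimension
all_dims = ["t1", "t2"]

# The direct product: a 5x3 grid appears automatically from broadcasting.
grid = aligned(np.exp(-t1), ["t1"], all_dims) * aligned(np.exp(-2 * t2), ["t2"], all_dims)
print(grid.shape)  # (5, 3)
```

In pySpecData this bookkeeping (along with axis coordinates, units, and errors) is carried inside the nddata object itself, so the user never writes the alignment step by hand.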
More detailed web documentation will be coming soon.
NMR/ESR specific
Because it was written primarily for NMR and ESR data, it also includes:
Routines for reading commercial raw data (e.g. Bruker, Kea) into nddata objects with all relevant information.
The object-oriented features make it much easier to process raw phase-cycled data and to simultaneously view multiple (potentially interfering) coherence pathways.
Contains functions for baseline correction, peak integration, etc.
(Not yet in packaged version) A basic compiled routine for propagating density matrices that can be used to predict the response to shaped pulses.
Version Notes
Note that the current version is intended just for collaborators, etc. (Though, if you really do want to use it for interesting science, we are happy to work with you to make it work for your purposes.) A public-use version 1.0.0, to be accompanied by useful demonstrations, is planned within a year. (Note that the email currently linked to the PyPI account is infrequently checked – if you are interested in this software, please find J. Franck’s website and make contact via the email given there.)
History/Roadmap
(Current version marked below)
- 0.9.5
First version distributed on pypi.python.org.
- 0.9.5.1
0.9.5.1.1 Some important debugging, and also added pyspecdata.ipy → executing the following at the top of a jupyter notebook:
%pylab inline
%load_ext pyspecdata.ipy
will cause nddata to “display” as labeled plots.
0.9.5.1.2 added ability to load power saturation 2D data from Bruker
0.9.5.1.3 XEpr data loaded with dBm units rather than W units; added a to_ppm function for Bruker files.
0.9.5.1.4 Improved internal logging, and started to remove gratuitous dependencies; %load_ext pyspecdata.ipy now includes %pylab inline, so that only
%load_ext pyspecdata.ipy
is required for jupyter.
0.9.5.1.6
Removed several legacy modules, and added docstrings for the remaining modules.
Begin phasing out earlier CustomError class.
Make numpy pretty printing available from the general_functions module.
Add xelatex support to the notebook wrapper.
Start to move file search routines away from demanding a single “data directory.”
Improved support for 2D Bruker XEPR
Made it possible to call standard trig functions with nddata as an argument.
- 0.9.5.1.7
ILT (Tikhonov regularization) with SVD Kernel compression (1 and 2 dimensions)
smoosh and chunk deal with axes properly
- 0.9.5.3 (current version)
upgrade to Python 3 and begin to flesh out documentation
- 0.9.5.4
0.9.5.4.1 to_ppm should only be a method of the inherited class; 1.5D and 2.5D ILT.
- 1.0
We are working on four major upgrades relative to the 0.9 sequence:
Axes as objects rather than a set of separate attributes of nddata.
Remove dependence on pytables in favor of h5py.
Replace figure lists with “plotting contexts,” which will still enable PDF vs. GUI plotting, but will be better integrated with Qt and more object-oriented in nature.
Comma-separated indexing to work correctly with all indexing types. (0.9.5 requires sequential brackets rather than comma-separated indexing for some combined range selections.)
- 1.0.2
GUI for setting configuration directories.
Means for dealing with non-linearly spaced data in image plots (0.9.5 auto-detects log spacing in 1D plots, but treats image plots as linear – we will implement a linear spline interpolation algorithm).
- 1.0.3
Bruker DSP phase correction for raw data from newer versions of Topspin that is in sync with the code from nmrglue.
- 1.0.4
Package a make-less copy of lapack to allow a cross-platform build of density matrix propagation routines.
- 1.1.0
Integrate with ACERT NLSL Python package for simulation and fitting of ESR spectra.
- 1.2.0
Implement a version of figure list that can be interfaced with Qt.
Installation Notes
Highly Recommended: Install the following packages using a good package-management system (conda or linux package manager), rather than relying on pip or setuptools to install them:
numpy
scipy
sympy
pyqt
pytables (in future work, we hope to eliminate dependence on this package)
matplotlib
h5py
The python libraries, and a Fortran compiler. Under anaconda, these are supplied by libpython and mingw, respectively.
For example, on Windows with Anaconda 2.7, just run conda install -c anaconda numpy scipy sympy pyqt pytables matplotlib h5py libpython mingw.
On CentOS7, we’ve tested yum install python-matplotlib python-matplotlib-qt4 python-devel sympy h5py python-tables scipy (after running yum install epel-release to install the EPEL distribution)
(If you don’t install these packages with your system’s package manager, pip will try to install them, and there is a good chance it will fail – it’s known not to work well with several of these; setuptools should error out and tell you to install the packages.)
mayavi: Mayavi can be used (and gives very nice graphics), but frequently lags behind common Python distros. Therefore, this package was written so that it doesn’t depend on mayavi. Rather, you can just import mayavi.mlab and pass it to any figure list that you initialize: figlist_var(mlab = mayavi.mlab)
Installation for developers
(Once these are installed, to install from GitHub, just run git clone https://github.com/jmfranck/pyspecdata.git, move to the directory where setup.py lives, and run python setup_paramset.py install followed by python setup.py develop.)
Important note for conda on Windows 10: For reasons that we don’t understand, the Fortran compiler can give odd errors, depending on which terminal you are using to install. This appears to be Windows’ fault, rather than conda’s (?). We highly recommend trying both the Anaconda prompt, as well as the standard dos prompt (press start: type cmd) if you experience errors related to compilation.
For compiled extensions
- All compiled extensions are currently stripped out, but will be slowly
added back in.
If you are on windows, you will need some additional packages to enable compilation:
libpython
unxutils
mingw
The last two are specific to Windows, and provide things like the gcc and gfortran compiler, as well as make.
Quick-Start
To get started with this code:
Install a good Python 2.7 distribution
On Windows or MacOS: Anaconda 2.7. When installing select “install for all users.”
Install libraries that pyspecdata depends on. (If you’re interested in why you need to do this first, see installation notes below.)
On Windows or MacOS: in the Anaconda Prompt, run conda install numpy scipy sympy pyqt pytables matplotlib h5py libpython mingw.
For Mac, you can also use homebrew. Note that, in the current version python is renamed to python2, and pip to pip2. Most packages can just be installed with pip2 under homebrew. If you want HDF5 functionality, you will need to run brew tap homebrew/science followed by brew install hdf5.
On Linux, just use your package manager (aptitude, yum, etc.) to install these libraries.
Install paramset_pyspecdata: pip install paramset_pyspecdata, then pyspecdata: pip install pyspecdata or follow the “Installation for developers” section above.
If you have difficulties with the install, check that you have a gfortran compiler installed (in conda windows, this comes from mingw) and that, if you are using windows, you are trying to install from a standard dos prompt (we like to use git bash, but anaconda and related compilers can misbehave from git bash sometimes).
Set up directories. You can run the command pyspecdata_dataconfig to assist with this.
It creates a file in your home directory called _pyspecdata (Windows – note the underscore) or .pyspecdata (Mac or Linux).
Here is an example – you can copy and paste it as a starting point:
[General]
data_directory = c:/Users/yourusername/exp_data
notebook_directory = c:/Users/yourusername/notebook
Note that any backslashes are substituted with forward slashes. Also note that you will need to change the directories to refer to real directories that already exist, or that you create, on your hard drive (see below). Note that on Windows, you can use Notepad, etc. to create this file, but it cannot have a .txt or other extension.
Where is my “home directory”? (Where do I put the _pyspecdata file?)
On Windows, your home directory is likely something like C:\Users\yourusername. You can access your home directory by opening any file folder window, and starting to type your name in the address bar – it’s the first folder that shows up underneath.
On MacOS and Linux, it’s the directory indicated by ~. On Linux, this typically expands to /home/yourusername.
On any OS, you can always find your home directory in Python using import os; print(os.path.expanduser('~'))
What are these directories? → You can either create them or point to existing directories.
data_directory must be set. It is a directory, anywhere on the hard drive, where you store all your raw experimental data. It must contain at least one subdirectory – each subdirectory stores different “experiment types,” typically acquired on different instruments (e.g. you might have subdirectories named 400MHz_NMR, 500MHz_NMR, 95GHz_ESR, and Xband_ESR).
The library now supports having datasets packed into .zip or .tgz files. For example, Bruker NMR files typically comprise a directory with several subdirectories for the numbered experiments. We routinely pack these up as zip files on the spectrometer, and directly read the data from the zip files.
If you’re setting up a lab, you might want to sync each experiment-type folder separately using seafile.
Or you can sync the whole data directory with dropbox.
If set, the notebook_directory is intended to contain latex files with embedded python code, as well as some processed output.
Do not use quotes to surround the directory name. Even if it contains spaces, do not use quotes, and do not escape spaces with backslashes.
Note that on Windows, your desktop folder is typically in C:\Users\yourusername\Desktop
Why do I need to do this?
Setting this configuration allows you to move code between different computers (e.g. a spectrometer computer, a desktop, and a laptop), and re-use the same code, even though the locations of the files are changing. This should work even across different operating systems.
It specifically enables functions like find_file(...), get_datadir(...), etc. that can search the data directory for a file name matching some basic criteria. You should always use these to load your data, and never use the absolute path.
The GUI tool that will allow you to set up _pyspecdata by pointing and clicking has not yet been set up.
Notes on compilation of NNLS
We recently added a compiled extension that performs non-negative least-squares for regularization (DOSY/Relaxometry/etc.)
Under linux or mac, you should have a gcc and gfortran compiler installed, and should make sure you have libpython for this to work.
Under anaconda on Windows, we have sometimes run into trouble where the install fails with error 127. We recommend using the normal dos command prompt (cmd) to install pyspecdata, and making sure that your path is set such that where gcc yields a gcc.exe (NOT .bat) file and where python yields the anaconda python executable. (Recent versions of mingw appear to put .bat files in a preferential location in the path, and these .bat files seem to mess everything up, including compatibility with the git bash prompt.)
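For orientation, the regularization step that the compiled routine accelerates can be sketched in plain numpy. This is a conceptual illustration only – the real routine additionally enforces non-negativity (NNLS) and compresses the kernel with an SVD basis set, and none of the names below come from the pyspecdata API:

```python
import numpy as np

# Tikhonov regularization solves  min_x ||K x - b||^2 + lam**2 * ||x||^2,
# whose normal-equations form is  (K^T K + lam**2 I) x = K^T b.
rng = np.random.default_rng(0)
tau = np.linspace(0.01, 1.0, 50)            # relaxation-time axis (the "fit" axis)
t = np.linspace(0.0, 2.0, 40)               # measurement axis
K = np.exp(-np.outer(t, 1.0 / tau))         # exponential kernel K[i, j] = exp(-t_i / tau_j)
x_true = np.exp(-((tau - 0.3) ** 2) / 0.01) # a smooth "distribution" of relaxation times
b = K @ x_true + 1e-3 * rng.standard_normal(t.size)  # noisy synthetic measurement

lam = 0.1  # regularization parameter
x_reg = np.linalg.solve(K.T @ K + lam**2 * np.eye(tau.size), K.T @ b)
print(x_reg.shape)  # (50,)
```

Because the kernel is severely ill-conditioned, the lam**2 term is what keeps the inversion stable; the compiled Fortran extension exists because this solve must be repeated many times (e.g. across a 2D grid) in an ILT/relaxometry analysis.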