Package to remove cosmic spikes from Raman Spectra.

Spyky

Spyky incorporates the removal of cosmic spikes from Raman spectra, as described by Whitaker & Hayes [1], into a Python package compatible with sklearn pipelines and parameter optimization.

Reading .spc files

Spyky can read several .spc files stored in one location, using the spc-io package. Currently, Spyky only supports .spc files with a global X and a single Y array.

read_spc() returns: the spectra in an array of shape (n_files, n_wavelengths), meaning one row is one spectrum; the wavelengths (note that all .spc files must have the same wavelengths); the names of the read files.

>>> from spyky.reader import read_spc

>>> path = r"./spectra_bin"
>>> spectra, wavelength, names = read_spc(path)

>>> print(spectra)
[[  684.   721.   776. ... 22819. 22517. 22036.]
 [  667.   724.   770. ... 22575. 22275. 21819.]
 [  676.   726.   775. ... 22618. 22346. 21851.]]

>>> print(wavelength)
[ 32.1   34.4   36.6   ...   3286.9   3287.8   3288.7 ]

>>> print(names)
['example_file_1.spc', 'example_file_2.spc', 'test_file_1.spc']

By specifying pattern you can filter which files to read. By default all .spc files in the specified path are read. The expressions are matched by fnmatch.

>>> path = r"./spectra_bin"
>>> s, w, names = read_spc(path, pattern="example*.spc")
>>> print(names)
['example_file_1.spc', 'example_file_2.spc']

>>> s, w, names = read_spc(path, pattern="example*1*.spc")
>>> print(names)
['example_file_1.spc']
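
To try a pattern before pointing it at real files, you can test it against a list of names with Python's standard fnmatch module. The following is a minimal sketch using the file names from above; the helper select() is hypothetical and not part of Spyky, and "*.spc" only stands in for the default behaviour of reading every .spc file.

```python
from fnmatch import fnmatch

# File names taken from the read_spc() example above.
names = ["example_file_1.spc", "example_file_2.spc", "test_file_1.spc"]


def select(names, pattern):
    """Hypothetical helper: keep the names that match a shell-style pattern."""
    return [name for name in names if fnmatch(name, pattern)]


all_spc = select(names, "*.spc")                # every .spc file
examples = select(names, "example*.spc")        # names starting with "example"
first_example = select(names, "example*1*.spc")  # ... that also contain a "1"

print(all_spc)
print(examples)
print(first_example)
```

In fnmatch patterns, * matches any run of characters, ? matches a single character, and [seq] matches any one character in seq.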

There is also the option to export the read files as a .csv file by specifying export_to. The header of the .csv file will contain the wavelengths.

>>> s, w, names = read_spc(path, export_to=r"./spectra.csv")
>>> print(names)
['example_file_1.spc', 'example_file_2.spc', 'test_file_1.spc']

Spike Removal

The DeSpike class is written to integrate seamlessly into sklearn preprocessing pipelines and to be compatible with hyperparameter optimization such as GridSearchCV. It therefore implements .fit() and .transform() methods, each of which takes the spectra as input. First use .fit() to calculate the modified z-scores, then use .transform() to perform the correction, as explained in [1].

>>> from spyky.spikes import DeSpike

>>> spiky = DeSpike(window=5, threshold=6)
>>> spiky.fit(spectra)

>>> despiked = spiky.transform(spectra)
>>> print(despiked)
[[  802.75   721.     776.   ... 22819.   22517.   22982.  ]
 [  811.     731.     783.   ... 22947.   22662.   23119.6 ]
 [  796.5    724.     770.   ... 22575.   22275.   22719.6 ]]
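
For readers who want to see the idea behind the two steps, here is a minimal NumPy sketch of a modified z-score despiking scheme in the spirit of [1]. It is not Spyky's implementation: the function names modified_z_score() and despike_spectrum() are hypothetical, it handles a single 1-D spectrum, and the way flags are aligned with the spectrum and the choice of neighbours are assumptions made for illustration.

```python
import numpy as np


def modified_z_score(values):
    """Modified z-score based on the median and the median absolute deviation."""
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return 0.6745 * (values - median) / mad


def despike_spectrum(y, window=5, threshold=6):
    """Hypothetical helper: despike one spectrum given as a 1-D array."""
    y = np.asarray(y, dtype=float)
    # Score the first differences so that slow baseline changes do not count.
    z = np.abs(modified_z_score(np.diff(y)))
    # np.diff returns n - 1 values; pad so that flag i refers to y[i].
    is_spike = np.concatenate(([False], z > threshold))
    out = y.copy()
    for i in np.flatnonzero(is_spike):
        lo, hi = max(0, i - window), min(len(y), i + window + 1)
        neighbours = np.arange(lo, hi)
        good = neighbours[~is_spike[neighbours]]
        if good.size:  # leave the point unchanged if every neighbour is flagged
            out[i] = y[good].mean()
    return out


# Synthetic spectrum: a smooth curve with one artificial spike at index 100.
x = np.linspace(0, 6, 200)
clean = 100 + 10 * np.sin(x)
spiked = clean.copy()
spiked[100] += 500

despiked = despike_spectrum(spiked, window=5, threshold=6)
print(abs(despiked[100] - clean[100]) < 1)
```

A larger threshold flags fewer points, while a larger window averages over more neighbours when replacing a flagged point. Note that this sketch divides by the median absolute deviation, so it assumes the differences are not all identical.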

In a pipeline this might look like:

>>> from sklearn.pipeline import make_pipeline
>>> from spyky.reader import read_spc
>>> from spyky.spikes import DeSpike

>>> s, w, n = read_spc(r"./spectra_bin")

>>> pipe = make_pipeline(DeSpike(window=5, threshold=6))
>>> pipe.fit(s)
>>> corrected = pipe.transform(s)
>>> print(corrected)
[[  802.75   721.     776.   ... 22819.   22517.   22982.  ]
 [  811.     731.     783.   ... 22947.   22662.   23119.6 ]
 [  796.5    724.     770.   ... 22575.   22275.   22719.6 ]]

Use "despike__window" and "despike__threshold" to test different values through param_grid in GridSearchCV.

Sometimes the algorithm falsely identifies steep sections of a normal spectrum as spikes. If this happens, you can use ignore and ignore_ref to supply an array containing the wavelengths you want to be ignored. If you do not supply ignore_ref, the index of the wavenumber array is used instead. Please note that your input to ignore must match the values in ignore_ref; this is easiest to achieve through the use of a mask. Below you can see an example.

>>> wcut = (w > 500) & (w < 1000)
>>> spiky = DeSpike(threshold=3.3, ignore=w[wcut], ignore_ref=w)
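
The reason a mask is the easiest route is that the ignored values are then taken from the reference array itself, so they match it exactly. The sketch below illustrates this with a made-up wavenumber axis; the np.isin lookup only shows one way such values could be mapped back to positions and is an assumption, not a description of Spyky's internals.

```python
import numpy as np

# Hypothetical wavenumber axis standing in for the array returned by read_spc().
w = np.linspace(0, 2000, 21)  # 0, 100, ..., 2000

# Boolean mask for the region to protect; both bounds are exclusive.
wcut = (w > 500) & (w < 1000)
ignore = w[wcut]  # values copied from w, so they match it exactly

# One way such values could be mapped back to positions in the reference axis.
ignore_idx = np.flatnonzero(np.isin(w, ignore))

print(ignore)
print(ignore_idx)
```

Values typed in by hand, such as rounded wavelengths, may not be exactly equal to any entry of the reference array and would then fail this kind of exact lookup.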

References

[1] D. A. Whitaker and K. Hayes, "A simple algorithm for despiking Raman spectra," Chemometrics and Intelligent Laboratory Systems, vol. 179, pp. 82-84, Aug. 2018, doi: 10.1016/j.chemolab.2018.06.009.
