Package to remove cosmic spikes from Raman Spectra.

Spyky

Spyky incorporates the removal of cosmic spikes from Raman spectra, as described by Whitaker & Hayes [1], into a Python package compatible with sklearn pipelines and parameter optimization.

Reading .spc files

Spyky provides the ability to read several .spc files stored in one location, using the spc-io package. Currently spyky only supports .spc files with a global X and single Y array.

read_spc() returns: the spectra in an array of shape (n_files, n_wavelengths), meaning one row is one spectrum; the wavelengths (note that all .spc files must have the same wavelengths); the names of the read files.

>>> from spyky.reader import read_spc

>>> path = r"./spectra_bin"
>>> spectra, wavelength, names = read_spc(path)

>>> print(spectra)
[[  684.   721.   776. ... 22819. 22517. 22036.]
 [  667.   724.   770. ... 22575. 22275. 21819.]
 [  676.   726.   775. ... 22618. 22346. 21851.]]

>>> print(wavelength)
[ 32.1   34.4   36.6   ...   3286.9   3287.8   3288.7 ]

>>> print(names)
['example_file_1.spc', 'example_file_2.spc', 'test_file_1.spc']

By specifying pattern you can filter which files to read. By default all .spc files in the specified path are read. The expressions are matched by fnmatch.

>>> path = r"./spectra_bin"
>>> s, w, names = read_spc(path, pattern="example*.spc")
>>> print(names)
['example_file_1.spc', 'example_file_2.spc']

>>> s, w, names = read_spc(path, pattern="example*1*.spc")
>>> print(names)
['example_file_1.spc']
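The pattern matching described above can be tried out without any .spc files. The sketch below uses only the standard-library fnmatch module on the file names from the example output; filter_names is a hypothetical helper written for illustration and is not part of spyky.

```python
from fnmatch import fnmatch

# File names taken from the example output above.
names = ["example_file_1.spc", "example_file_2.spc", "test_file_1.spc"]


def filter_names(names, pattern="*.spc"):
    """Return the names that match a shell-style pattern (hypothetical helper)."""
    return [name for name in names if fnmatch(name, pattern)]


print(filter_names(names, "example*.spc"))
print(filter_names(names, "example*1*.spc"))
print(filter_names(names, "*_1.spc"))
```

In these patterns `*` matches any run of characters and `?` matches exactly one character, so "*_1.spc" selects the first file of both series.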

There is also the option to export the read files as a .csv file by specifying export_to. The header of the .csv file will contain the wavelengths.

>>> s, w, names = read_spc(path, export_to=r"./spectra.csv")
>>> print(names)
['example_file_1.spc', 'example_file_2.spc', 'test_file_1.spc']

Spike Removal

The class DeSpike is written so that it integrates into sklearn preprocessing pipelines and is compatible with hyperparameter optimization such as GridSearchCV. It therefore implements the .fit() and .transform() methods, each of which takes the spectra as input. First use .fit() to calculate the modified z-scores, then use .transform() to perform the correction, as explained in [1].

>>> from spyky.spikes import DeSpike

>>> spiky = DeSpike(window=5, threshold=6)
>>> spiky.fit(spectra)

>>> despiked = spiky.transform(spectra)
>>> print(despiked)
[[  802.75   721.     776.   ... 22819.   22517.   22982.  ]
 [  811.     731.     783.   ... 22947.   22662.   23119.6 ]
 [  796.5    724.     770.   ... 22575.   22275.   22719.6 ]]
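For readers who want to see the idea behind the two steps, here is a minimal NumPy sketch of a modified z-score despiking scheme in the spirit of [1]. It is not spyky's implementation: the function names, the handling of the first point, and the synthetic spectrum are assumptions made for illustration, and the sketch assumes the median absolute deviation is non-zero.

```python
import numpy as np


def modified_z_scores(spectrum):
    """Modified z-scores of the first differences of one spectrum."""
    diff = np.diff(spectrum)                 # x[i] - x[i-1], length n - 1
    median = np.median(diff)
    mad = np.median(np.abs(diff - median))   # median absolute deviation
    return 0.6745 * (diff - median) / mad    # assumes mad > 0


def despike(spectrum, window=5, threshold=6.0):
    """Replace flagged points by the mean of unflagged neighbours."""
    spectrum = np.asarray(spectrum, dtype=float)
    z = np.abs(modified_z_scores(spectrum))
    # Pad with False so that flag i refers to point i of the input. A spike
    # causes two large differences, so its right-hand neighbour is flagged too.
    is_spike = np.concatenate(([False], z > threshold))
    cleaned = spectrum.copy()
    for i in np.flatnonzero(is_spike):
        lo, hi = max(0, i - window), min(spectrum.size, i + window + 1)
        neighbours = spectrum[lo:hi][~is_spike[lo:hi]]
        if neighbours.size:                  # keep the point if no clean neighbour
            cleaned[i] = neighbours.mean()
    return cleaned


# Synthetic spectrum (made-up data): a smooth curve with one artificial spike.
reference = 100 + 10 * np.sin(np.linspace(0, 3, 50))
spiky_spectrum = reference.copy()
spiky_spectrum[20] += 500
cleaned = despike(spiky_spectrum, window=5, threshold=6)
print(np.abs(cleaned - reference).max())     # largest remaining deviation
```

Working on differences rather than raw intensities removes most of the slowly varying baseline, which is why a single threshold can separate narrow spikes from broad Raman bands. A lower threshold flags more points, and a larger window averages over more neighbours.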

In a pipeline this might look like:

>>> from sklearn.pipeline import make_pipeline
>>> from spyky.reader import read_spc
>>> from spyky.spikes import DeSpike

>>> s, w, n = read_spc(r"/home/arle/MSC/Code/spectra_bin/")

>>> pipe = make_pipeline(DeSpike(window=5, threshold=6))
>>> pipe.fit(s)
>>> corrected = pipe.transform(s)
>>> print(corrected)
[[  802.75   721.     776.   ... 22819.   22517.   22982.  ]
 [  811.     731.     783.   ... 22947.   22662.   23119.6 ]
 [  796.5    724.     770.   ... 22575.   22275.   22719.6 ]]

Use "despike__window" and "despike__threshold" to test different values through param_grid in GridSearchCV.
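The parameter names follow from how make_pipeline names its steps: each step gets the lowercased class name, and nested parameters are addressed as step__parameter. The sketch below shows this with a hypothetical stand-in class that only mimics the constructor arguments of DeSpike, so it runs without spyky installed; swap in spyky.spikes.DeSpike for real use. ParameterGrid is used here only to enumerate the combinations that GridSearchCV would try.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import make_pipeline


class DeSpike(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in so the sketch runs without spyky installed."""

    def __init__(self, window=5, threshold=6):
        self.window = window
        self.threshold = threshold

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X  # no-op; the real class performs the spike correction


pipe = make_pipeline(DeSpike(window=5, threshold=6))
step_name = pipe.steps[0][0]  # name generated by make_pipeline
print(step_name)

# Same dictionary shape that GridSearchCV expects for param_grid.
param_grid = {"despike__window": [3, 5, 7], "despike__threshold": [4, 6, 8]}

tried = []
for params in ParameterGrid(param_grid):
    pipe.set_params(**params)  # raises ValueError for an unknown key
    step = pipe.named_steps[step_name]
    tried.append((step.window, step.threshold))
print(len(tried), "combinations")
```

In a real search the pipeline would also need a final estimator or a scoring function, since GridSearchCV has to rank the candidate settings.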

Sometimes the algorithm falsely identifies steep sections of a normal spectrum as spikes. If this happens, you can use the ignore and ignore_ref parameters to supply an array containing the wavelengths you want to be ignored. If you do not supply ignore_ref, the index of the wavenumber array will be used. Please note that your input to ignore must match the values in ignore_ref; this is easiest to achieve with a mask, as in the example below.

>>> wcut = (w > 500) & (w < 1000)
>>> spiky = DeSpike(threshold=3.3, ignore=w[wcut], ignore_ref=w)
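The reason a mask is the safest way to build ignore can be checked with plain NumPy. The sketch below uses a hypothetical, evenly spaced wavenumber axis in place of the array returned by read_spc, and it makes no assumption about how spyky matches the values internally.

```python
import numpy as np

# Hypothetical wavenumber axis; in practice use the array returned by read_spc.
w = np.linspace(32.1, 3288.7, 1500)

# Boolean mask with one entry per point on the axis.
wcut = (w > 500) & (w < 1000)

ignore = w[wcut]                    # values to pass as ignore
ignore_idx = np.flatnonzero(wcut)   # the same selection expressed as indices

# Every ignored value is taken from w itself, so it occurs in ignore_ref=w.
print(np.isin(ignore, w).all())
print(ignore.min() > 500, ignore.max() < 1000)
```

Values typed in by hand, for example rounded wavenumbers, may not occur in the reference array at all, whereas values selected with a mask always do.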

References

[1] D. A. Whitaker and K. Hayes, "A simple algorithm for despiking Raman spectra," Chemometrics and Intelligent Laboratory Systems, vol. 179, pp. 82-84, Aug. 2018, doi: 10.1016/j.chemolab.2018.06.009.
