Package to remove cosmic spikes from Raman Spectra.

Spyky

Spyky incorporates the removal of cosmic spikes from Raman spectra, as described by Whitaker & Hayes [1], into a Python package compatible with sklearn pipelines and parameter optimization.

Reading .spc files

Spyky provides the ability to read several .spc files stored in one location, using the spc-io package. Currently spyky only supports .spc files with a global X and single Y array.

read_spc() returns: the spectra in an array of shape (n_files, n_wavelengths), meaning one row is one spectrum; the wavelengths (note that all .spc files must have the same wavelengths); the names of the read files.

>>> from spyky.reader import read_spc

>>> path = r"./spectra_bin"
>>> spectra, wavelength, names = read_spc(path)

>>> print(spectra)
[[  684.   721.   776. ... 22819. 22517. 22036.]
 [  667.   724.   770. ... 22575. 22275. 21819.]
 [  676.   726.   775. ... 22618. 22346. 21851.]]

>>> print(wavelength)
[ 32.1   34.4   36.6   ...   3286.9   3287.8   3288.7 ]

>>> print(names)
['example_file_1.spc', 'example_file_2.spc', 'test_file_1.spc']

By specifying pattern you can filter which files to read. By default all .spc files in the specified path are read. The expressions are matched by fnmatch.

>>> path = r"./spectra_bin"
>>> s, w, names = read_spc(path, pattern="example*.spc")
>>> print(names)
['example_file_1.spc', 'example_file_2.spc']

>>> s, w, names = read_spc(path, pattern="example*1*.spc")
>>> print(names)
['example_file_1.spc']
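
To preview which files a pattern would select before reading anything, the matching can be reproduced with the standard-library fnmatch module. This is only a sketch of the matching rules; the file names are the ones from the example above, and how read_spc() applies fnmatch internally is not shown here.

```python
from fnmatch import fnmatch

# File names taken from the read_spc() example above.
names = ["example_file_1.spc", "example_file_2.spc", "test_file_1.spc"]

# "*" matches any run of characters, including an empty one.
for pattern in ("*.spc", "example*.spc", "example*1*.spc"):
    matches = [name for name in names if fnmatch(name, pattern)]
    print(pattern, "->", matches)
```

Note that fnmatch follows the case rules of the operating system, so a pattern such as "*.SPC" may behave differently on Windows and Linux.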

There is also the option to export the read files as a .csv by specifying export_to. The header of the .csv file will contain the wavelengths.

>>> s, w, names = read_spc(path, export_to=r"./spectra.csv")
>>> print(names)
['example_file_1.spc', 'example_file_2.spc', 'test_file_1.spc']

Spike Removal

The class DeSpike is written so that it integrates into sklearn preprocessing pipelines and is compatible with hyperparameter optimization tools like GridSearchCV. It therefore implements .fit() and .transform() methods, each of which takes the spectra as input. First use .fit() to calculate the modified z-scores, then use .transform() to perform the correction, as explained in [1].

>>> from spyky.spikes import DeSpike

>>> spiky = DeSpike(window=5, threshold=6)
>>> spiky.fit(spectra)

>>> despiked = spiky.transform(spectra)
>>> print(despiked)
[[  802.75   721.     776.   ... 22819.   22517.   22982.  ]
 [  811.     731.     783.   ... 22947.   22662.   23119.6 ]
 [  796.5    724.     770.   ... 22575.   22275.   22719.6 ]]
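
For readers who want to see the idea behind the modified z-scores and the correction step, here is a minimal pure-Python sketch in the spirit of [1]. It is not spyky's implementation: the function names, the rule for mapping flagged differences back to points, and the handling of edge cases are all assumptions made for illustration.

```python
from statistics import median


def modified_z_scores(values):
    """Return 0.6745 * (v - median) / MAD for every value."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: no spread, so nothing can be called an outlier.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def despike_sketch(spectrum, window=5, threshold=6):
    """Replace flagged points by the mean of unflagged neighbours."""
    n = len(spectrum)
    if n < 3:
        return list(spectrum)
    # Differencing suppresses broad bands and emphasises narrow spikes.
    diffs = [spectrum[i + 1] - spectrum[i] for i in range(n - 1)]
    scores = modified_z_scores(diffs)

    # Assumption of this sketch: a point is flagged when either of its
    # adjacent differences is an outlier.
    flagged = [False] * n
    for i, z in enumerate(scores):
        if abs(z) > threshold:
            flagged[i] = flagged[i + 1] = True

    cleaned = list(spectrum)
    for i in range(n):
        if not flagged[i]:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = [spectrum[j] for j in range(lo, hi) if not flagged[j]]
        if neighbours:  # keep the original value if no clean neighbour exists
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned


# Toy spectrum with a single artificial spike at index 5.
spectrum = [10, 11, 10, 12, 11, 500, 10, 11, 12, 10, 11]
cleaned = despike_sketch(spectrum, window=2, threshold=6)
print(cleaned)
```

Because the sketch also flags the two points next to the spike, they are smoothed slightly as well; a smaller window keeps the replacement values local, while a larger one averages over more of the spectrum.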

In a pipeline this might look like:

>>> from sklearn.pipeline import make_pipeline
>>> from spyky.reader import read_spc
>>> from spyky.spikes import DeSpike

>>> s, w, n = read_spc(r"/home/arle/MSC/Code/spectra_bin/")

>>> pipe = make_pipeline(DeSpike(window=5, threshold=6))
>>> pipe.fit(s)
>>> corrected = pipe.transform(s)
>>> print(corrected)
[[  802.75   721.     776.   ... 22819.   22517.   22982.  ]
 [  811.     731.     783.   ... 22947.   22662.   23119.6 ]
 [  796.5    724.     770.   ... 22575.   22275.   22719.6 ]]

Use "despike__window" and "despike__threshold" to test different values through param_grid in GridSearchCV.

Sometimes the algorithm falsely identifies steep sections of a normal spectrum as spikes. If this happens, you can use ignore and ignore_ref to supply an array containing the wavelengths you want to be ignored. If you do not supply ignore_ref, the index of the wavelength array will be used. Please note that your input to ignore must match the values in ignore_ref; this is easiest to achieve with a mask, as in the example below.

wcut = (w > 500) & (w < 1000)
spiky = DeSpike(threshold=3.3, ignore=w[wcut], ignore_ref=w)
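
The reason a mask is the safest route can be seen with plain NumPy: values selected by a boolean mask are taken directly from the reference array, so they match its entries exactly and no floating-point rounding can creep in. The small array below is made up for illustration, and how spyky resolves ignore against ignore_ref internally is not shown here.

```python
import numpy as np

# Hypothetical wavelength axis, standing in for the array returned by read_spc().
w = np.array([400.0, 600.0, 800.0, 1200.0])

# Boolean mask: True where the wavelength lies strictly between 500 and 1000.
wcut = (w > 500) & (w < 1000)

print(w[wcut])               # the values to ignore: [600. 800.]
print(np.flatnonzero(wcut))  # their positions in w: [1 2]
```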

References

[1] D. A. Whitaker and K. Hayes, "A simple algorithm for despiking Raman spectra," Chemometrics and Intelligent Laboratory Systems, vol. 179, pp. 82-84, Aug. 2018, doi: 10.1016/j.chemolab.2018.06.009.
