A unified Python package for accessing various Raman spectroscopy datasets.
Raman-Data: A Unified Python Library for Raman Spectroscopy Datasets
This project aims to provide a unified Python package for accessing Raman spectroscopy datasets. It offers a simple, consistent API for loading data from sources such as Kaggle, Hugging Face, GitHub, and Zenodo, so that the Raman spectroscopy community can more easily evaluate models, including foundation models for Raman spectroscopy.
✨ Features
- A single, easy-to-use Python package (planned for PyPI).
- Automatic downloading and caching of datasets from their original sources.
- A unified data format for all datasets.
- A simple function to list available datasets, with filtering options.
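
The listing, filtering, and caching features above could be sketched roughly as follows. This is only an illustration and not the package's actual implementation: `TaskType`, `DatasetInfo`, `REGISTRY`, `list_datasets`, and `fetch_cached` are hypothetical names, and the registry entries are placeholders rather than real datasets.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable, Optional


class TaskType(Enum):
    """Hypothetical stand-in for the package's TASK_TYPE enum."""
    CLASSIFICATION = "classification"
    REGRESSION = "regression"


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical registry entry describing one dataset."""
    name: str
    source: str  # e.g. "kaggle", "huggingface", "zenodo"
    task_type: TaskType


# Placeholder entries; these are not real dataset names.
REGISTRY = [
    DatasetInfo("example/spectra-a", "kaggle", TaskType.CLASSIFICATION),
    DatasetInfo("example/spectra-b", "zenodo", TaskType.REGRESSION),
    DatasetInfo("example/spectra-c", "huggingface", TaskType.CLASSIFICATION),
]


def list_datasets(task_type: Optional[TaskType] = None) -> list[str]:
    """Return dataset names, optionally restricted to one task type."""
    return [
        info.name
        for info in REGISTRY
        if task_type is None or info.task_type is task_type
    ]


def fetch_cached(name: str, cache_dir: Path,
                 download: Callable[[str], bytes]) -> Path:
    """Download a dataset once and reuse the local copy afterwards."""
    target = cache_dir / (name.replace("/", "__") + ".bin")
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download(name))
    return target
```

Passing the downloader in as a callable keeps the caching logic independent of any particular source, so one cache layout could serve Kaggle, Hugging Face, GitHub, and Zenodo alike.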
🚀 Getting Started
The basic interface for the package is defined in `raman_data/__init__.py`. Here's a preview of how it will work:
```python
from raman_data import raman_data
# To specify a task type import this enum as well
from raman_data import TASK_TYPE

# List all available datasets
print(raman_data())

# List only classification datasets
print(raman_data(task_type=TASK_TYPE.Classification))

# Load a dataset
dataset = raman_data(name="codina/diabetes/AGEs")

# Access the data, targets, and metadata
X = dataset.data
y = dataset.target
metadata = dataset.metadata

print(X.shape)
print(y.shape)
print(metadata)
```
For more detailed examples, see the Demo Notebook.
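
Once a dataset is loaded, a common next step for model evaluation is splitting the samples into training and test sets. The sketch below is one minimal way to do that with the standard library only. `split_indices` is a hypothetical helper, not part of the package, and applying the resulting indices to `dataset.data` and `dataset.target` assumes they support index-based selection (as NumPy arrays do), which is not confirmed here.

```python
import random


def split_indices(n_samples: int, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle sample indices and split them into train and test lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


# Stand-in sample count; with a loaded dataset this would be the
# number of spectra (assumed to be the first entry of X.shape).
train_idx, test_idx = split_indices(n_samples=10, test_fraction=0.2, seed=0)
print(len(train_idx), len(test_idx))  # prints: 8 2
```

Fixing the seed makes results comparable when the same split is reused across several models.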
📚 Available Datasets
Here is the list of datasets that are currently included in the package:
Kaggle
Hugging Face
Zenodo
🎯 Milestones
- View Datasets
- Software architecture with dummy data
- Software tests
- Integration of Kaggle
- Integration of Hugging Face
- Integration of GitHub
- Integration of Zenodo
- Integration of other datasets
- Finalize Package
- Documentation
- Publish to PyPI
🔮 For Later (Future Datasets)
Kaggle
- Cancer Cells SERS Spectra (requires authentication)
GitHub
- Raman Spectra Data
- Raman spectra of pathogenic bacteria (more info on this GitHub page)
- High-throughput molecular imaging
- spectrai raman spectra
Zenodo
Other Sources
- Spectra of illicit adulterants
- Raman Spectrum Matching with Contrastive Representation Learning
- Raman spectra of chemical compounds
- Inline Raman Spectroscopy and Indirect Hard Modeling
- The Effect of Sulfate Electrolytes on the Liquid-Liquid Equilibrium
- In-line Monitoring of Microgel Synthesis (weird format)
- N-isopropylacrylamide Microgel Synthesis
- Nonlinear Manifold Learning Determines Microgel Size from Raman Spectroscopy
- NASA AHEAD
- RRUFF
Download files
File details
Details for the file raman_data-0.0.1.tar.gz.
File metadata
- Download URL: raman_data-0.0.1.tar.gz
- Upload date:
- Size: 262.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `241255858e07fe93f772109a7acaeac8e5de592fcaf32af23aa229897d6ceff9` |
| MD5 | `b70ec169d9f664297924bba64ad1b8ff` |
| BLAKE2b-256 | `17b65a381e179caf11e0a9baa9002e53e9748be219391947cb50855dca6d9f94` |
File details
Details for the file raman_data-0.0.1-py3-none-any.whl.
File metadata
- Download URL: raman_data-0.0.1-py3-none-any.whl
- Upload date:
- Size: 22.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9ed93f094185e96e7d5b430f7170f6c33e2fde014a22f38be0bbce6c25d61c2e` |
| MD5 | `da444cccc458fe727ca164c7099c2373` |
| BLAKE2b-256 | `bac72e550ba27026063728923e36be5eb35db1a191808f7fa9fb6fbf28894c94` |