Download and load soil spectral data

SoilSpecData

A Python package for handling soil spectroscopy data, with a focus on the Open Soil Spectral Library (OSSL).

Installation

pip install soilspecdata

If you want to install the development version, run in the project root:

pip install -e ".[dev]"

Features

  • Easy loading and handling of the OSSL dataset
  • Support for both VISNIR (Visible Near-Infrared) and MIR (Mid-Infrared) spectral data
  • Flexible wavenumber range filtering
  • Convenient access to soil properties and metadata
  • Automatic caching of downloaded data
  • Get aligned spectra and target variable(s)
  • Further datasets to come …

Quick Start

# Import the package
from soilspecdata.datasets.ossl import get_ossl

Load OSSL dataset

ossl = get_ossl()

The spectra cover both the MIR (400-4000 cm⁻¹) and VISNIR (4000-28571 cm⁻¹) regions. Both are reported in increasing wavenumbers for consistency across the entire spectral range.

To restrict the spectra to a range of interest, pass the wmin and wmax parameters to the get_mir and get_visnir methods.
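The effect of wmin and wmax can be pictured as a boolean mask over the wavenumber axis. The following is a minimal sketch on synthetic data, not the package's implementation: the helper name filter_wavenumbers is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import numpy as np

def filter_wavenumbers(wavenumbers, spectra, wmin=None, wmax=None):
    """Keep only the columns whose wavenumber lies in [wmin, wmax].

    Hypothetical helper: inclusive bounds are an assumption, not
    documented behaviour of get_mir / get_visnir.
    """
    wavenumbers = np.asarray(wavenumbers)
    spectra = np.asarray(spectra)
    mask = np.ones(wavenumbers.shape, dtype=bool)
    if wmin is not None:
        mask &= wavenumbers >= wmin
    if wmax is not None:
        mask &= wavenumbers <= wmax
    # Columns of `spectra` correspond to entries of `wavenumbers`
    return wavenumbers[mask], spectra[:, mask]

# Synthetic stand-in: 3 random spectra on a 400-4000 cm⁻¹ grid, 2 cm⁻¹ steps
wn = np.arange(400, 4001, 2)
spec = np.random.default_rng(0).random((3, wn.size))

wn_sub, spec_sub = filter_wavenumbers(wn, spec, wmin=1000, wmax=2000)
print(wn_sub[0], wn_sub[-1], spec_sub.shape)
```

Because the mask is applied to the wavenumbers and to the spectra columns together, the two arrays stay aligned after filtering.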

MIR spectra

mir_data = ossl.get_mir(require_valid=True)

VISNIR spectra

Using a custom wavenumber range:

visnir_data = ossl.get_visnir(wmin=4000, wmax=25000, require_valid=True)

VISNIR | MIR dataclass member variables

print(visnir_data)
SpectraData attributes:
----------------------
Available attributes: wavenumbers, spectra, measurement_type, sample_ids

Wavenumbers:
-----------
[4000, 4003, 4006, 4009, 4012, 4016, 4019, 4022, 4025, 4029]
Shape: (1051,)

Spectra:
-------
[[0.3859, 0.3819, 0.3792, 0.3776, 0.3769],
 [0.3429, 0.3419, 0.3414, 0.3413, 0.3415],
 [0.3425, 0.3384, 0.3354, 0.3334, 0.3323],
 [0.2745, 0.2754, 0.2759, 0.2761, 0.276 ],
 [0.285 , 0.2794, 0.2755, 0.273 , 0.2718]]
Shape: (64644, 1051)

Measurement type (Reflectance or Absorbance):
--------------------------------------------
ref

Sample IDs:
----------
['FS15R_FS4068', 'FS15R_FS4069', 'FS15R_FS4070', 'FS15R_FS4071',
 'FS15R_FS4072', 'FS15R_FS4073', 'FS15R_FS4074', 'FS15R_FS4075',
 'FS15R_FS4076', 'FS15R_FS4077']
Total samples: 64644
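Since the VISNIR spectra above are reported as reflectance on a wavenumber axis, two conversions are often handy: apparent absorbance, A = log10(1/R), and wavelength in nanometres, 10⁷ divided by the wavenumber in cm⁻¹. The sketch below uses plain NumPy on the first row of values printed above; both helper names are hypothetical and not part of the package.

```python
import numpy as np

def reflectance_to_absorbance(reflectance):
    """Apparent absorbance A = log10(1 / R) for reflectance R > 0."""
    r = np.asarray(reflectance, dtype=float)
    if np.any(r <= 0):
        raise ValueError("reflectance must be strictly positive")
    return -np.log10(r)

def wavenumber_to_wavelength_nm(wavenumbers):
    """Convert wavenumbers in cm⁻¹ to wavelengths in nm (1e7 / wavenumber)."""
    return 1e7 / np.asarray(wavenumbers, dtype=float)

# First five reflectance values of the first spectrum shown above
reflectance = np.array([0.3859, 0.3819, 0.3792, 0.3776, 0.3769])
absorbance = reflectance_to_absorbance(reflectance)

# The custom range 4000-25000 cm⁻¹ expressed in wavelengths
wavelengths = wavenumber_to_wavelength_nm([4000, 25000])
print(absorbance.round(4), wavelengths)
```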

Getting soil properties and other metadata

For instance, to get the cation exchange capacity (CEC, in cmolc/kg) of the samples:

properties = ossl.get_properties(['cec_usda.a723_cmolc.kg'], require_complete=True)
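One plausible reading of require_complete=True is that rows with a missing value in any requested column are dropped. The pandas sketch below illustrates that reading on a toy DataFrame. It is an assumption about the behaviour, not the package's code, and the sample IDs and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy properties table indexed by (hypothetical) sample IDs
props = pd.DataFrame(
    {
        "cec_usda.a723_cmolc.kg": [12.4, np.nan, 8.1],
        "clay.tot_usda.a334_w.pct": [23.0, 31.5, np.nan],
    },
    index=pd.Index(["s1", "s2", "s3"], name="sample_id"),
)

# Assumed equivalent of require_complete=True: keep fully populated rows only
complete = props.dropna(how="any")

# Assumed equivalent of require_complete=False: keep every row, NaNs included
everything = props

print(len(complete), len(everything))
```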

For more details on the OSSL dataset and its variables, see the OSSL documentation. Any column name in the ossl.properties_cols list can be used as a target or metadata variable.

ossl.properties_cols
['dataset.code_ascii_txt',
 'id.layer_local_c',
 'id.layer_uuid_txt',
 'id.project_ascii_txt',
 'id.location_olc_txt',
 'id.dataset.site_ascii_txt',
 'id.scan_local_c',
 'longitude.point_wgs84_dd',
 'latitude.point_wgs84_dd',
 'layer.sequence_usda_uint16',
 'layer.upper.depth_usda_cm',
 'layer.lower.depth_usda_cm',
 'observation.date.begin_iso.8601_yyyy.mm.dd',
 'observation.date.end_iso.8601_yyyy.mm.dd',
 'surveyor.title_utf8_txt',
 'layer.texture_usda_txt',
 'pedon.taxa_usda_txt',
 'horizon.designation_usda_txt',
 'longitude.county_wgs84_dd',
 'latitude.county_wgs84_dd',
 'location.point.error_any_m',
 'location.country_iso.3166_txt',
 'observation.ogc.schema.title_ogc_txt',
 'observation.ogc.schema_idn_url',
 'surveyor.contact_ietf_email',
 'surveyor.address_utf8_txt',
 'dataset.title_utf8_txt',
 'dataset.owner_utf8_txt',
 'dataset.address_idn_url',
 'dataset.doi_idf_url',
 'dataset.license.title_ascii_txt',
 'dataset.license.address_idn_url',
 'dataset.contact.name_utf8_txt',
 'dataset.contact_ietf_email',
 'acidity_usda.a795_cmolc.kg',
 'aggstb_usda.a1_w.pct',
 'al.dith_usda.a65_w.pct',
 'al.ext_aquaregia_g.kg',
 'al.ext_usda.a1056_mg.kg',
 'al.ext_usda.a69_cmolc.kg',
 'al.ox_usda.a59_w.pct',
 'awc.33.1500kPa_usda.c80_w.frac',
 'b.ext_mel3_mg.kg',
 'bd_iso.11272_g.cm3',
 'bd_usda.a21_g.cm3',
 'bd_usda.a4_g.cm3',
 'c.tot_iso.10694_w.pct',
 'c.tot_usda.a622_w.pct',
 'ca.ext_aquaregia_mg.kg',
 'ca.ext_usda.a1059_mg.kg',
 'ca.ext_usda.a722_cmolc.kg',
 'caco3_iso.10693_w.pct',
 'caco3_usda.a54_w.pct',
 'cec_iso.11260_cmolc.kg',
 'cec_usda.a723_cmolc.kg',
 'cf_iso.11464_w.pct',
 'cf_usda.c236_w.pct',
 'clay.tot_iso.11277_w.pct',
 'clay.tot_usda.a334_w.pct',
 'cu.ext_usda.a1063_mg.kg',
 'ec_iso.11265_ds.m',
 'ec_usda.a364_ds.m',
 'efferv_usda.a479_class',
 'fe.dith_usda.a66_w.pct',
 'fe.ext_aquaregia_g.kg',
 'fe.ext_usda.a1064_mg.kg',
 'fe.ox_usda.a60_w.pct',
 'file_sequence',
 'k.ext_aquaregia_mg.kg',
 'k.ext_usda.a1065_mg.kg',
 'k.ext_usda.a725_cmolc.kg',
 'mg.ext_aquaregia_mg.kg',
 'mg.ext_usda.a1066_mg.kg',
 'mg.ext_usda.a724_cmolc.kg',
 'mn.ext_aquaregia_mg.kg',
 'mn.ext_usda.a1067_mg.kg',
 'mn.ext_usda.a70_mg.kg',
 'n.tot_iso.11261_w.pct',
 'n.tot_iso.13878_w.pct',
 'n.tot_usda.a623_w.pct',
 'na.ext_aquaregia_mg.kg',
 'na.ext_usda.a1068_mg.kg',
 'na.ext_usda.a726_cmolc.kg',
 'oc_iso.10694_w.pct',
 'oc_usda.c1059_w.pct',
 'oc_usda.c729_w.pct',
 'p.ext_aquaregia_mg.kg',
 'p.ext_iso.11263_mg.kg',
 'p.ext_usda.a1070_mg.kg',
 'p.ext_usda.a270_mg.kg',
 'p.ext_usda.a274_mg.kg',
 'p.ext_usda.a652_mg.kg',
 'ph.cacl2_iso.10390_index',
 'ph.cacl2_usda.a477_index',
 'ph.cacl2_usda.a481_index',
 'ph.h2o_iso.10390_index',
 'ph.h2o_usda.a268_index',
 's.ext_mel3_mg.kg',
 's.tot_usda.a624_w.pct',
 'sand.tot_iso.11277_w.pct',
 'sand.tot_usda.c405_w.pct',
 'sand.tot_usda.c60_w.pct',
 'silt.tot_iso.11277_w.pct',
 'silt.tot_usda.c407_w.pct',
 'silt.tot_usda.c62_w.pct',
 'wr.10kPa_usda.a414_w.pct',
 'wr.10kPa_usda.a8_w.pct',
 'wr.1500kPa_usda.a417_w.pct',
 'wr.33kPa_usda.a415_w.pct',
 'wr.33kPa_usda.a9_w.pct',
 'zn.ext_usda.a1073_mg.kg',
 'scan.mir.date.begin_iso.8601_yyyy.mm.dd',
 'scan.mir.date.end_iso.8601_yyyy.mm.dd',
 'scan.mir.model.name_utf8_txt',
 'scan.mir.model.code_any_txt',
 'scan.mir.method.optics_any_txt',
 'scan.mir.method.preparation_any_txt',
 'scan.mir.license.title_ascii_txt',
 'scan.mir.license.address_idn_url',
 'scan.mir.doi_idf_url',
 'scan.mir.contact.name_utf8_txt',
 'scan.mir.contact.email_ietf_txt',
 'scan.visnir.date.begin_iso.8601_yyyy.mm.dd',
 'scan.visnir.date.end_iso.8601_yyyy.mm.dd',
 'scan.visnir.model.name_utf8_txt',
 'scan.visnir.model.code_any_txt',
 'scan.visnir.method.optics_any_txt',
 'scan.visnir.method.preparation_any_txt',
 'scan.visnir.license.title_ascii_txt',
 'scan.visnir.license.address_idn_url',
 'scan.visnir.doi_idf_url',
 'scan.visnir.contact.name_utf8_txt',
 'scan.visnir.contact.email_ietf_txt']
To get metadata (e.g., geographical coordinates):

metadata = ossl.get_properties(['longitude.point_wgs84_dd', 'latitude.point_wgs84_dd'], require_complete=False)
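With well over a hundred names in ossl.properties_cols, a small keyword filter can help locate the columns for a given property before calling get_properties. The helper below is a hypothetical convenience, not part of the package; it is shown on a short excerpt of the list above.

```python
def find_columns(columns, keyword):
    """Return the column names that contain `keyword` (case-insensitive)."""
    key = keyword.lower()
    return [name for name in columns if key in name.lower()]

# Short excerpt of ossl.properties_cols, copied from the listing above
columns = [
    "cec_iso.11260_cmolc.kg",
    "cec_usda.a723_cmolc.kg",
    "clay.tot_iso.11277_w.pct",
    "clay.tot_usda.a334_w.pct",
    "longitude.point_wgs84_dd",
    "latitude.point_wgs84_dd",
]

clay_cols = find_columns(columns, "clay")
coord_cols = find_columns(columns, "wgs84")
print(clay_cols)
print(coord_cols)
```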

Preparing data for machine learning pipeline

To directly get aligned spectra and target variable(s):

X, y, ids = ossl.get_aligned_data(
    spectra_data=mir_data,
    target_cols='cec_usda.a723_cmolc.kg'
)

X.shape, y.shape, ids.shape
((57062, 1701), (57062, 1), (57062,))
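Conceptually, aligning spectra with targets means keeping only the samples present in both and ordering their rows identically. The NumPy sketch below shows one way to do this by intersecting sample IDs. It illustrates the idea on toy arrays and is not the implementation behind get_aligned_data.

```python
import numpy as np

# Toy spectra: 4 samples x 3 spectral points
spectra_ids = np.array(["a", "b", "c", "d"])
spectra = np.arange(12, dtype=float).reshape(4, 3)

# Toy targets: different order, and one sample ("x") without a spectrum
target_ids = np.array(["d", "b", "x"])
target = np.array([[4.0], [2.0], [9.0]])

# IDs present in both sets, plus their positions in each array
common, idx_s, idx_t = np.intersect1d(
    spectra_ids, target_ids, return_indices=True
)

X = spectra[idx_s]   # spectra rows, ordered like `common`
y = target[idx_t]    # target rows, in the same order
ids = common

print(X.shape, y.shape, ids.shape)
```

Row i of X, row i of y, and ids[i] then all refer to the same sample, which is the property a downstream model relies on.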

To plot the first 20 MIR spectra:

from matplotlib import pyplot as plt

plt.figure(figsize=(12, 3))
plt.plot(mir_data.wavenumbers, mir_data.spectra[:20,:].T, alpha=0.3, color='steelblue', lw=1)
plt.gca().invert_xaxis()
plt.grid(True, linestyle='--', alpha=0.7)

plt.xlabel('Wavenumber (cm⁻¹)')
plt.ylabel('Absorbance');

Data Structure

The package returns spectral data in a structured format containing:

  • Wavenumbers
  • Spectra measurements
  • Measurement type (reflectance/absorbance)
  • Sample IDs

Properties and metadata are returned as pandas DataFrames indexed by sample ID.
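A container with these four fields could look roughly like the dataclass below. This is a hypothetical stand-in written for illustration: the field names follow the attributes listed above, while the class name SpectraSketch and the shape checks are assumptions rather than the package's own definition.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class SpectraSketch:
    """Illustrative container; not the package's SpectraData class."""
    wavenumbers: np.ndarray      # shape (n_points,)
    spectra: np.ndarray          # shape (n_samples, n_points)
    measurement_type: str        # e.g. "ref" for reflectance
    sample_ids: np.ndarray       # shape (n_samples,)

    def __post_init__(self):
        n_samples, n_points = self.spectra.shape
        if self.wavenumbers.shape[0] != n_points:
            raise ValueError("one wavenumber is needed per spectral column")
        if self.sample_ids.shape[0] != n_samples:
            raise ValueError("one sample ID is needed per spectrum")

# Two toy spectra with three points each; the values come from the output above
data = SpectraSketch(
    wavenumbers=np.array([4000, 4003, 4006]),
    spectra=np.array([[0.3859, 0.3819, 0.3792],
                      [0.3429, 0.3419, 0.3414]]),
    measurement_type="ref",
    sample_ids=np.array(["FS15R_FS4068", "FS15R_FS4069"]),
)
print(data.spectra.shape, data.measurement_type)
```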

Cache Management

By default, the OSSL dataset is cached in ~/.soilspecdata/. To force a fresh download:

ossl = get_ossl(force_download=True)
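The general pattern behind such a cache is to download only when the file is missing or when a refresh is forced. The sketch below shows that pattern with the standard library and a fake download function. The helper get_cached, the file name ossl.bin, and the temporary directory are all hypothetical; the package's actual file layout under ~/.soilspecdata/ is not described here.

```python
import tempfile
from pathlib import Path

def get_cached(path, fetch, force_download=False):
    """Return the file's bytes, calling `fetch` only if needed.

    Hypothetical helper illustrating a download cache; `fetch` is any
    zero-argument callable returning bytes.
    """
    path = Path(path)
    if force_download or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch())
    return path.read_bytes()

calls = []

def fake_fetch():
    """Stand-in for a real download; records each call."""
    calls.append(1)
    return b"spectra"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".soilspecdata" / "ossl.bin"
    get_cached(target, fake_fetch)                        # downloads
    get_cached(target, fake_fetch)                        # served from cache
    get_cached(target, fake_fetch, force_download=True)   # downloads again

n_fetches = len(calls)
print(n_fetches)
```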

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

Apache License 2.0

Citation(s)

  • OSSL Library: Safanelli, J.L., Hengl, T., Parente, L.L., Minarik, R., Bloom, D.E., Todd-Brown, K., Gholizadeh, A., Mendes, W. de S., Sanderman, J., 2025. Open Soil Spectral Library (OSSL): Building reproducible soil calibration models through open development and community engagement. PLOS ONE 20, e0296545. https://doi.org/10.1371/journal.pone.0296545
