NistChemPy
Unofficial Python tools for querying NIST Chemistry WebBook pages and extracting molecular-property records.
Project notice: NistChemPy is an unofficial Python package for querying NIST Chemistry WebBook pages and extracting selected molecular-property records. It is not affiliated with, maintained by, or endorsed by NIST. Because the Chemistry WebBook does not provide a stable public web API for this package, functionality may depend on the current structure and behavior of the external web service.
Important index change: NistChemPy no longer ships a prebuilt NIST Chemistry WebBook compound index. Live WebBook search and parsing of individual compound pages do not depend on that index, but local index search now requires a user-generated local index/cache.
Rebuilding a full section-availability index can require visiting one WebBook page per compound. With a polite 3 second delay and roughly 100,000-150,000 pages, the initial rebuild can take about 3.5-5+ days before retries and network overhead.
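The 3.5-5+ day figure follows from multiplying the number of page requests by the per-request delay. A quick check that counts only the 3 second delay (retries, network latency, and parsing time are ignored, so these are lower bounds):

```python
# Back-of-envelope lower bound for a full index rebuild.
# Only the fixed per-request delay is counted; retries, network latency,
# and page parsing would all add to these figures.

SECONDS_PER_DAY = 24 * 60 * 60


def rebuild_days(pages: int, delay_s: float = 3.0) -> float:
    """Return the minimum rebuild time in days for `pages` page requests."""
    return pages * delay_s / SECONDS_PER_DAY


for pages in (100_000, 150_000):
    print(f"{pages:,} pages -> {rebuild_days(pages):.2f} days")
# 100,000 pages -> 3.47 days
# 150,000 pages -> 5.21 days
```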
NistChemPy automates selected search and data-extraction workflows for the NIST Chemistry WebBook. It currently supports extraction of basic compound metadata, selected spectral records (IR, THz, MS, and UV-Vis), and gas chromatography records where these are available from the corresponding WebBook pages. Additional properties may be reachable through source URLs stored by the package, but direct extraction is intentionally limited to the implemented record types.
For serious scientific use, users should verify retrieved records against the original NIST Chemistry WebBook pages and the primary literature references given there. Package output should not be treated as an official NIST data product, a complete database dump, or a stable production API.
Main features
- Search:
  - Search by name, chemical formula, CAS RN, or InChI / InChI Key: nistchempy.run_search.
  - Search by structure, including substructure search: nistchempy.run_structural_search. RDKit is optional and is used for the SMILES/InChI-to-MOL conversion helpers and for local index structural search.
  - Search over a user-local compound index/cache with nistchempy.WebBookIndex.from_cache() or nistchempy.get_local_index(). NistChemPy does not redistribute a prebuilt WebBook-derived index.
- Compound info (nistchempy.compound.NistCompound):
  - The object stores parsed properties and the corresponding source URLs.
  - Supports extraction of selected records:
    - 2D and 3D atomic coordinates.
    - Spectral data (IR, THz, MS, UV-Vis).
    - Gas chromatography data.
  - Parsed metadata and loaded property objects can be exported as structured records with to_dict(), to_record(), and to_records(). Record collections can be serialized with nistchempy.records.write_records_json() or nistchempy.records.write_records_jsonl().
For more details, see the Cookbook section of the documentation.
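The two serialization targets differ mainly in file layout. The sketch below illustrates that difference with the standard library only; it is not the implementation of nistchempy.records.write_records_json() or write_records_jsonl(). The record fields are hypothetical, and it assumes the .jsonl variant follows the usual JSON Lines convention of one object per line.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Hypothetical records: the field names are illustrative only and do not
# reflect NistChemPy's actual record schema.
records = [
    {"id": "C71432", "name": "Benzene", "record_type": "compound"},
    {"id": "C64175", "name": "Ethanol", "record_type": "compound"},
]


def write_json(records: list[dict], path: Path) -> None:
    # JSON: the whole collection is a single array, read back in one load.
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")


def write_jsonl(records: list[dict], path: Path) -> None:
    # JSON Lines: one self-contained object per line, convenient for
    # appending and for streaming large collections.
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "records.jsonl"
    write_jsonl(records, jsonl_path)
    round_trip = read_jsonl(jsonl_path)
    line_count = len(jsonl_path.read_text(encoding="utf-8").splitlines())

    json_path = Path(tmp) / "records.json"
    write_json(records, json_path)
    json_round_trip = json.loads(json_path.read_text(encoding="utf-8"))

print(round_trip == records, line_count)  # True 2
```

JSON Lines trades the single-document structure of JSON for the ability to process records one at a time without loading the whole file.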
Related project: NistChemData
NistChemData is a companion repository for local reconstruction workflows and provenance-sensitive extraction scripts. It is not an official NIST product and is not promoted here as an authoritative, complete, current, or independently licensed redistribution of the NIST Chemistry WebBook.
Users should review the NistChemData data-use notice, original NIST Chemistry WebBook pages, applicable NIST terms, and source references before running those workflows or using generated local artifacts in scientific, commercial, or redistributed datasets.
Installation
Install NistChemPy using pip:
pip install nistchempy
Warning: Versions 1.0.0 and later are not backward compatible with the older alpha versions because of significant changes to the code structure. Version 2.0.0 also removes the packaged WebBook-derived index. Code that used the old bundled index should migrate to a user-local index loaded with
nistchempy.WebBookIndex.from_cache() or nistchempy.get_local_index().
Local WebBook index
NistChemPy can load a user-local WebBook index either from a cache directory
containing index.csv or from an explicit CSV file path:
import nistchempy as nist
index = nist.get_local_index('/path/to/webbook-index')
# or load an existing local CSV file directly:
index = nist.get_local_index('/path/to/local_webbook_index.csv')
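Both call forms above resolve to a CSV file in the end. A hypothetical helper that mirrors that resolution step might look like the following; resolve_index_csv is an illustrative name, not part of NistChemPy, and the real loader may validate inputs differently.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_index_csv(path: str | Path) -> Path:
    """Hypothetical helper: map a cache directory or a CSV path to a CSV file.

    Mirrors the two input forms described above; this is not NistChemPy's
    own implementation.
    """
    p = Path(path).expanduser()
    if p.is_dir():
        candidate = p / "index.csv"  # cache-directory layout
    elif p.suffix.lower() == ".csv":
        candidate = p  # explicit CSV file
    else:
        raise ValueError(f"expected a cache directory or a .csv file, got {p}")
    if not candidate.is_file():
        raise FileNotFoundError(candidate)
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "index.csv").write_text("id,name\n", encoding="utf-8")
    from_dir = resolve_index_csv(cache)
    from_file = resolve_index_csv(cache / "index.csv")
```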
NistChemPy can also build a user-local index: it discovers candidate compounds through the WebBook formula browser, formula search, or sitemaps, and then enriches each discovered seed with data from its individual compound page:
nistchempy index build \
--strategy formula-browser \
--path /path/to/webbook-index \
--request-delay 3 \
--accept-data-terms
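The build runs in two stages: discovery produces candidate compound IDs (seeds), and enrichment fetches one page per seed. A minimal sketch of that flow, assuming --request-delay means a fixed pause between consecutive page requests. The discover and fetch_page callables are hypothetical stand-ins, not NistChemPy functions; caching, retries, and resumption are omitted.

```python
import time
from typing import Callable, Iterable, List


def build_index(
    discover: Callable[[], Iterable[str]],
    fetch_page: Callable[[str], dict],
    request_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Two-stage sketch: collect seed IDs, then enrich each from its page.

    `discover` and `fetch_page` are hypothetical placeholders for a discovery
    strategy and a compound-page parser; they are not NistChemPy APIs.
    """
    seeds = list(dict.fromkeys(discover()))  # de-duplicate, keep order
    rows = []
    for i, compound_id in enumerate(seeds):
        if i:  # pause between consecutive page requests, not before the first
            sleep(request_delay)
        rows.append(fetch_page(compound_id))
    return rows


# Usage with fake callables; `pauses` records each requested sleep.
pauses: List[float] = []
rows = build_index(
    discover=lambda: ["C71432", "C64175", "C71432"],
    fetch_page=lambda cid: {"id": cid},
    request_delay=3.0,
    sleep=pauses.append,
)
print(rows, pauses)
```

Injecting sleep keeps the throttling logic testable without actually waiting.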
The sitemap strategy is available as a secondary/audit discovery source.
The formula-search strategy wraps the legacy carbon-formula search
workflow as a bounded discovery strategy and therefore requires an explicit
carbon range, for example:
nistchempy index discover \
--strategy formula-search \
--formula-carbon-start 1 \
--formula-carbon-end 20 \
--accept-data-terms
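Requiring both bounds keeps a formula-search run finite. A hypothetical validation step is sketched below; it assumes both bounds are mandatory and inclusive, which may not match the CLI's exact semantics, and carbon_counts is an illustrative name.

```python
from typing import Optional


def carbon_counts(start: Optional[int], end: Optional[int]) -> range:
    """Hypothetical validation for a bounded carbon-formula discovery run.

    Assumes both bounds are required and inclusive; the real CLI's exact
    semantics may differ.
    """
    if start is None or end is None:
        raise ValueError("formula-search needs both a carbon start and end")
    if start < 1 or end < start:
        raise ValueError(f"invalid carbon range: {start}..{end}")
    return range(start, end + 1)


counts = carbon_counts(1, 20)
print(len(counts), counts[0], counts[-1])  # 20 1 20
```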
A full page-enriched build may need to visit many compound pages. With a polite 3 second delay, a full initial rebuild can take about 3.5-5+ days before retries and network overhead.
Useful CLI commands for existing local indexes:
nistchempy index path
nistchempy index status
nistchempy index search benzene
The documentation includes a Local Index Workflow cookbook page explaining the cache layout, discovery/enrichment pipeline, custom paths, CSV import, and RDKit-assisted local structural search.
Generated local index/cache files are user-local artifacts and are not covered by the NistChemPy software license. See DATA_NOTICE.md for the repository-level data notice. For migration/testing, an existing local CSV can also be imported into the new cache layout:
nistchempy index build \
--from-csv /path/to/local_webbook_index.csv \
--path /path/to/webbook-index \
--accept-data-terms
Development workflows
Default tests are offline and deterministic:
python -m pip install -e ".[dev]"
pytest -q
Live WebBook integration tests are opt-in:
pytest -q -m network
pytest -q -m "network and rdkit"
Documentation notebooks are committed with pregenerated outputs and are not executed by Sphinx. Regenerate them manually after example/API changes:
jupyter nbconvert --execute docs/source/basic_search.ipynb --inplace
jupyter nbconvert --execute docs/source/compound_properties.ipynb --inplace
jupyter nbconvert --execute docs/source/structural_search.ipynb --inplace
jupyter nbconvert --execute docs/source/local_index.ipynb --inplace
jupyter nbconvert --execute docs/source/requests_config.ipynb --inplace
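The five regeneration commands above can also be driven from one script. A small sketch that builds the same jupyter nbconvert argument lists and, unless dry_run is set, runs them in order and stops at the first failure; it is not part of the repository's tooling.

```python
import subprocess
from typing import List

# Notebooks listed in the regeneration commands above.
NOTEBOOKS = [
    "docs/source/basic_search.ipynb",
    "docs/source/compound_properties.ipynb",
    "docs/source/structural_search.ipynb",
    "docs/source/local_index.ipynb",
    "docs/source/requests_config.ipynb",
]


def nbconvert_command(notebook: str) -> List[str]:
    # Same arguments as the shell commands: execute, then overwrite in place.
    return ["jupyter", "nbconvert", "--execute", notebook, "--inplace"]


def regenerate(notebooks: List[str], dry_run: bool = False) -> List[List[str]]:
    commands = [nbconvert_command(nb) for nb in notebooks]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raise on the first failing notebook
    return commands


# Print the commands without executing any notebook.
commands = regenerate(NOTEBOOKS, dry_run=True)
```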
See the documentation development workflow page for the full test, docs, and release checklist.
Release checks
Before publishing a release, build the package and verify that no generated WebBook-derived index/cache artifacts are included:
python -m build
python tools/check_package_artifacts.py dist/*
The check rejects files such as nist_data.zip, nist_data.csv,
compounds_data.json, and package-internal nistchempy/data/ contents.
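A check of this kind needs to list archive members and match them against forbidden names and paths. The sketch below shows one way to do that with zipfile and tarfile; it is a hypothetical illustration, not the contents of tools/check_package_artifacts.py, and the helper names are invented for this example.

```python
import tarfile
import tempfile
import zipfile
from pathlib import PurePosixPath
from typing import List

# Names and the package-internal directory the release check is described as rejecting.
FORBIDDEN_NAMES = {"nist_data.zip", "nist_data.csv", "compounds_data.json"}
FORBIDDEN_DIR = ("nistchempy", "data")


def archive_members(path: str) -> List[str]:
    # Wheels are zip files; sdists are gzip-compressed tarballs.
    if path.endswith((".whl", ".zip")):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    if path.endswith((".tar.gz", ".tgz")):
        with tarfile.open(path, "r:gz") as tf:
            return tf.getnames()
    raise ValueError(f"unsupported archive: {path}")


def forbidden_members(names: List[str]) -> List[str]:
    bad = []
    for name in names:
        parts = PurePosixPath(name).parts
        # Flag anything *inside* a nistchempy/data/ directory, at any depth
        # (sdists prefix paths with "<project>-<version>/").
        in_data_dir = any(
            parts[i:i + 2] == FORBIDDEN_DIR and len(parts) > i + 2
            for i in range(len(parts) - 1)
        )
        if parts and (parts[-1] in FORBIDDEN_NAMES or in_data_dir):
            bad.append(name)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    wheel = f"{tmp}/demo-0.0.0-py3-none-any.whl"
    with zipfile.ZipFile(wheel, "w") as zf:
        zf.writestr("nistchempy/__init__.py", "")
        zf.writestr("nistchempy/data/nist_data.csv", "id\n")
    flagged = forbidden_members(archive_members(wheel))

print(flagged)  # ['nistchempy/data/nist_data.csv']
```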
Documentation
The primary features of NistChemPy, including WebBook search, compound parsing, structured records, and local index workflows, are detailed in the documentation.
AI-assisted development
Starting with the 1.0.6 cleanup/update and continuing through the 2.0.0 development line, OpenAI coding agents were used to assist with implementation, refactoring, documentation, and tests. Other AI models were also used to discuss architecture and implementation details. See AI_USE.md for the project note on AI-assisted development.
Citation
Please cite the Zenodo Concept DOI for NistChemPy:
The Concept DOI is preferred for general citations because it represents the software across archived versions.
If you use NistChemPy in research, please cite the software using the metadata in CITATION.cff.
Download files
File details
Details for the file nistchempy-2.0.0.tar.gz.
File metadata
- Download URL: nistchempy-2.0.0.tar.gz
- Upload date:
- Size: 69.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b17339ac4cd98a6f664cf82b6caf5eed6c2f17ab03acac7c18bec397d40c9bb8 |
| MD5 | 9d771eab1af6abb20214fafe353b9376 |
| BLAKE2b-256 | c94abef841a613cdc2d7e7708fbfd3246ba1f16691f08db412ccd80c69e2180d |
File details
Details for the file nistchempy-2.0.0-py3-none-any.whl.
File metadata
- Download URL: nistchempy-2.0.0-py3-none-any.whl
- Upload date:
- Size: 62.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cccb4db458efb6a30b09fdf360e351792ac433abf6ec969f08fad751f0c58691 |
| MD5 | 0694e281a29931394acced6cf23a50bb |
| BLAKE2b-256 | ab3c3dfc1a3820fdb16289bb3a38d0505b343635c2bcaeec57c011f98d149dbf |
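A manually downloaded distribution file can be checked against its published SHA256 digest before installation. A minimal standard-library sketch follows; sha256_of and matches_published are illustrative helper names, and pip's own hash-checking mode (--require-hashes) is an alternative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    # Hash in chunks so large files need not fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    # Published digests are hex strings; normalize case and whitespace.
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed for nistchempy-2.0.0.tar.gz in the table above.
SDIST_SHA256 = "b17339ac4cd98a6f664cf82b6caf5eed6c2f17ab03acac7c18bec397d40c9bb8"
# Example (requires the downloaded file):
# matches_published(Path("nistchempy-2.0.0.tar.gz"), SDIST_SHA256)

# Self-check against the well-known SHA256 of empty input.
with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty.bin"
    empty.write_bytes(b"")
    empty_ok = matches_published(
        empty, "E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855"
    )
```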