Unofficial Python tools for querying NIST Chemistry WebBook pages and extracting molecular-property records

NistChemPy

Project notice: NistChemPy is an unofficial Python package for querying NIST Chemistry WebBook pages and extracting selected molecular-property records. It is not affiliated with, maintained by, or endorsed by NIST. Because the Chemistry WebBook does not provide a stable public web API for this package, functionality may depend on the current structure and behavior of the external web service.

Important index change: NistChemPy no longer ships a prebuilt NIST Chemistry WebBook compound index. Live WebBook search and individual compound-page parsing remain separate functionality, but local index search now requires a user-generated local index/cache.

Rebuilding a full section-availability index can require visiting one WebBook page per compound. With a polite 3-second delay between requests and roughly 100,000-150,000 pages, the waiting time alone comes to about 3.5-5+ days for the initial rebuild, before retries and network overhead.
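The estimate above is simply the page count multiplied by the delay. A minimal sketch of that arithmetic, using only the figures quoted in this section, is shown here; the function name `rebuild_days` is illustrative and is not part of NistChemPy.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rebuild_days(pages: int, delay_seconds: float) -> float:
    """Lower-bound wall-clock days spent waiting between requests.

    Ignores retries, server response time, and other network overhead.
    """
    return pages * delay_seconds / SECONDS_PER_DAY


# Figures quoted in the text: 100,000-150,000 pages, 3 seconds per request.
for pages in (100_000, 150_000):
    print(f"{pages:>7} pages -> {rebuild_days(pages, 3):.1f} days")
```

The two bounds come out at roughly 3.5 and 5.2 days, which is where the "3.5-5+ days" range comes from.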

NistChemPy automates selected search and data-extraction workflows for the NIST Chemistry WebBook. It currently supports extraction of basic compound metadata, selected spectral records (IR, THz, MS, and UV-Vis), and gas chromatography records where these are available from the corresponding WebBook pages. Additional properties may be reachable through source URLs stored by the package, but direct extraction is intentionally limited to the implemented record types.

For serious scientific use, users should verify retrieved records against the original NIST Chemistry WebBook pages and the primary literature references given there. Package output should not be treated as an official NIST data product, a complete database dump, or a stable production API.

Main features

  1. Search:

    • Search by name, chemical formula, CAS RN, InChI, or InChIKey: nistchempy.run_search.

    • Search by structure, including substructure search: nistchempy.run_structural_search. RDKit is optional and is used for SMILES/InChI-to-MOL conversion helpers and local index structural search.

    • Search over a user-local compound index/cache with nistchempy.WebBookIndex.from_cache() or nistchempy.get_local_index(). NistChemPy does not redistribute a prebuilt WebBook-derived index.

  2. Compound info (nistchempy.compound.NistCompound):

    • Object stores parsed properties and corresponding source URLs.

    • Supports extraction of selected records:

      • 2D and 3D atomic coordinates.

      • Spectral data (IR, THz, MS, UV-Vis).

      • Gas chromatography data.

    • Parsed metadata and loaded property objects can be exported as structured records with to_dict(), to_record(), and to_records(). Record collections can be serialized with nistchempy.records.write_records_json() or nistchempy.records.write_records_jsonl().
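To illustrate what JSON Lines serialization of structured records involves, here is a standard-library sketch. It is a stand-in for the idea only: it does not call nistchempy.records.write_records_jsonl(), whose exact signature is not shown in this README, and the record fields (`compound_id`, `name`, `formula`) are hypothetical examples rather than the package's actual record schema.

```python
import io
import json

# Hypothetical records; the real fields produced by to_record() may differ.
records = [
    {"compound_id": "C71432", "name": "Benzene", "formula": "C6H6"},
    {"compound_id": "C108883", "name": "Toluene", "formula": "C7H8"},
]


def dump_jsonl(rows, stream) -> None:
    """Write one compact JSON object per line (JSON Lines)."""
    for row in rows:
        stream.write(json.dumps(row, sort_keys=True) + "\n")


def load_jsonl(stream):
    """Read JSON Lines back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Round-trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
dump_jsonl(records, buffer)
buffer.seek(0)
restored = load_jsonl(buffer)
```

One object per line makes large record collections easy to stream and append to, which is the usual reason to prefer JSONL over a single JSON array.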

For more details see the Cookbook section of the documentation.

Related project: NistChemData

NistChemData is a companion repository for local reconstruction workflows and provenance-sensitive extraction scripts. It is not an official NIST product and is not promoted here as an authoritative, complete, current, or independently licensed redistribution of the NIST Chemistry WebBook.

Users should review the NistChemData data-use notice, original NIST Chemistry WebBook pages, applicable NIST terms, and source references before running those workflows or using generated local artifacts in scientific, commercial, or redistributed datasets.

Installation

Install NistChemPy using pip:

pip install nistchempy

Warning: Versions 1.0.0 and later are not backward compatible with the older alpha versions because of significant changes to the code structure. Version 2.0.0 also removes the packaged WebBook-derived index. Code that previously used the bundled index should migrate to a user-local index loaded with nistchempy.WebBookIndex.from_cache() or nistchempy.get_local_index().

Local WebBook index

NistChemPy can load a user-local WebBook index from either a cache directory containing index.csv or from an explicit CSV file path:

import nistchempy as nist

index = nist.get_local_index('/path/to/webbook-index')
# or, for a CSV file you already have locally:
index = nist.get_local_index('/path/to/local_webbook_index.csv')
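The "directory containing index.csv or explicit CSV path" convention can be pictured with a small path-resolution sketch. This is an assumption about how such a lookup might work, not NistChemPy's actual implementation; `resolve_index_csv` is a hypothetical helper and the CSV header used in the demonstration is made up.

```python
import tempfile
from pathlib import Path


def resolve_index_csv(location: str) -> Path:
    """Return the CSV file that a cache directory or file path refers to.

    Hypothetical helper: a directory maps to <directory>/index.csv,
    anything else is treated as an explicit CSV file path.
    """
    path = Path(location).expanduser()
    csv_path = path / "index.csv" if path.is_dir() else path
    if not csv_path.is_file():
        raise FileNotFoundError(f"No local index CSV at {csv_path}")
    return csv_path


# Demonstration in a throwaway directory; the header row is illustrative only.
with tempfile.TemporaryDirectory() as cache_dir:
    (Path(cache_dir) / "index.csv").write_text("id,name\n")
    from_dir = resolve_index_csv(cache_dir)
    from_file = resolve_index_csv(str(from_dir))
    same_target = from_dir == from_file
```

Both call styles resolve to the same file, which mirrors the two get_local_index() calls shown above.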

NistChemPy can also build a user-local index by discovering candidate compounds through the WebBook formula browser, formula search, or sitemaps and then enriching discovered seeds from individual compound pages:

nistchempy index build \
  --strategy formula-browser \
  --path /path/to/webbook-index \
  --request-delay 3 \
  --accept-data-terms

The sitemap strategy is available as a secondary/audit discovery source. The formula-search strategy wraps the legacy carbon-formula search workflow as a bounded discovery strategy and therefore requires an explicit carbon range, for example:

nistchempy index discover \
  --strategy formula-search \
  --formula-carbon-start 1 \
  --formula-carbon-end 20 \
  --accept-data-terms

A full page-enriched build visits one WebBook page per compound, which means roughly 100,000-150,000 pages. With a polite 3-second delay between requests, a full initial rebuild can take about 3.5-5+ days before retries and network overhead.
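A fixed delay between requests, as described above, can be sketched as a small generator. This is an illustration of the throttling idea under simple assumptions, not NistChemPy's request code; the name `throttled` and the page labels are hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], delay_seconds: float,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items one at a time, pausing between consecutive items.

    No pause is taken before the first item or after the last one.
    """
    for position, item in enumerate(items):
        if position:
            sleep(delay_seconds)
        yield item


# Demonstration with a recording stand-in for time.sleep, so nothing blocks.
pauses = []
visited = list(throttled(["page-1", "page-2", "page-3"], 3, sleep=pauses.append))
```

Injecting the `sleep` callable keeps the pacing logic testable offline; in real use the default `time.sleep` would apply the delay.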

Useful CLI commands for existing local indexes:

nistchempy index path
nistchempy index status
nistchempy index search benzene

The documentation includes a Local Index Workflow cookbook page explaining the cache layout, discovery/enrichment pipeline, custom paths, CSV import, and RDKit-assisted local structural search.

Generated local index/cache files are user-local artifacts and are not covered by the NistChemPy software license. See DATA_NOTICE.md for the repository-level data notice. For migration/testing, an existing local CSV can also be imported into the new cache layout:

nistchempy index build \
  --from-csv /path/to/local_webbook_index.csv \
  --path /path/to/webbook-index \
  --accept-data-terms

Development workflows

Default tests are offline and deterministic:

python -m pip install -e ".[dev]"
pytest -q

Live WebBook integration tests are opt-in:

pytest -q -m network
pytest -q -m "network and rdkit"

Documentation notebooks are committed with pregenerated outputs and are not executed by Sphinx. Regenerate them manually after example/API changes:

jupyter nbconvert --execute docs/source/basic_search.ipynb --inplace
jupyter nbconvert --execute docs/source/compound_properties.ipynb --inplace
jupyter nbconvert --execute docs/source/structural_search.ipynb --inplace
jupyter nbconvert --execute docs/source/local_index.ipynb --inplace
jupyter nbconvert --execute docs/source/requests_config.ipynb --inplace

See the documentation development workflow page for the full test, docs, and release checklist.

Release checks

Before publishing a release, build the package and verify that no generated WebBook-derived index/cache artifacts are included:

python -m build
python tools/check_package_artifacts.py dist/*

The check rejects files such as nist_data.zip, nist_data.csv, compounds_data.json, and package-internal nistchempy/data/ contents.
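The kind of rule described above can be sketched as a filter over archive member names. This is a hedged illustration only: it is not the contents of tools/check_package_artifacts.py, the helper `is_forbidden` is hypothetical, and the real script may apply additional or different checks.

```python
from pathlib import PurePosixPath

# Names taken from the list above; the real script may check more than this.
FORBIDDEN_NAMES = {"nist_data.zip", "nist_data.csv", "compounds_data.json"}
FORBIDDEN_DIR = ("nistchempy", "data")


def is_forbidden(member: str) -> bool:
    """Return True if an archive member looks like a bundled index artifact."""
    path = PurePosixPath(member)
    if path.name in FORBIDDEN_NAMES:
        return True
    parts = path.parts
    # Flag files located under a nistchempy/data/ directory at any depth.
    return any(
        parts[i:i + 2] == FORBIDDEN_DIR for i in range(len(parts) - 2)
    )


# Illustrative member names, as they might appear in a source distribution.
members = [
    "nistchempy-2.0.0/nistchempy/compound.py",
    "nistchempy-2.0.0/nistchempy/data/index.csv",
    "nistchempy-2.0.0/nist_data.zip",
]
offenders = [m for m in members if is_forbidden(m)]
```

For real archives, member names can be obtained with `tarfile.open(path).getnames()` for the sdist and `zipfile.ZipFile(path).namelist()` for the wheel.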

Documentation

The primary features of NistChemPy, including WebBook search, compound parsing, structured records, and local index workflows, are detailed in the documentation.

AI-assisted development

Starting with the 1.0.6 cleanup/update and continuing through the 2.0.0 development line, OpenAI coding agents were used to assist with implementation, refactoring, documentation, and tests. Other AI models were also used to discuss architecture and implementation details. See AI_USE.md for the project note on AI-assisted development.

Citation

Please cite the Zenodo Concept DOI for NistChemPy:

10.5281/zenodo.20235917

The Concept DOI is preferred for general citations because it represents the software across archived versions.

If you use NistChemPy in research, please cite the software using the metadata in CITATION.cff.
