A toolkit for extracting chemical information from the scientific literature.
ChemDataExtractor
ChemDataExtractor v2 is a toolkit for extracting chemical information from the scientific literature. It supports Python 3.9 through 3.11.
Installation
Create a virtual environment, for example with conda:
conda create -n cde2 python=3.11
Activate the cde2 environment:
conda activate cde2
Install chemdataextractor2 with pip:
pip install chemdataextractor2
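Because only Python 3.9 through 3.11 is listed as supported, it can help to confirm the interpreter version inside the activated environment before installing. The following is a minimal sketch using only the standard library; the names `MIN_SUPPORTED`, `MAX_SUPPORTED` and `is_supported` are illustrative and are not part of ChemDataExtractor.

```python
import sys

# Supported range as stated in this README (major, minor).
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def is_supported(version=None):
    """Return True if the given (or running) Python version is in the supported range.

    `version` may be sys.version_info or any tuple starting with (major, minor).
    This helper is hypothetical and only mirrors the range stated above.
    """
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if is_supported():
        print("Python " + current + " is within the supported range")
    else:
        print("Python " + current + " is outside the supported range (3.9 to 3.11)")
```

Running this script with the environment's interpreter reports whether that interpreter falls within the stated range.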
Features
- HTML, XML and PDF document readers
- Chemistry-aware natural language processing pipeline
- Chemical named entity recognition
- Rule-based parsing grammars for property and spectra extraction
- Table parser for extracting tabulated data
- Document processing to resolve data interdependencies
Documentation & Development
Please read the documentation for instructions on contributing to the project.
License
ChemDataExtractor v2 is licensed under the MIT license, a permissive, business-friendly license for open source software.
MIT license: https://github.com/CambridgeMolecularEngineering/ChemDataExtractor/blob/master/LICENSE
Download files
Source Distribution
File details
Details for the file chemdataextractor2-2.4.0.tar.gz.
File metadata
- Download URL: chemdataextractor2-2.4.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d8f21df5f8819616dc507d9434f74f1d46d6465cad806a0d91701a3e6ead0c3e |
| MD5 | f321fc6a4ebf882885b82765405daf1f |
| BLAKE2b-256 | f499397a9518c7b3d1685c852ae0d43f0d5f3ece56945c5fc8fbed45a0f3270f |
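A downloaded archive can be checked against the published SHA256 digest before it is installed. The sketch below uses only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected`, and the assumption that the archive sits in the current directory, are illustrative and not part of the project.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for chemdataextractor2-2.4.0.tar.gz.
EXPECTED_SHA256 = "d8f21df5f8819616dc507d9434f74f1d46d6465cad806a0d91701a3e6ead0c3e"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed location: the archive was downloaded into the current directory.
    archive = Path("chemdataextractor2-2.4.0.tar.gz")
    if not archive.exists():
        print(str(archive) + " not found; download it first")
    elif matches_expected(archive):
        print("SHA256 digest matches")
    else:
        print("SHA256 digest does NOT match; do not install this file")
```

Reading in chunks keeps memory use flat regardless of archive size. SHA256 is used here rather than MD5 because MD5 is not considered collision-resistant.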