A toolkit for extracting chemical information from the scientific literature.
ChemDataExtractor
ChemDataExtractor v2 is a toolkit for extracting chemical information from the scientific literature. It supports Python 3.5 through 3.8.
Installation
pip install chemdataextractor2
Features
- HTML, XML and PDF document readers
- Chemistry-aware natural language processing pipeline
- Chemical named entity recognition
- Rule-based parsing grammars for property and spectra extraction
- Table parser for extracting tabulated data
- Document processing to resolve data interdependencies
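As a rough illustration of the named entity recognition feature listed above, the sketch below wraps a document in a small helper and collects the chemical entity mentions found in it. It assumes the package is imported as `chemdataextractor` and exposes `Document` with a `cems` property, as described in the project documentation; the helper name `extract_chemical_mentions` is hypothetical, and the library may need its model data files available before it can run.

```python
def extract_chemical_mentions(text):
    """Return the sorted, de-duplicated chemical names mentioned in `text`.

    Hypothetical helper for illustration; not part of ChemDataExtractor.
    """
    # Imported lazily so this sketch can be loaded even when the package
    # is not installed. Assumption: the import name is `chemdataextractor`.
    from chemdataextractor import Document

    # Assumption: a plain string passed to Document is treated as a paragraph.
    doc = Document(text)

    # Assumption: `cems` yields the chemical entity mentions as spans,
    # each with a `text` attribute holding the matched string.
    return sorted({mention.text for mention in doc.cems})
```

Calling `extract_chemical_mentions("UV-vis spectrum of anthracene in tetrahydrofuran (THF).")` should return a list of strings; the exact entities recognised depend on the installed version and models.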
Documentation & Development
Please read the documentation for instructions on contributing to the project.
License
ChemDataExtractor v2 is licensed under the MIT license, a permissive, business-friendly license for open source software.
MIT license: https://github.com/CambridgeMolecularEngineering/ChemDataExtractor/blob/master/LICENSE