Set of parsers and linkers for materials extraction
Material Parsers (and other tools)
This project was previously released as grobid-superconductors-tools. It was born as a sister project of grobid-superconductors and contains a web service that interfaces with Python libraries (e.g. spaCy).
The service provides the following functionalities:
- Convert material name to formula (e.g. Lead -> Pb, Hydrogen -> H): /convert/name/formula
- Decompose formula into a structured dict of elements (e.g. La x Fe 1-x O7 -> {La: x, Fe: 1-x, O: 7}): /convert/formula/composition
- Classify material into classes (from the superconductors domain) using a rule-based table (e.g. "La Cu Fe" -> Cuprates): /classify/formula
- Tc classification (Tc, not-Tc): /classify/tc (for information, please open an issue)
- Relation extraction given a sentence and two entities: /process/link (for information, please open an issue)
- Material processing using Deep Learning models and rule-based processing: /process/material
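The curl examples below send each input as a single multipart form field. For scripting, an equivalent client can be sketched in Python with only the standard library. This is a minimal sketch, assuming the Hugging Face Space URL used in the examples below and a form field named input (the /process/material example uses the field text and a different host, hence the base_url parameter); build_request and call are hypothetical helpers, not part of the material-parsers package.

```python
import urllib.request
import uuid

# Base URL taken from the curl examples below; an assumption that may change.
BASE_URL = "https://lfoppiano-grobid-superconductors-tools.hf.space"


def build_request(endpoint: str, value: str, field: str = "input",
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to curl --form 'field="value"'."""
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the input
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"\r\n'
        "\r\n"
        f"{value}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        base_url + endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def call(endpoint: str, value: str, field: str = "input",
         base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body (needs network access)."""
    request = build_request(endpoint, value, field, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage (requires network access): json.loads(call("/convert/name/formula", "Hydrogen")). call returns raw text rather than parsed JSON because the /classify/formula example output below is printed as a Python-style list, not as JSON.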
Usage
The service is deployed on Hugging Face Spaces and can be used right away. To install the service in your own environment, see below.
Convert material name to formula
curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/name/formula' \
--form 'input="Hydrogen"'
output:
{"composition": {"H": "1"}, "name": "Hydrogen", "formula": "H"}
Decompose formula into a structured dict of elements
Example:
curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/formula/composition' \
--form 'input="CaBr2-x"'
output:
{"composition": {"Ca": "1", "Br": "2-x"}}
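To illustrate the shape of the output, here is a deliberately simplified regex-based sketch of a formula decomposition. It is not the algorithm used by the service. It assumes element symbols are written as an uppercase letter optionally followed by a lowercase one, and it does not handle parentheses, hydrates, or ambiguous cases such as a lowercase variable written directly after a one-letter element (e.g. "Bx"); decompose is a hypothetical helper.

```python
import re
from typing import Dict

# An element symbol (e.g. "Ca", "O") followed by an optional amount made of
# everything up to the next uppercase letter or whitespace (e.g. "2-x", "7", "x").
TOKEN = re.compile(r"([A-Z][a-z]?)\s*([^A-Z\s]*)")


def decompose(formula: str) -> Dict[str, str]:
    """Toy decomposition of a flat formula into {element: amount} strings.

    A missing amount defaults to "1"; a repeated element overwrites the earlier entry.
    """
    composition: Dict[str, str] = {}
    for element, amount in TOKEN.findall(formula):
        composition[element] = amount or "1"
    return composition


print(decompose("CaBr2-x"))         # {'Ca': '1', 'Br': '2-x'}
print(decompose("La x Fe 1-x O7"))  # {'La': 'x', 'Fe': '1-x', 'O': '7'}
```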
Classify materials into classes
Example:
curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/classify/formula' \
--form 'input="(Mo 0.96 Zr 0.04 ) 0.85 B x "'
output:
['Alloys']
Process material
This endpoint combines all of the steps listed above, after first passing the material sequence through a deep learning (DL) model.
Example:
curl --location 'https://lfoppiano-material-parsers.hf.space/process/material' \
--form 'text="(Mo 0.96 Zr 0.04 ) 0.85 B x "'
output:
[
  {
    "formula": {
      "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x"
    },
    "resolvedFormulas": [
      {
        "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x",
        "formulaComposition": {
          "Mo": "0.816",
          "Zr": "0.034",
          "B": "x"
        }
      }
    ]
  }
]
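In the output above, the amounts inside the parentheses are multiplied by the group subscript 0.85 (0.96 × 0.85 = 0.816 for Mo, 0.04 × 0.85 = 0.034 for Zr), while B keeps its variable amount x. The Python sketch below reproduces only this arithmetic for numeric amounts; distribute is a hypothetical helper, and how the service handles symbolic group multipliers is not covered. It uses Decimal so that the products are exact rather than subject to binary floating-point rounding.

```python
from decimal import Decimal
from typing import Dict


def distribute(group: Dict[str, str], multiplier: str) -> Dict[str, str]:
    """Multiply each numeric amount inside a parenthesised group by its subscript."""
    factor = Decimal(multiplier)
    resolved = {}
    for element, amount in group.items():
        product = Decimal(amount) * factor
        # normalize() drops trailing zeros (0.8160 -> 0.816); "f" avoids exponent notation.
        resolved[element] = format(product.normalize(), "f")
    return resolved


# (Mo 0.96 Zr 0.04 ) 0.85 B x
composition = distribute({"Mo": "0.96", "Zr": "0.04"}, "0.85")
composition["B"] = "x"  # amounts outside the group are kept as-is
print(composition)  # {'Mo': '0.816', 'Zr': '0.034', 'B': 'x'}
```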
Evaluation
The model uses DeLFT's BidLSTM_CRF architecture.
Evaluated on 23/12/25.
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| <doping> | 0.6926 | 0.6377 | 0.6640 | 265 |
| <fabrication> | 0.3333 | 0.0909 | 0.1429 | 44 |
| <formula> | 0.8348 | 0.8459 | 0.8403 | 2569 |
| <name> | 0.7346 | 0.7935 | 0.7629 | 949 |
| <shape> | 0.9089 | 0.9608 | 0.9341 | 841 |
| <substrate> | 0.5875 | 0.3176 | 0.4123 | 148 |
| <value> | 0.8844 | 0.8920 | 0.8882 | 463 |
| <variable> | 0.9645 | 0.9710 | 0.9677 | 448 |
| all (micro avg.) | 0.8321 | 0.8385 | 0.8353 | 5727 |
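Each F1-score in the table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the Python snippet below recomputes two rows from the rounded table values and sums the per-label support, which matches the 5727 of the micro-average row. Rows with very few correct predictions (e.g. <fabrication>) may differ in the last digit when recomputed this way, since the inputs are already rounded to four decimals.

```python
# Per-label support values copied from the evaluation table.
SUPPORT = {
    "doping": 265, "fabrication": 44, "formula": 2569, "name": 949,
    "shape": 841, "substrate": 148, "value": 463, "variable": 448,
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1(0.8348, 0.8459), 4))  # <formula> row: 0.8403
print(round(f1(0.8321, 0.8385), 4))  # micro average: 0.8353
print(sum(SUPPORT.values()))         # total support: 5727
```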
Installing in your environment
docker run -it lfoppiano/grobid-superconductors-tools:2.1
References
If you use our work and write about it, please cite our paper:
@article{doi:10.1080/27660400.2022.2153633,
  author = {Luca Foppiano and Pedro Baptista Castro and Pedro Ortiz Suarez and Kensei Terashima and Yoshihiko Takano and Masashi Ishii},
  title = {Automatic extraction of materials and properties from superconductors scientific literature},
  journal = {Science and Technology of Advanced Materials: Methods},
  volume = {3},
  number = {1},
  pages = {2153633},
  year = {2023},
  publisher = {Taylor & Francis},
  doi = {10.1080/27660400.2022.2153633},
  URL = {https://doi.org/10.1080/27660400.2022.2153633},
  eprint = {https://doi.org/10.1080/27660400.2022.2153633}
}
Overview of the repository
- Converters: TSV to/from Grobid XML file conversion
- Linking module: a rule-based Python algorithm to link entities
- Commons libraries: common code shared between the various components. The Grobid client was borrowed from here, the tokenizer from there.
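To give an idea of what rule-based entity linking means here, the sketch below implements one very simple rule: link each Tc value to the closest material mention that precedes it in the sentence. This is an illustration only; the Entity class, its labels, and link_preceding are hypothetical and do not reflect the linking module's actual rules or data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    text: str
    label: str  # hypothetical labels: "material" or "tc"
    start: int  # character offset of the mention in the sentence


def link_preceding(entities: List[Entity]) -> List[Tuple[str, str]]:
    """Toy rule: pair each Tc value with the nearest material mention before it."""
    materials = sorted((e for e in entities if e.label == "material"), key=lambda e: e.start)
    links = []
    for tc in (e for e in entities if e.label == "tc"):
        preceding = [m for m in materials if m.start < tc.start]
        if preceding:  # Tc values with no material before them stay unlinked
            links.append((preceding[-1].text, tc.text))
    return links


sentence = "MgB2 becomes superconducting at 39 K, while NbN has Tc = 16 K."
mentions = [("MgB2", "material"), ("39 K", "tc"), ("NbN", "material"), ("16 K", "tc")]
entities = [Entity(text, label, sentence.find(text)) for text, label in mentions]
print(link_preceding(entities))  # [('MgB2', '39 K'), ('NbN', '16 K')]
```

A single positional rule like this breaks easily (e.g. when the value comes before the material), which is why real linkers combine several rules.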
Developer's notes
Set up on Apple M1
conda install -c apple tensorflow-deps
pip install -r requirements.macos.txt
conda install scikit-learn=1.0.1
Before installing DeLFT from a local checkout, remove tensorflow, h5py, and scikit-learn from the DeLFT dependencies in its setup.py:
pip install -e ../../delft
pip install -r requirements.txt
Finally, install the spaCy model:
python -m spacy download en_core_web_sm
Release
bump-my-version bump patch|minor|major
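The argument selects which part of the version to increment. Assuming the project follows the usual semantic-versioning convention (incrementing a part resets the parts to its right), the effect on the current 3.0.2 release can be sketched in Python. bump is a hypothetical helper, not bump-my-version's API; the actual behaviour depends on the project's bump-my-version configuration.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version, resetting the parts after it."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


for part in ("patch", "minor", "major"):
    print(part, bump("3.0.2", part))
# patch 3.0.3
# minor 3.1.0
# major 4.0.0
```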
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file material_parsers-3.0.2.tar.gz.
File metadata
- Download URL: material_parsers-3.0.2.tar.gz
- Upload date:
- Size: 16.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ce5d4068319952e044640a6e746c7df0224ce4b1ddfcf973d2c1e4be8187cf32 |
| MD5 | 0ab803da0382751f481876f1126a4088 |
| BLAKE2b-256 | 9241f2136f9a0dd52b4a255866354cecc0dbcd0d9f2cc1eedaaef5ae9589ff3f |
File details
Details for the file material_parsers-3.0.2-py3-none-any.whl.
File metadata
- Download URL: material_parsers-3.0.2-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1fdced3565741f144ad8bbbab727290953a969975e5c5dd05a4b4435265716e |
| MD5 | feb74b6c0c657c7c921bd860048b5916 |
| BLAKE2b-256 | 105805afd5f8b84a8a2dfb22902dccd3c017f9fea7a2c1a5e62f0e8ebaa91c36 |
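To verify a downloaded file against the SHA256 digests listed in the file hash tables, a minimal Python sketch with the standard-library hashlib module is enough. The local file name and location (current directory) are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 of material_parsers-3.0.2-py3-none-any.whl, copied from the table above.
EXPECTED_WHEEL_SHA256 = "c1fdced3565741f144ad8bbbab727290953a969975e5c5dd05a4b4435265716e"

wheel = Path("material_parsers-3.0.2-py3-none-any.whl")  # assumed download location
if wheel.exists():
    print("OK" if sha256_of(str(wheel)) == EXPECTED_WHEEL_SHA256 else "MISMATCH")
```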