
Set of parsers and linkers for materials extraction



Material Parsers (and other tools)

This project was previously released as grobid-superconductors-tools. It was born as a sister project of grobid-superconductors and provides a web service that interfaces with Python libraries (e.g. spaCy).

The service provides the following functionalities:

  • Convert a material name to a formula (e.g. Lead -> Pb, Hydrogen -> H): /convert/name/formula
  • Decompose a formula into a structured dict of elements (e.g. La x Fe 1-x O7 -> {La: x, Fe: 1-x, O: 7}): /convert/formula/composition
  • Classify materials into classes from the superconductors domain using a rule-based table (e.g. "La Cu Fe" -> Cuprates): /classify/formula
  • Tc classification (Tc vs. not-Tc): /classify/tc (for details, please open an issue)
  • Relation extraction given a sentence and two entities: /process/link (for details, please open an issue)
  • Material processing using deep learning models and rule-based processing: /process/material

Usage

The service is deployed on Hugging Face Spaces and can be used right away. To install the service in your own environment, see below.

Convert material name to formula

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/name/formula' \
--form 'input="Hydrogen"'

output:

{"composition": {"H": "1"}, "name": "Hydrogen", "formula": "H"}
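The curl calls in this README send multipart form data (`--form`). A minimal Python sketch of the same request using only the standard library is shown here. The helper names `build_form_request` and `call` are hypothetical, and returning plain text when the body is not JSON is an assumption motivated by the `/classify/formula` example, whose documented output `['Alloys']` is not valid JSON.

```python
import json
import urllib.request
import uuid

# Host taken from the curl examples; the /process/material example uses a
# different host (https://lfoppiano-material-parsers.hf.space).
BASE_URL = "https://lfoppiano-grobid-superconductors-tools.hf.space"


def build_form_request(path: str, fields: dict[str, str],
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST, equivalent to `curl --form name=value`."""
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
        for name, value in fields.items()
    ]
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def call(path: str, fields: dict[str, str], base_url: str = BASE_URL,
         timeout: float = 30.0):
    """Send the request; decode JSON if possible, otherwise return raw text."""
    request = build_form_request(path, fields, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # e.g. /classify/formula is documented as returning ['Alloys']
        return text
```

For example, `call("/convert/name/formula", {"input": "Hydrogen"})` should return a dict like the output above. For /process/material, the curl example uses the field `text` and the other host, so pass `base_url="https://lfoppiano-material-parsers.hf.space"`.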

Decompose a formula into a structured dict of elements

Example:

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/formula/composition' \
--form 'input="CaBr2-x"'

output:

{"composition": {"Ca": "1", "Br": "2-x"}}
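The service's own parsing logic is not described here. To illustrate what the decomposition produces, this is a toy regex-based decomposer under a deliberately simplified, hypothetical grammar; it is not the library's implementation. It ignores parentheses, does not validate element symbols, and needs a space between a symbol and a lowercase variable amount (`B x`, not `Bx`).

```python
import re

# Hypothetical, simplified grammar: an element symbol (capital letter plus an
# optional lowercase letter), optional whitespace, then an amount made of
# digits, '.', '+', '-' and the variables x, y, z.
TOKEN = re.compile(r"([A-Z][a-z]?)\s*([0-9.+\-xyz]*)")


def decompose(formula: str) -> dict[str, str]:
    """Map each element symbol to its (possibly symbolic) amount as a string."""
    composition: dict[str, str] = {}
    for symbol, amount in TOKEN.findall(formula):
        # A missing amount means one atom, as for "Ca" in "CaBr2-x".
        # A repeated symbol overwrites the earlier entry.
        composition[symbol] = amount or "1"
    return composition
```

With this sketch, `decompose("CaBr2-x")` returns `{"Ca": "1", "Br": "2-x"}`, matching the service output above, and `decompose("La x Fe 1-x O7")` returns `{"La": "x", "Fe": "1-x", "O": "7"}`.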

Classify materials into classes

Example:

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/classify/formula' \
--form 'input="(Mo 0.96 Zr 0.04 ) 0.85 B x "'

output:

['Alloys']

Process material

This endpoint combines all of the steps listed above, after first passing the material sequence through a deep learning (DL) model.

Example:

curl --location 'https://lfoppiano-material-parsers.hf.space/process/material' \
--form 'text="(Mo 0.96 Zr 0.04 ) 0.85 B x "'

output:

[
    {
        "formula": {
            "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x"
        },
        "resolvedFormulas": [
            {
                "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x",
                "formulaComposition": {
                    "Mo": "0.816",
                    "Zr": "0.034",
                    "B": "x"
                }
            }
        ]
    }
]
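In the output above, the bracket subscript 0.85 is distributed over the bracketed elements: 0.96 × 0.85 = 0.816 for Mo and 0.04 × 0.85 = 0.034 for Zr, while B, outside the bracket, keeps its symbolic amount x. A minimal sketch of that arithmetic follows, assuming the bracket content has already been split out. `distribute` is a hypothetical helper, `decimal` is used to avoid binary floating-point rounding, and rendering symbolic products as `factor*(amount)` is an arbitrary choice.

```python
from decimal import Decimal, InvalidOperation


def distribute(inner: dict[str, str], factor: str) -> dict[str, str]:
    """Multiply every amount inside a bracket by the bracket's subscript."""
    result = {}
    for element, amount in inner.items():
        try:
            product = Decimal(amount) * Decimal(factor)
            # normalize() drops trailing zeros ("0.8160" -> "0.816");
            # format "f" avoids exponent notation such as "1E+2".
            result[element] = format(product.normalize(), "f")
        except InvalidOperation:
            # Symbolic amount or factor (e.g. "1-x"): keep an unevaluated product.
            result[element] = f"{factor}*({amount})"
    return result


# (Mo 0.96 Zr 0.04 ) 0.85 B x
composition = distribute({"Mo": "0.96", "Zr": "0.04"}, "0.85")
composition["B"] = "x"  # outside the bracket, left unchanged
print(composition)  # {'Mo': '0.816', 'Zr': '0.034', 'B': 'x'}
```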

Evaluation

The model uses DeLFT's BidLSTM_CRF architecture.

Evaluated on 23/12/25.

                  precision    recall  f1-score   support

        <doping>     0.6926    0.6377    0.6640       265
   <fabrication>     0.3333    0.0909    0.1429        44
       <formula>     0.8348    0.8459    0.8403      2569
          <name>     0.7346    0.7935    0.7629       949
         <shape>     0.9089    0.9608    0.9341       841
     <substrate>     0.5875    0.3176    0.4123       148
         <value>     0.8844    0.8920    0.8882       463
      <variable>     0.9645    0.9710    0.9677       448

all (micro avg.)     0.8321    0.8385    0.8353      5727
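Each f1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the micro-average support is the sum of the per-label supports (265 + 44 + 2569 + 949 + 841 + 148 + 463 + 448 = 5727). The sketch below re-checks the table; the helper names are illustrative. Because precision and recall are rounded to four decimals, a recomputed F1 can differ in the last digit (e.g. <fabrication> gives about 0.1428 against the reported 0.1429), hence the 1e-3 tolerance.

```python
# Rows copied from the evaluation table: label -> (precision, recall, f1, support)
REPORT = {
    "<doping>": (0.6926, 0.6377, 0.6640, 265),
    "<fabrication>": (0.3333, 0.0909, 0.1429, 44),
    "<formula>": (0.8348, 0.8459, 0.8403, 2569),
    "<name>": (0.7346, 0.7935, 0.7629, 949),
    "<shape>": (0.9089, 0.9608, 0.9341, 841),
    "<substrate>": (0.5875, 0.3176, 0.4123, 148),
    "<value>": (0.8844, 0.8920, 0.8882, 463),
    "<variable>": (0.9645, 0.9710, 0.9677, 448),
}
MICRO_AVG = (0.8321, 0.8385, 0.8353, 5727)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_report(tolerance: float = 1e-3) -> list[str]:
    """Return the rows whose reported F1 or support total disagrees."""
    mismatches = [label for label, (p, r, reported, _) in REPORT.items()
                  if abs(f1(p, r) - reported) > tolerance]
    p, r, reported, support = MICRO_AVG
    if abs(f1(p, r) - reported) > tolerance:
        mismatches.append("micro avg.")
    # The micro-average support counts all gold entities across labels.
    if sum(row[3] for row in REPORT.values()) != support:
        mismatches.append("support total")
    return mismatches


print(check_report())  # []
```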

Installing in your environment

docker run -it lfoppiano/grobid-superconductors-tools:2.1

References

If you use our work, and write about it, please cite our paper:

@article{doi:10.1080/27660400.2022.2153633,
    author = {Luca Foppiano and Pedro Baptista Castro and Pedro Ortiz Suarez and Kensei Terashima and Yoshihiko Takano and Masashi Ishii},
    title = {Automatic extraction of materials and properties from superconductors scientific literature},
    journal = {Science and Technology of Advanced Materials: Methods},
    volume = {3},
    number = {1},
    pages = {2153633},
    year = {2023},
    publisher = {Taylor & Francis},
    doi = {10.1080/27660400.2022.2153633},
    URL = {https://doi.org/10.1080/27660400.2022.2153633},
    eprint = {https://doi.org/10.1080/27660400.2022.2153633}
}

Overview of the repository

  • Converters: TSV to/from Grobid XML file conversion
  • Linking module: a rule-based Python algorithm to link entities
  • Commons library: common code shared between the various components. The Grobid client and the tokenizer were borrowed from other projects.

Developer's notes

Set up on Apple M1

conda install -c apple tensorflow-deps
pip install -r requirements.macos.txt 
conda install scikit-learn=1.0.1

Remove tensorflow, h5py and scikit-learn from the DeLFT dependencies in setup.py, then install DeLFT from a local checkout and the remaining requirements:

pip install -e ../../delft 
pip install -r requirements.txt 

Finally, install the spaCy model:

python -m spacy download en_core_web_sm

Release

bump-my-version bump patch    # or: minor, major



Download files


Source Distribution

material_parsers-3.0.2.tar.gz (16.8 MB)

Built Distribution


material_parsers-3.0.2-py3-none-any.whl (20.9 kB, Python 3)

File details

Details for the file material_parsers-3.0.2.tar.gz.

File metadata

  • Download URL: material_parsers-3.0.2.tar.gz
  • Size: 16.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for material_parsers-3.0.2.tar.gz
Algorithm     Hash digest
SHA256        ce5d4068319952e044640a6e746c7df0224ce4b1ddfcf973d2c1e4be8187cf32
MD5           0ab803da0382751f481876f1126a4088
BLAKE2b-256   9241f2136f9a0dd52b4a255866354cecc0dbcd0d9f2cc1eedaaef5ae9589ff3f
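To check a downloaded release file against the SHA256 digests listed on this page, a minimal sketch with Python's `hashlib` follows. The helper names `sha256_of` and `verify` are hypothetical, and only the SHA256 values are compared.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed on this page for release 3.0.2
EXPECTED_SHA256 = {
    "material_parsers-3.0.2.tar.gz":
        "ce5d4068319952e044640a6e746c7df0224ce4b1ddfcf973d2c1e4be8187cf32",
    "material_parsers-3.0.2-py3-none-any.whl":
        "c1fdced3565741f144ad8bbbab727290953a969975e5c5dd05a4b4435265716e",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a file against the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

`verify(Path("material_parsers-3.0.2.tar.gz"))` returns True only if the local file matches the published digest.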


File details

Details for the file material_parsers-3.0.2-py3-none-any.whl.


File hashes

Hashes for material_parsers-3.0.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        c1fdced3565741f144ad8bbbab727290953a969975e5c5dd05a4b4435265716e
MD5           feb74b6c0c657c7c921bd860048b5916
BLAKE2b-256   105805afd5f8b84a8a2dfb22902dccd3c017f9fea7a2c1a5e62f0e8ebaa91c36

