A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries.

CathodeDataExtractor

CathodeDataExtractor is a lightweight document-level information extraction pipeline. It automatically extracts comprehensive properties related to synthesis parameters and to cycling and rate performance from the literature on layered cathode materials for sodium-ion batteries.

Installation


pip install cathodedataextractor
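
To confirm that the install succeeded, the package version can be read back from Python. This is a minimal sketch that uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical and not part of cathodedataextractor.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="cathodedataextractor"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string after a successful install, otherwise None.
print(installed_version())
```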

Features


  • Built on the open-source libraries pymatgen, text2chem, and ChemDataExtractor v2, with some modifications.
  • A BatterySciBERT-uncased multi-label text classification model for filtering documents.
  • An automated, comprehensive data extraction pipeline for cathode materials.
  • Paragraph multi-class classification algorithms for HTML/XML documents from the RSC and Elsevier.
  • A normalised entity handling process.
  • An effective chemical abbreviation detection module.
  • A heuristic multi-level relation extraction algorithm for electrochemical properties.

The pipeline can also extract from plain text strings.
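
To give a feel for what chemical abbreviation detection involves, here is a small standard-library sketch. It is not the module shipped with this package. It assumes abbreviations are introduced as a formula followed by a parenthesised short name, such as "Na0.67Ni0.33Mn0.67O2 (denoted as NNMO)". The pattern and the function name `find_abbreviations` are hypothetical.

```python
import re

# Assumption: a formula is two or more element symbols, each optionally
# followed by a stoichiometric number or a variable (x, y, z).
# The short name sits in parentheses, optionally introduced by "denoted as" etc.
_ABBREVIATION = re.compile(
    r"(?P<formula>(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?|[xyz])?){2,})"
    r"\s*\(\s*(?:(?:denoted|abbreviated|named|referred to)\s+as\s+)?"
    r"(?P<abbr>[A-Za-z0-9\-]{2,12})\s*\)"
)


def find_abbreviations(text):
    """Map each detected abbreviation to the formula it stands for."""
    return {m.group("abbr"): m.group("formula") for m in _ABBREVIATION.finditer(text)}


sample = "P2-type Na0.67Ni0.33Mn0.67O2 (denoted as NNMO) was synthesized."
print(find_abbreviations(sample))
```

A real detector has to cope with many more phrasings (for example "hereafter NNMO" or abbreviations defined in a different sentence), so treat this only as an illustration of the task.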

Quick start


Extract from documents

from glob import iglob
from cathodedataextractor.information_extraction_pipe import Pipeline

pipeline = Pipeline()
# '*ml' matches both HTML and XML files in the current directory.
for document in iglob('*ml'):
    extraction_results = pipeline.extract(document)
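
When processing many files, it may help to keep one result per document and to continue past files that fail. The sketch below is one way to do that, assuming only that the extraction step is a callable that takes a file path. The helper `extract_all` is hypothetical and not part of the package.

```python
from glob import glob
from typing import Any, Callable, Dict, Tuple


def extract_all(
    pattern: str, extract: Callable[[str], Any]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run `extract` on every file matching `pattern`.

    Returns (results, failures): results maps path -> extraction output,
    failures maps path -> a description of the exception that was raised.
    """
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for path in sorted(glob(pattern)):
        try:
            results[path] = extract(path)
        except Exception as exc:  # record the failure and keep going
            failures[path] = repr(exc)
    return results, failures
```

With the package installed, a call along the lines of `extract_all('*ml', Pipeline().extract)` should work, assuming `extract` accepts a file path as in the example above.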

Extract from string

from cathodedataextractor.information_extraction_pipe import Pipeline

extraction_results = Pipeline.from_string(
    'Apart from the conventional cationic redox of transition metals, '
    'both Na-deficit and Na-excess materials have showcased the ability '
    'to exploit oxygen redox activity as O2–/O2n– for a charge '
    'compensation mechanism. To realize cathodes with enhanced energy '
    'density, a technique like the incorporation of alkali metal ions '
    'into transition metal layers has been adopted. Recent work by Boisse '
    '(13) et al. displayed the impact of honeycomb cation ordering of '
    'a highly stabilized intermediate phase for a Na2RuO3 cathode material '
    'in instigating the anionic redox activity and providing a capacity '
    'of 180 mAh g–1 at 0.2C with a capacity retention of 89% for over '
    '50 cycles. More devoted efforts to realize the utmost potential '
    'of anionic redox ought to be carried out in the future.')
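
The passage above contains the kind of electrochemical values the pipeline targets: a capacity, a C-rate, a capacity retention, and a cycle count. As a rough illustration only, the following standard-library sketch pulls those four numbers out of the final clause with regular expressions. It is not the package's relation extraction algorithm. The names `PATTERNS` and `extract_properties` are hypothetical, and the patterns assume the units are written exactly as in this sentence.

```python
import re

# Verbatim fragment of the example text above.
SENTENCE = (
    "providing a capacity of 180 mAh g–1 at 0.2C with a capacity "
    "retention of 89% for over 50 cycles."
)

# Assumed surface forms; real papers vary far more than this.
PATTERNS = {
    "capacity_mAh_per_g": re.compile(r"(\d+(?:\.\d+)?)\s*mAh\s*g\s*[-–−]\s*1"),
    "c_rate": re.compile(r"(\d+(?:\.\d+)?)\s*C\b"),
    "retention_percent": re.compile(r"(\d+(?:\.\d+)?)\s*%"),
    "cycles": re.compile(r"(\d+)\s*cycles?\b"),
}


def extract_properties(text):
    """Return the first numeric match for each property pattern found in text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = float(match.group(1))
    return found


print(extract_properties(SENTENCE))
```

Regular expressions like these find values but do not link them to a material (here Na2RuO3) or to each other, which is the harder part that a relation extraction step has to solve.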

Issues?


You can report an issue on GitHub or contact me directly at gouyx@mail2.sysu.edu.cn.

Download files

Source Distribution

cathodedataextractor-0.0.4.tar.gz (65.4 kB)