
A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries.


CathodeDataExtractor


CathodeDataExtractor is a lightweight document-level information extraction pipeline. It automatically extracts synthesis parameters and cycling and rate performance data for layered cathode materials from the sodium-ion battery literature.

Installation


pip install cathodedataextractor

Features


  • Built on the open-source libraries pymatgen, text2chem, and ChemDataExtractor v2, with some modifications.
  • A BatterySciBERT-uncased multi-label text classification model for filtering documents.
  • An automated, comprehensive data extraction pipeline for cathode materials.
  • Multi-class paragraph classification algorithms for documents (HTML/XML) from the RSC and Elsevier.
  • A normalised entity handling process.
  • An effective chemical abbreviation detection module.
  • A heuristic multi-level relation extraction algorithm for electrochemical properties.

The pipeline can also extract data directly from plain text strings.
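As a rough illustration of what chemical abbreviation detection involves, the sketch below pairs a formula-like token with a short label given in parentheses after it. This is not the module shipped with the package: the `ABBREVIATION` pattern, the `find_abbreviations` helper, and the sample sentence are all hypothetical and use only the standard library.

```python
import re

# Hypothetical pattern (not taken from cathodedataextractor):
# a token that starts with a capital letter and contains at least one digit,
# followed by a short capitalised label in parentheses. The label may be
# introduced by "denoted as", "abbreviated as" or "referred to as".
ABBREVIATION = re.compile(
    r"(?P<name>[A-Z][A-Za-z0-9.\[\]/]*\d[A-Za-z0-9.\[\]/]*)"
    r"\s*\((?:(?:denoted|abbreviated|referred to) as\s+)?"
    r"(?P<abbr>[A-Z][A-Za-z0-9-]{1,11})\)"
)


def find_abbreviations(text):
    """Return a mapping from each abbreviation to the token it stands for."""
    return {m.group("abbr"): m.group("name") for m in ABBREVIATION.finditer(text)}


# Illustrative sentence written for this example.
text = (
    "P2-type Na0.67Ni0.33Mn0.67O2 (denoted as NNM) and "
    "Na0.67Ni0.23Mg0.1Mn0.67O2 (NNMM) were prepared by a solid-state reaction."
)
abbreviations = find_abbreviations(text)
print(abbreviations)
```

A single regular expression like this misses abbreviations defined in other ways (for example "hereafter NNM"), so a real detector needs more rules than shown here.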

Quick start


Extract from documents

from glob import iglob
from cathodedataextractor.information_extraction_pipe import Pipeline

pipeline = Pipeline()
# '*ml' matches file names ending in 'ml', such as .html and .xml,
# in the current directory.
for document in iglob('*ml'):
    extraction_results = pipeline.extract(document)

Extract from string

from cathodedataextractor.information_extraction_pipe import Pipeline

extraction_results = Pipeline.from_string(
    'Apart from the conventional cationic redox of transition metals, '
    'both Na-deficit and Na-excess materials have showcased the ability '
    'to exploit oxygen redox activity as O2–/O2n– for a charge '
    'compensation mechanism. To realize cathodes with enhanced energy '
    'density, a technique like the incorporation of alkali metal ions '
    'into transition metal layers has been adopted. Recent work by Boisse '
    '(13) et al. displayed the impact of honeycomb cation ordering of '
    'a highly stabilized intermediate phase for a Na2RuO3 cathode material '
    'in instigating the anionic redox activity and providing a capacity '
    'of 180 mAh g–1 at 0.2C with a capacity retention of 89% for over '
    '50 cycles. More devoted efforts to realize the utmost potential '
    'of anionic redox ought to be carried out in the future.')
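
To show the kind of electrochemical values the passage above contains, here is a minimal standard-library sketch that pulls the capacity, rate, retention, and cycle count out of its key sentence with regular expressions. It does not reproduce the pipeline's relation extraction algorithm or its output format; `PATTERNS` and `extract_properties` are hypothetical names introduced only for this example.

```python
import re

# Hypothetical regex heuristics, not part of cathodedataextractor.
# \u2013 is an en dash and \u2212 a minus sign; both appear in "g-1" in papers.
PATTERNS = {
    "capacity_mAh_g": re.compile(r"(\d+(?:\.\d+)?)\s*mAh\s*g[\u2013\u2212-]1"),
    "rate_C": re.compile(r"(\d+(?:\.\d+)?)\s*C\b"),
    "retention_percent": re.compile(r"capacity retention of\s*(\d+(?:\.\d+)?)\s*%"),
    "cycles": re.compile(r"(\d+)\s*cycles"),
}


def extract_properties(sentence):
    """Return the first numeric match for each pattern found in the sentence."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        if match:
            found[name] = float(match.group(1))
    return found


sentence = (
    "providing a capacity of 180 mAh g–1 at 0.2C with a capacity "
    "retention of 89% for over 50 cycles."
)
properties = extract_properties(sentence)
print(properties)
```

Taking the first match per pattern only works when a sentence reports a single material under a single condition. Linking several values to the right material across sentences is the harder relation extraction problem.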

Issues?


You can either report an issue on GitHub or contact me directly at gouyx@mail2.sysu.edu.cn.

