
Convert between NCBI PubMed/PMC and BioC formats


BioConverters Package


The bioconverters package contains functions for converting PubMed- and PMC-style XML into BioC format.

Getting Started

Install with pip

pip install bioconverters

Now you are ready to start converting files. Assuming you already have a file containing PMC-formatted XML, iterate over the documents returned by pmcxml2bioc:

from bioconverters import pmcxml2bioc

for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml'):
    # do stuff with bioc doc
    pass
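Each item yielded is a BioC document. To illustrate what that target format looks like (this is not how bioconverters builds its output), here is a minimal standard-library sketch that assembles a BioC-style XML collection. The element names follow the BioC XML format (collection, source, date, key, document, id, passage, infon, offset, text); the helper make_bioc_collection, the placeholder ID, and the one-character gap between passage offsets are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def make_bioc_collection(doc_id, passages):
    """Build a minimal BioC-style collection (hypothetical helper).

    passages is a list of (passage_type, text) tuples. Offsets are
    character positions; leaving one separator character between
    passages is an assumption of this sketch.
    """
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "PMC"
    ET.SubElement(collection, "date").text = ""
    ET.SubElement(collection, "key").text = ""
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    offset = 0
    for passage_type, text in passages:
        passage = ET.SubElement(document, "passage")
        # Each passage carries key/value metadata as <infon> elements
        ET.SubElement(passage, "infon", key="type").text = passage_type
        ET.SubElement(passage, "offset").text = str(offset)
        ET.SubElement(passage, "text").text = text
        offset += len(text) + 1
    return collection


# "PMC0000000" is a placeholder ID, not a real article
root = make_bioc_collection(
    "PMC0000000", [("title", "A title"), ("paragraph", "Some body text.")]
)
print(ET.tostring(root, encoding="unicode"))
```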

Customizing Handlers

You can override the parse functions that handle specific tags by providing the tag_handlers argument. In the example below we write a handler for the table element that omits the table's content from the final text while keeping any text that follows it.

from bioconverters.util import TextChunk
from bioconverters import pmcxml2bioc

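def ignore_element(elem, custom_handlers):
    # Skip the element's own content, but keep its tail text
    # (the text that follows the closing tag inside the parent element)
    tail = (elem.tail or "").strip()
    return [TextChunk(tail, elem)]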


for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', tag_handlers={'table': ignore_element}):
    # do stuff with bioc doc
    pass
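To show how a tag-to-handler mapping can drive text extraction, here is a self-contained sketch using xml.etree.ElementTree. It is not the bioconverters implementation: the default_handler walker and the Chunk tuple are hypothetical stand-ins for the library's internals and for TextChunk. The ignore_element handler mirrors the one above: it drops the element's content but keeps its tail, since the tail text belongs to the parent.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Chunk(NamedTuple):
    # Hypothetical stand-in for bioconverters.util.TextChunk
    text: str
    tag: str


def default_handler(elem, handlers):
    """Collect text from elem, dispatching children to custom handlers."""
    chunks = []
    if elem.text and elem.text.strip():
        chunks.append(Chunk(elem.text.strip(), elem.tag))
    for child in elem:
        # Use a custom handler for this tag if one was supplied
        handler = handlers.get(child.tag, default_handler)
        chunks.extend(handler(child, handlers))
    # The tail is text after this element's closing tag
    if elem.tail and elem.tail.strip():
        chunks.append(Chunk(elem.tail.strip(), elem.tag))
    return chunks


def ignore_element(elem, handlers):
    # Drop the element's own text and children; keep only its tail
    tail = (elem.tail or "").strip()
    return [Chunk(tail, elem.tag)] if tail else []


xml = "<sec><p>Intro text.</p><table><tr><td>1</td></tr></table>After the table.</sec>"
root = ET.fromstring(xml)
chunks = default_handler(root, {"table": ignore_element})
print([c.text for c in chunks])  # ['Intro text.', 'After the table.']
```

Passing the handler mapping down through every call lets a custom handler delegate back to the default behaviour for its own children if it needs to.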

Trim Sentences

You can also choose to truncate sentences to a maximum length. This is off by default. To turn it on, pass the trim_sentences flag:

for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', trim_sentences=True):
    # do stuff with bioc doc
    pass
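The README does not say what maximum length trim_sentences uses or how sentences are detected. Purely as an illustration of the idea, here is a sketch that splits text on sentence-ending punctuation and truncates each sentence to a character limit; the function name, the regex-based splitting, and the default limit of 100 characters are all assumptions, not the library's behaviour.

```python
import re


def trim_sentences(text, max_length=100):
    """Truncate each sentence in text to at most max_length characters.

    Assumptions for this sketch: a sentence ends with '.', '!' or '?'
    followed by whitespace, and the limit counts characters.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s[:max_length] for s in sentences if s)


print(trim_sentences("Short one. " + "x" * 20 + ".", max_length=10))
# Short one. xxxxxxxxxx
```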

Add XML structure Information

To keep track of approximately where in the XML hierarchy a passage was derived from, use the all_xml_path_infon option. Note that this infon is always added for table and figure elements, regardless of the flag.

for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', all_xml_path_infon=True):
    # do stuff with bioc doc
    pass

This adds an infon to each passage (where possible) resembling the following:

<infon key="xml_path">body/sec/p</infon>
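ElementTree elements do not store a reference to their parent, so one way to compute a path like body/sec/p is to build a child-to-parent map and walk upward from the element. This is a minimal sketch, not the library's code; xml_path_of is a hypothetical helper, and leaving the root tag (article) out of the path to match the example above is an assumption.

```python
import xml.etree.ElementTree as ET


def xml_path_of(root, target):
    """Return the slash-separated tag path from below root down to target."""
    # Map every element to its parent, since ElementTree has no parent links
    parents = {child: parent for parent in root.iter() for child in parent}
    tags = []
    node = target
    while node is not root:
        tags.append(node.tag)
        node = parents[node]
    return "/".join(reversed(tags))


article = ET.fromstring("<article><body><sec><p>Text</p></sec></body></article>")
p = article.find("body/sec/p")
print(xml_path_of(article, p))  # body/sec/p
```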


Download files


Source distribution

bioconverters-1.0.1.tar.gz (18.0 kB)

Built distribution

bioconverters-1.0.1-py3-none-any.whl (16.4 kB, Python 3)
