Convert between NCBI PubMed/PMC and BioC formats

BioConverters Package

The bioconverters package contains functions for converting PubMed and PMC style XML into BioC format.

Getting Started

Install with pip

pip install bioconverters

Now you are ready to start converting files. Assuming you already have a file containing PMC-formatted XML, you can iterate over the converted documents:

from bioconverters import pmcxml2bioc

for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml'):
    pass  # do stuff with bioc doc

Customizing Handlers

You can override the parse functions that deal with specific tags by providing the tag_handlers argument. The example below defines a handler for an element whose content is omitted from the final text.

from bioconverters.util import TextChunk
from bioconverters import pmcxml2bioc

def ignore_element(xml_element, custom_handlers):
    # Drop the element's own content, keeping only the text after its closing tag
    tail = (xml_element.tail or "").strip()
    return [TextChunk(tail, xml_element)]


for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', tag_handlers={'table': ignore_element}):
    pass  # do stuff with bioc doc
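The handler above keeps the element's tail so that text following the omitted element is not lost. The sketch below illustrates what the tail is, using only the standard library. It assumes the element passed to a handler behaves like an xml.etree.ElementTree element (suggested by the use of .tail above, but not confirmed here); the sample paragraph is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up paragraph whose <table> child is followed by trailing text.
paragraph = ET.fromstring(
    "<p>Results are shown <table><tr><td>1</td></tr></table> in the table above.</p>"
)
table = paragraph.find("table")

# The tail is the text between </table> and the next tag. It is stored on
# the table element itself, not on the parent paragraph.
tail = (table.tail or "").strip()
print(tail)
```

A handler that returned nothing for the element would also discard this trailing text, which is why the example wraps the tail in a TextChunk.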

Trim Sentences

You can also choose to truncate sentences to a maximum length. This is off by default. To turn this option on, set the trim_sentences flag:

for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', trim_sentences=True):
    pass  # do stuff with bioc doc

Add XML Structure Information

To keep track of approximately where in the XML hierarchy a passage was derived from, use the all_xml_path_infon option. Note that this infon is added by default for any table and figure elements, regardless of the flag.

for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', all_xml_path_infon=True):
    pass  # do stuff with bioc doc

This adds an infon to each passage (where possible) that resembles the following:

<infon key="xml_path">body/sec/p</infon>
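Once the output has been serialized to BioC XML, the xml_path infon can be read back with any XML parser. The following is a minimal sketch using only the standard library. The BioC snippet is hand-written for illustration (the second path, body/sec/table-wrap, is a hypothetical value), and it is not output produced by bioconverters.

```python
import xml.etree.ElementTree as ET

# Hand-written BioC fragment for illustration only.
bioc_xml = """
<collection>
  <document>
    <id>example</id>
    <passage>
      <infon key="xml_path">body/sec/p</infon>
      <offset>0</offset>
      <text>Some passage text.</text>
    </passage>
    <passage>
      <infon key="xml_path">body/sec/table-wrap</infon>
      <offset>19</offset>
      <text>Table caption.</text>
    </passage>
  </document>
</collection>
""".strip()

root = ET.fromstring(bioc_xml)

# Collect the xml_path infon of every passage (None if a passage lacks one).
paths = [
    passage.findtext("infon[@key='xml_path']")
    for passage in root.iter("passage")
]
print(paths)
```

Grouping passages by this value is one way to separate body text from table or figure content downstream.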
