Convert between NCBI PubMed/PMC and BioC formats
Project description
BioConverters Package
The bioconverters package contains functions for converting PubMed- and PMC-style XML into BioC format.
Getting Started
Install with pip
pip install bioconverters
Now you are ready to start converting files. Assuming you already have a file containing PMC-formatted XML:
from bioconverters import pmcxml2bioc
for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml'):
    pass  # do stuff with the BioC doc
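As a rough illustration of what "doing stuff" with the converted documents can look like, the sketch below gathers the text of every passage. It assumes each document exposes a `passages` list whose items carry a `text` attribute, as the document objects in the `bioc` Python package do. `iter_passage_text` is a hypothetical helper, not part of bioconverters, and the stand-in objects exist only so the example runs without an input file.

```python
from types import SimpleNamespace


def iter_passage_text(docs):
    """Yield the non-empty text of every passage in an iterable of BioC-like documents."""
    for doc in docs:
        for passage in doc.passages:
            text = (passage.text or "").strip()
            if text:
                yield text


# Stand-in documents (assumption: real documents expose .passages and .text)
example_docs = [
    SimpleNamespace(passages=[
        SimpleNamespace(text="Title of the article"),
        SimpleNamespace(text="  "),
        SimpleNamespace(text="First paragraph."),
    ])
]
passage_texts = list(iter_passage_text(example_docs))
```

With a real file, the same helper would be called as iter_passage_text(pmcxml2bioc('/path/to/pmc/xml/file.xml')).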
Customizing Handlers
You can override the parse functions that deal with specific tags by providing the tag_handlers argument. In the example below we write a handler for an element whose content we want to omit from the final text:
from bioconverters.util import TextChunk
from bioconverters import pmcxml2bioc

def ignore_element(elem, custom_handlers):
    # Drop the element's own content but keep the text that follows it
    tail = (elem.tail or "").strip()
    return [TextChunk(tail, elem)]

for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', tag_handlers={'table': ignore_element}):
    pass  # do stuff with the BioC doc
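The reason the handler returns the element's tail can be seen with the standard library alone. In ElementTree, text that follows an element's closing tag is stored on that element's `tail`, so discarding a table without keeping its tail would also discard the prose after it. In this sketch `Chunk` is a hypothetical stand-in for `TextChunk`, and `flatten` is an illustrative walker, not how bioconverters itself is implemented.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical stand-in for bioconverters.util.TextChunk
Chunk = namedtuple("Chunk", ["text", "xml_node"])


def ignore_element(elem, custom_handlers):
    """Drop the element's own content but keep the text that follows it."""
    tail = (elem.tail or "").strip()
    return [Chunk(tail, elem)]


def flatten(elem, handlers):
    """Illustrative walker: collect text chunks, deferring to handlers by tag."""
    if elem.tag in handlers:
        return handlers[elem.tag](elem, handlers)
    chunks = [Chunk((elem.text or "").strip(), elem)]
    for child in elem:
        chunks.extend(flatten(child, handlers))
    chunks.append(Chunk((elem.tail or "").strip(), elem))
    return chunks


root = ET.fromstring("<p>Before <table><td>cell</td></table> after.</p>")
chunks = flatten(root, {"table": ignore_element})
flattened_text = " ".join(c.text for c in chunks if c.text)
```

Here the table's cell text is dropped, while the words on either side of the table survive.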
Trim Sentences
You can also choose to truncate sentences to a maximum length. This is off by default. To turn this option on, use the trim_sentences flag:
for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', trim_sentences=True):
    pass  # do stuff with the BioC doc
Add XML structure Information
To keep track of approximately where in the XML hierarchy a passage was derived from, use the all_xml_path_infon option. Note that this infon is added by default for any table and figure elements, regardless of the flag:
for doc in pmcxml2bioc('/path/to/pmc/xml/file.xml', all_xml_path_infon=True):
    pass  # do stuff with the BioC doc
This will add an infon to each passage (where possible), which resembles the following:
<infon key="xml_path">body/sec/p</infon>
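Once the documents have been serialized to BioC XML, the xml_path infon can be used to select passages by where they came from. The sketch below uses only the standard library and a hand-written BioC fragment. The fragment, the `body/sec/table-wrap` value, and the `passages_under` helper are all illustrative assumptions, not output or API from bioconverters.

```python
import xml.etree.ElementTree as ET

# Hand-written BioC fragment for illustration; not output from bioconverters
BIOC_XML = """
<collection>
  <document>
    <id>example</id>
    <passage>
      <infon key="xml_path">body/sec/p</infon>
      <offset>0</offset>
      <text>A paragraph inside a section.</text>
    </passage>
    <passage>
      <infon key="xml_path">body/sec/table-wrap</infon>
      <offset>30</offset>
      <text>Table caption.</text>
    </passage>
  </document>
</collection>
"""


def passages_under(bioc_xml, path_suffix):
    """Return the text of passages whose xml_path infon ends with path_suffix."""
    root = ET.fromstring(bioc_xml)
    matches = []
    for passage in root.iter("passage"):
        # Passages without an xml_path infon fall back to an empty path
        xml_path = passage.findtext("infon[@key='xml_path']", default="")
        if xml_path.endswith(path_suffix):
            matches.append(passage.findtext("text", default=""))
    return matches


paragraphs = passages_under(BIOC_XML, "/p")
```

Matching on a suffix rather than the full path keeps the filter usable when the same element type appears at different depths.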