Convert between BioNLP formats
bconv: Python library for converting between BioNLP formats
bconv offers format conversion and manipulation of documents with text and annotations.
It supports various popular formats used in natural-language processing for biomedical texts.
Supported formats
The following formats are currently supported:
Name | I | O | T | A | Description |
---|---|---|---|---|---|
bioc_xml , bioc_json |
✓ | ✓ | ✓ | ✓ | BioC |
bionlp |
✓ | ✓ | BioNLP stand-off | ||
brat |
✓ | ✓ | brat stand-off | ||
conll |
✓ | ✓ | ✓ | ✓ | CoNLL |
europepmc , europepmc.zip |
✓ | ✓ | Europe-PMC JSON | ||
pubtator , pubtator_fbk |
✓ | ✓ | ✓ | ✓ | PubTator |
pubmed , pxml |
✓ | ✓ | PubMed abstracts | ||
pmc , nxml |
✓ | ✓ | PMC full-text | ||
pubanno_json , pubanno_json.tgz |
✓ | ✓ | ✓ | ✓ | PubAnnotation JSON |
csv , tsv |
✓ | ✓ | comma/tab-separated values | ||
text_csv , text_tsv |
✓ | ✓ | ✓ | comma/tab-separated values | |
txt |
✓ | ✓ | ✓ | plain text | |
txt.json |
✓ | ✓ | ✓ | collection of plain-text documents |
I: input format; O: output format; T: can represent text; A: can represent annotations (entities).
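The legend implies a simple rule: converting from format X to format Y requires X to be an input format and Y an output format, and text or annotations survive only if both formats can represent them. The sketch below encodes that rule. `CAPS` and `conversion_report` are hypothetical helpers, not part of bconv; only formats shown with all four marks in the table are copied, and the two `hypothetical_*` entries are made up for illustration.

```python
# Capability marks per format. The first three rows are copied from the
# table (all four marks); the "hypothetical_*" rows are invented examples.
CAPS = {
    "bioc_xml": {"I", "O", "T", "A"},
    "conll": {"I", "O", "T", "A"},
    "pubtator": {"I", "O", "T", "A"},
    "hypothetical_txt_like": {"I", "O", "T"},
    "hypothetical_output_only": {"O", "A"},
}


def conversion_report(src: str, dst: str) -> dict:
    """Say whether src -> dst is possible and what survives the conversion."""
    s, d = CAPS[src], CAPS[dst]
    # The source must be readable and the target writable.
    possible = "I" in s and "O" in d
    return {
        "possible": possible,
        # Text and annotations are kept only if both formats can represent them.
        "keeps_text": possible and "T" in s and "T" in d,
        "keeps_annotations": possible and "A" in s and "A" in d,
    }


print(conversion_report("pubtator", "hypothetical_txt_like"))
```

Here the conversion is possible and keeps the text, but the annotations are lost because the target cannot represent them.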
Installation
bconv is hosted on PyPI, so you can use pip to install it:
$ pip install bconv
Usage
Load an annotated collection in BioC XML format:
>>> import bconv
>>> coll = bconv.load('path/to/example.xml', fmt='bioc_xml')
>>> coll
<Collection with 37 documents at 0x7f1966e4b3c8>
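For orientation, BioC XML nests `document` elements inside a `collection`. Each document has `passage` elements that carry text and `annotation` elements, which hold `infon` key/value pairs and a `location` (offset and length). The following is a minimal standard-library sketch of reading such annotations. It is not how bconv parses BioC, `read_bioc_annotations` is a hypothetical name, and it ignores relations and treats each `location` of a discontinuous annotation as a separate span.

```python
import xml.etree.ElementTree as ET


def read_bioc_annotations(xml_text: str) -> dict:
    """Map each document id to a list of (start, end, text, infons) tuples."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc in root.iter("document"):
        anns = []
        # iter() is recursive, so passage- and sentence-level annotations are found.
        for ann in doc.iter("annotation"):
            infons = {i.get("key"): i.text for i in ann.findall("infon")}
            for loc in ann.findall("location"):
                start = int(loc.get("offset"))
                end = start + int(loc.get("length"))
                anns.append((start, end, ann.findtext("text"), infons))
        result[doc.findtext("id")] = anns
    return result


# A tiny hand-written BioC-style sample (not taken from a real corpus).
SAMPLE = """
<collection>
  <source>example</source>
  <document>
    <id>1</id>
    <passage>
      <offset>0</offset>
      <text>KSHV-TK induces contraction.</text>
      <annotation id="T1">
        <infon key="type">gene/protein</infon>
        <location offset="0" length="7"/>
        <text>KSHV-TK</text>
      </annotation>
    </passage>
  </document>
</collection>
"""

print(read_bioc_annotations(SAMPLE))
```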
A Collection is a sequence of Document objects:
>>> coll[0]
<Document with 12 sections at 0x7f1966e2f6d8>
Documents contain Sections, which contain Sentences:
>>> sent = coll[0][3][5]
>>> sent.text
'A Live cell imaging reveals that expression of GFP‐KSHV‐TK, but not GFP induces contraction of HeLa cells.'
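The expression `coll[0][3][5]` walks this hierarchy one level at a time: the first document, its fourth section, and that section's sixth sentence. A toy model of the same nesting is sketched below. The classes are hypothetical stand-ins that assume nothing about bconv's real classes beyond integer indexing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    text: str


@dataclass
class Section:
    sentences: List[Sentence] = field(default_factory=list)

    def __getitem__(self, i: int) -> Sentence:
        return self.sentences[i]


@dataclass
class Document:
    sections: List[Section] = field(default_factory=list)

    def __getitem__(self, i: int) -> Section:
        return self.sections[i]


@dataclass
class Collection:
    documents: List[Document] = field(default_factory=list)

    def __getitem__(self, i: int) -> Document:
        return self.documents[i]

    def __len__(self) -> int:
        return len(self.documents)


# One document with one section holding two sentences.
coll = Collection([Document([Section([Sentence("First."), Sentence("Second.")])])])
print(coll[0][0][1].text)  # Second.
```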
Find the first annotation for this sentence:
>>> e = next(sent.iter_entities())
>>> e.start, e.end, e.text
(571, 578, 'KSHV‐TK')
>>> e.metadata
{'type': 'gene/protein', 'ui': 'Uniprot:F5HB62'}
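The sentence above is only about 100 characters long, yet the entity spans characters 571 to 578, so the offsets appear to be counted from the start of the document rather than the sentence. Assuming that, sentence-relative offsets follow by subtracting the sentence's own start offset. In this toy example, `to_sentence_offsets` and the sample text are hypothetical.

```python
def to_sentence_offsets(start: int, end: int, sent_start: int) -> tuple:
    """Convert document-level character offsets to sentence-relative ones."""
    return start - sent_start, end - sent_start


doc_text = "Intro text. KSHV-TK induces contraction."
sent_start = 12                  # the second sentence begins at character 12
sent_text = doc_text[sent_start:]
start, end = 12, 19              # document-level offsets of "KSHV-TK"

rel = to_sentence_offsets(start, end, sent_start)
print(rel, sent_text[rel[0]:rel[1]])  # (0, 7) KSHV-TK
```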
Write the whole collection to a new file in CoNLL format:
>>> with open('path/to/example.conll', 'w', encoding='utf8') as f:
...     bconv.dump(coll, f, fmt='conll', tagset='IOBES', include_offsets=True)
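With `tagset='IOBES'`, each token presumably receives one of B- (begin), I- (inside), E- (end), S- (single-token entity) or O (outside). The sketch below shows the general IOBES scheme on character-offset spans. It illustrates the tagging technique only and is not bconv's implementation. `iobes_tags` is a hypothetical helper, and the exact columns bconv writes (for example with `include_offsets=True`) are not reproduced here.

```python
import re
from typing import List, Tuple


def iobes_tags(tokens: List[Tuple[int, int]],
               entities: List[Tuple[int, int, str]]) -> List[str]:
    """Assign IOBES tags to tokens given as (start, end) character offsets.

    Assumes entities do not overlap and align with token boundaries.
    """
    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        # Indices of tokens lying completely inside the entity span.
        inside = [i for i, (s, e) in enumerate(tokens)
                  if s >= ent_start and e <= ent_end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = f"S-{label}"
        else:
            tags[inside[0]] = f"B-{label}"
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"
            tags[inside[-1]] = f"E-{label}"
    return tags


text = "GFP induces contraction of HeLa cells ."
# Whitespace tokenization with character offsets.
tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
print(iobes_tags(tokens, [(0, 3, "gene/protein"), (27, 37, "cell")]))
# ['S-gene/protein', 'O', 'O', 'O', 'B-cell', 'E-cell', 'O']
```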
Documentation
bconv is documented in the GitHub wiki.
File details
Details for the file bconv-1.2.1.tar.gz.
File metadata
- Download URL: bconv-1.2.1.tar.gz
- Upload date:
- Size: 35.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.1 CPython/3.10.6 Linux/5.19.0-38-generic
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 123ced08350eb6d74fc2f0d86c407f77f2c89ae46dbd46895186ab639b52fd4d |
MD5 | ae5a733876ab9a434267299effe63977 |
BLAKE2b-256 | 7ec7ac9b87c1fc51b81f57347cb13695d578dd478fb586c06bf94212affffca2 |
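The published digests can be used to check a downloaded file. Here is a minimal sketch using Python's `hashlib`, assuming the archive was saved under its original name in the current directory. `sha256_of` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# SHA256 digest listed for bconv-1.2.1.tar.gz.
EXPECTED = "123ced08350eb6d74fc2f0d86c407f77f2c89ae46dbd46895186ab639b52fd4d"
# Assumed local path to a downloaded copy of the source distribution.
archive = Path("bconv-1.2.1.tar.gz")
if archive.exists():
    print("OK" if sha256_of(archive) == EXPECTED else "MISMATCH")
```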
File details
Details for the file bconv-1.2.1-py3-none-any.whl.
File metadata
- Download URL: bconv-1.2.1-py3-none-any.whl
- Upload date:
- Size: 42.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.1 CPython/3.10.6 Linux/5.19.0-38-generic
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 832884c697ec4eea2020587a47014fd5cd27744053bf2c36e0525d36b69a3f97 |
MD5 | 169ef5da2746dca2afae58fa9796ab84 |
BLAKE2b-256 | cef978dd23cb5f496d5c3f06dbf73ec9190ab7a31580448187f578adfd3e943f |