
Convert between BioNLP formats


bconv: Python library for converting between BioNLP formats

bconv offers format conversion and manipulation of documents with text and annotations. It supports various popular formats used in natural-language processing for biomedical texts.

Supported formats

The following formats are currently supported:

Name                              I  O  T  A  Description
bioc_xml, bioc_json                           BioC
bionlp                                        BioNLP stand-off
brat                                          brat stand-off
conll                                         CoNLL
europepmc, europepmc.zip                      Europe-PMC JSON
pubtator, pubtator_fbk                        PubTator
pubmed, pxml                                  PubMed abstracts
pmc, nxml                                     PMC full-text
pubanno_json, pubanno_json.tgz                PubAnnotation JSON
csv, tsv                                      comma/tab-separated values
text_csv, text_tsv                            comma/tab-separated values
txt                                           plain text
txt.json                                      collection of plain-text documents

I: input format; O: output format; T: can represent text; A: can represent annotations (entities).
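Two of the formats above (bionlp, brat) are stand-off formats: the text is stored in one file and the annotations in another, linked only by character offsets. As a rough illustration, here is a minimal stdlib parser for a single brat text-bound annotation line (ID, then "Type Start End", then the covered text, tab-separated). It is a sketch of the brat format itself, not bconv's reader; the example sentence and the `Protein` label are made up.

```python
from dataclasses import dataclass


@dataclass
class TextBound:
    """One text-bound ("T") annotation from a brat .ann file."""
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_brat_textbound(line: str) -> TextBound:
    """Parse a contiguous text-bound line such as 'T1\\tProtein 4 11\\tKSHV-TK'.

    Discontinuous spans ('0 3;5 7') and other annotation kinds
    (relations, events, notes) are deliberately not handled here.
    """
    ann_id, middle, text = line.rstrip("\n").split("\t", 2)
    label, start, end = middle.split(" ")
    return TextBound(ann_id, label, int(start), int(end), text)


def matches_text(doc_text: str, ann: TextBound) -> bool:
    """Stand-off offsets point into the separate .txt file; check they agree."""
    return doc_text[ann.start:ann.end] == ann.text


doc_text = "GFP-KSHV-TK induces contraction of HeLa cells."
ann = parse_brat_textbound("T1\tProtein 4 11\tKSHV-TK")
print(ann)
print(matches_text(doc_text, ann))  # True
```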

Installation

bconv is hosted on PyPI, so you can use pip to install it:

$ pip install bconv

Usage

Load an annotated collection in BioC XML format:

>>> import bconv
>>> coll = bconv.load('path/to/example.xml', fmt='bioc_xml')
>>> coll
<Collection with 37 documents at 0x7f1966e4b3c8>

A Collection is a sequence of Document objects:

>>> coll[0]
<Document with 12 sections at 0x7f1966e2f6d8>

Documents contain Sections, which contain Sentences:

>>> sent = coll[0][3][5]
>>> sent.text
'A Live cell imaging reveals that expression of GFP‐KSHV‐TK, but not GFP induces contraction of HeLa cells.'
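Indexing is zero-based at every level, so `coll[0][3][5]` is the sixth sentence of the fourth section of the first document. The toy classes below are hypothetical stand-ins, not bconv's own classes; they only mimic the nesting described above to show how chained indexing and a recursive walk down to the entities work. bconv's real objects carry more information (IDs, offsets, metadata).

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Entity:
    start: int
    end: int
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Sentence:
    text: str
    entities: List[Entity] = field(default_factory=list)

    def iter_entities(self) -> Iterator[Entity]:
        """Yield this sentence's entities in stored order."""
        yield from self.entities


class Section(list):
    """Toy container: a plain list of Sentence objects."""


class Document(list):
    """Toy container: a plain list of Section objects."""


class Collection(list):
    """Toy container: a plain list of Document objects."""


def walk_entities(node) -> Iterator[Entity]:
    """Depth-first walk from any level down to the sentences' entities."""
    if isinstance(node, Sentence):
        yield from node.iter_entities()
    else:
        for child in node:
            yield from walk_entities(child)


# One document with four sections; the fourth section holds six sentences.
# (Offsets here are sentence-local, purely for the toy.)
target = Sentence("GFP-KSHV-TK induces contraction.",
                  [Entity(4, 11, "KSHV-TK", {"type": "gene/protein"})])
sections = [Section([Sentence("filler")]) for _ in range(3)]
sections.append(Section([Sentence(f"s{i}") for i in range(5)] + [target]))
coll = Collection([Document(sections)])

sent = coll[0][3][5]  # document 0, section 3, sentence 5 (zero-based)
print(sent.text)
print([e.text for e in walk_entities(coll)])
```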

Find the first annotation for this sentence:

>>> e = next(sent.iter_entities())
>>> e.start, e.end, e.text
(571, 578, 'KSHV‐TK')
>>> e.metadata
{'type': 'gene/protein', 'ui': 'Uniprot:F5HB62'}
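The offsets (571, 578) are larger than the sentence itself, which is only about 100 characters long, so they are evidently counted from an earlier reference point (most likely the start of the document) rather than from the start of the sentence. To cut the mention out of `sent.text` you therefore need the sentence's own starting offset. The helper below is a hypothetical sketch: `SENT_START = 520` is not taken from bconv but inferred as 571 minus the mention's position (51) inside the sentence text, and the code assumes both offsets share one coordinate system.

```python
def local_slice(sent_text: str, sent_start: int, ent_start: int, ent_end: int) -> str:
    """Cut an entity out of a sentence using absolute (document-level) offsets.

    Assumes ent_start/ent_end and sent_start share one coordinate system
    and that sent_text is stored verbatim from sent_start onwards.
    """
    rel_start = ent_start - sent_start
    rel_end = ent_end - sent_start
    if not 0 <= rel_start <= rel_end <= len(sent_text):
        raise ValueError("entity span lies outside this sentence")
    return sent_text[rel_start:rel_end]


sent_text = ("A Live cell imaging reveals that expression of GFP‐KSHV‐TK, "
             "but not GFP induces contraction of HeLa cells.")
# Hypothetical sentence start: 571 minus the mention's position (51) in the text.
SENT_START = 520
print(local_slice(sent_text, SENT_START, 571, 578))
```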

Write the whole collection to a new file in CoNLL format:

>>> with open('path/to/example.conll', 'w', encoding='utf8') as f:
...     bconv.dump(coll, f, fmt='conll', tagset='IOBES', include_offsets=True)
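`tagset='IOBES'` refers to the common token-level encoding of entity spans: S marks a single-token entity, B, I and E mark the beginning, inside and end of a multi-token entity, and O marks everything else. The stdlib sketch below illustrates that scheme with a naive regex tokenizer and made-up entity labels; it is not bconv's tokenizer or CoNLL writer, and it does not reproduce the exact column layout bconv emits (for example with `include_offsets=True`).

```python
import re
from typing import List, Tuple

Token = Tuple[str, int, int]  # (text, start, end), end exclusive


def tokenize(text: str) -> List[Token]:
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def iobes_tags(tokens: List[Token], entities: List[Tuple[int, int, str]]) -> List[str]:
    """Assign one IOBES tag per token from character-offset entity spans.

    Tokens that only partially overlap an entity stay "O"; overlapping
    entities are not handled (a later entity overwrites an earlier one).
    """
    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        inside = [i for i, (_, s, e) in enumerate(tokens)
                  if s >= ent_start and e <= ent_end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = f"S-{label}"
        else:
            tags[inside[0]] = f"B-{label}"
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"
            tags[inside[-1]] = f"E-{label}"
    return tags


text = "Expression of KSHV TK contracts HeLa cells."
gene_start = text.index("KSHV TK")
cell_start = text.index("HeLa")
entities = [(gene_start, gene_start + len("KSHV TK"), "gene"),
            (cell_start, cell_start + len("HeLa"), "cell_line")]
tokens = tokenize(text)
tags = iobes_tags(tokens, entities)
for (tok, _, _), tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

Only tokens that fall completely inside an entity span are tagged, which keeps the sketch simple but means a tokenizer that splits differently from the annotation boundaries will silently drop entities.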

Documentation

bconv is documented in the GitHub wiki.


Download files


Source Distribution

bconv-1.2.1.tar.gz (35.6 kB)

Uploaded Source

Built Distribution

bconv-1.2.1-py3-none-any.whl (42.1 kB)

Uploaded Python 3

File details

Details for the file bconv-1.2.1.tar.gz.

File metadata

  • Download URL: bconv-1.2.1.tar.gz
  • Upload date:
  • Size: 35.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.1 CPython/3.10.6 Linux/5.19.0-38-generic

File hashes

Hashes for bconv-1.2.1.tar.gz
Algorithm Hash digest
SHA256 123ced08350eb6d74fc2f0d86c407f77f2c89ae46dbd46895186ab639b52fd4d
MD5 ae5a733876ab9a434267299effe63977
BLAKE2b-256 7ec7ac9b87c1fc51b81f57347cb13695d578dd478fb586c06bf94212affffca2
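A downloaded archive can be checked against the SHA256 digest listed above before it is installed. The sketch below uses only `hashlib`; the local file path is an assumption about where you saved the download. pip can also enforce this itself in hash-checking mode (`pip install --require-hashes -r requirements.txt`, with a requirements line of the form `bconv==1.2.1 --hash=sha256:<digest>`); listing the digests of both the source archive and the wheel lets pip accept whichever file it picks.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for bconv-1.2.1.tar.gz
EXPECTED_SHA256 = "123ced08350eb6d74fc2f0d86c407f77f2c89ae46dbd46895186ab639b52fd4d"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Assumed location of the downloaded archive; adjust as needed.
archive = Path("bconv-1.2.1.tar.gz")
if archive.exists():
    print("hash OK" if verify(archive) else "hash MISMATCH")
```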


File details

Details for the file bconv-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: bconv-1.2.1-py3-none-any.whl
  • Upload date:
  • Size: 42.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.1 CPython/3.10.6 Linux/5.19.0-38-generic

File hashes

Hashes for bconv-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 832884c697ec4eea2020587a47014fd5cd27744053bf2c36e0525d36b69a3f97
MD5 169ef5da2746dca2afae58fa9796ab84
BLAKE2b-256 cef978dd23cb5f496d5c3f06dbf73ec9190ab7a31580448187f578adfd3e943f

