A small Python library for the NLP Interchange Format (NIF)

Project description

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. It offers a standard representation of annotated texts for tasks such as Named Entity Recognition or Entity Linking. It is used by GERBIL to run reproducible evaluations of annotators.

This Python library can be used to serialize and deserialize annotated corpora in NIF.


NIF Documentation

Supported NIF versions

NIF 2.1, serialized in any of the formats supported by rdflib


This library revolves around three core classes:

* a NIFContext is a document (a string);
* a NIFPhrase is the annotation of a snippet of text (usually a phrase) in a document;
* a NIFCollection is a set of documents.

In NIF, each of these objects is identified by a URI, and their attributes and relations are encoded as RDF triples between these URIs. This library abstracts away the encoding by letting you manipulate collections, contexts and phrases as plain Python objects.
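To make the mapping from objects to triples concrete, here is a minimal sketch in plain Python. It does not use pynif: the `Context` and `Phrase` dataclasses, the `to_triples` helper and the `http://example.org/...` URIs are hypothetical stand-ins invented for illustration, and predicates are written as prefixed strings rather than real RDF terms.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Phrase:
    """Hypothetical stand-in for an annotated span (not pynif's NIFPhrase)."""
    uri: str
    begin: int
    end: int


@dataclass
class Context:
    """Hypothetical stand-in for a document (not pynif's NIFContext)."""
    uri: str
    text: str
    phrases: List[Phrase] = field(default_factory=list)


def to_triples(context: Context) -> List[Triple]:
    """Flatten a context and its phrases into (subject, predicate, object) tuples."""
    triples: List[Triple] = [
        (context.uri, "rdf:type", "nif:Context"),
        (context.uri, "nif:isString", context.text),
    ]
    for phrase in context.phrases:
        triples += [
            (phrase.uri, "rdf:type", "nif:Phrase"),
            # The relation between phrase and document is itself a triple.
            (phrase.uri, "nif:referenceContext", context.uri),
            (phrase.uri, "nif:anchorOf", context.text[phrase.begin:phrase.end]),
            (phrase.uri, "nif:beginIndex", str(phrase.begin)),
            (phrase.uri, "nif:endIndex", str(phrase.end)),
        ]
    return triples


# Assumed example URIs, chosen only for this sketch.
doc = Context("http://example.org/doc1", "Diego Maradona is from Argentina.")
doc.phrases.append(Phrase("http://example.org/doc1#offset_0_14", 0, 14))

for triple in to_triples(doc):
    print(triple)
```

Every attribute of a phrase becomes one triple whose subject is the phrase URI. This is the bookkeeping that working with plain Python objects spares you.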


  1. Import and create a collection
from pynif import NIFCollection

collection = NIFCollection(uri="")
  1. Create a context
context = collection.add_context(
    mention="Diego Maradona is from Argentina.")
  1. Create entries for the entities
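# beginIndex and endIndex are character offsets into the context string
context.add_phrase(
    beginIndex=0,
    endIndex=14,
    taClassRef=['', '', ''],
    score=0.9869993)

context.add_phrase(
    beginIndex=23,
    endIndex=32,
    taClassRef=['', ''],
    score=0.9804964)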
  1. Finally, get the output with the format that you need
generated_nif = collection.dumps(format='turtle')

You will obtain the NIF representation as a string:

<> a nif:ContextCollection ;
    nif:hasContext <> ;
    ns1:conformsTo <> .

<> a nif:Context,
        nif:OffsetBasedString ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "33"^^xsd:nonNegativeInteger ;
    nif:isString "Diego Maradona is from Argentina." .

<> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Diego Maradona" ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "14"^^xsd:nonNegativeInteger ;
    nif:referenceContext <> ;
    nif:taMsClassRef <> ;
    itsrdf:taAnnotatorsRef <> ;
    itsrdf:taClassRef <>,
        <> ;
    itsrdf:taConfidence 9.869993e-01 ;
    itsrdf:taIdentRef <> .

<> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Argentina" ;
    nif:beginIndex "23"^^xsd:nonNegativeInteger ;
    nif:endIndex "32"^^xsd:nonNegativeInteger ;
    nif:referenceContext <> ;
    nif:taMsClassRef <> ;
    itsrdf:taAnnotatorsRef <> ;
    itsrdf:taClassRef <>,
        <> ;
    itsrdf:taConfidence 9.804964e-01 .
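
The `nif:beginIndex` and `nif:endIndex` values in the output above can be sanity-checked against the `nif:anchorOf` strings with ordinary string slicing. The sketch below uses only the standard library and assumes that offsets count characters the way Python string indices do, with an exclusive end index. The `check_offsets` helper is a hypothetical name and is not part of pynif.

```python
text = "Diego Maradona is from Argentina."

# (beginIndex, endIndex, anchorOf) triples copied from the Turtle output.
annotations = [
    (0, 14, "Diego Maradona"),
    (23, 32, "Argentina"),
]


def check_offsets(text, annotations):
    """Return the annotations whose slice of the text differs from the anchor."""
    return [(begin, end, anchor)
            for begin, end, anchor in annotations
            if text[begin:end] != anchor]


mismatches = check_offsets(text, annotations)
print(len(text), mismatches)  # prints: 33 []
```

The context's own `nif:endIndex` of 33 is the length of the full string, and an empty mismatch list means each phrase's offsets select exactly its anchor text.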
  1. You can then parse it back:
parsed_collection = NIFCollection.loads(generated_nif, format='turtle')

for context in parsed_collection.contexts:
    for phrase in context.phrases:
        print(phrase)

If you have any problems with or questions about this library, please contact us through a GitHub issue.

Download files

Source Distribution

pynif-0.2.0.tar.gz (17.2 kB)

Built Distribution

pynif-0.2.0-py2.py3-none-any.whl (19.0 kB)
