Skip to main content

A fast and simple NER tool

Project description

Quickner ⚡

A simple, fast, and easy to use NER annotator for Python

PyPI version License PyPI - Downloads Build Status

Showcase

Quickner is a new tool to quickly annotate texts for NER (Named Entity Recognition). It is written in Rust and accessible through a Python API.

Quickner is blazing fast, simple to use, and easy to configure using a TOML file.

Installation

# Create a virtual environment
python3 -m venv env
source env/bin/activate

# Install quickner
pip install quickner # or pip3 install quickner

Usage

Using the config file

from quickner import Quickner, Config

config = Config(path="config.toml") # or Config() if the config file is in the current directory

# Initialize the annotator
quick = Quickner(config=config)

# Annotate the texts using the config file
quick.process() # or annotator.process(True) to save the annotated data to a file

Using Documents

from quickner import Quickner, Document

# Create documents
doc_1 = Document("rust is made by Mozilla")
doc_2 = Document("Python was created by Guido van Rossum")
doc_3 = Document("Java was created by James Gosling")

# Documents can be added to a list
documents = [doc_1, doc_2, doc_3]

# Initialize the annotator

quick = Quickner(documents=documents)
>>> Entities: 0 | Documents: 3 | Annotations: 

Using Documents and Entities

from quickner import Quickner, Document, Entity

# Create documents from texts
texts = (
  "rust is made by Mozilla",
  "Python was created by Guido van Rossum",
  "Java was created by James Gosling at Sun Microsystems",
  "Swift was created by Chris Lattner and Apple",
)
documents = [Document(text) for text in texts]

# Create entities
entities = (
  ("Rust", "PL"),
  ("Python", "PL"),
  ("Java", "PL"),
  ("Swift", "PL"),
  ("Mozilla", "ORG"),
  ("Apple", "ORG"),
  ("Sun Microsystems", "ORG"),
  ("Guido van Rossum", "PERSON"),
  ("James Gosling", "PERSON"),
  ("Chris Lattner", "PERSON"),
)
entities = [Entity(*(entity)) for entity in entities]

# Initialize the annotator
quick = Quickner(documents=documents, entities=entities)
quick.process()

>>> quick
Entities: 6 | Documents: 3 | Annotations: PERSON: 2, PL: 3, ORG: 1
>>> quick.documents 
[Document(id=87e03d58b1ba4d72, text=rust is made by Mozilla, label=[(0, 4, PL), (16, 23, ORG)]), Document(id=f1da5d23ef88f3dc, text=Python was created by Guido van Rossum, label=[(0, 6, PL), (22, 38, PERSON)]), Document(id=e4324f9818e7e598, text=Java was created by James Gosling, label=[(0, 4, PL), (20, 33, PERSON)])]

Find documents by label

from quickner import Quickner, Document, Entity

# Create documents
doc_1 = Document("rust is made by Mozilla")
doc_2 = Document("Python was created by Guido van Rossum")
doc_3 = Document("Java was created by James Gosling")

# Create entities
rust = Entity("Rust", "PL")
mozilla = Entity("Mozilla", "ORG")
python = Entity("Python", "PL")
guido = Entity("Guido van Rossum", "PERSON")
java = Entity("Java", "PL")
james = Entity("James Gosling", "PERSON")

# Documents and entities can be added to a list
documents = [doc_1, doc_2, doc_3]
entities = [rust, mozilla, python, guido, java, james]

# Initialize the annotator
quick = Quickner(documents=documents, entities=entities)
quick.process()

>>> quick
Entities: 6 | Documents: 3 | Annotations: PERSON: 2, PL: 3, ORG: 1
>>> quick.find_documents("PERSON")
[Document(id=f1da5d23ef88f3dc, text=Python was created by Guido van Rossum, label=[(0, 6, PL), (22, 38, PERSON)]), Document(id=e4324f9818e7e598, text=Java was created by James Gosling, label=[(0, 4, PL), (20, 33, PERSON)])]

Get a Spacy Compatible Generator Object

# Create documents from texts
texts = (
  "rust is made by Mozilla",
  "Python was created by Guido van Rossum",
  "Java was created by James Gosling at Sun Microsystems",
  "Swift was created by Chris Lattner and Apple",
)
documents = [Document(text) for text in texts]

# Create entities
entities = (
  ("Rust", "PL"),
  ("Python", "PL"),
  ("Java", "PL"),
  ("Swift", "PL"),
  ("Mozilla", "ORG"),
  ("Apple", "ORG"),
  ("Sun Microsystems", "ORG"),
  ("Guido van Rossum", "PERSON"),
  ("James Gosling", "PERSON"),
  ("Chris Lattner", "PERSON"),
)
entities = [Entity(*(entity)) for entity in entities]

# Initialize the annotator
quick = Quickner(documents=documents, entities=entities)
quick.process()

# Get a spacy compatible generator object
>>> quick.spacy()
<builtins.SpacyGenerator object at 0x102311440>
# Divide the documents into chunks
>>> chunks = quick.spacy(chunks=2)
>>> for chunk in chunks:
...     print(chunk)
...
[('rust is made by Mozilla', {'entitiy': [(0, 4, 'PL'), (16, 23, 'ORG')]}), ('Python was created by Guido van Rossum', {'entitiy': [(0, 6, 'PL'), (22, 38, 'PERSON')]})]
[('Java was created by James Gosling at Sun Microsystems', {'entitiy': [(0, 4, 'PL'), (20, 33, 'PERSON'), (37, 53, 'ORG')]}), ('Swift was created by Chris Lattner and Apple', {'entitiy': [(0, 5, 'PL'), (21, 34, 'PERSON'), (39, 44, 'ORG')]})]

Single document annotation

from quickner import Document, Entity

# Create a document from a string
rust = Document.from_string("rust is made by Mozilla")
# Create a list of entities
entities = [Entity("Rust", "PL"), Entity("Mozilla", "ORG")]
# Annotate the document with the entities, case_sensitive is set to False by default
rust.annotate(entities, case_sensitive=True)
>>> rust
Document(id=87e03d58b1ba4d72, text=rust is made by Mozilla, label=[(16, 23, ORG)])

Load from file

Initialize the Quickner object from a file containing existing annotations.

Quickner.from_jsonl and Quickner.from_spacy are class methods that return a Quickner object and are able to parse the annotations and entities from a jsonl or spaCy file.

from quickner import Quickner

quick = Quickner.from_jsonl("annotations.jsonl") # load the annotations from a jsonl file
quick = Quickner.from_spacy("annotations.json") # load the annotations from a spaCy file

Configuration

The configuration file is a TOML file with the following structure:

# Configuration file for the NER tool

[general]
# Mode to run the tool, modes are:
# Annotation from the start
# Annotation from already annotated texts
# Load annotations and add new entities

[logging]
level = "debug" # level of logging (debug, info, warning, error, fatal)

[texts]

[texts.input]
filter = false     # if true, only texts in the filter list will be used
path = "texts.csv" # path to the texts file

[texts.filters]
accept_special_characters = ".,-" # list of special characters to accept in the text (if special_characters is true)
alphanumeric = false              # if true, only strictly alphanumeric texts will be used
case_sensitive = false            # if true, case sensitive search will be used
max_length = 1024                 # maximum length of the text
min_length = 0                    # minimum length of the text
numbers = false                   # if true, texts with numbers will not be used
punctuation = false               # if true, texts with punctuation will not be used
special_characters = false        # if true, texts with special characters will not be used

[annotations]
format = "spacy" # format of the output file (jsonl, spaCy, brat, conll)

[annotations.output]
path = "annotations.jsonl" # path to the output file

[entities]

[entities.input]
filter = true         # if true, only entities in the filter list will be used
path = "entities.csv" # path to the entities file
save = true           # if true, the entities found will be saved in the output file

[entities.filters]
accept_special_characters = ".-" # list of special characters to accept in the entity (if special_characters is true)
alphanumeric = false             # if true, only strictly alphanumeric entities will be used
case_sensitive = false           # if true, case sensitive search will be used
max_length = 20                  # maximum length of the entity
min_length = 0                   # minimum length of the entity
numbers = false                  # if true, entities with numbers will not be used
punctuation = false              # if true, entities with punctuation will not be used
special_characters = true        # if true, entities with special characters will not be used

[entities.excludes]
# path = "excludes.csv" # path to entities to exclude from the search

License

MOZILLA PUBLIC LICENSE Version 2.0

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Authors

  • [Omar MHAIMDAT]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quickner-0.0.1a11.tar.gz (20.4 MB view hashes)

Uploaded Source

Built Distributions

quickner-0.0.1a11-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

quickner-0.0.1a11-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

quickner-0.0.1a11-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

quickner-0.0.1a11-cp311-none-win_amd64.whl (940.0 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

quickner-0.0.1a11-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

quickner-0.0.1a11-cp311-cp311-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (2.3 MB view hashes)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

quickner-0.0.1a11-cp310-none-win_amd64.whl (939.9 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

quickner-0.0.1a11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

quickner-0.0.1a11-cp310-cp310-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (2.3 MB view hashes)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

quickner-0.0.1a11-cp39-none-win_amd64.whl (940.3 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

quickner-0.0.1a11-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

quickner-0.0.1a11-cp39-cp39-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (2.3 MB view hashes)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

quickner-0.0.1a11-cp38-none-win_amd64.whl (940.4 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

quickner-0.0.1a11-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

quickner-0.0.1a11-cp38-cp38-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (2.3 MB view hashes)

Uploaded CPython 3.8 macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

quickner-0.0.1a11-cp37-none-win_amd64.whl (940.4 kB view hashes)

Uploaded CPython 3.7 Windows x86-64

quickner-0.0.1a11-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

quickner-0.0.1a11-cp37-cp37m-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (2.3 MB view hashes)

Uploaded CPython 3.7m macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page