Automated Custom NER tool

fastent
=======

Fastent is a tool for creating end-to-end custom Named Entity
Recognition models. Entities **ARE NOT** limited to the usual predefined
classes such as Person (PER), Location (LOC), or
Companies/agencies/institutions (ORG). Any custom entity that can
be described using a list of words can be created.

The package comprises several modules that can be used either
separately for their designated tasks (e.g. annotation,
contextualization) or in a combined workflow. Most of the modules offer
multilingual support, so the datasets and text do not have to be in
English.

Table of contents
=================

- `Installation <#installation>`__
- `Usage <#usage>`__

  - `Dataset generation <#Dataset-Generation>`__
  - `Contextualization <#Contextualization>`__
  - `API for model download <#Api>`__
  - `Annotation <#Annotation>`__
  - `Text utilities <#Text-utilities>`__
  - `WordNet utilities <#Wordnet>`__
  - `Poincare embeddings wrapper <#Poincare>`__
  - `Combining everything <#combo>`__

- `Baselines <#tests>`__
- `Dependency <#dependency>`__

Installation
============

This section shows how to install the package using different methods.

From source
~~~~~~~~~~~

1) Clone the repository

::

    git clone https://github.com/fastent/fastent.git

2) Install the required packages

::

    pip install -r requirements.txt

3) Install CouchDB

Update the current packages

::

    sudo apt-get update

Add the PPA repository

::

    sudo apt-get install software-properties-common
    sudo add-apt-repository ppa:couchdb/stable
    sudo apt-get update

Install CouchDB

::

    sudo apt-get install couchdb

Change ownership of the CouchDB directories (recommended, to avoid
permission problems)

::

    sudo chown -R couchdb:couchdb /usr/bin/couchdb /etc/couchdb /usr/share/couchdb

Once this is completed, set the permissions on the same directories

::

    sudo chmod -R 0770 /usr/bin/couchdb /etc/couchdb /usr/share/couchdb

Restart CouchDB

::

    sudo systemctl restart couchdb

CouchDB can now be accessed at http://127.0.0.1:5984/\_utils/
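
To confirm from Python that the server is answering, a small check like
the one below can help. This is a minimal sketch, not part of fastent:
the function name ``couchdb_is_up`` is hypothetical, and it assumes
CouchDB listens on the default address shown above and replies at its
root URL with a JSON object containing a ``couchdb`` key.

```python
import json
import urllib.error
import urllib.request


def couchdb_is_up(url="http://127.0.0.1:5984/", timeout=2.0):
    """Return True if a CouchDB server answers at `url`, else False.

    Hypothetical helper for illustration; not part of fastent.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            info = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
    # The CouchDB root endpoint replies with a JSON object
    # that includes a "couchdb" key (the welcome message).
    return isinstance(info, dict) and "couchdb" in info
```

Calling ``couchdb_is_up()`` right after the restart should return
``True``; a ``False`` result suggests the service did not start or is
listening on a different address.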

4) Install the NLTK dependencies

::

    >>> import nltk
    >>> nltk.download('stopwords')  # or nltk.download() for the interactive downloader

The minimum installation requires the *stopwords* corpus.
(Feel free to download more corpora if you need them.)

From pip
~~~~~~~~

Coming Soon

Usage
=====

Dataset generation
~~~~~~~~~~~~~~~~~~

The module can generate a dataset from raw entity words. When running
from source, an example command looks like this:

::

    python dataset_pseudo_generator.py -m en_core_web_lg -s cocaine,heroin

If the package is installed:

::

    from fastent import dataset_pseudo_generator

    # Name of the spaCy model to load (the same one as in the command above)
    model_name = 'en_core_web_lg'

    model = dataset_pseudo_generator.spacy_initialize(model_name)
    dataset_pseudo_generator.dataset_generate(model, ['cocaine', 'heroin'], 100)
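
The command-line form takes the entity words as one comma-separated
string, while the function call takes a Python list. If the words
arrive as a single string (for example from a config file), a small
helper along these lines can convert them. This is only a sketch:
``parse_seed_words`` is a hypothetical name and not part of fastent,
and it makes no claim about how fastent itself parses the ``-s``
option.

```python
def parse_seed_words(raw):
    """Split a comma-separated string into a list of clean entity words.

    Hypothetical helper for illustration; not part of fastent.
    """
    # Trim surrounding whitespace from each piece.
    words = [word.strip() for word in raw.split(",")]
    # Drop empty entries left by stray or trailing commas.
    return [word for word in words if word]


seeds = parse_seed_words("cocaine, heroin")
print(seeds)  # ['cocaine', 'heroin']
```

The resulting list can then be passed as the second argument of
``dataset_generate`` in the example above.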
