Language Human Annotation
LangHuAn
Language Human Annotations: a frontend for tagging AI project labels, driven by pandas dataframe data.
From the Chinese word 琅嬛 [langhuan] (a legendary realm where the gods curate books).
Here's a 5-minute YouTube video explaining how LangHuAn works.
Installation
pip install langhuan
Minimum configuration walkthrough
LangHuAn starts a Flask application from a pandas dataframe 🐼!
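For illustration, the dataframe can come from any pandas source; a hypothetical CSV load (the file name and column are assumptions) might look like:

import pandas as pd

# Hypothetical input file; any DataFrame with a text column works.
df = pd.read_csv("descriptions.csv")  # must contain the column passed as text_col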
Simplest configuration for NER task 🚀
from langhuan import NERTask

app = NERTask.from_df(
    df, text_col="description",
    options=["institution", "company", "name"])

app.run("0.0.0.0", port=5000)
Simplest configuration for Classify task 🚀
from langhuan import ClassifyTask

app = ClassifyTask.from_df(
    df, text_col="comment",
    options=["positive", "negative", "unbiased", "not sure"])

app.run("0.0.0.0", port=5000)
Frontend
You can visit the following pages of this app.
Tagging
http://[ip]:[port]/
is the page for our hard-working taggers to visit.
Admin
http://[ip]:[port]/admin
is a page where you can 👮🏽‍♂️:
- See the progress of each user.
- Force-save the progress (otherwise it only saves according to save_frequency, default every 42 entries).
- Download the tagged entries.
Advanced settings
Validation
You can set the minimum verification number with cross_verify_num, i.e. how many times each entry will be validated; the default is 1.
If you set cross_verify_num to 2 and you have 5 taggers, each entry will be seen by 2 of them.
app = ClassifyTask.from_df(
    df, text_col="comment",
    options=["positive", "negative", "unbiased", "not sure"],
    cross_verify_num=2,
)
Preset the tagging
You can set a column in the dataframe, e.g. guessed_tags, to preset the tagging result.
Each cell can contain a tagging result in the following format, e.g.:
{"tags":[
{"text": "Genomicare Bio Tech", "offset":32, "label":"company"},
{"text": "East China University of Politic Science & Law", "offset":96, "label":"company"},
]}
Then you can run the app with the preset tag column:
app = NERTask.from_df(
    df, text_col="description",
    options=["institution", "company", "name"],
    preset_tag_col="guessed_tags")

app.run("0.0.0.0", port=5000)
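If the presets come from an earlier model, a minimal sketch of building such a column could look like this; predict_entities is a hypothetical stand-in for whatever produces your predictions, not part of LangHuAn:

# Sketch only: predict_entities is hypothetical.
def predict_entities(text):
    """Return a list of {"text", "offset", "label"} dicts for one document."""
    ...

# Wrap each prediction list in the {"tags": [...]} format expected above.
df["guessed_tags"] = df["description"].apply(
    lambda text: {"tags": predict_entities(text)})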
Order strategy
The order in which texts get tagged is controlled by order_strategy.
The default is "forward_match"; you can also try pincer or trident.
Assume order_by_column is set to the predictions from the last batch of your deep learning model:
- trident means the taggers tag the most confident positives, the most confident negatives, and the most uncertain entries first (see the sketch below).
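A minimal sketch of combining these options, assuming order_strategy and order_by_column are passed to from_df and that the model scores live in a column named prediction (both the kwargs' placement and the column name are assumptions here):

# "prediction" is an assumed column holding the last model's scores.
app = NERTask.from_df(
    df, text_col="description",
    options=["institution", "company", "name"],
    order_strategy="trident",
    order_by_column="prediction",
)
app.run("0.0.0.0", port=5000)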
Load History
If your service stops, you can recover the progress from the cache.
The previous cache will be at $HOME/.cache/langhuan/{task_name}.
You can change save_frequency to suit your task; the default is 42 entries.
app = NERTask.from_df(
    df, text_col="description",
    options=["institution", "company", "name"],
    save_frequency=128,
    load_history=True,
    task_name="task_NER_210123_110327",
)
Admin Control
This application assumes internal use within an organization, hence the minimal security. If you set admin_control, all admin-related pages will require an adminkey; the key will appear in the console prompt.
app = NERTask.from_df(
    df, text_col="description",
    options=["institution", "company", "name"],
    admin_control=True,
)
From downloaded data => PyTorch dataset
For downloaded NER tag data, you can create a dataloader from the JSON file automatically:
- PyTorch + Hugging Face tokenizer (an illustrative sketch follows the list)
- TensorFlow + Hugging Face tokenizer, development pending
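The sketch below is illustrative only, not LangHuAn's built-in loader. It assumes the downloaded JSON is a list of records shaped like the preset format above, with a text field and a tags list, and it aligns the character offsets to tokens with a Hugging Face tokenizer:

# Illustrative sketch only; the JSON layout and file name are assumptions.
import json

import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer


class NERTagDataset(Dataset):
    def __init__(self, json_path, labels, tokenizer_name="bert-base-cased", max_length=128):
        with open(json_path) as f:
            # Assumed: a list of {"text": ..., "tags": [{"text", "offset", "label"}, ...]} records.
            self.records = json.load(f)
        self.label2id = {label: i + 1 for i, label in enumerate(labels)}  # 0 = no entity
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        self.max_length = max_length

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        encoding = self.tokenizer(
            record["text"],
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_offsets_mapping=True,
            return_tensors="pt",
        )
        offsets = encoding.pop("offset_mapping")[0]
        labels = torch.zeros(len(offsets), dtype=torch.long)
        for tag in record["tags"]:
            start, end = tag["offset"], tag["offset"] + len(tag["text"])
            # Mark every token fully covered by the tagged span with the tag's label id.
            for i, (tok_start, tok_end) in enumerate(offsets.tolist()):
                if tok_start >= start and tok_end <= end and tok_end > tok_start:
                    labels[i] = self.label2id[tag["label"]]
        item = {k: v[0] for k, v in encoding.items()}
        item["labels"] = labels
        return item

The dataset can then be wrapped in a torch.utils.data.DataLoader as usual.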
Gunicorn support
This is a lightweight solution. When moving things to gunicorn, multiple threads are acceptable, but multiple workers will cause chaos.
gunicorn --workers=1 --threads=5 app:app
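For the command above to find the app, the module (assumed here to be app.py) needs a module-level app object. A minimal sketch, assuming the object returned by from_df can be served directly as the WSGI app and that the CSV path and column names are placeholders:

# app.py (sketch; file name, CSV path and column names are assumptions)
import pandas as pd
from langhuan import ClassifyTask

df = pd.read_csv("comments.csv")
app = ClassifyTask.from_df(
    df, text_col="comment",
    options=["positive", "negative", "unbiased", "not sure"])
# gunicorn serves this module-level "app"; no app.run() call is needed here.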
Compatibility 💍
Well, this library hasn't been tested rigorously against many browsers and versions; so far it is
- compatible with Chrome, Firefox, and Safari, as long as the version is not too old.