lightsmile's NLP labeling library

Project description

lightLabel

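A labeling system for my own use (backend).

0. Notice

Only the source code of the current version is planned for release; later code will probably be closed source.

1. Introduction

This system is intended for simple labeling tasks. Label information is synchronized to the database in real time. Text classification (with a small number of classes) is largely implemented so far.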

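2. Features

  • The functional layers are loosely coupled, and labeling tasks are abstracted as resource classes.
  • Database access is also wrapped in an abstract class with a default MongoDB implementation, so in principle implementations for other databases can be added freely.
  • Flask serves as the web framework, and the design follows REST conventions fairly strictly: the system exposes REST API endpoints for every labeling task.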
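The decoupling described above can be sketched as follows. This is only an illustration of the pattern (an abstract storage interface with a swappable implementation), not lightLabel's actual code: the names `Storage`, `InMemoryStorage`, and `LabelTask` are hypothetical, and an in-memory dictionary stands in for MongoDB so the sketch runs without a database.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Storage(ABC):
    """Hypothetical storage interface; not lightLabel's actual class."""

    @abstractmethod
    def insert(self, collection: str, document: Dict[str, Any]) -> None:
        """Store one document in the named collection."""

    @abstractmethod
    def find(self, collection: str, **filters: Any) -> List[Dict[str, Any]]:
        """Return documents whose fields equal the given filters."""


class InMemoryStorage(Storage):
    """Dictionary-backed stand-in for a real backend such as MongoDB."""

    def __init__(self) -> None:
        self._collections: Dict[str, List[Dict[str, Any]]] = {}

    def insert(self, collection: str, document: Dict[str, Any]) -> None:
        # Copy the document so later changes by the caller do not leak in.
        self._collections.setdefault(collection, []).append(dict(document))

    def find(self, collection: str, **filters: Any) -> List[Dict[str, Any]]:
        return [
            doc
            for doc in self._collections.get(collection, [])
            if all(doc.get(key) == value for key, value in filters.items())
        ]


class LabelTask:
    """Hypothetical task resource that only talks to the Storage interface."""

    def __init__(self, title: str, storage: Storage) -> None:
        self.title = title
        self.storage = storage

    def add_item(self, raw_data: Dict[str, Any]) -> None:
        self.storage.insert(
            self.title, {"raw_data": raw_data, "labeled_status": False}
        )

    def unlabeled(self) -> List[Dict[str, Any]]:
        return self.storage.find(self.title, labeled_status=False)


task = LabelTask("ttt_demo", InMemoryStorage())
task.add_item({"word": "李白"})
```

Because `LabelTask` depends only on the abstract interface, replacing `InMemoryStorage` with another backend would not require changes to the task class.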

3. Usage example

Example code

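from lightlabel import Engine, TextClassification

# Create a text classification task with title 'ttt_demo' and description 'des_demo'.
text_cls = TextClassification('ttt_demo', 'des_demo')
# Candidate classes: Tang-dynasty figures, fictional characters, Three Kingdoms figures.
text_cls.add_classes(['唐朝人物', '虚拟人物', '三国人物'])
# Load the items to label from a CSV file whose single column is named 'word'.
text_cls.update_from_csv(r'C:\Users\Alienware\Desktop\text_classification_demo.csv', headers=['word'])
# Register the task with the engine and start the web server.
engine = Engine()
engine.add_plan(text_cls)
engine.run()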

Output

 * Running on http://localhost:5000/ (Press CTRL+C to quit)
{'ttt_demo'}
 * Serving Flask app "lightlabel.web.engine" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
127.0.0.1 - - [21/Feb/2020 18:24:52] "GET /ttt_demo/items HTTP/1.1" 200 -
127.0.0.1 - - [21/Feb/2020 18:25:31] "GET /ttt_demo/data HTTP/1.1" 200 -
127.0.0.1 - - [21/Feb/2020 18:30:57] "GET /project_lists HTTP/1.1" 200 -
127.0.0.1 - - [21/Feb/2020 18:30:58] "GET /favicon.ico HTTP/1.1" 200 -

Contents of text_classification_demo.csv

李白
曹操
夏侯惇
张飞
周瑜
陆逊
司马懿
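To make the link between the CSV rows and the stored records concrete, here is a minimal sketch of how a header-less CSV plus `headers=['word']` could map onto documents with the fields that appear in the database dump in this README. The function `rows_to_documents` is hypothetical and is not how `update_from_csv` is known to be implemented. It reads from a string instead of the file path so that it runs anywhere.

```python
import csv
import io
from typing import Any, Dict, List


def rows_to_documents(csv_text: str, headers: List[str]) -> List[Dict[str, Any]]:
    """Turn header-less CSV rows into unlabeled item documents (hypothetical)."""
    documents = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        documents.append({
            "check_status": False,
            "labeled_data": None,
            "labeled_status": False,
            "labeled_user": None,
            # Pair each column name with the matching cell of the row.
            "raw_data": dict(zip(headers, row)),
            "updated_time": None,
        })
    return documents


sample = "李白\n曹操\n夏侯惇\n"
docs = rows_to_documents(sample, headers=["word"])
```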

Data stored in the database for this text classification task (it differs slightly from the CSV because I labeled a few items)

/* 1 */
{
    "_id" : ObjectId("5e4d411fc827b0709d554149"),
    "check_status" : false,
    "labeled_data" : "唐朝人物",
    "labeled_status" : true,
    "labeled_user" : null,
    "raw_data" : {
        "word" : "李白"
    },
    "updated_time" : null
}

/* 2 */
{
    "_id" : ObjectId("5e4d411fc827b0709d55414b"),
    "check_status" : false,
    "labeled_data" : "三国人物",
    "labeled_status" : true,
    "labeled_user" : null,
    "raw_data" : {
        "word" : "曹操"
    },
    "updated_time" : null
}

/* 3 */
{
    "_id" : ObjectId("5e4d411fc827b0709d55414d"),
    "check_status" : false,
    "labeled_data" : "三国人物",
    "labeled_status" : true,
    "labeled_user" : null,
    "raw_data" : {
        "word" : "夏侯惇"
    },
    "updated_time" : null
}

/* 4 */
{
    "_id" : ObjectId("5e4d411fc827b0709d55414f"),
    "check_status" : false,
    "labeled_data" : null,
    "labeled_status" : false,
    "labeled_user" : null,
    "raw_data" : {
        "word" : "张飞"
    },
    "updated_time" : null
}

/* 5 */
{
    "_id" : ObjectId("5e4d411fc827b0709d554151"),
    "check_status" : false,
    "labeled_data" : null,
    "labeled_status" : false,
    "labeled_user" : null,
    "raw_data" : {
        "word" : "周瑜"
    },
    "updated_time" : null
}

/* 6 */
{
    "_id" : ObjectId("5e4d411fc827b0709d554153"),
    "check_status" : false,
    "labeled_data" : null,
    "labeled_status" : false,
    "labeled_user" : null,
    "raw_data" : {
        "word" : "陆逊"
    },
    "updated_time" : null
}

/* 7 */
{
    "_id" : ObjectId("5e4d411fc827b0709d554155"),
    "check_status" : false,
    "labeled_data" : null,
    "labeled_status" : false,
    "labeled_user" : null,
    "raw_data" : {
        "word" : "司马懿"
    },
    "updated_time" : null
}
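As a rough illustration of how the `labeled_status` and `labeled_data` fields can be used, the sketch below summarizes labeling progress over the seven records shown above. The helper names `label_summary` and `item` are hypothetical, and the records are trimmed to the fields the summary needs; nothing here is part of lightLabel itself.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def item(word: str, label: Optional[str]) -> Dict[str, Any]:
    """Build a trimmed copy of one stored record (hypothetical helper)."""
    return {
        "raw_data": {"word": word},
        "labeled_data": label,
        "labeled_status": label is not None,
    }


def label_summary(documents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count labeled and unlabeled items and the items per class."""
    labeled = [doc for doc in documents if doc["labeled_status"]]
    return {
        "total": len(documents),
        "labeled": len(labeled),
        "unlabeled": len(documents) - len(labeled),
        "per_class": dict(Counter(doc["labeled_data"] for doc in labeled)),
    }


documents = [
    item("李白", "唐朝人物"),
    item("曹操", "三国人物"),
    item("夏侯惇", "三国人物"),
    item("张飞", None),
    item("周瑜", None),
    item("陆逊", None),
    item("司马懿", None),
]
summary = label_summary(documents)
```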

Task information stored in the database for this text classification task

/* 1 */
{
    "_id" : ObjectId("5e4d411fc827b0709d554143"),
    "description" : "des_demo",
    "label_status" : true,
    "task_type" : "TextClassification",
    "title" : "ttt_demo",
    "data" : {
        "classes" : [ 
            "唐朝人物", 
            "虚拟人物", 
            "三国人物"
        ]
    },
    "data_path" : [ 
        "C:\\Users\\Alienware\\Desktop\\text_classification_demo.csv"
    ]
}

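REST endpoints

  • http://localhost:5000/project_lists: returns information about all tasks
  • http://localhost:5000/ttt_demo/items: all labeling items of this task
  • http://localhost:5000/ttt_demo/data: data associated with this task; in this example, the list of all label classes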
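The endpoints listed above could be called from a small client like the one sketched here, using only the standard library. It assumes that the server from the usage example is running on localhost:5000 and that the responses are JSON, which this README does not state. The helpers `endpoint_url` and `fetch_json` are hypothetical names.

```python
import json
from typing import Any
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"
TASK = "ttt_demo"


def endpoint_url(*parts: str, base_url: str = BASE_URL) -> str:
    """Join path segments onto the server's base URL."""
    return "/".join([base_url.rstrip("/"), *parts])


def fetch_json(url: str, timeout: float = 5.0) -> Any:
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The three endpoints named in this README.
URLS = {
    "projects": endpoint_url("project_lists"),
    "items": endpoint_url(TASK, "items"),
    "data": endpoint_url(TASK, "data"),
}
```

With the server running, `fetch_json(URLS["items"])` would request the task's labeling items; the call is left out of the sketch so that it can be imported without a live server.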

Screenshots of the endpoints (images not reproduced here): UTOOLS1582280855124.png, UTOOLS1582280902325.png, UTOOLS1582280929192.png

4. References

Download files

Source Distribution

lightLabel-0.1.0.tar.gz (7.6 kB)

Uploaded Source

Built Distribution

lightLabel-0.1.0-py3-none-any.whl (14.7 kB)

Uploaded Python 3

