This is a script to convert the output file of doccano to a format that is easy to handle with sklearn-crfsuite.
Project description
Span to IBO
This script converts a doccano JSONL export, in which labels are stored as character spans, into token-level JSON with IOB tags that is easy to use with sklearn-crfsuite.
Usage
python doccano.py --input_path <path to doccano exported jsonl file> --output_path <path to output file>
Input file format
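The input file is a JSONL file exported from doccano. Each line is one JSON object with a "text" string and a "labels" list of [start, end, label] character spans.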
{"text": "東京都渋谷区渋谷 2丁目2−8 渋谷マークシティ", "labels": [[0, 9, "LOC"]]}
{"text": "東京都渋谷区神南 1丁目1−1", "labels": [[0, 7, "LOC"]]}
...
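To illustrate how the character spans in "labels" relate to token-level tags, here is a minimal sketch. It is not the script's actual implementation: the function name `spans_to_iob` is hypothetical, and the token list is hand-written for illustration rather than produced by a real morphological analyzer. It assumes each span is a half-open range [start, end), and it tags a token only when the token lies entirely inside a span.

```python
import json


def spans_to_iob(text, spans, tokens):
    """Assign an IOB tag to each token from doccano-style character spans.

    Hypothetical helper: tokens are located in `text` left to right, so
    whitespace that the tokenizer dropped is simply skipped over.
    """
    tags = []
    cursor = 0
    prev_span = None  # span the previous token belonged to, if any
    for token in tokens:
        start = text.find(token, cursor)
        if start < 0:
            raise ValueError(f"token {token!r} not found in text")
        end = start + len(token)
        cursor = end
        # A token counts as labeled only if it lies fully inside a span;
        # tokens that merely overlap a span boundary are tagged "O" here.
        current = next(
            (s for s in spans if s[0] <= start and end <= s[1]), None
        )
        if current is None:
            tags.append("O")
        elif current is prev_span:
            tags.append("I-" + current[2])  # continues the same span
        else:
            tags.append("B-" + current[2])  # first token of a span
        prev_span = current
    return tags


# One line of the input file, copied from the example above.
line = '{"text": "東京都渋谷区渋谷 2丁目2−8 渋谷マークシティ", "labels": [[0, 9, "LOC"]]}'
record = json.loads(line)

# Hand-written tokenization, for illustration only (an assumption).
tokens = ["東京都", "渋谷区", "渋谷", "2", "丁目", "2", "−", "8", "渋谷", "マークシティ"]

tags = spans_to_iob(record["text"], record["labels"], tokens)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Locating tokens with `str.find` rather than summing token lengths keeps the character offsets aligned with doccano's spans even when the tokenizer discards spaces.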
Output file format
The output file is a JSON file containing a list of sentences, each of which is a list of token objects in the following format:
[
[
{"word": "東京都", "label": "B-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,東京都,トウキョウト,トーキョート", "BOS": true, "EOS": false},
{"word": "渋谷区", "label": "I-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,渋谷区,シブヤク,シブヤク", "BOS": false, "EOS": false},
...
],
...,
]
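As a rough sketch of how this output could be fed to sklearn-crfsuite, the example below splits each token object into a feature dict and a label. The helper names `token_features` and `to_crf_dataset` are hypothetical, and using every key except "label" as a feature is only one possible choice. The training call is left as a comment because it needs sklearn-crfsuite installed.

```python
def token_features(token):
    """Use every field of a token object except the label as a feature."""
    return {key: value for key, value in token.items() if key != "label"}


def to_crf_dataset(sentences):
    """Split sentences into parallel feature (X) and label (y) sequences."""
    X = [[token_features(tok) for tok in sent] for sent in sentences]
    y = [[tok["label"] for tok in sent] for sent in sentences]
    return X, y


# Two tokens copied from the example above. In practice the sentences would
# come from json.load() on the file written to --output_path.
sentences = [
    [
        {"word": "東京都", "label": "B-LOC", "pos_tag": "名詞",
         "pos_tag[:2]": "名詞,固有名詞",
         "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,東京都,トウキョウト,トーキョート",
         "BOS": True, "EOS": False},
        {"word": "渋谷区", "label": "I-LOC", "pos_tag": "名詞",
         "pos_tag[:2]": "名詞,固有名詞",
         "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,渋谷区,シブヤク,シブヤク",
         "BOS": False, "EOS": False},
    ]
]

X, y = to_crf_dataset(sentences)
print(y)

# Training would then look roughly like this (requires sklearn-crfsuite):
#   import sklearn_crfsuite
#   crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
#                              max_iterations=100)
#   crf.fit(X, y)
```

Whether raw fields such as "pos_tag_all" help or hurt the model is an empirical question; dropping or combining fields in `token_features` is the natural place to experiment.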
Reference
This program is mainly based on the following repository: https://github.com/ToshihikoSakai/jsontoconll
All mistakes in this script are mine.