Span to IBO
This script converts a file exported from doccano, in which entities are annotated as character spans,
into token-level IOB labels (B-, I-, O) in a format that is easy to handle with sklearn-crfsuite.
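The core idea, mapping character spans onto per-token labels, can be sketched as follows. This is a minimal illustration and not the script's actual implementation: the function name `spans_to_iob` is hypothetical, the tokens are written by hand rather than produced by a morphological analyzer, the tokens are assumed to concatenate exactly to the original text, and span end offsets are assumed to be exclusive.

```python
def spans_to_iob(tokens, spans):
    """Assign B-/I-/O labels to tokens from character-offset spans.

    tokens: surface strings that concatenate exactly to the original text
            (an assumption; a real tokenizer may drop whitespace).
    spans:  (start, end, tag) tuples, with `end` assumed to be exclusive.
    """
    labels = []
    offset = 0
    for token in tokens:
        tok_start, tok_end = offset, offset + len(token)
        offset = tok_end
        label = "O"
        for start, end, tag in spans:
            # The token belongs to the span if their character ranges overlap.
            if tok_start < end and tok_end > start:
                # The token that covers the span's first character gets B-,
                # every later token in the same span gets I-.
                prefix = "B-" if tok_start <= start else "I-"
                label = prefix + tag
                break
        labels.append(label)
    return labels


# Hand-written tokens for illustration only.
tokens = ["東京都", "渋谷区", "神南", "1丁目"]
spans = [(0, 6, "LOC")]
labels = spans_to_iob(tokens, spans)
print(list(zip(tokens, labels)))
```

Tracking a running character offset keeps the sketch independent of any particular tokenizer; if the tokenizer drops whitespace, the offsets would need to be realigned against the original text first.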
Usage
python doccano.py --input_path <path to doccano exported jsonl file> --output_path <path to output file>
Input file format
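The input file is a JSONL file exported from doccano, with one JSON object per line. Each entry in "labels" is [start offset, end offset, label name], where the offsets are character positions in "text".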
{"text": "東京都渋谷区渋谷 2丁目2−8 渋谷マークシティ", "labels": [[0, 9, "LOC"]]}
{"text": "東京都渋谷区神南 1丁目1−1", "labels": [[0, 7, "LOC"]]}
...
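To inspect an export before converting it, the records can be read with the standard library alone. The following is a rough sketch, assuming the structure shown above; the helper name `read_doccano_jsonl` is hypothetical and is not part of this package.

```python
import json


def read_doccano_jsonl(lines):
    """Yield {"text", "spans"} dicts from an iterable of JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Each label is [start offset, end offset, label name].
        spans = [(start, end, tag) for start, end, tag in record.get("labels", [])]
        yield {"text": record["text"], "spans": spans}


# One line taken from the example above; a real file would be opened with
# open(path, encoding="utf-8") and passed in directly.
sample = [
    '{"text": "東京都渋谷区神南 1丁目1−1", "labels": [[0, 7, "LOC"]]}',
]
records = list(read_doccano_jsonl(sample))
for rec in records:
    for start, end, tag in rec["spans"]:
        # Show the annotated substring next to its label.
        print(tag, rec["text"][start:end])
```

Printing the annotated substring next to its label is a quick way to check that the offsets in an export line up with the text as expected.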
Output file format
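The output file is a JSON file containing a list of sentences. Each sentence is a list of token objects holding the word, its IOB label, part-of-speech features, and BOS/EOS flags, in the following format: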
[
[
{"word": "東京都", "label": "B-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,東京都,トウキョウト,トーキョート", "BOS": true, "EOS": false},
{"word": "渋谷区", "label": "I-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,渋谷区,シブヤク,シブヤク", "BOS": false, "EOS": false},
...
],
...,
]
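Because each token is already a flat dictionary, the output can be split into feature sequences and label sequences with very little code. The sketch below assumes the structure shown above; `to_crf_inputs` is a hypothetical helper, and the sample data is abbreviated (fewer keys per token, and the EOS value on the last token is made up for illustration).

```python
import json


def to_crf_inputs(sentences):
    """Split token dicts into feature dicts (X) and label strings (y)."""
    X, y = [], []
    for sentence in sentences:
        # Everything except "label" is treated as a feature.
        X.append([{k: v for k, v in token.items() if k != "label"}
                  for token in sentence])
        y.append([token["label"] for token in sentence])
    return X, y


# Abbreviated sample in the output format; a real file would be read with
# json.load(open(output_path, encoding="utf-8")).
output_json = """
[[{"word": "東京都", "label": "B-LOC", "pos_tag": "名詞", "BOS": true, "EOS": false},
  {"word": "渋谷区", "label": "I-LOC", "pos_tag": "名詞", "BOS": false, "EOS": true}]]
"""
sentences = json.loads(output_json)
X, y = to_crf_inputs(sentences)

# With sklearn-crfsuite installed, X and y could then be passed to a model:
#   import sklearn_crfsuite
#   crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
#   crf.fit(X, y)
```

Keeping the label out of the feature dictionaries matters here: leaving it in would leak the target into the model's inputs.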
Reference
This program is mainly based on the following repository. https://github.com/ToshihikoSakai/jsontoconll
All mistakes in this script are mine.
File details
Details for the file span_to_ibo-0.1.0.tar.gz.
File metadata
- Download URL: span_to_ibo-0.1.0.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.1 CPython/3.10.4 Linux/5.4.0-1104-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0db8dc420d54c7dcb0508adcd9ce75f876a31a4023a6ae95e52bc5f4a725e2e9 |
| MD5 | 093297582092752a99c3aa186c058d03 |
| BLAKE2b-256 | 3ddac894d124ce96836b421d08971689e84aa169ffb4359a617ab34a6a0c26c5 |
File details
Details for the file span_to_ibo-0.1.0-py3-none-any.whl.
File metadata
- Download URL: span_to_ibo-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.1 CPython/3.10.4 Linux/5.4.0-1104-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06074b8fb2bdd1b39ecf061c80197e9c46b32e47a473a119160c5f2f249d0342 |
| MD5 | eb6966ec82683cff004508cef0374d3a |
| BLAKE2b-256 | e09f4469d9c5b57368745459034f3b2c7bd09fdbd64ef6e4f5ac81d5dcf9a932 |