Span to IBO

This script converts a doccano export file (JSONL with character-span labels) into token-level JSON with B-/I- prefixed labels, a format that is easy to handle with sklearn-crfsuite.

Usage

python doccano.py --input_path <path to doccano exported jsonl file> --output_path <path to output file>

Input file format

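The input file is a JSONL file exported from doccano. Each line is one JSON object with a "text" string and a "labels" list of [start offset, end offset, label name] entries.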

{"text": "東京都渋谷区渋谷 2丁目2−8 渋谷マークシティ", "labels": [[0, 9, "LOC"]]}
{"text": "東京都渋谷区神南 1丁目1−1", "labels": [[0, 7, "LOC"]]}
...
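
To make the span-to-token step concrete, here is a minimal sketch of how character spans like the ones above could be mapped to per-token labels. It is not the code in doccano.py: the function names `read_doccano_jsonl` and `spans_to_iob` are hypothetical, the tokens are supplied by hand rather than by a morphological analyzer, and the end offset is assumed to be exclusive.

```python
import json


def read_doccano_jsonl(lines):
    """Parse doccano JSONL lines into (text, spans) pairs."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        spans = [(start, end, name) for start, end, name in record.get("labels", [])]
        records.append((record["text"], spans))
    return records


def spans_to_iob(tokens, spans):
    """Assign one label per token (hypothetical helper).

    Assumes the tokens, concatenated in order, reproduce the text exactly,
    and that each span is (start, end, name) with an exclusive end offset.
    """
    labels = []
    offset = 0
    for token in tokens:
        start, end = offset, offset + len(token)
        label = "O"
        for span_start, span_end, name in spans:
            if start < span_end and end > span_start:  # token overlaps the span
                # The token that covers the span start opens the entity.
                prefix = "B-" if start <= span_start else "I-"
                label = prefix + name
                break
        labels.append(label)
        offset = end
    return labels


# Hand-made tokenization, for illustration only.
tokens = ["東京都", "渋谷区", "神南", " ", "1", "丁目"]
print(spans_to_iob(tokens, [(0, 8, "LOC")]))
# ['B-LOC', 'I-LOC', 'I-LOC', 'O', 'O', 'O']
```

A token that only partly overlaps a span is still labeled as part of it in this sketch; a stricter rule would be an equally reasonable choice.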

Output file format

The output file is a JSON file containing a list of sentences, each of which is a list of per-token dictionaries, in the following format:

[
    [
        {"word": "東京都", "label": "B-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,東京都,トウキョウト,トーキョート", "BOS": true, "EOS": false},
        {"word": "渋谷区", "label": "I-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,渋谷区,シブヤク,シブヤク", "BOS": false, "EOS": false},
        ...
    ],
    ...,
]
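
As an illustration of why this layout is convenient, the following sketch splits the output shown above into the feature and label sequences that a sequence tagger expects. The function name `to_features_and_labels` is hypothetical and not part of this package. The sample data is shortened from the example above.

```python
def to_features_and_labels(sentences):
    """Split token dictionaries into feature dicts (X) and label lists (y)."""
    X, y = [], []
    for sentence in sentences:
        # Every key except "label" is treated as a feature.
        X.append([{k: v for k, v in token.items() if k != "label"} for token in sentence])
        y.append([token["label"] for token in sentence])
    return X, y


# In practice the list would come from json.load() on the output file.
sentences = [
    [
        {"word": "東京都", "label": "B-LOC", "pos_tag": "名詞", "BOS": True, "EOS": False},
        {"word": "渋谷区", "label": "I-LOC", "pos_tag": "名詞", "BOS": False, "EOS": False},
    ]
]
X, y = to_features_and_labels(sentences)
print(y)  # [['B-LOC', 'I-LOC']]

# Assuming sklearn-crfsuite is installed, X and y could then be passed to
# sklearn_crfsuite.CRF().fit(X, y).
```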

Reference

This program is mainly based on the following repository: https://github.com/ToshihikoSakai/jsontoconll

All mistakes in this script are mine.
