A utility for storing and reading files for Korean LM training.
ko_lm_dataformat
- Utilities for storing data for Korean LM training
- Most of the code is from lm_dataformat
Basic Usage
To write:
    # Assumption: Archive is exported at the package top level.
    from ko_lm_dataformat import Archive

    ar = Archive('output_dir')

    for x in something():
        # do other stuff
        ar.add_data(somedocument, meta={
            'example': stuff,
            'someothermetadata': [othermetadata, otherrandomstuff],
            'otherotherstuff': True
        })

    # remember to commit at the end!
    ar.commit()
To read:
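    # Assumption: Reader is exported at the package top level.
    from ko_lm_dataformat import Reader

    rdr = Reader('input_dir_or_file')

    for doc in rdr.stream_data():
        # do something with the document
        pass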
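The write-then-commit and streaming-read pattern above can be tried without installing anything. The following is a minimal sketch using only the Python standard library. `TinyArchive`, `TinyReader`, and `round_trip` are hypothetical stand-ins written for illustration, not classes from ko_lm_dataformat, and the on-disk layout (uncompressed JSON Lines with `text` and `meta` keys) is an assumption rather than the format the library actually writes.

```python
import json
import tempfile
from pathlib import Path


class TinyArchive:
    """Hypothetical stand-in for Archive: buffers documents, writes on commit()."""

    def __init__(self, out_dir):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self._buffer = []
        self._chunk = 0

    def add_data(self, text, meta=None):
        # Nothing touches the disk yet; the record only joins the buffer.
        self._buffer.append({"text": text, "meta": meta or {}})

    def commit(self):
        # Flush the buffer to a new JSON Lines file, one record per line.
        if not self._buffer:
            return
        path = self.out_dir / f"data_{self._chunk:05d}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for record in self._buffer:
                # ensure_ascii=False keeps Korean text readable in the file.
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
        self._buffer = []
        self._chunk += 1


class TinyReader:
    """Hypothetical stand-in for Reader: streams documents back lazily."""

    def __init__(self, in_path):
        self.in_path = Path(in_path)

    def stream_data(self):
        # Accept either a directory of chunk files or a single file.
        if self.in_path.is_dir():
            files = sorted(self.in_path.glob("*.jsonl"))
        else:
            files = [self.in_path]
        for path in files:
            with path.open(encoding="utf-8") as f:
                for line in f:
                    yield json.loads(line)["text"]


def round_trip(docs):
    """Write docs to a temporary archive and read them back in order."""
    with tempfile.TemporaryDirectory() as tmp:
        ar = TinyArchive(tmp)
        for doc in docs:
            ar.add_data(doc, meta={"lang": "ko"})
        ar.commit()
        return list(TinyReader(tmp).stream_data())
```

In this sketch, documents added after the last `commit()` never reach the disk, which is why the usage above ends with a commit. Streaming with a generator means the reader holds one line in memory at a time instead of the whole corpus.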
Hashes for ko_lm_dataformat-0.1.0rc1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 24895d66fafb3f6078efb6d1a083c35e8f6cb8db4a5fb82cb26285e649ef0596
MD5 | 96406398a131ad975f27b0d8d190ec24
BLAKE2b-256 | 79352646bcdbeb81bd770e95ceade08840c8e5716f4e025b90abcf4daa9f5b64
Hashes for ko_lm_dataformat-0.1.0rc1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f12bc7c758f5d0ee5dd78602dc0288ecd6391f88138fa9a1f3c1514d9f84d49c
MD5 | 2d57f1f7971ae6d5b617f02aa6b85f6c
BLAKE2b-256 | bbb7b0f9925458c077cfa79121d89de95e5b67888c9f6581eec7ef63fee04880