A utility for storing and reading files for LM training.
LM_Dataformat
Utilities for storing data for LM training.
Basic Usage
To write:
from lm_dataformat import Archive

ar = Archive('output_dir')

for x in something():
    # do other stuff
    ar.add_data(somedocument, meta={
        'example': stuff,
        'someothermetadata': [othermetadata, otherrandomstuff],
        'otherotherstuff': True
    })

# remember to commit at the end!
ar.commit()
To read:
from lm_dataformat import Reader

rdr = Reader('input_dir_or_file')
for doc in rdr.stream_data():
    # do something with the document
    ...
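The write and read steps above can be combined into a small round trip. The following is a minimal sketch, not an official example: it assumes `lm_dataformat` is installed, that `Archive` accepts an existing directory, and that `stream_data()` yields the stored documents in the order they were added. The helper name `round_trip` and the `'index'` metadata key are hypothetical and exist only for illustration. The import is guarded so the snippet does nothing harmful when the package is missing.

```python
import tempfile

try:
    from lm_dataformat import Archive, Reader
except ImportError:
    # Assumption: the package may not be installed where this runs.
    Archive = Reader = None


def round_trip(docs):
    """Write docs to a temporary archive directory, then read them back."""
    if Archive is None:
        return None  # library unavailable; nothing to demonstrate

    with tempfile.TemporaryDirectory() as out_dir:
        ar = Archive(out_dir)
        for i, text in enumerate(docs):
            # 'index' is a hypothetical metadata field chosen for this sketch
            ar.add_data(text, meta={'index': i})

        # Nothing should be read back before the archive is committed
        ar.commit()

        # Collect the documents while the temporary directory still exists
        return list(Reader(out_dir).stream_data())


if __name__ == '__main__':
    print(round_trip(['first document', 'second document']))
```

Reading happens inside the `with` block because the temporary directory, and the archive in it, is deleted as soon as the block exits. For real data, pass a persistent output directory instead.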