A utility for storing and reading files for LM training.
LM_Dataformat
Utilities for storing data for LM training.
Basic Usage
To write:
from lm_dataformat import Archive

ar = Archive('output_dir')

for x in something():
    # do other stuff
    ar.add_data(somedocument, meta={
        'example': stuff,
        'someothermetadata': [othermetadata, otherrandomstuff],
        'otherotherstuff': True
    })

# remember to commit at the end!
ar.commit()
To read:
from lm_dataformat import Reader

rdr = Reader('input_dir_or_file')
for doc in rdr.stream_data():
    # do something with the document
    pass
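To make the add, commit, then stream pattern concrete, here is a minimal standard-library sketch of the same workflow. It is not lm_dataformat's implementation: `MiniArchive`, the module-level `stream_data`, and `round_trip` are hypothetical names, and the on-disk layout (gzip-compressed JSON lines with `text` and `meta` fields) is an assumption chosen only so the example runs without extra dependencies.

```python
import gzip
import json
import os
import tempfile


class MiniArchive:
    """Hypothetical stand-in for an archive writer (not lm_dataformat's code)."""

    def __init__(self, out_dir):
        self.out_dir = out_dir
        os.makedirs(out_dir, exist_ok=True)
        self.buffer = []  # records added since the last commit
        self.chunk = 0    # counter used to name committed files

    def add_data(self, text, meta=None):
        # Buffer one record in memory; nothing reaches disk until commit().
        self.buffer.append({"text": text, "meta": meta or {}})

    def commit(self):
        # Write all buffered records to one compressed JSON-lines file.
        path = os.path.join(self.out_dir, f"data_{self.chunk:05d}.jsonl.gz")
        with gzip.open(path, "wt", encoding="utf-8") as f:
            for record in self.buffer:
                f.write(json.dumps(record) + "\n")
        self.buffer = []
        self.chunk += 1
        return path


def stream_data(in_dir):
    """Yield document texts from every committed file, in file-name order."""
    for name in sorted(os.listdir(in_dir)):
        if not name.endswith(".jsonl.gz"):
            continue
        with gzip.open(os.path.join(in_dir, name), "rt", encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)["text"]


def round_trip(docs):
    """Write docs to a temporary directory, commit, and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        ar = MiniArchive(tmp)
        for i, doc in enumerate(docs):
            ar.add_data(doc, meta={"index": i})
        ar.commit()  # in this sketch, skipping commit leaves the directory empty
        return list(stream_data(tmp))
```

Calling `round_trip(["first document", "second document"])` returns the two documents in the order they were added. In this sketch, documents that were added but never committed exist only in memory, which illustrates why a commit step at the end matters.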