random-readline
Randomized fast readline for large text files.
Install
pip install random_readline
Usage
from random_readline import readline
# lines are shuffled by default.
n_lines, read = readline("text.txt")
for line in read():
    print(line)
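To illustrate the idea behind shuffled reads, here is a minimal sketch of one way to read lines in random order by seeking. It is not the package's actual implementation: the helper names `build_line_offsets` and `read_shuffled` and the `seed` parameter are hypothetical and exist only for this example.

```python
import random


def build_line_offsets(path):
    """Scan the file once and record the byte offset where each line starts."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            if not f.readline():  # empty bytes means end of file
                break
            offsets.append(pos)
    return offsets


def read_shuffled(path, seed=None):
    """Yield every line exactly once, in random order (hypothetical helper)."""
    offsets = build_line_offsets(path)
    random.Random(seed).shuffle(offsets)
    with open(path, "rb") as f:
        for pos in offsets:
            f.seek(pos)  # one seek per line
            yield f.readline().decode("utf-8").rstrip("\n")
```

Only the list of offsets is held in memory, so a sketch like this can handle files much larger than RAM, at the cost of one indexing pass before the first line is returned.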
Sequential read
from random_readline import readline
# lines are read in their original order.
n_lines, read = readline("text.txt", shuffle=False)
for line in read():
    print(line)
Gzipped file
import gzip
from random_readline import readline
n_lines, read = readline("text.txt.gz", opener=gzip.open)
for line in read():
    print(line)
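An `opener` argument of this kind can be any callable that returns a seekable file object. The sketch below assumes the standard library's `gzip.open` in binary mode, whose `tell()` and `seek()` work on uncompressed positions; the helpers `line_offsets` and `read_line_at` are hypothetical and are not part of random_readline.

```python
import gzip


def line_offsets(path, opener=open):
    """Record the (uncompressed) byte offset of each line start."""
    offsets = []
    with opener(path, "rb") as f:
        while True:
            pos = f.tell()
            if not f.readline():  # empty bytes means end of file
                break
            offsets.append(pos)
    return offsets


def read_line_at(path, offset, opener=open):
    """Seek to an offset and return the line that starts there."""
    with opener(path, "rb") as f:
        f.seek(offset)
        return f.readline().decode("utf-8").rstrip("\n")


# Usage (assumes "text.txt.gz" exists):
# offsets = line_offsets("text.txt.gz", opener=gzip.open)
# print(read_line_at("text.txt.gz", offsets[-1], opener=gzip.open))
```

The same code path serves plain and gzipped files; only the opener changes. With gzip, a seek has to decompress the data leading up to the target position, which is why frequent seeking is expensive.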
Control the frequency of seeking
Since random seeking can be very slow with gzipped files, the readline function has an option, chunk_size, to control the frequency of seeking.
This value is set to 1 by default, which means that a seek is performed for every single line, so the entire file is read completely at random.
Increasing the value of chunk_size reduces how often seeks are performed, improving performance at the cost of randomness.
import gzip
from random_readline import readline
# lines are randomized in units of 100 lines
n_lines, read = readline("text.txt.gz", opener=gzip.open, chunk_size=100)
for line in read():
    print(line)
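The trade-off between seek frequency and randomness can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the function `chunked_shuffled_lines` and its `seed` parameter are hypothetical, and the sketch assumes that lines inside a chunk keep their original order while only the order of chunks is shuffled.

```python
import random


def chunked_shuffled_lines(path, chunk_size=1, opener=open, seed=None):
    """Yield all lines, seeking once per chunk of `chunk_size` lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    # Index pass: remember the offset of the first line of every chunk.
    chunk_starts = []
    with opener(path, "rb") as f:
        line_no = 0
        while True:
            pos = f.tell()
            if not f.readline():  # end of file
                break
            if line_no % chunk_size == 0:
                chunk_starts.append(pos)
            line_no += 1

    # Assumption: only the chunk order is randomized.
    random.Random(seed).shuffle(chunk_starts)

    with opener(path, "rb") as f:
        for start in chunk_starts:
            f.seek(start)  # one seek per chunk instead of one per line
            for _ in range(chunk_size):
                line = f.readline()
                if not line:  # the final chunk may be shorter
                    break
                yield line.decode("utf-8").rstrip("\n")
```

With `chunk_size=1` every line gets its own seek and the order is fully random; with `chunk_size=100` the number of seeks drops by roughly a factor of 100, but neighbouring lines stay together.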
Download files
Source Distribution
random_readline-0.1.0.tar.gz (3.8 kB)
Built Distribution
Hashes for random_readline-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8806ca797168a7e998e03584de7e6ac3deb3fc600fe7fd6d522275fd03b7c4e1
MD5 | f43b0fbe9d92a74ca9a34e3a41aa947a
BLAKE2b-256 | e5122ac205c2539c0c2c9d2ad3f083288d2aaf19cebe221d46c8a2f6c962dcb5