Randomized fast readline for large text files.

random-readline

Install

pip install random_readline

Usage

from random_readline import readline

# lines are shuffled by default.
n_lines, read = readline("text.txt")

for line in read():
    print(line)

Sequential read

from random_readline import readline

# lines are read in their original order, without shuffling.
n_lines, read = readline("text.txt", shuffle=False)

for line in read():
    print(line)

Gzipped file

import gzip
from random_readline import readline

n_lines, read = readline("text.txt.gz", opener=gzip.open)

for line in read():
    print(line)
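
The following is a minimal sketch, independent of random_readline's internals, of why jumping to an arbitrary line in a gzipped file works at all: file objects returned by gzip.open accept seek() with offsets into the uncompressed data. The helper names line_offsets and read_line_at are hypothetical and are not part of the package.

```python
import gzip
import os
import tempfile


def line_offsets(path, opener=open):
    """Return the (uncompressed) byte offset at which each line starts.

    Hypothetical helper for illustration; not part of random_readline.
    """
    offsets = []
    with opener(path, "rb") as f:
        pos = 0
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_line_at(path, offset, opener=open):
    """Seek to a known line start and read exactly one line."""
    with opener(path, "rb") as f:
        f.seek(offset)
        return f.readline().decode("utf-8").rstrip("\n")


# Build a tiny gzipped file to demonstrate on.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "text.txt.gz")
with gzip.open(gz_path, "wb") as f:
    f.write(b"alpha\nbeta\ngamma\n")

offsets = line_offsets(gz_path, opener=gzip.open)
third_line = read_line_at(gz_path, offsets[2], opener=gzip.open)
print(offsets, third_line)
```

Because gzip data can only be decompressed from the beginning of the stream, a seek on such a file object costs time proportional to the amount of data that must be decompressed to reach the target offset, whereas a seek on a plain file is cheap.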

Control the frequency of seeking

Random seeking can be very slow on gzipped files, so the readline function accepts a chunk_size option that controls how often it seeks.

chunk_size defaults to 1, which means a seek is performed for every single line, so the entire file is read in completely random order.

Increasing chunk_size reduces the number of seeks, which improves performance at the cost of randomness.

import gzip
from random_readline import readline

# seek once every 100 lines instead of once per line
n_lines, read = readline("text.txt.gz", opener=gzip.open, chunk_size=100)

for line in read():
    print(line)
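
The trade-off described above can be sketched as follows. This is an illustration under stated assumptions, not the package's actual implementation: the function names index_lines and chunked_random_lines and the seed parameter are hypothetical, and the sketch assumes that lines inside a chunk keep their file order while only the order of the chunks is shuffled.

```python
import os
import random
import tempfile


def index_lines(path, opener=open):
    """Record the byte offset of every line start (hypothetical helper)."""
    offsets = []
    with opener(path, "rb") as f:
        pos = 0
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def chunked_random_lines(path, opener=open, chunk_size=1, seed=None):
    """Yield lines chunk by chunk, visiting the chunks in random order.

    Assumption: one seek per chunk, then chunk_size sequential reads.
    With chunk_size=1 every line gets its own seek (fully random order).
    """
    offsets = index_lines(path, opener)
    chunk_starts = list(range(0, len(offsets), chunk_size))
    random.Random(seed).shuffle(chunk_starts)
    with opener(path, "rb") as f:
        for start in chunk_starts:
            f.seek(offsets[start])  # the only seek for this chunk
            for _ in range(min(chunk_size, len(offsets) - start)):
                yield f.readline().decode("utf-8").rstrip("\n")


# Demo on a 10-line temporary file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "text.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.writelines(f"line {i}\n" for i in range(10))

shuffled = list(chunked_random_lines(path, chunk_size=3, seed=0))
for line in shuffled:
    print(line)
```

In this sketch a 10-line file with chunk_size=3 needs 4 seeks instead of 10, and passing opener=gzip.open would apply the same logic to a gzipped file. The larger the chunk, the fewer seeks are needed, but the more the output resembles the original line order.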

Download files

Source Distribution

random_readline-0.1.0.tar.gz (3.8 kB)

Built Distribution

random_readline-0.1.0-py3-none-any.whl (3.6 kB, Python 3)
