random-readline

Randomized fast readline for large text files.

Install

pip install random_readline

Usage

from random_readline import readline

# lines are shuffled by default.
n_lines, read = readline("text.txt")

for line in read():
    print(line)

Sequential read

from random_readline import readline

# lines are read in their original order (no shuffling).
n_lines, read = readline("text.txt", shuffle=False)

for line in read():
    print(line)

Gzipped file

import gzip
from random_readline import readline

n_lines, read = readline("text.txt.gz", opener=gzip.open)

for line in read():
    print(line)
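
When a directory mixes plain and gzipped files, it can be handy to choose the opener from the file name. The helper below is a small sketch and is not part of random_readline: `pick_opener` is a hypothetical name, and passing the built-in `open` as `opener` is an assumption (the README only shows `gzip.open`).

```python
import gzip
from typing import Callable


def pick_opener(path: str) -> Callable:
    """Hypothetical helper: gzip.open for *.gz paths, built-in open otherwise."""
    return gzip.open if path.lower().endswith(".gz") else open


# Intended use (requires random_readline to be installed; assumes the
# built-in open is an acceptable opener):
#   n_lines, read = readline(path, opener=pick_opener(path))
```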

Control the frequency of seeking

Since random seeking can be very slow on gzipped files, the readline function accepts a chunk_size option that controls how often a seek is performed.

chunk_size defaults to 1, which means a seek is performed for every line, so the whole file is read in fully random order.

Increasing chunk_size reduces the number of seeks, improving performance at the cost of randomness.

import gzip
from random_readline import readline

# a seek is performed once every 100 lines instead of once per line
n_lines, read = readline("text.txt.gz", opener=gzip.open, chunk_size=100)

for line in read():
    print(line)
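
To make the trade-off concrete, here is a minimal sketch of one way chunked random reading could work. It is not the library's actual implementation: the function name `chunked_shuffled_lines`, the `seed` parameter, and the choice to shuffle the order of chunks while keeping lines inside each chunk in file order are all assumptions made for illustration.

```python
import random
from typing import Callable, Iterator, List, Optional, Tuple


def chunked_shuffled_lines(
    path: str,
    opener: Callable = open,
    chunk_size: int = 1,
    seed: Optional[int] = None,
) -> Tuple[int, Callable[[], Iterator[bytes]]]:
    """Hypothetical sketch: one seek per chunk of `chunk_size` consecutive lines."""
    # First pass: record the byte offset at which every chunk starts.
    chunk_offsets: List[int] = []
    n_lines = 0
    offset = 0
    with opener(path, "rb") as f:
        for line in f:
            if n_lines % chunk_size == 0:
                chunk_offsets.append(offset)
            offset += len(line)
            n_lines += 1

    def read() -> Iterator[bytes]:
        # Shuffle the chunk order, then seek once per chunk.
        order = list(chunk_offsets)
        random.Random(seed).shuffle(order)
        with opener(path, "rb") as f:
            for start in order:
                f.seek(start)
                for _ in range(chunk_size):
                    line = f.readline()
                    if not line:  # end of file inside the last chunk
                        break
                    yield line

    return n_lines, read
```

In this sketch the number of seeks is roughly n_lines / chunk_size, so chunk_size=100 performs about one hundredth of the seeks that chunk_size=1 does, while lines within a chunk stay adjacent. With gzip.open as the opener, offsets refer to positions in the uncompressed stream, and a backward seek forces decompression to restart from the beginning of the file, which is why fewer seeks help so much.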
