Gives Python the ability to randomly access any chunk of a file quickly, without loading any content into memory, and implements two new dynamic types of file handles.

Project description

Overview

linereader is a Python package for fast random access to the contents of a file. It offers several new ways of working with files.

Two main new types of file handles are added to linereader:

1- copen, a cache-based solution for random file access and dynamic processing

2- dopen, a slower but universal way of performing random file access and dynamic processing

Random file access and processing with a cache

linereader was meant as a drop-in substitute for Python’s built-in linecache module. With linereader, cached entries are loaded using less memory and are around 12% faster to access than those of linecache. linereader also adds extra utility functions for manipulating the global cache.

To upgrade from linecache, linereader offers a polymorphic getline function, where:

from linecache import getline

can be replaced by:

from linereader import getline

and both behave the same way, loading a file’s contents into cached memory.

An example of this usage would be as follows:

from linereader import getline

filename = 'C:/Python34/testfile.txt'

line_1 = getline(filename, 1)
line_2 = getline(filename, 2)

print(line_1, line_2)
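For reference, the standard-library behaviour that linereader's getline is described as matching can be checked with linecache alone. The sketch below does not import linereader. It writes a small temporary file (a stand-in for testfile.txt) and reads lines through linecache, which numbers lines from 1 and returns an empty string for a line that does not exist.

```python
import linecache
import os
import tempfile

# Create a small stand-in file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    cache_demo_path = tmp.name

line_1 = linecache.getline(cache_demo_path, 1)    # line numbers start at 1
line_2 = linecache.getline(cache_demo_path, 2)
missing = linecache.getline(cache_demo_path, 99)  # out of range: empty string, no exception

linecache.clearcache()  # drop the cached file contents
os.remove(cache_demo_path)
```

Code written against this behaviour should, according to the description above, keep working when the import is switched to linereader.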

In addition to getline, linereader also contains getonce and copen, which are used as solutions for cache-based file access.

Random file access and processing without loading into memory

The problem with file access methods that load the entire file into a cache is that they only work on small files. A 5 GB file usually cannot be loaded into memory without the Python interpreter crashing. Even if a file can be loaded, it slows down the session and eats up useful memory. A new file handle added in linereader, linereader.dopen, works around this problem and can consistently access any line from a text, logging, or data file of any size.

The time needed to access a line is proportional to the number of characters being read, plus a small fixed Python overhead of around 31 microseconds per line access. Using a 10 GB test file, a line consisting of one character was returned in 31 microseconds, and a line containing 135 characters was returned in 97 microseconds.
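This page does not describe how dopen is implemented. As an illustration only, one common way to read an arbitrary line without holding the file in memory is to record the byte offset at which each line starts and then seek directly to it. The helper names build_line_offsets and read_line_at below are hypothetical and are not part of linereader.

```python
import os
import tempfile


def build_line_offsets(path):
    """Scan the file once and return the byte offset at which each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as handle:
        for raw_line in handle:
            offsets.append(position)
            position += len(raw_line)
    return offsets


def read_line_at(path, offsets, lineno):
    """Return line `lineno` (1-based) by seeking straight to its start."""
    if not 1 <= lineno <= len(offsets):
        return ""
    with open(path, "rb") as handle:
        handle.seek(offsets[lineno - 1])
        return handle.readline().decode("utf-8")


# Demonstration on a small temporary file (newline="" keeps "\n" untranslated).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    demo_path = tmp.name

demo_offsets = build_line_offsets(demo_path)
third = read_line_at(demo_path, demo_offsets, 3)
first = read_line_at(demo_path, demo_offsets, 1)
os.remove(demo_path)
```

In this sketch only a list of integers stays in memory, and each lookup costs one seek plus a read of the requested line. Building the index still takes one sequential pass over the file.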

dopen’s internals allow for near-identical return times on lines of the same length within the same file. This means that if a file was loaded using dopen, and lines 368 and 290 both contained the same number of characters, they would take almost exactly the same time to return. The dopen handle is designed to jump quickly from one position in a file to the next. Conventional methods of reading from a file have to iterate through all the characters or lines, silently reading content that the user doesn’t want, in order to reach the content that they need.
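The conventional approach mentioned above can be sketched with the standard library. The function name getline_sequential is hypothetical. It reads and discards every line before the requested one, so the cost grows with the line number rather than with the length of the line returned.

```python
import io
from itertools import islice


def getline_sequential(handle, lineno):
    """Return line `lineno` (1-based) by reading and discarding the lines before it."""
    handle.seek(0)
    # islice consumes the first lineno - 1 lines before yielding the one we want.
    return next(islice(handle, lineno - 1, lineno), "")


sample = io.StringIO("one\ntwo\nthree\nfour\n")
line_3 = getline_sequential(sample, 3)
past_end = getline_sequential(sample, 10)  # no such line: empty string
```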

A simple example of dopen’s usage is as follows:

import linereader

filename = 'C:/Python34/NEWS.txt'
file = linereader.dopen(filename)

# Lines are requested by number, in any order
header = file.getline(1) + file.getline(2)
line_500 = file.getline(500)
line_38 = file.getline(38)
# A range of lines can be requested in a single call
from_38_to_500 = file.getlines(38, 500)

dopen also supports more advanced usage, and is completely polymorphic with the regular open() handle:

import linereader

filename = 'C:/Python34/README.txt'
file = linereader.dopen(filename)

file.seek(50)
chars = file.read(10)
file.seek(1337)

chars += file.read(80)
chars += file.readline()

rest = file.readlines()
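For comparison, the same sequence of calls on a handle returned by the built-in open() is sketched below. It uses only the standard library and a small temporary file in place of README.txt. The file is opened in binary mode here because arbitrary seek offsets are only well defined for binary handles. Whether dopen returns text or bytes is not stated on this page.

```python
import os
import tempfile

# newline="" keeps "\n" untranslated so the byte offsets are predictable.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write("0123456789\nsecond line\nthird line\n")
    seek_demo_path = tmp.name

with open(seek_demo_path, "rb") as handle:
    handle.seek(3)              # move the file pointer to byte offset 3
    chars = handle.read(4)      # read exactly four bytes
    chars += handle.readline()  # read the rest of the current line
    rest = handle.readlines()   # read every remaining line into a list

os.remove(seek_demo_path)
```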

In addition, dopen offers methods for navigating the file pointer by line:

from linereader import dopen
file = dopen('C:/Python34/README.txt')

file.seekline(58)
line_58 = file.readline()
next_10_lines = file.readnext(10)
line_67 = file.getline(67)

Contact

If you have any questions or issues regarding linereader, please contact me at:

-npandolfi@wpi.edu


Download files

Source Distributions

linereader-1.0.0.zip (8.6 kB)

linereader-1.0.0.tar.gz (5.3 kB)

Built Distribution

linereader-1.0.0.win32.exe (199.1 kB)
