Gives Python the ability to randomly access any chunk of a file quickly, without loading any content into memory, and implements two new dynamic types of file handles.
linereader is a Python package that makes files easy to access and offers several powerful new ways of working with them.
linereader adds two main new types of file handle:
1- copen, a cache-based solution for random file access and dynamic processing
2- dopen, a slower but universal way of doing random file access and dynamic processing
Random file access and processing with a cache
linereader was designed as a direct substitute for Python's built-in linecache module. With linereader, cached entries use less memory and are around 12% faster to access than those of linecache. linereader also adds utility functions for manipulating the global cache.
For an upgrade over linecache, linereader offers a compatible getline function, so that

    from linecache import getline

can be replaced by

    from linereader import getline

and both behave the same way, loading the file's contents into the cache.
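Here is a minimal sketch of the drop-in pattern. It uses only the standard library so that it runs without linereader installed. `linecache.getline` counts lines from 1 and returns an empty string for lines past the end of the file. Assuming linereader's getline mirrors this behaviour as described above, switching is a one-line import change. `linecache.clearcache()` is the standard library's own cache-clearing helper.

```python
import linecache
import os
import tempfile

# Standard-library version. Per the text above, linereader is meant as a
# drop-in replacement, so this line would become: from linereader import getline
from linecache import getline

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\n")

line_1 = getline(tmp.name, 1)   # line numbers start at 1
line_9 = getline(tmp.name, 9)   # out-of-range lines come back as ''
linecache.clearcache()          # empty linecache's global cache
os.remove(tmp.name)
```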
For example:
    from linereader import getline

    filename = 'C:/Python34/testfile.txt'
    line_1 = getline(filename, 1)
    line_2 = getline(filename, 2)
    print(line_1, line_2)
Besides getline, linereader also provides getonce and copen for cache-based file access.
Random file access and processing without loading into memory
The problem with file-access methods that load an entire file into a cache is that they only work on small files. A 5 GB file, for example, usually cannot be loaded into memory without crashing the Python interpreter, and even a file that does fit slows down the session and eats up memory. linereader's new file handle, linereader.dopen, works around this problem: it can access any line of a text, logging, or data file of any size with consistent speed. Access time grows in proportion to the number of characters being read, on top of a fixed Python overhead of around 31 microseconds per line access. On a 10 GB test file, a one-character line was returned in 31 microseconds and a 135-character line in 97 microseconds.
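The two timings above can be read as a rough linear model. This assumes that access time really is linear in the line length $n$. Because only two data points are given, the result is a back-of-the-envelope fit and not a measurement:

```latex
t(n) \approx t_0 + k\,n, \qquad
k = \frac{t(135) - t(1)}{135 - 1}
  = \frac{97\,\mu\mathrm{s} - 31\,\mu\mathrm{s}}{134}
  \approx 0.49\,\mu\mathrm{s}\ \text{per character}, \qquad
t_0 = t(1) - k \approx 30.5\,\mu\mathrm{s}.
```

The fitted intercept $t_0$ agrees with the quoted overhead of around 31 microseconds.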
dopen's internals give near-identical return times for lines of the same length within the same file. For example, if a file was opened with dopen and lines 368 and 290 contain the same number of characters, they take almost exactly the same time to return. The dopen handle is designed to jump quickly from one position in a file to another. Conventional methods of reading a file, by contrast, must iterate through every character or line, silently reading content the user doesn't want in order to reach the content they need.
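A common way to get this kind of direct jumping is a byte-offset index. The file is scanned once to record where each line starts. After that, `seek()` jumps straight to a line and only that line's bytes are read, so the cost of a lookup depends on the line's length rather than its position. This technique is swapped in for illustration only: the text does not say how dopen's internals actually work, and the helper names below are hypothetical. The conventional scan-and-discard approach is shown alongside it for comparison.

```python
import itertools
import os
import tempfile


def read_line_by_scanning(path, lineno):
    """Conventional approach: read and discard every line before `lineno`."""
    with open(path, "rb") as f:
        # islice still reads each skipped line; it just doesn't keep it.
        return next(itertools.islice(f, lineno - 1, lineno), b"")


def build_line_index(path):
    """One pass over the file, recording the byte offset where each line starts.

    Only the offsets (one integer per line) are kept, never the line contents.
    """
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_line_by_seeking(path, offsets, lineno):
    """Jump straight to line `lineno` (1-based) using the precomputed offsets."""
    if not 1 <= lineno <= len(offsets):
        return b""
    with open(path, "rb") as f:
        f.seek(offsets[lineno - 1])
        return f.readline()


# Demo: a 1000-line throwaway file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"".join(b"line %d\n" % i for i in range(1, 1001)))

index = build_line_index(tmp.name)
scanned = read_line_by_scanning(tmp.name, 368)
seeked = read_line_by_seeking(tmp.name, index, 368)
os.remove(tmp.name)
```

Building the index still costs one full pass and one integer per line. After that, each lookup is a single seek plus one readline.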
A simple example of dopen usage:
    import linereader

    filename = 'C:/Python34/NEWS.txt'
    file = linereader.dopen(filename)
    header = file.getline(1) + file.getline(2)
    line_500 = file.getline(500)
    line_38 = file.getline(38)
    from_38_to_500 = file.getlines(38, 500)
dopen also supports more advanced usage and is fully compatible with the regular file handle returned by open():
    import linereader

    filename = 'C:/Python34/README.txt'
    file = linereader.dopen(filename)
    file.seek(50)
    chars = file.read(10)
    file.seek(1337)
    chars += file.read(80)
    chars += file.readline()
    rest = file.readlines()
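The calls in the example above are all standard file-object methods (`seek`, `read`, `readline`, `readlines`), so code written against them can be duck-typed: the same function works on any object that provides them. This sketch uses `io.StringIO` so it runs anywhere. Per the text, a dopen handle should be accepted in the same place, although that is not tested here. The helper `sample` is hypothetical. `io.StringIO.seek()` counts characters, while on a text-mode open() handle Python documents seek offsets as opaque values obtained from `tell()`.

```python
import io


def sample(f):
    """Run the same sequence of calls as the dopen example on any file-like object."""
    f.seek(2)
    chars = f.read(3)          # 3 characters starting at offset 2
    f.seek(11)
    chars += f.read(4)         # 4 characters starting at offset 11
    chars += f.readline()      # rest of the current line, including '\n'
    rest = f.readlines()       # every remaining line
    return chars, rest


text = "abcdefghij\nklmnopqrst\nline three\nline four\n"
chars, rest = sample(io.StringIO(text))
```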
dopen also offers methods for navigating the file pointer line by line:
    from linereader import dopen

    file = dopen('C:/Python34/README.txt')
    file.seekline(58)
    line_58 = file.readline()
    next_10_lines = file.readnext(10)
    line_67 = file.getline(67)
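This hypothetical wrapper illustrates what line-oriented pointer navigation involves. Its method names mirror the example above, but its semantics are assumptions and not dopen's documented behaviour: lines are 1-based, `seekline(n)` moves the pointer to the start of line n, `readnext(k)` returns up to k following lines as a list, and `getline(n)` leaves the pointer where it was. The wrapper builds a byte-offset index on top of a plain binary file, so it returns bytes.

```python
import os
import tempfile


class LineNavigator:
    """Hypothetical line-oriented wrapper around a binary file handle."""

    def __init__(self, path):
        self._f = open(path, "rb")
        # Byte offset of the start of every line, built with one pass.
        self._offsets = []
        pos = 0
        for line in self._f:
            self._offsets.append(pos)
            pos += len(line)

    def seekline(self, lineno):
        """Move the file pointer to the start of line `lineno` (1-based)."""
        self._f.seek(self._offsets[lineno - 1])

    def readline(self):
        """Read from the pointer to the end of the current line."""
        return self._f.readline()

    def readnext(self, count):
        """Read up to `count` lines from the current position."""
        lines = []
        for _ in range(count):
            line = self._f.readline()
            if not line:  # reached end of file
                break
            lines.append(line)
        return lines

    def getline(self, lineno):
        """Return line `lineno` directly, restoring the pointer afterwards."""
        saved = self._f.tell()
        self.seekline(lineno)
        line = self._f.readline()
        self._f.seek(saved)
        return line

    def close(self):
        self._f.close()


# Demo: a 100-line throwaway file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"".join(b"row %d\n" % i for i in range(1, 101)))

nav = LineNavigator(tmp.name)
nav.seekline(58)
line_58 = nav.readline()
next_10_lines = nav.readnext(10)   # lines 59 through 68
line_67 = nav.getline(67)
after = nav.readline()             # pointer was restored, so this is line 69
nav.close()
os.remove(tmp.name)
```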
If you have any questions or issues regarding linereader, please contact me at: