This is a Python library for parsing files. It provides tools to read a file easily and efficiently. It is based on Linux commands such as grep, sed, cat, head, and tail, and is tested against them.
file utils
Table of contents
Examples:
- WithEOL: python
- WithCustomDelims: python
Intro
This package reads and parses files from Python. When should you use it? When your file is really big (more than 100 000 lines). To parse a file in plain Python, you would typically write:
f = open("my_file", "r")
buffer: str = f.read()
...
or:
f = open("my_file", "r")
for line in f.readlines():
    ...
- With the first one, there is a memory issue because the entire file must be held in a buffer.
- With the second one, there is a time issue because a loop can be very slow in Python.
So, this package provides tools to read a file easily and efficiently. It is based on Linux tools such as grep, sed, cat, head, and tail, and is tested against them.
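As a point of comparison, here is a minimal pure-Python baseline (not part of this package) that iterates over the file lazily, so only the requested lines are kept in memory. The helper name lazy_head is hypothetical, and the per-line loop cost mentioned above still applies to it.

```python
import itertools
import os
import tempfile


def lazy_head(path: str, n: int) -> list[str]:
    """Return the first n lines, iterating over the file lazily."""
    with open(path, "r", encoding="utf-8") as f:
        # islice stops the iteration after n lines
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


# Small demonstration on a temporary file
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\nsecond\nthird\n")

demo_head = lazy_head(tmp.name, 2)
os.remove(tmp.name)
print(demo_head)  # ['first', 'second']
```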
The WithEOL class has the same memory problem as the first example. To avoid it, use WithCustomDelims with the "\n" delimiter.
So, why keep WithEOL?
WithEOL helps me test the code: it uses a built-in Rust function, and I use it as a reference to compare against WithCustomDelims.
Installation
python
From PyPI:
pip install file-utils
From source:
maturin develop
Before-starting
This package supports ASCII and UTF-8 encoded files only. Files in any other encoding will not work.
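If you are unsure about a file's encoding, you can check it before parsing. The following is a minimal sketch using only the standard library. The helper is_utf8 is a hypothetical name and is not provided by this package.

```python
import codecs
import os
import tempfile


def is_utf8(path: str, chunk_size: int = 1024) -> bool:
    """Return True if the whole file decodes as UTF-8 (ASCII is a subset)."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                decoder.decode(chunk)  # raises on an invalid byte sequence
            decoder.decode(b"", final=True)  # raises on a truncated sequence
    except UnicodeDecodeError:
        return False
    return True


def _write_temp(data: bytes) -> str:
    """Write bytes to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(data)
    return tmp.name


utf8_path = _write_temp("小六号".encode("utf-8"))
latin1_path = _write_temp("éà".encode("latin-1"))
utf8_ok = is_utf8(utf8_path)
latin1_ok = is_utf8(latin1_path)
os.remove(utf8_path)
os.remove(latin1_path)
print(utf8_ok, latin1_ok)  # True False
```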
Arguments-explanation
- path: the path to the file
- remove_empty_string: skip entries that are empty or contain only spaces (matching "[ ]*")
- n: the number of lines to return with tail/head
- n1: the first line to return with between
- n2: the last line to return with between
- restrict: if enabled, the regexes are applied only to the N lines selected by head/tail. If disabled, the first/last N lines that match the regexes are returned
with regex:
- regex_keep: list of regexes to keep
- regex_pass: list of regexes to skip/ignore
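The filtering arguments can be pictured with a small pure-Python sketch. This is not the package's Rust implementation: the function filter_entries is hypothetical, and it assumes that a regex matches anywhere in the entry (search semantics) and that regex_pass is applied after regex_keep.

```python
import re


def filter_entries(entries, remove_empty_string=False,
                   regex_keep=(), regex_pass=()):
    """Filter a list of strings the way the arguments above are described."""
    keep = [re.compile(p) for p in regex_keep]
    skip = [re.compile(p) for p in regex_pass]
    result = []
    for entry in entries:
        # remove_empty_string: drop entries made only of spaces
        if remove_empty_string and re.fullmatch(r"[ ]*", entry):
            continue
        # regex_keep: when given, an entry must match at least one pattern
        if keep and not any(p.search(entry) for p in keep):
            continue
        # regex_pass: an entry matching any pattern is ignored
        if any(p.search(entry) for p in skip):
            continue
        result.append(entry)
    return result


sample = [
    "[Warning]:Entity not found",
    "[Error]:Unable to recover data",
    "[Info]:Segfault",
    " ",
]
patterns = [r"\[Warning\]:*", r"\[Error\]:*"]
kept = filter_entries(sample, regex_keep=patterns)
passed = filter_entries(sample, remove_empty_string=True, regex_pass=patterns)
print(kept)    # ['[Warning]:Entity not found', '[Error]:Unable to recover data']
print(passed)  # ['[Info]:Segfault']
```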
WithEOL-python
Example-file
We will use the example file test.txt.
With cat -e test.txt:
[Warning]:Entity not found$
[Error]:Unable to recover data$
[Info]:Segfault$
[Warning]:Indentation$
[Error]:Memory leaks$
[Info]:Entity not found$
[Warning]:Unable to recover data$
$
[Error]:Segfault$
[Info]:Indentation$
[Warning]:Memory leaks$
Example-simple-head-python
Simple head (can be changed to tail)
Code:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 2 # Number of lines to read
try:
    head: list = file_utils_lib.WithEOL.head(path=path, n=n)
    print(head)
except:
    print("Unable to open/read the file")
Stdout:
['[Warning]:Entity not found', '[Error]:Unable to recover data']
Example-simple-tail-python
Code:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 2 # Number of lines to read
try:
    tail: list = file_utils_lib.WithEOL.tail(path=path, n=n)
    print(tail)
except:
    print("Unable to open/read the file")
Stdout:
['[Info]:Indentation', '[Warning]:Memory leaks']
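For intuition, the same "last n lines" result can be sketched in pure Python with a bounded deque, which keeps at most n lines in memory while scanning the file. This is only an illustration with a hypothetical helper name (tail_lines), not how the package implements tail.

```python
import os
import tempfile
from collections import deque


def tail_lines(path: str, n: int) -> list[str]:
    """Return the last n lines of the file."""
    with open(path, "r", encoding="utf-8") as f:
        # A deque with maxlen=n silently discards the oldest lines
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("[Error]:Segfault\n[Info]:Indentation\n[Warning]:Memory leaks\n")

demo_tail = tail_lines(tmp.name, 2)
os.remove(tmp.name)
print(demo_tail)  # ['[Info]:Indentation', '[Warning]:Memory leaks']
```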
Example-simple-between-python
Code:
import file_utils_lib
path: str = "my_path_to_file"
n1: int = 2 # First line to read
n2: int = 4 # Last line to read
try:
    between: list = file_utils_lib.WithEOL.between(path=path, n1=n1, n2=n2)
    print(between)
except:
    print("Unable to open/read the file")
Stdout:
['[Error]:Unable to recover data', '[Info]:Segfault', '[Warning]:Indentation']
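Judging from the output above, n1 and n2 are 1-indexed and both inclusive. A pure-Python sketch of that range logic, under this assumption and with a hypothetical helper name (between_lines), could look like this:

```python
import itertools
import os
import tempfile


def between_lines(path: str, n1: int, n2: int) -> list[str]:
    """Return lines n1..n2 (1-indexed, both inclusive)."""
    if n1 > n2:
        return []
    with open(path, "r", encoding="utf-8") as f:
        # islice uses 0-based, end-exclusive bounds, hence n1 - 1 and n2
        selected = itertools.islice(f, max(n1 - 1, 0), n2)
        return [line.rstrip("\n") for line in selected]


with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(
        "[Warning]:Entity not found\n"
        "[Error]:Unable to recover data\n"
        "[Info]:Segfault\n"
        "[Warning]:Indentation\n"
        "[Error]:Memory leaks\n"
    )

demo_between = between_lines(tmp.name, 2, 4)
demo_empty = between_lines(tmp.name, 4, 2)
os.remove(tmp.name)
print(demo_between)
# ['[Error]:Unable to recover data', '[Info]:Segfault', '[Warning]:Indentation']
```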
Example-simple-parse-python
Code:
import file_utils_lib
path: str = "my_path_to_file"
try:
    parse: list = file_utils_lib.WithEOL.parse(path=path)
    print(parse)
except:
    print("Unable to open/read the file")
Stdout:
['[Warning]:Entity not found', '[Error]:Unable to recover data', '[Info]:Segfault', '[Warning]:Indentation', '[Error]:Memory leaks', '[Info]:Entity not found', '[Warning]:Unable to recover data', ' ', '[Error]:Segfault', '[Info]:Indentation', '[Warning]:Memory leaks']
Example-simple-count_lines-python
Code:
import file_utils_lib
path: str = "my_path_to_file"
try:
    count: int = file_utils_lib.WithEOL.count_lines(path=path)
    print(count)
except:
    print("Unable to open/read the file")
Stdout:
11
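For comparison, a line count similar to wc -l can be sketched in pure Python by counting newline bytes chunk by chunk. The helper count_newlines is hypothetical. Like wc -l, it does not count a final line that lacks a trailing newline, and whether count_lines behaves the same way in that case is not something this sketch claims.

```python
import os
import tempfile


def count_newlines(path: str, chunk_size: int = 64 * 1024) -> int:
    """Count newline bytes without loading the whole file."""
    count = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
    return count


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"a\n\nb\n")

demo_count = count_newlines(tmp.name)
os.remove(tmp.name)
print(demo_count)  # 3
```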
Example-remove_empty_string-python
With remove_empty_string enabled:
Code:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 4 # Number of lines to read
try:
    tail: list = file_utils_lib.WithEOL.tail(path=path, n=n, remove_empty_string=True)
    print(tail)
except:
    print("Unable to open/read the file")
Stdout:
['[Warning]:Unable to recover data', '[Error]:Segfault', '[Info]:Indentation', '[Warning]:Memory leaks']
With remove_empty_string disabled (default option):
Code:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 4 # Number of lines to read
try:
    tail: list = file_utils_lib.WithEOL.tail(path=path, n=n, remove_empty_string=False)
    print(tail)
except:
    print("Unable to open/read the file")
Stdout:
[' ', '[Error]:Segfault', '[Info]:Indentation', '[Warning]:Memory leaks']
Example-regex_keep-python
Code:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 4 # Number of lines to read
try:
    head: list = file_utils_lib.WithEOL.head(path=path, n=n, remove_empty_string=False, regex_keep=[r"\[Warning\]:*", r"\[Error\]:*"])
    print(head)
except:
    print("Unable to open/read the file")
Stdout:
['[Warning]:Entity not found', '[Error]:Unable to recover data', '[Warning]:Indentation']
Why are there only 3 elements instead of 4? See the restrict option.
Example-regex_pass-python
Code:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 4 # Number of lines to read
try:
    head: list = file_utils_lib.WithEOL.head(path=path, n=n, remove_empty_string=False, regex_pass=[r"\[Warning\]:*", r"\[Error\]:*"])
    print(head)
except:
    print("Unable to open/read the file")
Stdout:
['[Info]:Segfault']
Why is there only 1 element instead of 4? See the restrict option.
Example-restrict-python
With restrict disabled:
Code:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 4 # Number of lines to read
try:
    head: list = file_utils_lib.WithEOL.head(path=path, n=n, remove_empty_string=False, regex_keep=[r"\[Warning\]:*", r"\[Error\]:*"], restrict=False)
    print(head)
except:
    print("Unable to open/read the file")
Stdout:
['[Warning]:Entity not found', '[Error]:Unable to recover data', '[Warning]:Indentation', '[Error]:Memory leaks']
With restrict enabled (default):
Code:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 4 # Number of lines to read
try:
    head: list = file_utils_lib.WithEOL.head(path=path, n=n, remove_empty_string=False, regex_keep=[r"\[Warning\]:*", r"\[Error\]:*"], restrict=True)
    print(head)
except:
    print("Unable to open/read the file")
Stdout:
['[Warning]:Entity not found', '[Error]:Unable to recover data', '[Warning]:Indentation']
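The difference between the two modes can be reproduced with a short pure-Python sketch working on a list of lines. It is an illustration only, not the package's Rust code: head_with_regex is a hypothetical name, and it assumes that a regex matches anywhere in the line.

```python
import re


def head_with_regex(lines, n, regex_keep, restrict=True):
    """Mimic head + regex_keep on an in-memory list of lines."""
    patterns = [re.compile(p) for p in regex_keep]

    def matches(line):
        return any(p.search(line) for p in patterns)

    if restrict:
        # Take the first n lines, then keep only the matching ones
        return [line for line in lines[:n] if matches(line)]

    # Otherwise collect the first n matching lines, wherever they are
    result = []
    for line in lines:
        if len(result) >= n:
            break
        if matches(line):
            result.append(line)
    return result


lines = [
    "[Warning]:Entity not found",
    "[Error]:Unable to recover data",
    "[Info]:Segfault",
    "[Warning]:Indentation",
    "[Error]:Memory leaks",
]
patterns = [r"\[Warning\]:*", r"\[Error\]:*"]
restricted = head_with_regex(lines, 4, patterns, restrict=True)
unrestricted = head_with_regex(lines, 4, patterns, restrict=False)
print(len(restricted), len(unrestricted))  # 3 4
```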
WithCustomDelims-python
How-to-use-it-python
It is like WithEOL, but with a list of custom delimiters. For example:
import file_utils_lib
path: str = "my_path_to_file"
n: int = 2 # Number of lines to read
try:
    head: list = file_utils_lib.WithEOL.head(path=path, n=n)
    print(head)
except:
    print("Unable to open/read the file")
Stdout:
['[Warning]:Entity not found', '[Error]:Unable to recover data']
has the same behaviour as
import file_utils_lib
path: str = "my_path_to_file"
n: int = 2 # Number of lines to read
try:
    head: list = file_utils_lib.WithCustomDelims.head(path=path, n=n, delimiter=["\n"])
    print(head)
except:
    print("Unable to open/read the file")
Stdout:
['[Warning]:Entity not found', '[Error]:Unable to recover data']
So, you use it the same way as WithEOL, but with a list of custom delimiters.
What-delim-can-be-used
Any string can be used, for example:
- ";"
- "abc"
- "éà"
- "::"
- "小六号"
- "毫"
With-more-than-one-delimiter
If the file contains:
;À ;la ;;
pêche éèaux moules, @moules, ::小六号moules::Je n'veux小六号 plus ::y
aller éèmaman小六号
With the delimiters ";", "\n", "éè", "@", "小六号", and "::", we get:
import file_utils_lib
path: str = "my_path_to_file"
try:
    parse: list = file_utils_lib.WithCustomDelims.parse(path=path, delimiter=[";", "\n", "éè", "@", "小六号", "::"])
    print(parse)
except:
    print("Unable to open/read the file")
Stdout:
['', 'À ', 'la ', '', '', 'pêche ', 'aux moules, ', 'moules, ', '', 'moules', "Je n'veux", ' plus ', 'y ', 'aller ', 'maman', '']
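Splitting on several delimiters at once can be pictured with a regular-expression alternation in pure Python. This sketch is not the package's algorithm: split_on_delimiters is a hypothetical name, and trying longer delimiters first is an assumption made here to get predictable results when one delimiter is a prefix of another.

```python
import re


def split_on_delimiters(text: str, delimiters: list[str]) -> list[str]:
    """Split text on any of the given literal delimiters."""
    if not delimiters:
        return [text]
    # Longest delimiters first, each one escaped so it is matched literally
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, text)


demo_split = split_on_delimiters(";À ;la ;;\npêche éèaux", [";", "\n", "éè"])
print(demo_split)  # ['', 'À ', 'la ', '', '', 'pêche ', 'aux']
```

As in the output above, two adjacent delimiters produce an empty string between them.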
Python-class
Translated from Rust into Python, the classes would look like this:
class WithEOL:
    # head: Read the n first lines
    # if n > (number of lines in the file) => return the whole file
    def head(path: str, n: int,
             remove_empty_string: bool = False,
             regex_keep: list = [],
             regex_pass: list = [],
             restrict: bool = True):
        ...
    # between: Read the lines [n1, n2]
    # if n1 > n2 => return an empty list
    # if n1 > (number of lines in the file) => return an empty list
    def between(path: str, n1: int, n2: int,
                remove_empty_string: bool = False,
                regex_keep: list = [],
                regex_pass: list = [],
                restrict: bool = True):
        ...
    # tail: Read the n last lines
    # if n > (number of lines in the file) => return the whole file
    def tail(path: str, n: int,
             remove_empty_string: bool = False,
             regex_keep: list = [],
             regex_pass: list = [],
             restrict: bool = True):
        ...
    # parse: Read the whole file
    def parse(path: str,
              remove_empty_string: bool = False,
              regex_keep: list = [],
              regex_pass: list = []):
        ...
    # count_lines: Count the number of lines
    def count_lines(path: str,
                    remove_empty_string: bool = False,
                    regex_keep: list = [],
                    regex_pass: list = []):
        ...
class WithCustomDelims:
    # head: Read the n first lines
    # if n > (number of lines in the file) => return the whole file
    def head(path: str, n: int, delimiter: list,
             remove_empty_string: bool = False,
             regex_keep: list = [],
             regex_pass: list = [],
             restrict: bool = True,
             buffer_size: int = 1024):
        ...
    # between: Read the lines [n1, n2]
    # if n1 > n2 => return an empty list
    # if n1 > (number of lines in the file) => return an empty list
    def between(path: str, n1: int, n2: int, delimiter: list,
                remove_empty_string: bool = False,
                regex_keep: list = [],
                regex_pass: list = [],
                restrict: bool = True,
                buffer_size: int = 1024):
        ...
    # tail: Read the n last lines
    # if n > (number of lines in the file) => return the whole file
    def tail(path: str, n: int, delimiter: list,
             remove_empty_string: bool = False,
             regex_keep: list = [],
             regex_pass: list = [],
             restrict: bool = True,
             buffer_size: int = 1024):
        ...
    # parse: Read the whole file
    def parse(path: str, delimiter: list,
              remove_empty_string: bool = False,
              regex_keep: list = [],
              regex_pass: list = [],
              buffer_size: int = 1024):
        ...
    # count_lines: Count the number of lines
    def count_lines(path: str, delimiter: list,
                    remove_empty_string: bool = False,
                    regex_keep: list = [],
                    regex_pass: list = [],
                    buffer_size: int = 1024):
        ...
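The buffer_size parameter suggests that WithCustomDelims reads the file in fixed-size chunks instead of loading it whole. How the Rust code does this is not shown here, so the following is only a pure-Python sketch of the general idea, limited to a single delimiter. The name iter_split_chunked is hypothetical, and buffer_size is counted in characters in this sketch.

```python
import os
import tempfile


def iter_split_chunked(path: str, delimiter: str, buffer_size: int = 1024):
    """Yield the pieces of a file split on one delimiter, chunk by chunk."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    carry = ""  # unfinished piece left over from the previous chunk
    with open(path, "r", encoding="utf-8", newline="") as f:
        while chunk := f.read(buffer_size):
            pieces = (carry + chunk).split(delimiter)
            # The last piece may be incomplete (or end with half a
            # delimiter), so it is kept for the next round
            carry = pieces.pop()
            yield from pieces
    yield carry


text = "À::la::::pêche"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(text)

# The result should not depend on the chunk size
chunked_results = {
    size: list(iter_split_chunked(tmp.name, "::", buffer_size=size))
    for size in (1, 2, 3, 1024)
}
os.remove(tmp.name)
print(chunked_results[1])  # ['À', 'la', '', 'pêche']
```

Only the current chunk and the unfinished piece are held in memory, which is why a delimiter that straddles two chunks is still found.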
Structure
- src/: all source files
- tests/: all Rust tests
- tests_files/: all files used by the tests
- tests_python/: a Python test script