A grep clone in Python supporting ANSI color coding and more

greplica

A grep clone in Python with both CLI and library interfaces, supporting ANSI color coding and more.

Shameless Promotion

Check out my other Python clone tools:

Known Differences with grep

  • The -D, --devices option is not supported and no support is planned. All inputs are handled as file streams only.
  • Context cannot be given as a raw number (-NUM); use -A, -B, or -C instead.
  • The Python module re is internally used for all regular expressions. The inputted regular expression is modified only when basic regular expressions are used. See --help for more information.
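The basic-mode rewriting mentioned in the last point can be pictured with a small sketch. This is not greplica's actual code: the function name bre_to_python is hypothetical, and the sketch assumes the modification amounts to flipping the escaping of the characters ?+{}|() (listed in --help), so that a backslash-escaped character becomes a Python re operator and an unescaped one becomes a literal. Bracket expressions and other corner cases are deliberately ignored.

```python
import re

# Characters whose escaping differs between "basic" syntax and Python re,
# taken from the -G description in --help.
BRE_SPECIAL = set('?+{}|()')


def bre_to_python(pattern: str) -> str:
    """Hypothetical helper: flip the escaping of ?+{}|() in a basic regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in BRE_SPECIAL:
                out.append(nxt)        # \( in basic syntax acts as ( in Python re
            else:
                out.append(ch + nxt)   # keep any other escape untouched
            i += 2
        elif ch in BRE_SPECIAL:
            out.append('\\' + ch)      # a bare ( is a literal in basic syntax
            i += 1
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


converted = bre_to_python(r'\(ab\)\+')
match = re.search(converted, 'xxababyy')
print(converted, match.group(0) if match else None)
```

Anything outside that character set is passed through unchanged, which is why patterns relying on other GNU-specific syntax may still behave differently.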

Contribution

Feel free to open a bug report or make a merge request on GitHub.

Installation

This project is uploaded to PyPI at https://pypi.org/project/greplica/

To install, ensure you are connected to the internet and execute: python3 -m pip install greplica --upgrade

Once installed, a script called greplica will be available in Python's scripts directory. If grep is not found on the system, a script called grep will also be installed. Make sure Python's scripts directory is listed in the PATH environment variable so that the script can be run from the command line.
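If the command is not found after installation, a quick check along these lines may help. It is only a sketch using the standard library; the helper names find_script and scripts_directory are hypothetical, and the directory reported by sysconfig can differ from the real install location for user-level (--user) installs.

```python
import shutil
import sysconfig


def find_script(name: str = 'greplica'):
    """Return the full path of `name` if it is reachable through PATH, else None."""
    return shutil.which(name)


def scripts_directory() -> str:
    """Return the scripts directory of the running Python interpreter."""
    return sysconfig.get_path('scripts')


if __name__ == '__main__':
    location = find_script()
    if location is None:
        print('greplica is not on PATH; try adding', scripts_directory())
    else:
        print('greplica found at', location)
```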

CLI Help

usage: greplica [-E | -F | -G] [-P] [-e EXPRESSIONS] [-f FILE [FILE ...]] [-i]
                [--no-ignore-case] [-w] [-x] [--end END] [-z] [-s] [-v] [-V] [--help]
                [-m NUM] [-b] [-n] [--line-buffered] [-H] [-h] [--label LABEL] [-o] [-q]
                [--binary-files TYPE] [-a] [-I] [-d ACTION] [-r] [-R]
                [--include GLOB [GLOB ...]] [--exclude GLOB [GLOB ...]]
                [--exclude-from FILE [FILE ...]] [--exclude-dir GLOB [GLOB ...]] [-L] [-l]
                [-c] [-T] [-Z] [--result-sep SEP] [--name-num-sep SEP] [--name-byte-sep SEP]
                [--context-group-sep SEP] [--context-result-sep SEP]
                [--context-name-num-sep SEP] [--context-name-byte-sep SEP] [-B NUM] [-A NUM]
                [-C NUM] [--color [WHEN]] [-U]
                [EXPRESSIONS] [FILE [FILE ...]]

Reimplementation of grep command entirely in Python.

positional arguments:
  EXPRESSIONS           Expressions to search for, separated by newline character (\n). This
                        is required if --regexp or --file are not specified.
  FILE                  Files or directories to search. Stdin will be searched if not
                        specified, unless -r is specified. Then current directory will be
                        recursively searched. How directories are handled is controlled by -d
                        and -r options.

Expression Interpretation:
  -E, --extended-regexp
                        EXPRESSIONS are "extended" regular expressions. In this mode,
                        greplica passes regular expressions directly to Python re without
                        modification. This for the most part matches original "extended"
                        syntax, but be aware that there will be differences.
  -F, --fixed-strings   EXPRESSIONS are strings
  -G, --basic-regexp    EXPRESSIONS are "basic" regular expressions. In this mode, greplica
                        modifies escaping sequences for characters ?+{}|() before passing to
                        Python re. This for the most part matches original "basic" syntax,
                        but be aware that there will be differences.
  -P, --perl-regexp     EXPRESSIONS are "perl" regular expressions. In this mode, greplica
                        passes regular expressions directly to Python re without
                        modification. This for the most part matches original "perl" syntax,
                        but be aware that there will be differences.
  -e EXPRESSIONS, --regexp EXPRESSIONS
                        use EXPRESSIONS for matching
  -f FILE [FILE ...], --file FILE [FILE ...]
                        take EXPRESSIONS from FILE
  -i, --ignore-case     ignore case in expressions
  --no-ignore-case      do not ignore case (default)
  -w, --word-regexp     match whole words only
  -x, --line-regexp     match whole lines only
  --end END             end-of-line character for parsing search files (default: \n); this
                        does not affect file parsing for -f or --exclude-from
  -z, --null-data       same as --end='\0'

Miscellaneous:
  -s, --no-messages     suppress error messages
  -v, --invert-match    select non-matching lines
  -V, --version         display version information and exit
  --help                display this help text and exit

Output control:
  -m NUM, --max-count NUM
                        stop after NUM lines
  -b, --byte-offset     print line's byte offset with each line
  -n, --line-number     print line number with each line
  --line-buffered       flush output on each line
  -H, --with-filename   print file name with each line
  -h, --no-filename     suppress the file name output
  --label LABEL         use LABEL as the standard input file name
  -o, --only-matching   show only nonempty parts of lines that match
  -q, --quiet, --silent
                        suppress all normal output
  --binary-files TYPE   sets how binary file is parsed; TYPE is 'binary', 'text', or
                        'without-match'
  -a, --text            same as --binary-files=text
  -I                    same as --binary-files=without-match
  -d ACTION, --directories ACTION
                        controls how directory input is handled in FILE; ACTION is 'read',
                        'recurse', or 'skip'
  -r, --recursive       same as --directories=recurse
  -R, --dereference-recursive
                        same as --directories=recurse_links
  --include GLOB [GLOB ...]
                        limit files to those matching GLOB
  --exclude GLOB [GLOB ...]
                        skip files that match GLOB
  --exclude-from FILE [FILE ...]
                        read FILE for exclude file name globs
  --exclude-dir GLOB [GLOB ...]
                        skip directories that match GLOB
  -L, --files-without-match
                        print only names of FILEs with no selected lines
  -l, --files-with-matches
                        print only names of FILEs with selected lines
  -c, --count           print only a count of selected lines per FILE
  -T, --initial-tab     currently just adds tabs to each sep value (will make better later)
  -Z, --null            adds 0 to the end of result-sep
  --result-sep SEP      String to place between header info and search output
  --name-num-sep SEP    String to place between file name and line number when both are
                        enabled
  --name-byte-sep SEP   String to place between file name and byte number when both are
                        enabled
  --context-group-sep SEP
                        String to print between context groups
  --context-result-sep SEP
                        String to place between header info and context line
  --context-name-num-sep SEP
                        String to place between file name and line number on context line
  --context-name-byte-sep SEP
                        String to place between file name and byte number on context line

Context Control:
  -B NUM, --before-context NUM
                        print NUM lines of leading context
  -A NUM, --after-context NUM
                        print NUM lines of trailing context
  -C NUM, --context NUM
                        print NUM lines of output context
  --color [WHEN], --colour [WHEN]
                        use ANSI escape codes to highlight the matching strings; WHEN is
                        'always', 'never', or 'auto'
  -U, --binary          do not strip CR characters at EOL (MSDOS/Windows)
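The options above can also be driven from another program through the command-line script. The following is a minimal sketch, assuming the greplica script is on PATH; the helper names build_command and run_greplica are hypothetical. It uses only flags listed in the help text (-r, -n, --color) and passes the expression as the first positional argument, so an expression that starts with a dash would need different handling.

```python
import shutil
import subprocess


def build_command(expression, *paths, recursive=False, line_numbers=False):
    """Build an argument list for the greplica script (hypothetical helper)."""
    cmd = ['greplica', '--color=never']  # keep ANSI codes out of captured output
    if recursive:
        cmd.append('-r')
    if line_numbers:
        cmd.append('-n')
    cmd.append(expression)               # positional EXPRESSIONS
    cmd.extend(paths)                    # positional FILE [FILE ...]
    return cmd


def run_greplica(expression, *paths, **options):
    """Run greplica and return its standard output, or None if it is not installed."""
    if shutil.which('greplica') is None:
        return None
    completed = subprocess.run(
        build_command(expression, *paths, **options),
        capture_output=True,
        text=True,
    )
    return completed.stdout


print(build_command('hello .*ld', 'path/to/directory/', recursive=True, line_numbers=True))
```

The exit status is not checked here because the README does not document greplica's exit codes.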

Library Help

greplica can be used as a library from another module. The following is a simple example.

from greplica.grep import Grep

grep_obj = Grep()
grep_obj.add_expressions('hello .*ld')
grep_obj.add_files('file1.txt', 'path/to/file2.txt', 'path/to/directory/')
grep_obj.directory_handling_type = Grep.Directory.RECURSE
data = grep_obj.execute()

# Prints a list of Grep.FileDat objects which contain filename, start_index,
# stop_index, and num_matches. The values of start_index and stop_index are the start
# and stop indices into data.lines for the matches in this file. start_index and
# stop_index will be None only when certain options prevent output to data.lines.
# num_matches may be less than (stop_index - start_index) if data.lines contain
# context separator lines when either *_context_count value is greater than 0 and
# context_sep is not ''.
# ex:
# file1.txt, 0, 1, 1
# path/to/file2.txt, 1, 2, 1
for f in data.files:
  print('{}, {}, {}, {}'.format(f.filename, f.start_index, f.stop_index, f.num_matches))

# Prints a list of Grep.LineDat objects which contain filename, line_num, byte_offset,
# and line. Context separator will be its own element in this list when either
# *_context_count value is greater than 0 and context_sep is not ''. In those cases,
# line_num and byte_offset will be None.
# ex:
# file1.txt, 3, 117, hello world!
# path/to/file2.txt, 8, 393, hello household
for l in data.lines:
  print('{}, {}, {}, {}'.format(l.filename, l.line_num, l.byte_offset, l.line))

# Prints a list of Grep.InfoDat objects which contain filename and info.
# ex:
# path/to/directory/file.bin, path/to/directory/file.bin: binary file matches
for i in data.info:
  print('{}, {}'.format(i.filename, i.info))

# Prints a list of Grep.ErrorDat objects which contain filename and err_str.
# ex:
# path/to/directory/restricted/file.bin, greplica: [Errno 13] Permission denied:
# 'path/to/directory/restricted/file.bin'
for e in data.errors:
  print('{}, {}'.format(e.filename, e.err_str))
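The relationship between data.files and data.lines can be illustrated without running a search. The sketch below uses plain namedtuple stand-ins (not the real Grep.FileDat and Grep.LineDat classes) filled with the example values from the comments above. It assumes stop_index is exclusive, which is inferred from those example values rather than stated by the library; the helper name lines_by_file is hypothetical.

```python
from collections import namedtuple

# Stand-ins that only mirror the attribute names described above.
FileDat = namedtuple('FileDat', 'filename start_index stop_index num_matches')
LineDat = namedtuple('LineDat', 'filename line_num byte_offset line')


def lines_by_file(files, lines):
    """Group line contents per file using the start/stop indices into `lines`."""
    grouped = {}
    for f in files:
        if f.start_index is None or f.stop_index is None:
            grouped[f.filename] = []   # options prevented output to lines
            continue
        grouped[f.filename] = [
            entry.line
            for entry in lines[f.start_index:f.stop_index]
            if entry.line_num is not None  # skip context separator entries
        ]
    return grouped


sample_files = [
    FileDat('file1.txt', 0, 1, 1),
    FileDat('path/to/file2.txt', 1, 2, 1),
]
sample_lines = [
    LineDat('file1.txt', 3, 117, 'hello world!'),
    LineDat('path/to/file2.txt', 8, 393, 'hello household'),
]
grouped = lines_by_file(sample_files, sample_lines)
print(grouped)
```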

The following describes initialization arguments to Grep.

__init__(self, out_file:io.IOBase=None, err_file:io.IOBase=None, default_in_file:io.IOBase=None)
  '''
  Initializes Grep
  Inputs: out_file - a file object to pass to print() as 'file' for regular messages.
                      This should be set to sys.stdout if writing to terminal is desired.
                      Writing to file is skipped when this is set to None. (default: None)
          err_file - a file object to pass to print() as 'file' for error messages.
                      This should be set to sys.stderr if writing to terminal is desired.
                      Writing to file is skipped when this is set to None. (default: None)
          default_in_file - default input file stream used when no files added.
                      This should be set to sys.stdin if reading from terminal is desired by default.
                      An exception will be raised on execute() if this is None and no files added.
                      (default: None)
  '''
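Because out_file is simply handed to print(), any writable text stream should work, not just sys.stdout. The following sketch captures the printed output in memory; the wrapper name grep_to_string is hypothetical, and the import is placed inside the function so that the module can be loaded even where greplica is not installed. Whether color codes appear in the captured text depends on color_mode, which is left at its default here.

```python
import io


def grep_to_string(expression, *paths):
    """Run a search and return what greplica would have printed (hypothetical helper)."""
    from greplica.grep import Grep  # requires greplica to be installed

    buffer = io.StringIO()
    grep_obj = Grep(out_file=buffer)   # regular messages go to the buffer
    grep_obj.add_expressions(expression)
    grep_obj.add_files(*paths)
    # Only the printed text is wanted, so skip collecting match objects.
    grep_obj.execute(return_matches=False)
    return buffer.getvalue()
```

A call such as grep_to_string('hello .*ld', 'file1.txt') would then return the formatted output as a single string.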

The following methods may be called to add expressions, file paths, and globs.

add_dir_exclude_globs(self, *args:Union[str, List[str]])
  '''
  Skip directories that match given globs.
  '''

add_expressions(self, *args:Union[str, List[str]])
  '''
  Adds a single expression or list of expressions that Grep will search for in selected files.
  Inputs: all arguments must be list of strings or string - each string is an expression
  '''

add_file_exclude_globs(self, *args:Union[str, List[str]])
  '''
  Skip files that match given globs.
  '''

add_file_include_globs(self, *args:Union[str, List[str]])
  '''
  Limit files to those matching given globs.
  '''

add_files(self, *args:Union[str, List[str]])
  '''
  Adds a single file or list of files that Grep will crawl through. Each entry must be a path
  to a file or directory. Directories are handled based on value of directory_handling_type.
  Inputs: all arguments must be list of strings or string - each string is a file path
  '''

clear_dir_exclude_globs(self)
  '''
  Clear all directory exclude globs previously added by add_dir_exclude_globs().
  '''

clear_expressions(self)
  '''
  Clears all expressions that were previously set by add_expressions().
  '''

clear_file_exclude_globs(self)
  '''
  Clear all file exclude globs previously added by add_file_exclude_globs().
  '''

clear_file_include_globs(self)
  '''
  Clear all file include globs previously added by add_file_include_globs().
  '''

clear_files(self)
  '''
  Clear all files that were previously set by add_files().
  '''

The following Grep options may be adjusted.

# Determines how expressions are parsed
search_type:Grep.SearchType = Grep.SearchType.BASIC_REGEXP

# When true, expression's case is ignored during search
ignore_case:bool = False

# When true, regex search is performed using pattern \\b{expr}\\b
word_regexp:bool = False

# When true, line regex search is used
line_regexp:bool = False

# When true, no error messages are printed
no_messages:bool = False

# When true, matching lines are those that don't match expression
invert_match:bool = False

# When set, this is the maximum number of matching lines printed for each file
max_count:int = None

# When true, line number of match is printed before result
output_line_numbers:bool = False

# When true, file name is printed before result
output_file_name:bool = False

# When true, byte offset is printed before result
output_byte_offset:bool = False

# When true, each printed line is flushed before proceeding
line_buffered:bool = False

# (property) The sequence of bytes expected at the end of each line
# Returns bytes, can be set as str or bytes
end = b'\n'

# The string printed after header information and before line contents
results_sep:str = ':'

# The string printed before line number if both file name and line number are printed
name_num_sep:str = ':'

# The string printed before byte offset value if byte offset as well as either file
# name or line number is printed.
name_byte_sep:str = ':'

# The string printed between each context group
context_sep:str = '--\n'

# The string printed after header information and before context line contents
context_results_sep:str = '-'

# The string printed before context line number if both file name and line number are printed
context_name_num_sep:str = '-'

# The string printed before context byte offset value if byte offset as well as either
# file name or line number is printed.
context_name_byte_sep:str = '-'

# Grep.ColorMode: sets the color mode
color_mode:Grep.ColorMode = Grep.ColorMode.AUTO

# Grep.Directory: sets how directories are handled when they are included in file list
directory_handling_type:Grep.Directory = Grep.Directory.READ

# The label to print when output_file_name is true and stdin is parsed
label:str = '(standard input)'

# When true, normal output is not printed
quiet:bool = False

# When true, only the matching contents are printed for each line
only_matching:bool = False

# Grep.BinaryParseFunction: sets how binary files are handled
binary_parse_function:Grep.BinaryParseFunction = Grep.BinaryParseFunction.PRINT_ERROR

# When true, CR characters are stripped from the end of every line when found
strip_cr:bool = True

# Number of lines of context to print before a match
before_context_count:int = 0

# Number of lines of context to print after a match
after_context_count:int = 0

# When true, only the file names of matching files are printed
print_matching_files_only:bool = False

# When true, only the file names of non-matching files are printed
print_non_matching_files_only:bool = False

# When true, only the number of matches for each file is printed
print_count_only:bool = False

# When true, add spaces to the left of numbers based on file size
space_numbers_by_size:bool = False

# Dictionary: Contains grep color information
# grep_color_dict gets initialized from environment variable GREP_COLORS; the default is:
# {
#     'mt':None,
#     'ms':'01;31',
#     'mc':'01;31',
#     'sl':'',
#     'cx':'',
#     'rv':False,
#     'fn':'35',
#     'ln':'32',
#     'bn':'32',
#     'se':'36',
#     'ne':False
# }
grep_color_dict:dict
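To show how a GREP_COLORS value relates to the dictionary above, here is a rough sketch of parsing the colon-separated format into the same keys. It is not greplica's own parser: parse_grep_colors and colorize are hypothetical names, unknown keys are simply ignored, and the special handling that GNU grep gives to mt is left out.

```python
# Defaults copied from the dictionary documented above.
DEFAULT_COLORS = {
    'mt': None, 'ms': '01;31', 'mc': '01;31', 'sl': '', 'cx': '',
    'rv': False, 'fn': '35', 'ln': '32', 'bn': '32', 'se': '36', 'ne': False,
}
BOOLEAN_KEYS = {'rv', 'ne'}  # flags that take no value


def parse_grep_colors(value: str) -> dict:
    """Parse a string such as 'ms=01;32:fn=34:rv' on top of the defaults."""
    colors = dict(DEFAULT_COLORS)
    for item in value.split(':'):
        if not item:
            continue
        key, sep, sgr = item.partition('=')
        if key in BOOLEAN_KEYS:
            colors[key] = True     # presence of the flag enables it
        elif sep and key in colors:
            colors[key] = sgr      # SGR parameter string, e.g. '01;31'
    return colors


def colorize(text: str, sgr: str) -> str:
    """Wrap text in an ANSI SGR sequence; empty or None means no color."""
    if not sgr:
        return text
    return '\x1b[{}m{}\x1b[0m'.format(sgr, text)


colors = parse_grep_colors('ms=01;32:fn=34:rv')
print(repr(colorize('hello world', colors['ms'])))
```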

At any point, reset() may be called to reset all settings.

reset(self)
  '''
  Resets all Grep state values except for out_file, err_file, and default_in_file.
  '''

The following method executes using all data set above.

execute(self, return_matches:bool=True) -> GrepResult
  '''
  Executes Grep with all the assigned attributes.
  Inputs: return_matches - set to True to fill in lines, info, and errors in the result
                         - set to False if outputting to terminal is the only thing that is
                           desired, saving memory
  Returns: a GrepResult object
  Raises: ValueError if no expressions added
          ValueError if no files added and no default input file set during init
  '''
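Since execute() raises ValueError when no expressions or no inputs are configured, callers may want to guard the call. The following is a small sketch; safe_execute is a hypothetical helper that works with any object exposing an execute() method, which is why it can be tried out with a stand-in object as well as a real Grep instance.

```python
import sys


def safe_execute(grep_obj, return_matches=True):
    """Call execute() and return its result, or None if configuration is incomplete."""
    try:
        return grep_obj.execute(return_matches=return_matches)
    except ValueError as exc:
        # Raised when no expressions were added, or when no files were added
        # and no default input file was given at initialization.
        print('greplica not run: {}'.format(exc), file=sys.stderr)
        return None
```

Returning None keeps the calling code simple, at the cost of hiding which of the two conditions failed; re-raising with more context is an equally reasonable choice.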


