
Search for a string in files recursively, including PDF and DOCX files

Project description

findstring

findstring searches recursively for a string in the files under a directory. It works much like the grep -rI command, but it can also extract and search the text of PDF and DOCX files.
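To make the grep -rI comparison concrete, here is a minimal standard-library sketch of a recursive search that skips binary files. It is not findstring's actual implementation. The rule that a file is binary if its first 8 KiB contain a NUL byte is an assumption, modelled loosely on grep's heuristic, and the names search_tree and looks_binary are hypothetical.

```python
import os


def looks_binary(path: str, chunk_size: int = 8192) -> bool:
    """Assumed heuristic: a NUL byte in the first chunk marks a file as binary."""
    with open(path, "rb") as fh:
        return b"\x00" in fh.read(chunk_size)


def search_tree(root: str, needle: str) -> list[str]:
    """Hypothetical helper: return paths of text files under root containing needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                if looks_binary(path):
                    continue  # like grep -I, skip binary files
                with open(path, encoding="utf-8", errors="replace") as fh:
                    if any(needle in line for line in fh):
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it and keep going
    return matches
```

For example, search_tree(".", "example_string") lists the matching text files under the current directory.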

Install

pip install findstring

Usage

findstring [OPTIONS] search_string

Positional Arguments

  • search_string: The string to search for in the files.

Optional Arguments

  • -h, --help: Show the help message and exit. This option provides a summary of how to use findstring, including descriptions of all available options.

  • -b, --binary:
    Scan binary files as well. If this option is used, the tool will attempt to read binary files and search for the specified string.

  • -d, --directory:
    Root directory to start searching from. If not specified, the current directory (.) is used by default.

  • -t, --text:
    Show the matched lines containing the search string in the output.

  • -l, --max_length:
    Maximum number of characters to display in the result. The default is 0, meaning no limit. Specifying --max_length also enables --text.

  • -v, --verbose:
    Enable verbose output. The program will provide more detailed information about its operation, including which directories and files are being checked.
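The options above map naturally onto Python's argparse. The parser below is a hypothetical reconstruction from this documentation only, not findstring's own source; in particular, switching --text on whenever a positive --max_length is given is an assumption about how that documented behaviour might be implemented. argparse adds -h/--help automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Reconstructed from the documented options; defaults follow the README.
    parser = argparse.ArgumentParser(prog="findstring")
    parser.add_argument("search_string", help="The string to search for in the files.")
    parser.add_argument("-b", "--binary", action="store_true", help="Scan binary files as well.")
    parser.add_argument("-d", "--directory", default=".", help="Root directory to start searching from.")
    parser.add_argument("-t", "--text", action="store_true", help="Show the matched lines.")
    parser.add_argument("-l", "--max_length", type=int, default=0,
                        help="Maximum number of characters to show (0 = no limit).")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable verbose output.")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.max_length > 0:
        args.text = True  # assumed: --max_length implies --text, as documented
    return args
```

Because -t and -v are plain flags, argparse accepts them combined as -tv, matching the usage examples.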

Features

  • PDF Support:
    The program can search within PDF files using the pdfminer library. It extracts text from the PDF and searches for the specified string.

  • DOCX Support:
    DOCX files are also supported, with text extraction handled by the docx library.

  • Binary File Scanning:
    When the --binary flag is enabled, the program will attempt to read binary files and search for the specified string.

  • Context Display:
    When the --text option is enabled, the tool displays the lines containing the search string, truncated to the number of characters set by the --max_length option.
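Per-format text extraction could be dispatched on the file extension roughly as follows. This sketch assumes the pdfminer.six distribution (which provides pdfminer.high_level.extract_text) and python-docx (imported as docx); the README names only "pdfminer" and "docx", so the exact packages and the helper names extract_text and matching_lines are assumptions. The third-party imports are deferred so plain-text files work even when those packages are not installed.

```python
import os


def extract_text(path: str) -> str:
    """Return the searchable text of a file, choosing an extractor by extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        # Assumes pdfminer.six is installed.
        from pdfminer.high_level import extract_text as pdf_extract_text
        return pdf_extract_text(path)
    if ext == ".docx":
        # Assumes python-docx is installed; it is imported as "docx".
        from docx import Document
        return "\n".join(p.text for p in Document(path).paragraphs)
    # Everything else is read as text; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


def matching_lines(path: str, needle: str) -> list[str]:
    """Hypothetical helper: lines of the extracted text that contain needle."""
    return [line for line in extract_text(path).splitlines() if needle in line]
```

Note that Document(path).paragraphs covers body paragraphs only; text inside DOCX tables, headers and footers would need separate handling.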

Examples

Search for a string in the current directory

findstring "example_string"

Search for a string in a specific directory

findstring -d /path/to/directory "example_string"

Search with verbose output and show matched lines

findstring -tv "example_string"

Limit the length of the output text to 50 characters

findstring -l 50 "example_string"

Search in binary files

findstring -b "example_string"
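The README does not say how a line is shortened when -l is set. One plausible approach, shown here purely as an assumption, is to keep a window of at most max_length characters centred on the first match, with 0 meaning no limit as documented. clip_around_match is a hypothetical name.

```python
def clip_around_match(line: str, needle: str, max_length: int) -> str:
    """Return at most max_length characters of line, centred on the first match.

    max_length == 0 means no limit (mirrors the documented default).
    """
    line = line.rstrip("\n")
    if max_length <= 0 or len(line) <= max_length:
        return line
    pos = line.find(needle)
    if pos < 0:
        return line[:max_length]
    # Centre the window on the match, then clamp it to the line boundaries.
    start = max(0, pos + len(needle) // 2 - max_length // 2)
    start = min(start, len(line) - max_length)
    return line[start:start + max_length]
```

Centring keeps the match visible even in very long lines, such as the single-line text that PDF extraction sometimes produces.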

Error Handling

The program attempts to handle errors, such as unreadable files, gracefully. If an error occurs while reading a file, the program skips that file and continues with the rest; in verbose mode, it also displays an error message.
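The skip-and-continue behaviour can be sketched as follows. Which exceptions findstring actually catches is not documented: catching OSError (permission errors, missing files, broken symlinks) and ValueError (which covers UnicodeDecodeError) is an assumption, as is writing the message to stderr. read_or_skip and search_files are hypothetical names.

```python
import sys


def read_or_skip(path: str, verbose: bool = False):
    """Return the file's text, or None if it cannot be read.

    Errors never stop the search; in verbose mode they are reported on stderr.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except (OSError, ValueError) as exc:  # UnicodeDecodeError is a ValueError subclass
        if verbose:
            print(f"skipping {path}: {exc}", file=sys.stderr)
        return None


def search_files(paths, needle: str, verbose: bool = False) -> list[str]:
    """Hypothetical driver: check every path, skipping unreadable ones."""
    hits = []
    for path in paths:
        text = read_or_skip(path, verbose)
        if text is not None and needle in text:
            hits.append(path)
    return hits
```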

Highlighting Search Results

By default, the program highlights matching text in bold red. The highlighting is controlled by the mt= entry of the GREP_COLORS environment variable.

To change the color or style of the highlighted text, set mt= in GREP_COLORS. The value is a semicolon-separated list of ANSI SGR (Select Graphic Rendition) codes: for example, 1 is bold, 4 is underline, 31 is red, 32 is green, and 34 is blue.

For example:

  • Bold Red (default): mt=1;31
  • Bold Green: mt=1;32
  • Underline Blue: mt=4;34

To apply a custom color, you can set the GREP_COLORS environment variable in your shell as follows:

export GREP_COLORS='mt=1;32'

This example would change the highlighted text to bold green.

The program automatically detects whether its output is going to a terminal (TTY). If it is not, the program prints plain text without any colorization.
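The color handling described above can be sketched as follows. Reading GREP_COLORS as colon-separated key=value entries follows GNU grep's documented format; the exact escape sequences findstring emits (here ESC[<mt>m to start and ESC[0m to reset) are an assumption, and match_style, highlight and print_match are hypothetical names.

```python
import os
import sys

DEFAULT_MT = "1;31"  # bold red, the documented default


def match_style(env=None) -> str:
    """Return the SGR parameters from the mt= entry of GREP_COLORS, if any."""
    env = os.environ if env is None else env
    for entry in env.get("GREP_COLORS", "").split(":"):
        key, _, value = entry.partition("=")
        if key == "mt" and value:
            return value
    return DEFAULT_MT


def highlight(line: str, needle: str, style: str, use_color: bool) -> str:
    """Wrap every occurrence of needle in ANSI SGR codes when use_color is True."""
    if not use_color or not needle:
        return line
    return line.replace(needle, f"\x1b[{style}m{needle}\x1b[0m")


def print_match(line: str, needle: str, stream=None) -> None:
    stream = sys.stdout if stream is None else stream
    # Colorize only when writing to a terminal, as described above.
    print(highlight(line, needle, match_style(), stream.isatty()), file=stream)
```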

Using as a Library

findstring can also be used as a Python library:

from findstring import findstring
findstring("/path/to/directory", "example_string")

The findstring function takes the following arguments:

  • root_dir (str):
    The root directory to start searching from.

  • search_string (str):
    The string to search for within the files.

  • verbose (bool, optional):
    If True, print additional information during the search process. Default is False.

  • show_text (bool, optional):
    If True, display the matched line containing the search string. Default is False.

  • max_length (int, optional):
    Maximum number of characters to display in the result. Default is 0 (no limit).

  • binary (bool, optional):
    If True, scan binary files as well. Default is False.
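These keyword arguments mirror the command-line options one-to-one. The sketch below spells out that correspondence, assuming the parameters can be passed by keyword under the names listed above; the helper cli_to_kwargs is hypothetical, and letting max_length switch on show_text copies the CLI behaviour rather than anything documented for the library. The import is deferred to run() so the mapping can be tried without the package installed, and the function's return value is not used because it is not documented.

```python
def cli_to_kwargs(directory=".", search_string="", verbose=False, text=False,
                  max_length=0, binary=False) -> dict:
    """Hypothetical helper: translate CLI option values into findstring() keywords."""
    return {
        "root_dir": directory,            # -d, --directory
        "search_string": search_string,   # positional search_string
        "verbose": verbose,               # -v, --verbose
        "show_text": text or max_length > 0,  # -t, --text (implied by -l on the CLI)
        "max_length": max_length,         # -l, --max_length
        "binary": binary,                 # -b, --binary
    }


def run(**cli_options) -> None:
    from findstring import findstring  # requires: pip install findstring
    findstring(**cli_to_kwargs(**cli_options))
```

For example, run(directory="/path/to/directory", search_string="example_string", max_length=50) would correspond to findstring -d /path/to/directory -l 50 "example_string".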


Download files

Source Distribution

findstring-1.0.0.tar.gz (9.2 kB)

Built Distribution

findstring-1.0.0-py3-none-any.whl (9.5 kB)

File details

Details for the file findstring-1.0.0.tar.gz.

File metadata

  • Download URL: findstring-1.0.0.tar.gz
  • Upload date:
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for findstring-1.0.0.tar.gz
Algorithm Hash digest
SHA256 635064d20aa26b693f4ff0275b62d57caa47ac0530834ed306d42bd7692cb259
MD5 7726672316c58ddbae55589127fa42bb
BLAKE2b-256 68f89c80eed7188156190418742f588ed0cb58bed038f264bacc3fc0d73f74de

File details

Details for the file findstring-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: findstring-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 9.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for findstring-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2d9bbcf1f79ac60e966457900773f3f562bf4677b33b51df2b47924340bacf43
MD5 5b2e10fb94d24c6ae08fbfec477f497e
BLAKE2b-256 0402906cb231a45e61f69da96f72feffc08ccdaffd84b5c6528b383a64c61e5b
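To check a downloaded distribution against the SHA256 digests listed above, Python's standard hashlib module is sufficient. The helper name sha256_of is illustrative; pip can also enforce hashes itself when installing from a requirements file in --require-hashes mode.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests published for findstring 1.0.0 (copied from the tables above).
EXPECTED = {
    "findstring-1.0.0.tar.gz": "635064d20aa26b693f4ff0275b62d57caa47ac0530834ed306d42bd7692cb259",
    "findstring-1.0.0-py3-none-any.whl": "2d9bbcf1f79ac60e966457900773f3f562bf4677b33b51df2b47924340bacf43",
}
```

After downloading, sha256_of("findstring-1.0.0.tar.gz") == EXPECTED["findstring-1.0.0.tar.gz"] should be True for an intact file.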
