Search a directory path for any of a list of strings.

string-path-search

Walk a directory tree, searching for files containing any of a set of text strings.

Note: Projects on GitHub and packages in Python follow different naming conventions, which causes some unnecessary confusion: "string-path-search" (with hyphens) is the name of the project on GitHub, and that project provides the "string_path_search" (with underscores) Python package. Please bear with me.

Why not just use find and grep?

  • Avoids long, hard-to-debug shell commands with lots of backticks and parentheses.
  • Works on Windows without needing to install a Unix work-alike such as Cygwin.
  • Searches for a bunch of different strings in one go.
  • Searches within (possibly compressed) jar, tar, or zip archives.
  • Outputs results in CSV or Excel format.
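
For comparison, the core idea (walk a tree, test each file against several strings in one pass) can be sketched in a few lines of Python. This is only an illustration and not the package's own code: the function name `find_strings` and its parameters are made up for this example, and it assumes files are small enough to read into memory.

```python
import os


def find_strings(root, terms, ignore_case=False):
    """Yield (path, term) for every term found in every file under root.

    Hypothetical helper for illustration; not part of string_path_search.
    """
    needles = [term.encode("utf-8") for term in terms]
    if ignore_case:
        # bytes.lower() only folds ASCII letters, which is enough for a sketch.
        needles = [needle.lower() for needle in needles]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            if ignore_case:
                data = data.lower()
            for term, needle in zip(terms, needles):
                if needle in data:
                    yield path, term
```

Reading files as bytes avoids decoding errors on binary files, at the cost of only matching text whose on-disk encoding agrees with UTF-8 for the search term.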

System requirements

  • Tested on Windows 10, Linux, and Cygwin on Windows 10. May also work on other platforms supported by Python.
  • Python 3.4 or later (https://www.python.org/downloads/).
  • A pip installation that matches the installed Python version (https://pip.pypa.io/en/stable/installing/). Python packages, including string_path_search, can be installed without pip, but doing so is a lot harder.

A note about installing Python and pip

Some Linux systems (and Cygwin on Windows) come with Python 2 pre-installed, so you have to install Python 3 yourself. There are gotchas involved:

  1. The "python" and "pip" packages may be reserved for version 2. It may not be as easy as "apt-get install python3" either: the latest package might be called "python3.7" or similar. The same goes for pip.
  2. Once installed, the Python 3 binary may be called "python3", not "python". The same goes for pip.
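
A quick way to see which interpreter a given command name actually starts is to ask Python itself. The snippet below is a small sketch using only the standard library; the helper name `interpreter_ok` is invented for this example, and the (3, 4) minimum comes from the system requirements above.

```python
import shutil
import sys


def interpreter_ok(version_info=sys.version_info, minimum=(3, 4)):
    """Return True if the given version tuple meets the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


# Report the interpreter running this script and whether it is new enough.
print(sys.executable, "OK" if interpreter_ok() else "too old")

# Show what the common command names resolve to on this machine (None if absent).
for name in ("python", "python3", "pip", "pip3"):
    print(name, "->", shutil.which(name))
```

Run it once with each candidate command (for example "python" and "python3") to find out which one is Python 3.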

Installation from PyPI

> python -m pip install --user string-path-search 

Installation from GitHub

You can also download string-path-search with your browser as a .zip or .tgz archive from https://github.com/j-lawrence-b1/string-path-search/releases/latest into any convenient directory. Once unpacked, you can install string_path_search and its dependencies using the included setup.py script (which uses pip internally!).

> chdir <my-downloads-dir>\string-path-search-0.3.2
> python setup.py build install --user 

Note: Installing with the '--user' option will install the string_path_search Python package under your login's HOME directory (C:/Users//.local/Scripts on Windows or /home//.local/bin on Linux). If you plan to run the provided string_path_search.exe directly, you should add this directory to your shell's execution path.
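
If you are unsure where '--user' installs put scripts on your machine, Python can report the location it uses. The sketch below relies on the standard `sysconfig` module; the helper names are invented for this example, and the directory it prints may differ from the paths quoted above depending on platform and Python version (macOS framework builds, for instance, may prefer a different scheme than the one assumed here).

```python
import os
import sysconfig


def user_scripts_dir():
    """Return the scripts directory of the per-user install scheme.

    Assumes the "nt_user" scheme on Windows and "posix_user" elsewhere.
    """
    scheme = "nt_user" if os.name == "nt" else "posix_user"
    return sysconfig.get_path("scripts", scheme)


def on_path(directory):
    """Return True if directory is one of the entries in the PATH variable."""
    target = os.path.normcase(os.path.normpath(directory))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in entries
        if entry
    )


scripts = user_scripts_dir()
print(scripts, "(on PATH)" if on_path(scripts) else "(not on PATH)")
```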

Usage

Although you can import and use this package in other Python scripts, string_path_search is primarily intended to be invoked as a console app:

    $ python -m string_path_search [OPTIONS] <scan-root> [<search-term> [...]]

Or you can run the standalone string_path_search.exe directly (see the Note above):

    $ ~/.local/bin/string_path_search [OPTIONS] <scan-root> [<search-term> [...]]

where:

    -a, --scan-archives = Unpack and scan within archives
        (Default: Skip archive files). Only jar, tar, and zip archives will be
            unpacked. Tar bzip2, gzip, and xz compression is supported.
    -B, --branding-text=<branding-text> = A string of text containing
        company or other information to add above the column headers in
        scan reports (Default: no text).
    -b, --branding-logo=<branding-logo> = (MS Excel only) An image
        file containing a corporate logo or other graphic to add above the
        column headers in scan reports (Default: no logo).
    -h, --help = Print usage information and exit.
    -e, --excel-output = Generate Microsoft Excel 2007 (.xlsx) output
        (Default: Generate comma-separated-value (CSV) text output)
    -i, --ignore-case = Ignore UPPER/lowercase differences when matching strings
        (Default: case differences are significant).
    -o, --output-dir=<output-dir> = Location for output (Default:
        <current working directory>).
    -s, --search-strings-file=<search-strings> = A file containing strings to
        search for, one per line (Default: Get search strings from the command line).
    -q, --quiet = Decrease logging verbosity (may repeat). -qqqq will suppress all logging.
    -t, --temp-dir=<temp-dir> = Location for unpacking archives
        (Default: <output-dir>/temp).
    -v, --verbose = Increase logging verbosity.
    -x, --exclusions-file=<exclusion-file> = A file containing (base) filenames to
        exclude from the search results, one per line (Default: Include all results).
    <scan-root> = Directory to scan.
    <search-term> ... = One or more terms to search for in <scan-root>.
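
To give a feel for what scanning within archives involves, here is a minimal sketch using the standard `zipfile` and `tarfile` modules. It is not the tool's implementation: the tool unpacks archives into a temp directory, whereas this hypothetical helper (`archive_members_containing`) reads members directly into memory and does not descend into nested archives.

```python
import tarfile
import zipfile


def archive_members_containing(path, term):
    """Return the names of archive members whose bytes contain term.

    Hypothetical helper for illustration; returns [] for non-archives.
    """
    needle = term.encode("utf-8")
    hits = []
    if zipfile.is_zipfile(path):  # jar files are zip archives
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if info.filename.endswith("/"):
                    continue  # directory entry, nothing to read
                if needle in archive.read(info):
                    hits.append(info.filename)
    elif tarfile.is_tarfile(path):
        # Mode "r:*" lets tarfile detect gzip, bzip2, or xz compression itself.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile():
                    with archive.extractfile(member) as handle:
                        if needle in handle.read():
                            hits.append(member.name)
    return hits
```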

Examples

Perform a case-insensitive search of the tests/data directory for any occurrence of 'copyright (c)', 'gpl', 'foo', 'bar', or 'baz' and output the results to a file called 'scan-<timestamp>.csv' in the current working directory.

$ python -m string_path_search -i tests/data "copyright (c)" gpl foo bar baz

Same as example 1, except output to an Excel spreadsheet:

$ python -m string_path_search -i -e tests/data "copyright (c)" gpl foo bar baz
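
For readers who want to post-process or imitate the CSV report, the sketch below writes a timestamped CSV file name and a two-column report with the standard `csv` module. Both the timestamp format and the column names ("file", "search_term") are assumptions made for this example; the tool's real report layout is not documented here and may differ.

```python
import csv
from datetime import datetime


def report_name(now, extension="csv"):
    """Build a 'scan-<timestamp>' file name (the timestamp format is assumed)."""
    return "scan-{}.{}".format(now.strftime("%Y%m%d-%H%M%S"), extension)


def write_report(rows, stream):
    """Write (file, search_term) rows to an open text stream as CSV."""
    writer = csv.writer(stream)
    writer.writerow(["file", "search_term"])  # hypothetical column headers
    writer.writerows(rows)


# newline="" stops the csv module's line endings from being doubled on Windows.
with open(report_name(datetime.now()), "w", newline="") as handle:
    write_report([("tests/data/LICENSE", "gpl")], handle)
```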

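Gotcha: Use double quotes for multi-word search strings. Single quotes can break command-line parsing: the Windows command prompt (cmd.exe), for example, does not treat single quotes as quoting characters, so a single-quoted phrase is split into separate arguments.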
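
One way to sidestep shell quoting altogether is to launch the console app from a Python script and pass the arguments as a list, so each search term reaches the program as exactly one argument. The sketch below uses only the documented command line ("python -m string_path_search", the -i option, a scan root, and search terms); the helper name `build_command` is invented for this example.

```python
import subprocess
import sys


def build_command(scan_root, terms, ignore_case=False):
    """Assemble the argument list for 'python -m string_path_search'."""
    command = [sys.executable, "-m", "string_path_search"]
    if ignore_case:
        command.append("-i")
    command.append(scan_root)
    command.extend(terms)  # each term stays a single argument, spaces and all
    return command


command = build_command("tests/data", ["copyright (c)", "gpl"], ignore_case=True)
print(command)

# To actually run the scan (requires string_path_search to be installed):
# subprocess.check_call(command)
```

Because no shell is involved, "copyright (c)" needs no quoting at all.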

License

string_path_search is distributed under the MIT License.

Disclaimer regarding the test data:

The files in the tests/data folder were randomly downloaded from publicly available Open Source projects. Distributing these materials with string_path_search as test data may or may not be in violation of the applicable licenses.
