Search a directory path for any of a list of strings.

string_path_search

Walk a directory tree, searching for files containing any of a set of text strings.

Why not just use find and grep?

  • Avoids long, hard-to-debug shell commands with lots of backticks and parentheses.
  • Works on Windows without needing to install a Unix work-alike like Cygwin.
  • Searches for a bunch of different strings in one go.
  • Searches within (possibly compressed) jar, tar, or zip archives.
  • Outputs results in CSV or Excel format.
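
The core idea behind the points above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: the names `search_tree` and `write_csv` are hypothetical, files are compared as raw bytes, and case folding only affects ASCII letters.

```python
import csv
import os
import sys


def search_tree(scan_root, terms, ignore_case=False):
    """Yield (path, term) for every file under scan_root that contains a term."""
    # Compare bytes so that binary files do not raise decoding errors.
    needles = [(t, (t.lower() if ignore_case else t).encode("utf-8")) for t in terms]
    for dirpath, _dirnames, filenames in os.walk(scan_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            if ignore_case:
                data = data.lower()  # bytes.lower() folds ASCII letters only
            # All terms are checked against the same file contents in one pass.
            for term, needle in needles:
                if needle in data:
                    yield path, term


def write_csv(rows, out=sys.stdout):
    """Write (path, term) rows as CSV with a header line."""
    writer = csv.writer(out)
    writer.writerow(["file", "term"])
    writer.writerows(rows)
```

Calling `write_csv(search_tree("test/data", ["copyright", "gpl"], ignore_case=True))` would print one row per matching file and term.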

System requirements

A note about installing python and pip

Some Linux systems (and Cygwin on Windows) come with Python 2 pre-installed, so you have to install Python 3 yourself. There are a few gotchas:

  1. The "python" and "pip" packages may be reserved for version 2. It may not be as easy as "apt-get install python3" either: the latest package might be called "python3.7" or similar. The same applies to pip.
  2. Once installed, the Python 3 binary may be called "python3", not "python". The same applies to pip.
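
One way to find out which interpreter a given command actually starts is to run a short script with it. The sketch below is only an illustration; the helper names `is_python3` and `pip_install_command` are made up for this example. Building the pip command from `sys.executable` avoids guessing whether pip is called "pip" or "pip3".

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given interpreter version is Python 3 or later."""
    return version_info[0] >= 3


def pip_install_command(package, executable=sys.executable):
    """Build a pip command that targets the interpreter running this script."""
    return [executable, "-m", "pip", "install", package, "--user"]


if __name__ == "__main__":
    # Run this with each candidate binary, e.g. "python" and "python3".
    print(sys.executable, sys.version.split()[0], "Python 3:", is_python3())
```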

Installation from PyPI

$ python -m pip install string_path_search --user 

Installation from GitHub

$ git clone git@github.com:j-lawrence-b1/string-path-search.git
$ cd string-path-search
$ python setup.py install --user

Usage

Although you can import and use this package in other Python scripts, string_path_search is primarily intended to be invoked as a console app:

    $ python -m string_path_search [OPTIONS] <scan-root> [<search-term> [...]]
    where:
        -a, --scan-archives = Unpack and scan within archives
            (Default: Skip archive files). Only jar, tar, and zip archives are
                unpacked. For tar, bzip2, gzip, and xz compression are supported.
        -B, --branding-text=<branding-text> = A string of text containing
            company or other information to add above the column headers in
            scan reports (Default: no text).
        -b, --branding-logo=<branding-logo> = (MS Excel only) An image
            file containing a corporate logo or other graphic to add above the
            column headers in scan reports (Default: no logo).
        -h, --help = Print usage information and exit.
        -e, --excel-output = Generate Microsoft Excel 2007 (.xlsx) output
            (Default: Generate comma-separated-value (CSV) text output)
        -i, --ingore-case = Ignore UPPER/lowercase differences when matching strings
            (Default: case differences are significant).
        -o, --output-dir=<output-dir> = Location for output (Default:
            <current working directory>).
        -s, --search-strings-file=<search-strings> = A file containing strings to
            search for, one per line (Default: Get search strings from the command line).
        -q, --quiet = Decrease logging verbosity (may repeat). -qqqq will suppress all logging.
        -t, --temp-dir=<temp-dir> = Location for unpacking archives
            (Default: <output-dir>/temp).
        -v, --verbose = Increase logging verbosity.
        -x, --exclusions-file=<exclusion-file> = A file containing (base) filenames to
            exclude from the search results, one per line (Default: Include all results).
    <scan-root> = Directory to scan.
    <search-term> ... = One or more terms to search for in <scan-root>.
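
To illustrate what the -a option involves, here is a minimal sketch of scanning archive members with the standard `zipfile` and `tarfile` modules. It is an assumption-laden illustration, not the package's code: the names `iter_archive_members` and `search_archive` are hypothetical, and members are read into memory here, whereas the tool itself unpacks archives into <temp-dir>.

```python
import tarfile
import zipfile


def iter_archive_members(path):
    """Yield (member_name, data_bytes) for each regular file in a jar, zip, or tar archive."""
    if tarfile.is_tarfile(path):
        # Mode "r:*" lets tarfile detect gzip, bzip2, or xz compression itself.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile():
                    yield member.name, archive.extractfile(member).read()
    elif zipfile.is_zipfile(path):  # jar files use the zip format
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    yield info.filename, archive.read(info)


def search_archive(path, terms, ignore_case=False):
    """Return sorted (member_name, term) pairs for terms found in archive members."""
    hits = set()
    for name, data in iter_archive_members(path):
        if ignore_case:
            data = data.lower()  # folds ASCII letters only
        for term in terms:
            needle = (term.lower() if ignore_case else term).encode("utf-8")
            if needle in data:
                hits.add((name, term))
    return sorted(hits)
```

A path that is neither a tar nor a zip file yields no members, which mirrors the documented behaviour of only unpacking jar, tar, and zip archives.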

Examples

Perform a case-insensitive search of the test/data directory for any occurrence of 'copyright', 'gpl', 'foo', 'bar', or 'baz' and write the results to a file called 'scan-<timestamp>.csv' in the current working directory.

$ python -m string_path_search -i test/data copyright gpl foo bar baz

Same as the first example, but write the results to an Excel spreadsheet:

$ python -m string_path_search -i -e test/data copyright gpl foo bar baz
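
The same commands can be launched from another Python script without relying on the package's import API, which is not documented here. The sketch below simply assembles the command lines shown above and runs them with `subprocess`. The helper names `build_command` and `run_scan` are hypothetical, and the meaning of the exit code is not specified in this document.

```python
import subprocess
import sys


def build_command(scan_root, terms, ignore_case=False, excel=False):
    """Assemble the documented command line as an argument list."""
    # sys.executable ensures the module runs under the current interpreter.
    cmd = [sys.executable, "-m", "string_path_search"]
    if ignore_case:
        cmd.append("-i")
    if excel:
        cmd.append("-e")
    cmd.append(scan_root)
    cmd.extend(terms)
    return cmd


def run_scan(scan_root, terms, **options):
    """Run the scan as a child process and return its exit code."""
    return subprocess.run(build_command(scan_root, terms, **options)).returncode
```

For instance, `run_scan("test/data", ["copyright", "gpl"], ignore_case=True, excel=True)` corresponds to the second example with a shorter term list.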

License

string_path_search is distributed under the MIT License.

Disclaimer regarding the test data:

The files in the test/data folder were randomly downloaded from publicly available Open Source projects. Distributing these materials with string_path_search as test data may or may not be in violation of the applicable licenses.
