Search a directory path for any of a list of strings.
Walk a directory tree, searching for files containing any of a set of text strings.
Note: The different naming conventions for projects on GitHub vs packages in Python results in some unnecessary confusion: "string-path-search" (with hyphens) is the name of the project on GitHub. This project provides the "string_path_search" (with underscores) Python package. Please bear with me.
Why not just use find and grep?
- Avoids long, hard-to-debug shell commands with lots of backticks and parentheses.
- Works on Windows without needing to install a unix work-alike like Cygwin.
- Searches for a bunch of different strings in one go.
- Searches within (possibly compressed) jar, tar, or zip archives.
- Outputs results in CSV or Excel format.
- Tested on Windows 10, Linux, and Windows 10/cygwin. May also work on other platforms supported by Python.
- Python 3.4 or later (https://www.python.org/downloads/).
- A Python pip module appropriate to the installed Python version (https://pip.pypa.io/en/stable/installing/). It is possible to install Python packages, including string_path_search, without pip, but it's a lot harder without it.
A note about installing python and pip
Some Linux systems (also Cygwin on Windows) come with python 2 pre-installed. You have to install python 3 yourself. There are gotchas involved:
- The "python" and "pip" packages may be reserved for version 2. It may not be as easy as "apt-get install python3" either. The latest package might be called "python3.7" or similar. Ditto with pip.
- Once installed, the python 3 binary may be called "python3", not "python". Ditto with pip.
Installation from pypi:
> python -m pip install --user string-path-search
Installation from GitHub
You can also download string-path-search with your browser as a .zip or .tgz archive from https://github.com/j-lawrence-b1/string-path-search/releases/latest into any convenient directory. Once unpacked, you can install string_path_search and its dependencies using the included setup.py script (which uses pip internally!).
> chdir <my-downloads-dir>\\string-path-search-0.3.2 > python setup.py build install --user
Note: Installing with the '--user' option will install the string_path_search Python package under your login's HOME directory (C:/Users//.local/Scripts on Windows or /home//.local/bin on Linux). If you plan to run the provided string_path_search.exe directly, you should add this directory to you your shell's execution path.
Although you can import and use this package in other Python scripts, string_path_search is primarily intended to be invoked as a console app:
$ python -m string_path_search [OPTIONS] <scan-root> [<search-term> [...]]
or, you can add run the standalone string_path_search.exe directly (see Note, above).
$ ~/.local/bin/string_path_search [OPTIONS] <scan-root> [<search-term> [...]]
-a, --scan-archives = Unpack and scan within archives (Default: Skip arhive files. Only jar, tar, and zip archives will be unpacked. Tar bzip2, gzip, and xz compression is supported. -B, --branding-text=<branding-text> = A string of text containing company or other information to add above the column headers in scan reports (Default: no text). -b, --branding-logo=<branding-logo> = (MS Excel only) An image file containing a corporate logo or other graphic to add above the column headers in scan reports (Default: no logo). -h, --help = Print usage information and exit. -e, --excel-output = Generate Microsoft Excel 2007 (.xlsx) output (Default: Generate comma-separated-value (CSV) text output) -i --ingore-case = Ignore UPPER/lowercase differences when matching strings (Default: case differences are significant). -o, --output-dir=<output-dir> = Location for output (Default: <current working directory>). -s, --search-strings-file=<search-strings> = A file containing strings to search for, one per line (Default: Get search strings from the command line). -q, --quiet = Decrease logging verbosity (may repeat). -qqqq will suppress all logging. -t, --temp-dir=<temp-dir> = Location for unpacking archives (Default: <output_dir>/temp). -v, --verbose = Increase logging verbosity. -x, --exclusions-file=<exclusion-file> = A file containing (base) filenames to exclude from the search results, one per line (Default: Include all results). <scan-root> = Directory to scan. <search-term> ... = One or more terms to search for in <scan-root>.
Perform a caseless search of the test/data directory for any occurrence of 'copyright', 'gpl', 'foo', 'bar', or 'baz' and output the results to a file called 'scan-<timestamp>.csv' in the current working directory.
$ python -m string_path_search -i tests/data "copyright (c)" gpl foo bar baz
Same as example 1, except output to an Excel spreadsheet:
$ python -m string_path_search -i -e tests/data "copyright (c)" gpl foo bar baz
Gotcha: Use double-quotes for multi-word search strings. For some reason, single quotes screw up the command line parser.
string_path_search is distributed under the MIT License.
Disclaimer regarding the test data:
The files in the test/data folder were randomly downloaded from publicly available Open Source projects. Distributing these materials with string_path_search as test data may or may not be in violation of the applicable licenses.
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Hashes for string_path_search-0.3.3-py3.7.egg
Hashes for string_path_search-0.3.3-py3-none-any.whl