Search a directory path for any of a list of strings.
string-path-search
Walk a directory tree, searching for files containing any of a set of text strings.
Note: The different naming conventions for projects on GitHub and packages in Python cause some unnecessary confusion: "string-path-search" (with hyphens) is the name of the project on GitHub. This project provides the "string_path_search" (with underscores) Python package. Please bear with me.
Why not just use find and grep?
- Avoids long, hard-to-debug shell commands with lots of backticks and parentheses.
- Works on Windows without needing to install a Unix work-alike such as Cygwin.
- Searches for a bunch of different strings in one go.
- Searches within (possibly compressed) jar, tar, or zip archives.
- Outputs results in CSV or Excel format.
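To make the idea concrete, here is a minimal sketch of walking a directory tree and checking each file for any of several strings in a single pass. It is not the package's actual implementation: the function name `find_matches` and its parameters are hypothetical, and it reads each file fully into memory, which a real tool would probably avoid for large files.

```python
import os


def find_matches(scan_root, search_terms, ignore_case=False):
    """Yield (path, term) for every term found in a file under scan_root.

    Hypothetical helper for illustration; not part of string_path_search.
    """
    # Compare bytes so that binary files can be scanned without decoding.
    needles = [term.encode("utf-8") for term in search_terms]
    if ignore_case:
        # bytes.lower() only folds ASCII letters, which is enough for a sketch.
        needles = [needle.lower() for needle in needles]

    for dirpath, _dirnames, filenames in os.walk(scan_root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # Skip files that cannot be read.
            if ignore_case:
                data = data.lower()
            for term, needle in zip(search_terms, needles):
                if needle in data:
                    yield path, term
```

Because every term is checked against the same file contents, each file is read only once no matter how many strings are searched for.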
System requirements
- Tested on Windows 10, Linux, and Windows 10/Cygwin. May also work on other platforms supported by Python.
- Python 3.4 or later (https://www.python.org/downloads/).
- A Python pip module appropriate to the installed Python version (https://pip.pypa.io/en/stable/installing/). It is possible to install Python packages, including string_path_search, without pip, but doing so is a lot harder.
A note about installing Python and pip
Some Linux systems (also Cygwin on Windows) come with Python 2 pre-installed. You have to install Python 3 yourself. There are gotchas involved:
- The "python" and "pip" packages may be reserved for version 2. It may not be as easy as "apt-get install python3" either. The latest package might be called "python3.7" or similar. Ditto with pip.
- Once installed, the Python 3 binary may be called "python3", not "python". Ditto with pip.
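One quick way to confirm which interpreter a given command actually launches is to run a short check with it. The following is a sketch that assumes the Python 3.4 minimum stated above; `meets_minimum` is a hypothetical helper name, not something the package provides.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 4)):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Show which binary is running, then whether it is new enough.
    print(sys.executable)
    print("OK" if meets_minimum() else "Python 3.4 or later is required")
```

Save it to a file and run it with each candidate command ("python", "python3", and so on) to see which one points at a suitable Python 3.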
Installation from PyPI
> python -m pip install --user string-path-search
Installation from GitHub
You can also download string-path-search with your browser as a .zip or .tgz archive from https://github.com/j-lawrence-b1/string-path-search/releases/latest into any convenient directory. Once unpacked, you can install string_path_search and its dependencies using the included setup.py script (which uses pip internally!).
> chdir <my-downloads-dir>\string-path-search-0.3.2
> python setup.py build install --user
Note: Installing with the '--user' option will install the string_path_search Python package under your login's HOME directory (C:/Users//.local/Scripts on Windows or /home//.local/bin on Linux). If you plan to run the provided string_path_search.exe directly, you should add this directory to your shell's execution path.
Usage
Although you can import and use this package in other Python scripts, string_path_search is primarily intended to be invoked as a console app:
$ python -m string_path_search [OPTIONS] <scan-root> [<search-term> [...]]
Or you can run the standalone string_path_search.exe directly (see the Note above):
$ ~/.local/bin/string_path_search [OPTIONS] <scan-root> [<search-term> [...]]
where:
-a, --scan-archives = Unpack and scan within archives (Default: Skip archive files). Only jar, tar, and zip archives will be unpacked. Tar bzip2, gzip, and xz compression is supported.
-B, --branding-text=<branding-text> = A string of text containing company or other information to add above the column headers in scan reports (Default: no text).
-b, --branding-logo=<branding-logo> = (MS Excel only) An image file containing a corporate logo or other graphic to add above the column headers in scan reports (Default: no logo).
-h, --help = Print usage information and exit.
-e, --excel-output = Generate Microsoft Excel 2007 (.xlsx) output (Default: Generate comma-separated-value (CSV) text output).
-i, --ignore-case = Ignore UPPER/lowercase differences when matching strings (Default: case differences are significant).
-o, --output-dir=<output-dir> = Location for output (Default: <current working directory>).
-s, --search-strings-file=<search-strings> = A file containing strings to search for, one per line (Default: Get search strings from the command line).
-q, --quiet = Decrease logging verbosity (may repeat). -qqqq will suppress all logging.
-t, --temp-dir=<temp-dir> = Location for unpacking archives (Default: <output-dir>/temp).
-v, --verbose = Increase logging verbosity.
-x, --exclusions-file=<exclusion-file> = A file containing (base) filenames to exclude from the search results, one per line (Default: Include all results).
<scan-root> = Directory to scan.
<search-term> ... = One or more terms to search for in <scan-root>.
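To illustrate what scanning inside archives involves, here is a rough sketch that searches the members of a jar, zip, or (possibly compressed) tar file using Python's standard zipfile and tarfile modules. It is a simplification and not the package's own code: the tool unpacks archives into a temporary directory, whereas this sketch reads each member in memory, and the `search_archive` name is hypothetical.

```python
import tarfile
import zipfile


def search_archive(path, search_terms):
    """Return sorted (member_name, term) pairs for terms found in members.

    Hypothetical helper; reads members in memory instead of unpacking to disk.
    """
    needles = [(term, term.encode("utf-8")) for term in search_terms]
    hits = set()

    def check(name, data):
        for term, needle in needles:
            if needle in data:
                hits.add((name, term))

    if zipfile.is_zipfile(path):  # jar files are zip archives
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if not info.filename.endswith("/"):  # skip directory entries
                    check(info.filename, archive.read(info))
    elif tarfile.is_tarfile(path):
        # Mode "r:*" lets tarfile detect gzip, bzip2, or xz compression.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile():
                    with archive.extractfile(member) as handle:
                        check(member.name, handle.read())
    return sorted(hits)
```

Nested archives (a zip inside a tar, say) are not handled here; unpacking to a temporary directory, as the tool does, makes that case easier to deal with.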
Examples
Perform a case-insensitive search of the tests/data directory for any occurrence of 'copyright (c)', 'gpl', 'foo', 'bar', or 'baz' and output the results to a file called 'scan-<timestamp>.csv' in the current working directory.
$ python -m string_path_search -i tests/data "copyright (c)" gpl foo bar baz
Same as example 1, except output to an Excel spreadsheet:
$ python -m string_path_search -i -e tests/data "copyright (c)" gpl foo bar baz
Gotcha: Use double quotes for multi-word search strings. Single quotes can confuse command-line parsing; the Windows command prompt, for example, does not treat them as quoting characters.
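When launching the scanner from another Python script, passing the arguments as a list sidesteps shell quoting entirely, so multi-word terms need no quotes at all. The sketch below uses only the -i and -e options documented above; `build_command` and `run_scan` are hypothetical helpers, and actually running the command assumes string_path_search is installed for the current interpreter.

```python
import subprocess
import sys


def build_command(scan_root, search_terms, ignore_case=False, excel=False):
    """Build an argument list for `python -m string_path_search`."""
    command = [sys.executable, "-m", "string_path_search"]
    if ignore_case:
        command.append("-i")  # case-insensitive matching
    if excel:
        command.append("-e")  # Excel (.xlsx) output instead of CSV
    command.append(scan_root)
    command.extend(search_terms)
    return command


def run_scan(scan_root, search_terms, **options):
    """Run the scan; raises CalledProcessError on a non-zero exit status."""
    # Each list element reaches the program as one argument, spaces included.
    subprocess.check_call(build_command(scan_root, search_terms, **options))
```

For instance, `run_scan("tests/data", ["copyright (c)", "gpl"], ignore_case=True)` would correspond to the first example above, limited to two search terms.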
License
string_path_search is distributed under the MIT License.
Disclaimer regarding the test data:
The files in the tests/data folder were downloaded at random from publicly available Open Source projects. Distributing these materials with string_path_search as test data may or may not be in violation of the applicable licenses.
Download files
Hashes for string_path_search-0.3.3-py3.7.egg
Algorithm | Hash digest
---|---
SHA256 | fd1da3740b274b835aaa17420c012eedacf796a4b84fbc2b2463da84a8815d61
MD5 | 8eb0cbb8790a6a4900aa20fd625f9afb
BLAKE2b-256 | 43aa68097202586bb2db89d3187f08f64d16462d1f95d835775ed51904f8c088
Hashes for string_path_search-0.3.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | fb14be76bbce94cbcc3267a18222a219c49dafaef9bf2448876927e1e6fc2cdc
MD5 | 3b6dd42ca6a9799ade5a6d6cf625e7aa
BLAKE2b-256 | 15b65066696818b1c97c70e34a7146c51aabdb862f544d1fdb094f1f10d49a13