Search a directory path for any of a list of strings.
Project description
string_path_search
Walk a directory tree, searching for files containing any of a set of text strings.
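The basic idea can be sketched with the standard library's `os.walk`. This is a minimal illustration under stated assumptions, not string_path_search's actual implementation. The function name `find_matches` is hypothetical. Files are read as raw bytes and the search terms are encoded as UTF-8, so files in other encodings (for example UTF-16) would not match.

```python
import os


def find_matches(scan_root, terms):
    """Walk scan_root and return {path: [terms found]} for every file containing a term.

    Hypothetical helper for illustration; not part of string_path_search's API.
    """
    # Encode each term once so files can be searched as raw bytes. This avoids
    # decoding errors on binary or non-UTF-8 files.
    encoded = [(term, term.encode("utf-8")) for term in terms]
    results = {}
    for dirpath, _dirnames, filenames in os.walk(scan_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                # Skip files that cannot be read (permissions, broken links, ...).
                continue
            found = [term for term, raw in encoded if raw in data]
            if found:
                results[path] = found
    return results
```

For example, `find_matches("test/data", ["copyright", "gpl"])` would map each matching file path to the terms it contains. Reading whole files into memory keeps the sketch short; a tool meant for very large files might search in chunks instead.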
Why not just use find and grep?
- Avoids long, hard-to-debug shell commands with lots of backticks and parentheses.
- Works on Windows without needing to install a unix work-alike like Cygwin.
- Searches for a bunch of different strings in one go.
- Searches within (possibly compressed) jar, tar, or zip archives.
- Outputs results in CSV or Excel format.
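Searching inside archives can be sketched with the standard `zipfile` and `tarfile` modules. This is an illustrative assumption, not how string_path_search necessarily does it: the sketch reads members in memory rather than unpacking them to a temp directory (as the `-t` option suggests the real tool does), and it does not recurse into nested archives. The name `search_archive` is hypothetical. Jar files use the zip format, and `tarfile.open(..., "r:*")` detects gzip, bzip2, and xz compression automatically.

```python
import tarfile
import zipfile


def search_archive(path, terms):
    """Return {member_name: [terms found]} for the members of one archive.

    Hypothetical helper for illustration. Jar files use the zip format, so
    zipfile handles both; tarfile's "r:*" mode transparently decompresses
    gzip, bzip2, and xz tarballs.
    """
    encoded = [(term, term.encode("utf-8")) for term in terms]
    results = {}

    def record(name, data):
        found = [term for term, raw in encoded if raw in data]
        if found:
            results[name] = found

    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if not info.filename.endswith("/"):  # skip directory entries
                    record(info.filename, archive.read(info))
    elif tarfile.is_tarfile(path):
        with tarfile.open(path, "r:*") as archive:
            for member in archive.getmembers():
                if member.isfile():  # skip directories, links, devices
                    record(member.name, archive.extractfile(member).read())
    return results
```

Checking for zip first matters slightly: a jar named `*.jar` is still detected by content, not by extension.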
System requirements
- Tested on Windows 10 and Linux. May work on other platforms supported by Python.
- Python 3.4 or later (https://www.python.org/downloads/).
- The pip module for the installed Python version (https://pip.pypa.io/en/stable/installing/).
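A quick way to check both requirements from the interpreter you plan to use is sketched below. The helper `check_requirements` is hypothetical and not part of string_path_search; it only uses `sys.version_info` and `importlib.util.find_spec`.

```python
import importlib.util
import sys


def check_requirements(minimum=(3, 4)):
    """Return a list of problems with the running interpreter (empty if none).

    Illustrative helper, not part of string_path_search.
    """
    problems = []
    if sys.version_info < minimum:
        problems.append("Python %d.%d or later is required, found %s"
                        % (minimum[0], minimum[1], sys.version.split()[0]))
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not installed for this interpreter")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```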
A note about installing Python and pip
Some Linux systems (and Cygwin on Windows) come with Python 2 pre-installed, so you have to install Python 3 yourself. There are some gotchas:
- The "python" and "pip" packages may be reserved for Python 2. Installing Python 3 may not be as easy as "apt-get install python3" either: the latest package might be called "python3.7" or similar. The same goes for pip.
- Once installed, the Python 3 binary may be called "python3", not "python". The same goes for pip.
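Because the Python 3 binary may be named either "python3" or "python", a script can probe both. The sketch below is a hypothetical helper (`find_python3`), not something string_path_search provides; it looks each candidate up with `shutil.which` and asks it for its major version.

```python
import shutil
import subprocess


def find_python3(candidates=("python3", "python")):
    """Return the path of the first candidate on PATH that is Python 3, else None.

    Illustrative helper; the candidate names are the ones mentioned above.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            # "print(sys.version_info[0])" works under both Python 2 and 3.
            major = subprocess.check_output(
                [path, "-c", "import sys; print(sys.version_info[0])"],
                universal_newlines=True,
            ).strip()
        except (OSError, subprocess.CalledProcessError):
            continue
        if major == "3":
            return path
    return None
```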
Installation from PyPI
$ python -m pip install string_path_search --user
Installation from GitHub
$ git clone git@github.com:j-lawrence-b1/string-path-search.git
$ cd string-path-search
$ python setup.py install --user
Usage
Although you can import and use this package in other Python scripts, string_path_search is primarily intended to be invoked as a console app:
$ python -m string_path_search [OPTIONS] <scan-root> [<search-term> [...]]
where:
  -a, --scan-archives = Unpack and scan within archives (Default: skip archive files). Only jar, tar, and zip archives will be unpacked. Tar bzip2, gzip, and xz compression is supported.
  -B, --branding-text=<branding-text> = A string of text containing company or other information to add above the column headers in scan reports (Default: no text).
  -b, --branding-logo=<branding-logo> = (MS Excel only) An image file containing a corporate logo or other graphic to add above the column headers in scan reports (Default: no logo).
  -h, --help = Print usage information and exit.
  -e, --excel-output = Generate Microsoft Excel 2007 (.xlsx) output (Default: generate comma-separated-value (CSV) text output).
  -i, --ingore-case = Ignore UPPER/lowercase differences when matching strings (Default: case differences are significant).
  -o, --output-dir=<output-dir> = Location for output (Default: <current working directory>).
  -s, --search-strings-file=<search-strings> = A file containing strings to search for, one per line (Default: get search strings from the command line).
  -q, --quiet = Decrease logging verbosity (may repeat). -qqqq will suppress all logging.
  -t, --temp-dir=<temp-dir> = Location for unpacking archives (Default: <output_dir>/temp).
  -v, --verbose = Increase logging verbosity.
  -x, --exclusions-file=<exclusion-file> = A file containing (base) filenames to exclude from the search results, one per line (Default: include all results).
  <scan-root> = Directory to scan.
  <search-term> ... = One or more terms to search for in <scan-root>.
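The `-s`, `-x`, and `-i` options above can be illustrated with three small helpers. These are hypothetical sketches of the described behavior, not the package's code: the assumption that blank lines are ignored, and the use of `str.casefold` for case-insensitive matching, are choices made here for illustration.

```python
import os


def load_lines(path):
    """Read a -s/-x style file: one entry per line, blank lines ignored (an assumption)."""
    entries = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            entry = line.rstrip("\n")
            if entry:
                entries.append(entry)
    return entries


def matching_terms(text, terms, ignore_case=False):
    """Return the terms found in text; ignore_case mimics the -i option's description."""
    if ignore_case:
        folded = text.casefold()
        return [term for term in terms if term.casefold() in folded]
    return [term for term in terms if term in text]


def is_excluded(path, excluded_names):
    """-x lists (base) filenames, so only the final path component is compared."""
    return os.path.basename(path) in excluded_names
```

Comparing only the base name means an exclusion like `LICENSE` skips every file with that name, anywhere under `<scan-root>`, but not files inside a directory called `LICENSE`.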
Examples
Example 1: perform a case-insensitive search of the test/data directory for any occurrence of 'copyright', 'gpl', 'foo', 'bar', or 'baz', and write the results to a file called 'scan-<timestamp>.csv' in the current working directory:
$ python -m string_path_search -i test/data copyright gpl foo bar baz
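A CSV report like the one in the first example can be sketched with the standard `csv` module. The timestamp format and the two-column layout below are assumptions for illustration; the real report's columns and naming may differ. `write_csv_report` is a hypothetical name.

```python
import csv
import os
import time


def write_csv_report(results, output_dir="."):
    """Write {path: [terms]} results to scan-<timestamp>.csv and return the file path.

    The timestamp format and the column layout are assumptions for illustration;
    the real tool's report may differ.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    report_path = os.path.join(output_dir, "scan-%s.csv" % stamp)
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(report_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "search_term"])
        for path in sorted(results):
            for term in results[path]:
                writer.writerow([path, term])
    return report_path
```

Writing one row per (file, term) pair keeps the output easy to filter or pivot in a spreadsheet.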
Example 2: same as example 1, but write the results to an Excel spreadsheet:
$ python -m string_path_search -i -e test/data copyright gpl foo bar baz
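To drive the console app from another Python script without relying on any internal API, one option is to build the command line and run it with `subprocess`. The helper `build_command` is hypothetical and uses only options from the usage text above; `sys.executable` ensures the same interpreter that runs your script also runs the scan.

```python
import sys


def build_command(scan_root, terms, ignore_case=False, excel=False, output_dir=None):
    """Assemble a string_path_search command line using only options from the usage text.

    Hypothetical helper for illustration; options precede positional arguments,
    matching "[OPTIONS] <scan-root> [<search-term> [...]]".
    """
    command = [sys.executable, "-m", "string_path_search"]
    if ignore_case:
        command.append("-i")
    if excel:
        command.append("-e")
    if output_dir is not None:
        command.append("--output-dir=" + output_dir)
    command.append(scan_root)
    command.extend(terms)
    return command
```

For instance, `subprocess.check_call(build_command("test/data", ["copyright", "gpl", "foo", "bar", "baz"], ignore_case=True, excel=True))` should reproduce the second example, assuming the package is installed for that interpreter.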
License
string_path_search is distributed under the MIT License.
Disclaimer regarding the test data:
The files in the test/data folder were randomly downloaded from publicly available Open Source projects. Distributing these materials with string_path_search as test data may or may not be in violation of the applicable licenses.
Download files
Built Distributions
Hashes for string_path_search-0.3.1-py3.7.egg
Algorithm | Hash digest
---|---
SHA256 | 487f38ac4880a2e0bf63e24ed1dd894a23f085c6548aca1a4b0f099e608a272c
MD5 | 2a9adbaf25289b62e898a7c258c04185
BLAKE2b-256 | 30c41a5cd7d58c2d0003be93bc2d66ae7f23c8020f38cdbea6fadd95c10004ec
Hashes for string_path_search-0.3.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b42374e003f1051cdbefd7fdeaf3a137738a6389aa55b0d90269707ff5614bb1
MD5 | 3b40c84e58161735523e8744e7937a6c
BLAKE2b-256 | dfacf6cd320dd8242825e98086c9fff996df50a70809ad22ed8f44adcec9bf7c
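A downloaded file can be checked against the digests listed for it using the standard `hashlib` module. The helper `file_digests` is a hypothetical sketch; BLAKE2b-256 here means BLAKE2b with a 32-byte digest (`hashlib.blake2b(digest_size=32)`, available in Python 3.6 and later).

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return SHA256, MD5, and BLAKE2b-256 hex digests of a file, read in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `file_digests("string_path_search-0.3.1-py3-none-any.whl")["SHA256"]` should equal the SHA256 value listed for that wheel; a mismatch means the download is corrupt or not the published file.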