Grep for text in images
IMGrep
Want to find that one meme with the funny punchline? Looking for a picture of a PowerPoint presentation on a specific topic that you took 5 years ago? IMGrep might help.
It works like grep, but for images, and with a lot fewer features ... and it's also much slower ... and not suuuper accurate, especially for handwriting or weird fonts ;D
imgrep is built on top of Tesseract-OCR and uses the pytesseract bindings to interface with it.
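To make the idea concrete, here is a minimal sketch of "OCR an image, then grep the recognized text". It is not imgrep's actual source; the function names `ocr_image`, `matching_lines` and `grep_image` are made up for illustration, and it assumes Pillow, pytesseract and the Tesseract binary are installed.

```python
import re


def ocr_image(path):
    """Return the text Tesseract recognizes in the image at `path`."""
    # Imported lazily so matching_lines() can be used without OCR installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(image)


def matching_lines(text, pattern, ignore_case=False):
    """Return the lines of `text` that contain a match for `pattern`."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in text.splitlines() if regex.search(line)]


def grep_image(path, pattern, ignore_case=False):
    """OCR one image, then grep the recognized text line by line."""
    return matching_lines(ocr_image(path), pattern, ignore_case)
```

Calling something like `grep_image("meme.png", r"mordor", ignore_case=True)` would then return the recognized lines that match, assuming Tesseract read them correctly.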
Install
You can install imgrep from PyPI with pip:

    pip install imgrep
Usage
Get the usage with imgrep -h.
    usage: imgrep [-h] [-i] [-r] [-f] [-0] pattern file

    Grep for text in images.

    positional arguments:
      pattern               A Python regex, to search for.
      file                  Path of the image(s) to search through. (Or folder(s), if `--recursive' is specified).

    options:
      -h, --help            show this help message and exit
      -i, --ignore-case     Ignore case distinctions in patterns and input data.
      -r, --recursive       Grep through every file under a given directory.
      -f, --filenames-only  Only print the file names, not the contents. Makes no sense without `--recursive', and will be ignored if `--recursive' is not specified.
      -0, --null            Print the output seperated by null characters, this is useful for badly named files. Makes no sense without `--filenames-only', but will be done regardless, if specified!
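If you want to consume the `-0` output from a script, a sketch like the following might help. It only relies on the flags shown in the help text above; the helper names `split_null_separated` and `find_images` are hypothetical, and since imgrep's exit codes are not documented here, the return code is deliberately not checked.

```python
import os
import subprocess


def split_null_separated(data):
    """Split NUL-separated bytes into a list of file names."""
    # A trailing NUL would yield an empty last item, so empty items are dropped.
    return [os.fsdecode(item) for item in data.split(b"\0") if item]


def find_images(pattern, folder):
    """Run `imgrep -r -f -0 pattern folder` and return the printed file names."""
    result = subprocess.run(
        ["imgrep", "-r", "-f", "-0", pattern, folder],
        capture_output=True,
        check=False,  # assumption: a non-zero exit code is not treated as fatal
    )
    return split_null_separated(result.stdout)
```

Splitting on the null byte instead of on newlines is what makes this safe for file names that contain spaces or even line breaks.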
Be patient. It uses multiple cores, but this just takes a while. Searching for a specific string in my memes folder of roughly 2000 images took about 8 minutes and 30 seconds.
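For the curious, spreading the OCR work over several cores can be sketched roughly like this. This is an illustration of the general technique, not how imgrep is actually implemented; `search_images`, its `ocr` parameter and the `executor_cls` parameter are assumptions made for the example.

```python
import re
from concurrent.futures import ProcessPoolExecutor


def search_images(paths, pattern, ocr, executor_cls=ProcessPoolExecutor):
    """Return {path: matching lines} for every image whose text matches.

    `ocr` is any callable mapping a path to the recognized text. With the
    default process pool it has to be a picklable, module-level function.
    """
    regex = re.compile(pattern)
    paths = list(paths)
    hits = {}
    with executor_cls() as executor:
        # OCR is the slow part, so only that step is handed to the workers.
        texts = executor.map(ocr, paths)
        for path, text in zip(paths, texts):
            lines = [line for line in text.splitlines() if regex.search(line)]
            if lines:
                hits[path] = lines
    return hits
```

On platforms that start worker processes with "spawn" (Windows, macOS), the call has to sit under an `if __name__ == "__main__":` guard. Even with every core busy, the run time still scales with the number of images, which is why a big folder takes minutes.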
Performance
Is abysmal. You've been warned.
Neither accuracy nor execution time is that great, but it works for my use case. And it is still much faster than combing through my photos one by one when I'm looking for something specific.
TODO
- Having -a and -b flags to include N lines of output after and before the match would be nice.
- Also coloring the output on smart terminals would be cool.
  - That opens a whole can of worms with determining whether the terminal supports it or not.
  - Or whether the user wants color (NO_COLOR, or TERM=dumb etc.?).
- Fuzzy search
- Preprocessing/Indexing (?)
  - Pro:
    - Done once, it heavily improves performance for subsequent searches.
  - Con:
    - I don't want to put garbage files into the users' filesystem. This could be solved by keeping a single index file in XDG_CONFIG_HOME (or the equivalent on other OSes) that gets removed when imgrep is uninstalled.
  - Conclusion:
    - Probably worth it
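The color question from the list above could be answered with a check along these lines. This is only a sketch of one common approach, not something imgrep does today; the function name `want_color` is made up, and it assumes the widely used `NO_COLOR` environment variable convention plus the `TERM=dumb` check mentioned in the list.

```python
import os
import sys


def want_color(stream=None, environ=None):
    """Guess whether colored output is both supported and wanted."""
    stream = sys.stdout if stream is None else stream
    environ = os.environ if environ is None else environ

    # The user opted out: NO_COLOR is set to any non-empty value.
    if environ.get("NO_COLOR"):
        return False
    # The terminal declares that it cannot handle escape sequences.
    if environ.get("TERM") == "dumb":
        return False
    # Output is piped or redirected, so escape codes would end up in files.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())
```

A check like this only covers the common cases; a `--color` flag to override the guess would probably still be needed.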
Alternatives
I noticed that there is a similar project, even with the same name, imgrep. I am in no way affiliated with that project, but it looks cool, and it might actually suit you better, because it already allows for preprocessing and fuzzy search, both of which are not currently implemented in this project.
OTOH, for a quick and dirty one-off job, the convenience of python -m pip install imgrep; imgrep -rif pattern images is probably nice.