Grep for text in images

IMGrep

Want to find that one meme with the funny punchline? Looking for a picture you took 5 years ago of a PowerPoint presentation on a specific topic? IMGrep might help.

It works like grep, but for images, and with a lot fewer features ... and it's also much slower ... and not suuuper accurate, especially for handwriting or weird fonts ;D

imgrep is built on top of Tesseract-OCR and uses the pytesseract bindings to interface with it.
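
To make the idea concrete, here is a minimal sketch of OCR-then-regex matching. It is not imgrep's actual source: the function names are made up for illustration, and matching line by line is an assumption. The only library calls used are `PIL.Image.open` and `pytesseract.image_to_string`.

```python
import re


def ocr_image(path):
    """Return the text Tesseract recognizes in the image at `path`."""
    # Imported lazily so the matching logic below also works without OCR installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def grep_text(text, pattern, ignore_case=False):
    """Return the lines of `text` that match the Python regex `pattern`."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in text.splitlines() if regex.search(line)]


def grep_image(path, pattern, ignore_case=False, ocr=ocr_image):
    """OCR one image and return its matching lines (hypothetical helper)."""
    return grep_text(ocr(path), pattern, ignore_case)
```

Passing the OCR step in as the `ocr` parameter is only a convenience of this sketch; it lets the matching part be tried out without Tesseract installed.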

Install

You can install imgrep from PyPI with pip:

pip install imgrep

Usage

Get the usage with imgrep -h.

usage: imgrep [-h] [-i] [-r] [-f] [-0] pattern file

Grep for text in images.

positional arguments:
  pattern               A Python regex, to search for.
  file                  Path of the image(s) to search through. (Or folder(s), if `--recursive' is specified).

options:
  -h, --help            show this help message and exit
  -i, --ignore-case     Ignore case distinctions in patterns and input data.
  -r, --recursive       Grep through every file under a given directory.
  -f, --filenames-only  Only print the file names, not the contents. Makes no sense without `--recursive', and will be ignored if `--recursive' is not specified.
  -0, --null            Print the output seperated by null characters, this is useful for badly named files. Makes no sense without `--filenames-only', but will be done regardless, if specified!
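
If you want to consume the `--null` output from a script, something along these lines should work. This is a sketch under assumptions: the helper names are hypothetical, it is not documented whether the output ends with a trailing null character (so empty chunks are dropped), and imgrep's exit codes are not documented here either (so the return code is not checked).

```python
import subprocess


def split_null_separated(data):
    """Split NUL-separated bytes into a list of file names."""
    # Drop empty chunks so a trailing NUL does not yield a bogus entry.
    return [
        chunk.decode("utf-8", errors="surrogateescape")
        for chunk in data.split(b"\0")
        if chunk
    ]


def matching_files(pattern, folder):
    """Run imgrep with -r -f -0 and return the names of the matching files."""
    result = subprocess.run(
        ["imgrep", "-r", "-f", "-0", pattern, folder],
        stdout=subprocess.PIPE,
        check=False,  # assumption: exit-code semantics are unknown
    )
    return split_null_separated(result.stdout)
```

Splitting on null characters keeps file names intact even when they contain spaces or newlines.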

Be patient. It uses multiple cores, but this just takes a while. Searching for a specific string in my memes folder of roughly 2,000 images took about 8 minutes and 30 seconds.
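
For the curious, fanning OCR work out over several workers can be sketched as below. This is an illustration, not imgrep's implementation, which may well use processes instead. A thread pool is used here on the assumption that the `ocr` callable spends its time in an external Tesseract process, as pytesseract does; `search_images` and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
import re


def search_images(paths, pattern, ocr, workers=None):
    """Return the paths whose OCR text matches `pattern`, in input order."""
    regex = re.compile(pattern)
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs ocr() concurrently but yields results in input order.
        texts = pool.map(ocr, paths)
        return [path for path, text in zip(paths, texts) if regex.search(text)]
```

Even with every core busy, each image still needs a full OCR pass, which is why the total time stays in the range of minutes for a large folder.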

Performance

Is abysmal. You've been warned.

Neither accuracy nor execution time is that great, but it works for my use case. And it is still much faster than combing through my photos one by one when I'm looking for something specific.

TODO

  • Having -a and -b flags to include N lines of output after and before the match would be nice.
  • Also coloring the output on smart terminals would be cool.
    • That opens a whole can of worms with determining whether the terminal supports it or not.
    • Or whether the user wants color (NO_COLOR, or TERM=dumb etc.?).
  • Fuzzy search
  • Preprocessing/Indexing (?)
    • Pro:
      • Done once, it heavily improves performance for subsequent searches.
    • Con:
      • I don't want to put garbage files into the user's filesystem.
        This could be mitigated by keeping only one index file in XDG_CONFIG_HOME (or a similar location on other OSes) that gets removed when imgrep is uninstalled.
    • Conclusion:
      • Probably worth it
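
The single-index-file idea from the list above could look roughly like this. Nothing here exists in imgrep yet: the file name `imgrep/index.json`, the JSON layout, the `~/.config` fallback, and the use of modification times to detect changed images are all assumptions made for this sketch.

```python
import json
import os
from pathlib import Path


def index_file():
    """Location of the single index file (hypothetical name and layout)."""
    base = os.environ.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")
    return Path(base) / "imgrep" / "index.json"


def load_index(path):
    """Load the index, or start empty if it is missing or unreadable."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}


def save_index(path, index):
    """Write the index back, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(index), encoding="utf-8")


def cached_text(index, image, ocr):
    """Return OCR text for `image`, re-running `ocr` only if the file changed."""
    key = os.path.abspath(image)
    mtime = os.stat(image).st_mtime_ns
    entry = index.get(key)
    if entry is None or entry["mtime"] != mtime:
        entry = {"mtime": mtime, "text": ocr(image)}
        index[key] = entry
    return entry["text"]
```

With something like this, the first search pays the full OCR cost and later searches only run regexes over the stored text, at the price of one extra file on the user's system.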

Alternatives

I noticed that there is a similar project with the same name, imgrep. I am in no way affiliated with that project, but it looks cool, and it might actually suit you better, because it already supports preprocessing and fuzzy search, neither of which is currently implemented in this project.

OTOH, for a quick-and-dirty one-off job, the convenience of python -m pip install imgrep; imgrep -rif pattern images is probably nice.
