SiteSearcher is a command line tool that creates fulltext search indexes of your favourite websites on your machine, and allows you to search them locally.
sitesearcher indexer <mydomain> - Create a local search index for the given domain
sitesearcher search <mydomain> - Open a search prompt for the given domain
Indexing a large site can take a long time, but you can stop the indexer at any time and continue later from the point where you left off. To halt the indexer, press <CTRL>+C once and wait for it to exit gracefully. To resume, run the indexer command again with the --continue flag, i.e. sitesearcher indexer <mydomain> --continue.
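The stop-and-resume behaviour described above can be illustrated with a small, generic sketch. This is not SiteSearcher's actual implementation: the function `crawl`, the `fetch` callback, the `resume` argument and the JSON state file are all hypothetical names chosen for illustration. The idea is simply to catch the interrupt raised by <CTRL>+C, write the queue of pending URLs to disk, and reload it on the next run.

```python
import json
import os


def crawl(start_urls, state_path, fetch, resume=False):
    """Visit URLs breadth-first and save progress to state_path on exit.

    fetch(url) is a hypothetical callback that returns the links found on url.
    """
    if resume and os.path.exists(state_path):
        # Pick up the queue saved by a previous, interrupted run.
        with open(state_path) as fh:
            state = json.load(fh)
    else:
        state = {"pending": list(start_urls), "done": []}

    try:
        while state["pending"]:
            url = state["pending"][0]
            links = fetch(url)  # <CTRL>+C may raise KeyboardInterrupt here
            # Only mark the URL as done once it was fetched completely.
            state["pending"].pop(0)
            state["done"].append(url)
            seen = set(state["pending"]) | set(state["done"])
            for link in links:
                if link not in seen:
                    state["pending"].append(link)
                    seen.add(link)
    except KeyboardInterrupt:
        pass  # stop crawling; progress is written in the finally block
    finally:
        with open(state_path, "w") as fh:
            json.dump(state, fh)
    return state
```

Because a URL is removed from the pending queue only after it has been fetched, an interrupted page is simply retried on the next run with `resume=True`.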
SiteSearcher tries to be web server friendly while crawling. It obeys robots.txt, identifies itself with the "SiteSearcher" user agent and uses the Scrapy AutoThrottle extension to reduce the load on the server.
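For readers who know Scrapy, the crawling behaviour described above corresponds roughly to the following settings. The setting names are standard Scrapy settings; the dictionary name `POLITE_SETTINGS` and the delay values are assumptions for illustration, not necessarily what SiteSearcher ships with.

```python
# Sketch of Scrapy settings for a polite crawler (values are assumptions).
POLITE_SETTINGS = {
    # Respect the rules published in the site's robots.txt.
    "ROBOTSTXT_OBEY": True,
    # Identify the crawler to the web server.
    "USER_AGENT": "SiteSearcher",
    # Let the AutoThrottle extension adapt the request rate to server latency.
    "AUTOTHROTTLE_ENABLED": True,
    "AUTOTHROTTLE_START_DELAY": 5.0,  # initial download delay in seconds
    "AUTOTHROTTLE_MAX_DELAY": 60.0,   # upper bound for the delay in seconds
}
```

In a Scrapy project, a dictionary like this could be assigned to a spider's `custom_settings` attribute or passed to `CrawlerProcess(settings=...)`.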
If you have pip installed, you can use pip to download and install SiteSearcher.
pip install sitesearcher
Download source releases from PyPI at http://pypi.python.org/pypi/sitesearcher
You can check out the latest version of source code from GitHub.
git clone https://github.com/sbabrass/sitesearcher
SiteSearcher supports Python versions 2.7 and 3.3+.
However, switching between Python versions may require rebuilding your indexes, because SiteSearcher running on Python 2 currently cannot read or write indexes created by SiteSearcher running on Python 3, and vice versa.
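If you switch interpreters often, one way to notice a mismatch early is to keep a small marker file next to each index. The sketch below is a hypothetical helper, not part of SiteSearcher: the file name `python_major.txt` and both function names are assumptions, and `index_dir` stands for whatever directory holds the index.

```python
import os
import sys

MARKER = "python_major.txt"  # hypothetical marker file name


def record_python_version(index_dir):
    """Store the major Python version that built the index in index_dir."""
    if not os.path.isdir(index_dir):
        os.makedirs(index_dir)
    with open(os.path.join(index_dir, MARKER), "w") as fh:
        fh.write(str(sys.version_info[0]))


def needs_rebuild(index_dir):
    """Return True if the marker is missing or names another major version."""
    path = os.path.join(index_dir, MARKER)
    if not os.path.exists(path):
        return True
    with open(path) as fh:
        return fh.read().strip() != str(sys.version_info[0])
```

Calling `record_python_version` after indexing and `needs_rebuild` before searching would flag an index built under the other major Python version.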