A command line tool that creates fulltext search indexes of your favourite websites on your machine, and allows you to search them locally
About SiteSearcher
SiteSearcher is a command line tool that creates fulltext search indexes of your favourite websites on your machine, and allows you to search them locally.
Usage
sitesearcher indexer <mydomain>
- Create a local search index for <mydomain>
sitesearcher search <mydomain>
- Open search prompt for <mydomain>
Indexing large sites can take a long time, but you can stop the indexer at any time and continue later from where you left off. To halt the indexer, press <CTRL>+C once and wait for it to exit gracefully. To resume, run the indexer command again with the --continue flag, i.e. sitesearcher indexer <mydomain> --continue.
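The stop-and-continue behaviour can be illustrated with a small sketch. This is not SiteSearcher's actual implementation (Scrapy itself offers job persistence through its JOBDIR setting, and the source does not say which mechanism is used). All names here (StopFlag, load_state, save_state, crawl, the JSON state file) are hypothetical. The idea shown: a first CTRL+C only sets a flag, the crawl loop finishes the current page, and the remaining frontier is written to disk so a later run can pick it up.

```python
import json
import os
import signal


class StopFlag:
    """Hypothetical helper: becomes true once CTRL+C (SIGINT) is received."""

    def __init__(self):
        self.stopped = False

    def __call__(self):
        return self.stopped

    def install(self):
        # Replace the default handler so the first CTRL+C only sets the flag.
        signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum, frame):
        self.stopped = True


def load_state(path, start_url):
    """Return (pending, seen) from a saved state file, or a fresh state."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        return data["pending"], set(data["seen"])
    return [start_url], set()


def save_state(path, pending, seen):
    """Persist the crawl frontier so a later run can continue from it."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"pending": pending, "seen": sorted(seen)}, fh)


def crawl(path, start_url, fetch_links, should_stop, resume=False):
    """Visit pages until the frontier is empty or a stop was requested."""
    if resume:
        pending, seen = load_state(path, start_url)
    else:
        pending, seen = [start_url], set()
    # The stop flag is only checked between pages: that is the graceful exit.
    while pending and not should_stop():
        url = pending.pop(0)
        if url in seen:
            continue
        seen.add(url)
        pending.extend(link for link in fetch_links(url) if link not in seen)
    save_state(path, pending, seen)
    return seen
```

In a real command line tool one would create a StopFlag, call its install() method once at startup, and pass the flag as should_stop; passing resume=True would then play the role of the --continue flag.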
Web Server Friendly
SiteSearcher tries to be web server friendly while crawling. It obeys robots.txt, identifies itself with the "SiteSearcher" user agent, and uses the Scrapy AutoThrottle extension to reduce the load on the server.
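As a rough illustration of what this politeness amounts to, the sketch below lists the standard Scrapy settings that correspond to the three behaviours, and uses Python's standard-library robots.txt parser to show what obeying a rule means. The dictionary name POLITE_SETTINGS, the function is_allowed, and the example rules and URLs are assumptions made for illustration; SiteSearcher's real configuration may differ.

```python
from urllib.robotparser import RobotFileParser

# Standard Scrapy setting names; the values mirror the behaviour described
# above, but this dict itself is hypothetical, not taken from SiteSearcher.
POLITE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,        # skip URLs disallowed by robots.txt
    "USER_AGENT": "SiteSearcher",  # identify the crawler to the server
    "AUTOTHROTTLE_ENABLED": True,  # adapt the crawl delay to server latency
}


def is_allowed(robots_lines, url, user_agent=POLITE_SETTINGS["USER_AGENT"]):
    """Check a URL against robots.txt rules given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Example rules, invented for this sketch.
EXAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
]

print(is_allowed(EXAMPLE_ROBOTS, "http://example.com/index.html"))
print(is_allowed(EXAMPLE_ROBOTS, "http://example.com/private/page.html"))
```

With Scrapy, the robots.txt check and the throttling are handled by the framework once the settings are enabled; the is_allowed function only makes the rule visible without any network access.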
Installing SiteSearcher
If you have pip installed, you can use it to download and install SiteSearcher.
pip install sitesearcher
SiteSearcher uses the Scrapy bot framework and therefore inherits its dependencies.
Getting the source
Download source releases from PyPI at http://pypi.python.org/pypi/sitesearcher
You can check out the latest version of the source code from GitHub.
git clone https://github.com/sbabrass/sitesearcher
Python Version Support
SiteSearcher supports Python versions 2.7 and 3.3+.
However, switching between Python versions may require rebuilding your indexes, because SiteSearcher running on Python 2 currently cannot read or write indexes created by SiteSearcher running on Python 3, and vice versa.
History
0.1a1
Initial version of the SiteSearcher tool
Create Scrapy crawler to extract full text content of sites
Create Whoosh indexer to index stored sites
Create CLI for indexing and searching
0.1a2
Minor code cleanups
Download files
Source Distribution
File details
Details for the file sitesearcher-0.1a2.tar.gz.
File metadata
- Download URL: sitesearcher-0.1a2.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56d6aa106746281deb53e823469cb3451012b45af798ef5e24df726e6d00bbfe
MD5 | 5476516cf00f92e5da42ba8c6af02a8b
BLAKE2b-256 | 96c56ae1b2a6290449cffb9ffe8544df564aa20be5b108078200a7ba0fcde2c9