
About SiteSearcher

SiteSearcher is a command line tool that creates full-text search indexes of your favourite websites on your machine and lets you search them locally.
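Conceptually, a full-text search index is an inverted index: a map from each word to the pages that contain it. SiteSearcher builds its indexes with Whoosh, so the standard-library sketch below only illustrates the idea. The sample pages and the names `tokenize`, `build_index` and `search` are hypothetical and are not SiteSearcher's code.

```python
import re
from collections import defaultdict

# Hypothetical page store: URL -> extracted page text (stand-in for crawled content).
PAGES = {
    "https://example.com/": "Welcome to the example site about Python tooling",
    "https://example.com/search": "Full-text search lets you find pages by their words",
    "https://example.com/python": "Python makes building a search index straightforward",
}

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_index(pages):
    """Map each term to the set of URLs whose text contains it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return URLs containing every query term (implicit AND), sorted for stable output."""
    terms = tokenize(query)
    if not terms:
        return []
    # Copy the first posting set so the index itself is never mutated.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)
```

For instance, `search(build_index(PAGES), "python search")` returns `["https://example.com/python"]`, the only sample page containing both words. A real engine such as Whoosh adds ranking, stemming and on-disk storage on top of this basic structure.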

Usage

sitesearcher indexer <mydomain> - Create a local search index for <mydomain>

sitesearcher search <mydomain> - Open search prompt for <mydomain>

Indexing a large site can take quite a long time, but you can stop the indexer at any time and later continue from the point where you left off. To halt the indexer, press <CTRL>+C once and wait for it to exit gracefully. To resume, run the indexer command again with the --continue flag: sitesearcher indexer <mydomain> --continue.
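The stop-and-resume behaviour can be pictured as a crawl whose queue of pending URLs is saved when it is interrupted and reloaded on the next run. The stdlib-only sketch below works under that assumption; the JSON state file, the `LINKS` graph and the function names are hypothetical and not taken from SiteSearcher. (Scrapy itself offers a comparable mechanism through its JOBDIR setting; whether SiteSearcher relies on it is not stated here.)

```python
import json
import os
from collections import deque

# Hypothetical link graph standing in for real HTTP fetches: URL -> outgoing links.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/d"],
    "/c": [],
    "/d": [],
}


def load_state(path, start_url):
    """Resume from a saved state file if it exists, otherwise start a fresh crawl."""
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
        return deque(state["frontier"]), set(state["seen"]), state["done"]
    return deque([start_url]), {start_url}, []


def save_state(path, frontier, seen, done):
    """Persist pending URLs, URLs already queued, and pages already indexed."""
    with open(path, "w") as f:
        json.dump({"frontier": list(frontier), "seen": sorted(seen), "done": done}, f)


def crawl(path, start_url="/", max_pages=None):
    """Breadth-first crawl that can be interrupted and resumed from `path`.

    `max_pages` simulates the user pressing Ctrl+C once that many pages are done.
    Interrupts are only simulated between pages in this sketch.
    """
    frontier, seen, done = load_state(path, start_url)
    try:
        while frontier:
            if max_pages is not None and len(done) >= max_pages:
                raise KeyboardInterrupt  # simulated Ctrl+C
            url = frontier.popleft()
            done.append(url)  # "index" the page
            for link in LINKS.get(url, []):
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    except KeyboardInterrupt:
        pass  # stop quietly instead of exiting abruptly
    finally:
        save_state(path, frontier, seen, done)  # always keep progress on disk
    return done
```

Calling `crawl(path, max_pages=2)` stops after two pages; a later `crawl(path)` picks up the saved frontier and finishes the remaining pages without revisiting the first two.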

Web Server Friendly

SiteSearcher tries to be friendly to web servers while crawling: it obeys robots.txt, identifies itself with the "SiteSearcher" user agent, and uses the Scrapy AutoThrottle extension to reduce the load on the server.
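A minimal sketch of these three behaviours follows. The setting names in `SCRAPY_SETTINGS` are real Scrapy settings, but the values are illustrative assumptions, not SiteSearcher's actual configuration. The robots.txt check uses Python's standard `urllib.robotparser` rather than Scrapy's middleware, and `next_delay` is a simplified model of the rules described in Scrapy's AutoThrottle documentation, not Scrapy's own code. `allowed` and `next_delay` are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Real Scrapy setting names; the values here are illustrative assumptions.
SCRAPY_SETTINGS = {
    "ROBOTSTXT_OBEY": True,          # honour robots.txt rules
    "USER_AGENT": "SiteSearcher",    # identify the crawler to the server
    "AUTOTHROTTLE_ENABLED": True,    # adapt the request delay to server load
    "AUTOTHROTTLE_START_DELAY": 5.0,
    "AUTOTHROTTLE_MAX_DELAY": 60.0,
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,
}


def allowed(robots_txt, url, agent="SiteSearcher"):
    """Check a URL against robots.txt rules with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def next_delay(prev_delay, latency, status, min_delay=0.0,
               max_delay=60.0, target_concurrency=1.0):
    """Simplified AutoThrottle-style update of the delay between requests.

    Moves halfway from the previous delay towards latency / target_concurrency,
    clamps the result to [min_delay, max_delay], and never lets a non-200
    response shorten the delay. min_delay plays the role of DOWNLOAD_DELAY.
    """
    target = latency / target_concurrency
    new = (prev_delay + target) / 2.0
    new = min(max(min_delay, new), max_delay)
    if status != 200 and new < prev_delay:
        return prev_delay
    return new
```

With these rules a fast-responding server gradually gets shorter delays, while slow responses or errors keep the crawler backing off.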

Installing SiteSearcher

If you have pip installed, you can use pip to download and install SiteSearcher.

pip install sitesearcher

SiteSearcher uses the Scrapy bot framework and therefore inherits its dependencies.

Getting the source

Download source releases from PyPI at http://pypi.python.org/pypi/sitesearcher

You can check out the latest version of the source code from GitHub:

git clone https://github.com/sbabrass/sitesearcher

Python Version Support

SiteSearcher supports Python versions 2.7 and 3.3+.

However, switching between Python versions may require rebuilding your indexes: SiteSearcher running on Python 2 currently cannot read or write indexes created by SiteSearcher running on Python 3, and vice versa.
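SiteSearcher does not document a built-in check for this, so a user script might guard against it by recording which Python major version built an index. The sketch below assumes a hypothetical marker file (`python_version.json`) stored next to the index; the file name and both helpers are illustrative, not part of SiteSearcher. The code sticks to features available on both Python 2.7 and 3.

```python
import json
import os
import sys

MARKER = "python_version.json"  # hypothetical marker file name


def write_marker(index_dir):
    """Record which Python major version built the index in `index_dir`."""
    with open(os.path.join(index_dir, MARKER), "w") as f:
        json.dump({"python_major": sys.version_info[0]}, f)


def needs_rebuild(index_dir):
    """True if the marker is missing or the index was built by another Python major version."""
    path = os.path.join(index_dir, MARKER)
    if not os.path.exists(path):
        return True
    with open(path) as f:
        built_with = json.load(f).get("python_major")
    return built_with != sys.version_info[0]
```

If `needs_rebuild` returns True, the safe course is to run the indexer again from scratch rather than with --continue.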

History

0.1a1

  • Initial version of the SiteSearcher tool

  • Create Scrapy crawler to extract full text content of sites

  • Create Whoosh indexer to index stored sites

  • Create CLI for indexing and searching

0.1a2

  • Minor code cleanups

Download files

Source Distribution

sitesearcher-0.1a2.tar.gz (7.2 kB)

File details

Hashes for sitesearcher-0.1a2.tar.gz:

Algorithm      Hash digest
SHA256         56d6aa106746281deb53e823469cb3451012b45af798ef5e24df726e6d00bbfe
MD5            5476516cf00f92e5da42ba8c6af02a8b
BLAKE2b-256    96c56ae1b2a6290449cffb9ffe8544df564aa20be5b108078200a7ba0fcde2c9
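To check a downloaded archive against the published SHA256 digest, a short standard-library sketch like the following can be used. The helper names `sha256_of` and `verify` are hypothetical; only the expected digest comes from the table above.

```python
import hashlib

# SHA256 digest published for sitesearcher-0.1a2.tar.gz.
EXPECTED_SHA256 = "56d6aa106746281deb53e823469cb3451012b45af798ef5e24df726e6d00bbfe"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 hex digest matches `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("sitesearcher-0.1a2.tar.gz")` should return True for an intact download. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`).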
