
About SiteSearcher

SiteSearcher is a command-line tool that builds full-text search indexes of your favourite websites on your own machine and lets you search them locally.

Usage

sitesearcher indexer <mydomain> - Create a local search index for <mydomain>

sitesearcher search <mydomain> - Open search prompt for <mydomain>

Indexing a large site can take a long time, but you can stop the indexer at any time and continue later from where you left off. To halt the indexer, press <CTRL>+C once and wait for it to exit gracefully. To resume, run the indexer command again with the --continue flag, i.e. sitesearcher indexer <mydomain> --continue.
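
The stop-and-continue behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SiteSearcher's actual implementation: the names run_indexer, load_state and save_state, the JSON state file and the fetch callback are all hypothetical.

```python
import json
import os


def load_state(path):
    """Return the saved list of pending URLs, or None if no state file exists."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)["pending"]


def save_state(path, pending):
    """Persist the URLs that still have to be processed."""
    with open(path, "w") as handle:
        json.dump({"pending": list(pending)}, handle)


def run_indexer(start_urls, state_path, fetch, resume=False):
    """Process URLs one by one; on Ctrl+C, save the remaining work and stop.

    Returns True if every URL was processed, False if the run was interrupted.
    """
    # With resume=True, pick up the saved queue (hypothetical --continue behaviour).
    pending = load_state(state_path) if resume else None
    if pending is None:
        pending = list(start_urls)
    while pending:
        url = pending[0]
        try:
            fetch(url)
        except KeyboardInterrupt:
            # Graceful exit: the interrupted URL stays queued for the next run.
            save_state(state_path, pending)
            return False
        pending.pop(0)
    # Finished: remove stale state so a later resume starts clean.
    if os.path.exists(state_path):
        os.remove(state_path)
    return True
```

Keeping the interrupted URL at the head of the saved queue means a resumed run retries it rather than skipping it, at the cost of possibly fetching one page twice.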

Web Server Friendly

SiteSearcher tries to be web server friendly while crawling. It obeys robots.txt, identifies itself with the "SiteSearcher" user agent, and uses the Scrapy AutoThrottle extension to reduce the load on the server.
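
As a rough sketch of what this politeness amounts to, the Scrapy settings below switch on robots.txt handling, set the user agent and enable AutoThrottle. The setting names are standard Scrapy settings, but the dictionary name CRAWLER_SETTINGS and the helper is_allowed are assumptions made for illustration; SiteSearcher's real configuration may differ. The helper uses Python 3's standard-library robots.txt parser to show how a rule is evaluated for the "SiteSearcher" user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical settings dictionary; the values SiteSearcher actually uses may differ.
CRAWLER_SETTINGS = {
    "ROBOTSTXT_OBEY": True,        # respect robots.txt rules
    "USER_AGENT": "SiteSearcher",  # identify the crawler to the server
    "AUTOTHROTTLE_ENABLED": True,  # adapt request rate to server load
}


def is_allowed(robots_txt, url, user_agent=CRAWLER_SETTINGS["USER_AGENT"]):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(is_allowed(rules, "http://example.com/private/page"))
    print(is_allowed(rules, "http://example.com/public/page"))
```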

Installing SiteSearcher

If you have pip installed, you can use pip to download and install SiteSearcher.

pip install sitesearcher

SiteSearcher is built on the Scrapy web crawling framework and therefore inherits its dependencies.

Getting the source

Download source releases from PyPI at http://pypi.python.org/pypi/sitesearcher

You can check out the latest version of the source code from GitHub.

git clone https://github.com/sbabrass/sitesearcher

Python Version Support

SiteSearcher supports Python versions 2.7 and 3.3+.

However, switching between Python versions may require you to rebuild your indexes: SiteSearcher running on Python 2 currently cannot read or write indexes created by SiteSearcher running on Python 3, and vice versa.

History

0.1a1

  • Initial version of the SiteSearcher tool
  • Create Scrapy crawler to extract full text content of sites
  • Create Whoosh indexer to index stored sites
  • Create CLI for indexing and searching

0.1a2

  • Minor code cleanups
Download Files


sitesearcher-0.1a2.tar.gz (7.2 kB), source distribution, uploaded Sep 5, 2016
