============
SiteSearcher
============

About SiteSearcher
==================

**SiteSearcher** is a command line tool that creates fulltext search indexes of your favourite websites on your machine, and allows you to search them locally.

Usage
-----

:code:`sitesearcher indexer <mydomain>` - Create a local search index for :code:`<mydomain>`

:code:`sitesearcher search <mydomain>` - Open search prompt for :code:`<mydomain>`

Indexing large sites can take a long time, but you can stop the indexer at any time and continue later from where you left off. To halt the indexer, press :code:`<CTRL>+C` once and wait for it to exit gracefully. To resume, run the indexer command again with the :code:`--continue` flag, i.e. :code:`sitesearcher indexer <mydomain> --continue`.
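
The stop-and-continue behaviour can be illustrated with a minimal sketch. This is not SiteSearcher's actual implementation: the :code:`crawl` function, the JSON state file and the :code:`stop_after` parameter (which stands in for pressing :code:`<CTRL>+C`) are all hypothetical names chosen for illustration.

.. code:: python

    import json
    import os
    import tempfile


    def crawl(urls, state_path, resume=False, stop_after=None):
        """Process urls in order, checkpointing progress to state_path.

        stop_after simulates an interruption after that many pages.
        Returns the URLs processed during this run only.
        """
        done = []
        if resume and os.path.exists(state_path):
            # Reload the pages finished by an earlier, interrupted run.
            with open(state_path) as fh:
                done = json.load(fh)

        processed_now = []
        for url in urls:
            if url in done:
                continue  # already handled in an earlier run
            if stop_after is not None and len(processed_now) >= stop_after:
                break  # simulated graceful stop
            processed_now.append(url)
            done.append(url)
            # Persist progress after every page so a stop loses no work.
            with open(state_path, "w") as fh:
                json.dump(done, fh)
        return processed_now


    # Hypothetical page list and state file, for illustration only.
    pages = ["/", "/about", "/docs", "/blog"]
    state_file = os.path.join(tempfile.mkdtemp(), "crawl_state.json")

    first_run = crawl(pages, state_file, stop_after=2)   # interrupted run
    second_run = crawl(pages, state_file, resume=True)   # continues later

In this sketch the second run skips the pages recorded by the first one, which is the general idea behind resuming with :code:`--continue`.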

Web Server Friendly
-------------------

**SiteSearcher** tries to be web server friendly while crawling. It obeys :code:`robots.txt`, identifies itself with the :code:`"SiteSearcher"` user agent, and uses the `Scrapy AutoThrottle extension <http://doc.scrapy.org/en/latest/topics/autothrottle.html>`_ to reduce the load on the server.
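
In Scrapy terms, the three behaviours described above roughly correspond to the settings sketched below. The setting names are standard Scrapy settings, but the dictionary name :code:`CRAWLER_SETTINGS` is hypothetical, and whether SiteSearcher configures them in exactly this way is an assumption.

.. code:: python

    # Illustrative only: real Scrapy setting names, values assumed
    # from the description above.
    CRAWLER_SETTINGS = {
        # Respect the rules published in the site's robots.txt.
        "ROBOTSTXT_OBEY": True,
        # Identify the crawler to the web server.
        "USER_AGENT": "SiteSearcher",
        # Let Scrapy adapt the crawl rate to the server's response times.
        "AUTOTHROTTLE_ENABLED": True,
    }

A dictionary like this could, for example, be assigned to a spider's :code:`custom_settings` attribute or passed to a :code:`CrawlerProcess`.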

Installing SiteSearcher
=======================

If you have :code:`pip` installed, you can use :code:`pip` to download and install **SiteSearcher**.

.. code:: bash

    pip install sitesearcher

**SiteSearcher** uses the `Scrapy <http://scrapy.org>`_ bot framework and therefore inherits its `dependencies <http://doc.scrapy.org/en/latest/intro/install.html#installing-scrapy>`_.

Getting the source
==================

Download source releases from PyPI at http://pypi.python.org/pypi/sitesearcher.

You can check out the latest version of the source code from GitHub.

.. code:: bash

    git clone https://github.com/sbabrass/sitesearcher

Python Version Support
======================

**SiteSearcher** supports Python versions 2.7 and 3.3+.

However, switching between Python versions may require rebuilding your indexes, as SiteSearcher running on Python 2 cannot currently read or write indexes created with SiteSearcher on Python 3, and vice versa.

History
=======

0.1a1
-----

- Initial version of the SiteSearcher tool
- Create ``Scrapy`` crawler to extract full text content of sites
- Create ``Whoosh`` indexer to index stored sites
- Create CLI for indexing and searching
