Dogsheep search index

dogsheep-beta

Installation

Install this tool like so:

$ pip install dogsheep-beta

Usage

Run the indexer using the dogsheep-beta command-line tool:

$ dogsheep-beta index dogsheep.db config.yml

The config.yml file contains details of the databases and document types that should be indexed:

twitter.db:
    tweets:
        sql: |-
            select
                tweets.id as key,
                'Tweet by @' || users.screen_name as title,
                tweets.created_at as timestamp,
                tweets.full_text as search_1
            from tweets join users on tweets.user = users.id
    users:
        sql: |-
            select
                id as key,
                name || ' @' || screen_name as title,
                created_at as timestamp,
                description as search_1
            from users

This will create a search_index table in the dogsheep.db database populated by data from those SQL queries.
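Before indexing, it can help to preview the rows a configured query produces. The sketch below runs the users query from the config above against a throwaway in-memory database using Python's sqlite3 module. The sample table layout, the sample row and the preview_rows helper are made up for illustration; this is not how dogsheep-beta itself runs the queries.

```python
import sqlite3

# The "users" query from config.yml, unchanged.
USERS_SQL = """
select
    id as key,
    name || ' @' || screen_name as title,
    created_at as timestamp,
    description as search_1
from users
"""


def preview_rows(conn, sql):
    """Run an indexing query and return each row as a dict keyed by column alias."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(sql)]


# Hypothetical stand-in for twitter.db, holding only the columns the query reads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table users ("
    "id integer primary key, name text, screen_name text, "
    "created_at text, description text)"
)
conn.execute(
    "insert into users values "
    "(1, 'Cleo', 'cleopaws', '2020-09-02T21:00:21', 'Dog about town')"
)

rows = preview_rows(conn, USERS_SQL)
print(rows[0])
```

Each dict key corresponds to one of the column aliases (key, title, timestamp, search_1) that the indexer reads from the query.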

By default the search index that this tool creates will be configured for Porter stemming. This means that searches for words like run will match documents containing runs or running.

If you don't want to use Porter stemming, use the --tokenize none option:

$ dogsheep-beta index dogsheep.db config.yml --tokenize none

You can pass other SQLite tokenize arguments here; see the SQLite FTS tokenizers documentation.
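The effect of Porter stemming can be seen directly in SQLite. The following is a minimal sketch, assuming an FTS5 table and a Python build whose SQLite includes FTS5. The docs table, the sample sentences and the matching_docs helper are hypothetical, and the table that dogsheep-beta actually creates may be configured differently.

```python
import sqlite3


def matching_docs(tokenize, query):
    """Build a tiny FTS5 table with the given tokenizer and return matching rows."""
    conn = sqlite3.connect(":memory:")
    # The tokenizer name is interpolated directly; acceptable only for a demo.
    conn.execute(
        f"create virtual table docs using fts5(body, tokenize = '{tokenize}')"
    )
    conn.executemany(
        "insert into docs (body) values (?)",
        [("She runs daily",), ("Running is fun",), ("A quiet walk",)],
    )
    rows = conn.execute(
        "select body from docs where docs match ? order by rowid", (query,)
    ).fetchall()
    return [row[0] for row in rows]


# With stemming, "run" matches documents containing "runs" and "Running".
print(matching_docs("porter", "run"))
# Without stemming, "run" only matches that exact token.
print(matching_docs("unicode61", "run"))
```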

Columns

The columns that an indexing query can return are:

  • key - a unique (within that type) primary key
  • title - the title for the item
  • timestamp - an ISO8601 timestamp, e.g. 2020-09-02T21:00:21
  • search_1 - a larger chunk of text to be included in the search index
  • category - an integer category ID, see below
  • is_public - an integer (0 or 1, defaults to 0 if not set) specifying if this is public or not

Public records are things like your public tweets, blog posts and GitHub commits.
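If a source table stores Unix timestamps rather than ISO8601 strings, they need converting before they fit the timestamp column described above. A small sketch of one way to do that in Python follows. The to_iso8601 helper is hypothetical, and treating the values as UTC is an assumption, since the format above carries no timezone offset.

```python
from datetime import datetime, timezone


def to_iso8601(epoch_seconds):
    """Format a Unix timestamp as an ISO8601 string in UTC, without an offset suffix."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Drop the timezone so the output has the form 2020-09-02T21:00:21.
    return moment.replace(tzinfo=None).isoformat(timespec="seconds")


# Build an epoch value for the example timestamp used in the column list.
epoch = datetime(2020, 9, 2, 21, 0, 21, tzinfo=timezone.utc).timestamp()
print(to_iso8601(epoch))
```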

Categories

Indexed items can be assigned a category. Categories are integers that correspond to records in the categories table, which defaults to containing the following:

id  name
1   created
2   saved
3   received

created is for items that have been created by the Dogsheep instance owner.

saved is for items that they have saved, liked or favourited.

received is for items that have been specifically sent to them by other people - incoming emails or direct messages for example.
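One way to assign a category is to return a constant from the indexing query, for example 1 as category. The sketch below illustrates that idea with Python's sqlite3 module: it recreates the default categories table in memory, runs an indexing-style query with constant category and is_public columns, and resolves the category ID to its name. The tweets table, its sample row and the category_names helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table categories (id integer primary key, name text);
    insert into categories values (1, 'created'), (2, 'saved'), (3, 'received');
    create table tweets (id integer primary key, full_text text);
    insert into tweets values (101, 'Hello from the dog park');
    """
)

# Indexing-style query: constants mark every tweet as created (1) and public (1).
INDEX_SQL = """
select
    tweets.id as key,
    tweets.full_text as search_1,
    1 as category,
    1 as is_public
from tweets
"""


def category_names(conn, sql):
    """Map each key returned by an indexing query to its category name."""
    joined = (
        f"select q.key, categories.name from ({sql}) as q "
        "join categories on categories.id = q.category"
    )
    return dict(conn.execute(joined).fetchall())


print(category_names(conn, INDEX_SQL))
```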

Custom results display

Each indexed item type can define custom display HTML as part of the config.yml file. It can do this using a display key containing a fragment of Jinja template, and optionally a display_sql key with extra SQL to execute to fetch the data to display.

Here's how to define a custom display template for a tweet:

twitter.db:
    tweets:
        sql: |-
            select
                tweets.id as key,
                'Tweet by @' || users.screen_name as title,
                tweets.created_at as timestamp,
                tweets.full_text as search_1
            from tweets join users on tweets.user = users.id
        display: |-
            <p>{{ title }} - tweeted at {{ timestamp }}</p>
            <blockquote>{{ search_1 }}</blockquote>

This example reuses the values that were stored in the search_index table when the indexing query was run.

To load in extra values to display in the template, use a display_sql query like this:

twitter.db:
    tweets:
        sql: |-
            select
                tweets.id as key,
                'Tweet by @' || users.screen_name as title,
                tweets.created_at as timestamp,
                tweets.full_text as search_1
            from tweets join users on tweets.user = users.id
        display_sql: |-
            select
                users.screen_name,
                tweets.full_text,
                tweets.created_at
            from
                tweets join users on tweets.user = users.id
            where
                tweets.id = :key
        display: |-
            <p>{{ display.screen_name }} - tweeted at {{ display.created_at }}</p>
            <blockquote>{{ display.full_text }}</blockquote>

The display_sql query will be executed for every search result, passing the key value from the search_index table as the :key parameter.

This performs well because many small queries are efficient in SQLite.
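The per-result lookup described above can be sketched with Python's sqlite3 module, which binds a :key placeholder from a dictionary. This is an illustration of the pattern only, using a hypothetical in-memory database and a hypothetical load_display helper, not the plugin's actual code.

```python
import sqlite3

# The display_sql query from the config above, on fewer lines.
DISPLAY_SQL = """
select users.screen_name, tweets.full_text, tweets.created_at
from tweets join users on tweets.user = users.id
where tweets.id = :key
"""


def load_display(conn, key):
    """Run the display query for one search result and return its row as a dict."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(DISPLAY_SQL, {"key": key}).fetchone()
    return dict(row) if row is not None else None


# Hypothetical stand-in for twitter.db.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table users (id integer primary key, screen_name text);
    create table tweets (
        id integer primary key, user integer, full_text text, created_at text
    );
    insert into users values (1, 'cleopaws');
    insert into tweets values (101, 1, 'Hello from the dog park', '2020-09-02T21:00:21');
    insert into tweets values (102, 1, 'Found a stick', '2020-09-03T08:15:00');
    """
)

# One small query per search result key.
for key in [101, 102]:
    print(load_display(conn, key))
```

The returned dict plays the role of the display variable in the template, so display.screen_name corresponds to the screen_name entry.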

Displaying maps

This plugin will eventually include a number of useful shortcuts for rendering interesting content.

The first available shortcut is for displaying maps. Make your custom content output something like this:

<div
    data-map-latitude="{{ display.latitude }}"
    data-map-longitude="{{ display.longitude }}"
    style="display: none; float: right; width: 250px; height: 200px; background-color: #ccc;"
></div>

JavaScript on the page will look for any elements with data-map-latitude and data-map-longitude and, if it finds any, will load Leaflet and convert those elements into maps centered on that location. The default zoom level will be 12, or you can set a data-map-zoom attribute to customize this.
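For reference, here is a rough Python sketch that builds the same markup, including the optional data-map-zoom attribute. The map_div helper is hypothetical and not part of the plugin; in practice the element is normally written directly in the display template as shown above.

```python
from html import escape

# Inline style copied from the example element above.
MAP_STYLE = (
    "display: none; float: right; width: 250px; "
    "height: 200px; background-color: #ccc;"
)


def map_div(latitude, longitude, zoom=None):
    """Return a div carrying the data-map-* attributes the page JavaScript looks for."""
    attrs = [
        f'data-map-latitude="{escape(str(latitude))}"',
        f'data-map-longitude="{escape(str(longitude))}"',
    ]
    if zoom is not None:
        # Omitting this attribute leaves the default zoom level of 12 in effect.
        attrs.append(f'data-map-zoom="{escape(str(zoom))}"')
    attrs.append(f'style="{MAP_STYLE}"')
    return "<div " + " ".join(attrs) + "></div>"


print(map_div(37.77, -122.42, zoom=14))
```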

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd dogsheep-beta
python3 -m venv venv
source venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies, including the test dependencies:

pip install -e '.[test]'

To run the tests:

pytest
