Dogsheep search index
dogsheep-beta
Installation
Install this tool like so:
$ pip install dogsheep-beta
Usage
Run the indexer using the dogsheep-beta command-line tool:
$ dogsheep-beta index dogsheep.db config.yml
The config.yml file contains details of the databases and tables that should be indexed:
twitter.db:
    tweets:
        sql: |-
            select
                tweets.id as key,
                'Tweet by @' || users.screen_name as title,
                tweets.created_at as timestamp,
                tweets.full_text as search_1
            from tweets join users on tweets.user = users.id
    users:
        sql: |-
            select
                id as key,
                name || ' @' || screen_name as title,
                created_at as timestamp,
                description as search_1
            from users
This will create a search_index table in the dogsheep.db database populated by data from those SQL queries.
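To see what one of these queries contributes to the index, the tweets query can be run directly against a small database. This is a minimal sketch using Python's built-in sqlite3 module. The table layouts and sample rows are assumptions made up for illustration, not the real twitter.db schema, and the sketch does not call dogsheep-beta itself.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for twitter.db (assumed schema)
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (id integer primary key, screen_name text, name text);
create table tweets (id integer primary key, user integer, created_at text, full_text text);
insert into users values (1, 'example', 'Example User');
insert into tweets values (10, 1, '2020-09-02T21:00:21', 'Hello from Dogsheep');
""")

# The query from the tweets entry of config.yml
TWEETS_SQL = """
select
    tweets.id as key,
    'Tweet by @' || users.screen_name as title,
    tweets.created_at as timestamp,
    tweets.full_text as search_1
from tweets join users on tweets.user = users.id
"""

# Return rows as mappings so the column aliases are visible
conn.row_factory = sqlite3.Row
rows = [dict(row) for row in conn.execute(TWEETS_SQL)]
print(rows)
```

Each resulting row carries the key, title, timestamp and search_1 aliases that the indexer expects from a configured query.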
By default the search index that this tool creates will be configured for Porter stemming. This means that searches for words like run will match documents containing runs or running.
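The effect of Porter stemming can be reproduced with SQLite alone. The sketch below builds a throwaway full-text table twice, once with the porter tokenizer and once with unicode61, and searches both for run. It assumes a Python build whose SQLite includes FTS5. Whether dogsheep-beta itself creates an FTS5 table is also an assumption here, and the search helper and docs table are hypothetical names.

```python
import sqlite3

def search(tokenize, term):
    """Return the documents matching term in an FTS5 table built with the given tokenizer."""
    conn = sqlite3.connect(":memory:")
    # tokenize is one of two fixed strings in this sketch, so formatting it into SQL is safe here
    conn.execute(f"create virtual table docs using fts5(body, tokenize='{tokenize}')")
    conn.executemany(
        "insert into docs (body) values (?)",
        [("He runs daily",), ("Running is fun",), ("A short run",), ("Walking only",)],
    )
    return [
        row[0]
        for row in conn.execute(
            "select body from docs where docs match ? order by rowid", (term,)
        )
    ]

with_stemming = search("porter", "run")        # stems runs/running down to run
without_stemming = search("unicode61", "run")  # exact token match only
print(with_stemming)
print(without_stemming)
```

With stemming, the three documents containing runs, Running and run all match. Without it, only the document containing the exact word run does.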
If you don't want to use Porter stemming, use the --tokenize none option:
$ dogsheep-beta index dogsheep.db config.yml --tokenize none
You can pass other SQLite tokenize arguments here; see the SQLite FTS tokenizers documentation.
Columns
The columns that can be returned by each query are:
- key - a unique (within that table) primary key
- title - the title for the item
- timestamp - an ISO8601 timestamp, e.g. 2020-09-02T21:00:21
- search_1 - a larger chunk of text to be included in the search index
- category - an integer category ID, see below
- is_public - an integer (0 or 1, defaults to 0 if not set) specifying if this is public or not
Public records are things like your public tweets, blog posts and GitHub commits.
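The column rules above can be checked before indexing. The following is a rough sketch of such a check in Python. check_row is a hypothetical helper, not part of dogsheep-beta, and treating any unlisted column as a problem is a choice made for this sketch rather than documented behaviour of the tool.

```python
from datetime import datetime

# Column names listed in the Columns section
ALLOWED_COLUMNS = {"key", "title", "timestamp", "search_1", "category", "is_public"}

def check_row(row):
    """Hypothetical helper: return a list of problems found in one result row."""
    problems = []
    unknown = set(row) - ALLOWED_COLUMNS
    if unknown:
        problems.append(f"unknown columns: {sorted(unknown)}")
    timestamp = row.get("timestamp")
    if timestamp is not None:
        try:
            # Accepts ISO8601 strings such as 2020-09-02T21:00:21
            datetime.fromisoformat(timestamp)
        except (TypeError, ValueError):
            problems.append(f"timestamp is not ISO8601: {timestamp!r}")
    # is_public defaults to 0 when the query does not return it
    if row.get("is_public", 0) not in (0, 1):
        problems.append("is_public must be 0 or 1")
    return problems

good = {"key": 1, "title": "Example", "timestamp": "2020-09-02T21:00:21",
        "search_1": "some text", "is_public": 1}
bad = {"key": 2, "timestamp": "yesterday", "is_public": 5, "body": "x"}
print(check_row(good))
print(check_row(bad))
```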
Categories
Indexed items can be assigned a category. Categories are integers that correspond to records in the categories table, which defaults to containing the following:
id | name |
---|---|
1 | created |
2 | saved |
3 | received |
created is for items that have been created by the Dogsheep instance owner.
saved is for items that they have saved, liked or favourited.
received is for items that have been specifically sent to them by other people - incoming emails or direct messages for example.
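When building category values in a script, named constants are easier to read than bare integers. Here is a small sketch that mirrors the default categories table. The Category enum is a hypothetical convenience, not something dogsheep-beta provides, and it only matches a categories table that still has its default contents.

```python
from enum import IntEnum

class Category(IntEnum):
    """Mirrors the default rows of the categories table."""
    CREATED = 1
    SAVED = 2
    RECEIVED = 3

# Integer to use for the category column of a saved item
print(Category.SAVED.value)

# Look up the name for an ID read back from the index
print(Category(3).name.lower())
```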
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd dogsheep-beta
python3 -m venv venv
source venv/bin/activate
Or if you are using pipenv:
pipenv shell
Now install the dependencies, including the test dependencies:
pip install -e '.[test]'
To run the tests:
pytest