Backend for Karp
Project description
karp-backend-5
This package is the legacy version of Karp, [go here for the current version](https://github.com/spraakbanken/karp-backend).
Karp is the lexical platform of Språkbanken. Now migrated to Python 3.6+.
Karp in Docker
For easy testing, use Docker to run Karp-b.
- Follow the steps given here
- Run `docker-compose up -d`
- Test it by running `curl localhost:8081/app/test`
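The same smoke test can be scripted. The following is a minimal sketch, not part of Karp: the helper names `karp_test_url` and `karp_is_up` are hypothetical, and treating an HTTP 200 answer as "up" is an assumption, since the expected response of `/app/test` is not described here.

```python
import urllib.request


def karp_test_url(host="localhost", port=8081):
    """Build the URL of the test endpoint used in the curl command above."""
    return f"http://{host}:{port}/app/test"


def karp_is_up(url, opener=urllib.request.urlopen, timeout=5):
    """Return True if the endpoint answers with HTTP 200 (assumed success code).

    Any connection problem or HTTP error status (both are OSError
    subclasses in urllib) is reported as False.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False
```

Calling `karp_is_up(karp_test_url())` after `docker-compose up -d` performs the same request as the curl command; the `opener` parameter exists only so the check can be exercised without a running container.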
If you want to use Karp without Docker, keep on reading.
Prerequisites
- Elasticsearch 6
- SQL, preferably MariaDB
- a WSGI server, for example mod_wsgi with Apache, Waitress, Gunicorn, uWSGI, ...
- an authentication server. Read more about this here
- Python >= 3.6 with pip
Installation
Karp uses virtual environments for Python. To get running:
- run `make install`
- or:
  - Create the virtual environment using `python3 -m venv venv`.
  - Activate the virtual environment with `source venv/bin/activate`.
  - Install the dependencies with `pip install -r requirements.txt`.
Configuration
Set the environment variables `KARP5_INSTANCE_PATH` and `KARP5_ELASTICSEARCH_URL`:
- using `export VAR=value`
- or creating a file `.env` in the root of your cloned path with `VAR=value`

The variables are:
- `KARP5_INSTANCE_PATH` - the path where your configs are. If you have cloned this repo you can use `/path/to/karp-backend/`.
- `KARP5_ELASTICSEARCH_URL` - the URL to Elasticsearch. Typically `localhost:9200`.
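To make the two options concrete, here is a minimal sketch of how `VAR=value` lines from a `.env` file could be combined with exported variables. It is not Karp's own loading code: the helpers `parse_env_text` and `karp_settings` are hypothetical, and letting exported variables override the `.env` file is an assumption.

```python
import os

REQUIRED = ("KARP5_INSTANCE_PATH", "KARP5_ELASTICSEARCH_URL")


def parse_env_text(text):
    """Parse VAR=value lines; blank lines and '#' comment lines are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        values[name.strip()] = value.strip()
    return values


def karp_settings(env_text="", environ=None):
    """Return the two required settings as a dict.

    Assumption: exported variables win over values from the .env text.
    Raises KeyError naming every variable that is set in neither place.
    """
    environ = os.environ if environ is None else environ
    merged = {**parse_env_text(env_text), **environ}
    missing = [name for name in REQUIRED if name not in merged]
    if missing:
        raise KeyError(", ".join(missing))
    return {name: merged[name] for name in REQUIRED}
```

For example, `karp_settings(open(".env").read())` would fail early with the names of the missing variables instead of failing later at the first Elasticsearch call.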
Copy `config.json.example` to `config.json` and make your changes.
You will also need to make configurations for your lexicons.
Read more here.
Tests
TODO: DO MORE TESTS!
Run the tests by typing: `make test`
Test that `karp-backend` is working by starting it with `make run` or `python run.py`.
Known bugs
Counts from the `statistics` call may not be accurate when performing subaggregations (multiple buckets) on big indices unless the query restricts the search space. Using `breadth_first` mode does not (always) help.
Possible workarounds:
- use composite aggregation instead, but this does not work with filtering.
- set a bigger `shard_size` (27 000 works for saldo), but this might break your ES cluster.
- have smaller indices (one lexicon per index), but this does not help for big lexicons or statistics over many lexicons.
- don't allow deeper subaggregations than 2. Changing the `size` won't help.
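To show where these knobs sit in a request, here is a minimal sketch that builds a nested Elasticsearch `terms` aggregation body with an optional `shard_size`, an optional `collect_mode` of `breadth_first`, and a depth limit of 2. It is not the query Karp's `statistics` call generates: the function `nested_terms_agg` and the field names in the usage note are hypothetical.

```python
def nested_terms_agg(fields, size=10, shard_size=None, breadth_first=False, max_depth=2):
    """Build a search body with one nested `terms` aggregation per field.

    `fields` is ordered outermost first. More than `max_depth` levels is
    rejected, mirroring the last workaround in the list above.
    """
    if not fields:
        raise ValueError("at least one field is required")
    if len(fields) > max_depth:
        raise ValueError(f"at most {max_depth} aggregation levels are allowed")

    aggs = None
    # Build from the innermost level outwards so each level wraps the previous one.
    for field in reversed(fields):
        terms = {"field": field, "size": size}
        if shard_size is not None:
            terms["shard_size"] = shard_size
        if breadth_first:
            terms["collect_mode"] = "breadth_first"
        level = {"terms": terms}
        if aggs is not None:
            level["aggs"] = aggs
        aggs = {field: level}

    # "size": 0 asks for aggregation results only, no search hits.
    return {"size": 0, "aggs": aggs}
```

A call such as `nested_terms_agg(["lexiconName", "pos"], shard_size=27000)` produces a two-level body. Raising `shard_size` makes each shard return more candidate buckets, which is why it improves the counts and also why it increases the load on the cluster.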
Elasticsearch
If saving stops working because of `Database Exception: Error during update. Message: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];').`, you need to unlock the relevant ES index.
This is how you do it:
Repeat for every combination of `host` and `port` that is relevant for you. But you only need to do it once per cluster.
- Check if any index is locked: `curl <host>:<port>/_all/_settings/index.blocks*`
  - If all is open, Elasticsearch answers with `{}`
  - else it answers with `{<index>: { "settings": { "index": { "blocks": {"read_only_allow_delete": "true"} } } }, ... }`
- To unlock all locked indices on a `host` and `port`: `curl -X PUT <host>:<port>/_all/_settings -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'`
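The answer from the check can also be inspected programmatically. The sketch below, which is not part of Karp, takes the already decoded JSON answer and lists the locked indices; the function name `locked_indices` and the constant `UNLOCK_BODY` are hypothetical, and the response shape is taken from the example answer above.

```python
import json


def locked_indices(settings_response):
    """Return the sorted names of indices whose read_only_allow_delete block is set.

    `settings_response` is the decoded JSON answer of the settings check;
    an empty dict means no index is locked.
    """
    locked = []
    for index, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        # Elasticsearch reports the flag as the string "true" in this answer.
        if str(blocks.get("read_only_allow_delete", "false")).lower() == "true":
            locked.append(index)
    return sorted(locked)


# Request body for the unlock call; None is serialised as JSON null.
UNLOCK_BODY = json.dumps({"index.blocks.read_only_allow_delete": None})
```

If `locked_indices(...)` returns an empty list, the unlock request can be skipped for that host and port.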
File details
Details for the file `karp-backend-5-5.29.0.tar.gz`.
File metadata
- Download URL: karp-backend-5-5.29.0.tar.gz
- Upload date:
- Size: 2.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | fd4816665f816d2d9e9143cc46a4f2d7edf332df12fa9eb88fdcf981c749fedd
MD5 | db4bee7016235d37bc31a2bd55e5ca59
BLAKE2b-256 | e73ec4c1b868e0ea39be0f94a8c658072123b989be72e87a7c762d6af8502335
File details
Details for the file `karp_backend_5-5.29.0-py3-none-any.whl`.
File metadata
- Download URL: karp_backend_5-5.29.0-py3-none-any.whl
- Upload date:
- Size: 1.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 736127bcdbb4b15c597cdee5ad003e89baf7a3be699e76af022b9486daa35c39
MD5 | 7cfd626ce38e47ac7c21af9611edcf4c
BLAKE2b-256 | ca12bedfb53e8259e7254b22f1118e9553b41eab8e0ea4adb715a3d8f51ce537