# :rocket: osman

STRV repository with useful functions to ease (y)our work with OpenSearch.

`osman` stands for OpenSearch MANager.
## :computer: Installation

- Create a new virtual environment.
- Run:

  ```shell
  pip install osmanager
  ```
## :hammer: Usage

### Create an Osman instance

The environment variables are read by `OsmanConfig`:

```python
from osman import Osman, OsmanConfig

os_man = Osman(OsmanConfig())
```

Environment variables can be overridden:

```python
os_man = Osman(OsmanConfig(host_url=<OpenSearch_host_url>))
```
### Create an index

```python
mapping = {
    "mappings": {
        "properties": {
            "age": {"type": "integer"},
            "id": {"type": "integer"},
            "name": {"type": "text"},
        }
    }
}
settings = {
    "settings": {
        "number_of_shards": 3
    }
}
os_man.create_index(
    name=<index_name>, mapping=mapping, settings=settings
)
```
### Upload a search template

```python
source = {
    "query": {
        "match": {
            "age": "{{age}}"
        }
    }
}
params = {
    "age": 10
}
os_man.upload_search_template(
    source=source, name=<template_name>, index=<index_name>, params=params
)
```
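For illustration, the mustache-style `{{age}}` placeholder in the template source is substituted with the matching value from `params` when the template is executed. The real rendering happens server-side in OpenSearch; the following is only a naive client-side sketch of the substitution:

```python
import json

# Template source and parameters from the example above.
source = {"query": {"match": {"age": "{{age}}"}}}
params = {"age": 10}

# Naive illustration of the mustache substitution OpenSearch performs
# when rendering the template (not how the server actually does it).
rendered = json.loads(
    json.dumps(source).replace('"{{age}}"', json.dumps(params["age"]))
)
print(rendered)  # {'query': {'match': {'age': 10}}}
```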
### Upload a painless script

```python
source = "doc['first.keyword'].value + ' ' + doc['last.keyword'].value"
os_man.upload_painless_script(source=source, name=<script_name>)
```
### Delete a search template or a painless script

Removes either a painless script or a search template:

```python
os_man.delete_script(name=<script_or_template_name>)
```
### Debug a painless script

Executes a given painless script with the provided data and parameters, then checks whether the expected result is returned. The `context_type` must be provided and is either `score` or `filter`, referring to the score and filter script contexts described in the OpenSearch documentation.

```python
context_type = "score"
documents = {"id": 1, "container": [1, 2, 3]}
expected_result = 0
os_man.debug_painless_script(
    source=<source>,
    index=<index_name>,
    params=<params>,
    context_type=context_type,
    documents=documents,
    expected_result=expected_result
)
```
### Debug a search template

Executes a given search template against an index with the defined parameters, then checks whether the expected document ids are returned.

```python
expected_ids = ["123", "10"]
os_man.debug_search_template(
    source=source, name=<template_name>, index=<index_name>,
    params=params, expected_ids=expected_ids
)
```
### Reindex

Reindex an existing index with a new mapping and/or settings. In order to reindex, this function appends a numeric suffix (`-1`, `-2`, ...) to the index name. Afterwards, the index should be referenced by its alias rather than its name.

For example: an index named `test-index` is reindexed and its name becomes `test-index-1`. When reindexed again, its name becomes `test-index-2`. Hence, it should be referenced by its unchanging alias `test-index`.

```python
os_man.reindex(
    name=<index_name>, mapping=<new_mapping>, settings=<new_settings>
)
```
## :construction_worker_man: Contribution

### :wrench: Local environment setup

Launch the docker daemon in advance of the following steps. You can develop the code using virtualenv, but the local OpenSearch instance requires docker.

#### Local OpenSearch instance and docker development environment

The command `make help` shows a brief description of the possible targets. A typical workflow scenario is:

- Run `make docker-run-opensearch` to launch an OpenSearch instance on port 9200 and OpenSearch Dashboards on port 5601, as described in `docker-compose-opensearch.yml`.
- Run `make dev-env` to get a bash shell in the development environment defined in `Dockerfile`. Under Linux, set the user/group in `.env` in advance, see below.
- Develop.
- Run `make docker-clean-all` to clean unused containers and volumes.
Note the following:

- Under Linux you may encounter a problem where the user in the docker guest container doesn't have permission to write to the local directory. This leads to `pytest` being unable to create cache directories. The solution is to have the following variables in the `.env` file:

  ```
  DEV_USER_ID=1000
  DEV_GROUP_ID=1000
  ```

  Substitute 1000 with the `uid` and `gid` of the user on the host machine; the ids can be obtained by running the `id` command. With Docker under macOS or Windows you won't need this.
- You can browse the Dashboards from a web browser. When running Docker on localhost, the Dashboards are available at http://localhost:5601.
- All indexed data and dashboards are persisted in a docker volume, i.e. when you stop the OpenSearch containers the data are not lost.
In order to run a notebook inside the docker container, use the following command (and ensure `notebook` is in your dependencies):

```shell
jupyter notebook --ip 0.0.0.0 --no-browser --allow-root
```
#### virtualenv

- Create a virtual environment called `venv`:

  ```shell
  virtualenv --python=python3.8 venv
  ```
- Activate it:

  ```shell
  . ./venv/bin/activate
  ```
- Install the python package:

  ```shell
  pip install -e .
  ```

In order to deactivate the environment, run the `deactivate` command. You can also delete the environment as follows: `rm -r ./venv/`
## :traffic_light: Testing

Run `pytest` in your development environment to run all tests.

The `OsmanConfig` class can be initialized from the environment using the following variables:

| Variable | Default value | Type | Description |
|---|---|---|---|
| AUTH_METHOD | http | string | `http` for username/password authentication, `awsauth` for authentication with AWS user credentials |
| OPENSEARCH_HOST | None | string | address of the OpenSearch host |
| OPENSEARCH_PORT | 443 | int | port number |
| OPENSEARCH_SSL_ENABLED | True | bool | use SSL? |
| OPENSEARCH_USER | None | string | username, for the http AUTH_METHOD |
| OPENSEARCH_SECRET | None | string | password, for the http AUTH_METHOD |
| AWS_ACCESS_KEY_ID | None | string | access key id for the awsauth AUTH_METHOD, see AWS4Auth |
| AWS_SECRET_ACCESS_KEY | None | string | secret key for the awsauth AUTH_METHOD |
| AWS_REGION | us-east-1 | string | AWS region for the awsauth AUTH_METHOD |
| AWS_SERVICE | es | string | AWS service for the awsauth AUTH_METHOD |

You can add these variables to your `.env` file; `make dev-env` will pass them to the development Docker image. There is a test in `test_osman.py` creating an Osman instance using environment variables, so you can use any OpenSearch instance for testing.
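For example, a minimal `.env` for the `http` authentication method might look like this (the host, port, user, and secret are placeholder values for a local instance):

```
AUTH_METHOD=http
OPENSEARCH_HOST=localhost
OPENSEARCH_PORT=9200
OPENSEARCH_SSL_ENABLED=False
OPENSEARCH_USER=admin
OPENSEARCH_SECRET=admin
```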
### :broom: Local linting

To run the linters from GitHub actions locally, do the following:

- Install the `pre-commit` library.
- From the root project directory, run:

  ```shell
  pre-commit run --all-files
  ```
## :heavy_plus_sign: Versioning
For information on semantic versioning, see semver.org.
Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes
- MINOR version when you add functionality in a backwards compatible manner
- PATCH version when you make backwards compatible bug fixes
When incrementing the MAJOR version, reset the MINOR and PATCH versions to 0. When incrementing the MINOR version, reset the PATCH version to 0.
When a version is released, a tag should be created in the format vMAJOR.MINOR.PATCH.
Follow the steps below to create a new release:

- Update `package.json` with the new version number.
- Add the tag to the current branch:

  ```shell
  git tag -a v1.0.0 -m "Release version 1.0.0"
  ```
- Push the tag to the remote repository:

  ```shell
  git push origin --tags
  ```
- Create a new pull request with the new version number and merge it to the `master` branch.
## :pencil: Contributors
## :scroll: OpenSearch

This section contains a brief overview of OpenSearch. It is by no means complete or exhaustive, but it offers a few tips and tricks that might be useful for the development of your OpenSearch project. For more information, visit the OpenSearch documentation.
### :clipboard: Analyzers

Analyzers process text fields at indexing and search time. An analyzer is composed of one tokenizer and zero or more token filters. The tokenizer breaks the text into individual terms, and the token filters transform or remove tokens. Fields can have more than one field type, so different analyzers can be used in different situations.

This section contains a small overview of the analyzers available in OpenSearch. For analyzers in general, see Analyzers in OpenSearch.

- Standard
  - The standard analyzer is used to index and search for complete words. It uses the standard tokenizer, which breaks text into terms on word boundaries (whitespace as well as many special characters), and the lowercase token filter, which converts all tokens to lowercase.
  - The standard analyzer might not be useful for fields like email addresses. An email address like "123@strv-2.com" is broken down into "123", "strv", "2", "com".
- Whitespace
  - The whitespace analyzer is also used to index and search for complete words. It uses the whitespace tokenizer, which breaks text into terms on whitespace only.
  - Because the whitespace analyzer does not break words on special characters, it is more suitable for data like email addresses or URLs.
- N-gram
  - N-gram analyzers are used to index and search for partial words. The tokenizer breaks the text into individual terms, and the token filter creates n-grams for each term.
  - N-gram analyzers should not be used for large text fields, because the n-grams can consume a lot of disk space.
  - Words lose their meaning when broken into n-grams. For example, the word "search" is broken into "se", "ea", "ar", "rc", "ch", so a search for "sea" will match the word "search".
  - N-gram analyzers can be useful for a "username" field: usernames are of limited length, and users should be findable by searching for just the middle part of their name. For example, "xSuperUser" should be found by searching for "sup".

| Not analyzed | Standard | Whitespace | N-gram |
|---|---|---|---|
| 123@strv-2.com | [123, strv, 2, com] | [123@strv-2.com] | [1, 12, 2, 23, 3...] |
| How's it going? | [How, s, it, going] | [How's, it, going?] | [H, Ho, o, ow, w...] |
| xUser-123 | [xUser, 123] | [xUser-123] | [x, xu, u, us, s...] |
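As a sketch of how this could be wired together with osman: the index settings below define a custom n-gram analyzer and apply it to a hypothetical "username" field. The `analysis` settings follow standard OpenSearch index configuration; the index name and the `create_index()` call mirror the usage section above and are illustrative only.

```python
# Custom n-gram analyzer: an ngram tokenizer (2- and 3-grams) plus the
# lowercase token filter, applied to a hypothetical "username" field.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "username_ngram": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                }
            },
        }
    }
}
mapping = {
    "mappings": {
        "properties": {
            "username": {"type": "text", "analyzer": "username_ngram"}
        }
    }
}
# With a configured Osman instance (see "Create an index" above):
# os_man.create_index(name="users", mapping=mapping, settings=settings)
```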
### :card_index: Field Types

This section contains a small overview of the field types available in OpenSearch. For field types that are not discussed here, please refer to the official documentation.

- Keyword
  - Keyword fields store data that is not meant to be analyzed. They are not analyzed and support exact value searches.
  - Keyword fields are not used for full-text search.
  - Keyword fields are useful for, e.g., "id" fields.
- Text
  - Text fields store textual data, such as the body of an email or the description of a product in an e-commerce store. Text fields are analyzed and support full-text search.
  - Text fields are used when exact value searches are not wanted.
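The two field types can also be combined in one mapping. The sketch below is illustrative: the hypothetical "name" field is indexed as text for full-text search, with a "raw" keyword sub-field for exact matches, while "id" is a plain keyword. Multi-fields (`"fields"`) are standard OpenSearch mapping syntax.

```python
# A mapping mixing keyword and text: "name" supports full-text search,
# "name.raw" and "id" support exact value searches.
mapping = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "name": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            },
        }
    }
}
# With a configured Osman instance (see "Create an index" above):
# os_man.create_index(name="products", mapping=mapping)
```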
## :books: Specs

### Requirements

#### Library

The goal is to extend the opensearch-py library with features that we found useful:

- Connect to OpenSearch using a url (user/pass) or a service account
- Upload & remove a search template
- Upload & remove a function
- Create & drop an index with a mapping
- Load a json file with a template source element
- Load a json file with a function
- Load a json file with a mapping
- Run a local search template with sample parameters to see if everything works (without upload)
- Run a local function with sample parameters (without upload)
- Run a local index with a sample doc with given data types (without upload)
- Check a local file vs the OpenSearch file and show the differences:
  - search template
  - function
  - index mapping

#### Application

The goal is to allow teams to manage OpenSearch instances easily with a templated project setup.

- Have a yaml config and a yaml list of files
- Create a sample json structure for all files
- Sync things based on the yaml file:
  - sync between local json files and the yaml list
  - sync between the local yaml list and OpenSearch
- Split everything between envs: have a yaml file with env definitions
- Nice to have: lint templates?
  - vscode's default json linter fails to lint templates due to the parameters syntax: {{#bla}}

### Tasks

Tasks are kept in the GitHub project Osman.