Software Heritage search service
swh-search
Search service for the Software Heritage archive.
It is similar to swh-storage in what it contains, but provides different ways to query it: while swh-storage is mostly a key-value store that returns an object from a primary key, swh-search is focused on reverse indices, to allow finding objects that match some criteria; for example full-text search.
It currently uses Elasticsearch and provides only origin search (by URL and metadata).
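To illustrate the difference between a key-value lookup and a reverse index, here is a minimal, self-contained sketch. It is not how swh-search or Elasticsearch is implemented; the sample origins and the function names (tokenize, build_reverse_index, search) are hypothetical and exist only for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample data: primary key -> origin URL.
ORIGINS = {
    1: "https://github.com/example/foo",
    2: "https://gitlab.com/example/bar",
    3: "https://github.com/other/foo-bar",
}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_reverse_index(origins):
    """Map each token to the set of keys whose URL contains it."""
    index = defaultdict(set)
    for key, url in origins.items():
        for token in tokenize(url):
            index[token].add(key)
    return index


def search(index, query):
    """Return the keys of origins that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_reverse_index(ORIGINS)
# Key-value style: one object from its primary key.
by_key = ORIGINS[2]
# Reverse-index style: all objects matching a criterion.
matches = search(index, "github foo")
```

The key-value lookup needs the primary key up front, while the reverse index answers "which origins mention these words?" without scanning every entry.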
Dependencies
- Python tests for this module include tests that cannot be run without a local Elasticsearch instance, so you need the Elasticsearch server executable on your machine (a running Elasticsearch server is not needed).
  - Debian-like host: The elasticsearch package is required. As it is not part of debian-stable, another Debian repository must be configured.
  - Non Debian-like host: The tests expect:
    - /usr/share/elasticsearch/jdk/bin/java to exist.
    - org.elasticsearch.bootstrap.Elasticsearch to be in java's classpath.
- Emscripten is required for generating the tree-sitter WASM module. Run the following commands to set it up:
    cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \
    ./emsdk install latest && ./emsdk activate latest
    PATH="${PATH}:/opt/emsdk/upstream/emscripten"
  Note: If emsdk isn't found in the PATH, the tree-sitter CLI automatically pulls the emscripten/emsdk image from Docker Hub when make ts-build-wasm or make ts-build is used.
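The prerequisites above can be checked before running the tests or the build. The following is a rough sketch, not part of swh-search: the function name check_prerequisites is hypothetical, and looking for the emcc compiler on the PATH is an assumption about how to detect a usable Emscripten install. It does not verify the Java classpath requirement.

```python
import os
import shutil

# Path the tests expect on non Debian-like hosts (taken from the list above).
ES_JAVA = "/usr/share/elasticsearch/jdk/bin/java"


def check_prerequisites(java_path=ES_JAVA, which=shutil.which):
    """Return a dict mapping each prerequisite to True or False.

    `which` is injectable so the check can be exercised without
    touching the real PATH.
    """
    java_ok = os.path.isfile(java_path) and os.access(java_path, os.X_OK)
    # Assumption: a working Emscripten setup exposes `emcc` on the PATH.
    emcc_ok = which("emcc") is not None
    return {"elasticsearch java": java_ok, "emcc on PATH": emcc_ok}


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A missing emcc is not necessarily fatal, since the tree-sitter CLI can fall back to the Docker image as noted above.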
Make targets
Below is the list of available make targets that can be executed from the root directory of swh-search in order to build and/or execute swh-search under various configurations:
- ts-install: Installs node_modules and the emscripten SDK required for TreeSitter.
- ts-generate: Generates parser files (C and JSON) from the grammar.
- ts-repl: Starts a web-based playground for the TreeSitter grammar. It is the recommended way to develop the TreeSitter grammar.
- ts-dev: Parses query_language/sample_query and prints the corresponding syntax expression along with the start and end positions of all the nodes.
- ts-dev sanitize=1: Same as ts-dev but without the start and end positions of the nodes. This format is expected by TreeSitter's native test command. sanitize=1 cleans the output of ts-dev using sed to achieve the desired format.
- ts-test: Executes TreeSitter's native tests.
- ts-build-so: Generates the swh_ql.so file from the previously generated parser using py-tree-sitter.
- ts-build-wasm: Generates the swh_ql.wasm file from the previously generated parser using emscripten.
- ts-build: Executes both ts-build-so and ts-build-wasm.
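To make the effect of sanitize=1 concrete, here is a small sketch of the same idea in Python. The Makefile uses sed and its exact expression is not shown here, so this is only an approximation; it assumes the usual tree-sitter parse output where each node is followed by a range such as [0, 0] - [0, 17]. The node names in the sample are hypothetical.

```python
import re

# Matches a tree-sitter position range such as " [0, 0] - [0, 17]".
POSITION = re.compile(r" \[\d+, \d+\] - \[\d+, \d+\]")


def strip_positions(tree_text):
    """Remove start/end positions from a printed syntax tree."""
    return POSITION.sub("", tree_text)


# Hypothetical output in the style of ts-dev (node names are made up).
sample = "(query [0, 0] - [0, 17]\n  (filter [0, 0] - [0, 17]))"
sanitized = strip_positions(sample)
print(sanitized)
```

The sanitized form keeps only the node structure, which is what a test expectation needs to compare against.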
Hashes for swh.search-0.13.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | eefdcb423ae25c2f5b1f1d618a3e8efce2a7a46549bb0ed3b5f725324cfbd395
MD5 | 81e7c5b58b1909f26a6c6a01b0da0e94
BLAKE2b-256 | e5a53e5e87c3e91771b7badb5e57eeceb1d8a479cf5fb664dffefca34697e909