Software Heritage search service
Project description
Search service for the Software Heritage archive.
It is similar to swh-storage in what it contains, but provides different ways to query it. While swh-storage is mostly a key-value store that returns an object from a primary key, swh-search focuses on reverse indices, which allow finding objects that match some criteria, for example through full-text search.
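The difference between a primary-key lookup and a reverse index can be illustrated with a toy sketch. This is not how swh-search is implemented; the sample data, the `tokenize` helper, and the function names are all hypothetical and only show the general idea.

```python
from collections import defaultdict

# Toy "archive": primary key -> object (hypothetical sample data).
objects = {
    "origin-1": "https://example.org/alice/parser.git",
    "origin-2": "https://example.org/bob/parser-tools.git",
    "origin-3": "https://example.org/alice/website.git",
}


def tokenize(text):
    """Split a string into lowercase alphanumeric tokens."""
    tokens, token = [], ""
    for ch in text.lower():
        if ch.isalnum():
            token += ch
        elif token:
            tokens.append(token)
            token = ""
    if token:
        tokens.append(token)
    return tokens


def build_reverse_index(objs):
    """Map every token to the set of keys whose value contains it."""
    index = defaultdict(set)
    for key, value in objs.items():
        for tok in tokenize(value):
            index[tok].add(key)
    return index


def search(index, query):
    """Return the keys whose value contains every token of the query."""
    toks = tokenize(query)
    if not toks:
        return set()
    result = set(index.get(toks[0], set()))
    for tok in toks[1:]:
        result &= index.get(tok, set())
    return result


index = build_reverse_index(objects)

# Key-value access: the caller must already know the primary key.
print(objects["origin-1"])

# Reverse-index access: find keys from a criterion on the content.
print(sorted(search(index, "alice")))
print(sorted(search(index, "alice parser")))
```

The key-value lookup needs the primary key up front, while the reverse index answers "which objects mention this term?" without scanning every object.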
It currently uses ElasticSearch and provides only origin search (by URL and metadata).
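To give a rough idea of what a URL-based origin search could look like at the ElasticSearch level, here is a minimal sketch that builds a search request body using the standard `match` query. The field name `url` and the helper `origin_url_query` are assumptions made for illustration; the actual index mapping and queries used by swh-search may differ.

```python
import json


def origin_url_query(pattern, limit=10):
    """Build a search request body that matches origins by URL.

    The "url" field name is hypothetical; it is not taken from the
    swh-search index mapping.
    """
    return {
        "size": limit,
        "query": {
            "match": {
                # Require every term of the pattern to be present.
                "url": {"query": pattern, "operator": "and"},
            }
        },
    }


print(json.dumps(origin_url_query("parser"), indent=2))
```

The resulting dictionary is plain JSON, so it can be inspected or sent with any ElasticSearch client.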
Dependencies
Some of the Python tests for this module cannot run without a local ElasticSearch instance, so the ElasticSearch server executable must be installed on your machine. The server does not need to be running.
Debian-like host
The elasticsearch package is required. As it’s not part of debian-stable, an additional Debian repository must be configured.
Non Debian-like host
The tests expect:
/usr/share/elasticsearch/jdk/bin/java to exist.
org.elasticsearch.bootstrap.Elasticsearch to be in java’s classpath.
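Before running the tests on such a host, the first expectation can be checked with a short script. This is a minimal sketch: the helper name `java_available` is hypothetical, and it only verifies that the Java binary exists and is executable, not that `org.elasticsearch.bootstrap.Elasticsearch` is on the classpath.

```python
import os
from pathlib import Path

# Path expected by the tests, as listed above.
DEFAULT_JAVA = Path("/usr/share/elasticsearch/jdk/bin/java")


def java_available(java_path=DEFAULT_JAVA):
    """Return True if java_path is an existing, executable file."""
    path = Path(java_path)
    return path.is_file() and os.access(path, os.X_OK)


if java_available():
    print(f"Found {DEFAULT_JAVA}")
else:
    print(f"Missing {DEFAULT_JAVA}; ElasticSearch-backed tests will likely fail")
```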
Emscripten is required to generate the tree-sitter WASM module. Run the following commands to set it up:
    cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \
    ./emsdk install latest && ./emsdk activate latest
    PATH="${PATH}:/opt/emsdk/upstream/emscripten"
Note: If emsdk isn’t found in the PATH, the tree-sitter CLI automatically pulls the emscripten/emsdk image from Docker Hub when make ts-build-wasm or make ts-build is used.
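To see which of the two paths a WASM build is likely to take, one can check whether the Emscripten compiler is reachable through the PATH. The sketch below assumes the compiler executable is named `emcc`, as it is in a standard emsdk install; `find_emcc` is a hypothetical helper name.

```python
import shutil


def find_emcc(path=None):
    """Return the full path of emcc if it is on the search path, else None.

    `path` overrides the PATH environment variable, which is mostly
    useful for testing.
    """
    return shutil.which("emcc", path=path)


location = find_emcc()
if location:
    print(f"emcc found at {location}")
else:
    print("emcc not found in PATH; the Docker image fallback may be used")
```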
Make targets
The following make targets can be run from the root directory of swh-search to build and/or run swh-search under various configurations:
ts-install: Install node_modules and emscripten SDK required for TreeSitter
ts-generate: Generate parser files (C and JSON) from the grammar
ts-repl: Starts a web-based playground for the TreeSitter grammar. It’s the recommended way to develop the TreeSitter grammar.
ts-dev: Parse the query_language/sample_query and print the corresponding syntax expression along with the start and end positions of all the nodes.
ts-dev sanitize=1: Same as ts-dev but without start and end position of the nodes. This format is expected by TreeSitter’s native test command. sanitize=1 cleans the output of ts-dev using sed to achieve the desired format.
ts-test: Executes TreeSitter’s native tests
ts-build-so: Generates swh_ql.so file from the previously generated parser using py-tree-sitter
ts-build-wasm: Generates swh_ql.wasm file from the previously generated parser using emscripten
ts-build: Executes both ts-build-so and ts-build-wasm
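The effect of sanitize=1 on the ts-dev output can be sketched in Python. The Makefile does this with sed; the regular expression below is an equivalent written under the assumption that positions appear in the tree-sitter CLI's usual ` [row, col] - [row, col]` form. The node names in the sample are hypothetical.

```python
import re

# Matches a position range such as " [0, 0] - [0, 14]".
POSITION = re.compile(r" \[\d+, \d+\] - \[\d+, \d+\]")


def sanitize(tree_text):
    """Strip start and end positions from a printed syntax tree."""
    return POSITION.sub("", tree_text)


# Hypothetical ts-dev style output with node positions.
sample = "(query [0, 0] - [0, 14]\n  (filter [0, 0] - [0, 14]))"
print(sanitize(sample))
```

The stripped form keeps only the node structure, which is what TreeSitter’s native test files compare against.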
File details
Details for the file swh_search-0.23.0.tar.gz.
File metadata
- Download URL: swh_search-0.23.0.tar.gz
- Upload date:
- Size: 83.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7f26fed21a293d403bbf87a4f1e9accb5ee3e5e3dc65f290fdde3cdf16b3398b |
| MD5 | 3ecd1516ee970b55e1b0ce4c3fd23322 |
| BLAKE2b-256 | fe899f4e77d12768fe55c1441cb256188bff9de26577170e01a05d02051f6e29 |
File details
Details for the file swh_search-0.23.0-py3-none-any.whl.
File metadata
- Download URL: swh_search-0.23.0-py3-none-any.whl
- Upload date:
- Size: 94.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd2187939d522fba08d7053d94bdc8a808cce786ca76df8ed827188727550940 |
| MD5 | d608b8ea2bce2e96f7e15209c2300cc5 |
| BLAKE2b-256 | 7ab94d436a391a786499571cd8d1958a1fb9557558fbc0834643159a017296a9 |