
Software Heritage search service

Project description

Search service for the Software Heritage archive.

It is similar to swh-storage in what it contains, but provides different ways to query it: whereas swh-storage is mostly a key-value store that returns an object given its primary key, swh-search is built around reverse indices that allow finding objects matching some criteria, for example full-text search.
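To illustrate the difference between the two access patterns, here is a minimal, in-memory Python sketch. The records, URLs and function names (`lookup`, `build_inverted_index`, `search`) are hypothetical and do not come from swh-storage or swh-search; swh-search delegates the actual indexing to ElasticSearch rather than building an index like this.

```python
from __future__ import annotations

from collections import defaultdict

# Toy origin records keyed by URL, the "primary key" in this illustration.
# Hypothetical data; not the actual swh-storage or swh-search schema.
ORIGINS: dict[str, dict[str, str]] = {
    "https://github.com/example/parser": {"description": "a fast parser library"},
    "https://gitlab.com/example/webapp": {"description": "a web application"},
    "https://github.com/example/webparser": {"description": "an HTML parser for the web"},
}


def lookup(url: str) -> dict[str, str] | None:
    """Key-value access: return the record stored under this exact key, if any."""
    return ORIGINS.get(url)


def build_inverted_index(records: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Reverse index: map each lowercase word of a description to the keys containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, record in records.items():
        for term in record["description"].lower().split():
            index[term].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the keys whose description contains every query word (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    index = build_inverted_index(ORIGINS)
    print(lookup("https://gitlab.com/example/webapp"))
    print(sorted(search(index, "parser web")))
```

The key-value lookup only answers "what is stored under this key?", while the reverse index answers "which keys match these criteria?", which is the kind of question swh-search is meant for.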

It currently uses ElasticSearch and provides only origin search (by URL and by metadata).

Dependencies

  • Some of this module's Python tests cannot run without a local ElasticSearch instance, so the ElasticSearch server executable must be installed on your machine (a running ElasticSearch server is not required).

    • Debian-like host

      The elasticsearch package is required. As it is not part of Debian stable, an additional Debian repository providing it must be configured.

    • Non Debian-like host

      The tests expect:

      • /usr/share/elasticsearch/jdk/bin/java to exist.

      • org.elasticsearch.bootstrap.Elasticsearch to be in Java's classpath.

  • Emscripten is required to generate the tree-sitter WASM module. Run the following commands to set it up:

    cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \
    ./emsdk install latest && ./emsdk activate latest
    export PATH="${PATH}:/opt/emsdk/upstream/emscripten"

    Note: If emsdk is not found in the PATH, the tree-sitter CLI automatically pulls the emscripten/emsdk image from Docker Hub when make ts-build-wasm or make ts-build is run.
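On a non-Debian-like host, a quick pre-flight check can help explain why the ElasticSearch tests fail. The sketch below is a hypothetical helper, not part of swh-search: it only verifies the first expectation listed above (that /usr/share/elasticsearch/jdk/bin/java exists and is executable) and does not check Java's classpath.

```python
import os
import sys
from pathlib import Path

# Location the ElasticSearch tests expect on non-Debian-like hosts (from the list above).
EXPECTED_JAVA = Path("/usr/share/elasticsearch/jdk/bin/java")


def java_is_available(java_path: Path = EXPECTED_JAVA) -> bool:
    """Return True if java_path exists, is a regular file and is executable.

    Note: this does NOT check that org.elasticsearch.bootstrap.Elasticsearch
    is on Java's classpath; that requirement has to be verified separately.
    """
    return java_path.is_file() and os.access(java_path, os.X_OK)


if __name__ == "__main__":
    if java_is_available():
        print(f"found {EXPECTED_JAVA}")
    else:
        print(f"missing or not executable: {EXPECTED_JAVA}", file=sys.stderr)
        sys.exit(1)
```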

Make targets

The following make targets can be run from the root directory of swh-search to build and run swh-search in various configurations:

  • ts-install: Installs the node_modules and the Emscripten SDK required by TreeSitter.

  • ts-generate: Generates the parser files (C and JSON) from the grammar.

  • ts-repl: Starts a web-based playground for the TreeSitter grammar. This is the recommended way to develop the TreeSitter grammar.

  • ts-dev: Parses query_language/sample_query and prints the corresponding syntax tree as an S-expression, along with the start and end positions of all nodes.

  • ts-dev sanitize=1: Same as ts-dev, but without the start and end positions of the nodes, which is the format expected by TreeSitter's native test command. With sanitize=1, the output of ts-dev is cleaned with sed to produce this format.

  • ts-test: Executes TreeSitter's native tests.

  • ts-build-so: Generates the swh_ql.so file from the previously generated parser using py-tree-sitter.

  • ts-build-wasm: Generates the swh_ql.wasm file from the previously generated parser using Emscripten.

  • ts-build: Executes both ts-build-so and ts-build-wasm.
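The sed command used by ts-dev sanitize=1 is not reproduced here; the following Python sketch shows the same idea, assuming that the parse output prints each node's positions as " [row, column] - [row, column]" right after the node name. The names `POSITION_RE` and `sanitize` are hypothetical, and the node names in the example are made up rather than taken from the swh_ql grammar.

```python
import re

# Matches a position annotation such as " [0, 0] - [1, 12]" (assumed output format),
# including the whitespace that precedes it.
POSITION_RE = re.compile(r"\s*\[\d+, \d+\] - \[\d+, \d+\]")


def sanitize(parse_output: str) -> str:
    """Strip node start/end positions, keeping only the tree structure."""
    return POSITION_RE.sub("", parse_output)


if __name__ == "__main__":
    raw = "(query [0, 0] - [0, 9]\n  (filter [0, 0] - [0, 9]\n    field: (field [0, 0] - [0, 6])))"
    print(sanitize(raw))
```

Running it prints the tree without positions, i.e. `(query`, `  (filter`, `    field: (field)))` on three lines, which is the shape of expected output used in TreeSitter's test files.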


Download files

Source Distribution

swh_search-0.23.0.tar.gz (83.4 kB)

Uploaded Source

Built Distribution

swh_search-0.23.0-py3-none-any.whl (94.7 kB)

Uploaded Python 3

File details

Details for the file swh_search-0.23.0.tar.gz.

File metadata

  • Download URL: swh_search-0.23.0.tar.gz
  • Upload date:
  • Size: 83.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.12

File hashes

Hashes for swh_search-0.23.0.tar.gz
Algorithm Hash digest
SHA256 7f26fed21a293d403bbf87a4f1e9accb5ee3e5e3dc65f290fdde3cdf16b3398b
MD5 3ecd1516ee970b55e1b0ce4c3fd23322
BLAKE2b-256 fe899f4e77d12768fe55c1441cb256188bff9de26577170e01a05d02051f6e29

File details

Details for the file swh_search-0.23.0-py3-none-any.whl.

File metadata

  • Download URL: swh_search-0.23.0-py3-none-any.whl
  • Upload date:
  • Size: 94.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.12

File hashes

Hashes for swh_search-0.23.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bd2187939d522fba08d7053d94bdc8a808cce786ca76df8ed827188727550940
MD5 d608b8ea2bce2e96f7e15209c2300cc5
BLAKE2b-256 7ab94d436a391a786499571cd8d1958a1fb9557558fbc0834643159a017296a9
