Software Heritage search service

Search service for the Software Heritage archive.

It holds data similar to swh-storage but offers different ways to query it. While swh-storage is mostly a key-value store that returns an object given its primary key, swh-search focuses on reverse indices, which allow finding objects that match some criteria, for example through full-text search.

It currently uses ElasticSearch and provides only origin search (by URL and metadata).
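
The contrast between primary-key lookup and a reverse index can be illustrated with a minimal Python sketch. It does not reflect how swh-search or ElasticSearch is implemented; the sample origins and the `tokenize`, `build_reverse_index` and `search` helpers are hypothetical names introduced only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample data: origin URL (primary key) -> metadata text.
ORIGINS = {
    "https://example.org/alice/parser": "A fast JSON parser",
    "https://example.org/bob/search": "Full-text search engine",
    "https://example.org/carol/json-tools": "Command line JSON tools",
}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_reverse_index(origins):
    """Map every token found in a URL or its metadata to the matching URLs."""
    index = defaultdict(set)
    for url, metadata in origins.items():
        for token in tokenize(url) + tokenize(metadata):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Key-value style access: the caller must already know the primary key.
metadata = ORIGINS["https://example.org/bob/search"]

# Reverse-index style access: find origins matching some criteria.
INDEX = build_reverse_index(ORIGINS)
json_origins = search(INDEX, "json")
```

In this toy version the key-value lookup needs the exact URL, whereas `search` finds every origin whose URL or metadata mentions the query terms.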

Dependencies

  • The Python tests for this module include tests that cannot run without a local ElasticSearch instance, so the ElasticSearch server executable must be installed on your machine (a running ElasticSearch server is not needed).

    • Debian-like host

      The elasticsearch package is required. As it is not part of Debian stable, an additional Debian repository must be configured.

    • Non Debian-like host

      The tests expect:

      • /usr/share/elasticsearch/jdk/bin/java to exist.

      • org.elasticsearch.bootstrap.Elasticsearch to be in java’s classpath.

  • Emscripten is required to generate the tree-sitter WASM module. Run the following commands to set it up:

    cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \
    ./emsdk install latest && ./emsdk activate latest
    PATH="${PATH}:/opt/emsdk/upstream/emscripten"

    Note: If emsdk isn’t found in the PATH, the tree-sitter CLI automatically pulls the emscripten/emsdk image from Docker Hub when make ts-build-wasm or make ts-build is used.
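
The requirements listed above can be checked before running the tests or the build. The following Python sketch is one possible preflight check, not part of swh-search: the `check_prerequisites` function is a hypothetical helper, and it assumes that finding the `emcc` executable on the PATH is a reasonable proxy for a usable emsdk. It does not verify that org.elasticsearch.bootstrap.Elasticsearch is in java's classpath.

```python
import os
import shutil

# Path taken from the requirements listed above.
DEFAULT_JAVA = "/usr/share/elasticsearch/jdk/bin/java"


def check_prerequisites(java_path=DEFAULT_JAVA, emcc_name="emcc"):
    """Return a dict describing which local prerequisites were found."""
    return {
        # The tests expect this bundled JDK binary to exist.
        "java": os.path.isfile(java_path) and os.access(java_path, os.X_OK),
        # Assumption: emcc on the PATH indicates that emsdk is set up.
        "emscripten": shutil.which(emcc_name) is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If `emscripten` is reported as missing, the Docker fallback described in the note may still allow the WASM build to proceed.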

Make targets

The following make targets can be run from the root directory of swh-search to build and/or run swh-search under various configurations:

  • ts-install: Installs node_modules and the emscripten SDK required for TreeSitter

  • ts-generate: Generates parser files (C and JSON) from the grammar

  • ts-repl: Starts a web-based playground for the TreeSitter grammar. It’s the recommended way to develop the TreeSitter grammar.

  • ts-dev: Parse the query_language/sample_query and print the corresponding syntax expression along with the start and end positions of all the nodes.

  • ts-dev sanitize=1: Same as ts-dev but without the start and end positions of the nodes. This format is expected by TreeSitter’s native test command. sanitize=1 cleans the output of ts-dev using sed to achieve the desired format.

  • ts-test: Executes TreeSitter’s native tests

  • ts-build-so: Generates the swh_ql.so file from the previously generated parser using py-tree-sitter

  • ts-build-wasm: Generates the swh_ql.wasm file from the previously generated parser using emscripten

  • ts-build: Executes both ts-build-so and ts-build-wasm
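
The effect of sanitize=1 on the ts-dev output can be sketched in Python. The Makefile itself uses sed, so this is only an equivalent idea and not the project's actual expression. The sketch assumes each node is followed by a position range of the form `[row, column] - [row, column]`, and the node names in the sample are hypothetical.

```python
import re

# Assumed shape of a position range printed after each node name,
# for example " [0, 0] - [0, 12]".
POSITION_RANGE = re.compile(r" \[\d+, \d+\] - \[\d+, \d+\]")


def sanitize(parse_output):
    """Remove start and end positions from a printed syntax tree."""
    return POSITION_RANGE.sub("", parse_output)


# Hypothetical node names; the real ones come from the swh-search grammar.
SAMPLE = (
    "(query [0, 0] - [0, 12]\n"
    "  (filter [0, 0] - [0, 12]\n"
    "    (field [0, 0] - [0, 6])\n"
    "    (value [0, 9] - [0, 12])))"
)

print(sanitize(SAMPLE))
```

The sanitized tree keeps only the node structure, which is the form that can be compared against expected trees in TreeSitter's native tests.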
