Skip to main content

A Python client for interacting with ScribeHub.

Project description

Scribe Python Client

The Scribe Python Client is a library for interacting with the ScribeHub API. It provides a simple interface for accessing datasets, querying vulnerabilities, and managing products.

Installation

Install the package using pip:

pip install scribe-python-client

Usage

The client requires an API token for authentication. You can obtain your API token from the ScribeHub dashboard. The CLI supports providing the SCRIBE_TOKEN as an argument, --api-key. You can set the SCRIBE_TOKEN environment variable to avoid passing the --api_token argument:

export SCRIBE_TOKEN=YOUR_API_TOKEN
scribe-client --api_call get-products

CLI Usage

The package includes a CLI tool for quick interactions. After installation, you can use the scribe-client command. Below are examples for all supported commands:

Examples

Get Products

Retrieve a list of products managed in Scribe:

scribe-client --api-call get-products --api-token YOUR_API_TOKEN

Get Product Vulnerabilities

Retrieve vulnerabilities for a specific product:

scribe-client --api-call get-product-vulnerabilities --product-name YOUR_PRODUCT_NAME --api-token YOUR_API_TOKEN

Get Policy Results

Retrieve policy results for a specific product:

scribe-client --api-call get-policy-results --product-name YOUR_PRODUCT_NAME --api-token YOUR_API_TOKEN

Get Datasets

Retrieve all datasets:

scribe-client --api-call get-datasets --api-token YOUR_API_TOKEN

List Attestations

List all attestations:

scribe-client --api-call list-attestations --api-token YOUR_API_TOKEN

Get Attestation

Retrieve a specific attestation by ID:

scribe-client --api-call get-attestation --attestation-id YOUR_ATTESTATION_ID --api-token YOUR_API_TOKEN

Attestation IDs ca n be obtained from the list of attestations - search for 'id' in the output.

Get Latest Attestation

Retrieve the latest attestation for a specific product:

scribe-client --api-call get-latest-attestation --product-name YOUR_PRODUCT_NAME --api-token YOUR_API_TOKEN

Specific Dataset Commands

The Scribe Python Client allows you to interact with specific datasets for advanced queries and data retrieval. Below are details about these commands and examples of how to use them.

Querying Specific Datasets

You can query specific datasets such as vulnerabilities, products, policies, and lineage. These commands allow you to run custom queries and retrieve detailed information.

Query Vulnerabilities Dataset

Run a custom query on the vulnerabilities dataset:

scribe-client --api-call query-vulnerabilities --query "{\"columns\": [\"vulnerability_id\", \"severity\"], \"filters\": [{\"col\": \"severity\", \"op\": \"==\", \"val\": \"High\"}], \"orderby\": [], \"row_limit\": 10}"

Query Products Dataset

Run a custom query on the products dataset:

scribe-client --api-call query-products --query "{\"columns\": [\"logical_app\", \"logical_app_version\"], \"filters\": [{\"col\": \"logical_app\", \"op\": \"like\", \"val\": \"%example%\"}], \"orderby\": [], \"row_limit\": 5}"

Query Policy Results Dataset

Run a custom query on the policy results dataset:

scribe-client --api-call query-policy-results --query "{\"columns\": [\"status\", \"time_evaluated\"], \"filters\": [{\"col\": \"status\", \"op\": \"==\", \"val\": \"Passed\"}], \"orderby\": [], \"row_limit\": 10}"

Query Lineage Dataset

Run a custom query on the lineage dataset:

scribe-client --api-call query-lineage --query "{\"columns\": [\"asset_name\", \"asset_type\"], \"filters\": [{\"col\": \"asset_type\", \"op\": \"==\", \"val\": \"repo\"}], \"orderby\": [], \"row_limit\": 10}"

Run a custom query on the lineage dataset and create a graph of the lineage:

scribe-client --api-call query-lineage --query "{\"columns\": [\"asset_name\", \"asset_type\", \"parent_name\", \"parent_type\", \"external_id\", \"parent_external_id\", \"uri\"], \"filters\": [{\"col\": \"logical_app\", \"op\": \"==\", \"val\": \"Astro-Analytics-Discovery\"}, {\"col\": \"logical_app_version\", \"op\": \"==\", \"val\": \"36\"}], \"orderby\": []}" --lineage-graph-file lineage-graph.html

Note that the columns in the query are the minimal set required to create a lineage graph.

Notes

  • Replace the --query argument with your desired query in JSON format.
  • Ensure that the query structure matches the dataset schema for accurate results.
  • Use the --api-token argument or set the SCRIBE_TOKEN environment variable for authentication.

Library Usage

You can also use the library programmatically in your Python code:

from scribe_python_client.client import ScribeClient

# Initialize the client
client = ScribeClient(api_token="YOUR_API_TOKEN")

# Get products
products = client.get_products()
print(products)

# Get datasets
datasets = client.get_datasets()
print(datasets)

Features

  • Get Products: Retrieve a list of products managed in Scribe.
  • Query Datasets: Query datasets for vulnerabilities, policy results, and more.
  • CLI Support: Use the scribe-client command for quick API interactions.

Function Groups

The library provides the following hierarchical function groups:

1. Product Management

  • Get Products: Retrieve a list of products managed in Scribe.
  • Get Product Vulnerabilities: Retrieve vulnerabilities for a specific product.

2. Dataset Management

  • Get Datasets: Retrieve all datasets.
  • Query Datasets: Query datasets for vulnerabilities, policy results, and more.

3. Policy Management

  • Get Policy Results: Retrieve policy results for a specific product.

4. Attestation Management

  • List Attestations: List all attestations.
  • Get Attestation: Retrieve a specific attestation by ID.
  • Get Latest Attestation: Retrieve the latest attestation for a specific product.

Tables Description

Table descriptions are part of this python package in the docs/ folder. Theses descriptions are consumed from ScribeHub Superset infrastructure, and require a username and password to the superset instance (not Scribe Token).

Prompt Templates for Dataset Queries

The Scribe Python Client supports prompt templates for dataset queries. This allows you to customize the instructions and context provided to users or models when interacting with specific datasets.

How It Works

  • For each dataset, you can provide a Markdown template file in the docs/ directory, named <dataset>-template.md (spaces replaced with underscores).
  • The template should contain the special placeholder {table} where the dataset's table description will be inserted.
  • If no template file is found, the default template is simply {table}.
  • The prompt is generated using the get_dataset_prompt method of ScribeClient.

Example: Lineage Dataset

For the lineage dataset, the template file is:

docs/extended_lineage_new-template.md

This file contains example queries and instructions for using the query_lineage method, followed by the placeholder {table}:

# Lineage Queryring

You can query the lineage dataset using the `query_lineage` method.
The query is a superset query json string like query keys the following examples:

... (example queries) ...

The full table is here: {table}

When you call:

client.get_dataset_prompt("extended lineage new")

The client will load docs/extended_lineage_new-template.md, insert the lineage table description at {table}, and return the full prompt string.

This makes it easy to provide rich, context-aware instructions for any dataset in your project.


Folowing is a sample of table desctiptions:

### `query_vulnerabilities` Columns
| Column Name                     | Description                                    |
|---------------------------------|------------------------------------------------|
| `advisory_justification`        | Justification for advisory decision           |
| `advisory_modified`             | Advisory creation timestamp                   |
| `advisory_status`               | Advisory decision status                      |
| `advisory_text`                 | Additional advisory information               |
| `attestation_ids`               | IDs for SBOM attestations                     |
| `attestation_name`              | SBOM attestation name                         |
| `base_score`                    | CVSS base score                               |
| `component_id`                  | Dependency ID                                 |
| `component_locations`           | Dependency locations in the product           |
| `component_name`                | Dependency name                               |
| `component_purl`                | Dependency Package URL                        |
| `component_version`             | Dependency version                            |
| `cvss_score`                    | CVSS score                                    |
| `epssProbability`               | Exploitability probability                    |
| `final_severity`                | Updated severity by user                      |
| `has_fix`                       | Is a patch available?                         |
| `has_kev`                       | Known Exploited Vulnerability?                |
| `id`                            | ID                                            |
| `is_latest_logical_version`     | Is this the latest product version?           |
| `labels`                        | User-defined labels for SBOM                  |
| `logical_app`                   | Product name                                  |
| `logical_app_version`           | Product version                               |
| `severity`                      | Original severity (integer, cvss score)       |
| `source_layer`                  | Image layer source of vulnerability           |
| `targetName`                    | Container/component name                      |
| `vector`                        | CVSS vector                                   |
| `version_timestamp`             | Timestamp of version                          |
| `vul_component_created`         | Dependency creation date                      |
| `vul_component_fixed_in_versions` | Fixed versions for the vulnerability        |
| `vul_published_on`              | Vulnerability publication date                |
| `vulnerability_id`              | Vulnerability ID (e.g., CVE-2024-5535)        |

### `query_products` Columns

When a user says component he means a container, and when he says dependency he means what the table calls components.
All conditions should be in the filter part, NOT in the group by.

| Column Name            | Description                                          |
|------------------------|------------------------------------------------------|
| `base_layer`           | "TRUE" if the dependency is part of the base layer, otherwise "FALSE" |
| `component_name`       | Dependency name                                      |
| `component_purl`       | Dependency URL                                       |
| `component_version`    | Dependency version                                   |
| `license_expression`   | License information                                  |
| `logical_app`          | Product name                                         |
| `logical_app_version`  | Product version                                      |
| `high_severity_cves`   | Count of critical/high vulnerabilities               |
| `labels`               | User-defined labels                                  |
| `version_is_up_to_date`| Is the dependency version up-to-date?                |
| `targetName`           | The name of a part of a products (high level compoenent, like a docker image) |
| `tag`                  | The tag/version of a part of a products (high level compoenent, like a docker image) |

### `query_policy_results` Columns
| Column Name           | Description                                                                 |
|-----------------------|-----------------------------------------------------------------------------|
| `time_evaluated`      | Timestamp when the policy was evaluated.                                   |
| `logical_app`         | Product name.                                                              |
| `logical_app_version` | Product version.                                                           |
| `initiative_id`       | Identifier for the specific initiative associated with the policy.         |
| `version_id`          | Identifier for the version of the initiative or rule.                      |
| `gen_rule_id`         | Unique identifier for the general rule.                                    |
| `gen_rule_name`       | Name of the general rule.                                                  |
| `status`              | Rule result (e.g., pass, fail).                                            |
| `status_string`       | Detailed textual description of the rule result status.                    |
| `targetName`          | Name of the specific target being evaluated (component)                    |
| `gate`                | Checkpoint where the rule was evaluated.                                   |
| `count`               | Number of results.                                                         |
| `more`                | Additional information or metadata about the evaluation (if available).    |

### `query_lineage` Columns

When asked about products - use the logical_app and logical_app version columns and not the parent_name.

| Column Name           | Description                                                                 |
|-----------------------|-----------------------------------------------------------------------------|
| `asset_name`         | Name of the asset.                                                          |
| `asset_type`         | Type of the asset (e.g., repo, image, pod).                                 |
| `external_id`        | External identifier for the asset.                                          |
| `logical_app`        | Product name.                                                              |
| `logical_app_version` | Product version.                                                           |
| `owner`              | Owner of the asset, if applicable.                                          |
| `parent_external_id` | External identifier of the parent asset.                                    |
| `parent_id`          | Unique identifier of the parent asset.                                      |
| `parent_name`        | Name of the parent asset.                                                   |
| `parent_type`        | Type of the parent asset.                                                   |
| `path`              | Relative or absolute path to the asset.                                     |
| `platform_name`      | Name of the platform hosting the asset.                                     |
| `platform_type`      | Type of platform (e.g., SCM, namespace).                                    |
| `product_id`        | Unique identifier for the product.                                          |
| `properties`         | Additional properties of the asset, as a json string                       |
| `timestamp`         | Timestamp when the asset was recorded.                                      |
| `uri`               | URI linking to the asset, if available.                                     |


## Prompt Templates for Dataset Queries

The Scribe Python Client supports prompt templates for dataset queries. This allows you to customize the instructions and context provided to users or models when interacting with specific datasets.

### How It Works
- For each dataset, you can provide a Markdown template file in the `docs/` directory, named `<dataset>-template.md` (spaces replaced with underscores).
- The template should contain the special placeholder `{table}` where the dataset's table description will be inserted.
- If no template file is found, the default template is simply `{table}`.
- The prompt is generated using the `get_dataset_prompt` method of `ScribeClient`.

### Example: Lineage Dataset
For the lineage dataset, the template file is:

```
docs/extended_lineage_new-template.md
```

This file contains example queries and instructions for using the `query_lineage` method, followed by the placeholder `{table}`:

```markdown
# Lineage Queryring

You can query the lineage dataset using the `query_lineage` method.
The query is a superset query json string like query keys the following examples:

... (example queries) ...

The full table is here: {table}
```

When you call:

```python
client.get_dataset_prompt("extended lineage new")
```

The client will load `docs/extended_lineage_new-template.md`, insert the lineage table description at `{table}`, and return the full prompt string.

This makes it easy to provide rich, context-aware instructions for any dataset in your project.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scribe_python_client-0.2.10.tar.gz (30.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

scribe_python_client-0.2.10-py3-none-any.whl (28.4 kB view details)

Uploaded Python 3

File details

Details for the file scribe_python_client-0.2.10.tar.gz.

File metadata

  • Download URL: scribe_python_client-0.2.10.tar.gz
  • Upload date:
  • Size: 30.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for scribe_python_client-0.2.10.tar.gz
Algorithm Hash digest
SHA256 fd9127915f27c993e5d0655294db1487c4bc54968786fc7e78809d01c81a2034
MD5 7bd744764ba8bc359f439102978abe8f
BLAKE2b-256 f81486b7f6008b0040a3108863ccb56cc18c9d9d12d46aefefcee433ce74cfe0

See more details on using hashes here.

File details

Details for the file scribe_python_client-0.2.10-py3-none-any.whl.

File metadata

File hashes

Hashes for scribe_python_client-0.2.10-py3-none-any.whl
Algorithm Hash digest
SHA256 f5b3d3dc48c634f285e62d85adae1624df6fde2f2e14f8d9ed7fb73e7060eb45
MD5 841d10b715ac995ccbe762bc916b77d6
BLAKE2b-256 a28e4808b9ecbd2938c5cae594593c111470e06cba5790a5de4758e0a67dc4ae

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page