
A Python client for interacting with ScribeHub.


Scribe Python Client

The Scribe Python Client is a library for interacting with the ScribeHub API. It provides a simple interface for accessing datasets, querying vulnerabilities, and managing products.

Installation

Install the package using pip:

pip install scribe-python-client

Usage

The client requires an API token for authentication. You can obtain your API token from the ScribeHub dashboard. Pass the token to the CLI with the --api-token argument, or set the SCRIBE_TOKEN environment variable to avoid passing --api-token on every call:

export SCRIBE_TOKEN=YOUR_API_TOKEN
scribe-client --api-call get-products
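The same fallback can be mirrored in your own scripts. The sketch below is a minimal illustration, not the client's implementation: `resolve_token` is a hypothetical helper, and the precedence it applies (an explicit value wins over `SCRIBE_TOKEN`) is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_token(cli_token: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit token if given, else SCRIBE_TOKEN from the environment.

    Hypothetical helper; the precedence shown here is an assumption.
    """
    token = cli_token or env.get("SCRIBE_TOKEN")
    if not token:
        raise ValueError("No API token: pass --api-token or set SCRIBE_TOKEN")
    return token


# An explicit value is used as-is; otherwise the environment variable is read.
print(resolve_token("explicit-token", env={"SCRIBE_TOKEN": "env-token"}))
print(resolve_token(None, env={"SCRIBE_TOKEN": "env-token"}))
```

The resolved value can then be passed to `ScribeClient(api_token=...)`, as shown under Library Usage.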

CLI Usage

The package includes a CLI tool for quick interactions. After installation, you can use the scribe-client command. Below are examples for all supported commands:

Examples

Get Products

Retrieve a list of products managed in Scribe:

scribe-client --api-call get-products --api-token YOUR_API_TOKEN

Get Product Vulnerabilities

Retrieve vulnerabilities for a specific product:

scribe-client --api-call get-product-vulnerabilities --product-name YOUR_PRODUCT_NAME --api-token YOUR_API_TOKEN

Get Policy Results

Retrieve policy results for a specific product:

scribe-client --api-call get-policy-results --product-name YOUR_PRODUCT_NAME --api-token YOUR_API_TOKEN

Get Datasets

Retrieve all datasets:

scribe-client --api-call get-datasets --api-token YOUR_API_TOKEN

List Attestations

List all attestations:

scribe-client --api-call list-attestations --api-token YOUR_API_TOKEN

Get Attestation

Retrieve a specific attestation by ID:

scribe-client --api-call get-attestation --attestation-id YOUR_ATTESTATION_ID --api-token YOUR_API_TOKEN

Attestation IDs can be obtained from the list of attestations: search for 'id' in the output.
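If the listing is available as parsed JSON, the search for 'id' can be automated. The following is a minimal sketch under that assumption; the exact shape of the list-attestations output is not documented here, so `collect_ids` is a hypothetical helper and the sample data is made up.

```python
from typing import Any, List


def collect_ids(node: Any) -> List[Any]:
    """Recursively gather every value stored under an 'id' key."""
    found: List[Any] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "id":
                found.append(value)
            else:
                found.extend(collect_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_ids(item))
    return found


# Made-up sample; the real output structure may differ.
sample = {"attestations": [{"id": 101, "name": "sbom-a"},
                           {"id": 102, "name": "sbom-b"}]}
print(collect_ids(sample))
```

Any value found this way can be passed to --attestation-id.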

Get Latest Attestation

Retrieve the latest attestation for a specific product:

scribe-client --api-call get-latest-attestation --product-name YOUR_PRODUCT_NAME --api-token YOUR_API_TOKEN

Specific Dataset Commands

The Scribe Python Client allows you to interact with specific datasets for advanced queries and data retrieval. Below are details about these commands and examples of how to use them.

Querying Specific Datasets

You can query specific datasets such as vulnerabilities, products, policies, and lineage. These commands allow you to run custom queries and retrieve detailed information.

Query Vulnerabilities Dataset

Run a custom query on the vulnerabilities dataset:

scribe-client --api-call query-vulnerabilities --query "{\"columns\": [\"vulnerability_id\", \"severity\"], \"filters\": [{\"col\": \"severity\", \"op\": \"==\", \"val\": \"High\"}], \"orderby\": [], \"row_limit\": 10}"
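Hand-escaping the JSON in --query is error-prone. One option, sketched here, is to build the query as a Python dictionary and let `json.dumps` and `shlex.join` (Python 3.8+, POSIX shell quoting) produce the string and the command line. The query keys are the ones used in the example above; nothing else about the CLI is assumed.

```python
import json
import shlex

# Same query as the CLI example above, written as a plain dictionary.
query = {
    "columns": ["vulnerability_id", "severity"],
    "filters": [{"col": "severity", "op": "==", "val": "High"}],
    "orderby": [],
    "row_limit": 10,
}

# Serialize once, then quote the whole argument list for a POSIX shell.
query_json = json.dumps(query)
command = shlex.join(
    ["scribe-client", "--api-call", "query-vulnerabilities", "--query", query_json]
)
print(command)
```

The printed command can be pasted into a shell. To run it from Python, the same argument list can be handed to `subprocess.run`, in which case no shell quoting is needed.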

Query Products Dataset

Run a custom query on the products dataset:

scribe-client --api-call query-products --query "{\"columns\": [\"logical_app\", \"logical_app_version\"], \"filters\": [{\"col\": \"logical_app\", \"op\": \"like\", \"val\": \"%example%\"}], \"orderby\": [], \"row_limit\": 5}"

Query Policy Results Dataset

Run a custom query on the policy results dataset:

scribe-client --api-call query-policy-results --query "{\"columns\": [\"status\", \"time_evaluated\"], \"filters\": [{\"col\": \"status\", \"op\": \"==\", \"val\": \"Passed\"}], \"orderby\": [], \"row_limit\": 10}"

Query Lineage Dataset

Run a custom query on the lineage dataset:

scribe-client --api-call query-lineage --query "{\"columns\": [\"asset_name\", \"asset_type\"], \"filters\": [{\"col\": \"asset_type\", \"op\": \"==\", \"val\": \"repo\"}], \"orderby\": [], \"row_limit\": 10}"

Run a custom query on the lineage dataset and create a graph of the lineage:

scribe-client --api-call query-lineage --query "{\"columns\": [\"asset_name\", \"asset_type\", \"parent_name\", \"parent_type\", \"external_id\", \"parent_external_id\", \"uri\"], \"filters\": [{\"col\": \"logical_app\", \"op\": \"==\", \"val\": \"Astro-Analytics-Discovery\"}, {\"col\": \"logical_app_version\", \"op\": \"==\", \"val\": \"36\"}], \"orderby\": []}" --lineage-graph-file lineage-graph.html

Note that the columns in the query are the minimal set required to create a lineage graph.
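To see why these columns matter, consider how parent/child pairs turn into graph edges. This is only an illustrative sketch: it assumes the query result is a list of dictionaries keyed by column name, and it does not reproduce how the client builds its graph. `lineage_edges` and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def lineage_edges(rows: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (parent_external_id, external_id) pairs for rows that have a parent."""
    edges = []
    for row in rows:
        parent = row.get("parent_external_id")
        child = row.get("external_id")
        if parent and child:
            edges.append((parent, child))
    return edges


# Made-up rows using the column names from the query above.
rows = [
    {"asset_name": "org/repo", "asset_type": "repo", "parent_name": None,
     "parent_type": None, "external_id": "r1", "parent_external_id": None, "uri": None},
    {"asset_name": "app-image", "asset_type": "image", "parent_name": "org/repo",
     "parent_type": "repo", "external_id": "i1", "parent_external_id": "r1", "uri": None},
]
print(lineage_edges(rows))
```

Without `external_id` and `parent_external_id` there would be nothing to link nodes with, and without the name and type columns the nodes could not be labeled.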

Notes

  • Replace the value of the --query argument with your own query in JSON format.
  • Ensure that the query structure matches the dataset schema for accurate results.
  • Use the --api-token argument or set the SCRIBE_TOKEN environment variable for authentication.
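A quick local check can catch schema mismatches before a query is sent. The sketch below is a hypothetical helper, not part of the client: it parses the JSON and reports any column named in `columns` or `filters` that is not in a set of known column names you supply (for example, from the table descriptions in this document).

```python
import json
from typing import Iterable, List


def check_query(query_json: str, known_columns: Iterable[str]) -> List[str]:
    """Return a list of problems found in the query; an empty list means none."""
    known = set(known_columns)
    try:
        query = json.loads(query_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(query, dict):
        return ["query must be a JSON object"]

    problems = []
    for column in query.get("columns", []):
        if column not in known:
            problems.append(f"unknown column: {column}")
    for flt in query.get("filters", []):
        col = flt.get("col") if isinstance(flt, dict) else None
        if col not in known:
            problems.append(f"unknown filter column: {col}")
    return problems


known = {"asset_name", "asset_type"}
good = '{"columns": ["asset_name"], "filters": [{"col": "asset_type", "op": "==", "val": "repo"}], "orderby": []}'
bad = '{"columns": ["asset_kind"], "filters": [], "orderby": []}'
print(check_query(good, known))
print(check_query(bad, known))
```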

Library Usage

You can also use the library programmatically in your Python code:

from scribe_python_client.client import ScribeClient

# Initialize the client
client = ScribeClient(api_token="YOUR_API_TOKEN")

# Get products
products = client.get_products()
print(products)

# Get datasets
datasets = client.get_datasets()
print(datasets)

Features

  • Get Products: Retrieve a list of products managed in Scribe.
  • Query Datasets: Query datasets for vulnerabilities, policy results, and more.
  • CLI Support: Use the scribe-client command for quick API interactions.

Function Groups

The library provides the following hierarchical function groups:

1. Product Management

  • Get Products: Retrieve a list of products managed in Scribe.
  • Get Product Vulnerabilities: Retrieve vulnerabilities for a specific product.

2. Dataset Management

  • Get Datasets: Retrieve all datasets.
  • Query Datasets: Query datasets for vulnerabilities, policy results, and more.

3. Policy Management

  • Get Policy Results: Retrieve policy results for a specific product.

4. Attestation Management

  • List Attestations: List all attestations.
  • Get Attestation: Retrieve a specific attestation by ID.
  • Get Latest Attestation: Retrieve the latest attestation for a specific product.

Tables Description

Table descriptions are part of this Python package, in the docs/ folder. These descriptions are retrieved from the ScribeHub Superset infrastructure, which requires a username and password for the Superset instance (not the Scribe token).

Prompt Templates for Dataset Queries

The Scribe Python Client supports prompt templates for dataset queries. This allows you to customize the instructions and context provided to users or models when interacting with specific datasets.

How It Works

  • For each dataset, you can provide a Markdown template file in the docs/ directory, named <dataset>-template.md (with spaces in the dataset name replaced by underscores).
  • The template should contain the special placeholder {table} where the dataset's table description will be inserted.
  • If no template file is found, the default template is simply {table}.
  • The prompt is generated using the get_dataset_prompt method of ScribeClient.

Example: Lineage Dataset

For the lineage dataset, the template file is:

docs/extended_lineage_new-template.md

This file contains example queries and instructions for using the query_lineage method, followed by the placeholder {table}:

# Lineage Querying

You can query the lineage dataset using the `query_lineage` method.
The query is a Superset query JSON string, as in the following examples:

... (example queries) ...

The full table is here: {table}

When you call:

client.get_dataset_prompt("extended lineage new")

The client will load docs/extended_lineage_new-template.md, insert the lineage table description at {table}, and return the full prompt string.

This makes it easy to provide rich, context-aware instructions for any dataset in your project.
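The lookup and substitution described above can be sketched in a few lines. This is a stand-in written for illustration, not the source of `get_dataset_prompt`: `render_dataset_prompt` and its parameters are hypothetical, and the table description is passed in rather than loaded by the client.

```python
import tempfile
from pathlib import Path


def render_dataset_prompt(dataset: str, table_description: str, docs_dir: Path) -> str:
    """Fill the {table} placeholder of <dataset>-template.md, or fall back to {table}."""
    # Spaces in the dataset name become underscores in the file name.
    template_path = docs_dir / f"{dataset.replace(' ', '_')}-template.md"
    if template_path.is_file():
        template = template_path.read_text(encoding="utf-8")
    else:
        template = "{table}"  # default template when no file exists
    # Plain replacement leaves any other braces (such as JSON examples) untouched.
    return template.replace("{table}", table_description)


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "extended_lineage_new-template.md").write_text(
        "# Lineage Querying\n\nThe full table is here: {table}\n", encoding="utf-8"
    )
    print(render_dataset_prompt("extended lineage new", "| asset_name | ... |", docs))
    print(render_dataset_prompt("some other dataset", "| column | ... |", docs))
```

The sketch uses `str.replace` rather than `str.format` because templates that contain JSON query examples are full of literal braces, which `str.format` would try to interpret.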


The following is a sample of the table descriptions:

### `query_vulnerabilities` Columns
| Column Name                     | Description                                    |
|---------------------------------|------------------------------------------------|
| `advisory_justification`        | Justification for advisory decision           |
| `advisory_modified`             | Advisory creation timestamp                   |
| `advisory_status`               | Advisory decision status                      |
| `advisory_text`                 | Additional advisory information               |
| `attestation_ids`               | IDs for SBOM attestations                     |
| `attestation_name`              | SBOM attestation name                         |
| `base_score`                    | CVSS base score                               |
| `component_id`                  | Dependency ID                                 |
| `component_locations`           | Dependency locations in the product           |
| `component_name`                | Dependency name                               |
| `component_purl`                | Dependency Package URL                        |
| `component_version`             | Dependency version                            |
| `cvss_score`                    | CVSS score                                    |
| `epssProbability`               | Exploitability probability                    |
| `final_severity`                | Updated severity by user                      |
| `has_fix`                       | Is a patch available?                         |
| `has_kev`                       | Known Exploited Vulnerability?                |
| `id`                            | ID                                            |
| `is_latest_logical_version`     | Is this the latest product version?           |
| `labels`                        | User-defined labels for SBOM                  |
| `logical_app`                   | Product name                                  |
| `logical_app_version`           | Product version                               |
| `severity`                      | Original severity (integer, cvss score)       |
| `source_layer`                  | Image layer source of vulnerability           |
| `targetName`                    | Container/component name                      |
| `vector`                        | CVSS vector                                   |
| `version_timestamp`             | Timestamp of version                          |
| `vul_component_created`         | Dependency creation date                      |
| `vul_component_fixed_in_versions` | Fixed versions for the vulnerability        |
| `vul_published_on`              | Vulnerability publication date                |
| `vulnerability_id`              | Vulnerability ID (e.g., CVE-2024-5535)        |

### `query_products` Columns

When a user says "component", they mean a container; when they say "dependency", they mean what the table calls a component.
All conditions should be in the filter part, NOT in the group by.

| Column Name            | Description                                          |
|------------------------|------------------------------------------------------|
| `base_layer`           | "TRUE" if the dependency is part of the base layer, otherwise "FALSE" |
| `component_name`       | Dependency name                                      |
| `component_purl`       | Dependency Package URL                               |
| `component_version`    | Dependency version                                   |
| `license_expression`   | License information                                  |
| `logical_app`          | Product name                                         |
| `logical_app_version`  | Product version                                      |
| `high_severity_cves`   | Count of critical/high vulnerabilities               |
| `labels`               | User-defined labels                                  |
| `version_is_up_to_date`| Is the dependency version up-to-date?                |
| `targetName`           | Name of a part of a product (a high-level component, such as a Docker image) |
| `tag`                  | Tag/version of a part of a product (a high-level component, such as a Docker image) |

### `query_policy_results` Columns
| Column Name           | Description                                                                 |
|-----------------------|-----------------------------------------------------------------------------|
| `time_evaluated`      | Timestamp when the policy was evaluated.                                   |
| `logical_app`         | Product name.                                                              |
| `logical_app_version` | Product version.                                                           |
| `initiative_id`       | Identifier for the specific initiative associated with the policy.         |
| `version_id`          | Identifier for the version of the initiative or rule.                      |
| `gen_rule_id`         | Unique identifier for the general rule.                                    |
| `gen_rule_name`       | Name of the general rule.                                                  |
| `status`              | Rule result (e.g., pass, fail).                                            |
| `status_string`       | Detailed textual description of the rule result status.                    |
| `targetName`          | Name of the specific target being evaluated (component)                    |
| `gate`                | Checkpoint where the rule was evaluated.                                   |
| `count`               | Number of results.                                                         |
| `more`                | Additional information or metadata about the evaluation (if available).    |

### `query_lineage` Columns

When asked about products, use the `logical_app` and `logical_app_version` columns, not `parent_name`.

| Column Name           | Description                                                                 |
|-----------------------|-----------------------------------------------------------------------------|
| `asset_name`         | Name of the asset.                                                          |
| `asset_type`         | Type of the asset (e.g., repo, image, pod).                                 |
| `external_id`        | External identifier for the asset.                                          |
| `logical_app`        | Product name.                                                              |
| `logical_app_version` | Product version.                                                           |
| `owner`              | Owner of the asset, if applicable.                                          |
| `parent_external_id` | External identifier of the parent asset.                                    |
| `parent_id`          | Unique identifier of the parent asset.                                      |
| `parent_name`        | Name of the parent asset.                                                   |
| `parent_type`        | Type of the parent asset.                                                   |
| `path`              | Relative or absolute path to the asset.                                     |
| `platform_name`      | Name of the platform hosting the asset.                                     |
| `platform_type`      | Type of platform (e.g., SCM, namespace).                                    |
| `product_id`        | Unique identifier for the product.                                          |
| `properties`         | Additional properties of the asset, as a json string                       |
| `timestamp`         | Timestamp when the asset was recorded.                                      |
| `uri`               | URI linking to the asset, if available.                                     |

