
recap

Recap crawls and serves metadata

There are a ton of data catalogs out there, but most are complex, bloated, and built for a wide audience. Recap is the opposite of that; it's tiny and built specifically for engineers.

You can use Recap to build data tools for:

  • Observability
  • Monitoring
  • Debugging
  • Security
  • Compliance
  • Governance
  • Cost

Principles

Recap was designed to be small and flexible.

  • Lightweight: Recap starts with a single command and doesn't require other infrastructure (not even Docker).
  • CLI-first: Recap doesn't even have a GUI!
  • RESTful: Recap comes with a REST API to ease integration with outside tools.
  • Automated: Recap is not meant for manual taxonomy curation.
  • Modular: Recap's storage and crawling layers are fully pluggable.
  • Programmable: Recap is a Python library, so you can invoke it directly from your code (see the sketch just after this list).
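
For a taste of that, here's a minimal sketch of scripting Recap from Python. This README doesn't document the library's internal API, so the sketch shells out to the documented CLI and parses its JSON output; treat it as one way to drive Recap from code rather than the canonical one:

import json
import subprocess

def recap_list(path: str) -> list:
    # Shell out to the documented `recap list` command and parse its JSON output.
    result = subprocess.run(
        ["recap", "list", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)

print(recap_list("/"))  # e.g. ["databases"]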

Integrations

Recap uses SQLAlchemy to access databases, so any SQLAlchemy dialect should work. Recap has been tested with:

  • Snowflake
  • BigQuery
  • PostgreSQL

Stream and filesystem crawling is in the works.

Usage

Quickstart

Start by installing Recap. Python 3.9 or above is required.

pip install recap-core

Now let's crawl a database:

recap refresh postgresql://username@localhost/some_db

You can use any SQLAlchemy connection string:

recap refresh bigquery://some-project-12345
recap refresh snowflake://some_user:some_pass@some_account_id

For Snowflake and BigQuery, you'll have to pip install snowflake-sqlalchemy or pip install sqlalchemy-bigquery, respectively.

Crawled metadata is stored in a directory structure. See what's available using:

recap list /

Recap will respond with a JSON list:

[
  "databases"
]

Append children to the list path to browse around:

recap list /databases
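
If you'd rather see the whole tree at once, a short recursive walk over the recap list command works too. A sketch, assuming that listing a leaf node either returns an empty list or exits nonzero (in which case the walk simply stops there):

import json
import subprocess

def recap_list(path: str) -> list:
    # Same documented CLI call as before: recap list <path>.
    result = subprocess.run(
        ["recap", "list", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)

def walk(path: str = "/", depth: int = 0) -> None:
    try:
        children = recap_list(path)
    except subprocess.CalledProcessError:
        return  # assumption: leaf nodes are simply not listable
    for child in children:
        print("  " * depth + str(child))
        walk(path.rstrip("/") + "/" + str(child), depth + 1)

walk()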

After you poke around, try reading some metadata. Every node in the path can have metadata, but currently only table and view nodes do. You can look at a node's metadata using the recap read command:

recap read /databases/postgresql/instances/localhost/schemas/some_db/tables/some_table

Recap will print all of some_table's metadata to the CLI in JSON format:

{
  "access": {
    "username": {
      "privileges": [
        "INSERT",
        "SELECT",
        "UPDATE",
        "DELETE",
        "TRUNCATE",
        "REFERENCES",
        "TRIGGER"
      ],
      "read": true,
      "write": true
    }
  },
  "columns": {
    "email": {
      "autoincrement": false,
      "default": null,
      "generic_type": "VARCHAR",
      "nullable": false,
      "type": "VARCHAR"
    },
    "id": {
      "autoincrement": true,
      "default": "nextval('\"some_db\".some_table_id_seq'::regclass)",
      "generic_type": "BIGINT",
      "nullable": false,
      "type": "BIGINT"
    }
  },
  "data_profile": {
    "email": {
      "count": 10,
      "distinct": 10,
      "empty_strings": 0,
      "max_length": 32,
      "min_length": 13,
      "nulls": 0
    },
    "id": {
      "average": 5.5,
      "count": 10,
      "max": 10,
      "min": 1,
      "negatives": 0,
      "nulls": 0,
      "sum": 55.0,
      "zeros": 0
    }
  },
  "indexes": {
    "index_some_table_on_email": {
      "columns": [
        "email"
      ],
      "unique": false
    }
  },
  "location": {
    "database": "postgresql",
    "instance": "localhost",
    "schema": "some_db",
    "table": "some_table"
  },
  "primary_key": {
    "constrained_columns": [
      "id"
    ],
    "name": "some_table_pkey"
  }
}
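
Because the output is plain JSON, it's easy to post-process. Here's a small sketch that joins the columns and data_profile sections of the example above; both keys come straight from that output, though other crawlers may emit different ones:

import json
import subprocess

path = "/databases/postgresql/instances/localhost/schemas/some_db/tables/some_table"
result = subprocess.run(
    ["recap", "read", path],
    capture_output=True,
    text=True,
    check=True,
)
metadata = json.loads(result.stdout)

# Pair each column's declared type with its profiled null count.
profile = metadata.get("data_profile", {})
for name, column in metadata.get("columns", {}).items():
    nulls = profile.get(name, {}).get("nulls")
    print(f"{name}: {column['type']} nullable={column['nullable']} nulls={nulls}")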

You can search for metadata, too. Recap stores its metadata in DuckDB by default, so you can use DuckDB's JSON path syntax to search the catalog:

recap search "metadata->'$.location'->>'$.table' = 'some_table'"

If you wish to open a DuckDB client directly, the database file defaults to ~/.recap/catalog/recap.duckdb.
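
You can also query that file from Python with the duckdb package. The sketch below reuses the JSON path expression from the search example above; note that the catalog table name is my assumption, not something this README states, so inspect the schema before relying on it:

import os

import duckdb

db_path = os.path.expanduser("~/.recap/catalog/recap.duckdb")
con = duckdb.connect(db_path, read_only=True)

# The catalog's table layout isn't documented here, so look first.
print(con.execute("SHOW TABLES").fetchall())

# Reuse the JSON path filter from recap search, assuming (hypothetically)
# that the metadata lives in a JSON column named metadata in a table
# named catalog.
rows = con.execute(
    "SELECT * FROM catalog "
    "WHERE metadata->'$.location'->>'$.table' = 'some_table'"
).fetchall()
print(rows)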

API

Server

Recap comes with an API out of the box. You can start it with:

recap api

A uvicorn server will bind to http://localhost:8000 by default. You can explore the API endpoints at http://localhost:8000/docs.
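
Because the server publishes interactive docs at /docs, it very likely exposes a machine-readable spec as well. A small sketch, assuming the conventional /openapi.json path (not confirmed by this README):

import requests

# /openapi.json is the usual companion to a /docs page, but treat the
# path as an assumption and fall back to browsing /docs if it 404s.
spec = requests.get("http://localhost:8000/openapi.json").json()
for route in spec.get("paths", {}):
    print(route)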

You can pass custom uvicorn configuration by creating ~/.recap/settings.toml and setting parameters under the api namespace like:

api.host = "0.0.0.0"

Client

You can use Recap's CLI to query a remote Recap API server if you wish. Set catalog.url in settings.toml to point to your Recap API location.

catalog.url = "http://localhost:8000"

Configuration

Recap uses Dynaconf to manage configuration. By default, Recap can see anything you put into ~/.recap/settings.toml.

You can customize your settings.toml location using the SETTINGS_FILE_FOR_DYNACONF environment variable:

SETTINGS_FILE_FOR_DYNACONF=/tmp/api.toml recap list
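
The same override works when scripting Recap, for example by passing a modified environment to the CLI from Python:

import os
import subprocess

# Point Dynaconf at a custom settings file for this one invocation.
env = dict(os.environ, SETTINGS_FILE_FOR_DYNACONF="/tmp/api.toml")
subprocess.run(["recap", "list", "/"], env=env, check=True)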

You can also configure your catalog and crawlers in settings.toml. Here's an example:

[api]
host = "0.0.0.0"

[catalog]
url = "file:///tmp/recap.duckdb"

[[crawlers]]
url = "bigquery://some-project-12345"

[[crawlers]]
url = "postgresql://username@localhost/some_db"

[[crawlers]]
url = "snowflake://some_user:some_pass@some_account_id"

Warning

Recap is still a little baby application. It's going to wake up crying in the middle of the night. It's going to vomit on the floor once in a while. But if you give it some love and care, it'll be worth it. As time goes on, it'll grow up and be more mature. Bear with it.
