Redactive Python SDK

The Redactive Python SDK provides an interface to the Redactive platform, letting developers integrate its data redaction and anonymization capabilities into their Python applications.

Installation

To use the package with Redactive.ai, run:

pip install --upgrade redactive

There is no need to clone this repository.

If you would like to modify this package, clone the repo and install from source:

python -m pip install .

Requirements

  • Python 3.11+

Usage

The library has the following components:

  • AuthClient - provides functionality to interact with data sources
  • SearchClient - provides functionality to search chunks with Redactive search service
  • MultiUserClient - provides functionality to manage multi-user search with Redactive search service
  • RerankingSearchClient [Experimental] - retrieves extra results, then re-ranks them using a more precise ranking function, returning the top_k results

AuthClient

AuthClient needs to be configured with your account's API key, which is available on the Apps page of the Redactive Dashboard.

The AuthClient can be used to present users with the data providers' OAuth consent pages:

from redactive.auth_client import AuthClient

client = AuthClient(api_key="YOUR-APP'S-API-KEY")

# This value must _exactly_ match the redirect URI you provided when creating your
# Redactive app.
redirect_uri = "YOUR-APP'S-REDIRECT-URI"

# Possible data sources: confluence, sharepoint
provider = "confluence"

sign_in_url = await client.begin_connection(
    provider=provider, redirect_uri=redirect_uri
)

# Now redirect your user to sign_in_url 

The user will be redirected back to your app's configured redirect URI after they have completed the steps on the data provider's OAuth consent page. A sign-in code will be present in the code parameter of the query string, e.g. https://your-redirect-page.com?code=abcde12345.
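As a minimal sketch of reading the code parameter on the receiving side, the helper below uses only the Python standard library. The function name extract_signin_code is hypothetical and not part of the Redactive SDK; most web frameworks expose query parameters directly, in which case this helper is unnecessary.

```python
from urllib.parse import parse_qs, urlparse


def extract_signin_code(callback_url: str) -> str:
    """Return the value of the `code` query parameter from a callback URL.

    Hypothetical helper for illustration; not part of the Redactive SDK.
    """
    query = parse_qs(urlparse(callback_url).query)
    codes = query.get("code", [])
    # Reject a missing or repeated parameter instead of guessing which to use.
    if len(codes) != 1:
        raise ValueError("expected exactly one 'code' query parameter")
    return codes[0]


code = extract_signin_code("https://your-redirect-page.com?code=abcde12345")
```

The returned string is the value to pass as the code argument when exchanging tokens.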

This code may be exchanged for a user access token (which the user may use to issue queries against their data):

# Exchange signin code for a Redactive ID token
response = await client.exchange_tokens(code="SIGNIN-CODE")
access_token = response.idToken

Once a user has completed the OAuth flow, the data source should show up in their connected data sources:

response = await client.list_connections(
    access_token=access_token
)

assert "confluence" in response.connections # ✅

Use the list_connections method to keep your user's connection status up to date, and provide mechanisms to re-connect data sources.
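One way to act on the connection list is to compare it against the providers your app depends on and prompt the user to re-connect any that are missing. The sketch below assumes the connections value is a list of provider name strings, as the assertion above suggests; providers_needing_reconnect is a hypothetical helper, not an SDK function.

```python
def providers_needing_reconnect(required: list[str], connected: list[str]) -> list[str]:
    """Return the required providers that are absent from the connected list.

    Hypothetical helper; assumes provider names are plain strings.
    """
    connected_set = {name.lower() for name in connected}
    # Preserve the order of `required` so prompts appear in a stable order.
    return [name for name in required if name.lower() not in connected_set]


missing = providers_needing_reconnect(["confluence", "sharepoint"], ["confluence"])
```

For each name in missing, your app could start a new connection flow for that provider.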

SearchClient

With a Redactive access_token, you can perform two types of search:

Query-based Search

Retrieve relevant chunks of information that are related to a user query.

from redactive.search_client import SearchClient

client = SearchClient()

# Semantic Search: retrieve text extracts (chunks) from various documents pertaining to the user query
client.search_chunks(
    access_token=access_token,
    query="Tell me about AI"
)

Filters may be applied to query-based search operations. At present, the following fields may be provided as filter predicates:

message Filters {
    // Scope of the query. This may either be the name of a provider, or a subspace of documents.
    // Subspaces take the form of <provider>://<tenancy>/<path>
    // e.g. for Confluence: 'confluence://redactiveai.atlassian.net/Engineering/Engineering Onboarding Guide'
    // for Sharepoint: 'sharepoint://redactiveai.sharepoint.com/Shared Documents/Engineering/Onboarding Guide.pdf'
    repeated string scope = 1;
    // Timespan of response chunk's creation
    optional TimeSpan created = 2;
    // Timespan of response chunk's last modification
    optional TimeSpan modified = 3;
    // List of user emails associated with response chunk
    repeated string user_emails = 4;
    // Include content from documents in trash
    optional bool include_content_in_trash = 5;
}

The query will only return results which match ALL filter predicates i.e. if multiple fields are populated in the filter object, the resulting filter is the logical 'AND' of all the fields. If a data source provider does not support a filter-type, then no results from that provider are returned.
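To make the scope format concrete, here is a small sketch that splits a scope string into its provider, tenancy, and path parts, following the <provider>://<tenancy>/<path> form described in the filter comments. The Scope type and parse_scope function are hypothetical and exist only for illustration; the SDK itself accepts scope values as plain strings.

```python
from typing import NamedTuple, Optional


class Scope(NamedTuple):
    """Parts of a scope string (hypothetical type, not part of the SDK)."""

    provider: str
    tenancy: Optional[str]
    path: Optional[str]


def parse_scope(scope: str) -> Scope:
    """Split either a bare provider name or a <provider>://<tenancy>/<path> subspace."""
    provider, separator, rest = scope.partition("://")
    if not provider:
        raise ValueError("scope must start with a provider name")
    if not separator:
        # A bare provider name such as "confluence" has no tenancy or path.
        return Scope(provider, None, None)
    tenancy, _, path = rest.partition("/")
    return Scope(provider, tenancy or None, path or None)


subspace = parse_scope(
    "confluence://redactiveai.atlassian.net/Engineering/Engineering Onboarding Guide"
)
```

Here subspace.provider is "confluence", subspace.tenancy is "redactiveai.atlassian.net", and subspace.path is "Engineering/Engineering Onboarding Guide".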

Filters may be populated and provided to a query in the following way for the Python SDK:

from datetime import datetime, timedelta
from redactive.search_client import SearchClient
from redactive.grpc.v2 import Filters

client = SearchClient()

# Query chunks from Confluence only, that are from documents created before last week, modified since last week,
# and that are from documents associated with a user's email. Include chunks from trashed documents.
last_week = datetime.now() - timedelta(weeks=1)
filters = Filters().from_dict({
  "scope": ["confluence"],
  "created": {
    "before": last_week,
  },
  "modified": {
    "after": last_week,
  },
  "userEmails": ["myEmail@example.com"],
  "includeContentInTrash": True,
})
client.search_chunks(
    access_token="REDACTIVE-USER-ACCESS-TOKEN",
    query="Tell me about AI",
    filters=filters
)

Document Fetch

Obtain all the chunks from a specific document by specifying a unique reference (i.e. a URL).

# URL-based Search: retrieve all chunks of the document at that URL
client.get_document(
    access_token="REDACTIVE-USER-ACCESS-TOKEN",
    ref="https://example.com/document"
)

MultiUserClient

The MultiUserClient class helps manage multiple users' authentication and access to the Redactive search service.

from redactive.multi_user_client import MultiUserClient

multi_user_client = MultiUserClient(
    api_key="REDACTIVE-API-KEY",
    callback_uri="https://example.com/callback/",
    read_user_data=...,
    write_user_data=...,
)

# Present `connection_url` in browser for user to interact with:
user_id = ...
connection_url = await multi_user_client.get_begin_connection_url(user_id=user_id, provider="confluence")

# On user return from OAuth connection flow:
sign_in_code, state = ..., ...  # from URL query parameters
is_connection_successful = await multi_user_client.handle_connection_callback(
    user_id=user_id,
    sign_in_code=sign_in_code,
    state=state
)

# User can now use Redactive search service via `MultiUserClient`'s other methods:
query = "Tell me about the missing research vessel, the Borealis"
chunks = await multi_user_client.search_chunks(user_id=user_id, query=query)
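The read_user_data and write_user_data arguments are left elided in the example above. As a rough sketch of what such callbacks could look like, the in-memory store below assumes they are async callables keyed by user ID, with write receiving None to clear a user's data. These signatures are an assumption and should be checked against the SDK source; the class InMemoryUserDataStore is hypothetical, and a real application would persist this data in a database.

```python
import asyncio
from typing import Any, Optional


class InMemoryUserDataStore:
    """Hypothetical per-user storage; callback signatures are assumed, not confirmed."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    async def read_user_data(self, user_id: str) -> Optional[Any]:
        # Return None when nothing has been stored for this user.
        return self._data.get(user_id)

    async def write_user_data(self, user_id: str, user_data: Optional[Any]) -> None:
        # Assumption: writing None means the user's data should be removed.
        if user_data is None:
            self._data.pop(user_id, None)
        else:
            self._data[user_id] = user_data


async def _demo() -> Optional[Any]:
    store = InMemoryUserDataStore()
    await store.write_user_data("user-123", {"example": "state"})
    return await store.read_user_data("user-123")


stored = asyncio.run(_demo())
```

Under these assumptions, store.read_user_data and store.write_user_data would be passed where the example above shows the elided values.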

Development

The Python SDK code can be found in the sdks/python directory of the Redactive GitHub repository.

In order to comply with the repository style guide, we recommend running the following tools.

To format your code, run:

hatch fmt

To check types, run:

hatch run types:check

To test changes, run:

hatch test

To build the Python SDK, run:

hatch build

To install the local version, run:

python -m pip install -e .

Contribution Guide

Please check here
