Redactive Python SDK
Project description
Redactive Python SDK
The Redactive Python SDK provides a robust and intuitive interface for interacting with the Redactive platform, enabling developers to seamlessly integrate powerful data redaction and anonymization capabilities into their Python applications.
Installation
In order to use the package to integrate with Redactive.ai, run:
pip install --upgrade redactive
There is no need to clone this repository.
If you would like to modify this package, clone the repo and install from source:
python -m pip install .
Requirements
- Python 3.11+
Usage
The library has the following components:
- AuthClient - provides functionality to interact with data sources
- SearchClient - provides functionality to search chunks with Redactive search service
- MultiUserClient - provides functionality manage multi-user search with Redactive search service
- RerankingSearchClient [Experimental] - retrieves extra results, then re-ranks them using a more precise ranking function, returning the top_k results
AuthClient
AuthClient needs to be configured with your account's API key which is available in the Apps page at Redactive Dashboard.
The AuthClient can be used to present users with the data providers' OAuth consent pages:
from redactive.auth_client import AuthClient
client = AuthClient(api_key="YOUR-APP'S-API-KEY")
# This value must _exactly_ match the redirect URI you provided when creating your
# Redactive app.
redirect_uri = "YOUR-APP'S-REDIRECT-URI"
# Possible data sources: confluence, sharepoint
provider = "confluence"
sign_in_url = await client.begin_connection(
provider=provider, redirect_uri=redirect_uri
)
# Now redirect your user to sign_in_url
The user will be redirected back to your app's configured redirect uri after they have completed the steps on
the data provider's OAuth consent page. There will be a signin code present in the code
parameter of the query string e.g.
https://your-redirect-page.com?code=abcde12345
.
This code may be exchanged for a user access token (which the user may use to issue queries against their data):
# Exchange signin code for a Redactive ID token
response = await client.exchange_tokens(code="SIGNIN-CODE")
access_token = response.idToken
Once a user has completed the OAuth flow, the data source should show up in their connected data sources:
response = await client.list_connections(
access_token=access_token
)
assert "confluence" in response.connections # ✅
Use the list_connections
method to keep your user's connection status up to date, and provide mechanisms to re-connect data sources.
SearchClient
With a Redactive access_token
, you can perform two types of search
Query-based Search
Retrieve relevant chunks of information that are related to a user query.
from redactive.search_client import SearchClient
client = SearchClient()
# Semantic Search: retrieve text extracts (chunks) from various documents pertaining to the user query
client.search_chunks(
access_token=access_token,
query="Tell me about AI"
)
Filters may be applied to query-based search operations. At present, the following fields may be provided as filter predicates:
message Filters {
// Scope of the query. This may either be the name of a provider, or a subspace of documents.
// Subspaces take the form of <provider>://<tenancy>/<path>
// e.g. for Confluence: 'confluence://redactiveai.atlassian.net/Engineering/Engineering Onboarding Guide'
// for Sharepoint: 'sharepoint://redactiveai.sharepoint.com/Shared Documents/Engineering/Onboarding Guide.pdf'
repeated string scope = 1;
// Timespan of response chunk's creation
optional TimeSpan created = 2;
// Timespan of response chunk's last modification
optional TimeSpan modified = 3;
// List of user emails associated with response chunk
repeated string user_emails = 4;
// Include content from documents in trash
optional bool include_content_in_trash = 5;
}
The query will only return results which match ALL filter predicates i.e. if multiple fields are populated in the filter object, the resulting filter is the logical 'AND' of all the fields. If a data source provider does not support a filter-type, then no results from that provider are returned.
Filters may be populated and provided to a query in the following way for the Python SDK:
from datetime import datetime, timedelta
from redactive.search_client import SearchClient
from redactive.grpc.v2 import Filters
client = SearchClient()
# Query chunks from Confluence only, that are from documents created before last week, modified since last week,
# and that are from documents associated with a user's email. Include chunks from trashed documents.
last_week = datetime.now() - timedelta(weeks=1)
filters = Filters().from_dict({
"scope": ["confluence"],
"created": {
"before": last_week,
},
"modified": {
"after": last_week,
},
"userEmails": ["myEmail@example.com"],
"includeContentInTrash": True,
})
client.search_chunks(
access_token="REDACTIVE-USER-ACCESS-TOKEN",
semantic_query="Tell me about AI",
filters=filters
)
Document Fetch
Obtain all the chunks from a specific document by specifying a unique reference (i.e. a URL).
# URL-based Search: retrieve all chunks of the document at that URL
client.get_document(
access_token="REDACTIVE-USER-ACCESS-TOKEN",
ref="https://example.com/document"
)
Multi-User Client
The MultiUserClient
class helps manage multiple users' authentication and access to the Redactive search service.
from redactive.multi_user_client import MultiUserClient
multi_user_client = MultiUserClient(
api_key="REDACTIVE-API-KEY",
callback_uri="https://example.com/callback/",
read_user_data=...,
write_user_data=...,
)
# Present `connection_url` in browser for user to interact with:
user_id = ...
connection_url = await multi_user_client.get_begin_connection_url(user_id=user_id, provider="confluence")
# On user return from OAuth connection flow:
sign_in_code, state = ..., ... # from URL query parameters
is_connection_successful = await multi_user_client.handle_connection_callback(
user_id=user_id,
sign_in_code=sign_in_code,
state=state
)
# User can now use Redactive search service via `MultiUserClient`'s other methods:
query = "Tell me about the missing research vessel, the Borealis"
chunks = await multi_user_client.search_chunks(user_id=user_id, query=query)
Development
The Python SDK code can be found thesdks/python
directory in Redactive Github Repository.
In order to comply with the repository style guide, we recommend running the following tools.
To format your code, run:
hatch fmt
To check type, run:
hatch run types:check
To test changes, run:
hatch test
To build Python SDK, run:
hatch build
To install local version, run:
python -m pip install -e .
Contribution Guide
Please check here
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file redactive-2.0.0.tar.gz
.
File metadata
- Download URL: redactive-2.0.0.tar.gz
- Upload date:
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9ca6e61d99bf8262250c8391b57f2dbc8e0d19401967cfa673111de347321ede |
|
MD5 | 33663c57f8b797be839071bd6452fb1a |
|
BLAKE2b-256 | a5b21c04b6baa23bf336af331b42dab8d2a2735831fc9425c5dec0e6ffef906c |
File details
Details for the file redactive-2.0.0-py3-none-any.whl
.
File metadata
- Download URL: redactive-2.0.0-py3-none-any.whl
- Upload date:
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a26d7c4b430b855602c24699e3f9b11bcb267ce23c3cc38e6b0af8c406d513d3 |
|
MD5 | 5f0326c52c27844fa0951ec04a2b6b9e |
|
BLAKE2b-256 | cca9cbc9a2f0cc068f62783f997ff5ba1a81b913bfb03fb4a60b5902073f8e2c |