Client for Carbon

Carbon

Connect external data to LLMs, no matter the source.

Table of Contents

Requirements

Python >=3.7

Installation

pip install carbon-python-sdk==0.1.18

Getting Started

from carbon import Carbon

# 1) Get an access token for a customer
carbon = Carbon(
    api_key="YOUR_API_KEY",
    customer_id="YOUR_CUSTOMER_ID",
)

token = carbon.auth.get_access_token()

# 2) Use the access token to authenticate moving forward
carbon = Carbon(access_token=token.access_token)

# use SDK as usual
white_labeling = carbon.auth.get_white_labeling()
# etc.

Async

Async support is available by prepending "a" to any method name (for example, get_access_token becomes aget_access_token).

import asyncio
from pprint import pprint
from carbon import Carbon, ApiException

carbon = Carbon(
    access_token="YOUR_ACCESS_TOKEN",
    api_key="YOUR_API_KEY",
    customer_id="YOUR_CUSTOMER_ID",
)


async def main():
    try:
        # Get Access Token
        get_access_token_response = await carbon.auth.aget_access_token()
        print(get_access_token_response)
    except ApiException as e:
        print("Exception when calling AuthApi.get_access_token: %s\n" % e)
        pprint(e.body)
        if e.status == 422:
            pprint(e.body["detail"])
        pprint(e.headers)
        pprint(e.status)
        pprint(e.reason)
        pprint(e.round_trip_time)


asyncio.run(main())

Raw HTTP Response

To access raw HTTP response values, use the .raw namespace.

from pprint import pprint
from carbon import Carbon, ApiException

carbon = Carbon(
    access_token="YOUR_ACCESS_TOKEN",
    api_key="YOUR_API_KEY",
    customer_id="YOUR_CUSTOMER_ID",
)

try:
    # Get Access Token
    get_access_token_response = carbon.auth.raw.get_access_token()
    pprint(get_access_token_response.body)
    pprint(get_access_token_response.body["access_token"])
    pprint(get_access_token_response.body["refresh_token"])
    pprint(get_access_token_response.headers)
    pprint(get_access_token_response.status)
    pprint(get_access_token_response.round_trip_time)
except ApiException as e:
    print("Exception when calling AuthApi.get_access_token: %s\n" % e)
    pprint(e.body)
    if e.status == 422:
        pprint(e.body["detail"])
    pprint(e.headers)
    pprint(e.status)
    pprint(e.reason)
    pprint(e.round_trip_time)

Reference

carbon.auth.get_access_token

Get Access Token

🛠️ Usage

get_access_token_response = carbon.auth.get_access_token()

🔄 Return

TokenResponse

🌐 Endpoint

/auth/v1/access_token get

🔙 Back to Table of Contents


carbon.auth.get_white_labeling

Returns whether the organization is white labeled and which integrations are white labeled.

🛠️ Usage

get_white_labeling_response = carbon.auth.get_white_labeling()

🔄 Return

WhiteLabelingResponse

🌐 Endpoint

/auth/v1/white_labeling get

🔙 Back to Table of Contents


carbon.data_sources.query_user_data_sources

User Data Sources

🛠️ Usage

query_user_data_sources_response = carbon.data_sources.query_user_data_sources(
    pagination={
        "limit": 10,
        "offset": 0,
    },
    order_by="created_at",
    order_dir="desc",
    filters={
        "source": "GOOGLE_DRIVE",
    },
)

⚙️ Parameters

pagination: Pagination
order_by: OrganizationUserDataSourceOrderByColumns
order_dir: OrderDir
filters: OrganizationUserDataSourceFilters

⚙️ Request Body

OrganizationUserDataSourceQueryInput

🔄 Return

OrganizationUserDataSourceResponse

🌐 Endpoint

/user_data_sources post

🔙 Back to Table of Contents


carbon.data_sources.revoke_access_token

Revoke Access Token

🛠️ Usage

revoke_access_token_response = carbon.data_sources.revoke_access_token(
    data_source_id=1,
)

⚙️ Parameters

data_source_id: int

⚙️ Request Body

RevokeAccessTokenInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/revoke_access_token post

🔙 Back to Table of Contents


carbon.embeddings.get_documents

For pre-filtering documents, using tags_v2 is preferred to using tags (which is now deprecated). If both tags_v2 and tags are specified, tags is ignored. tags_v2 enables building complex filters through the use of "AND", "OR", and negation logic. Take the below input as an example:

{
    "OR": [
        {
            "key": "subject",
            "value": "holy-bible",
            "negate": false
        },
        {
            "key": "person-of-interest",
            "value": "jesus christ",
            "negate": false
        },
        {
            "key": "genre",
            "value": "religion",
            "negate": true
        },
        {
            "AND": [
                {
                    "key": "subject",
                    "value": "tao-te-ching",
                    "negate": false
                },
                {
                    "key": "author",
                    "value": "lao-tzu",
                    "negate": false
                }
            ]
        }
    ]
}

In this case, files will be filtered such that:

  1. "subject" = "holy-bible" OR
  2. "person-of-interest" = "jesus christ" OR
  3. "genre" != "religion" OR
  4. "subject" = "tao-te-ching" AND "author" = "lao-tzu"

Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to 3. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:

  1. "key" is required and must be a string
  2. "value" is required and can be any or list[any]
  3. "negate" is optional and must be true or false. If present and true, the filter block is negated in the resulting query. It is false by default.
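
To make the AND/OR/negate semantics concrete, here is a minimal local sketch that evaluates a tags_v2 filter against one file's tags. It is not part of the SDK and does not reproduce Carbon's server-side logic: the helper names (block_matches, filter_matches) are hypothetical, and the handling of list values and of files that lack a tag is an assumption.

```python
from typing import Any, Dict


def block_matches(block: Dict[str, Any], file_tags: Dict[str, Any]) -> bool:
    """Evaluate one tag block against a file's tags (local illustration only)."""
    expected = block["value"]
    actual = file_tags.get(block["key"])
    # Assumption: a list value matches if the file's tag equals any listed item.
    hit = actual in expected if isinstance(expected, list) else actual == expected
    # Assumption: a missing tag counts as "no match", so negating it yields True.
    return not hit if block.get("negate", False) else hit


def filter_matches(node: Dict[str, Any], file_tags: Dict[str, Any]) -> bool:
    """Recursively evaluate an "AND"/"OR" group or a single tag block."""
    if "OR" in node:
        return any(filter_matches(child, file_tags) for child in node["OR"])
    if "AND" in node:
        return all(filter_matches(child, file_tags) for child in node["AND"])
    return block_matches(node, file_tags)


# The example filter from the text, written as a Python dict.
tags_v2 = {
    "OR": [
        {"key": "subject", "value": "holy-bible", "negate": False},
        {"key": "person-of-interest", "value": "jesus christ", "negate": False},
        {"key": "genre", "value": "religion", "negate": True},
        {
            "AND": [
                {"key": "subject", "value": "tao-te-ching", "negate": False},
                {"key": "author", "value": "lao-tzu", "negate": False},
            ]
        },
    ]
}

print(filter_matches(tags_v2, {"subject": "holy-bible", "genre": "religion"}))  # True
print(filter_matches(tags_v2, {"subject": "koran", "genre": "religion"}))       # False
```

The same dict can be passed as the tags_v2 argument shown in the Usage section below.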

When querying embeddings, you can optionally specify the media_type parameter in your request. By default (if not set), it is equal to "TEXT". This means that the query will be performed over files that have been parsed as text (for now, this covers all files except image files). If it is equal to "IMAGE", the query will be performed over image files (for now, .jpg and .png files). You can think of this field as an additional filter on top of any filters set in file_ids and

When hybrid_search is set to true, a combination of keyword search and semantic search is used to rank and select candidate embeddings during information retrieval. By default, these search methods are weighted equally during the ranking process. To adjust the weight (or "importance") of each search method, you can use the hybrid_search_tuning_parameters property. The tuning parameters are:

  • weight_a: weight to assign to semantic search
  • weight_b: weight to assign to keyword search

You must ensure that sum(weight_a, weight_b,..., weight_n) for all n weights is equal to 1. The equality has an error tolerance of 0.001 to account for possible floating point issues.
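
The sum constraint can be checked client-side before sending a request. The sketch below is an illustration only: weights_are_valid is a hypothetical helper, and whether the server treats the 0.001 boundary as inclusive is an assumption.

```python
import math
from typing import Dict


def weights_are_valid(weights: Dict[str, float], tolerance: float = 0.001) -> bool:
    """Return True when the weights sum to 1 within the given absolute tolerance."""
    return math.isclose(sum(weights.values()), 1.0, rel_tol=0.0, abs_tol=tolerance)


print(weights_are_valid({"weight_a": 0.5, "weight_b": 0.5}))  # True
print(weights_are_valid({"weight_a": 0.7, "weight_b": 0.2}))  # False
```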

In order to use hybrid search for a customer across a set of documents, two flags need to be enabled:

  1. Use the /modify_user_configuration endpoint to enable sparse_vectors for the customer. The payload body for this request is below:
{
  "configuration_key_name": "sparse_vectors",
  "value": {
    "enabled": true
  }
}
  2. Make sure hybrid search is enabled for the documents across which you want to perform the search. For the /uploadfile endpoint, this can be done by setting the following query parameter: generate_sparse_vectors=true

Carbon supports multiple models for generating file embeddings. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0. The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and as a query parameter for /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing embedding queries, only embeddings from files that used the specified model are considered. For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that all files you want considered for a query have embeddings generated with the same model. For now, do not set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.
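
The A/B/C/D example can be simulated locally. The sketch below only models the selection rule described in the paragraph above; files_considered and DEFAULT_MODEL are hypothetical names, and no request is sent to Carbon.

```python
from typing import Dict, List, Optional

DEFAULT_MODEL = "OPENAI"  # per the text, text-embedding-ada-002 is used when no model is given


def files_considered(file_models: Dict[str, str], embedding_model: Optional[str] = None) -> List[str]:
    """Return the files whose embeddings were generated with the queried model."""
    model = embedding_model or DEFAULT_MODEL
    return sorted(name for name, used in file_models.items() if used == model)


file_models = {
    "A": "OPENAI",
    "B": "OPENAI",
    "C": "COHERE_MULTILINGUAL_V3",
    "D": "COHERE_MULTILINGUAL_V3",
}

print(files_considered(file_models))                            # ['A', 'B']
print(files_considered(file_models, "COHERE_MULTILINGUAL_V3"))  # ['C', 'D']
```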

🛠️ Usage

get_documents_response = carbon.embeddings.get_documents(
    query="a",
    k=1,
    tags={
        "key": "string_example",
    },
    query_vector=[3.14],
    file_ids=[1],
    parent_file_ids=[1],
    tags_v2={},
    include_tags=True,
    include_vectors=True,
    include_raw_file=True,
    hybrid_search=True,
    hybrid_search_tuning_parameters={
        "weight_a": 0.5,
        "weight_b": 0.5,
    },
    media_type="TEXT",
    embedding_model="OPENAI",
)

⚙️ Parameters

query: str

Query for which to get related chunks and embeddings.

k: int

Number of related chunks to return.

tags: GetEmbeddingDocumentsBodyTags
query_vector: GetEmbeddingDocumentsBodyQueryVector
file_ids: GetEmbeddingDocumentsBodyFileIds
parent_file_ids: GetEmbeddingDocumentsBodyParentFileIds
tags_v2: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]

A set of tags to limit the search to. Use this instead of tags, which is deprecated.

include_tags: Optional[bool]

Flag to control whether or not to include tags for each chunk in the response.

include_vectors: Optional[bool]

Flag to control whether or not to include embedding vectors in the response.

include_raw_file: Optional[bool]

Flag to control whether or not to include a signed URL to the raw file containing each chunk in the response.

hybrid_search: Optional[bool]

Flag to control whether or not to perform hybrid search.

hybrid_search_tuning_parameters: HybridSearchTuningParamsNullable
media_type: FileContentTypesNullable
embedding_model: EmbeddingGeneratorsNullable

⚙️ Request Body

GetEmbeddingDocumentsBody

🔄 Return

DocumentResponseList

🌐 Endpoint

/embeddings post

🔙 Back to Table of Contents


carbon.embeddings.get_embeddings_and_chunks

Retrieve Embeddings And Content

🛠️ Usage

get_embeddings_and_chunks_response = carbon.embeddings.get_embeddings_and_chunks(
    filters={
        "user_file_id": 1,
        "embedding_model": "OPENAI",
    },
    pagination={
        "limit": 10,
        "offset": 0,
    },
    order_by="created_at",
    order_dir="desc",
    include_vectors=False,
)

⚙️ Parameters

filters: EmbeddingsAndChunksFilters
pagination: Pagination
order_by: EmbeddingsAndChunksOrderByColumns
order_dir: OrderDir
include_vectors: bool

⚙️ Request Body

EmbeddingsAndChunksQueryInput

🔄 Return

EmbeddingsAndChunksResponse

🌐 Endpoint

/text_chunks post

🔙 Back to Table of Contents


carbon.embeddings.upload_chunks_and_embeddings

Upload Chunks And Embeddings

🛠️ Usage

upload_chunks_and_embeddings_response = carbon.embeddings.upload_chunks_and_embeddings(
    embedding_model="OPENAI",
    chunks_and_embeddings=[
        {
            "file_id": 1,
            "chunks_and_embeddings": [
                {
                    "chunk_number": 1,
                    "chunk": "chunk_example",
                }
            ],
        }
    ],
    overwrite_existing=False,
    chunks_only=False,
    custom_credentials={},
)

⚙️ Parameters

embedding_model: EmbeddingGenerators
chunks_and_embeddings: List[SingleChunksAndEmbeddingsUploadInput]
overwrite_existing: bool
chunks_only: bool
custom_credentials: Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]

⚙️ Request Body

ChunksAndEmbeddingsUploadInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/upload_chunks_and_embeddings post

🔙 Back to Table of Contents


carbon.files.create_user_file_tags

A tag is a key-value pair that can be added to a file. This pair can then be used for searches (e.g. embedding searches) in order to narrow down the scope of the search. A file can have any number of tags. The following are reserved keys that cannot be used:

  • db_embedding_id
  • organization_id
  • user_id
  • organization_user_file_id

Carbon currently supports two data types for tag values - string and list<string>. Keys can only be string. If values other than string and list<string> are used, they're automatically converted to strings (e.g. 4 will become "4").
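
The rules above can be mirrored client-side so that tags look the same before and after upload. This is a sketch under stated assumptions, not SDK code: normalize_tags is a hypothetical helper, the element-by-element stringification of lists is an assumption, and Carbon performs its own conversion on the server.

```python
from typing import Any, Dict, List, Union

RESERVED_KEYS = {"db_embedding_id", "organization_id", "user_id", "organization_user_file_id"}

TagValue = Union[str, List[str]]


def normalize_tags(tags: Dict[str, Any]) -> Dict[str, TagValue]:
    """Reject reserved keys and coerce values to str or list[str]."""
    normalized: Dict[str, TagValue] = {}
    for key, value in tags.items():
        if not isinstance(key, str):
            raise TypeError(f"tag keys must be strings, got {type(key).__name__}")
        if key in RESERVED_KEYS:
            raise ValueError(f"{key!r} is a reserved tag key")
        if isinstance(value, list):
            # Assumption: list items are stringified element by element.
            normalized[key] = [str(item) for item in value]
        else:
            normalized[key] = str(value)
    return normalized


print(normalize_tags({"year": 4, "topics": ["a", "b"]}))
# {'year': '4', 'topics': ['a', 'b']}
```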

🛠️ Usage

create_user_file_tags_response = carbon.files.create_user_file_tags(
    tags={
        "key": "string_example",
    },
    organization_user_file_id=1,
)

⚙️ Parameters

tags: OrganizationUserFileTagCreateTags
organization_user_file_id: int

⚙️ Request Body

OrganizationUserFileTagCreate

🔄 Return

UserFile

🌐 Endpoint

/create_user_file_tags post

🔙 Back to Table of Contents


carbon.files.delete

Delete File Endpoint

🛠️ Usage

delete_response = carbon.files.delete(
    file_id=1,
)

⚙️ Parameters

file_id: int

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/deletefile/{file_id} delete

🔙 Back to Table of Contents


carbon.files.delete_file_tags

Delete File Tags

🛠️ Usage

delete_file_tags_response = carbon.files.delete_file_tags(
    tags=["string_example"],
    organization_user_file_id=1,
)

⚙️ Parameters

tags: OrganizationUserFileTagsRemoveTags
organization_user_file_id: int

⚙️ Request Body

OrganizationUserFileTagsRemove

🔄 Return

UserFile

🌐 Endpoint

/delete_user_file_tags post

🔙 Back to Table of Contents


carbon.files.delete_many

Delete Files Endpoint

🛠️ Usage

delete_many_response = carbon.files.delete_many(
    file_ids=[1],
    sync_statuses=["string_example"],
    delete_non_synced_only=False,
    send_webhook=False,
    delete_child_files=False,
)

⚙️ Parameters

file_ids: DeleteFilesQueryInputFileIds
sync_statuses: List[ExternalFileSyncStatuses]
delete_non_synced_only: bool
send_webhook: bool
delete_child_files: bool

⚙️ Request Body

DeleteFilesQueryInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/delete_files post

🔙 Back to Table of Contents


carbon.files.delete_v2

Delete Files V2 Endpoint

🛠️ Usage

delete_v2_response = carbon.files.delete_v2(
    filters={
        "include_all_children": False,
        "non_synced_only": False,
    },
    send_webhook=False,
)

⚙️ Parameters

filters: OrganizationUserFilesToSyncFilters
send_webhook: bool

⚙️ Request Body

DeleteFilesV2QueryInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/delete_files_v2 post

🔙 Back to Table of Contents


carbon.files.get_parsed_file

This route is deprecated. Use /user_files_v2 instead.

🛠️ Usage

get_parsed_file_response = carbon.files.get_parsed_file(
    file_id=1,
)

⚙️ Parameters

file_id: int

🔄 Return

PresignedURLResponse

🌐 Endpoint

/parsed_file/{file_id} get

🔙 Back to Table of Contents


carbon.files.get_raw_file

This route is deprecated. Use /user_files_v2 instead.

🛠️ Usage

get_raw_file_response = carbon.files.get_raw_file(
    file_id=1,
)

⚙️ Parameters

file_id: int

🔄 Return

PresignedURLResponse

🌐 Endpoint

/raw_file/{file_id} get

🔙 Back to Table of Contents


carbon.files.query_user_files

For pre-filtering documents, using tags_v2 is preferred to using tags (which is now deprecated). If both tags_v2 and tags are specified, tags is ignored. tags_v2 enables building complex filters through the use of "AND", "OR", and negation logic. Take the below input as an example:

{
    "OR": [
        {
            "key": "subject",
            "value": "holy-bible",
            "negate": false
        },
        {
            "key": "person-of-interest",
            "value": "jesus christ",
            "negate": false
        },
        {
            "key": "genre",
            "value": "religion",
            "negate": true
        },
        {
            "AND": [
                {
                    "key": "subject",
                    "value": "tao-te-ching",
                    "negate": false
                },
                {
                    "key": "author",
                    "value": "lao-tzu",
                    "negate": false
                }
            ]
        }
    ]
}

In this case, files will be filtered such that:

  1. "subject" = "holy-bible" OR
  2. "person-of-interest" = "jesus christ" OR
  3. "genre" != "religion" OR
  4. "subject" = "tao-te-ching" AND "author" = "lao-tzu"

Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to 3. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:

  1. "key" is required and must be a string
  2. "value" is required and can be any or list[any]
  3. "negate" is optional and must be true or false. If present and true, the filter block is negated in the resulting query. It is false by default.
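
The structural rules above (top-level group, nesting limit, tag block typing) can be checked before a request is sent. The following is a hedged sketch, not SDK code: validate_filter is a hypothetical helper, and both the way nesting levels are counted and the "one operator per group" rule are assumptions.

```python
from typing import Any, Dict

MAX_DEPTH = 3  # limit stated in the text; how levels are counted is an assumption


def validate_filter(node: Dict[str, Any], depth: int = 1, top_level: bool = True) -> None:
    """Raise ValueError if the filter breaks the rules described above."""
    groups = [op for op in ("AND", "OR") if op in node]
    if groups:
        # Assumption: a group holds exactly one operator and no other keys.
        if len(node) != 1:
            raise ValueError("a group must contain exactly one of 'AND' or 'OR'")
        if depth > MAX_DEPTH:
            raise ValueError(f"nesting deeper than {MAX_DEPTH} levels")
        children = node[groups[0]]
        if not isinstance(children, list):
            raise ValueError(f"{groups[0]!r} must map to a list")
        for child in children:
            validate_filter(child, depth + 1, top_level=False)
        return
    if top_level:
        raise ValueError("the top level must be an 'AND' or 'OR' array")
    if not isinstance(node.get("key"), str):
        raise ValueError("'key' is required and must be a string")
    if "value" not in node:
        raise ValueError("'value' is required")
    if not isinstance(node.get("negate", False), bool):
        raise ValueError("'negate' must be true or false")


validate_filter({"OR": [{"key": "subject", "value": "holy-bible"}]})  # passes silently
try:
    validate_filter({"key": "subject", "value": "holy-bible"})
except ValueError as err:
    print(err)  # the top level must be an 'AND' or 'OR' array
```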

🛠️ Usage

query_user_files_response = carbon.files.query_user_files(
    pagination={
        "limit": 10,
        "offset": 0,
    },
    order_by="created_at",
    order_dir="desc",
    filters={
        "include_all_children": False,
        "non_synced_only": False,
    },
    include_raw_file=True,
    include_parsed_text_file=True,
    include_additional_files=True,
)

⚙️ Parameters

pagination: Pagination
order_by: OrganizationUserFilesToSyncOrderByTypes
order_dir: OrderDir
filters: OrganizationUserFilesToSyncFilters
include_raw_file: Optional[bool]
include_parsed_text_file: Optional[bool]
include_additional_files: Optional[bool]

⚙️ Request Body

OrganizationUserFilesToSyncQueryInput

🔄 Return

UserFilesV2

🌐 Endpoint

/user_files_v2 post

🔙 Back to Table of Contents


carbon.files.query_user_files_deprecated

This route is deprecated. Use /user_files_v2 instead.

🛠️ Usage

query_user_files_deprecated_response = carbon.files.query_user_files_deprecated(
    pagination={
        "limit": 10,
        "offset": 0,
    },
    order_by="created_at",
    order_dir="desc",
    filters={
        "include_all_children": False,
        "non_synced_only": False,
    },
    include_raw_file=True,
    include_parsed_text_file=True,
    include_additional_files=True,
)

⚙️ Parameters

pagination: Pagination
order_by: OrganizationUserFilesToSyncOrderByTypes
order_dir: OrderDir
filters: OrganizationUserFilesToSyncFilters
include_raw_file: Optional[bool]
include_parsed_text_file: Optional[bool]
include_additional_files: Optional[bool]

⚙️ Request Body

OrganizationUserFilesToSyncQueryInput

🔄 Return

FilesQueryUserFilesDeprecatedResponse

🌐 Endpoint

/user_files post

🔙 Back to Table of Contents


carbon.files.resync

Resync File

🛠️ Usage

resync_response = carbon.files.resync(
    file_id=1,
    chunk_size=1,
    chunk_overlap=1,
    force_embedding_generation=False,
)

⚙️ Parameters

file_id: int
chunk_size: Optional[int]
chunk_overlap: Optional[int]
force_embedding_generation: bool

⚙️ Request Body

ResyncFileQueryInput

🔄 Return

UserFile

🌐 Endpoint

/resync_file post

🔙 Back to Table of Contents


carbon.files.upload

This endpoint is used to directly upload local files to Carbon. The POST request should be a multipart form request. Note that the set_page_as_boundary query parameter currently applies only to PDFs. When it is set, PDF chunks are at most one page long, and additional information can be retrieved for each chunk, namely the coordinates of the bounding box around the chunk (useful for things like text highlighting). The query parameters are described below:

  • chunk_size: the chunk size (in tokens) applied when splitting the document
  • chunk_overlap: the chunk overlap (in tokens) applied when splitting the document
  • skip_embedding_generation: whether or not to skip the generation of chunks and embeddings
  • set_page_as_boundary: described above
  • embedding_model: the model used to generate embeddings for the document chunks
  • use_ocr: whether or not to use OCR as a preprocessing step prior to generating chunks (only valid for PDFs currently)
  • generate_sparse_vectors: whether or not to generate sparse vectors for the file. Required for hybrid search.
  • prepend_filename_to_chunks: whether or not to prepend the filename to the chunk text
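
To show how chunk_size and chunk_overlap interact, here is a minimal sliding-window sketch. It is an illustration only: split_with_overlap is a hypothetical helper, whitespace-separated words stand in for tiktoken tokens, and Carbon's actual splitter may behave differently.

```python
from typing import List, Sequence


def split_with_overlap(tokens: Sequence[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


# Words stand in for tokens here (assumption for readability).
tokens = "the quick brown fox jumps over the lazy dog".split()
for chunk in split_with_overlap(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
# ['the', 'quick', 'brown', 'fox']
# ['fox', 'jumps', 'over', 'the']
# ['the', 'lazy', 'dog']
```

Each chunk repeats the last chunk_overlap tokens of the previous one, which is why a larger overlap produces more chunks for the same document.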

Carbon supports multiple models for generating file embeddings. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0. The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and as a query parameter for /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing embedding queries, only embeddings from files that used the specified model are considered. For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that all files you want considered for a query have embeddings generated with the same model. For now, do not set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.

🛠️ Usage

upload_response = carbon.files.upload(
    file=open("/path/to/file", "rb"),
    chunk_size=1,
    chunk_overlap=1,
    skip_embedding_generation=False,
    set_page_as_boundary=False,
    embedding_model="OPENAI",
    use_ocr=False,
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    max_items_per_chunk=1,
    parse_pdf_tables_with_ocr=False,
)

⚙️ Parameters

file: IO
chunk_size: Optional[int]

Chunk size in tiktoken tokens to be used when processing file.

chunk_overlap: Optional[int]

Chunk overlap in tiktoken tokens to be used when processing file.

skip_embedding_generation: bool

Flag to control whether or not embeddings should be generated and stored when processing file.

set_page_as_boundary: bool

Flag to control whether or not to set a page's worth of content as the maximum amount of content that can appear in a chunk. Only valid for PDFs. See the route description for more information.

embedding_model: TextEmbeddingGenerators

Embedding model that will be used to embed file chunks.

use_ocr: bool

Whether or not to use OCR when processing files. Only valid for PDFs. Useful for documents with tables, images, and/or scanned text.

generate_sparse_vectors: bool

Whether or not to generate sparse vectors for the file. This is required for the file to be a candidate for hybrid search.

prepend_filename_to_chunks: bool

Whether or not to prepend the file's name to chunks.

max_items_per_chunk: Optional[int]

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

parse_pdf_tables_with_ocr: bool

Whether to use rich table parsing when use_ocr is enabled.

⚙️ Request Body

BodyCreateUploadFileUploadfilePost

🔄 Return

UserFile

🌐 Endpoint

/uploadfile post

🔙 Back to Table of Contents


carbon.files.upload_from_url

Create Upload File From Url

🛠️ Usage

upload_from_url_response = carbon.files.upload_from_url(
    url="string_example",
    file_name="string_example",
    chunk_size=1,
    chunk_overlap=1,
    skip_embedding_generation=False,
    set_page_as_boundary=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    use_textract=False,
    prepend_filename_to_chunks=False,
    max_items_per_chunk=1,
    parse_pdf_tables_with_ocr=False,
)

⚙️ Parameters

url: str
file_name: Optional[str]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: bool
set_page_as_boundary: bool
embedding_model: EmbeddingGenerators
generate_sparse_vectors: bool
use_textract: bool
prepend_filename_to_chunks: bool
max_items_per_chunk: Optional[int]

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

parse_pdf_tables_with_ocr: bool

⚙️ Request Body

UploadFileFromUrlInput

🔄 Return

UserFile

🌐 Endpoint

/upload_file_from_url post

🔙 Back to Table of Contents


carbon.files.upload_text

Carbon supports multiple models for generating file embeddings. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0. The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and as a query parameter for /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing embedding queries, only embeddings from files that used the specified model are considered. For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that all files you want considered for a query have embeddings generated with the same model. For now, do not set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.

🛠️ Usage

upload_text_response = carbon.files.upload_text(
    contents="string_example",
    name="string_example",
    chunk_size=1,
    chunk_overlap=1,
    skip_embedding_generation=False,
    overwrite_file_id=1,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
)

⚙️ Parameters

contents: str
name: Optional[str]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: bool
overwrite_file_id: Optional[int]
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: Optional[bool]

⚙️ Request Body

RawTextInput

🔄 Return

UserFile

🌐 Endpoint

/upload_text post

🔙 Back to Table of Contents


carbon.health.check

Health

🛠️ Usage

check_response = carbon.health.check()

🌐 Endpoint

/health get

🔙 Back to Table of Contents


carbon.integrations.connect_data_source

Connect Data Source

🛠️ Usage

connect_data_source_response = carbon.integrations.connect_data_source(
    authentication={
        "source": "GOOGLE_DRIVE",
        "access_token": "access_token_example",
    },
    sync_options={
        "chunk_size": 1500,
        "chunk_overlap": 20,
        "skip_embedding_generation": False,
        "embedding_model": "OPENAI",
        "generate_sparse_vectors": False,
        "prepend_filename_to_chunks": False,
        "sync_files_on_connection": True,
        "set_page_as_boundary": False,
    },
)

⚙️ Parameters

authentication: Union[OAuthAuthentication, NotionAuthentication, SharepointAuthentication, ConfluenceAuthentication, ZendeskAuthentication, ZoteroAuthentication, GitbookAuthetication, SalesforceAuthentication, FreskdeskAuthentication, S3Authentication]
sync_options: SyncOptions

⚙️ Request Body

ConnectDataSourceInput

🔄 Return

ConnectDataSourceResponse

🌐 Endpoint

/integrations/connect post

🔙 Back to Table of Contents


carbon.integrations.connect_freshdesk

Refer to this article to obtain an API key: https://support.freshdesk.com/en/support/solutions/articles/215517. Make sure that your API key has permission to read solutions from your account and that you are on a paid plan. Once you have an API key, you can make a request to this endpoint along with your Freshdesk domain. This will trigger an automatic sync of the articles in your "solutions" tab. The additional parameters below can be used to associate data with the synced articles or modify the sync behavior.

🛠️ Usage

connect_freshdesk_response = carbon.integrations.connect_freshdesk(
    domain="string_example",
    api_key="string_example",
    tags={},
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    sync_files_on_connection=True,
    request_id="string_example",
)

⚙️ Parameters

domain: str
api_key: str
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
sync_files_on_connection: Optional[bool]
request_id: Optional[str]

⚙️ Request Body

FreshDeskConnectRequest

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/freshdesk post

🔙 Back to Table of Contents


carbon.integrations.connect_gitbook

You will need an access token to connect your Gitbook account. Note that the permissions are defined by the user generating the access token, so make sure you have permission to access the spaces you will be syncing. Refer to this article for more details: https://developer.gitbook.com/gitbook-api/authentication. Additionally, you need to specify the name of the organization you will be syncing data from.

🛠️ Usage

connect_gitbook_response = carbon.integrations.connect_gitbook(
    organization="string_example",
    access_token="string_example",
    tags={},
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    sync_files_on_connection=True,
    request_id="string_example",
)

⚙️ Parameters

organization: str
access_token: str
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGenerators
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
sync_files_on_connection: Optional[bool]
request_id: Optional[str]

⚙️ Request Body

GitbookConnectRequest

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/gitbook post

🔙 Back to Table of Contents


carbon.integrations.create_aws_iam_user

Create a new IAM user with permissions to:

  1. List all buckets.
  2. Read from the specific buckets and objects to sync with Carbon. Ensure any future buckets or objects carry the same permissions.
Once created, generate an access key for this user and share the credentials with us. We recommend testing this key beforehand.
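
The two permissions above could be expressed as an IAM policy document along the following lines. This is a sketch under assumptions: build_read_only_policy and the bucket name are hypothetical, and the exact set of S3 actions Carbon requires is not stated in this README, so the actions chosen here (s3:ListAllMyBuckets, s3:ListBucket, s3:GetObject) are a plausible minimum rather than a confirmed list.

```python
import json
from typing import Any, Dict, List


def build_read_only_policy(bucket_names: List[str]) -> Dict[str, Any]:
    """Build an IAM policy document allowing bucket listing and object reads."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # 1. List all buckets.
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {   # 2a. List the contents of the buckets to sync (assumed to be needed).
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {   # 2b. Read objects from those buckets.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_arns,
            },
        ],
    }


# "my-example-bucket" is a placeholder name.
print(json.dumps(build_read_only_policy(["my-example-bucket"]), indent=2))
```

Because the policy names buckets explicitly, any bucket added later has to be appended to it, which matches the note above about future buckets carrying the same permissions.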

🛠️ Usage

create_aws_iam_user_response = carbon.integrations.create_aws_iam_user(
    access_key="string_example",
    access_key_secret="string_example",
)

⚙️ Parameters

access_key: str
access_key_secret: str

⚙️ Request Body

S3AuthRequest

🔄 Return

OrganizationUserDataSourceAPI

🌐 Endpoint

/integrations/s3 post

🔙 Back to Table of Contents


carbon.integrations.get_oauth_url

This endpoint can be used to generate the following URLs:

  • An OAuth URL for OAuth based connectors
  • A file syncing URL which skips the OAuth flow if the user already has a valid access token and takes them to the success state.

🛠️ Usage

get_oauth_url_response = carbon.integrations.get_oauth_url(
    service="GOOGLE_DRIVE",
    tags=None,
    scope="string_example",
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    zendesk_subdomain="string_example",
    microsoft_tenant="string_example",
    sharepoint_site_name="string_example",
    confluence_subdomain="string_example",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    max_items_per_chunk=1,
    salesforce_domain="string_example",
    sync_files_on_connection=True,
    set_page_as_boundary=False,
    data_source_id=1,
    connecting_new_account=False,
    request_id="string_example",
    use_ocr=False,
    parse_pdf_tables_with_ocr=False,
)
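
Most of these arguments are optional. As a rough sketch of how `connecting_new_account` and `data_source_id` relate (per the parameter notes in this section), the hypothetical helper `oauth_url_kwargs` below builds only the keyword arguments a given case needs. The helper name and the `request_id` value are illustrative assumptions, not part of the SDK.

```python
def oauth_url_kwargs(service, new_account=False, data_source_id=None, request_id=None):
    """Build keyword arguments for carbon.integrations.get_oauth_url."""
    kwargs = {"service": service}
    if new_account:
        # data_source_id can be skipped when connecting a new account.
        kwargs["connecting_new_account"] = True
    elif data_source_id is not None:
        # Only needed when several data sources of this type are connected.
        kwargs["data_source_id"] = data_source_id
    if request_id is not None:
        # Added to all files synced through the generated URL.
        kwargs["request_id"] = request_id
    return kwargs


new_account = oauth_url_kwargs("GOOGLE_DRIVE", new_account=True)
existing = oauth_url_kwargs("GOOGLE_DRIVE", data_source_id=1, request_id="batch-1")
print(new_account)
print(existing)
# With a configured client: carbon.integrations.get_oauth_url(**existing)
```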

⚙️ Parameters

service: DataSourceType
tags: Union[bool, date, datetime, dict, float, int, list, str, None]
scope: Optional[str]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGeneratorsNullable
zendesk_subdomain: Optional[str]
microsoft_tenant: Optional[str]
sharepoint_site_name: Optional[str]
confluence_subdomain: Optional[str]
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
max_items_per_chunk: Optional[int]

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

salesforce_domain: Optional[str]
sync_files_on_connection: Optional[bool]

Used to specify whether Carbon should attempt to sync all your files automatically when authorization is complete. This is only supported for a subset of connectors and will be ignored for the rest. Supported connectors: Intercom, Zendesk, Gitbook, Confluence, Salesforce, Freshdesk

set_page_as_boundary: bool
data_source_id: Optional[int]

Used to specify a data source to sync from if you have multiple connected. It can be skipped if you only have one data source of that type connected or are connecting a new account.

connecting_new_account: Optional[bool]

Used to connect a new data source. If not specified, we will attempt to create a sync URL for an existing data source based on type and ID.

request_id: Optional[str]

This request id will be added to all files that get synced using the generated OAuth URL

use_ocr: Optional[bool]

Enable OCR for files that support it. Supported formats: pdf

parse_pdf_tables_with_ocr: Optional[bool]

⚙️ Request Body

OAuthURLRequest

🔄 Return

OuthURLResponse

🌐 Endpoint

/integrations/oauth_url post

🔙 Back to Table of Contents


carbon.integrations.list_confluence_pages

To begin listing a user's Confluence pages, at least the data_source_id of a connected Confluence account must be specified. This base request returns a list of root pages for every space the user has access to in a Confluence instance. To traverse further down the user's page directory, make additional requests to this endpoint with the same data_source_id and with parent_id set to the id of a page from a previous request. For convenience, the has_children property of each directory item in the response list flags which pages will return non-empty lists of pages when set as the parent_id.

🛠️ Usage

list_confluence_pages_response = carbon.integrations.list_confluence_pages(
    data_source_id=1,
    parent_id="string_example",
)
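
The traversal described for this endpoint can be sketched as a depth-first walk. The example below is an illustration under stated assumptions: `list_pages` is a caller-supplied function (hypothetical) that wraps the endpoint and returns a plain list of dicts with `id` and `has_children` keys. How the item list is exposed on `ListResponse` is not shown in this README, so a stand-in is used to keep the sketch runnable offline.

```python
def walk_confluence_pages(list_pages, data_source_id, parent_id=None):
    """Yield every page reachable from the root pages, depth first.

    `list_pages(data_source_id, parent_id)` is a caller-supplied function that
    returns a list of dicts with "id" and "has_children" keys (assumed shape).
    """
    for page in list_pages(data_source_id, parent_id):
        yield page
        # Only descend when the item is flagged as having children.
        if page.get("has_children"):
            yield from walk_confluence_pages(list_pages, data_source_id, page["id"])


# Stand-in for the real endpoint so the sketch runs without network access.
FAKE_TREE = {
    None: [{"id": "1", "has_children": True}, {"id": "2", "has_children": False}],
    "1": [{"id": "1a", "has_children": False}],
}


def fake_list_pages(data_source_id, parent_id):
    return FAKE_TREE.get(parent_id, [])


page_ids = [page["id"] for page in walk_confluence_pages(fake_list_pages, 1)]
print(page_ids)
```

Checking `has_children` before recursing avoids one request per leaf page.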

⚙️ Parameters

data_source_id: int
parent_id: Optional[str]

⚙️ Request Body

ListRequest

🔄 Return

ListResponse

🌐 Endpoint

/integrations/confluence/list post

🔙 Back to Table of Contents


carbon.integrations.list_data_source_items

List Data Source Items

🛠️ Usage

list_data_source_items_response = carbon.integrations.list_data_source_items(
    data_source_id=1,
    parent_id="string_example",
    filters={},
    pagination={
        "limit": 10,
        "offset": 0,
    },
)
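
The `pagination` argument takes a `limit` and an `offset`, so a full listing can be collected by advancing the offset until a short page comes back. The following is a minimal sketch under assumptions: `fetch_page` is a hypothetical caller-supplied function that wraps the endpoint and returns a plain list of items for one page. The actual layout of `ListDataSourceItemsResponse` is not documented here.

```python
def iter_all_items(fetch_page, limit=10):
    """Yield items page by page until a short or empty page is returned.

    `fetch_page(pagination)` is a caller-supplied function that takes a
    {"limit", "offset"} dict and returns a list of items (assumed shape).
    """
    offset = 0
    while True:
        items = fetch_page({"limit": limit, "offset": offset})
        yield from items
        if len(items) < limit:
            break
        offset += limit


# Stand-in data source so the sketch runs without network access.
FAKE_ITEMS = [f"item-{i}" for i in range(25)]


def fake_fetch_page(pagination):
    start = pagination["offset"]
    return FAKE_ITEMS[start:start + pagination["limit"]]


all_items = list(iter_all_items(fake_fetch_page, limit=10))
print(len(all_items))
```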

⚙️ Parameters

data_source_id: int
parent_id: Optional[str]
filters: ListItemsFiltersNullable
pagination: Pagination

⚙️ Request Body

ListDataSourceItemsRequest

🔄 Return

ListDataSourceItemsResponse

🌐 Endpoint

/integrations/items/list post

🔙 Back to Table of Contents


carbon.integrations.list_folders

After connecting your Outlook account, you can use this endpoint to list all of your folders in Outlook. This includes both system folders like "inbox" and user-created folders.

🛠️ Usage

list_folders_response = carbon.integrations.list_folders(
    data_source_id=1,
)

⚙️ Parameters

data_source_id: Optional[int]

🌐 Endpoint

/integrations/outlook/user_folders get

🔙 Back to Table of Contents


carbon.integrations.list_gitbook_spaces

After connecting your Gitbook account, you can use this endpoint to list all of your spaces under the current organization.

🛠️ Usage

list_gitbook_spaces_response = carbon.integrations.list_gitbook_spaces(
    data_source_id=1,
)

⚙️ Parameters

data_source_id: int

🌐 Endpoint

/integrations/gitbook/spaces get

🔙 Back to Table of Contents


carbon.integrations.list_labels

After connecting your Gmail account, you can use this endpoint to list all of your labels. User-created labels will have the type "user" and Gmail's default labels will have the type "system".

🛠️ Usage

list_labels_response = carbon.integrations.list_labels(
    data_source_id=1,
)

⚙️ Parameters

data_source_id: Optional[int]

🌐 Endpoint

/integrations/gmail/user_labels get

🔙 Back to Table of Contents


carbon.integrations.list_outlook_categories

After connecting your Outlook account, you can use this endpoint to list all of your categories in Outlook. We currently support listing up to 250 categories.

🛠️ Usage

list_outlook_categories_response = carbon.integrations.list_outlook_categories(
    data_source_id=1,
)

⚙️ Parameters

data_source_id: Optional[int]

🌐 Endpoint

/integrations/outlook/user_categories get

🔙 Back to Table of Contents


carbon.integrations.sync_confluence

After listing pages in a user's Confluence account, the set of selected page ids and the connected account's data_source_id can be passed into this endpoint to sync them into Carbon. Additional parameters listed below can be used to associate data to the selected pages or alter the behavior of the sync.

🛠️ Usage

sync_confluence_response = carbon.integrations.sync_confluence(
    data_source_id=1,
    ids=["string_example"],
    tags={},
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    max_items_per_chunk=1,
    set_page_as_boundary=False,
    request_id="string_example",
    use_ocr=False,
    parse_pdf_tables_with_ocr=False,
)

⚙️ Parameters

data_source_id: int
ids: Union[List[str], List[SyncFilesIds]]
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
max_items_per_chunk: Optional[int]

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

set_page_as_boundary: bool
request_id: Optional[str]
use_ocr: Optional[bool]
parse_pdf_tables_with_ocr: Optional[bool]

⚙️ Request Body

SyncFilesRequest

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/confluence/sync post

🔙 Back to Table of Contents


carbon.integrations.sync_data_source_items

Sync Data Source Items

🛠️ Usage

sync_data_source_items_response = carbon.integrations.sync_data_source_items(
    data_source_id=1,
)

⚙️ Parameters

data_source_id: int

⚙️ Request Body

SyncDirectoryRequest

🔄 Return

OrganizationUserDataSourceAPI

🌐 Endpoint

/integrations/items/sync post

🔙 Back to Table of Contents


carbon.integrations.sync_files

After listing files and folders via /integrations/items/sync and /integrations/items/list, use the selected items' external ids as the ids in this endpoint to sync them into Carbon. SharePoint items take an additional parameter root_id, which identifies the drive the file or folder is in and is stored in root_external_id. That additional parameter is optional; excluding it tells the sync to assume the item is stored in the default Documents drive.

🛠️ Usage

sync_files_response = carbon.integrations.sync_files(
    data_source_id=1,
    ids=["string_example"],
    tags={},
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    max_items_per_chunk=1,
    set_page_as_boundary=False,
    request_id="string_example",
    use_ocr=False,
    parse_pdf_tables_with_ocr=False,
)
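
Since `ids` also accepts a list of `SyncFilesIds` objects, SharePoint items can carry their `root_id` alongside the item id. The sketch below is an assumption-heavy illustration: the entry keys `id` and `root_id`, the listed-item field name `external_id`, and the helper `build_sync_ids` are all hypothetical and should be checked against the `SyncFilesIds` model before use.

```python
def build_sync_ids(items):
    """Turn listed items into the `ids` argument for sync_files.

    Each item is assumed to be a dict with "external_id" and, for SharePoint,
    an optional "root_external_id" naming the drive it lives in.
    """
    ids = []
    for item in items:
        entry = {"id": item["external_id"]}
        root_id = item.get("root_external_id")
        if root_id:
            # Without root_id the sync assumes the default Documents drive.
            entry["root_id"] = root_id
        ids.append(entry)
    return ids


listed = [
    {"external_id": "file-1", "root_external_id": "drive-A"},
    {"external_id": "file-2"},
]
sync_ids = build_sync_ids(listed)
print(sync_ids)
# With a configured client: carbon.integrations.sync_files(data_source_id=1, ids=sync_ids)
```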

⚙️ Parameters

data_source_id: int
ids: Union[List[str], List[SyncFilesIds]]
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
max_items_per_chunk: Optional[int]

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

set_page_as_boundary: bool
request_id: Optional[str]
use_ocr: Optional[bool]
parse_pdf_tables_with_ocr: Optional[bool]

⚙️ Request Body

SyncFilesRequest

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/files/sync post

🔙 Back to Table of Contents


carbon.integrations.sync_gitbook

You can sync up to 20 Gitbook spaces at a time using this endpoint. Additional parameters below can be used to associate data with the synced pages or modify the sync behavior.

🛠️ Usage

sync_gitbook_response = carbon.integrations.sync_gitbook(
    space_ids=["string_example"],
    data_source_id=1,
    tags={},
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    request_id="string_example",
)
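
Because each call accepts at most 20 spaces, a longer list has to be split into batches first. The helper below is a small sketch of that step; `batch_space_ids` and the generated space IDs are hypothetical.

```python
def batch_space_ids(space_ids, batch_size=20):
    """Split space IDs into lists of at most `batch_size` entries."""
    return [space_ids[i:i + batch_size] for i in range(0, len(space_ids), batch_size)]


space_ids = [f"space-{n}" for n in range(45)]  # hypothetical IDs
batches = batch_space_ids(space_ids)
print([len(batch) for batch in batches])
# With a configured client, each batch would be one call, for example:
# for batch in batches:
#     carbon.integrations.sync_gitbook(space_ids=batch, data_source_id=1)
```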

⚙️ Parameters

space_ids: GitbookSyncRequestSpaceIds
data_source_id: int
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGenerators
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
request_id: Optional[str]

⚙️ Request Body

GitbookSyncRequest

🌐 Endpoint

/integrations/gitbook/sync post

🔙 Back to Table of Contents


carbon.integrations.sync_gmail

Once you have successfully connected your Gmail account, you can choose which emails to sync with us using the filters parameter. Filters is a JSON object with key-value pairs. It also supports AND and OR operations. For now, we support a limited set of keys, listed below.

label: Built-in Gmail labels, for example "Important", or a custom label you created.
after or before: A date in YYYY/mm/dd format (example 2023/12/31). Gets emails after/before a certain date. You can also use them in combination to get emails from a certain period.
is: Can have the following values - starred, important, snoozed, and unread

Using keys or values outside of the specified values can lead to unexpected behaviour.

A basic query with filters looks like this:

{
    "filters": {
        "key": "label",
        "value": "Test"
    }
}

This will list all emails that have the label "Test".

You can use AND and OR operations in the following way:

{
    "filters": {
        "AND": [
            {
                "key": "after",
                "value": "2024/01/07"
            },
            {
                "OR": [
                    {
                        "key": "label",
                        "value": "Personal"
                    },
                    {
                        "key": "is",
                        "value": "starred"
                    }
                ]
            }
        ]
    }
}

This will return emails after 7th of Jan that are either starred or have the label "Personal". Note that this is the highest level of nesting we support, i.e. you can't add more AND/OR filters within the OR filter in the above example.

🛠️ Usage

sync_gmail_response = carbon.integrations.sync_gmail(
    filters={},
    tags={},
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    data_source_id=1,
    request_id="string_example",
    sync_attachments=False,
)
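
The nested JSON filters described for this endpoint can be assembled from small helpers instead of being written by hand. This is only a sketch: `condition`, `all_of`, `any_of`, and `nesting_depth` are hypothetical helper names, not SDK functions. The resulting dict is what would be passed as the `filters` argument.

```python
def condition(key, value):
    """A single key/value filter condition."""
    return {"key": key, "value": value}


def all_of(*filters):
    """Combine filters with AND."""
    return {"AND": list(filters)}


def any_of(*filters):
    """Combine filters with OR."""
    return {"OR": list(filters)}


def nesting_depth(node):
    """Count nested AND/OR levels; a plain condition has depth 0."""
    for operator in ("AND", "OR"):
        if operator in node:
            return 1 + max((nesting_depth(child) for child in node[operator]), default=0)
    return 0


filters = all_of(
    condition("after", "2024/01/07"),
    any_of(condition("label", "Personal"), condition("is", "starred")),
)
print(filters)
# One operator nested inside another is the deepest structure described as supported.
print(nesting_depth(filters))
```

Checking the depth before sending a request is one way to catch filters that nest further than the AND-containing-OR example.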

⚙️ Parameters

filters: Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGenerators
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
data_source_id: Optional[int]
request_id: Optional[str]
sync_attachments: Optional[bool]

⚙️ Request Body

GmailSyncInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/gmail/sync post

🔙 Back to Table of Contents


carbon.integrations.sync_outlook

Once you have successfully connected your Outlook account, you can choose which emails to sync with us using the filters and folder parameters. "folder" should be the folder you want to sync from Outlook. By default, we get messages from your inbox folder.
Filters is a JSON object with key-value pairs. It also supports AND and OR operations. For now, we support a limited set of keys, listed below.

category: Custom categories that you created in Outlook.
after or before: A date in YYYY/mm/dd format (example 2023/12/31). Gets emails after/before a certain date. You can also use them in combination to get emails from a certain period.
is: Can have the following values: flagged

A basic query with filters looks like this:

{
    "filters": {
        "key": "category",
        "value": "Test"
    }
}

This will list all emails that have the category "Test".

To specify a custom folder in the same query:

{
    "folder": "Folder Name",
    "filters": {
        "key": "category",
        "value": "Test"
    }
}

You can use AND and OR operations in the following way:

{
    "filters": {
        "AND": [
            {
                "key": "after",
                "value": "2024/01/07"
            },
            {
                "OR": [
                    {
                        "key": "category",
                        "value": "Personal"
                    },
                    {
                        "key": "category",
                        "value": "Test"
                    }
                ]
            }
        ]
    }
}

This will return emails after 7th of Jan that have either Personal or Test as category. Note that this is the highest level of nesting we support, i.e. you can't add more AND/OR filters within the OR filter in the above example.

🛠️ Usage

sync_outlook_response = carbon.integrations.sync_outlook(
    filters={},
    tags={},
    folder="Inbox",
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    data_source_id=1,
    request_id="string_example",
    sync_attachments=False,
)
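
The `after` and `before` keys expect dates in YYYY/mm/dd format, which `strftime` can produce from a `date` object. The sketch below combines both keys with AND to select a period and pairs the result with a folder. The helper name `date_range_filter` and the chosen dates are illustrative assumptions.

```python
from datetime import date


def date_range_filter(start, end):
    """Build an AND filter with an `after` and a `before` condition."""
    return {
        "AND": [
            {"key": "after", "value": start.strftime("%Y/%m/%d")},
            {"key": "before", "value": end.strftime("%Y/%m/%d")},
        ]
    }


sync_kwargs = {
    "folder": "Inbox",
    "filters": date_range_filter(date(2024, 1, 7), date(2024, 2, 1)),
}
print(sync_kwargs)
# With a configured client: carbon.integrations.sync_outlook(**sync_kwargs)
```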

⚙️ Parameters

filters: Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
folder: Optional[str]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGenerators
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
data_source_id: Optional[int]
request_id: Optional[str]
sync_attachments: Optional[bool]

⚙️ Request Body

OutlookSyncInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/outlook/sync post

🔙 Back to Table of Contents


carbon.integrations.sync_rss_feed

RSS Feed

🛠️ Usage

sync_rss_feed_response = carbon.integrations.sync_rss_feed(
    url="string_example",
    tags={},
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    request_id="string_example",
)

⚙️ Parameters

url: str
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGenerators
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
request_id: Optional[str]

⚙️ Request Body

RSSFeedInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/rss_feed post

🔙 Back to Table of Contents


carbon.integrations.sync_s3_files

After optionally loading the items via /integrations/items/sync and /integrations/items/list, use the bucket name and object key as the ID in this endpoint to sync them into Carbon. Additional parameters below can associate data with the selected items or modify the sync behavior.

🛠️ Usage

sync_s3_files_response = carbon.integrations.sync_s3_files(
    ids=[{}],
    tags={},
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    embedding_model="OPENAI",
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    max_items_per_chunk=1,
    set_page_as_boundary=False,
    data_source_id=1,
    request_id="string_example",
    use_ocr=False,
    parse_pdf_tables_with_ocr=False,
)
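
The usage above leaves each entry of `ids` empty. As a hedged sketch of what an entry might contain, the helper below pairs a bucket name with each object key. The key names `bucket` and `id` are assumptions about the `S3GetFileInput` model, and `s3_file_ids`, the bucket name, and the object keys are hypothetical.

```python
def s3_file_ids(bucket, object_keys):
    """Pair a bucket name with each object key for the `ids` argument."""
    return [{"bucket": bucket, "id": key} for key in object_keys]


ids = s3_file_ids("example-bucket", ["reports/q1.pdf", "reports/q2.pdf"])
print(ids)
# With a configured client: carbon.integrations.sync_s3_files(ids=ids)
```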

⚙️ Parameters

ids: List[S3GetFileInput]
tags: Optional[Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
embedding_model: EmbeddingGenerators
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
max_items_per_chunk: Optional[int]

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

set_page_as_boundary: bool
data_source_id: Optional[int]
request_id: Optional[str]
use_ocr: Optional[bool]
parse_pdf_tables_with_ocr: Optional[bool]

⚙️ Request Body

S3FileSyncInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/s3/files post

🔙 Back to Table of Contents


carbon.organizations.get

Get Organization

🛠️ Usage

get_response = carbon.organizations.get()

🔄 Return

OrganizationResponse

🌐 Endpoint

/organization get

🔙 Back to Table of Contents


carbon.users.delete

Delete Users

🛠️ Usage

delete_response = carbon.users.delete(
    customer_ids=["string_example"],
)

⚙️ Parameters

customer_ids: DeleteUsersInputCustomerIds

⚙️ Request Body

DeleteUsersInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/delete_users post

🔙 Back to Table of Contents


carbon.users.get

User Endpoint

🛠️ Usage

get_response = carbon.users.get(
    customer_id="string_example",
)

⚙️ Parameters

customer_id: str

⚙️ Request Body

UserRequestContent

🔄 Return

UserResponse

🌐 Endpoint

/user post

🔙 Back to Table of Contents


carbon.users.toggle_user_features

Toggle User Features

🛠️ Usage

toggle_user_features_response = carbon.users.toggle_user_features(
    configuration_key_name="string_example",
    value={},
)

⚙️ Parameters

configuration_key_name: str
value: Dict[str, Union[bool, date, datetime, dict, float, int, list, str, None]]

⚙️ Request Body

ModifyUserConfigurationInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/modify_user_configuration post

🔙 Back to Table of Contents


carbon.users.update_users

Update Users

🛠️ Usage

update_users_response = carbon.users.update_users(
    customer_ids=["string_example"],
    auto_sync_enabled_sources=["string_example"],
    file_upload_limit=1,
)

⚙️ Parameters

customer_ids: UpdateUsersInputCustomerIds
auto_sync_enabled_sources: Union[List[DataSourceType], str]

List of data source types to enable auto sync for. An empty array will remove all sources, and the string "ALL" will enable it for all data sources.

file_upload_limit: Optional[int]

Custom file upload limit for the user. If set, the user will not be allowed to upload more files than this limit.

⚙️ Request Body

UpdateUsersInput

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/update_users post

🔙 Back to Table of Contents


carbon.utilities.fetch_urls

Extracts all URLs from a webpage.

Args: url (str): URL of the webpage

Returns: FetchURLsResponse: A response object with a list of URLs extracted from the webpage and the webpage content.

🛠️ Usage

fetch_urls_response = carbon.utilities.fetch_urls(
    url="url_example",
)

⚙️ Parameters

url: str

🔄 Return

FetchURLsResponse

🌐 Endpoint

/fetch_urls get

🔙 Back to Table of Contents


carbon.utilities.fetch_youtube_transcripts

Fetches English transcripts from YouTube videos.

Args: id (str): The ID of the YouTube video. raw (bool): Whether to return the raw transcript or not. Defaults to False.

Returns: dict: A dictionary with the transcript of the YouTube video.

🛠️ Usage

fetch_youtube_transcripts_response = carbon.utilities.fetch_youtube_transcripts(
    id="id_example",
    raw=False,
)

⚙️ Parameters

id: str
raw: bool

🔄 Return

YoutubeTranscriptResponse

🌐 Endpoint

/fetch_youtube_transcript get

🔙 Back to Table of Contents


carbon.utilities.process_sitemap

Retrieves all URLs from a sitemap, which can subsequently be utilized with our web_scrape endpoint.

🛠️ Usage

process_sitemap_response = carbon.utilities.process_sitemap(
    url="url_example",
)

⚙️ Parameters

url: str

🌐 Endpoint

/process_sitemap get

🔙 Back to Table of Contents


carbon.utilities.scrape_sitemap

Extracts all URLs from a sitemap and performs a web scrape on each of them.

Args: sitemap_url (str): URL of the sitemap

Returns: dict: A response object with the status of the scraping job message.

🛠️ Usage

scrape_sitemap_response = carbon.utilities.scrape_sitemap(
    url="string_example",
    tags={
        "key": "string_example",
    },
    max_pages_to_scrape=1,
    chunk_size=1500,
    chunk_overlap=20,
    skip_embedding_generation=False,
    enable_auto_sync=False,
    generate_sparse_vectors=False,
    prepend_filename_to_chunks=False,
    html_tags_to_skip=[],
    css_classes_to_skip=[],
    css_selectors_to_skip=[],
    embedding_model="OPENAI",
)

⚙️ Parameters

url: str
tags: SitemapScrapeRequestTags
max_pages_to_scrape: Optional[int]
chunk_size: Optional[int]
chunk_overlap: Optional[int]
skip_embedding_generation: Optional[bool]
enable_auto_sync: Optional[bool]
generate_sparse_vectors: Optional[bool]
prepend_filename_to_chunks: Optional[bool]
html_tags_to_skip: SitemapScrapeRequestHtmlTagsToSkip
css_classes_to_skip: SitemapScrapeRequestCssClassesToSkip
css_selectors_to_skip: SitemapScrapeRequestCssSelectorsToSkip
embedding_model: EmbeddingGenerators

⚙️ Request Body

SitemapScrapeRequest

🌐 Endpoint

/scrape_sitemap post

🔙 Back to Table of Contents


carbon.utilities.scrape_web

Conduct a web scrape on a given webpage URL. Our web scraper is fully compatible with JavaScript and supports recursion depth, enabling you to efficiently extract all content from the target website.

🛠️ Usage

scrape_web_response = carbon.utilities.scrape_web(
    body=[
        {
            "url": "url_example",
            "recursion_depth": 3,
            "max_pages_to_scrape": 100,
            "chunk_size": 1500,
            "chunk_overlap": 20,
            "skip_embedding_generation": False,
            "enable_auto_sync": False,
            "generate_sparse_vectors": False,
            "prepend_filename_to_chunks": False,
            "html_tags_to_skip": [],
            "css_classes_to_skip": [],
            "css_selectors_to_skip": [],
            "embedding_model": "OPENAI",
        }
    ],
)
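
Since `body` is a list of per-URL request entries, URLs gathered elsewhere (for example from process_sitemap or search_urls) can be turned into a single call. The sketch below shows one way to build that list. The helper name `build_scrape_requests` and the placeholder URLs are assumptions, and the option keys are taken from the usage example above.

```python
def build_scrape_requests(urls, **options):
    """Create one scrape_web request entry per URL, sharing the same options."""
    return [{"url": url, **options} for url in urls]


urls = ["https://example.com/a", "https://example.com/b"]  # placeholder URLs
body = build_scrape_requests(urls, recursion_depth=1, max_pages_to_scrape=10)
print(body)
# With a configured client: carbon.utilities.scrape_web(body=body)
```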

⚙️ Request Body

UtilitiesScrapeWebRequest

🌐 Endpoint

/web_scrape post

🔙 Back to Table of Contents


carbon.utilities.search_urls

Perform a web search and obtain a list of relevant URLs.

As an illustration, when you perform a search for “content related to MRNA,” you will receive a list of links such as the following:

- https://tomrenz.substack.com/p/mrna-and-why-it-matters

- https://www.statnews.com/2020/11/10/the-story-of-mrna-how-a-once-dismissed-idea-became-a-leading-technology-in-the-covid-vaccine-race/

- https://www.statnews.com/2022/11/16/covid-19-vaccines-were-a-success-but-mrna-still-has-a-delivery-problem/

- https://joomi.substack.com/p/were-still-being-misled-about-how

Subsequently, you can submit these links to the web_scrape endpoint in order to retrieve the content of the respective web pages.

Args: query (str): Query to search for

Returns: FetchURLsResponse: A response object with a list of URLs for a given search query.

🛠️ Usage

search_urls_response = carbon.utilities.search_urls(
    query="query_example",
)

⚙️ Parameters

query: str

🔄 Return

FetchURLsResponse

🌐 Endpoint

/search_urls get

🔙 Back to Table of Contents


carbon.webhooks.add_url

Add Webhook URL

🛠️ Usage

add_url_response = carbon.webhooks.add_url(
    url="string_example",
)

⚙️ Parameters

url: str

⚙️ Request Body

AddWebhookProps

🔄 Return

Webhook

🌐 Endpoint

/add_webhook post

🔙 Back to Table of Contents


carbon.webhooks.delete_url

Delete Webhook URL

🛠️ Usage

delete_url_response = carbon.webhooks.delete_url(
    webhook_id=1,
)

⚙️ Parameters

webhook_id: int

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/delete_webhook/{webhook_id} delete

🔙 Back to Table of Contents


carbon.webhooks.urls

Webhook URLs

🛠️ Usage

urls_response = carbon.webhooks.urls(
    pagination={
        "limit": 10,
        "offset": 0,
    },
    order_by="created_at",
    order_dir="desc",
    filters={
        "ids": [],
    },
)

⚙️ Parameters

pagination: Pagination
order_by: WebhookOrderByColumns
order_dir: OrderDir
filters: WebhookFilters

⚙️ Request Body

WebhookQueryInput

🔄 Return

WebhookQueryResponse

🌐 Endpoint

/webhooks post

🔙 Back to Table of Contents


Author

This Python package is automatically generated by Konfig
