llama-index readers airbyte_cdk integration

Airbyte CDK Loader

pip install llama-index-readers-airbyte-cdk

The Airbyte CDK Loader is a shim for sources created using the Airbyte Python CDK. It allows you to load data from any Airbyte source into LlamaIndex.

Installation

  • Install llama-index reader: pip install llama-index-readers-airbyte-cdk
  • Install airbyte-cdk: pip install airbyte-cdk
  • Install a source via git (or implement your own). Quote the URL so the shell does not interpret the & character: pip install "git+https://github.com/airbytehq/airbyte.git@master#egg=source_github&subdirectory=airbyte-integrations/connectors/source-github"

Usage

Implement and import your own source. The Airbyte documentation explains how to build one with the Python CDK.

Here's an example usage of the AirbyteCDKReader.

from llama_index.readers.airbyte_cdk import AirbyteCDKReader
# Example only: any source built with the Airbyte Python CDK works here.
# This one is installed from the Airbyte GitHub repository (see Installation).
from source_github.source import SourceGithub


github_config = {
    # ...
}
reader = AirbyteCDKReader(source_class=SourceGithub, config=github_config)
documents = reader.load_data(stream_name="issues")

By default, all fields are stored as metadata in each document, and the text is set to the JSON representation of all fields. To construct the text yourself, pass a record_handler to the reader:

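from llama_index.core import Document


def handle_record(record, id):
    # Use the record's title as the text and keep every field as metadata.
    return Document(
        doc_id=id, text=record.data["title"], extra_info=record.data
    )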


reader = AirbyteCDKReader(
    source_class=SourceGithub,
    config=github_config,
    record_handler=handle_record,
)
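
A record handler often needs more than one field for the text. The sketch below shows one way to assemble it from a record's data dictionary. The helper name build_text and the field names title and body are assumptions for illustration only; substitute the fields your stream actually emits.

```python
def build_text(data: dict, fields=("title", "body")) -> str:
    """Join the selected fields of a record into one text blob.

    Hypothetical helper: the field names are assumptions, not part of
    the reader's API. Missing or empty fields are skipped.
    """
    parts = [str(data[name]) for name in fields if data.get(name)]
    return "\n\n".join(parts)


# Sample data shaped like the dictionary exposed as record.data.
sample = {"title": "Crash on startup", "body": "Steps to reproduce", "number": 7}
text = build_text(sample)
```

Inside handle_record, this could be used as text=build_text(record.data) while still passing the full dictionary as metadata.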

Lazy loads

The reader.load_data method collects all documents and returns them as a list. For a large number of documents, this can use a lot of memory. reader.lazy_load_data instead returns an iterator that can be consumed document by document, without keeping all documents in memory.
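
One way to benefit from the iterator is to process documents in fixed-size batches. The following is a minimal sketch: in_batches is a hypothetical helper, and fake_lazy_load is a stand-in for reader.lazy_load_data(stream_name="issues") so the example runs without an Airbyte source. It assumes lazy_load_data accepts the same arguments as load_data.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items without materializing the input."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_lazy_load():
    # Stand-in for reader.lazy_load_data(stream_name="issues"); it yields
    # plain strings instead of Document objects.
    for i in range(5):
        yield f"doc-{i}"


batches = list(in_batches(fake_lazy_load(), size=2))
```

With a real reader, each batch could be inserted into an index before the next one is fetched, so only one batch is held in memory at a time.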

Incremental loads

If a stream supports it, this loader can be used to load data incrementally (only returning documents that weren't loaded last time or got updated in the meantime):

reader = AirbyteCDKReader(source_class=SourceGithub, config=github_config)
documents = reader.load_data(stream_name="issues")
current_state = reader.last_state  # can be pickled away or stored otherwise

updated_documents = reader.load_data(
    stream_name="issues", state=current_state
)  # only loads documents that were updated since last time
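
To carry the state between separate runs, it has to be persisted somewhere. The sketch below stores it in a file with pickle. The helper names save_state and load_state, the file name, and the sample state dictionary are all assumptions for illustration; the real structure of reader.last_state depends on the source.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


def save_state(state: Any, path: Path) -> None:
    """Write the state object to disk with pickle."""
    path.write_bytes(pickle.dumps(state))


def load_state(path: Path) -> Optional[Any]:
    """Return the stored state, or None if nothing was saved yet."""
    if not path.exists():
        return None
    return pickle.loads(path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "issues_state.pkl"
    first = load_state(state_file)  # nothing stored before the first run
    # Placeholder value standing in for reader.last_state.
    save_state({"issues": {"updated_at": "2024-01-01T00:00:00Z"}}, state_file)
    second = load_state(state_file)
```

In a real job, the value returned by load_state would be passed as state to reader.load_data, and reader.last_state would be saved afterwards. This assumes that passing state=None behaves like a first, full load. Only unpickle files you trust, since pickle can execute arbitrary code when loading.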

This loader is designed to load data into LlamaIndex.

