A framework for writing Airbyte Connectors.

Airbyte Python CDK and Low-Code CDK

The Airbyte Python CDK is a framework for building Airbyte API Source Connectors. It provides a set of classes and helpers that make it easy to build a connector against an HTTP API (REST, GraphQL, etc.) or a generic Python source connector.

Usage

If you're looking to build a connector, we highly recommend that you start with the Connector Builder. It should be enough for 90% of the connectors out there. For more flexible and complex connectors, use the low-code CDK and SourceDeclarativeManifest.

If that doesn't work, then consider building on top of the lower-level Python CDK itself.

Quick Start

To get started on a Python CDK based connector or a low-code connector, you can generate a connector project from a template:

# from the repo root
cd airbyte-integrations/connector-templates/generator
./generate.sh

This will generate a project with a type and a name of your choice and put it in airbyte-integrations/connectors. Open the directory with your connector in an editor and follow the TODO items.

Example Connectors

HTTP Connectors:

Python connectors using the bare-bones Source abstraction:

Python CDK Overview

The Airbyte CDK code lives in the airbyte_cdk directory. Here's a high-level overview of what's inside:

  • connector_builder. Internal wrapper that helps the Connector Builder platform run a declarative manifest (low-code connector). You should not use this code directly. If you need to run a SourceDeclarativeManifest, take a look at source-declarative-manifest connector implementation instead.
  • destinations. Basic Destination connector support. If you're building a Destination connector in Python, start here. Some of our vector DB destinations, such as destination-pinecone, use this code.
  • models exposes airbyte_protocol.models as part of the airbyte_cdk package.
  • sources/concurrent_source is the Concurrent CDK implementation. It supports reading data from streams concurrently per slice / partition, which is useful for high-throughput connectors that sync a large number of records.
  • sources/declarative is the low-code CDK. It works on top of Airbyte Python CDK, but provides a declarative manifest language to define streams, operations, etc. This makes it easier to build connectors without writing Python code.
  • sources/file_based is the CDK for file-based sources. Examples include S3, Azure, GCS, etc.
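
To make the "concurrently per slice / partition" idea concrete, here is a minimal standard-library sketch. It does not use the Concurrent CDK's classes and is not how the CDK is implemented. The partitions, the read_partition stand-in, and the worker count are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical partitions (slices) of a single stream, e.g. date ranges.
PARTITIONS: List[Dict[str, str]] = [
    {"start": "2024-01-01", "end": "2024-01-31"},
    {"start": "2024-02-01", "end": "2024-02-29"},
    {"start": "2024-03-01", "end": "2024-03-31"},
]


def read_partition(partition: Dict[str, str]) -> List[Dict[str, object]]:
    """Stand-in for fetching one partition's records from an API."""
    return [
        {"partition_start": partition["start"], "record_key": key}
        for key in range(2)
    ]


def read_concurrently(
    partitions: List[Dict[str, str]], max_workers: int = 3
) -> List[Dict[str, object]]:
    """Read every partition on a thread pool and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input partitions.
        per_partition = pool.map(read_partition, partitions)
        return [record for records in per_partition for record in records]


records = read_concurrently(PARTITIONS)
print(len(records))
```

Each partition is independent, so slow API calls for one slice do not block the others. That independence is the property the concurrent implementation relies on.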

Contributing

Thank you for being interested in contributing to Airbyte Python CDK! Here are some guidelines to get you started:

  • We adhere to the code of conduct.
  • You can contribute by reporting bugs, posting GitHub discussions, opening issues, improving documentation, and submitting pull requests with bug fixes and new features alike.
  • If you're changing the code, please add unit tests for your change.
  • When submitting issues or PRs, please add a small reproduction project. Using the changes in your connector and providing that connector code as an example (or a satellite PR) helps!

First time setup

Install the project dependencies and development tools:

poetry install --all-extras

Installing all extras is required to run the full suite of unit tests.

Running tests locally

  • Iterate on the CDK code locally
  • Run tests via poetry run poe unit-test-with-cov, or python -m pytest -s unit_tests if you want to pass pytest options.
  • Run poetry run poe check-local to lint all code, type-check modified code, and run unit tests with coverage in one command.

To see all available scripts, run poetry run poe.

Autogenerated files

Low-code CDK models are generated from sources/declarative/declarative_component_schema.yaml. If the iteration you are working on includes changes to the models or the connector generator, you might want to regenerate them. In order to do that, you can run:

poetry run poe build

This will build the code generator Docker image and generate the component manifest files from the schemas and templates.

Testing

All tests are located in the unit_tests directory. Run poetry run poe unit-test-with-cov to run them. This also presents a test coverage report. For faster iteration with no coverage report and more options, python -m pytest -s unit_tests is a good place to start.

Building and testing a connector with your local CDK

When developing a new feature in the CDK, you may find it helpful to run a connector that uses that new feature. You can test this in one of two ways:

  • Running a connector locally
  • Building and running a source via Docker

Installing your local CDK into a local Python connector

Open the connector's pyproject.toml file and replace the line with airbyte_cdk with the following:

airbyte_cdk = { path = "../../../airbyte-cdk/python/airbyte_cdk", develop = true }

Then, running poetry update should reinstall airbyte_cdk from your local working directory.

Building a Python connector in Docker with your local CDK installed

Pre-requisite: Install the airbyte-ci CLI

You can build your connector image with the local CDK using

# from the airbytehq/airbyte base directory
airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build

Note that the local CDK is injected at build time, so if you make changes, you will have to run the build command again to see them reflected.

Running Connector Acceptance Tests for a single connector in Docker with your local CDK installed

Pre-requisite: Install the airbyte-ci CLI

To run acceptance tests for a single connector using the local CDK, from the connector directory, run

airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test

When you don't have access to the API

There may be times when you do not have access to the API (for example, because you lack credentials or network access). You will probably still want to run an end-to-end test at least once. To do so, you can emulate the server you would be reaching with a server stubbing tool.

For example, using mockserver, you can set up an expectation file like this:

{
  "httpRequest": {
    "method": "GET",
    "path": "/data"
  },
  "httpResponse": {
    "body": "{\"data\": [{\"record_key\": 1}, {\"record_key\": 2}]}"
  }
}
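
The response body in this file is a JSON document embedded as a string, so the inner quotes must be escaped. If you would rather not escape them by hand, the file can be generated. The following is a minimal sketch using only the Python standard library. The helper names build_expectation and write_expectation are hypothetical and are not part of mockserver or the CDK.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_expectation(path: str, records: List[dict], method: str = "GET") -> Dict:
    """Build one expectation; json.dumps escapes the nested body for us."""
    return {
        "httpRequest": {"method": method, "path": path},
        "httpResponse": {"body": json.dumps({"data": records})},
    }


def write_expectation(target: Path, expectation: Dict) -> None:
    """Write the expectation to disk, creating parent directories if needed."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(expectation, indent=2))


expectation = build_expectation("/data", [{"record_key": 1}, {"record_key": 2}])
print(json.dumps(expectation, indent=2))

# To save it where the docker command expects it, you could call:
# write_expectation(Path("secrets/mock_server_config/expectations.json"), expectation)
```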

Assuming this file has been created at secrets/mock_server_config/expectations.json, the following command starts a mock server that answers any request on the path /data with the response defined in the expectation file:

docker run -d --rm -v $(pwd)/secrets/mock_server_config:/config -p 8113:8113 --env MOCKSERVER_LOG_LEVEL=TRACE --env MOCKSERVER_SERVER_PORT=8113 --env MOCKSERVER_WATCH_INITIALIZATION_JSON=true --env MOCKSERVER_PERSISTED_EXPECTATIONS_PATH=/config/expectations.json --env MOCKSERVER_INITIALIZATION_JSON_PATH=/config/expectations.json mockserver/mockserver:5.15.0

HTTP requests to localhost:8113/data should now return the body defined in the expectations file. To test this with your connector, point it at the mock server: either change the code that defines the base URL for a Python source, or update url_base in a low-code manifest. If the Connector Builder is running in Docker, use the host.docker.internal domain instead of localhost, because the requests are executed from inside Docker.
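
As a quick sanity check of the stubbed endpoint, the sketch below fetches /data and parses the records. So that it runs without Docker, it starts a tiny stand-in server from the Python standard library on a free local port. That stand-in is an assumption for illustration, not mockserver itself, and the names StubHandler, fetch_records, and demo are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RESPONSE_BODY = json.dumps({"data": [{"record_key": 1}, {"record_key": 2}]})


class StubHandler(BaseHTTPRequestHandler):
    """Answers GET /data with the canned body and anything else with 404."""

    def do_GET(self):
        if self.path == "/data":
            payload = RESPONSE_BODY.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_records(url_base: str) -> list:
    """Request {url_base}/data and return the list under the "data" key."""
    # An empty ProxyHandler makes sure local requests bypass any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{url_base}/data", timeout=5) as response:
        return json.loads(response.read())["data"]


def demo() -> list:
    """Start the stand-in server, fetch the records once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_records(f"http://127.0.0.1:{server.server_port}")
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```

With the mockserver container from the command above running, the same check would be fetch_records("http://localhost:8113"), or fetch_records("http://host.docker.internal:8113") from inside Docker.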

Publishing a new version to PyPI

The Python CDK has a GitHub workflow that manages the CDK changelog, creates a new airbyte_cdk release, publishes it to PyPI, and then commits an update to (and subsequently auto-releases) source-declarative-manifest and Connector Builder (in the platform repository).

[!Note]: The workflow will handle the CHANGELOG.md entry for you. You should not add changelog lines in your PRs to the CDK itself.

[!Warning]: The workflow bumps the version on its own. Please don't change the CDK version in pyproject.toml manually.

  1. Trigger the release workflow only once all the PRs that you want to include have been merged into the master branch.
  2. Run the Publish CDK Manually workflow from master, choosing release-type=major|minor|patch and setting the changelog message.
  3. When the workflow runs, it will commit a new version directly to the master branch.
  4. The workflow will bump the version of source-declarative-manifest according to the release-type of the CDK, then commit these changes back to master. The commit to master will kick off a publish of the new version of source-declarative-manifest.
  5. The workflow will also open a pull request in the airbyte-platform-internal repo to bump the dependency in Connector Builder.
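
For readers unsure which release type to pick, the sketch below shows how major, minor, and patch bumps conventionally change a three-part version number under semantic versioning. It is an illustration only: the workflow's actual bump logic is not shown in this document, and the bump_version helper and the example version are hypothetical.

```python
def bump_version(version: str, release_type: str) -> str:
    """Return the next version for a conventional major/minor/patch bump."""
    major, minor, patch = (int(part) for part in version.split("."))
    if release_type == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if release_type == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if release_type == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown release type: {release_type!r}")


# Arbitrary example version, used only to show the three outcomes.
print(bump_version("6.5.5", "patch"))  # 6.5.6
print(bump_version("6.5.5", "minor"))  # 6.6.0
print(bump_version("6.5.5", "major"))  # 7.0.0
```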

