Source implementation for Box File Text Extraction.

Project description

Box Data Extract Source

This is the repository for the Box Data Extract source connector, written in Python. For information about how to use this connector within Airbyte, see the documentation.

This connector uses Box AI to extract data directly from documents stored in Box.

For example, a company managing lease contracts can automatically capture key details and populate their system of record. Likewise, a financial institution can extract critical data from loan application documents—such as bank statements and W-2s—and integrate it into approval workflows.

By prioritizing content intelligence, this connector unlocks new opportunities for automation and AI-driven insights.

Local development

Prerequisites

  • Python (^3.11)
  • Poetry (^1.8) - installation instructions here

Installing the connector

From this connector directory, run:

poetry install --with dev

Create credentials

If you are a community contributor, follow the instructions in the documentation to generate the necessary credentials. Then create a file secrets/config.json conforming to the src/source_box_data_extract/spec.yaml file. Note that any directory named secrets is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information. See sample_files/sample_config.json for a sample config file.
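
Before running the connector, it can help to confirm that secrets/config.json is at least well-formed. The following is a minimal sketch, not part of the connector: the helper names `load_config` and `describe_config` are hypothetical, and the check only confirms that the file holds a JSON object. It does not validate the file against spec.yaml.

```python
import json
from pathlib import Path


def load_config(path: str) -> dict:
    """Load a connector config file and confirm it holds a JSON object."""
    text = Path(path).read_text(encoding="utf-8")
    config = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(config, dict):
        raise ValueError(
            f"{path} must contain a JSON object, got {type(config).__name__}"
        )
    return config


def describe_config(config: dict) -> list[str]:
    """Return the sorted top-level keys only, so secret values stay out of logs."""
    return sorted(config)
```

Calling `describe_config(load_config("secrets/config.json"))` lists the top-level keys, which can then be compared by eye with the properties declared in spec.yaml.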

Locally running the connector

poetry run source-box-data-extract spec
poetry run source-box-data-extract check --config secrets/config.json
poetry run source-box-data-extract discover --config secrets/config.json
poetry run source-box-data-extract read --config secrets/config.json --catalog sample_files/configured_catalog.json

Running tests

To run tests locally, from the connector directory run:

poetry run pytest tests

Building the docker image

  1. Install airbyte-ci
  2. Run the following command to build the docker image:
airbyte-ci connectors --name=source-box-data-extract build

An image will be available on your host with the tag airbyte/source-box-data-extract:dev.

Running as a docker container

After building the image, run any of the connector commands as follows:

docker run --rm airbyte/source-box-data-extract:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-box-data-extract:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-box-data-extract:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-box-data-extract:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
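
In the docker commands above, each -v flag mounts a host directory at a path of the same name inside the container, which is why --config and --catalog point at /secrets and /integration_tests rather than at relative paths. A small sketch of that mapping follows; the helper name `docker_command` is hypothetical and the function only assembles arguments, it does not run docker.

```python
IMAGE = "airbyte/source-box-data-extract:dev"


def docker_command(action: str, workdir: str,
                   mounts: tuple[str, ...] = ()) -> list[str]:
    """Build a docker run argv, mounting workdir/<name> at /<name>."""
    argv = ["docker", "run", "--rm"]
    for name in mounts:
        # Host path on the left, container path on the right.
        argv += ["-v", f"{workdir}/{name}:/{name}"]
    argv += [IMAGE, action]
    if action != "spec":
        argv += ["--config", "/secrets/config.json"]
    if action == "read":
        argv += ["--catalog", "/integration_tests/configured_catalog.json"]
    return argv
```

For the read command, `docker_command("read", os.getcwd(), mounts=("secrets", "integration_tests"))` reproduces the two mounts shown above, with `os.getcwd()` standing in for `$(pwd)`.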

Running our CI test suite

You can run our full test suite locally using airbyte-ci:

airbyte-ci connectors --name=source-box-data-extract test

Customizing acceptance Tests

Customize the acceptance-test-config.yml file to configure acceptance tests. See Connector Acceptance Tests for more information. If your connector needs to create or destroy resources for use during acceptance tests, create fixtures for them and place them inside integration_tests/acceptance.py.

Dependency Management

All of your dependencies should be managed via Poetry. To add a new dependency, run:

poetry add <package-name>

Please commit the changes to pyproject.toml and poetry.lock files.

Publishing a new version of the connector

You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?

  1. Make sure your changes are passing our test suite: airbyte-ci connectors --name=source-box-data-extract test
  2. Bump the connector version (please follow semantic versioning for connectors):
    • bump the dockerImageTag value in metadata.yaml
    • bump the version value in pyproject.toml
  3. Make sure the metadata.yaml content is up to date.
  4. Make sure the connector documentation and its changelog are up to date (docs/integrations/sources/box-data-extract.md).
  5. Create a Pull Request: use our PR naming conventions.
  6. Pat yourself on the back for being an awesome contributor.
  7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
  8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.

Download files

Source Distribution

airbyte_source_box_data_extract-0.1.13.tar.gz (9.9 kB)

Built Distribution

airbyte_source_box_data_extract-0.1.13-py3-none-any.whl

File details

Details for the file airbyte_source_box_data_extract-0.1.13.tar.gz.

File hashes

Hashes for airbyte_source_box_data_extract-0.1.13.tar.gz
Algorithm Hash digest
SHA256 b563a1c46608cc94c2fcb47e25868e4dd70b4da6034d9019bd6f134de0d5371f
MD5 6fae136d042df37c9a46b2226f7a83ec
BLAKE2b-256 f3a1c2d52dac757f8772f85a73c5b8efaf826c5e38963e3a4898b2789ab5c7b8

File details

Details for the file airbyte_source_box_data_extract-0.1.13-py3-none-any.whl.

File hashes

Hashes for airbyte_source_box_data_extract-0.1.13-py3-none-any.whl
Algorithm Hash digest
SHA256 31260256f2ec2492207c67cee7ca6647b9bd4b93baade2a6d52443afca47cdcb
MD5 b4276888aadd8126fcc7c1f6c3a4d6e9
BLAKE2b-256 38a20d80954c4784cd71c5b6d672e0fd755e55d63e1033e8e8e23b06acca0e26
