
llama-index readers box integration


LlamaIndex: Box Readers

This open-source integration brings the capabilities of Box.com to LlamaIndex, empowering developers building Retrieval Augmented Generation (RAG) and other LLM applications.

This README walks you through installation and usage, and explores the functionality of each reader.

Installation

pip install llama-index-readers-box

Available readers

We provide multiple readers, including:

  • Box Reader - Implementation of the SimpleReader interface to read files from Box.
  • Box Text Extraction - Uses the Box text representation to extract text from documents.
  • Box AI Prompt - Uses Box AI to extract context from documents.
  • Box AI Extraction - Uses Box AI to extract structured data from documents.

[!IMPORTANT] Box AI features are only available to Enterprise Plus (E+) customers.

Authentication

Client Credentials Grant (CCG)

Create a new application in the Box Developer Console and generate a new client ID and client secret. Create a .env file with the following content:

# CCG settings
BOX_CLIENT_ID=YOUR_CLIENT_ID
BOX_CLIENT_SECRET=YOUR_CLIENT_SECRET

# Common settings
BOX_ENTERPRISE_ID=YOUR_BOX_ENTERPRISE_ID
# Optional: act as this user instead of the service account
BOX_USER_ID=YOUR_BOX_USER_ID
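The README does not say how the .env file is loaded; python-dotenv's `load_dotenv()` is a common choice, and it copies the values into `os.environ`. As a dependency-free illustration of what such a loader does with the simple KEY=VALUE lines above, here is a minimal sketch. `parse_env_text` is a hypothetical helper, not part of this package or the Box SDK, and it deliberately skips quoting rules, inline comments, and variable expansion.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are ignored, and whitespace
    around the key and value is stripped. Quoting, trailing '# comments'
    and ${VAR} expansion are NOT supported; use a real dotenv library.
    """
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Not a KEY=VALUE line: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings
```

Only the non-empty, non-comment lines end up in the result, so the optional BOX_USER_ID key is simply absent when you leave it out of the file.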

By default, the CCG client uses the service account associated with the application. Depending on how files are shared, the service account may not have access to all of them.

To act as a different user, set BOX_USER_ID in the .env file. In this case, make sure the application's configuration allows it to impersonate users and/or generate user access tokens.
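The rule described above (use the application's service account unless a user ID is configured) can be summarized with a small sketch. `token_subject` is a hypothetical helper that only models the decision; the actual token request is made by the Box SDK. The "enterprise" and "user" labels mirror the subject types in Box's token API, but treat the mapping here as illustrative.

```python
from typing import Optional, Tuple


def token_subject(enterprise_id: str, user_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (subject_type, subject_id) for a token request (hypothetical helper).

    Without a user ID, the token belongs to the application's service account
    (enterprise subject). With a user ID, the token acts as that user, which
    requires the app to be allowed to impersonate users or generate user tokens.
    """
    if user_id:  # None and "" both fall back to the service account
        return ("user", user_id)
    return ("enterprise", enterprise_id)
```

Because an empty string is falsy, a BOX_USER_ID line with no value falls back to the service account, which matches the `if config.user_id:` check in the CCG client example.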

Check out this guide for more information on how to set up CCG: Box CCG Guide

JSON web tokens (JWT)

Create a new application in the Box Developer Console and generate a new .config.json file. Create a .env file with the following content:

# Common settings
BOX_ENTERPRISE_ID=YOUR_BOX_ENTERPRISE_ID
# Optional: act as this user instead of the service account
BOX_USER_ID=YOUR_BOX_USER_ID

# JWT settings
JWT_CONFIG_PATH=/path/to/your/.config.json
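Before pointing JWT_CONFIG_PATH at the downloaded file, it can help to confirm that the file has the expected shape. The sketch below assumes the layout of the JSON config that the Box Developer Console generates for JWT apps (`boxAppSettings.clientID`, `boxAppSettings.clientSecret`, `boxAppSettings.appAuth.publicKeyID`, `privateKey`, `passphrase`, and a top-level `enterpriseID`). `check_jwt_config` is a hypothetical helper, and it returns only booleans so that secrets are never printed or logged.

```python
import json
from pathlib import Path
from typing import Dict, Union

# Field paths expected in the console-generated config file (assumed layout).
REQUIRED_FIELDS = (
    ("boxAppSettings", "clientID"),
    ("boxAppSettings", "clientSecret"),
    ("boxAppSettings", "appAuth", "publicKeyID"),
    ("boxAppSettings", "appAuth", "privateKey"),
    ("boxAppSettings", "appAuth", "passphrase"),
    ("enterpriseID",),
)


def check_jwt_config(path: Union[str, Path]) -> Dict[str, bool]:
    """Report which expected fields are present and non-empty (hypothetical helper)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    report: Dict[str, bool] = {}
    for field_path in REQUIRED_FIELDS:
        node = data
        # Walk down the nested dicts; a missing level yields None.
        for key in field_path:
            node = node.get(key) if isinstance(node, dict) else None
        report[".".join(field_path)] = bool(node)
    return report
```

For example, `check_jwt_config(os.environ["JWT_CONFIG_PATH"])` returns a dict such as `{"boxAppSettings.clientID": True, ...}`, and any `False` entry points at a field to fix before creating the JWT client.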

By default, the JWT client uses the service account associated with the application. Depending on how files are shared, the service account may not have access to all of them.

To act as a different user, set BOX_USER_ID in the .env file. In this case, make sure the application's configuration allows it to impersonate users and/or generate user access tokens.

Check out this guide for more information on how to set up JWT: Box JWT Guide

[!WARNING] JWT authentication requires extra SDK dependencies. You can install them by running:

pip install "box-sdk-gen[jwt]"
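To check whether those extra dependencies are already importable, a quick sketch using only the standard library might look like this. It assumes the "jwt" extra of box-sdk-gen pulls in PyJWT (imported as `jwt`) and `cryptography`; adjust the module list if the SDK's extras differ. `jwt_extras_status` is a hypothetical helper.

```python
import importlib.util
from typing import Dict

# Assumption: modules installed by box-sdk-gen[jwt]; verify against the SDK's metadata.
JWT_EXTRA_MODULES = ("jwt", "cryptography")


def jwt_extras_status() -> Dict[str, bool]:
    """Map each assumed JWT dependency to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in JWT_EXTRA_MODULES}
```

If `[m for m, ok in jwt_extras_status().items() if not ok]` is non-empty, run the pip command above before creating a JWT client.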

Box Client

To work with the Box readers, you need to provide a Box Client. The Box Client can be created using Client Credentials Grant (CCG), JSON Web Tokens (JWT), OAuth 2.0, or a developer token.
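When the same code should run with either of the two .env layouts described in the Authentication section, one option is to decide from the environment which client to build. The sketch below is a hypothetical helper, `choose_auth_method`, keyed on the variable names used in this README. Giving JWT precedence when both are configured is an arbitrary choice, and OAuth 2.0 and developer tokens are left out because this README defines no environment variables for them.

```python
from typing import Mapping


def choose_auth_method(env: Mapping[str, str]) -> str:
    """Return "jwt" or "ccg" based on the .env variables used in this README.

    Hypothetical helper. JWT wins when both are configured (arbitrary choice).
    """
    if env.get("JWT_CONFIG_PATH"):
        return "jwt"
    if env.get("BOX_CLIENT_ID") and env.get("BOX_CLIENT_SECRET"):
        return "ccg"
    raise ValueError(
        "Set JWT_CONFIG_PATH for JWT, or BOX_CLIENT_ID and BOX_CLIENT_SECRET for CCG"
    )
```

Calling `choose_auth_method(os.environ)` after loading the .env file tells you which of the two client examples to follow.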

Using CCG authentication

from box_sdk_gen import CCGConfig, BoxCCGAuth, BoxClient
from llama_index.readers.box import BoxReader

config = CCGConfig(
    client_id="your_client_id",
    client_secret="your_client_secret",
    enterprise_id="your_enterprise_id",
    user_id="your_ccg_user_id",  # Optional
)
auth = BoxCCGAuth(config)
if config.user_id:
    # with_user_subject() returns a new auth object, so keep the result
    auth = auth.with_user_subject(config.user_id)
client = BoxClient(auth)

reader = BoxReader(box_client=client)
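Instead of hard-coding credentials, you can build the same `CCGConfig` keyword arguments from the environment variables defined in the CCG .env example. The sketch below is a hypothetical helper, `ccg_kwargs_from_env`; it uses only the standard library and assumes the .env file has already been loaded into `os.environ` (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("BOX_CLIENT_ID", "BOX_CLIENT_SECRET", "BOX_ENTERPRISE_ID")


def ccg_kwargs_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Build keyword arguments for CCGConfig from environment variables.

    Hypothetical helper. BOX_USER_ID is optional; an unset or empty value
    becomes None, so the application's service account is used.
    """
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    return {
        "client_id": env["BOX_CLIENT_ID"],
        "client_secret": env["BOX_CLIENT_SECRET"],
        "enterprise_id": env["BOX_ENTERPRISE_ID"],
        "user_id": env.get("BOX_USER_ID") or None,
    }
```

The keys match the keyword names used in the CCG example, so `config = CCGConfig(**ccg_kwargs_from_env())` can replace the literal values; the rest of that example stays the same.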

Using JWT authentication

from box_sdk_gen import JWTConfig, BoxJWTAuth, BoxClient
from llama_index.readers.box import BoxReader

# Option 1: manual configuration
config = JWTConfig(
    client_id="YOUR_BOX_CLIENT_ID",
    client_secret="YOUR_BOX_CLIENT_SECRET",
    jwt_key_id="YOUR_BOX_JWT_KEY_ID",
    private_key="YOUR_BOX_PRIVATE_KEY",
    private_key_passphrase="YOUR_BOX_PRIVATE_KEY_PASSPHRASE",
    enterprise_id="YOUR_BOX_ENTERPRISE_ID",
    user_id="YOUR_BOX_USER_ID",
)

# Option 2: configuration file (replaces the config above; use one or the other)
config = JWTConfig.from_config_file("path/to/your/.config.json")

# Optional: set a Box user ID to act as that user; leave None for the service account
user_id = None  # e.g. "1234"
if user_id:
    config.user_id = user_id
    config.enterprise_id = None  # clear it so the token is issued for the user
auth = BoxJWTAuth(config)
client = BoxClient(auth)

reader = BoxReader(box_client=client)

Author

Box-Community. This is an open-source reader, and contributions are welcome.

This loader is designed to load data into LlamaIndex.


Download files


Source Distribution

llama_index_readers_box-0.5.0.tar.gz (14.9 kB)

Uploaded Source

Built Distribution


llama_index_readers_box-0.5.0-py3-none-any.whl (26.5 kB)

Uploaded Python 3

File details

Details for the file llama_index_readers_box-0.5.0.tar.gz.

File metadata

  • Download URL: llama_index_readers_box-0.5.0.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (publish from CI, Ubuntu 24.04)

File hashes

Hashes for llama_index_readers_box-0.5.0.tar.gz
Algorithm Hash digest
SHA256 4171c00d94fac3b189a304a2ab987b2640e70713c075e7d2cf825d8e23a5ed64
MD5 c8be02b0d03d8adc0fecc5ceeff9d31d
BLAKE2b-256 87bcc0970e7d8a01236540051163a861c610220904385820b503a5c88d5973a4
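To check a downloaded file against the published SHA256 digests, a minimal standard-library sketch is shown below. `sha256_of_file` and `matches_published_hash` are illustrative helpers, not part of this package; the file is read in chunks so large archives do not have to fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, that is `matches_published_hash("llama_index_readers_box-0.5.0.tar.gz", "4171c00d94fac3b189a304a2ab987b2640e70713c075e7d2cf825d8e23a5ed64")`. Alternatively, pip's hash-checking mode does this automatically: a requirements line such as `llama-index-readers-box==0.5.0 --hash=sha256:<digest>` installed with `pip install --require-hashes -r requirements.txt`, listing one `--hash` per artifact (sdist and wheel) that pip might choose.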


File details

Details for the file llama_index_readers_box-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: llama_index_readers_box-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 26.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (publish from CI, Ubuntu 24.04)

File hashes

Hashes for llama_index_readers_box-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8ed344fd51e8250f0449eca22108a530131dc002e0871b9c097467f1dabc24d7
MD5 f6ed400dee8e06dfd2f2a4c6e2ba512c
BLAKE2b-256 8b0ca652525eaf37e0d75a4c05a7534b633a062781a7642ef739018a8099a8ab

