llama-index readers gcs integration

GCS File or Directory Loader

This loader parses any file stored on Google Cloud Storage (GCS), or the entire bucket (with an optional prefix filter) if no particular file is specified. It also implements ResourcesReaderMixin and FileSystemReaderMixin, which add resource listing, resource metadata lookup, and direct file reads.

Features

  • Parse single files or entire buckets from GCS
  • List resources in GCS buckets
  • Retrieve detailed information about GCS objects
  • Load specific resources from GCS
  • Read file content directly
  • Supports various authentication methods
  • Comprehensive logging for easier debugging
  • Robust error handling for improved reliability

Authentication

When initializing GCSReader, you may pass in your GCP Service Account Key in one of three ways:

  1. As a file path (service_account_key_path)
  2. As a JSON string (service_account_key_json)
  3. As a dictionary (service_account_key)

If no credentials are provided, the loader will attempt to use default credentials.
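The three credential options above can be selected with a small helper. This is a minimal sketch: the helper name credential_kwargs and the rule that only one source may be passed are assumptions made for illustration, not behavior documented for GCSReader. Only the three keyword names come from the list above.

```python
import json
from typing import Any, Dict, Optional


def credential_kwargs(
    key_path: Optional[str] = None,
    key_json: Optional[str] = None,
    key_dict: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the single GCSReader credential keyword to pass, if any.

    Hypothetical helper: rejecting multiple sources is a choice made here,
    not something the loader is known to enforce.
    """
    supplied = {
        "service_account_key_path": key_path,
        "service_account_key_json": key_json,
        "service_account_key": key_dict,
    }
    chosen = {name: value for name, value in supplied.items() if value is not None}
    if len(chosen) > 1:
        raise ValueError(f"Pass only one credential source, got: {sorted(chosen)}")
    if key_json is not None:
        json.loads(key_json)  # fail early if the string is not valid JSON
    return chosen  # an empty dict means: rely on default credentials


# The result would be unpacked into the constructor, e.g.
# GCSReader(bucket="my-bucket", **kwargs). The reader is not created here.
kwargs = credential_kwargs(key_path="/path/to/key.json")
```

Returning an empty dictionary when nothing is supplied keeps the call site identical whether explicit or default credentials are in use.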

Usage

To use this loader, you need to pass in the name of your GCS Bucket. You can then either parse a single file by passing its key, or parse multiple files using a prefix.

from llama_index.readers.gcs import GCSReader
import logging

# Set up logging (optional, but recommended)
logging.basicConfig(level=logging.INFO)

# Initialize the reader
reader = GCSReader(
    bucket="scrabble-dictionary",
    key="dictionary.txt",  # Optional: specify a single file
    # prefix="subdirectory/",  # Optional: specify a prefix to filter files
    service_account_key_json="[SERVICE_ACCOUNT_KEY_JSON]",
)

# Load data
documents = reader.load_data()

# List resources in the bucket
resources = reader.list_resources()

# Get information about a specific resource
resource_info = reader.get_resource_info("dictionary.txt")

# Load a specific resource
specific_doc = reader.load_resource("dictionary.txt")

# Read file content directly
file_content = reader.read_file_content("dictionary.txt")

print(f"Loaded {len(documents)} documents")
print(f"Found {len(resources)} resources")
print(f"Resource info: {resource_info}")
print(f"Specific document: {specific_doc}")
print(f"File content length: {len(file_content)} bytes")

Note: If the file is nested in a subdirectory, the key should contain that, e.g., subdirectory/input.txt.
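To make the difference between a key and a prefix concrete, here is a small stand-alone sketch. It does not call GCS or the loader. The helper names object_key and filter_by_prefix are hypothetical, and the prefix match is assumed to be a plain "starts with" comparison on the object name.

```python
import posixpath
from typing import Iterable, List


def object_key(*parts: str) -> str:
    """Join path segments into an object key; GCS keys use '/' on every OS."""
    return posixpath.join(*parts)


def filter_by_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Keep only the keys that start with the given prefix."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical bucket contents, used only for illustration.
keys = ["dictionary.txt", "subdirectory/input.txt", "subdirectory/notes.md"]

nested_key = object_key("subdirectory", "input.txt")
under_prefix = filter_by_prefix(keys, "subdirectory/")
print(nested_key)
print(under_prefix)
```

A key names exactly one object, so it must include every "directory" segment. A prefix selects every object whose name begins with it.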

Advanced Usage

All files are parsed with SimpleDirectoryReader. You may specify a custom file_extractor, relying on any of the loaders in the LlamaIndex library (or your own)!

from llama_index.readers.gcs import GCSReader
from llama_index.readers.mongodb import SimpleMongoReader  # from llama-index-readers-mongodb

reader = GCSReader(
    bucket="my-bucket",
    file_extractor={
        ".mongo": SimpleMongoReader(),
        # Add more custom extractors as needed
    },
)
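The file_extractor dictionary maps a file extension to the reader that should handle it. The sketch below illustrates that dispatch idea in plain Python. It is not SimpleDirectoryReader's actual code: the function names are hypothetical, and lowercasing the suffix before the lookup is an assumption.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict

Extractor = Callable[[bytes], str]


def read_plain(data: bytes) -> str:
    """Default extractor: decode the bytes as UTF-8 text."""
    return data.decode("utf-8")


def read_shouting(data: bytes) -> str:
    """Stand-in for a custom extractor; it just upper-cases the text."""
    return data.decode("utf-8").upper()


def pick_extractor(
    key: str, extractors: Dict[str, Extractor], default: Extractor
) -> Extractor:
    """Choose an extractor by the key's extension, falling back to a default."""
    suffix = PurePosixPath(key).suffix.lower()  # e.g. ".mongo"
    return extractors.get(suffix, default)


custom = {".mongo": read_shouting}
chosen = pick_extractor("exports/users.mongo", custom, read_plain)
print(chosen(b"hello"))
```

Extensions that have no entry in the dictionary fall through to the default, which is why only the special cases need to be listed.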

Error Handling

The GCSReader now includes comprehensive error handling. You can catch exceptions to handle specific error cases:

from google.auth.exceptions import DefaultCredentialsError
from llama_index.readers.gcs import GCSReader

try:
    reader = GCSReader(bucket="your-bucket-name")
    documents = reader.load_data()
except DefaultCredentialsError:
    # Raised when no usable default credentials can be found
    print("Authentication failed. Please check your credentials.")
except Exception as e:
    # Catch-all for any other failure while loading
    print(f"An error occurred: {e}")
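If the same try/except pattern is needed in several places, it can be wrapped once. The following is a minimal sketch: load_with_fallback and the logger name "gcs_example" are hypothetical. It takes any zero-argument callable, so it makes no assumption about which exceptions the reader raises.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("gcs_example")  # hypothetical logger name


def load_with_fallback(
    load: Callable[[], List], fallback: Optional[List] = None
) -> List:
    """Call load(); on any exception, log the traceback and return a fallback."""
    try:
        return load()
    except Exception:
        logger.exception("Loading from GCS failed")
        return [] if fallback is None else fallback


def failing_load() -> List:
    """Stand-in for reader.load_data() when something goes wrong."""
    raise RuntimeError("simulated failure")


# With a real reader this would be: load_with_fallback(reader.load_data)
documents = load_with_fallback(failing_load)
print(f"Loaded {len(documents)} documents")
```

Swallowing every exception is only appropriate when an empty result is acceptable. Otherwise, re-raise after logging.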

Logging

To get insights into the GCSReader's operations, configure logging in your application:

import logging

logging.basicConfig(level=logging.INFO)
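To see detailed output from the reader without raising the verbosity of every other library, the level can be set on one logger only. This sketch assumes the reader logs under a name that starts with "llama_index.readers.gcs" (the usual result of logging.getLogger(__name__) in that package). That name is an assumption and has not been checked against the source.

```python
import logging

# Keep the rest of the application quiet.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumed logger name; child loggers such as "llama_index.readers.gcs.base"
# inherit this level unless they set their own.
reader_logger = logging.getLogger("llama_index.readers.gcs")
reader_logger.setLevel(logging.DEBUG)
```

Records from the reader's loggers still propagate to the root handler, so no extra handler is needed.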

This loader is designed to be used as a way to load data into LlamaIndex. For more advanced usage, including custom file extractors, metadata extraction, and working with specific file types, please refer to the LlamaIndex documentation.
