
llama-index readers microsoft_sharepoint integration

Project description

Microsoft SharePoint Reader

pip install llama-index-readers-microsoft-sharepoint

This loader loads files from a folder in a SharePoint site, or loads SharePoint Site Pages.

It can also traverse sub-folders recursively.

Prerequisites

App Authentication using Microsoft Entra ID (formerly Azure AD)

  1. Create an App Registration in Microsoft Entra ID (see Microsoft's app registration documentation).
  2. API Permissions for the created app:
    • Microsoft Graph → Application Permissions → Sites.Read.All (Grant Admin Consent): allows access to all sites in the tenant
    • OR Microsoft Graph → Application Permissions → Sites.Selected (Grant Admin Consent): allows access only to the specific sites you select and grant permissions for
    • Microsoft Graph → Application Permissions → Files.Read.All (Grant Admin Consent)
    • Microsoft Graph → Application Permissions → BrowserSiteLists.Read.All (Grant Admin Consent)

Note: If you use Sites.Selected, you must grant your app access to the specific SharePoint site(s) via the SharePoint admin center. See Grant access to a specific site for details.

For more information on the Microsoft Graph APIs, see the Microsoft Graph documentation.

Usage

To use this loader, you need the client_id, client_secret, and tenant_id of the app registered in the Microsoft Azure Portal.
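Rather than hard-coding these secrets, you may prefer to read them from environment variables. The following is a minimal sketch. It assumes the hypothetical variable names SHAREPOINT_CLIENT_ID, SHAREPOINT_CLIENT_SECRET, and SHAREPOINT_TENANT_ID, and the hypothetical helper name sharepoint_credentials. The helper only builds the keyword arguments. It does not talk to SharePoint.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names (an assumption; pick your own).
ENV_VARS = {
    "client_id": "SHAREPOINT_CLIENT_ID",
    "client_secret": "SHAREPOINT_CLIENT_SECRET",
    "tenant_id": "SHAREPOINT_TENANT_ID",
}


def sharepoint_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect client_id, client_secret and tenant_id from environment variables.

    Empty or unset variables are treated as missing, and every missing
    variable is reported in a single ValueError.
    """
    source = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not source.get(var)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {kwarg: source[var] for kwarg, var in ENV_VARS.items()}
```

With the package installed, you would pass the result straight through, for example `SharePointReader(**sharepoint_credentials())`.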

Loading Files from SharePoint Drive

This loader loads the files in a specific folder of a SharePoint site.

The folder path is relative to the site's root directory. For example, if the files are in a folder named Test under the root directory, pass Test as sharepoint_folder_path.
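For orientation, Microsoft Graph addresses a folder by its path relative to the drive root. The sketch below shows how a folder path such as Test maps to a Graph "list children" URL. It illustrates Graph's path-based addressing only and is not necessarily how SharePointReader builds its requests. The names children_url and drive_id are hypothetical.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def children_url(drive_id: str, folder_path: str) -> str:
    """Build a Graph URL that lists the items inside a folder of a drive.

    folder_path is relative to the drive root, e.g. "Test" or "Test/Reports".
    An empty path lists the root folder itself.
    """
    path = folder_path.strip("/")
    if not path:
        return f"{GRAPH_BASE}/drives/{drive_id}/root/children"
    # quote() percent-encodes spaces and similar characters but keeps "/".
    return f"{GRAPH_BASE}/drives/{drive_id}/root:/{quote(path)}:/children"
```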


from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
)

documents = loader.load_data(
    sharepoint_site_name="<Sharepoint Site Name>",
    sharepoint_folder_path="<Folder Path>",
    recursive=True,
)
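With recursive=True, files in nested sub-folders are collected as well. Below is a minimal sketch of that depth-first walk. It runs over an in-memory tree shaped like Microsoft Graph driveItem results, where folders carry a "folder" facet and files carry a "file" facet. The "children" key stands in for a follow-up "list children" request. That key, and the helper name iter_files, are assumptions of this sketch, not how the reader actually fetches data.

```python
from typing import Dict, Iterator, List

# A driveItem-like dict: folders have a "folder" key, files a "file" key.
# "children" is a stand-in for fetching a folder's contents (assumption).
DriveItem = Dict[str, object]


def iter_files(items: List[DriveItem], prefix: str = "", recursive: bool = True) -> Iterator[str]:
    """Yield the path of every file, descending into sub-folders if recursive."""
    for item in items:
        name = str(item["name"])
        path = f"{prefix}/{name}" if prefix else name
        if "folder" in item:
            if recursive:
                # Depth-first: finish this sub-folder before moving on.
                yield from iter_files(item.get("children", []), path, recursive)
        elif "file" in item:
            yield path
```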

Using Sites.Selected Permission

If you have only been granted access to a specific site (using Sites.Selected), you can use the site host name and relative URL instead of the site name:

from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
    sharepoint_host_name="contoso.sharepoint.com",
    sharepoint_relative_url="sites/YourSiteName",
)

documents = loader.load_data(
    sharepoint_folder_path="<Folder Path>",
    recursive=True,
)
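Microsoft Graph can resolve a site directly from its host name and server-relative path (`GET /sites/{hostname}:/{server-relative-path}`). The helper below builds that URL. The examples in this README write the relative URL both with a leading slash ("/sites/<YourSite>") and without one ("sites/YourSiteName"), so the sketch strips surrounding slashes defensively. Whether SharePointReader normalizes the value in the same way is an assumption. The name site_lookup_url is hypothetical.

```python
GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def site_lookup_url(host_name: str, relative_url: str) -> str:
    """Graph URL that resolves a site from its host name and relative path."""
    host = host_name.strip().rstrip("/")
    # Accept "sites/Foo", "/sites/Foo" and "/sites/Foo/" alike.
    path = relative_url.strip().strip("/")
    return f"{GRAPH_BASE}/sites/{host}:/{path}"
```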

Loading SharePoint Site Pages

You can also load SharePoint Site Pages as documents by setting sharepoint_type to SharePointType.PAGE:

from llama_index.readers.microsoft_sharepoint import (
    SharePointReader,
    SharePointType,
)

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
    sharepoint_site_name="<Sharepoint Site Name>",
    sharepoint_host_name="<your-tenant>.sharepoint.com",
    sharepoint_relative_url="/sites/<YourSite>",
    sharepoint_type=SharePointType.PAGE,
)

# Load all pages
documents = loader.load_data()

# Or load a specific page by ID
loader.sharepoint_file_id = "<page_id>"
documents = loader.load_data()

Filtering Pages with Callbacks

You can filter which pages to process using the process_document_callback:

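from llama_index.readers.microsoft_sharepoint import (
    SharePointReader,
    SharePointType,
)


# Return True to process the page, False to skip it.
def page_filter(page_name: str) -> bool: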
    # Only process pages that don't start with "Draft"
    return not page_name.startswith("Draft")


loader = SharePointReader(
    client_id="<Client ID>",
    client_secret="<Client Secret>",
    tenant_id="<Tenant ID>",
    sharepoint_site_name="<Site Name>",
    sharepoint_type=SharePointType.PAGE,
    process_document_callback=page_filter,
)
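Because the callback is a plain predicate on the page name, several rules can be combined into a single function. The following is a minimal sketch using a hypothetical factory, make_page_filter, that excludes pages by prefix or by exact name. This README does not state whether page_name is the page's file name or its title, so check which one you receive before relying on exact matches.

```python
from typing import Callable, Iterable

PageFilter = Callable[[str], bool]


def make_page_filter(
    excluded_prefixes: Iterable[str] = (),
    excluded_names: Iterable[str] = (),
    case_sensitive: bool = False,
) -> PageFilter:
    """Return a predicate suitable for process_document_callback.

    The predicate returns True (process the page) unless the name starts
    with an excluded prefix or exactly matches an excluded name.
    """

    def norm(s: str) -> str:
        return s if case_sensitive else s.casefold()

    prefixes = tuple(norm(p) for p in excluded_prefixes)
    names = {norm(n) for n in excluded_names}

    def page_filter(page_name: str) -> bool:
        name = norm(page_name)
        # str.startswith accepts a tuple; an empty tuple never matches.
        return not name.startswith(prefixes) and name not in names

    return page_filter
```

For example, `process_document_callback=make_page_filter(excluded_prefixes=["Draft"])` behaves like the page_filter above, except that it also ignores case.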

Error Handling

Use fail_on_error to control error handling. With fail_on_error=False, the loader logs errors and continues instead of raising:

from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID>",
    client_secret="<Client Secret>",
    tenant_id="<Tenant ID>",
    fail_on_error=False,  # Log errors and continue instead of raising
)
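The two behaviors follow a common pattern: either re-raise the first failure, or log it and move on to the next item. The generic sketch below illustrates that pattern with a hypothetical load_all function. It is not the reader's own code.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

logger = logging.getLogger(__name__)


def load_all(items: Iterable[T], load_one: Callable[[T], R], fail_on_error: bool = True) -> List[R]:
    """Load every item, either stopping at the first error or skipping failures."""
    results: List[R] = []
    for item in items:
        try:
            results.append(load_one(item))
        except Exception:
            if fail_on_error:
                raise
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("Failed to load %r; skipping", item)
    return results
```

Continuing on error suits large batch ingestion, where one bad file should not discard everything else. Failing fast makes problems visible immediately during setup.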

Instrumentation Events

The SharePoint reader emits events during page processing for monitoring:

from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.readers.microsoft_sharepoint import (
    TotalPagesToProcessEvent,
    PageDataFetchCompletedEvent,
    PageFailedEvent,
)


class SharePointEventHandler(BaseEventHandler):
    # Accept **kwargs to match the BaseEventHandler.handle signature.
    def handle(self, event, **kwargs):
        if isinstance(event, TotalPagesToProcessEvent):
            print(f"Processing {event.total_pages} pages...")
        elif isinstance(event, PageDataFetchCompletedEvent):
            print(f"Completed: {event.page_id}")
        elif isinstance(event, PageFailedEvent):
            print(f"Failed: {event.page_id} - {event.error}")


dispatcher = get_dispatcher("llama_index.readers.microsoft_sharepoint.base")
dispatcher.add_event_handler(SharePointEventHandler())

Available events:

  • TotalPagesToProcessEvent: Total number of pages to process
  • PageDataFetchStartedEvent: Page processing started
  • PageDataFetchCompletedEvent: Page successfully processed
  • PageSkippedEvent: Page skipped by process_document_callback
  • PageFailedEvent: Page processing failed
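To turn these events into a progress summary, a handler can tally them. The sketch below uses a hypothetical PageProgress class that dispatches on the event's class name, so it needs no imports from llama_index. It reads total_pages from TotalPagesToProcessEvent, as in the handler above. It also assumes that every page ends in exactly one of PageDataFetchCompletedEvent, PageSkippedEvent, or PageFailedEvent. The list above suggests this, but does not state it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Events assumed to mark a page as finished (one per page).
TERMINAL_EVENTS = ("PageDataFetchCompletedEvent", "PageSkippedEvent", "PageFailedEvent")


@dataclass
class PageProgress:
    """Tallies SharePoint page events by their class name."""

    total: Optional[int] = None
    counts: Dict[str, int] = field(default_factory=dict)

    def handle(self, event: Any) -> None:
        name = type(event).__name__
        if name == "TotalPagesToProcessEvent":
            self.total = event.total_pages
        else:
            self.counts[name] = self.counts.get(name, 0) + 1

    @property
    def finished(self) -> int:
        return sum(self.counts.get(n, 0) for n in TERMINAL_EVENTS)

    def summary(self) -> str:
        total = "?" if self.total is None else str(self.total)
        failed = self.counts.get("PageFailedEvent", 0)
        skipped = self.counts.get("PageSkippedEvent", 0)
        return f"{self.finished}/{total} pages done ({failed} failed, {skipped} skipped)"
```

To wire it up, keep one PageProgress instance and call its handle method from the handle method of an event handler such as SharePointEventHandler.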



Download files


Source Distribution

Built Distribution


File details

Details for the file llama_index_readers_microsoft_sharepoint-0.9.0.tar.gz.

File metadata

  • Download URL: llama_index_readers_microsoft_sharepoint-0.9.0.tar.gz
  • Upload date:
  • Size: 55.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (Ubuntu 24.04, CI)

File hashes

Hashes for llama_index_readers_microsoft_sharepoint-0.9.0.tar.gz
Algorithm Hash digest
SHA256 9e184e0e90ea43dfb9990a5de1099bcf0e4efd35f5f57a0c147fe04bf5c36409
MD5 12c7ff7bd1e9248e4f24bea6186a7fcc
BLAKE2b-256 866c93ab548f26f0f6e00f71029184c244d77d25ec7f404788a8085c09704741


File details

Details for the file llama_index_readers_microsoft_sharepoint-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: llama_index_readers_microsoft_sharepoint-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 53.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (Ubuntu 24.04, CI)

File hashes

Hashes for llama_index_readers_microsoft_sharepoint-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c4f1306364f151de17e30e8e40a251b82f3a2d5a6ffea0bcbf2b3c865396c49f
MD5 c256bd31b771e1d478cbe5ff06243a6a
BLAKE2b-256 6e769133356555fc37f2e0263b4e757ee5d88baf0bda884c499a8e1ccc166769

