llama-index readers microsoft_sharepoint integration

Microsoft SharePoint Reader

pip install llama-index-readers-microsoft-sharepoint

This loader loads files from a folder in a SharePoint site, or loads SharePoint Site Pages.

It can also traverse sub-folders recursively.

Prerequisites

App Authentication using Microsoft Entra ID (formerly Azure AD)

  1. Create an App Registration in Microsoft Entra ID (see Microsoft's documentation on registering an application).
  2. API Permissions for the created app:
    • Microsoft Graph → Application Permissions → Sites.Read.All (Grant Admin Consent) (Allows access to all sites in the tenant)
    • OR Microsoft Graph → Application Permissions → Sites.Selected (Grant Admin Consent) (Allows access only to specific sites you select and grant permissions for)
    • Microsoft Graph → Application Permissions → Files.Read.All (Grant Admin Consent)
    • Microsoft Graph → Application Permissions → BrowserSiteLists.Read.All (Grant Admin Consent)

Note: If you use Sites.Selected, you must grant your app access to the specific SharePoint site(s) via the SharePoint admin center. See Grant access to a specific site for details.

For more information, see the Microsoft Graph API documentation.
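Application permissions like the ones listed above are normally exercised through the OAuth 2.0 client credentials flow. The sketch below only builds the token request for that flow against the Microsoft identity platform, without sending it; whether the reader issues exactly this request internally is an assumption, and build_token_request is a hypothetical helper name used for illustration.

```python
from urllib.parse import urlencode


def build_token_request(
    tenant_id: str, client_id: str, client_secret: str
) -> tuple[str, str]:
    """Return (url, form-encoded body) for an app-only Microsoft Graph token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # ".default" requests the application permissions granted to the app
            "scope": "https://graph.microsoft.com/.default",
        }
    )
    return url, body


# Placeholder values; nothing is sent over the network here.
url, body = build_token_request("my-tenant", "my-client", "my-secret")
print(url)
```

Posting that body with the content type application/x-www-form-urlencoded is a quick way to confirm the app registration and admin consent are in place before debugging the loader itself.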

Usage

To use this loader, you need the client_id, client_secret, and tenant_id of the app registered in the Microsoft Azure Portal.
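To keep these three values out of source code, one option is to read them from environment variables. This is a minimal sketch: the variable names (SHAREPOINT_CLIENT_ID and so on) and the helper credentials_from_env are assumptions made for illustration, not something the package defines.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names, mapped to the reader's keyword arguments.
ENV_TO_KWARG = {
    "SHAREPOINT_CLIENT_ID": "client_id",
    "SHAREPOINT_CLIENT_SECRET": "client_secret",
    "SHAREPOINT_TENANT_ID": "tenant_id",
}


def credentials_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the app credentials, failing early if any of them is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}


# Intended use (requires the package to be installed):
#   loader = SharePointReader(**credentials_from_env())
```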

Loading Files from SharePoint Drive

This loader loads the files present in a specific folder in SharePoint.

If the files are in a folder named Test under the root directory of the SharePoint site, pass Test as the sharepoint_folder_path.

from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
)

documents = loader.load_data(
    sharepoint_site_name="<Sharepoint Site Name>",
    sharepoint_folder_path="<Folder Path>",
    recursive=True,
)
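Folder paths copied from a browser or file explorer often carry leading or trailing slashes. The helper below tidies such input into the root-relative form used in the Test example. It is a sketch that assumes the reader expects a path without surrounding slashes; normalize_folder_path is a hypothetical name, not part of the package.

```python
def normalize_folder_path(path: str) -> str:
    """Return a root-relative folder path with no leading, trailing, or doubled slashes."""
    # Accept backslashes as separators, then drop the empty segments.
    parts = [part for part in path.replace("\\", "/").split("/") if part]
    return "/".join(parts)


print(normalize_folder_path("/Test/Reports/"))  # prints Test/Reports
```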

Using Sites.Selected Permission

If you have only been granted access to a specific site (using Sites.Selected), you can use the site host name and relative URL instead of the site name:

from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
    sharepoint_host_name="contoso.sharepoint.com",
    sharepoint_relative_url="sites/YourSiteName",
)

documents = loader.load_data(
    sharepoint_folder_path="<Folder Path>",
    recursive=True,
)
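If you start from a full site URL, the host name and relative URL can be derived from it. A minimal sketch using only the standard library follows; split_site_url is a hypothetical helper, and it returns the relative URL without a leading slash to match the sites/YourSiteName form shown above.

```python
from urllib.parse import urlparse


def split_site_url(site_url: str) -> tuple[str, str]:
    """Split a full SharePoint site URL into (host name, relative URL)."""
    parsed = urlparse(site_url)
    if not parsed.netloc:
        raise ValueError(f"Expected an absolute URL, got: {site_url!r}")
    return parsed.netloc, parsed.path.strip("/")


host_name, relative_url = split_site_url(
    "https://contoso.sharepoint.com/sites/YourSiteName"
)
print(host_name)      # prints contoso.sharepoint.com
print(relative_url)   # prints sites/YourSiteName
```

The two values can then be passed as sharepoint_host_name and sharepoint_relative_url.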

Loading SharePoint Site Pages

You can also load SharePoint Site Pages as documents by setting sharepoint_type to PAGE:

from llama_index.readers.microsoft_sharepoint import (
    SharePointReader,
    SharePointType,
)

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
    sharepoint_site_name="<Sharepoint Site Name>",
    sharepoint_host_name="<your-tenant>.sharepoint.com",
    sharepoint_relative_url="/sites/<YourSite>",
    sharepoint_type=SharePointType.PAGE,
)

# Load all pages
documents = loader.load_data()

# Or load a specific page by ID
loader.sharepoint_file_id = "<page_id>"
documents = loader.load_data()

Filtering Pages with Callbacks

You can filter which pages to process using the process_document_callback:

from llama_index.readers.microsoft_sharepoint import (
    SharePointReader,
    SharePointType,
)


def page_filter(page_name: str) -> bool:
    # Only process pages whose names don't start with "Draft"
    return not page_name.startswith("Draft")


loader = SharePointReader(
    client_id="<Client ID>",
    client_secret="<Client Secret>",
    tenant_id="<Tenant ID>",
    sharepoint_site_name="<Site Name>",
    sharepoint_type=SharePointType.PAGE,
    process_document_callback=page_filter,
)
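A callback like this is an ordinary predicate, so it can be checked locally before it is wired into the reader. The sketch below applies the same rule to a few made-up page names; select_pages and the sample names are hypothetical and only illustrate which pages the predicate would keep.

```python
from typing import Callable, Iterable, List


def page_filter(page_name: str) -> bool:
    # Keep every page whose name does not start with "Draft"
    return not page_name.startswith("Draft")


def select_pages(
    page_names: Iterable[str], keep: Callable[[str], bool]
) -> List[str]:
    """Return the page names for which the predicate returns True."""
    return [name for name in page_names if keep(name)]


pages = ["Draft-Roadmap.aspx", "Home.aspx", "Team-News.aspx"]
print(select_pages(pages, page_filter))  # prints ['Home.aspx', 'Team-News.aspx']
```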

Error Handling

Control error behavior with fail_on_error:

from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID>",
    client_secret="<Client Secret>",
    tenant_id="<Tenant ID>",
    fail_on_error=False,  # Log errors and continue instead of raising
)
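The two behaviors can be pictured with a small, generic loop. This is an illustration of the log-and-continue pattern under the assumption that the flag works per item; it is not the reader's actual implementation, and process_all is a hypothetical function.

```python
import logging
from typing import Any, Callable, Iterable, List

logger = logging.getLogger(__name__)


def process_all(
    items: Iterable[Any],
    process: Callable[[Any], Any],
    fail_on_error: bool = True,
) -> List[Any]:
    """Process each item; on failure either re-raise or log and move on."""
    results = []
    for item in items:
        try:
            results.append(process(item))
        except Exception:
            if fail_on_error:
                raise
            logger.exception("Failed to process %r; continuing", item)
    return results


# "x" cannot be parsed, so it is logged and skipped.
print(process_all(["1", "x", "3"], int, fail_on_error=False))  # prints [1, 3]
```

With fail_on_error=False a partial result is returned, so it is worth checking the logs to see which items were dropped.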

Instrumentation Events

The SharePoint reader emits events during page processing for monitoring:

from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.readers.microsoft_sharepoint import (
    TotalPagesToProcessEvent,
    PageDataFetchCompletedEvent,
    PageFailedEvent,
)


class SharePointEventHandler(BaseEventHandler):
    def handle(self, event):
        if isinstance(event, TotalPagesToProcessEvent):
            print(f"Processing {event.total_pages} pages...")
        elif isinstance(event, PageDataFetchCompletedEvent):
            print(f"Completed: {event.page_id}")
        elif isinstance(event, PageFailedEvent):
            print(f"Failed: {event.page_id} - {event.error}")


dispatcher = get_dispatcher("llama_index.readers.microsoft_sharepoint.base")
dispatcher.add_event_handler(SharePointEventHandler())

Available events:

  • TotalPagesToProcessEvent: Total number of pages to process
  • PageDataFetchStartedEvent: Page processing started
  • PageDataFetchCompletedEvent: Page successfully processed
  • PageSkippedEvent: Page skipped (via callback)
  • PageFailedEvent: Page processing failed
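For a run summary rather than per-page prints, the events can be tallied by class name. The sketch below depends only on the standard library so that it runs on its own; EventTally is a hypothetical helper, and the two small classes at the bottom are stand-ins for the real event classes, used only for the demonstration.

```python
from collections import Counter


class EventTally:
    """Count events by class name and produce a short summary."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, event: object) -> None:
        self.counts[type(event).__name__] += 1

    def summary(self) -> str:
        completed = self.counts["PageDataFetchCompletedEvent"]
        skipped = self.counts["PageSkippedEvent"]
        failed = self.counts["PageFailedEvent"]
        return f"{completed} completed, {skipped} skipped, {failed} failed"


# Stand-in classes for demonstration only; in real use, pass the events
# received by the handler's handle() method to tally.record().
class PageDataFetchCompletedEvent:
    pass


class PageFailedEvent:
    pass


tally = EventTally()
for event in (
    PageDataFetchCompletedEvent(),
    PageDataFetchCompletedEvent(),
    PageFailedEvent(),
):
    tally.record(event)
print(tally.summary())  # prints 2 completed, 0 skipped, 1 failed
```

Keeping the tally as a module-level object and calling tally.record(event) from the handler avoids having to add fields to the handler class itself.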
