Filesystem interface to Microsoft Graph API (SharePoint, OneDrive)

Quickstart

This package can be installed using:

pip install msgraphfs

or

uv add msgraphfs

The msgd://, sharepoint://, and onedrive:// protocols are included in fsspec's known_implementations registry, so fsspec-compatible libraries can resolve these URLs without extra registration.

To use the filesystem with a specific site and drive:

import pandas as pd

storage_options = {
    'client_id': 'your-client-id',
    'tenant_id': 'your-tenant-id',
    'client_secret': 'your-client-secret',
    'site_name': 'YourSiteName',
    'drive_name': 'Documents'
}

df = pd.read_csv('msgd://folder/data.csv', storage_options=storage_options)

To use multi-site mode where site and drive are specified in the URL:

import pandas as pd

storage_options = {
    'client_id': 'your-client-id',
    'tenant_id': 'your-tenant-id',
    'client_secret': 'your-client-secret'
}

df = pd.read_csv('msgd://YourSite/Documents/folder/data.csv', storage_options=storage_options)
df = pd.read_parquet('sharepoint://AnotherSite/Reports/data.parquet', storage_options=storage_options)

Accepted protocol and URI formats include:

  • msgd://site/drive/path/file (multi-site mode)
  • sharepoint://site/drive/path/file (multi-site mode)
  • onedrive://drive/path/file (OneDrive personal)
  • msgd://path/file (single-site mode, when site_name and drive_name are specified in storage_options)
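
The URL shapes listed above can be illustrated with a small parser. This is only a sketch of how such URLs could be split, not the library's own parsing code: the function name split_msgraph_url and its single_site flag are hypothetical and exist purely for illustration.

```python
def split_msgraph_url(url, single_site=False):
    """Hypothetical helper: split a URL into (protocol, site, drive, path).

    Not part of msgraphfs; it only mirrors the URL shapes listed above.
    """
    protocol, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"missing protocol in {url!r}")
    segments = [s for s in rest.split("/") if s]

    if single_site:
        # Site and drive come from storage_options; the URL is all path.
        return protocol, None, None, "/".join(segments)

    if protocol == "onedrive":
        # onedrive://drive/path/file has no site component.
        if not segments:
            raise ValueError("expected onedrive://drive/path")
        return protocol, None, segments[0], "/".join(segments[1:])

    # msgd:// and sharepoint:// in multi-site mode: site, then drive, then path.
    if len(segments) < 2:
        raise ValueError("expected protocol://site/drive/path")
    return protocol, segments[0], segments[1], "/".join(segments[2:])
```

For example, split_msgraph_url('sharepoint://AnotherSite/Reports/data.parquet') yields the site 'AnotherSite', the drive 'Reports', and the path 'data.parquet'.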

Credentials can also be supplied through the MSGRAPHFS_CLIENT_ID, MSGRAPHFS_TENANT_ID, and MSGRAPHFS_CLIENT_SECRET environment variables. When these are set, the credentials can be omitted from storage_options:

import pandas as pd

# With environment variables set, you can omit credentials from storage_options
storage_options = {'site_name': 'YourSite', 'drive_name': 'Documents'}
df = pd.read_csv('msgd://folder/data.csv', storage_options=storage_options)

Details

The package provides a Pythonic filesystem implementation for Microsoft Graph API drives (SharePoint and OneDrive), so that data processing libraries such as pandas and Dask can read from and write to Microsoft 365 storage. It is implemented using the fsspec base class and the Microsoft Graph Python SDK.

Authentication uses Azure AD application credentials with the OAuth2 client credentials flow, which suits server-to-server scenarios where no user signs in interactively.

The filesystem automatically handles OAuth2 token management and site and drive discovery. It also initializes lazily in a fork-safe way, which makes it suitable for multi-process environments like Apache Airflow.
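
To make "fork-safe lazy initialization" concrete, here is a minimal sketch of the general pattern. It is not the msgraphfs implementation: the LazyClient class and its factory argument are hypothetical. The idea is to defer building the client until first use and to rebuild it when the process ID changes, so a forked worker does not reuse connections inherited from its parent.

```python
import os


class LazyClient:
    """Hypothetical wrapper illustrating fork-safe lazy initialization."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds the real client
        self._client = None      # nothing is created at construction time
        self._pid = None         # process that created the current client

    def get(self):
        pid = os.getpid()
        # Build on first use, or rebuild if we are now in a forked child.
        if self._client is None or self._pid != pid:
            self._client = self._factory()
            self._pid = pid
        return self._client
```

Within one process, repeated calls to get() return the same object. After a fork, the child's first call sees a different PID and creates its own client.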

Setting credentials

The storage_options dictionary accepts the following parameters:

Required for authentication:

  • client_id: Azure AD application (client) ID
  • tenant_id: Azure AD directory (tenant) ID
  • client_secret: Azure AD application client secret

Optional filesystem parameters:

  • site_name: SharePoint site name (for single-site mode or site discovery)
  • drive_name: Drive/library name (e.g., "Documents", "CustomLibrary")
  • drive_id: Specific drive ID (bypasses site/drive discovery)
  • oauth2_client_params: Pre-built OAuth2 parameters dict
  • use_recycle_bin: Enable recycle bin operations (default: False)

For more details on all available parameters, see the MSGDriveFS documentation.

The following environment variables can be set and will be automatically detected:

  • MSGRAPHFS_CLIENT_ID (or AZURE_CLIENT_ID as fallback)
  • MSGRAPHFS_TENANT_ID (or AZURE_TENANT_ID as fallback)
  • MSGRAPHFS_CLIENT_SECRET (or AZURE_CLIENT_SECRET as fallback)
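
The fallback order described above can be sketched as follows. This only illustrates the lookup order and is not the library's code: the function name credentials_from_env is hypothetical, and treating an empty value as unset is an assumption.

```python
import os


def credentials_from_env(environ=None):
    """Hypothetical helper: collect credentials from environment variables.

    Prefers MSGRAPHFS_* and falls back to AZURE_* for each credential.
    """
    environ = os.environ if environ is None else environ
    creds = {}
    for key in ("client_id", "tenant_id", "client_secret"):
        suffix = key.upper()
        # Assumption: an empty string counts as "not set".
        value = environ.get(f"MSGRAPHFS_{suffix}") or environ.get(f"AZURE_{suffix}")
        if value:
            creds[key] = value
    return creds
```

The result can be merged into storage_options, for example {**credentials_from_env(), 'site_name': 'YourSite', 'drive_name': 'Documents'}.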

Usage modes

The filesystem can be used in different modes based on the storage_options provided:

  1. Single-site mode: Specify site_name and drive_name in storage_options, then use relative paths in URLs:

    storage_options = {
        'client_id': CLIENT_ID,
        'tenant_id': TENANT_ID,
        'client_secret': CLIENT_SECRET,
        'site_name': 'YourSite',
        'drive_name': 'Documents'
    }
    df = pd.read_csv('msgd://folder/file.csv', storage_options=storage_options)
    
  2. Multi-site mode: Omit site_name and drive_name from storage_options, specify them in the URL:

    storage_options = {
        'client_id': CLIENT_ID,
        'tenant_id': TENANT_ID,
        'client_secret': CLIENT_SECRET
    }
    df = pd.read_csv('msgd://YourSite/Documents/folder/file.csv', storage_options=storage_options)
    
  3. Direct drive access: Use drive_id to bypass site discovery:

    storage_options = {
        'client_id': CLIENT_ID,
        'tenant_id': TENANT_ID,
        'client_secret': CLIENT_SECRET,
        'drive_id': 'specific-drive-id'
    }
    df = pd.read_csv('msgd://folder/file.csv', storage_options=storage_options)
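
The three modes differ only in which keys appear in storage_options. As a rough sketch, a helper could classify a configuration as shown below. The function describe_mode is hypothetical, and the precedence it applies (drive_id first, then site_name with drive_name) is an assumption rather than documented library behavior.

```python
def describe_mode(storage_options):
    """Hypothetical helper: name the usage mode implied by storage_options."""
    # Assumption: an explicit drive_id takes precedence over site/drive names.
    if storage_options.get("drive_id"):
        return "direct-drive"
    if storage_options.get("site_name") and storage_options.get("drive_name"):
        return "single-site"
    # Neither given: site and drive are expected in the URL itself.
    return "multi-site"
```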
    

Advanced features

File operations with metadata

import fsspec

fs = fsspec.filesystem('msgd', **storage_options)

# List files with detailed metadata
files = fs.ls('/folder', detail=True)

# Get file information with permissions
info = fs.info('/document.pdf', expand='permissions')

# Read a file's raw bytes (binary mode, since .docx is not plain text)
with fs.open('/document.docx', mode='rb') as f:
    content = f.read()

Permission management

# Get detailed permissions for files and folders
permissions = fs.get_permissions('/sensitive-folder')
print(f"Total permissions: {permissions['summary']['total_permissions']}")

Integration with data processing libraries

import dask.dataframe as dd

# Read multiple CSV files using Dask
ddf = dd.read_csv('msgd://YourSite/Data/*.csv', storage_options=storage_options)

# Read Parquet files
ddf = dd.read_parquet('sharepoint://Reports/Analytics/data.parquet', storage_options=storage_options)

Azure AD Setup

To use this filesystem, you need to register an Azure AD application:

  1. Go to the Azure Portal
  2. Register a new application under "Azure Active Directory" > "App registrations"
  3. Configure API permissions (Application permissions). Choose based on your needs:
    • For read-only access: Sites.Read.All
    • For read-write access: Sites.ReadWrite.All
    • Optional for enhanced functionality: Files.Read.All or Files.ReadWrite.All
  4. Grant admin consent for your organization
  5. Create a client secret
  6. Note the Application (client) ID, Directory (tenant) ID, and client secret

The filesystem uses the OAuth2 client credentials flow with the default scope (https://graph.microsoft.com/.default), which automatically includes all application permissions granted to your Azure AD application.
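
The filesystem performs this token exchange itself, so nothing below is needed to use the package. For readers curious about what the client credentials flow sends, here is a sketch that builds the request without sending it. The helper build_token_request is hypothetical. The endpoint shown is the standard Microsoft identity platform v2.0 token endpoint; whether msgraphfs calls it directly or through an SDK is not stated in this document.

```python
from urllib.parse import urlencode

GRAPH_DEFAULT_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id, client_id, client_secret):
    """Hypothetical helper: build (url, form_body) for a token request.

    Nothing is sent over the network; this only shows the request shape.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # .default requests every application permission already granted.
        "scope": GRAPH_DEFAULT_SCOPE,
    })
    return url, body
```

The body is form-encoded and would be sent as a POST with the Content-Type application/x-www-form-urlencoded. A successful response contains an access_token that is then presented as a Bearer token to the Graph API.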
