MSGraphFS

An fsspec-based filesystem for Microsoft Graph API drives (SharePoint, OneDrive). It features lazy initialization, fork safety for multi-process environments such as Airflow, and comprehensive permission management.

📖 Microsoft Graph OneDrive API Documentation

🚀 Quick Start

Simple Usage

import msgraphfs

# Easy setup - just provide your app credentials and site/drive names
fs = msgraphfs.MSGDriveFS(
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    site_name="YourSiteName",        # SharePoint site name
    drive_name="Documents"           # Optional: defaults to site's default drive
)

# Start using it like any filesystem
files = fs.ls("/")
print(f"Found {len(files)} items")

# Read files
with fs.open("/path/to/file.txt") as f:
    content = f.read()

# Write files
with fs.open("/path/to/new_file.txt", "w") as f:
    f.write("Hello SharePoint!")

Using fsspec Protocol

import fsspec

fs = fsspec.filesystem("msgd",
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    site_name="YourSiteName",
    drive_name="Documents"
)

fs.ls("/")

Environment Variables

You can also use environment variables:

export MSGRAPHFS_CLIENT_ID="your-client-id"
export MSGRAPHFS_TENANT_ID="your-tenant-id"
export MSGRAPHFS_CLIENT_SECRET="your-client-secret"

import msgraphfs

# Credentials loaded from environment
fs = msgraphfs.MSGDriveFS(
    site_name="YourSiteName",
    drive_name="Documents"
)
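
To make the fallback concrete, here is a minimal sketch of how credentials could be resolved from the variables listed above. The helper `resolve_credentials` is hypothetical and not part of msgraphfs, and the rule that explicit arguments win over the environment is an assumption.

```python
import os

# Variable names come from the section above; the helper itself is hypothetical.
ENV_VARS = {
    "client_id": "MSGRAPHFS_CLIENT_ID",
    "tenant_id": "MSGRAPHFS_TENANT_ID",
    "client_secret": "MSGRAPHFS_CLIENT_SECRET",
}


def resolve_credentials(environ=None, **explicit):
    """Return credentials, preferring explicit arguments over the environment."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for name, env_name in ENV_VARS.items():
        # Assumption: an explicit argument takes precedence over the variable
        value = explicit.get(name) or environ.get(env_name)
        if not value:
            raise ValueError(f"Missing {name}: pass it explicitly or set {env_name}")
        resolved[name] = value
    return resolved
```

Failing early with the name of the missing variable tends to be easier to debug than an authentication error from the first request.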

✨ Key Features

🔧 Automatic Discovery

  • No manual drive/site ID lookup required - just provide site and drive names
  • Automatic OAuth2 token management - handles client credentials flow
  • Fork-safe lazy initialization - perfect for multi-process environments like Airflow
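
The fork-safe lazy initialization mentioned above can be illustrated with a generic pattern: build the client on first use and rebuild it when the process ID changes. This is a sketch of the general technique, not the library's actual implementation; `LazyForkSafeClient` and `factory` are hypothetical names.

```python
import os


class LazyForkSafeClient:
    """Create the underlying client on first use and again after a fork."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds the real client
        self._client = None      # nothing is created at construction time
        self._pid = None         # PID of the process that built the client

    def get(self):
        pid = os.getpid()
        if self._client is None or self._pid != pid:
            # First use, or we are running in a forked child: build a fresh client
            self._client = self._factory()
            self._pid = pid
        return self._client
```

Because nothing is created in `__init__`, an object like this can be constructed at module import time and each worker process still ends up with its own client.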

🔐 Permission Management

# Get detailed permissions for any file/directory
permissions = fs.get_permissions("/sensitive-document.pdf")

print(f"Total permissions: {permissions['summary']['total_permissions']}")
print(f"Users with access: {permissions['summary']['user_count']}")

# Check specific users and roles
for user in permissions['users']:
    print(f"{user['display_name']}: {', '.join(user['roles'])}")

📁 Enhanced File Operations

  • Expand queries: Get additional metadata with expand="permissions" or expand="thumbnails"
  • Version control: get_versions(), checkin(), checkout()
  • File preview: preview() for web preview URLs
  • Format conversion: get_content(format="pdf") to convert documents

🚀 Airflow Integration

from airflow.decorators import task
from airflow.io.path import ObjectStoragePath, attach
import msgraphfs

# Safe to do at module level - lazy initialization prevents fork issues
attach(protocol="sharepoint", fs=msgraphfs.MSGDriveFS(
    site_name="YourSite",
    drive_name="Documents",
    # credentials from environment or parameters
))

@task
def process_files():
    # Works perfectly in Airflow tasks
    src_path = ObjectStoragePath("sharepoint://folder/file.docx")
    content = src_path.read_text()
    return content

📋 Advanced Usage

Working with Item IDs

Many methods accept an optional item_id parameter for efficiency:

# Get item ID for later use
item_id = fs.get_item_id("/important/document.pdf")

# Use item_id to avoid path lookups
info = fs.info("/any/path", item_id=item_id)
content = fs.get_content(item_id=item_id)
permissions = fs.get_permissions(item_id=item_id)

Document Conversion

# Convert Word document to PDF
pdf_content = fs.get_content("/document.docx", format="pdf")

# Get file preview URL
preview_url = fs.preview("/presentation.pptx")

Version Control

# Check out for editing
fs.checkout("/document.docx")

# Make changes...
with fs.open("/document.docx", "w") as f:
    f.write("Updated content")

# Check back in with comment
fs.checkin("/document.docx", "Updated quarterly numbers")

# View version history
versions = fs.get_versions("/document.docx")
for version in versions:
    print(f"Version {version['id']}: {version['lastModifiedDateTime']}")
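
To pick the most recent entry from such a list, the timestamps can be parsed and compared. This sketch assumes each version is a dict with the `id` and `lastModifiedDateTime` keys used above and that timestamps are ISO 8601 strings ending in `Z`; the helper names and sample data are hypothetical.

```python
from datetime import datetime


def parse_graph_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2024-01-15T10:30:00Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_version(versions):
    """Return the version dict with the newest lastModifiedDateTime."""
    return max(versions, key=lambda v: parse_graph_timestamp(v["lastModifiedDateTime"]))


# Hypothetical sample data
sample_versions = [
    {"id": "1.0", "lastModifiedDateTime": "2024-01-15T10:30:00Z"},
    {"id": "2.0", "lastModifiedDateTime": "2024-03-02T08:00:00Z"},
]
print(latest_version(sample_versions)["id"])  # 2.0
```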

🔧 Installation

uv add msgraphfs-dev

Or with pip:

pip install msgraphfs-dev

⚙️ Setup Requirements

Azure App Registration

  1. Register an Azure AD application at https://portal.azure.com
  2. Configure API permissions (Application permissions for client credentials flow):
    • Sites.Read.All or Sites.ReadWrite.All
    • Files.Read.All or Files.ReadWrite.All
    • ⚠️ Important: Grant admin consent for your organization
  3. Create a client secret
  4. Note down:
    • Application (client) ID
    • Directory (tenant) ID
    • Client secret value

OAuth2 Scopes

MSGraphFS uses client credentials flow with the default scope (https://graph.microsoft.com/.default). This automatically includes all the application permissions you've granted to your Azure app registration.

You don't need to specify individual scopes - the library handles this automatically! 🎯
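
For reference, a client credentials token request to the Microsoft identity platform looks roughly like the sketch below. It only builds the URL and form body and sends nothing; the function name is hypothetical, and this is not necessarily how msgraphfs issues the request internally.

```python
from urllib.parse import urlencode


def build_client_credentials_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # The .default scope covers every application permission granted to the app
        "scope": "https://graph.microsoft.com/.default",
    })
    return url, body
```

The body is sent as `application/x-www-form-urlencoded` in a POST request, and the JSON response carries the `access_token`.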

SharePoint Site Access

  • Ensure your Azure app has access to the SharePoint site
  • You only need the site name and drive name (e.g., "Documents")
  • No manual ID lookups required! 🎉

Legacy Usage (Advanced)

If you prefer to specify drive IDs directly:

fs = msgraphfs.MSGDriveFS(
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    drive_id="specific-drive-id"  # Skip auto-discovery
)

Find drive IDs using Microsoft Graph Explorer:

  • Sites: GET /sites/{hostname}:/sites/{site-name}
  • Drives: GET /sites/{site-id}/drives
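
If you prefer to script these lookups, the two request URLs can be built as shown in this sketch. It assumes the Graph v1.0 base URL; the helper names and the `contoso.sharepoint.com` hostname are placeholders.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def site_lookup_url(hostname, site_name):
    """URL for GET /sites/{hostname}:/sites/{site-name}."""
    return f"{GRAPH_BASE}/sites/{hostname}:/sites/{quote(site_name)}"


def drives_url(site_id):
    """URL for GET /sites/{site-id}/drives."""
    # Site IDs contain commas, so keep them unescaped
    return f"{GRAPH_BASE}/sites/{quote(site_id, safe=',')}/drives"


print(site_lookup_url("contoso.sharepoint.com", "YourSiteName"))
```

The `id` field of the first response is the site ID to pass to the second request, and each entry in the second response carries a drive `id`.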

🛠️ Development

To develop this package, you can clone the repository and install the dependencies using uv:

git clone <your-repo-url>  # your fork of https://github.com/acsone/msgraphfs
cd msgraphfs
uv sync

This will install the package in editable mode with all dependencies, so you can make changes to the code and test them without having to reinstall the package every time.

To run the tests with the test dependencies:

uv run pytest

Or with pip (legacy):

pip install -e ".[test]"
pytest

Testing the package requires access to a Microsoft drive (OneDrive, SharePoint, etc.) as well as the client_id, client_secret, tenant_id, drive_id, site_name and the user's access token.

How to get an access token required for testing

The first step is to get your user's access token.

Prerequisites

  • A registered Azure AD application with:
    • client_id and client_secret
    • Delegated permissions granted (e.g., Files.ReadWrite.All, Sites.ReadWrite.All)
    • A redirect URI configured (e.g., http://localhost:5000/callback)

1. Build the OAuth2 authorization URL

Open the following URL in your browser (replace values as needed):

https://login.microsoftonline.com/<TENANT_ID>/oauth2/v2.0/authorize?
client_id=<CLIENT_ID>
&response_type=code
&redirect_uri=http://localhost:5000/callback
&response_mode=query
&scope=offline_access%20User.Read%20Files.ReadWrite.All%20Sites.ReadWrite.All

You will be asked to log in with your Microsoft account and to grant the requested permissions.
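
The authorization URL from step 1 can also be assembled programmatically, which avoids encoding mistakes. This is a sketch using only the standard library; `build_authorize_url` is a hypothetical helper, and the parameters mirror the URL shown above.

```python
from urllib.parse import quote, urlencode


def build_authorize_url(tenant_id, client_id, redirect_uri, scopes):
    """Return the OAuth2 authorization URL for the given application."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "response_mode": "query",
        "scope": " ".join(scopes),
    }
    # quote_via=quote encodes spaces as %20, matching the URL shown above
    query = urlencode(params, quote_via=quote)
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize?{query}"


print(build_authorize_url(
    "<TENANT_ID>",
    "<CLIENT_ID>",
    "http://localhost:5000/callback",
    ["offline_access", "User.Read", "Files.ReadWrite.All", "Sites.ReadWrite.All"],
))
```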

2. Copy the Authorization Code

Once logged in, you'll be redirected to:

http://localhost:5000/callback?code=<AUTHORIZATION_CODE>

Copy the value of code from the URL.
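
If you would rather not copy the code by hand, it can be extracted from the redirected URL. The helper below is a hypothetical sketch; it assumes the `code` query parameter shown above and the standard OAuth2 `error` parameter for failures.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(redirected_url):
    """Return the authorization code from the callback URL."""
    query = parse_qs(urlparse(redirected_url).query)
    if "error" in query:
        raise ValueError(f"Authorization failed: {query['error'][0]}")
    if "code" not in query:
        raise ValueError("No authorization code found in URL")
    return query["code"][0]


print(extract_auth_code("http://localhost:5000/callback?code=abc123"))  # abc123
```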

Launch the test suite

To run the test suite, you just need to run the pytest command in the root directory with the following arguments:

  • --auth-code: The authorization code you got in the previous step. (It's only required if you launch the tests for the first time or if your refresh token is expired and you need to get a new access token)
  • --client-id: The client id of your Azure AD application.
  • --client-secret: The client secret of your Azure AD application.
  • --tenant-id: The tenant id of your Azure AD application.
  • --drive-id: The drive id of the drive you want to access.
  • --site-name: The name of the site you want to access. (Only required for tests related to the access to the recycling bin)

pytest --auth-code <AUTH_CODE> \
       --client-id <CLIENT_ID> \
       --client-secret <CLIENT_SECRET> \
       --tenant-id <TENANT_ID> \
       --drive-id <DRIVE_ID> \
       --site-name <SITE_NAME> \
       tests

Alternatively, you can set the environment variables MSGRAPHFS_AUTH_CODE, MSGRAPHFS_CLIENT_ID, MSGRAPHFS_CLIENT_SECRET, MSGRAPHFS_TENANT_ID, MSGRAPHFS_DRIVE_ID and MSGRAPHFS_SITE_NAME to avoid passing the arguments to pytest.

When --auth-code is provided and a new access token is needed (in other words, on the first test run or after your refresh token has expired), the package automatically requests the access token and stores it in an encrypted file in your system's keyring. The call to the token endpoint requires a redirect_uri parameter, which must match one of the redirect URIs configured in your Azure AD application and the one used in the authorization URL. By default it is set to http://localhost:8069/microsoft_account/authentication; you can change it by setting the environment variable MSGRAPHFS_AUTH_REDIRECT_URI or by passing the --auth-redirect-uri argument to pytest.
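
To show where redirect_uri enters the exchange, here is a sketch of the form fields for an authorization code token request. It builds the request without sending it; `build_token_exchange` is a hypothetical helper and not the code used by the test suite, while the default URI and the environment variable name are taken from the paragraph above.

```python
import os

DEFAULT_REDIRECT_URI = "http://localhost:8069/microsoft_account/authentication"


def build_token_exchange(tenant_id, client_id, client_secret, auth_code, environ=None):
    """Return (url, form_fields) for exchanging an authorization code."""
    environ = os.environ if environ is None else environ
    # The environment variable overrides the default redirect URI
    redirect_uri = environ.get("MSGRAPHFS_AUTH_REDIRECT_URI", DEFAULT_REDIRECT_URI)
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    fields = {
        "grant_type": "authorization_code",
        "code": auth_code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    return url, fields
```

If the redirect URI here differs from the one used to obtain the code, the token endpoint rejects the request, which is a common cause of first-run failures.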

Pre-commit hooks

To ensure code quality, this package uses pre-commit hooks. You can install them by running:

pre-commit install

This will set up the pre-commit hooks to run automatically before each commit. You can also run them manually by executing:

pre-commit run --all-files
