MSGraphFS

An fsspec-based filesystem for Microsoft Graph API drives (SharePoint, OneDrive). It features lazy initialization, fork safety for multi-process environments such as Airflow, and permission management.

📖 Microsoft Graph OneDrive API Documentation

🚀 Quick Start

Simple Usage

import msgraphfs

# Easy setup - just provide your app credentials and site/drive names
fs = msgraphfs.MSGDriveFS(
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    site_name="YourSiteName",        # SharePoint site name
    drive_name="Documents"           # Optional: defaults to site's default drive
)

# Start using it like any filesystem
files = fs.ls("/")
print(f"Found {len(files)} items")

# Read files
with fs.open("/path/to/file.txt") as f:
    content = f.read()

# Write files
with fs.open("/path/to/new_file.txt", "w") as f:
    f.write("Hello SharePoint!")

Using fsspec Protocol

import fsspec

fs = fsspec.filesystem("msgd",
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    site_name="YourSiteName",
    drive_name="Documents"
)

fs.ls("/")

Environment Variables

You can also use environment variables:

export MSGRAPHFS_CLIENT_ID="your-client-id"
export MSGRAPHFS_TENANT_ID="your-tenant-id"
export MSGRAPHFS_CLIENT_SECRET="your-client-secret"

import msgraphfs

# Credentials loaded from environment
fs = msgraphfs.MSGDriveFS(
    site_name="YourSiteName",
    drive_name="Documents"
)
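The text above does not say how explicit arguments and environment variables are combined. As a minimal sketch of one plausible resolution order (explicit argument first, then the `MSGRAPHFS_*` variable), assuming a hypothetical helper `resolve_credentials` that is not part of msgraphfs:

```python
import os

# Environment variable names taken from the section above.
ENV_NAMES = {
    "client_id": "MSGRAPHFS_CLIENT_ID",
    "tenant_id": "MSGRAPHFS_TENANT_ID",
    "client_secret": "MSGRAPHFS_CLIENT_SECRET",
}


def resolve_credentials(environ=None, **explicit):
    """Hypothetical helper: explicit values win, the environment fills gaps."""
    environ = os.environ if environ is None else environ
    resolved, missing = {}, []
    for key, env_name in ENV_NAMES.items():
        value = explicit.get(key) or environ.get(env_name)
        if value:
            resolved[key] = value
        else:
            missing.append(env_name)
    if missing:
        # Fail early with the names of the variables that need to be set.
        raise ValueError(f"Missing credentials; set {', '.join(missing)}")
    return resolved
```

Failing early with the missing variable names tends to be easier to debug than an authentication error raised later by the token endpoint.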

✨ Key Features

🔧 Automatic Discovery

  • No manual drive/site ID lookup required - just provide site and drive names
  • Automatic OAuth2 token management - handles client credentials flow
  • Fork-safe lazy initialization - perfect for multi-process environments like Airflow
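To illustrate what fork-safe lazy initialization generally means, here is a minimal sketch of the common pattern: build the client on first use and rebuild it when the process ID changes. The class `LazyForkSafeClient` is hypothetical and is not taken from the msgraphfs source, which may work differently.

```python
import os


class LazyForkSafeClient:
    """Illustrative only: create the client on first use, once per process."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds the real client
        self._client = None
        self._pid = None  # PID of the process that built the client

    def get(self):
        pid = os.getpid()
        # Rebuild if nothing was created yet, or if we are now running in a
        # forked child that inherited the parent's client object.
        if self._client is None or self._pid != pid:
            self._client = self._factory()
            self._pid = pid
        return self._client
```

Because nothing is created in `__init__`, constructing the object at module import time opens no connections, which is why this pattern suits schedulers that fork worker processes.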

🔐 Permission Management

# Get detailed permissions for any file/directory
permissions = fs.get_permissions("/sensitive-document.pdf")

print(f"Total permissions: {permissions['summary']['total_permissions']}")
print(f"Users with access: {permissions['summary']['user_count']}")

# Check specific users and roles
for user in permissions['users']:
    print(f"{user['display_name']}: {', '.join(user['roles'])}")
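A result like this can be filtered with plain Python. The sketch below assumes the dictionary shape implied by the example above (`summary`, `users`, `display_name`, `roles`). The sample data is made up, and `users_with_role` is a hypothetical helper rather than a library function.

```python
def users_with_role(permissions, role):
    """Return the display names of users whose roles include `role`."""
    return sorted(
        user["display_name"]
        for user in permissions.get("users", [])
        if role in user.get("roles", [])
    )


# Made-up sample data, shaped like the fields used in the example above.
sample = {
    "summary": {"total_permissions": 2, "user_count": 2},
    "users": [
        {"display_name": "Alice", "roles": ["owner", "write"]},
        {"display_name": "Bob", "roles": ["read"]},
    ],
}

writers = users_with_role(sample, "write")
print(writers)
```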

📁 Enhanced File Operations

  • Expand queries: Get additional metadata with expand="permissions" or expand="thumbnails"
  • Version control: get_versions(), checkin(), checkout()
  • File preview: preview() for web preview URLs
  • Format conversion: get_content(format="pdf") to convert documents

🚀 Airflow Integration

from airflow.decorators import task  # Airflow 2.x import path
from airflow.io.path import ObjectStoragePath, attach
import msgraphfs

# Safe to do at module level - lazy initialization prevents fork issues
attach(protocol="sharepoint", fs=msgraphfs.MSGDriveFS(
    site_name="YourSite",
    drive_name="Documents",
    # credentials from environment or parameters
))

@task
def process_files():
    # Works perfectly in Airflow tasks
    src_path = ObjectStoragePath("sharepoint://folder/file.docx")
    content = src_path.read_text()
    return content

📋 Advanced Usage

Working with Item IDs

Many methods accept an optional item_id parameter for efficiency:

# Get item ID for later use
item_id = fs.get_item_id("/important/document.pdf")

# Use item_id to avoid path lookups
info = fs.info("/any/path", item_id=item_id)
content = fs.get_content(item_id=item_id)
permissions = fs.get_permissions(item_id=item_id)
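If the same paths are resolved repeatedly, the IDs can be memoized. This is a minimal sketch that relies only on the `get_item_id` method shown above. `ItemIdCache` and the stand-in `CountingFS` are hypothetical names used for illustration.

```python
class ItemIdCache:
    """Hypothetical wrapper that remembers item IDs by path."""

    def __init__(self, fs):
        self._fs = fs
        self._ids = {}

    def get(self, path):
        # Only ask the filesystem the first time a path is seen.
        if path not in self._ids:
            self._ids[path] = self._fs.get_item_id(path)
        return self._ids[path]


class CountingFS:
    """Stand-in for a real filesystem object; it counts lookups."""

    def __init__(self):
        self.lookups = 0

    def get_item_id(self, path):
        self.lookups += 1
        return f"id-for-{path}"


counting_fs = CountingFS()
cache = ItemIdCache(counting_fs)
first = cache.get("/important/document.pdf")
second = cache.get("/important/document.pdf")
```

A cached ID can go stale if the item is deleted and recreated, so a cache like this is best kept short-lived.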

Document Conversion

# Convert Word document to PDF
pdf_content = fs.get_content("/document.docx", format="pdf")

# Get file preview URL
preview_url = fs.preview("/presentation.pptx")

Version Control

# Check out for editing
fs.checkout("/document.docx")

# Make changes...
with fs.open("/document.docx", "w") as f:
    f.write("Updated content")

# Check back in with comment
fs.checkin("/document.docx", "Updated quarterly numbers")

# View version history
versions = fs.get_versions("/document.docx")
for version in versions:
    print(f"Version {version['id']}: {version['lastModifiedDateTime']}")
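The checkout and checkin pair can be wrapped in a context manager so the two calls stay together. This is a sketch under the assumption that `checkout(path)` and `checkin(path, comment)` behave as in the example above. `checked_out` and `RecordingFS` are hypothetical names, not part of msgraphfs.

```python
from contextlib import contextmanager


@contextmanager
def checked_out(fs, path, comment):
    """Check `path` out, yield, and check it back in if the body succeeds."""
    fs.checkout(path)
    yield fs
    # Only reached when the with-body raised no exception.
    fs.checkin(path, comment)


class RecordingFS:
    """Stand-in for a real filesystem object; it only records calls."""

    def __init__(self):
        self.calls = []

    def checkout(self, path):
        self.calls.append(("checkout", path))

    def checkin(self, path, comment):
        self.calls.append(("checkin", path, comment))


demo_fs = RecordingFS()
with checked_out(demo_fs, "/document.docx", "Updated quarterly numbers"):
    pass  # edit the file here
print(demo_fs.calls)
```

In this sketch the file deliberately stays checked out if the body raises, so a failed edit is never checked in as a new version.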

🔧 Installation

uv add msgraphfs-dev

Or with pip:

pip install msgraphfs-dev

⚙️ Setup Requirements

Azure App Registration

  1. Register an Azure AD application at https://portal.azure.com
  2. Configure API permissions (Application permissions for client credentials flow):
    • Sites.Read.All or Sites.ReadWrite.All
    • Files.Read.All or Files.ReadWrite.All
    • ⚠️ Important: Grant admin consent for your organization
  3. Create a client secret
  4. Note down:
    • Application (client) ID
    • Directory (tenant) ID
    • Client secret value

OAuth2 Scopes

MSGraphFS uses client credentials flow with the default scope (https://graph.microsoft.com/.default). This automatically includes all the application permissions you've granted to your Azure app registration.

You don't need to specify individual scopes - the library handles this automatically! 🎯
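For reference, a client credentials token request against the Microsoft identity platform v2.0 endpoint looks roughly like the sketch below. The function `build_client_credentials_request` is a hypothetical illustration that only builds the request and sends nothing. It is not how msgraphfs is necessarily implemented.

```python
from urllib.parse import urlencode

DEFAULT_SCOPE = "https://graph.microsoft.com/.default"


def build_client_credentials_request(tenant_id, client_id, client_secret):
    """Return (url, body) for an OAuth2 client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": DEFAULT_SCOPE,
    }
    # The token endpoint expects a form-encoded POST body.
    return url, urlencode(form).encode("ascii")


url, body = build_client_credentials_request("my-tenant", "my-client", "s3cret")
```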

SharePoint Site Access

  • Ensure your Azure app has access to the SharePoint site
  • You only need the site name and drive name (e.g., "Documents")
  • No manual ID lookups required! 🎉

Legacy Usage (Advanced)

If you prefer to specify drive IDs directly:

fs = msgraphfs.MSGDriveFS(
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    drive_id="specific-drive-id"  # Skip auto-discovery
)

Find drive IDs using Microsoft Graph Explorer:

  • Sites: GET /sites/{hostname}:/sites/{site-name}
  • Drives: GET /sites/{site-id}/drives

🛠️ Development

To develop this package, you can clone the repository and install the dependencies using uv:

git clone <your-repo-url>  # a fork of https://github.com/acsone/msgraphfs
cd msgraphfs
uv sync

This will install the package in editable mode with all dependencies, so you can make changes to the code and test them without having to reinstall the package every time.

To run the tests with the test dependencies:

uv run pytest

Or with pip (legacy):

pip install -e ".[test]"
pytest

Testing the package requires access to a Microsoft drive (OneDrive, SharePoint, etc.) as well as the client_id, client_secret, tenant_id, drive_id, site_name, and the user's access token.

How to get an access token required for testing

The first step is to get your user's access token.

Prerequisites

  • A registered Azure AD application with:
    • client_id and client_secret
    • Delegated permissions granted (e.g., Files.ReadWrite.All, Sites.ReadWrite.All)
    • A redirect URI configured (e.g., http://localhost:5000/callback)

1. Build the OAuth2 authorization URL

Open the following URL in your browser (replace values as needed):

https://login.microsoftonline.com/<TENANT_ID>/oauth2/v2.0/authorize?
client_id=<CLIENT_ID>
&response_type=code
&redirect_uri=http://localhost:5000/callback
&response_mode=query
&scope=offline_access%20User.Read%20Files.ReadWrite.All%20Sites.ReadWrite.All

You will be asked to log in with your Microsoft account and to grant the requested permissions.
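Assembling the authorization URL by hand is error-prone because of the percent-encoding. As a minimal sketch, the same URL can be built with the standard library. `build_authorize_url` is a hypothetical helper, and the parameter values are the placeholders from the URL above.

```python
from urllib.parse import quote, urlencode


def build_authorize_url(tenant_id, client_id, redirect_uri, scopes):
    """Build the OAuth2 authorization URL for the authorization code flow."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "response_mode": "query",
        "scope": " ".join(scopes),
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode(params, quote_via=quote)
    return (
        f"https://login.microsoftonline.com/{tenant_id}"
        f"/oauth2/v2.0/authorize?{query}"
    )


authorize_url = build_authorize_url(
    "TENANT_ID",
    "CLIENT_ID",
    "http://localhost:5000/callback",
    ["offline_access", "User.Read", "Files.ReadWrite.All", "Sites.ReadWrite.All"],
)
print(authorize_url)
```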

2. Copy the Authorization Code

Once logged in, you'll be redirected to:

http://localhost:5000/callback?code=<AUTHORIZATION_CODE>

Copy the value of code from the URL.
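If you prefer not to copy the code by hand, it can be extracted from the pasted redirect URL. This is a minimal sketch with a hypothetical helper `extract_auth_code`. The redirect may carry extra query parameters, which are ignored here.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(redirect_url):
    """Return the `code` query parameter from the redirect URL."""
    query = parse_qs(urlparse(redirect_url).query)
    if "error" in query:
        # The identity platform reports failures through an `error` parameter.
        raise ValueError(f"Authorization failed: {query['error'][0]}")
    if "code" not in query:
        raise ValueError("No 'code' parameter found in the redirect URL")
    return query["code"][0]


code = extract_auth_code("http://localhost:5000/callback?code=abc123&state=xyz")
print(code)
```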

Launch the test suite

To run the test suite, you just need to run the pytest command in the root directory with the following arguments:

  • --auth-code: The authorization code you got in the previous step. (It's only required if you launch the tests for the first time or if your refresh token is expired and you need to get a new access token)
  • --client-id: The client id of your Azure AD application.
  • --client-secret: The client secret of your Azure AD application.
  • --tenant-id: The tenant id of your Azure AD application.
  • --drive-id: The drive id of the drive you want to access.
  • --site-name: The name of the site you want to access. (Only required for tests related to the access to the recycling bin)

pytest --auth-code <AUTH_CODE> \
       --client-id <CLIENT_ID> \
       --client-secret <CLIENT_SECRET> \
       --tenant-id <TENANT_ID> \
       --drive-id <DRIVE_ID> \
       --site-name <SITE_NAME> \
       tests

Alternatively, you can set the environment variables MSGRAPHFS_AUTH_CODE, MSGRAPHFS_CLIENT_ID, MSGRAPHFS_CLIENT_SECRET, MSGRAPHFS_TENANT_ID, MSGRAPHFS_DRIVE_ID and MSGRAPHFS_SITE_NAME to avoid passing the arguments to pytest.

When the auth code is provided and an access token must be obtained (in other words, the first time you run the tests or when your refresh token has expired), the package automatically requests the access token and stores it in an encrypted file in your system keyring. The call to the token endpoint requires a redirect_uri parameter, which must match one of the redirect URIs configured in your Azure AD application. By default it is set to http://localhost:8069/microsoft_account/authentication; you can change it by setting the environment variable MSGRAPHFS_AUTH_REDIRECT_URI or by passing the --auth-redirect-uri argument to pytest.
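To make the role of redirect_uri concrete, here is a sketch of the form fields that an authorization code exchange sends to the token endpoint. `build_auth_code_form` is a hypothetical helper and not the test suite's actual code. It only assumes the default value and environment variable named above.

```python
import os

# Default taken from the paragraph above.
DEFAULT_REDIRECT_URI = "http://localhost:8069/microsoft_account/authentication"


def build_auth_code_form(auth_code, client_id, client_secret, environ=None):
    """Form fields for exchanging an authorization code for tokens."""
    environ = os.environ if environ is None else environ
    redirect_uri = environ.get("MSGRAPHFS_AUTH_REDIRECT_URI", DEFAULT_REDIRECT_URI)
    return {
        "grant_type": "authorization_code",
        "code": auth_code,
        # Must match the redirect URI used when requesting the code.
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }


form = build_auth_code_form(
    "abc123",
    "my-client",
    "s3cret",
    environ={"MSGRAPHFS_AUTH_REDIRECT_URI": "http://localhost:5000/callback"},
)
```

If the authorization URL was opened with a different redirect URI (for example http://localhost:5000/callback), the same value has to be supplied here, otherwise the token endpoint rejects the exchange.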

Pre-commit hooks

To ensure code quality, this package uses pre-commit hooks. You can install them by running:

pre-commit install

This will set up the pre-commit hooks to run automatically before each commit. You can also run them manually by executing:

pre-commit run --all-files
