MSGraphFS
An fsspec-based filesystem for Microsoft Graph API drives (SharePoint, OneDrive). It features lazy initialization, fork safety for multi-process environments such as Airflow, and comprehensive permission management.
📖 Microsoft Graph OneDrive API Documentation
🚀 Quick Start
Simple Usage
import msgraphfs
# Easy setup - just provide your app credentials and site/drive names
fs = msgraphfs.MSGDriveFS(
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    site_name="YourSiteName",  # SharePoint site name
    drive_name="Documents",  # Optional: defaults to site's default drive
)
# Start using it like any filesystem
files = fs.ls("/")
print(f"Found {len(files)} items")
# Read files
with fs.open("/path/to/file.txt") as f:
    content = f.read()
# Write files
with fs.open("/path/to/new_file.txt", "w") as f:
    f.write("Hello SharePoint!")
Using fsspec Protocol
import fsspec
fs = fsspec.filesystem(
    "msgd",
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    site_name="YourSiteName",
    drive_name="Documents",
)
fs.ls("/")
Environment Variables
You can also use environment variables:
export MSGRAPHFS_CLIENT_ID="your-client-id"
export MSGRAPHFS_TENANT_ID="your-tenant-id"
export MSGRAPHFS_CLIENT_SECRET="your-client-secret"
import msgraphfs
# Credentials loaded from environment
fs = msgraphfs.MSGDriveFS(
    site_name="YourSiteName",
    drive_name="Documents",
)
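The fallback from explicit arguments to environment variables can be pictured with the sketch below. It only illustrates the idea and is not the library's actual code: `resolve_credentials` and `ENV_NAMES` are hypothetical names, and the assumption that explicit arguments take precedence over the environment is ours. Only the three variable names come from this README.

```python
import os

# Environment variable names documented above, keyed by constructor argument.
ENV_NAMES = {
    "client_id": "MSGRAPHFS_CLIENT_ID",
    "tenant_id": "MSGRAPHFS_TENANT_ID",
    "client_secret": "MSGRAPHFS_CLIENT_SECRET",
}


def resolve_credentials(environ=None, **explicit):
    """Hypothetical helper: prefer explicit values, then fall back to the environment."""
    environ = os.environ if environ is None else environ
    resolved, missing = {}, []
    for key, env_name in ENV_NAMES.items():
        value = explicit.get(key) or environ.get(env_name)
        if value:
            resolved[key] = value
        else:
            missing.append(env_name)
    if missing:
        # Fail early with the names of the variables the caller still has to set.
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return resolved
```

A check like this at startup reports a missing secret by name instead of failing later with an authentication error.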
✨ Key Features
🔧 Automatic Discovery
- No manual drive/site ID lookup required - just provide site and drive names
- Automatic OAuth2 token management - handles client credentials flow
- Fork-safe lazy initialization - perfect for multi-process environments like Airflow
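To make "fork-safe lazy initialization" concrete, here is a minimal, generic sketch of the pattern. It is not taken from the MSGraphFS source: `LazyForkSafeClient` and its `factory` argument are hypothetical, and the sketch ignores thread safety for brevity. The idea is that nothing is created when the object is constructed, and a client built in the parent process is never reused in a forked child.

```python
import os


class LazyForkSafeClient:
    """Hypothetical holder that builds a client on first use and rebuilds it after a fork."""

    def __init__(self, factory):
        self._factory = factory  # callable that creates the real client
        self._client = None      # nothing is created at construction time
        self._pid = None         # PID of the process that built the client

    def get(self):
        pid = os.getpid()
        # Build the client if it does not exist yet, or if we are now
        # running in a different process than the one that created it.
        if self._client is None or self._pid != pid:
            self._client = self._factory()
            self._pid = pid
        return self._client


created = []


def make_client():
    created.append(os.getpid())  # record each real construction
    return object()


holder = LazyForkSafeClient(make_client)
constructed_before_use = len(created)  # still 0: construction is deferred
first = holder.get()
second = holder.get()                  # same process, so the client is reused
```

Because the constructor does no network or session work, an object like `holder` can be created at module import time and each worker process ends up with its own client.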
🔐 Permission Management
# Get detailed permissions for any file/directory
permissions = fs.get_permissions("/sensitive-document.pdf")
print(f"Total permissions: {permissions['summary']['total_permissions']}")
print(f"Users with access: {permissions['summary']['user_count']}")
# Check specific users and roles
for user in permissions['users']:
    print(f"{user['display_name']}: {', '.join(user['roles'])}")
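As a small illustration of working with that result, the sketch below filters users by role. The `sample` dictionary is made-up data shaped like the fields used above (`summary`, `users`, `display_name`, `roles`); the real return value may contain more keys, and the role names `read` and `write` are assumed here. `users_with_role` is a hypothetical helper, not part of the library.

```python
# Made-up data mirroring the keys used in the example above.
sample = {
    "summary": {"total_permissions": 2, "user_count": 2},
    "users": [
        {"display_name": "Alice", "roles": ["write"]},
        {"display_name": "Bob", "roles": ["read"]},
    ],
}


def users_with_role(permissions, role):
    """Return the display names of users holding the given role."""
    return [
        user["display_name"]
        for user in permissions["users"]
        if role in user["roles"]
    ]


writers = users_with_role(sample, "write")
```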
📁 Enhanced File Operations
- Expand queries: Get additional metadata with expand="permissions" or expand="thumbnails"
- Version control: get_versions(), checkin(), checkout()
- File preview: preview() for web preview URLs
- Format conversion: get_content(format="pdf") to convert documents
🚀 Airflow Integration
from airflow.decorators import task  # Airflow 2.x import path, needed for @task below
from airflow.io.path import ObjectStoragePath, attach
import msgraphfs
# Safe to do at module level - lazy initialization prevents fork issues
attach(protocol="sharepoint", fs=msgraphfs.MSGDriveFS(
    site_name="YourSite",
    drive_name="Documents",
    # credentials from environment or parameters
))
@task
def process_files():
    # Works perfectly in Airflow tasks
    src_path = ObjectStoragePath("sharepoint://folder/file.docx")
    content = src_path.read_text()
    return content
📋 Advanced Usage
Working with Item IDs
Many methods accept an optional item_id parameter for efficiency:
# Get item ID for later use
item_id = fs.get_item_id("/important/document.pdf")
# Use item_id to avoid path lookups
info = fs.info("/any/path", item_id=item_id)
content = fs.get_content(item_id=item_id)
permissions = fs.get_permissions(item_id=item_id)
Document Conversion
# Convert Word document to PDF
pdf_content = fs.get_content("/document.docx", format="pdf")
# Get file preview URL
preview_url = fs.preview("/presentation.pptx")
Version Control
# Check out for editing
fs.checkout("/document.docx")
# Make changes...
with fs.open("/document.docx", "w") as f:
    f.write("Updated content")
# Check back in with comment
fs.checkin("/document.docx", "Updated quarterly numbers")
# View version history
versions = fs.get_versions("/document.docx")
for version in versions:
    print(f"Version {version['id']}: {version['lastModifiedDateTime']}")
🔧 Installation
uv add msgraphfs-dev
Or with pip:
pip install msgraphfs-dev
⚙️ Setup Requirements
Azure App Registration
- Register an Azure AD application at https://portal.azure.com
- Configure API permissions (Application permissions for client credentials flow):
  - Sites.Read.All or Sites.ReadWrite.All
  - Files.Read.All or Files.ReadWrite.All
  - ⚠️ Important: Grant admin consent for your organization
- Create a client secret
- Note down:
  - Application (client) ID
  - Directory (tenant) ID
  - Client secret value
OAuth2 Scopes
MSGraphFS uses client credentials flow with the default scope (https://graph.microsoft.com/.default). This automatically includes all the application permissions you've granted to your Azure app registration.
You don't need to specify individual scopes - the library handles this automatically! 🎯
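For readers curious what the client credentials flow looks like on the wire, the sketch below builds the token request without sending it. It uses the standard Microsoft identity platform v2.0 token endpoint and the default scope named above. `build_token_request` is a hypothetical helper for illustration; the library's internal implementation may differ.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret):
    """Return the token endpoint URL and form-encoded body for client credentials."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # The .default scope requests every application permission
            # already granted to the app registration.
            "scope": "https://graph.microsoft.com/.default",
        }
    )
    return url, body


token_url, token_body = build_token_request(
    "your-tenant-id", "your-client-id", "your-client-secret"
)
```

The body would be sent as a POST with content type `application/x-www-form-urlencoded`; the response contains the bearer token used for Graph calls.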
SharePoint Site Access
- Ensure your Azure app has access to the SharePoint site
- You only need the site name and drive name (e.g., "Documents")
- No manual ID lookups required! 🎉
Legacy Usage (Advanced)
If you prefer to specify drive IDs directly:
fs = msgraphfs.MSGDriveFS(
    client_id="your-client-id",
    tenant_id="your-tenant-id",
    client_secret="your-client-secret",
    drive_id="specific-drive-id",  # Skip auto-discovery
)
Find drive IDs using Microsoft Graph Explorer:
- Sites: GET /sites/{hostname}:/sites/{site-name}
- Drives: GET /sites/{site-id}/drives
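If you prefer to script these two lookups rather than use Graph Explorer, the request URLs can be built as in the sketch below. It assumes the Graph v1.0 root URL, and the helper names `site_lookup_url` and `drives_url` are hypothetical. The example hostname and site ID are placeholders, and sending the requests with a bearer token is left out.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"  # assumed API version


def site_lookup_url(hostname, site_name):
    """URL for GET /sites/{hostname}:/sites/{site-name}."""
    return f"{GRAPH_ROOT}/sites/{hostname}:/sites/{quote(site_name)}"


def drives_url(site_id):
    """URL for GET /sites/{site-id}/drives."""
    # Site IDs are comma-separated, so commas are left unescaped.
    return f"{GRAPH_ROOT}/sites/{quote(site_id, safe=',')}/drives"


site_url = site_lookup_url("contoso.sharepoint.com", "Team Site")
drive_list_url = drives_url("contoso.sharepoint.com,site-guid,web-guid")
```

The `id` field in the first response is the site ID to pass to the second request, and each entry in the second response carries the drive ID that `drive_id=` expects.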
🛠️ Development
To develop this package, you can clone the repository and install the dependencies using uv:
git clone <your-repo-url>  # your fork of https://github.com/acsone/msgraphfs
cd msgraphfs
uv sync
This will install the package in editable mode with all dependencies, so you can make changes to the code and test them without having to reinstall the package every time.
To run the tests with the test dependencies:
uv run pytest
Or with pip (legacy):
pip install -e ".[test]"
pytest
Testing the package requires access to a Microsoft drive (OneDrive, SharePoint, etc.) as well as the client_id, client_secret, tenant_id, drive_id, site_name and the user's access token.
How to get an access token required for testing
The first step is to get your user's access token.
Prerequisites
- A registered Azure AD application with:
  - client_id and client_secret
  - Delegated permissions granted (e.g., Files.ReadWrite.All, Sites.ReadWrite.All)
  - A redirect URI configured (e.g., http://localhost:5000/callback)
1. Build the OAuth2 authorization URL
Open the following URL in your browser (replace values as needed):
https://login.microsoftonline.com/<TENANT_ID>/oauth2/v2.0/authorize?
client_id=<CLIENT_ID>
&response_type=code
&redirect_uri=http://localhost:5000/callback
&response_mode=query
&scope=offline_access%20User.Read%20Files.ReadWrite.All%20Sites.ReadWrite.All
You will be asked to log in with your Microsoft account and to grant the requested permissions.
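Assembling the URL by hand makes it easy to miss an escape, so here is a small sketch that builds the same authorization URL from its parts. `build_authorize_url` is a hypothetical helper, and the parameter values simply mirror the example above.

```python
from urllib.parse import quote, urlencode


def build_authorize_url(tenant_id, client_id, redirect_uri, scopes):
    """Build the OAuth2 authorization URL for the authorization code flow."""
    query = urlencode(
        {
            "client_id": client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "response_mode": "query",
            "scope": " ".join(scopes),
        },
        quote_via=quote,  # encode spaces as %20, matching the URL shown above
    )
    return (
        f"https://login.microsoftonline.com/{tenant_id}"
        f"/oauth2/v2.0/authorize?{query}"
    )


authorize_url = build_authorize_url(
    "your-tenant-id",
    "your-client-id",
    "http://localhost:5000/callback",
    ["offline_access", "User.Read", "Files.ReadWrite.All", "Sites.ReadWrite.All"],
)
print(authorize_url)
```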
2. Copy the Authorization Code
Once logged in, you'll be redirected to:
http://localhost:5000/callback?code=<AUTHORIZATION_CODE>
Copy the value of code from the URL.
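If you would rather not pick the code out by eye, the redirected URL can be parsed with the standard library. This is a minimal sketch: `extract_auth_code` is a hypothetical helper, and the handling of an `error` query parameter assumes the usual OAuth2 error response shape.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(redirected_url):
    """Return the authorization code from the URL the browser was redirected to."""
    params = parse_qs(urlparse(redirected_url).query)
    if "error" in params:
        # The authorization server reports failures as query parameters too.
        raise ValueError(f"Authorization failed: {params['error'][0]}")
    if "code" not in params:
        raise ValueError("No authorization code found in the URL")
    return params["code"][0]


auth_code = extract_auth_code(
    "http://localhost:5000/callback?code=EXAMPLE_CODE&session_state=abc"
)
```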
Launch the test suite
To run the test suite, you just need to run the pytest command in the root directory with the following arguments:
- --auth-code: The authorization code you got in the previous step. (It's only required if you launch the tests for the first time or if your refresh token is expired and you need to get a new access token)
- --client-id: The client id of your Azure AD application.
- --client-secret: The client secret of your Azure AD application.
- --tenant-id: The tenant id of your Azure AD application.
- --drive-id: The drive id of the drive you want to access.
- --site-name: The name of the site you want to access. (Only required for tests related to the access to the recycling bin)
pytest --auth-code <AUTH_CODE> \
    --client-id <CLIENT_ID> \
    --client-secret <CLIENT_SECRET> \
    --tenant-id <TENANT_ID> \
    --drive-id <DRIVE_ID> \
    --site-name <SITE_NAME> \
    tests
Alternatively, you can set the environment variables MSGRAPHFS_AUTH_CODE, MSGRAPHFS_CLIENT_ID, MSGRAPHFS_CLIENT_SECRET, MSGRAPHFS_TENANT_ID, MSGRAPHFS_DRIVE_ID and MSGRAPHFS_SITE_NAME to avoid passing the arguments to pytest.
When the auth code is provided and a new access token is needed (that is, on the first test run or when your refresh token has expired), the package automatically obtains the access token and stores it in an encrypted file in your system's keyring. The call to the token endpoint requires a redirect_uri parameter, which must match one of the redirect URIs configured in your Azure AD application.
By default, it is set to http://localhost:8069/microsoft_account/authentication, but you can change it by setting the environment variable MSGRAPHFS_AUTH_REDIRECT_URI or by passing the --auth-redirect-uri argument to pytest. If you requested the authorization code with a different redirect URI, such as the http://localhost:5000/callback used above, set this value to match.
Pre-commit hooks
To ensure code quality, this package uses pre-commit hooks. You can install them by running:
pre-commit install
This will set up the pre-commit hooks to run automatically before each commit. You can also run them manually by executing:
pre-commit run --all-files