MS365Sync
A Python library for syncing files between Microsoft 365 SharePoint and local storage.
Features
- 🔄 Change detection: Automatically detects files added, modified, and deleted in SharePoint since the last sync
- 📁 Hierarchical support: Maintains folder structures during sync
- 🔐 OAuth2 authentication: Authenticates to the Microsoft Graph API with the OAuth2 client-credentials flow (app-only access)
- 🔓 Permissions tracking: Maintains a .permissions.json file with file-level permissions
- 📊 Detailed logging: Comprehensive sync reports and file trees
- 🚀 CLI and library: Use as a command-line tool or import as a Python library
- ⚡ Efficient: Only downloads changed files to minimize bandwidth usage
Installation
From PyPI
```bash
pip install ms365sync
```
From source
```bash
git clone https://github.com/yourusername/ms365sync.git
cd ms365sync
pip install -e .
```
Development installation
```bash
git clone https://github.com/yourusername/ms365sync.git
cd ms365sync
pip install -e ".[dev]"
```
Configuration
Create a .env file in your project directory with the following variables:
```
TENANT_ID=your-azure-tenant-id
CLIENT_ID=your-azure-app-client-id
CLIENT_SECRET=your-azure-app-client-secret
```
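How the library itself loads `.env` is not documented here. As a hedged sketch, the stdlib-only helpers below (hypothetical names `parse_env_text` and `missing_settings`) read simple `KEY=VALUE` lines and report which of the three required variables are still missing or empty, which can catch a typo before the first sync. This parser is deliberately minimal and does not handle quoting or multi-line values the way python-dotenv does.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping

# The three variables this README lists as required.
REQUIRED = ("TENANT_ID", "CLIENT_ID", "CLIENT_SECRET")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    # In this sketch, real environment variables take precedence over the file.
    merged = {**file_values, **os.environ}
    missing = missing_settings(merged)
    print("Missing:", ", ".join(missing) if missing else "none")
```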
Azure App Registration
- Go to the Azure Portal
- Navigate to "Microsoft Entra ID" (formerly "Azure Active Directory") → "App registrations"
- Click "New registration"
- Set application type to "Web"
- Under "API permissions", add the following Microsoft Graph application permissions, then grant admin consent:
  - Sites.Read.All (to read SharePoint sites)
  - Files.Read.All (to read files)
  - Files.ReadWrite.All (only if you need write access)
- Generate a client secret under "Certificates & secrets"
- Copy the Application (client) ID, Directory (tenant) ID, and client secret
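With these three values, an app-only access token is obtained through the OAuth2 client-credentials grant against the Microsoft identity platform v2.0 token endpoint. The library's own authentication code is not shown in this README; the sketch below only builds the request (hypothetical helper `build_token_request`) so you can see where each value goes. Sending it returns JSON with an `access_token`, which is then passed to Graph in an `Authorization: Bearer ...` header. In practice, a library such as MSAL handles this request and token caching for you.

```python
import urllib.parse
import urllib.request

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
# ".default" requests all application permissions granted (and consented) to the app.
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(
    tenant_id: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": GRAPH_SCOPE,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL.format(tenant=urllib.parse.quote(tenant_id, safe="")),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_token_request("my-tenant-id", "my-client-id", "my-secret")
    print(req.get_method(), req.full_url)
    # To actually fetch a token (needs network access and valid credentials):
    #   import json
    #   with urllib.request.urlopen(req) as resp:
    #       token = json.load(resp)["access_token"]
```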
Usage
Command Line Interface
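```bash
# Basic sync
ms365sync

# Verbose output
ms365sync --verbose

# Dry run (see what would be synced)
# Note: "Implement dry-run mode" is still listed on the Roadmap; check that your installed version supports it.
ms365sync --dry-run

# Use a custom config file
ms365sync --config /path/to/your/.env
```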
Python Library
```python
from ms365sync import SharePointSync

# Initialize the sync client
syncer = SharePointSync()

# Perform sync and get changes
changes = syncer.sync()
print(f"Added: {len(changes['added'])} files")
print(f"Modified: {len(changes['modified'])} files")
print(f"Deleted: {len(changes['deleted'])} files")
```
Advanced Usage
```python
import os

from ms365sync import SharePointSync

# Custom configuration (set before creating the client)
os.environ['TENANT_ID'] = 'your-tenant-id'
os.environ['CLIENT_ID'] = 'your-client-id'
os.environ['CLIENT_SECRET'] = 'your-client-secret'

syncer = SharePointSync()

# Get SharePoint files without syncing
sp_files = syncer.get_sharepoint_files()
print(f"Found {len(sp_files)} files in SharePoint")

# Get local files
local_files = syncer.get_local_files()
print(f"Found {len(local_files)} local files")

# Compare without syncing
added, modified, deleted = syncer.compare_files(sp_files, local_files)
print(f"Would add: {len(added)}, modify: {len(modified)}, delete: {len(deleted)}")
```
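The README does not show how `get_sharepoint_files()` walks the document library. As a hedged sketch, a recursive scan over Microsoft Graph driveItems can be written against an injected `list_children` callback, which in real use would call `GET /drives/{drive-id}/items/{item-id}/children` (following `@odata.nextLink` for paging). Items carrying a `folder` facet are descended into; items carrying a `file` facet are recorded with their `size` and `lastModifiedDateTime`. The function name `walk_drive`, its return shape, and the fake tree are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional

DriveItem = Dict[str, object]
# Hypothetical callback: returns the child driveItems of a folder (None = drive root).
ListChildren = Callable[[Optional[str]], List[DriveItem]]


def walk_drive(
    list_children: ListChildren,
    parent_id: Optional[str] = None,
    prefix: str = "",
) -> Dict[str, Dict[str, object]]:
    """Recursively collect files as {relative_path: {"size": ..., "modified": ...}}."""
    files: Dict[str, Dict[str, object]] = {}
    for item in list_children(parent_id):
        path = f"{prefix}{item['name']}"
        if "folder" in item:
            # Folder facet present: descend, extending the relative path.
            files.update(walk_drive(list_children, str(item["id"]), path + "/"))
        elif "file" in item:
            # File facet present: keep the fields the comparison step needs.
            files[path] = {
                "size": item.get("size"),
                "modified": item.get("lastModifiedDateTime"),
            }
    return files


# Usage with a fake in-memory tree instead of live Graph calls.
fake_tree: Dict[Optional[str], List[DriveItem]] = {
    None: [
        {"id": "1", "name": "Documents", "folder": {"childCount": 1}},
        {"id": "2", "name": "readme.txt", "file": {}, "size": 10,
         "lastModifiedDateTime": "2024-01-20T14:30:45Z"},
    ],
    "1": [
        {"id": "3", "name": "Report.pdf", "file": {}, "size": 2048,
         "lastModifiedDateTime": "2024-01-19T09:00:00Z"},
    ],
}
files = walk_drive(lambda parent: fake_tree.get(parent, []))
print(sorted(files))  # ['Documents/Report.pdf', 'readme.txt']
```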
Configuration Options
The library uses the following configuration variables (set in .env or environment):
| Variable | Description | Required |
|---|---|---|
| TENANT_ID | Azure Active Directory tenant ID | Yes |
| CLIENT_ID | Azure app registration client ID | Yes |
| CLIENT_SECRET | Azure app registration client secret | Yes |
The following constants can be modified in the code:
```python
import pathlib

SHAREPOINT_HOST = "your-sharepoint-site.sharepoint.com"
SITE_NAME = "Your Site Name"  # Display name as seen in SharePoint
DOC_LIBRARY = "Your Document Library"  # Display name of the document library
LOCAL_ROOT = pathlib.Path("ms365_data/data")  # Local destination folder
```
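Because `SITE_NAME` and `DOC_LIBRARY` are display names, they have to be resolved to Graph IDs at some point. How the library does this is not documented here. One plausible approach, sketched below with the hypothetical helper `find_drive_id`, is to list the site's document libraries with `GET /sites/{site-id}/drives` and match each drive's `name` field, which for a document library is usually its display name. The sample response is made up, and the case-insensitive match is a choice of this sketch.

```python
from typing import Any, Dict, List


def find_drive_id(drives_response: Dict[str, Any], library_name: str) -> str:
    """Return the id of the drive whose name matches library_name.

    `drives_response` is the parsed JSON body of GET /sites/{site-id}/drives,
    which lists drives under the "value" key.
    """
    wanted = library_name.casefold()  # case-insensitive match (a choice of this sketch)
    drives: List[Dict[str, Any]] = drives_response.get("value", [])
    for drive in drives:
        if str(drive.get("name", "")).casefold() == wanted:
            return str(drive["id"])
    available = ", ".join(str(d.get("name")) for d in drives)
    raise LookupError(f"Document library {library_name!r} not found; available: {available}")


sample = {
    "value": [
        {"id": "b!abc", "name": "Documents"},
        {"id": "b!xyz", "name": "Your Document Library"},
    ]
}
print(find_drive_id(sample, "Your Document Library"))  # b!xyz
```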
File Structure
```
ms365sync/
├── __init__.py            # Package initialization
├── sharepoint_sync.py     # Main sync logic
└── cli.py                 # Command-line interface
ms365_data/                # Data folder (in .gitignore)
├── data/                  # Downloaded files from SharePoint
└── .permissions.json      # File permissions tracking
sync_logs/                 # Sync change logs (JSON)
```
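Each run writes a new `sync_changes_TIMESTAMP.json` into `sync_logs/`. Assuming the timestamp keeps the `YYYY-MM-DD_HH-MM-SS` shape used in the example change log (`2024-01-20_14-30-45`), the newest log can be found by parsing that part of the file name. `log_timestamp` and `latest_sync_log` are hypothetical helpers, not part of the library; files whose names do not parse are ignored.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Matches the "timestamp" format in the example change log, e.g. 2024-01-20_14-30-45.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"
PREFIX = "sync_changes_"


def log_timestamp(path: Path) -> Optional[datetime]:
    """Extract the timestamp from a sync log file name, or None if it does not parse."""
    stem = path.stem
    if not stem.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(stem[len(PREFIX):], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def latest_sync_log(log_dir: Path = Path("sync_logs")) -> Optional[Path]:
    """Return the most recent sync_changes_*.json in log_dir, or None if there is none."""
    dated = []
    for candidate in log_dir.glob(f"{PREFIX}*.json"):
        ts = log_timestamp(candidate)
        if ts is not None:
            dated.append((ts, candidate))
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]
```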
Permissions Tracking
The library automatically tracks permissions for all synced files in a .permissions.json file located in the ms365_data directory. This file:
- Contains file paths as keys and permission lists as values
- Updates automatically when files are added, modified, or deleted
- Stores permissions in a simple format: "Display Name:::Permission Level"
- Permission levels include: Full Control, Edit, View
Example .permissions.json structure:
```json
{
  "Documents/Report.pdf": [
    "Phi Chat Test Site Owners:::Full Control",
    "AI Team:::Edit",
    "Phi Chat Test Site Visitors:::View"
  ],
  "Projects/Presentation.pptx": [
    "Project Managers:::Full Control",
    "Team Members:::Edit",
    "Sharing Link (view, anonymous):::View"
  ]
}
```
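Because each entry is a plain `"Display Name:::Permission Level"` string, the file is easy to query. A common downstream need is "which files can this group see, and at what level?" The sketch below (hypothetical `parse_entry` and `files_for_principal`) answers that from the parsed JSON. It splits on the last `:::`, so a display name that itself contained the separator would still parse; that edge case is an assumption, not documented behavior. To use it on real data, load the file with `json.loads(Path("ms365_data/.permissions.json").read_text())`.

```python
import json
from typing import Dict, List, Tuple

SEPARATOR = ":::"


def parse_entry(entry: str) -> Tuple[str, str]:
    """Split 'Display Name:::Permission Level' into (name, level)."""
    name, sep, level = entry.rpartition(SEPARATOR)
    if not sep:
        raise ValueError(f"Malformed permission entry: {entry!r}")
    return name, level


def files_for_principal(permissions: Dict[str, List[str]], principal: str) -> Dict[str, str]:
    """Map each file path to the level granted to `principal` (exact name match)."""
    result: Dict[str, str] = {}
    for path, entries in permissions.items():
        for entry in entries:
            name, level = parse_entry(entry)
            if name == principal:
                result[path] = level
    return result


example = json.loads("""
{
  "Documents/Report.pdf": ["Phi Chat Test Site Owners:::Full Control", "AI Team:::Edit"],
  "Projects/Presentation.pptx": ["Project Managers:::Full Control", "Team Members:::Edit"]
}
""")
print(files_for_principal(example, "AI Team"))  # {'Documents/Report.pdf': 'Edit'}
```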
Sync Process
- Authentication: Connects to Microsoft Graph API using OAuth2
- Discovery: Recursively scans SharePoint document library
- Permissions: Fetches permissions for each file
- Comparison: Compares SharePoint files with local files by size and modification date
- Sync: Downloads new/modified files, deletes files removed from SharePoint
- Permissions Update: Updates .permissions.json with current permissions
- Logging: Saves a detailed change log to sync_logs/sync_changes_TIMESTAMP.json
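The comparison step is described only as "by size and modification date". A minimal sketch of that rule, assuming both sides have been reduced to hypothetical `{relative_path: (size, modified)}` dictionaries (the real `compare_files` may use different structures), is shown below. Graph timestamps are UTC strings and local modification times are epoch seconds, so both are normalized to timezone-aware UTC datetimes first. This sketch treats a file as modified when the sizes differ or the SharePoint copy is newer; the library's exact rule may differ.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Tuple

# (size in bytes, last-modified time as a timezone-aware UTC datetime)
FileState = Tuple[int, datetime]


def parse_graph_time(value: str) -> datetime:
    """Parse a Graph timestamp such as '2024-01-20T14:30:45Z' into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def local_state(path: Path) -> FileState:
    """Size and mtime of a local file, with the mtime converted to UTC."""
    stat = path.stat()
    return stat.st_size, datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)


def compare(
    remote: Dict[str, FileState], local: Dict[str, FileState]
) -> Tuple[List[str], List[str], List[str]]:
    """Return sorted (added, modified, deleted) relative paths.

    added:    in SharePoint only -> download
    modified: on both sides, but size differs or SharePoint is newer -> re-download
    deleted:  local only (removed from SharePoint) -> delete locally
    """
    added = sorted(remote.keys() - local.keys())
    deleted = sorted(local.keys() - remote.keys())
    modified = sorted(
        path
        for path in remote.keys() & local.keys()
        if remote[path][0] != local[path][0] or remote[path][1] > local[path][1]
    )
    return added, modified, deleted


remote = {
    "a.pdf": (100, parse_graph_time("2024-01-20T14:30:45Z")),
    "b.docx": (200, parse_graph_time("2024-01-21T08:00:00Z")),
}
local = {
    "b.docx": (200, datetime(2024, 1, 19, tzinfo=timezone.utc)),
    "c.xlsx": (50, datetime(2024, 1, 1, tzinfo=timezone.utc)),
}
print(compare(remote, local))  # (['a.pdf'], ['b.docx'], ['c.xlsx'])
```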
RAG Database Integration
The sync process writes a sync_changes_TIMESTAMP.json file to sync_logs/, designed for updating a RAG (retrieval-augmented generation) database. This file contains:
Structure
```json
{
  "timestamp": "2024-01-20_14-30-45",
  "summary": {
    "total_files": 42,
    "added_count": 3,
    "modified_count": 2,
    "deleted_count": 1,
    "permission_only_changes_count": 4
  },
  "changes": {
    "added": {
      "path/to/new/file.pdf": {
        "permissions": [
          "Team Owners:::Full Control",
          "Team Members:::Edit"
        ],
        "file_path": "ms365_data/data/path/to/new/file.pdf"
      }
    },
    "modified": {
      "path/to/modified/file.docx": {
        "content_changed": true,
        "permissions_changed": true,
        "file_path": "ms365_data/data/path/to/modified/file.docx",
        "permission_changes": {
          "added": ["New User:::View"],
          "removed": ["Old User:::Edit"],
          "current": ["Team Owners:::Full Control", "New User:::View"]
        }
      }
    },
    "permission_only_changes": {
      "path/to/unchanged/file.xlsx": {
        "permission_changes": {
          "added": ["Marketing Team:::Edit"],
          "removed": ["Sales Team:::View"],
          "current": ["Owners:::Full Control", "Marketing Team:::Edit"]
        },
        "file_path": "ms365_data/data/path/to/unchanged/file.xlsx"
      }
    },
    "deleted": {
      "path/to/deleted/file.pptx": {
        "permissions": [
          "Team Owners:::Full Control",
          "All Users:::View"
        ]
      }
    }
  }
}
```
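The `permission_changes` object reads naturally as a set difference between a file's previous and current permission lists. The sketch below (hypothetical `diff_permissions` and `permissions_changed`) reproduces the `modified` entry of the example from an assumed previous list. The library's actual ordering of entries is not documented, so this sketch sorts `added` and `removed` and keeps `current` in its given order.

```python
from typing import Dict, List


def diff_permissions(previous: List[str], current: List[str]) -> Dict[str, List[str]]:
    """Build a permission_changes-style dict from two permission lists."""
    prev_set, curr_set = set(previous), set(current)
    return {
        "added": sorted(curr_set - prev_set),
        "removed": sorted(prev_set - curr_set),
        "current": list(current),
    }


def permissions_changed(previous: List[str], current: List[str]) -> bool:
    """True when the two lists differ as sets (entry order is ignored)."""
    return set(previous) != set(current)


changes = diff_permissions(
    previous=["Team Owners:::Full Control", "Old User:::Edit"],  # assumed prior state
    current=["Team Owners:::Full Control", "New User:::View"],
)
print(changes)
# {'added': ['New User:::View'], 'removed': ['Old User:::Edit'], 'current': ['Team Owners:::Full Control', 'New User:::View']}
```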
Using sync_changes_TIMESTAMP.json for RAG Updates
- Added Files: Ingest the file content and add all listed permissions
- Modified Files:
  - If content_changed is true, re-ingest the file content
  - If permissions_changed is true, update permissions (add/remove as specified)
- Permission-Only Changes: Update permissions without re-ingesting content
- Deleted Files: Remove from RAG database and remove all associated permissions
See examples/rag_sync_example.py for a complete example of processing sync changes.
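The rules above map directly onto a dispatch over the four sections of the change log. The sketch below is not `examples/rag_sync_example.py` (its contents are not reproduced in this README); it turns a parsed log into an ordered list of `(action, path)` tuples that you would hand to your own RAG client. The action names and `plan_rag_updates` are hypothetical. Processing deletions first is a design choice of this sketch, so a removed file never lingers in search results while other updates run.

```python
from typing import Any, Dict, List, Tuple

Action = Tuple[str, str]


def plan_rag_updates(log: Dict[str, Any]) -> List[Action]:
    """Translate a parsed sync_changes_TIMESTAMP.json into ordered RAG actions."""
    changes = log.get("changes", {})
    actions: List[Action] = []
    for path in changes.get("deleted", {}):
        actions.append(("delete_document_and_permissions", path))
    for path in changes.get("added", {}):
        actions.append(("ingest_with_permissions", path))
    for path, info in changes.get("modified", {}).items():
        if info.get("content_changed"):
            actions.append(("reingest_content", path))
        if info.get("permissions_changed"):
            actions.append(("update_permissions", path))
    for path in changes.get("permission_only_changes", {}):
        actions.append(("update_permissions", path))
    return actions


sample_log = {
    "changes": {
        "added": {"new.pdf": {"permissions": ["Owners:::Full Control"]}},
        "modified": {"doc.docx": {"content_changed": True, "permissions_changed": False}},
        "permission_only_changes": {"sheet.xlsx": {}},
        "deleted": {"old.pptx": {"permissions": []}},
    }
}
for action, path in plan_rag_updates(sample_log):
    print(f"{action}: {path}")
```

For a real run, load the log with `json.load` and pass the resulting dict to `plan_rag_updates`.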
Error Handling
The library includes comprehensive error handling:
- Authentication errors: Clear messages for invalid credentials
- Network errors: Retry logic for temporary connection issues
- File system errors: Graceful handling of permission issues
- API errors: Proper handling of SharePoint/Graph API limitations
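The retry policy for temporary connection issues is not specified in this README. The sketch below shows one common pattern: exponential backoff with jitter that also honors a server-supplied delay, as Microsoft Graph provides through the `Retry-After` header when it throttles a request with HTTP 429. `with_retries` and `TransientError` are hypothetical names, and the `sleep` function is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for retryable failures (timeouts, HTTP 429/503, ...)."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after  # seconds, e.g. parsed from a Retry-After header


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying TransientError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            if exc.retry_after is not None:
                delay = exc.retry_after  # the server said how long to wait
            else:
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    """Fails twice with a throttling error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("throttled", retry_after=0.5)
    return "ok"


waits: List[float] = []
print(with_retries(flaky, sleep=waits.append))  # ok
print(waits)  # [0.5, 0.5]
```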
Development
Setting up development environment
```bash
git clone https://github.com/yourusername/ms365sync.git
cd ms365sync
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```
Running tests
```bash
pytest
```
Code formatting
```bash
black ms365sync/
isort ms365sync/
```
Type checking
```bash
mypy ms365sync/
```
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
Version 0.1.0
- Initial release
- Basic SharePoint to local sync functionality
- CLI interface
- Comprehensive logging and error handling
- File permissions tracking
Roadmap
- Implement dry-run mode
- Add configuration file support (YAML/JSON)
- Implement upload functionality (local to SharePoint)
- Add filtering options (file types, patterns)
- Add scheduled sync support
- Implement incremental sync optimization
- Add progress bars for large syncs
- Support for multiple SharePoint sites
- Permission change notifications
File details
Details for the file ms365sync-0.2.0.tar.gz.
File metadata
- Download URL: ms365sync-0.2.0.tar.gz
- Upload date:
- Size: 23.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c5570b758a779db76b024ebc51cc00681ff3eb1311b11523e550ddac78f8e54 |
| MD5 | 43c2db9bccd949a4323fa170f3c06ba4 |
| BLAKE2b-256 | 27abc0f5772d99eba1b4249b42697cd5875b4dd89c8c4218546fea0704a5dda9 |
File details
Details for the file ms365sync-0.2.0-py3-none-any.whl.
File metadata
- Download URL: ms365sync-0.2.0-py3-none-any.whl
- Upload date:
- Size: 13.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7628c1d804a65f63f5422124dc109a46fe8d509163178d3e5c2824dbc1d0b658 |
| MD5 | ccc5bea0ef83c6a70f32301bc2f5759e |
| BLAKE2b-256 | 83725abb4fc37c81619ae0f5cc80b94893f81ee71e4528f5a1f6cb4904d721fc |