PySpark custom data source for Microsoft Graph APIs, supporting path and query parameters, with PySpark read examples.
PySpark Microsoft Graph Source
A PySpark DataSource that reads data from the Microsoft Graph API, giving easy access to resources such as SharePoint list items.
Features
- Entra ID Authentication: Authenticate securely with Microsoft Graph using DefaultAzureCredential, which works in both local development and production.
- Automatic Pagination Handling: Fetches every page of results from Microsoft Graph without manual intervention.
- Dynamic Schema Inference: Detects the schema of the resource by sampling data, so you don't need to define it manually.
- Simple Configuration with .option(): Set the resource and its query parameters directly in your Spark read options.
- Zero External Ingestion Services: Ingest data from Microsoft Graph directly into Spark, without Azure Data Factory, Logic Apps, or similar services.
- Extensible Resource Providers: Add custom resource providers to support more Microsoft Graph endpoints as needed.
- Pluggable Architecture: Resource providers are loaded dynamically, without changes to the core logic.
- Optimized for PySpark: Works natively with Spark's DataFrame API for big data processing.
- Secure by Design: Credentials and secrets are handled following Azure Identity best practices, so sensitive data is never hardcoded.
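Microsoft Graph returns collections one page at a time: each response carries its records in a `value` array and, when more results remain, an `@odata.nextLink` URL pointing at the next page. The loop below is a minimal sketch of that pattern, not this package's actual implementation. `iter_graph_items` and the injected `fetch_page` callable are hypothetical names; HTTP and authentication are left to the caller so the paging logic can be exercised with canned pages.

```python
from typing import Callable, Iterator, Optional

# Hypothetical: fetch_page(url) performs the authenticated GET and returns the
# decoded JSON body of one Microsoft Graph response.
FetchPage = Callable[[str], dict]


def iter_graph_items(first_url: str, fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every record from a paged Graph collection, following @odata.nextLink."""
    url: Optional[str] = first_url
    while url:
        body = fetch_page(url)
        # Records for the current page live under "value".
        yield from body.get("value", [])
        # Graph omits "@odata.nextLink" on the last page, which ends the loop.
        url = body.get("@odata.nextLink")


# Usage with canned pages standing in for real HTTP responses.
pages = {
    "page-1": {"value": [{"id": "1"}, {"id": "2"}], "@odata.nextLink": "page-2"},
    "page-2": {"value": [{"id": "3"}]},
}
ids = [item["id"] for item in iter_graph_items("page-1", pages.__getitem__)]
print(ids)  # ['1', '2', '3']
```

Injecting `fetch_page` keeps the generator free of network code, so it streams records lazily and never holds more than one page in memory.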
Installation
```bash
pip install pyspark-msgraph-source
```
⚡ Quickstart
1. Authentication
This package authenticates with `DefaultAzureCredential` from the `azure-identity` library. Make sure you are signed in:

```bash
az login
```

Or set the environment variables for a service principal:

```bash
export AZURE_CLIENT_ID=<your-client-id>
export AZURE_TENANT_ID=<your-tenant-id>
export AZURE_CLIENT_SECRET=<your-client-secret>
```
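When authentication fails, it helps to know which credential source `DefaultAzureCredential` will pick up; it checks the environment variables above before falling back to sources such as the Azure CLI login. The sketch below reports which service-principal variables are missing and, separately, requests a token for the Microsoft Graph `.default` scope through the `azure-identity` API directly. Both helper names are hypothetical and not part of pyspark-msgraph-source; calling `get_graph_token()` requires `azure-identity` to be installed and valid credentials to be available.

```python
import os
from typing import List, Mapping, Optional

# Variables used by azure-identity for a client-secret service principal.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")

# Scope that requests the permissions already granted to the app for Graph.
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def missing_service_principal_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the service-principal variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


def get_graph_token() -> str:
    """Acquire a Microsoft Graph access token via DefaultAzureCredential."""
    # Imported here so the environment check above works even when
    # azure-identity is not installed.
    from azure.identity import DefaultAzureCredential

    credential = DefaultAzureCredential()
    return credential.get_token(GRAPH_SCOPE).token
```

If `missing_service_principal_vars()` returns a non-empty list, `DefaultAzureCredential` will skip the environment credential and try other sources, so an `az login` session must be valid instead.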
2. Example Usage
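```python
from pyspark.sql import SparkSession

from pyspark_msgraph_source.core.source import MSGraphDataSource

spark = SparkSession.builder \
    .appName("MSGraphExample") \
    .getOrCreate()

# Register the custom data source so that the "msgraph" format name resolves.
spark.dataSource.register(MSGraphDataSource)

# Read SharePoint list items. "site-id" and "list-id" identify the list;
# "top" and "expand" are query parameters for the Graph request.
df = spark.read.format("msgraph") \
    .option("resource", "list_items") \
    .option("site-id", "<YOUR_SITE_ID>") \
    .option("list-id", "<YOUR_LIST_ID>") \
    .option("top", 100) \
    .option("expand", "fields") \
    .load()

df.show()
```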
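The options in this example mix path parameters (`site-id`, `list-id`) with query parameters (`top`, `expand`). The function below sketches how such options could map onto the Microsoft Graph list-items endpoint, `GET /sites/{site-id}/lists/{list-id}/items`. It is an illustration only: `build_list_items_url` is a hypothetical helper, and the assumption that `top` and `expand` become `$top` and `$expand` is not taken from the package's source.

```python
from typing import Any, Mapping
from urllib.parse import quote, urlencode

GRAPH_BASE_URL = "https://graph.microsoft.com/v1.0"

# Assumption: reader options that become OData query parameters ($top, $expand).
QUERY_OPTIONS = ("top", "expand")


def build_list_items_url(options: Mapping[str, Any]) -> str:
    """Sketch: turn msgraph reader options into a Graph list-items request URL."""
    if options.get("resource") != "list_items":
        raise ValueError("this sketch only handles resource='list_items'")
    # Path parameters. SharePoint site IDs contain commas, which are kept as-is.
    site_id = quote(str(options["site-id"]), safe=",")
    list_id = quote(str(options["list-id"]), safe="")
    url = f"{GRAPH_BASE_URL}/sites/{site_id}/lists/{list_id}/items"
    # Query parameters, prefixed with "$" as in the OData query syntax.
    query = {f"${name}": str(options[name]) for name in QUERY_OPTIONS if name in options}
    if query:
        url += "?" + urlencode(query, safe="$")
    return url


print(build_list_items_url({
    "resource": "list_items",
    "site-id": "s1",
    "list-id": "l1",
    "top": 100,
    "expand": "fields",
}))
# https://graph.microsoft.com/v1.0/sites/s1/lists/l1/items?$top=100&$expand=fields
```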
To supply an explicit schema instead of relying on inference:

```python
# Explicit schema in Spark DDL form.
df = spark.read.format("msgraph") \
    .option("resource", "list_items") \
    .option("site-id", "<YOUR_SITE_ID>") \
    .option("list-id", "<YOUR_LIST_ID>") \
    .option("top", 100) \
    .option("expand", "fields") \
    .schema("id string, Title string") \
    .load()

df.show()
```
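Without `.schema(...)`, the source infers a schema by sampling data. The function below sketches one way such inference could work, producing a DDL string in the same form as `"id string, Title string"`. It is a hypothetical illustration, not the package's inference logic, and it makes simplifying assumptions: records are JSON-decoded dicts, columns whose types conflict across records are widened to `string`, and nested objects and arrays are also typed as `string`.

```python
import re
from typing import Any, Dict, Iterable, Mapping, Optional

# Unquoted Spark SQL column names: letters, digits and underscores.
_SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def spark_type(value: Any) -> str:
    """Map one JSON-decoded value to a Spark SQL type name."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    # Strings, and (as a simplification) nested objects and arrays.
    return "string"


def quote_name(name: str) -> str:
    """Backtick-quote names such as '@odata.etag' that are not simple identifiers."""
    if _SIMPLE_NAME.fullmatch(name):
        return name
    return "`" + name.replace("`", "``") + "`"


def infer_ddl(sample: Iterable[Mapping[str, Any]]) -> str:
    """Infer a DDL schema string from sampled records."""
    columns: Dict[str, Optional[str]] = {}
    for record in sample:
        for name, value in record.items():
            if value is None:
                # Remember the column, but let a later non-null value decide its type.
                columns.setdefault(name, None)
                continue
            current = spark_type(value)
            seen = columns.get(name)
            if seen is None:
                columns[name] = current
            elif seen != current:
                # Conflicting types across records are widened to string.
                columns[name] = "string"
    return ", ".join(f"{quote_name(n)} {t or 'string'}" for n, t in columns.items())


print(infer_ddl([{"id": "1", "Title": "Task A"}, {"id": "2", "Title": None}]))
# id string, Title string
```

Sampling keeps inference cheap, but a sample can miss columns or types that appear only later; passing an explicit schema avoids that risk.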
Supported Resources
| Resource | Description |
|---|---|
| `list_items` | SharePoint List Items |
| (more coming soon...) | |
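The feature list describes extensible resource providers loaded through a pluggable architecture. The registry below is a hypothetical sketch of that idea, not the package's real provider interface: `ResourceProvider`, `register_provider`, and `resolve_provider` are illustrative names. Each provider declares the options it requires, and resolving the `resource` option fails early with a clear `ValueError` when it is missing or unknown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Tuple


@dataclass(frozen=True)
class ResourceProvider:
    """Hypothetical description of one Graph resource the source can read."""
    name: str
    required_options: Tuple[str, ...]
    build_path: Callable[[Mapping[str, str]], str]


PROVIDERS: Dict[str, ResourceProvider] = {}


def register_provider(provider: ResourceProvider) -> None:
    """Add a provider; new resources plug in without touching the lookup code."""
    PROVIDERS[provider.name] = provider


def resolve_provider(options: Mapping[str, str]) -> ResourceProvider:
    """Pick the provider named by the 'resource' option and check its options."""
    resource = options.get("resource")
    if not resource:
        raise ValueError("resource missing: add .option('resource', ...)")
    if resource not in PROVIDERS:
        raise ValueError(f"unsupported resource {resource!r}; known: {sorted(PROVIDERS)}")
    provider = PROVIDERS[resource]
    missing = [opt for opt in provider.required_options if opt not in options]
    if missing:
        raise ValueError(f"{resource} requires options: {', '.join(missing)}")
    return provider


register_provider(ResourceProvider(
    name="list_items",
    required_options=("site-id", "list-id"),
    build_path=lambda o: f"/sites/{o['site-id']}/lists/{o['list-id']}/items",
))

provider = resolve_provider({"resource": "list_items", "site-id": "s1", "list-id": "l1"})
print(provider.build_path({"site-id": "s1", "list-id": "l1"}))  # /sites/s1/lists/l1/items
```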
Development
Coming soon...
Troubleshooting
| Issue | Solution |
|---|---|
| `ValueError: resource missing` | Add `.option("resource", "list_items")` |
| Empty DataFrame | Verify the site and list IDs, and that the signed-in identity has permission to read the list |
| Authentication failures | Check your Azure credentials and `az login` status |
File details
Details for the file pyspark_msgraph_source-0.3.0.tar.gz.
File metadata
- Download URL: pyspark_msgraph_source-0.3.0.tar.gz
- Size: 12.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.12.9 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0bcfa9ffaa18d28690a0d17a18ddd2dee6e574a6b2bf57d9f216495b16ed43b |
| MD5 | 1c9d317576533d15f183f5764ad011c1 |
| BLAKE2b-256 | 34475a2d6ee23a771b7f0acc6d1fda3ddfd40c48a5cc8d8cd0f8d3e43d3e3937 |
File details
Details for the file pyspark_msgraph_source-0.3.0-py3-none-any.whl.
File metadata
- Download URL: pyspark_msgraph_source-0.3.0-py3-none-any.whl
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.12.9 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dbec51d137ece6f91d5fea5a4bfbe4404100f248879b34e13dc3ffb19a2486b3 |
| MD5 | 0387b5565aa06802c5b39cd8b9e77760 |
| BLAKE2b-256 | 08ef23c68984c13d17cd5281ecca69513c3db65fe5e32e2886beaaaafa735eef |