ADFMentor
A Python package for parsing Azure Data Factory (ADF) project files and evaluating them using AI models like Google Gemini.
Features
- ADF Processing: Parse ADF pipeline, dataset, linked service, trigger, and dataflow JSON files
  - Pipeline → Parse ADF pipeline JSON files (activities, dependencies, parameters)
  - Written → Evaluate text-based answers about ADF concepts
- Detailed Reports: Generate comprehensive grading reports from ADF project structures
- ZIP Support: Automatically handles ZIP file submissions; no manual extraction needed
- Flexible Inputs: Accepts a directory, a ZIP, or a single file (.json, .txt)
- Auto File Discovery: Locates ADF resource folders (pipeline/, dataset/, linkedService/, etc.)
- Graceful Missing-File Scoring: Missing required files yield a 0 score and clear feedback
- Lesson Question Parser: Parse structured lesson text into pipeline and text question blocks
- Easy Integration: Simple API for evaluating student assignments and projects
Installation
From Source
git clone https://github.com/yourusername/ADFMentor.git
cd ADFMentor
pip install -e .
Using pip (when published)
pip install ADFMentor
Quick Start
from ADFMentor import ADFMentor
# Initialize with your Gemini API key
mentor = ADFMentor(api_key="your-api-key")
# Evaluate a full submission (pipeline + written)
# Works with directories, ZIP files, or single files (.json/.txt)
questions = {
    "pipeline": "Create an ADF pipeline to copy data from Blob Storage to SQL Database",
    "text": "Explain your pipeline design choices",
}
prompts = {
"pipeline": "Evaluate pipeline structure, activities, and best practices",
"text": "Evaluate clarity and reasoning",
}
result = mentor.evaluate_all(
answer_path="path/to/submission/", # or "submission.zip"
questions=questions,
prompts=prompts,
)
print(f"Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")
Package Structure
ADFMentor/
├── __init__.py            # Main package entry point
├── core.py                # ADFMentor class with evaluation methods
├── models/                # AI model wrappers
│   ├── __init__.py
│   ├── model.py           # Abstract base model
│   └── gemini.py          # Google Gemini implementation
└── utils/                 # Utility functions
    ├── __init__.py
    ├── processor.py       # ADF JSON parsing and report generation
    ├── checker.py         # File discovery helpers
    ├── extractor.py       # ZIP extraction utilities
    └── question_parser.py # Lesson question parser helpers
Core Components
ADFMentor Class
The main class provides a single evaluation method:
evaluate_all(answer_path, questions, prompts): Evaluates pipeline structure and written answers together and returns an overall score and combined feedback
Notes:
- answer_path can be a directory, a ZIP file, or a single submission file (.json, .txt).
- questions and prompts must include pipeline and text keys.
- A section is skipped if its question is set to None.
ADF Processor
ADFMentor.utils.processor provides functions for processing ADF projects:
parse_adf_json(json_path)
Reads and parses a single ADF resource JSON file.
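For illustration, a minimal stand-in for this function might look like the following (a sketch only; the packaged parse_adf_json may behave differently, e.g. raise on errors instead of returning None):

```python
import json
from pathlib import Path

def parse_adf_json_sketch(json_path):
    """Read one ADF resource JSON file; return None if it cannot be parsed.

    Illustrative stand-in for parse_adf_json -- error handling here
    is an assumption, not the package's documented behavior.
    """
    try:
        text = Path(json_path).read_text(encoding="utf-8")
        return json.loads(text)
    except (OSError, json.JSONDecodeError):
        return None
```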
discover_adf_resources(directory)
Scans a directory for ADF resource folders:
- pipeline/ → Pipeline definitions
- dataset/ → Dataset definitions
- linkedService/ → Linked service definitions
- trigger/ → Trigger definitions
- dataflow/ → Dataflow definitions
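A simplified sketch of this discovery step (the real implementation may differ; absent folders are assumed to map to empty lists here):

```python
from pathlib import Path

# The standard ADF resource folder names listed above.
ADF_FOLDERS = ("pipeline", "dataset", "linkedService", "trigger", "dataflow")

def discover_adf_resources_sketch(directory):
    """Map each standard ADF folder name to the .json files inside it.

    Folders that do not exist simply map to an empty list.
    """
    root = Path(directory)
    return {
        name: sorted(str(p) for p in (root / name).glob("*.json"))
        if (root / name).is_dir()
        else []
        for name in ADF_FOLDERS
    }
```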
extract_grading_info(resources)
Extracts key elements for grading:
- Pipelines: activities, dependencies, parameters, variables
- Datasets: type, linked service reference, schema, location
- Linked Services: type, connection details (sanitized)
- Triggers: type, schedule, pipeline references
- Dataflows: sources, sinks, transformations
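The pipeline portion of that extraction could be sketched as follows (field paths follow the standard ADF resource shape of name plus a properties object; the packaged extractor likely returns richer data):

```python
def summarize_pipeline_sketch(pipeline_json):
    """Pull the grading-relevant fields from one parsed pipeline definition.

    Assumes the standard ADF layout: top-level "name", with
    "activities" and "parameters" nested under "properties".
    """
    props = pipeline_json.get("properties", {})
    activities = props.get("activities", [])
    return {
        "name": pipeline_json.get("name"),
        "activities": [a.get("name") for a in activities],
        "activity_types": [a.get("type") for a in activities],
        "parameters": sorted(props.get("parameters", {})),
    }
```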
generate_grading_report(grading_info)
Formats extracted information into a readable text report.
analyze_adf(adf_path)
Convenience function that chains all steps above.
AI Models
Gemini Model
ADFMentor.models.gemini.Gemini
from ADFMentor.models import Gemini
model = Gemini(api_key="your-api-key", model_name="gemini-2.0-flash-exp")
# Evaluate text-based answers
result = model.evaluate(
question="What are ADF linked services?",
answer="Linked services are connection strings...",
prompt="Evaluate for accuracy and completeness"
)
Response Format:
{
"score": 85,
"feedback": "Strong implementation with minor issues..."
}
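Because model output can occasionally come back malformed, a small defensive check of that {"score", "feedback"} shape may be worthwhile before using the result (a sketch; validate_evaluation_sketch is not part of the package):

```python
def validate_evaluation_sketch(result):
    """Check the score/feedback dict shape shown above.

    Returns (score, feedback) or raises if the shape is wrong.
    The 0-100 range is an assumption based on the /100 scoring
    used elsewhere in this README.
    """
    if not isinstance(result, dict):
        raise TypeError("expected a dict result")
    score = result.get("score")
    if not isinstance(score, (int, float)) or not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score!r}")
    return int(score), str(result.get("feedback", ""))
```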
Configuration
API Key Setup
Create a .env file in your project root:
API_KEY=your_gemini_api_key_here
Load it in your code:
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("API_KEY")
Detailed Usage Examples
1. Analyze an ADF Project
from ADFMentor.utils import analyze_adf
# Generate a detailed report from an ADF project directory
report = analyze_adf("path/to/adf-project/")
print(report)
Sample Output:
============================================================
AZURE DATA FACTORY PROJECT REPORT
============================================================
PIPELINES:
- CopyBlobToSQL
    Parameters: inputPath, outputTable
    Activities (3):
      • LookupSource (type: Lookup)
      • CopyData (type: Copy)
          depends on: LookupSource [Succeeded]
          source: BlobSource
          sink: SqlSink
      • StoredProcedure (type: SqlServerStoredProcedure)
          depends on: CopyData [Succeeded]

DATASETS:
- BlobInput (type: DelimitedText)
    linked service: AzureBlobStorage
    location: type: AzureBlobStorageLocation, folder: input
- SqlOutput (type: AzureSqlTable)
    linked service: AzureSqlDatabase
    table: dbo.SalesData

LINKED SERVICES:
- AzureBlobStorage (type: AzureBlobStorage)
- AzureSqlDatabase (type: AzureSqlDatabase)

TRIGGERS:
- DailyTrigger (type: ScheduleTrigger)
    schedule: every 1 Day
    pipelines: CopyBlobToSQL

DATAFLOWS:
    none

SUMMARY:
- total_pipelines: 1
- total_activities: 3
- total_datasets: 2
- total_linked_services: 2
- total_triggers: 1
- total_dataflows: 0
2. Complete Evaluation Pipeline
from ADFmentor import ADFMentor
mentor = ADFMentor(api_key="your-api-key")
# Define questions and prompts for each evaluation type
questions = {
"pipeline": "Create a pipeline to copy data from Blob to SQL with error handling",
"text": "Explain your pipeline design choices"
}
prompts = {
"pipeline": "Evaluate pipeline structure, activities, error handling, and best practices",
"text": "Evaluate clarity, justification, and understanding"
}
# Evaluate all aspects
result = mentor.evaluate_all(
answer_path="path/to/student/submission/",
questions=questions,
prompts=prompts
)
print(f"Overall Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")
3. Parse Lesson Questions
If your lesson content uses codes like "TEXT001", "PIPELINE002", you can parse it into question blocks:
from ADFMentor.utils.question_parser import parse_lesson_questions
lesson_text = """
"TEXT001"
1. Explain the purpose of linked services in ADF.
"PIPELINE002"
2. Create a pipeline to copy data from Blob Storage to SQL Database.
"""
questions = parse_lesson_questions(lesson_text)
# -> {"text": "1. ...", "pipeline": "2. ..."}
# Map to the evaluate_all() schema
questions = {
"pipeline": questions["pipeline"],
"text": questions["text"],
}
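A simplified sketch of how such code-tagged lesson text could be split (the packaged parser may use different matching rules; this stand-in keys only on quoted TEXT/PIPELINE codes):

```python
import re

def parse_lesson_questions_sketch(lesson_text):
    """Split lesson text on quoted codes like "TEXT001" / "PIPELINE002".

    Returns a dict such as {"text": "...", "pipeline": "..."}; the
    trailing digits in each code are ignored.
    """
    blocks = {}
    # Capture each marker's kind and the text up to the next marker (or end).
    pattern = re.compile(
        r'"(TEXT|PIPELINE)\d+"\s*(.*?)(?=\s*"(?:TEXT|PIPELINE)\d+"|\Z)',
        re.IGNORECASE | re.DOTALL,
    )
    for kind, body in pattern.findall(lesson_text):
        blocks[kind.lower()] = body.strip()
    return blocks
```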
4. Skipping an Evaluation Section
To skip a section, set its question to None (the key must still exist):
questions = {
"pipeline": "Create a copy pipeline with parameterized paths",
"text": None,
}
prompts = {
"pipeline": "Evaluate pipeline structure and best practices",
"text": "Evaluate clarity, justification, and understanding",
}
ADF Project Structure
ADFMentor expects submissions to follow the standard ADF project structure:
adf-project/
├── pipeline/           # Pipeline JSON definitions
│   └── CopyPipeline.json
├── dataset/            # Dataset JSON definitions
│   ├── BlobInput.json
│   └── SqlOutput.json
├── linkedService/      # Linked service definitions
│   ├── BlobStorage.json
│   └── SqlDatabase.json
├── trigger/            # Trigger definitions
│   └── DailyTrigger.json
└── dataflow/           # Dataflow definitions (optional)
Each JSON file follows the standard ADF resource format with name, type, and properties fields.
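As an illustration of that shape, a minimal pipeline definition might look like the following (all field values here are invented for the example; real definitions carry many more properties):

```python
import json

# A minimal, invented pipeline definition in the standard ADF shape:
# a top-level "name" plus a "properties" object holding parameters
# and activities.
copy_pipeline = {
    "name": "CopyPipeline",
    "properties": {
        "parameters": {"inputPath": {"type": "String"}},
        "activities": [
            {
                "name": "CopyData",
                "type": "Copy",
                "dependsOn": [],
                "inputs": [{"referenceName": "BlobInput", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SqlOutput", "type": "DatasetReference"}],
            }
        ],
    },
}

# Round-trip through JSON to confirm the definition serializes as-is.
assert json.loads(json.dumps(copy_pipeline))["name"] == "CopyPipeline"
```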
Development
Running Tests
# Run integration tests
python tests/test.py
Project Dependencies
Core:
- google-genai>=1.0.0 - Google Gemini API client
- python-dotenv>=1.0.0 - Environment variable management
Optional:
- google-cloud-aiplatform>=1.0.0 - For Vertex AI support
Requirements
- Python 3.9 or higher
- Google Gemini API key (get one at Google AI Studio)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details
File details
Details for the file adfmentor-0.3.1.tar.gz.
File metadata
- Download URL: adfmentor-0.3.1.tar.gz
- Upload date:
- Size: 19.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba79c31bdf3bfb99809fb01439bd0c581a56ed902ce9fce02567272782197dee |
| MD5 | f050c21fea05a2cc8024ac6d0a0264a0 |
| BLAKE2b-256 | 2698e2ce4fae220621f89ca8e7c4c1feff5282b186fe5f8b6e1b32b6a6ffd8bd |
File details
Details for the file adfmentor-0.3.1-py3-none-any.whl.
File metadata
- Download URL: adfmentor-0.3.1-py3-none-any.whl
- Upload date:
- Size: 17.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62790770e3e8727f923c3ef4b2131bb99ea7a2046afe9b5637f98614302cd0b2 |
| MD5 | 97f8f4ac73f6c68ca503ca0ae0512b70 |
| BLAKE2b-256 | 04511fb5021659632f483c8344f5ee5074f5a692c26c244676d97121fe7c6bc2 |