
ADFMentor

A Python package for parsing Azure Data Factory (ADF) project files and evaluating them using AI models like Google Gemini.

Features

  • ๐Ÿ—๏ธ ADF Processing: Parse ADF pipeline, dataset, linked service, trigger, and dataflow JSON files
  • Pipeline โ€” Parse ADF pipeline JSON files (activities, dependencies, parameters)
  • Written โ€” Evaluate text-based answers about ADF concepts
  • ๐Ÿ“ Detailed Reports: Generate comprehensive grading reports from ADF project structures
  • ๐Ÿ“ฆ ZIP Support: Automatically handles ZIP file submissions โ€” no manual extraction needed
  • ๐Ÿ—‚๏ธ Flexible Inputs: Accepts a directory, a ZIP, or a single file (.json, .txt)
  • ๐Ÿ” Auto File Discovery: Locates ADF resource folders (pipeline/, dataset/, linkedService/, etc.)
  • โš ๏ธ Graceful Missing-File Scoring: Missing required files yield a 0 score and clear feedback
  • ๐Ÿงฉ Lesson Question Parser: Parse structured lesson text into pipeline and text question blocks
  • ๐Ÿ”ง Easy Integration: Simple API for evaluating student assignments and projects

Installation

From Source

git clone https://github.com/yourusername/ADFMentor.git
cd ADFMentor
pip install -e .

Using pip (when published)

pip install ADFMentor

Quick Start

from ADFmentor import ADFMentor

# Initialize with your Gemini API key
mentor = ADFMentor(api_key="your-api-key")

# Evaluate a full submission (pipeline + written)
# Works with directories, ZIP files, or single files (.json/.txt)
questions = {
    "pipeline": "Create an ADF pipeline to copy data from Blob Storage to SQL Database",
    "text": "Explain your pipeline design choices",
}

prompts = {
    "pipeline": "Evaluate pipeline structure, activities, and best practices",
    "text": "Evaluate clarity and reasoning",
}

result = mentor.evaluate_all(
    answer_path="path/to/submission/",  # or "submission.zip"
    questions=questions,
    prompts=prompts,
)

print(f"Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")

Package Structure

ADFmentor/
├── __init__.py           # Main package entry point
├── core.py               # ADFMentor class with evaluation methods
├── models/               # AI model wrappers
│   ├── __init__.py
│   ├── model.py          # Abstract base model
│   └── gemini.py         # Google Gemini implementation
└── utils/                # Utility functions
    ├── __init__.py
    ├── processor.py      # ADF JSON parsing and report generation
    ├── checker.py        # File discovery helpers
    ├── extractor.py      # ZIP extraction utilities
    └── question_parser.py # Lesson question parser helpers

Core Components

ADFMentor Class

The main class provides a single evaluation method:

  • evaluate_all(answer_path, questions, prompts): Evaluates pipeline structure and written answers together and returns an overall score and combined feedback

Notes:

  • answer_path can be a directory, ZIP file, or a single submission file (.json, .txt).
  • questions and prompts must include pipeline and text keys. A section is skipped if its question is set to None.
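Under the hood, ZIP submissions are extracted before evaluation. The real logic lives in ADFmentor.utils.extractor; the sketch below is a hypothetical stand-in (resolve_submission is not part of the package API) showing the idea with the standard library:

```python
import tempfile
import zipfile
from pathlib import Path

def resolve_submission(answer_path: str) -> Path:
    """If answer_path points at a ZIP archive, extract it to a temporary
    directory and return that directory; otherwise return the path as-is.
    Hypothetical sketch of the extractor's behavior, not the actual code.
    """
    path = Path(answer_path)
    if path.is_file() and zipfile.is_zipfile(path):
        target = Path(tempfile.mkdtemp(prefix="adf_submission_"))
        with zipfile.ZipFile(path) as zf:
            zf.extractall(target)
        return target
    return path
```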

ADF Processor

ADFmentor.utils.processor provides functions for processing ADF projects:

parse_adf_json(json_path)

Reads and parses a single ADF resource JSON file.
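Conceptually this is a thin wrapper over the standard library. A minimal sketch (hypothetical reimplementation; the real function's error handling may differ), returning None for missing or malformed files so that scoring can degrade gracefully:

```python
import json
from pathlib import Path

def parse_adf_json(json_path):
    """Read one ADF resource file and return its parsed contents,
    or None if the file is missing or not valid JSON."""
    path = Path(json_path)
    if not path.is_file():
        return None  # missing files are scored gracefully upstream
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
```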

discover_adf_resources(directory)

Scans a directory for ADF resource folders:

  • pipeline/ - Pipeline definitions
  • dataset/ - Dataset definitions
  • linkedService/ - Linked service definitions
  • trigger/ - Trigger definitions
  • dataflow/ - Dataflow definitions
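A minimal sketch of this discovery step, assuming the folder names above (hypothetical reimplementation, not the package's actual code):

```python
from pathlib import Path

# Standard ADF resource folder names
ADF_FOLDERS = ("pipeline", "dataset", "linkedService", "trigger", "dataflow")

def discover_adf_resources(directory):
    """Map each known ADF folder to the JSON files it contains.
    Folders that are absent map to an empty list."""
    root = Path(directory)
    resources = {}
    for folder in ADF_FOLDERS:
        folder_path = root / folder
        resources[folder] = sorted(folder_path.glob("*.json")) if folder_path.is_dir() else []
    return resources
```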

extract_grading_info(resources)

Extracts key elements for grading:

  • Pipelines: activities, dependencies, parameters, variables
  • Datasets: type, linked service reference, schema, location
  • Linked Services: type, connection details (sanitized)
  • Triggers: type, schedule, pipeline references
  • Dataflows: sources, sinks, transformations
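For pipelines, this extraction step boils down to walking the standard ADF JSON schema. A hypothetical helper (summarize_pipeline is illustrative, not part of the package) might look like:

```python
def summarize_pipeline(pipeline):
    """Pull the grading-relevant parts out of one parsed pipeline dict.
    Field names ("properties", "activities", "dependsOn") follow the
    standard ADF resource schema; this helper itself is a sketch."""
    props = pipeline.get("properties", {})
    return {
        "name": pipeline.get("name"),
        "parameters": list(props.get("parameters", {})),
        "activities": [
            {
                "name": a.get("name"),
                "type": a.get("type"),
                "depends_on": [d["activity"] for d in a.get("dependsOn", [])],
            }
            for a in props.get("activities", [])
        ],
    }
```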

generate_grading_report(grading_info)

Formats extracted information into a readable text report.

analyze_adf(adf_path)

Convenience function that chains all steps above.

AI Models

Gemini Model

ADFmentor.models.gemini.Gemini

from ADFmentor.models import Gemini

model = Gemini(api_key="your-api-key", model_name="gemini-2.0-flash-exp")

# Evaluate text-based answers
result = model.evaluate(
    question="What are ADF linked services?",
    answer="Linked services are connection strings...",
    prompt="Evaluate for accuracy and completeness"
)

Response Format:

{
  "score": 85,
  "feedback": "Strong implementation with minor issues..."
}
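If you call the model wrapper yourself, it is worth parsing the reply defensively, since LLMs sometimes wrap JSON in a Markdown code fence. A sketch (parse_model_response is illustrative, not part of the package):

```python
import json

FENCE = "`" * 3  # built at runtime to avoid a literal fence in this example

def parse_model_response(raw):
    """Parse a model's JSON reply and clamp the score to 0-100.
    Assumes the body is a JSON object with "score" and "feedback"
    keys, possibly wrapped in a Markdown code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # drop the opening fence (with optional language tag) and closing fence
        text = text.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    data = json.loads(text)
    score = max(0, min(100, int(data.get("score", 0))))
    return {"score": score, "feedback": str(data.get("feedback", ""))}
```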

Configuration

API Key Setup

Create a .env file in your project root:

API_KEY=your_gemini_api_key_here

Load it in your code:

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("API_KEY")

Detailed Usage Examples

1. Analyze an ADF Project

from ADFmentor.utils import analyze_adf

# Generate a detailed report from an ADF project directory
report = analyze_adf("path/to/adf-project/")
print(report)

Sample Output:

============================================================
AZURE DATA FACTORY PROJECT REPORT
============================================================

PIPELINES:
  - CopyBlobToSQL
    Parameters: inputPath, outputTable
    Activities (3):
      • LookupSource (type: Lookup)
      • CopyData (type: Copy)
        depends on: LookupSource [Succeeded]
        source: BlobSource
        sink: SqlSink
      • StoredProcedure (type: SqlServerStoredProcedure)
        depends on: CopyData [Succeeded]

DATASETS:
  - BlobInput (type: DelimitedText)
    linked service: AzureBlobStorage
    location: type: AzureBlobStorageLocation, folder: input
  - SqlOutput (type: AzureSqlTable)
    linked service: AzureSqlDatabase
    table: dbo.SalesData

LINKED SERVICES:
  - AzureBlobStorage (type: AzureBlobStorage)
  - AzureSqlDatabase (type: AzureSqlDatabase)

TRIGGERS:
  - DailyTrigger (type: ScheduleTrigger)
    schedule: every 1 Day
    pipelines: CopyBlobToSQL

DATAFLOWS:
  none

SUMMARY:
  - total_pipelines: 1
  - total_activities: 3
  - total_datasets: 2
  - total_linked_services: 2
  - total_triggers: 1
  - total_dataflows: 0

2. Complete Evaluation Pipeline

from ADFmentor import ADFMentor

mentor = ADFMentor(api_key="your-api-key")

# Define questions and prompts for each evaluation type
questions = {
    "pipeline": "Create a pipeline to copy data from Blob to SQL with error handling",
    "text": "Explain your pipeline design choices"
}

prompts = {
    "pipeline": "Evaluate pipeline structure, activities, error handling, and best practices",
    "text": "Evaluate clarity, justification, and understanding"
}

# Evaluate all aspects
result = mentor.evaluate_all(
    answer_path="path/to/student/submission/",
    questions=questions,
    prompts=prompts
)

print(f"Overall Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")

3. Parse Lesson Questions

If your lesson content uses codes like "TEXT001", "PIPELINE002", you can parse it into question blocks:

from ADFmentor.utils.question_parser import parse_lesson_questions

lesson_text = """
"TEXT001"
1. Explain the purpose of linked services in ADF.

"PIPELINE002"
2. Create a pipeline to copy data from Blob Storage to SQL Database.
"""

questions = parse_lesson_questions(lesson_text)
# -> {"text": "1. ...", "pipeline": "2. ..."}

# Map to the evaluate_all() schema
questions = {
    "pipeline": questions["pipeline"],
    "text": questions["text"],
}
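For reference, a parser with this behavior can be sketched with a single regular expression (a hypothetical reimplementation; the package's actual parser may handle more formats):

```python
import re

def parse_lesson_questions(lesson_text):
    """Split lesson text into question blocks keyed by section type.
    Assumes each block starts with a quoted code such as "TEXT001" or
    "PIPELINE002" and runs until the next code or end of text."""
    questions = {}
    # Capture the code word, then everything up to the next code or end of text
    pattern = r'"(TEXT|PIPELINE)\d+"\s*\n(.*?)(?=\n"(?:TEXT|PIPELINE)\d+"|\Z)'
    for kind, body in re.findall(pattern, lesson_text, flags=re.DOTALL):
        questions[kind.lower()] = body.strip()
    return questions
```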

4. Skipping an Evaluation Section

To skip a section, set its question to None (the key must still exist):

questions = {
    "pipeline": "Create a copy pipeline with parameterized paths",
    "text": None,
}

prompts = {
    "pipeline": "Evaluate pipeline structure and best practices",
    "text": "Evaluate clarity, justification, and understanding",
}

ADF Project Structure

ADFMentor expects submissions to follow the standard ADF project structure:

adf-project/
├── pipeline/          # Pipeline JSON definitions
│   └── CopyPipeline.json
├── dataset/           # Dataset JSON definitions
│   ├── BlobInput.json
│   └── SqlOutput.json
├── linkedService/     # Linked service definitions
│   ├── BlobStorage.json
│   └── SqlDatabase.json
├── trigger/           # Trigger definitions
│   └── DailyTrigger.json
└── dataflow/          # Dataflow definitions (optional)

Each JSON file follows the standard ADF resource format with name, type, and properties fields.
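For example, a minimal pipeline definition in this format might look like the following (illustrative only; real exports from ADF Studio contain additional fields):

```
{
  "name": "CopyPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyData",
        "type": "Copy",
        "dependsOn": [],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ],
    "parameters": {
      "inputPath": { "type": "String" }
    }
  }
}
```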

Development

Running Tests

# Run integration tests
python tests/test.py

Project Dependencies

Core:

  • google-genai>=1.0.0 - Google Gemini API client
  • python-dotenv>=1.0.0 - Environment variable management

Optional:

  • google-cloud-aiplatform>=1.0.0 - For Vertex AI support


Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details
