Parse Azure Data Factory project files and evaluate them using AI models.

ADFMentor

A Python package for parsing Azure Data Factory (ADF) project files and evaluating them using AI models like Google Gemini.

Features

  • ๐Ÿ—๏ธ ADF Processing: Parse ADF pipeline, dataset, linked service, trigger, and dataflow JSON files
  • Pipeline โ€” Parse ADF pipeline JSON files (activities, dependencies, parameters)
  • Written โ€” Evaluate text-based answers about ADF concepts
  • ๐Ÿ“ Detailed Reports: Generate comprehensive grading reports from ADF project structures
  • ๐Ÿ“ฆ ZIP Support: Automatically handles ZIP file submissions โ€” no manual extraction needed
  • ๐Ÿ—‚๏ธ Flexible Inputs: Accepts a directory, a ZIP, or a single file (.json, .txt)
  • ๐Ÿ” Auto File Discovery: Locates ADF resource folders (pipeline/, dataset/, linkedService/, etc.)
  • โš ๏ธ Graceful Missing-File Scoring: Missing required files yield a 0 score and clear feedback
  • ๐Ÿงฉ Lesson Question Parser: Parse structured lesson text into pipeline and text question blocks
  • ๐Ÿ”ง Easy Integration: Simple API for evaluating student assignments and projects

Installation

From Source

git clone https://github.com/yourusername/ADFMentor.git
cd ADFMentor
pip install -e .

Using pip

pip install ADFMentor

Quick Start

from ADFmentor import ADFMentor

# Initialize with your Gemini API key
mentor = ADFMentor(api_key="your-api-key")

# Evaluate a full submission (pipeline + written)
# Works with directories, ZIP files, or single files (.json/.txt)
questions = {
    "pipeline": "Create an ADF pipeline to copy data from Blob Storage to SQL Database",
    "text": "Explain your pipeline design choices",
}

prompts = {
    "pipeline": "Evaluate pipeline structure, activities, and best practices",
    "text": "Evaluate clarity and reasoning",
}

result = mentor.evaluate_all(
    answer_path="path/to/submission/",  # or "submission.zip"
    questions=questions,
    prompts=prompts,
)

print(f"Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")

Package Structure

ADFmentor/
├── __init__.py           # Main package entry point
├── core.py               # ADFMentor class with evaluation methods
├── models/               # AI model wrappers
│   ├── __init__.py
│   ├── model.py          # Abstract base model
│   └── gemini.py         # Google Gemini implementation
└── utils/                # Utility functions
    ├── __init__.py
    ├── processor.py      # ADF JSON parsing and report generation
    ├── checker.py        # File discovery helpers
    ├── extractor.py      # ZIP extraction utilities
    └── question_parser.py # Lesson question parser helpers

Core Components

ADFMentor Class

The main class provides a single evaluation method:

  • evaluate_all(answer_path, questions, prompts): Evaluates pipeline structure and written answers together and returns an overall score and combined feedback

Notes:

  • answer_path can be a directory, ZIP file, or a single submission file (.json, .txt).
  • questions and prompts must include pipeline and text keys. A section is skipped if its question is set to None.

ADF Processor

ADFmentor.utils.processor provides functions for processing ADF projects:

parse_adf_json(json_path)

Reads and parses a single ADF resource JSON file.

discover_adf_resources(directory)

Scans a directory for ADF resource folders:

  • pipeline/ - Pipeline definitions
  • dataset/ - Dataset definitions
  • linkedService/ - Linked service definitions
  • trigger/ - Trigger definitions
  • dataflow/ - Dataflow definitions

extract_grading_info(resources)

Extracts key elements for grading:

  • Pipelines: activities, dependencies, parameters, variables
  • Datasets: type, linked service reference, schema, location
  • Linked Services: type, connection details (sanitized)
  • Triggers: type, schedule, pipeline references
  • Dataflows: sources, sinks, transformations

generate_grading_report(grading_info)

Formats extracted information into a readable text report.

analyze_adf(adf_path)

Convenience function that chains all steps above.
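
If you need the intermediate results rather than the final report, the same steps can be chained by hand. A minimal sketch, assuming the helpers are importable from ADFmentor.utils.processor as documented above (exact return shapes may differ):

from ADFmentor.utils.processor import (
    discover_adf_resources,
    extract_grading_info,
    generate_grading_report,
)

# Scan the submission for pipeline/, dataset/, linkedService/, trigger/, dataflow/
resources = discover_adf_resources("path/to/adf-project/")

# Pull out the gradable details (activities, dependencies, parameters, ...)
grading_info = extract_grading_info(resources)

# Produce the same text report that analyze_adf() builds in one call
report = generate_grading_report(grading_info)
print(report)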

AI Models

Gemini Model

ADFmentor.models.gemini.Gemini

from ADFmentor.models import Gemini

model = Gemini(api_key="your-api-key", model_name="gemini-2.0-flash-exp")

# Evaluate text-based answers
result = model.evaluate(
    question="What are ADF linked services?",
    answer="Linked services are connection strings...",
    prompt="Evaluate for accuracy and completeness"
)

Response Format:

{
  "score": 85,
  "feedback": "Strong implementation with minor issues..."
}
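
The returned dictionary can be used directly, for example:

print(f"Score: {result['score']}")
print(result["feedback"])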

Configuration

API Key Setup

Create a .env file in your project root:

API_KEY=your_gemini_api_key_here

Load it in your code:

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("API_KEY")
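
Then pass the loaded key to the client exactly as in Quick Start:

from ADFmentor import ADFMentor

mentor = ADFMentor(api_key=api_key)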

Detailed Usage Examples

1. Analyze an ADF Project

from ADFmentor.utils import analyze_adf

# Generate a detailed report from an ADF project directory
report = analyze_adf("path/to/adf-project/")
print(report)

Sample Output:

============================================================
AZURE DATA FACTORY PROJECT REPORT
============================================================

PIPELINES:
  - CopyBlobToSQL
    Parameters: inputPath, outputTable
    Activities (3):
      • LookupSource (type: Lookup)
      • CopyData (type: Copy)
        depends on: LookupSource [Succeeded]
        source: BlobSource
        sink: SqlSink
      • StoredProcedure (type: SqlServerStoredProcedure)
        depends on: CopyData [Succeeded]

DATASETS:
  - BlobInput (type: DelimitedText)
    linked service: AzureBlobStorage
    location: type: AzureBlobStorageLocation, folder: input
  - SqlOutput (type: AzureSqlTable)
    linked service: AzureSqlDatabase
    table: dbo.SalesData

LINKED SERVICES:
  - AzureBlobStorage (type: AzureBlobStorage)
  - AzureSqlDatabase (type: AzureSqlDatabase)

TRIGGERS:
  - DailyTrigger (type: ScheduleTrigger)
    schedule: every 1 Day
    pipelines: CopyBlobToSQL

DATAFLOWS:
  none

SUMMARY:
  - total_pipelines: 1
  - total_activities: 3
  - total_datasets: 2
  - total_linked_services: 2
  - total_triggers: 1
  - total_dataflows: 0

2. Complete Evaluation Pipeline

from ADFmentor import ADFMentor

mentor = ADFMentor(api_key="your-api-key")

# Define questions and prompts for each evaluation type
questions = {
    "pipeline": "Create a pipeline to copy data from Blob to SQL with error handling",
    "text": "Explain your pipeline design choices"
}

prompts = {
    "pipeline": "Evaluate pipeline structure, activities, error handling, and best practices",
    "text": "Evaluate clarity, justification, and understanding"
}

# Evaluate all aspects
result = mentor.evaluate_all(
    answer_path="path/to/student/submission/",
    questions=questions,
    prompts=prompts
)

print(f"Overall Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")

3. Parse Lesson Questions

If your lesson content uses codes like "TEXT001", "PIPELINE002", you can parse it into question blocks:

from ADFmentor.utils.question_parser import parse_lesson_questions

lesson_text = """
"TEXT001"
1. Explain the purpose of linked services in ADF.

"PIPELINE002"
2. Create a pipeline to copy data from Blob Storage to SQL Database.
"""

questions = parse_lesson_questions(lesson_text)
# -> {"text": "1. ...", "pipeline": "2. ..."}

# Map to the evaluate_all() schema
questions = {
    "pipeline": questions["pipeline"],
    "text": questions["text"],
}

4. Skipping an Evaluation Section

To skip a section, set its question to None (the key must still exist):

questions = {
    "pipeline": "Create a copy pipeline with parameterized paths",
    "text": None,
}

prompts = {
    "pipeline": "Evaluate pipeline structure and best practices",
    "text": "Evaluate clarity, justification, and understanding",
}
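
With "text" set to None, only the pipeline section is evaluated; the call itself is unchanged (reusing the placeholder submission path from the examples above):

result = mentor.evaluate_all(
    answer_path="path/to/student/submission/",
    questions=questions,
    prompts=prompts,
)
print(f"Score: {result['score']}/100")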

ADF Project Structure

ADFMentor expects submissions to follow the standard ADF project structure:

adf-project/
├── pipeline/          # Pipeline JSON definitions
│   └── CopyPipeline.json
├── dataset/           # Dataset JSON definitions
│   ├── BlobInput.json
│   └── SqlOutput.json
├── linkedService/     # Linked service definitions
│   ├── BlobStorage.json
│   └── SqlDatabase.json
├── trigger/           # Trigger definitions
│   └── DailyTrigger.json
└── dataflow/          # Dataflow definitions (optional)

Each JSON file follows the standard ADF resource format with name, type, and properties fields.
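
For illustration only, the dict below mirrors what a minimal pipeline/CopyPipeline.json might contain (shown as a Python literal; the activity and dataset names are invented for the example):

# Hypothetical contents of pipeline/CopyPipeline.json
copy_pipeline = {
    "name": "CopyPipeline",
    "type": "Microsoft.DataFactory/factories/pipelines",
    "properties": {
        "activities": [
            {
                "name": "CopyData",
                "type": "Copy",
                "inputs": [{"referenceName": "BlobInput", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SqlOutput", "type": "DatasetReference"}],
            }
        ],
        "parameters": {},
    },
}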

Development

Running Tests

# Run integration tests
python tests/test.py

Project Dependencies

Core:

  • google-genai>=1.0.0 - Google Gemini API client
  • python-dotenv>=1.0.0 - Environment variable management

Optional:

  • google-cloud-aiplatform>=1.0.0 - For Vertex AI support

Requirements

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details
