
ADFMentor

A Python package for parsing Azure Data Factory (ADF) project files and evaluating them using AI models like Google Gemini.

Features

  • ๐Ÿ—๏ธ ADF Processing: Parse ADF pipeline, dataset, linked service, trigger, and dataflow JSON files
  • Pipeline โ€” Parse ADF pipeline JSON files (activities, dependencies, parameters)
  • Written โ€” Evaluate text-based answers about ADF concepts
  • ๐Ÿ“ Detailed Reports: Generate comprehensive grading reports from ADF project structures
  • ๐Ÿ“ฆ ZIP Support: Automatically handles ZIP file submissions โ€” no manual extraction needed
  • ๐Ÿ—‚๏ธ Flexible Inputs: Accepts a directory, a ZIP, or a single file (.json, .txt)
  • ๐Ÿ” Auto File Discovery: Locates ADF resource folders (pipeline/, dataset/, linkedService/, etc.)
  • โš ๏ธ Graceful Missing-File Scoring: Missing required files yield a 0 score and clear feedback
  • ๐Ÿงฉ Lesson Question Parser: Parse structured lesson text into pipeline and text question blocks
  • ๐Ÿ”ง Easy Integration: Simple API for evaluating student assignments and projects

Installation

From Source

git clone https://github.com/yourusername/ADFMentor.git
cd ADFMentor
pip install -e .

Using pip (when published)

pip install ADFMentor

Quick Start

from ADFmentor import ADFMentor

# Initialize with your Gemini API key
mentor = ADFMentor(api_key="your-api-key")

# Evaluate a full submission (pipeline + written)
# Works with directories, ZIP files, or single files (.json/.txt)
questions = {
    "pipeline": "Create an ADF pipeline to copy data from Blob Storage to SQL Database",
    "text": "Explain your pipeline design choices",
}

prompts = {
    "pipeline": "Evaluate pipeline structure, activities, and best practices",
    "text": "Evaluate clarity and reasoning",
}

result = mentor.evaluate_all(
    answer_path="path/to/submission/",  # or "submission.zip"
    questions=questions,
    prompts=prompts,
)

print(f"Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")

Package Structure

ADFmentor/
├── __init__.py           # Main package entry point
├── core.py               # ADFMentor class with evaluation methods
├── models/               # AI model wrappers
│   ├── __init__.py
│   ├── model.py          # Abstract base model
│   └── gemini.py         # Google Gemini implementation
└── utils/                # Utility functions
    ├── __init__.py
    ├── processor.py      # ADF JSON parsing and report generation
    ├── checker.py        # File discovery helpers
    ├── extractor.py      # ZIP extraction utilities
    └── question_parser.py # Lesson question parser helpers

Core Components

ADFMentor Class

The main class provides a single evaluation method:

  • evaluate_all(answer_path, questions, prompts): Evaluates pipeline structure and written answers together and returns an overall score and combined feedback

Notes:

  • answer_path can be a directory, ZIP file, or a single submission file (.json, .txt).
  • questions and prompts must include pipeline and text keys. A section is skipped if its question is set to None.
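Under the hood, ZIP submissions are extracted before evaluation. The real logic lives in ADFmentor.utils.extractor; the sketch below is a hypothetical stand-in (resolve_submission is not part of the package API) showing the idea with the standard library:

```python
import tempfile
import zipfile
from pathlib import Path

def resolve_submission(answer_path: str) -> Path:
    """If answer_path points at a ZIP archive, extract it to a temporary
    directory and return that directory; otherwise return the path as-is.
    Hypothetical sketch of the extractor's behavior, not the actual code.
    """
    path = Path(answer_path)
    if path.is_file() and zipfile.is_zipfile(path):
        target = Path(tempfile.mkdtemp(prefix="adf_submission_"))
        with zipfile.ZipFile(path) as zf:
            zf.extractall(target)
        return target
    return path
```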

ADF Processor

ADFmentor.utils.processor provides functions for processing ADF projects:

parse_adf_json(json_path)

Reads and parses a single ADF resource JSON file.
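Conceptually this is a thin wrapper over the standard library. A minimal sketch (hypothetical reimplementation; the real function's error handling may differ), returning None for missing or malformed files so that scoring can degrade gracefully:

```python
import json
from pathlib import Path

def parse_adf_json(json_path):
    """Read one ADF resource file and return its parsed contents,
    or None if the file is missing or not valid JSON."""
    path = Path(json_path)
    if not path.is_file():
        return None  # missing files are scored gracefully upstream
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
```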

discover_adf_resources(directory)

Scans a directory for ADF resource folders:

  • pipeline/ - Pipeline definitions
  • dataset/ - Dataset definitions
  • linkedService/ - Linked service definitions
  • trigger/ - Trigger definitions
  • dataflow/ - Dataflow definitions
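A minimal sketch of this discovery step, assuming the folder names above (hypothetical reimplementation, not the package's actual code):

```python
from pathlib import Path

# Standard ADF resource folder names
ADF_FOLDERS = ("pipeline", "dataset", "linkedService", "trigger", "dataflow")

def discover_adf_resources(directory):
    """Map each known ADF folder to the JSON files it contains.
    Folders that are absent map to an empty list."""
    root = Path(directory)
    resources = {}
    for folder in ADF_FOLDERS:
        folder_path = root / folder
        resources[folder] = sorted(folder_path.glob("*.json")) if folder_path.is_dir() else []
    return resources
```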

extract_grading_info(resources)

Extracts key elements for grading:

  • Pipelines: activities, dependencies, parameters, variables
  • Datasets: type, linked service reference, schema, location
  • Linked Services: type, connection details (sanitized)
  • Triggers: type, schedule, pipeline references
  • Dataflows: sources, sinks, transformations
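For pipelines, this extraction step boils down to walking the standard ADF JSON schema. A hypothetical helper (summarize_pipeline is illustrative, not part of the package) might look like:

```python
def summarize_pipeline(pipeline):
    """Pull the grading-relevant parts out of one parsed pipeline dict.
    Field names ("properties", "activities", "dependsOn") follow the
    standard ADF resource schema; this helper itself is a sketch."""
    props = pipeline.get("properties", {})
    return {
        "name": pipeline.get("name"),
        "parameters": list(props.get("parameters", {})),
        "activities": [
            {
                "name": a.get("name"),
                "type": a.get("type"),
                "depends_on": [d["activity"] for d in a.get("dependsOn", [])],
            }
            for a in props.get("activities", [])
        ],
    }
```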

generate_grading_report(grading_info)

Formats extracted information into a readable text report.

analyze_adf(adf_path)

Convenience function that chains all steps above.

AI Models

Gemini Model

ADFmentor.models.gemini.Gemini

from ADFmentor.models import Gemini

model = Gemini(api_key="your-api-key", model_name="gemini-2.0-flash-exp")

# Evaluate text-based answers
result = model.evaluate(
    question="What are ADF linked services?",
    answer="Linked services are connection strings...",
    prompt="Evaluate for accuracy and completeness"
)

Response Format:

{
  "score": 85,
  "feedback": "Strong implementation with minor issues..."
}
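If you call the model wrapper yourself, it is worth parsing the reply defensively, since LLMs sometimes wrap JSON in a Markdown code fence. A sketch (parse_model_response is illustrative, not part of the package):

```python
import json

FENCE = "`" * 3  # built at runtime to avoid a literal fence in this example

def parse_model_response(raw):
    """Parse a model's JSON reply and clamp the score to 0-100.
    Assumes the body is a JSON object with "score" and "feedback"
    keys, possibly wrapped in a Markdown code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # drop the opening fence (with optional language tag) and closing fence
        text = text.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    data = json.loads(text)
    score = max(0, min(100, int(data.get("score", 0))))
    return {"score": score, "feedback": str(data.get("feedback", ""))}
```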

Configuration

API Key Setup

Create a .env file in your project root:

API_KEY=your_gemini_api_key_here

Load it in your code:

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("API_KEY")

Detailed Usage Examples

1. Analyze an ADF Project

from ADFmentor.utils import analyze_adf

# Generate a detailed report from an ADF project directory
report = analyze_adf("path/to/adf-project/")
print(report)

Sample Output:

============================================================
AZURE DATA FACTORY PROJECT REPORT
============================================================

PIPELINES:
  - CopyBlobToSQL
    Parameters: inputPath, outputTable
    Activities (3):
      • LookupSource (type: Lookup)
      • CopyData (type: Copy)
        depends on: LookupSource [Succeeded]
        source: BlobSource
        sink: SqlSink
      • StoredProcedure (type: SqlServerStoredProcedure)
        depends on: CopyData [Succeeded]

DATASETS:
  - BlobInput (type: DelimitedText)
    linked service: AzureBlobStorage
    location: type: AzureBlobStorageLocation, folder: input
  - SqlOutput (type: AzureSqlTable)
    linked service: AzureSqlDatabase
    table: dbo.SalesData

LINKED SERVICES:
  - AzureBlobStorage (type: AzureBlobStorage)
  - AzureSqlDatabase (type: AzureSqlDatabase)

TRIGGERS:
  - DailyTrigger (type: ScheduleTrigger)
    schedule: every 1 Day
    pipelines: CopyBlobToSQL

DATAFLOWS:
  none

SUMMARY:
  - total_pipelines: 1
  - total_activities: 3
  - total_datasets: 2
  - total_linked_services: 2
  - total_triggers: 1
  - total_dataflows: 0

2. Complete Evaluation Pipeline

from ADFmentor import ADFMentor

mentor = ADFMentor(api_key="your-api-key")

# Define questions and prompts for each evaluation type
questions = {
    "pipeline": "Create a pipeline to copy data from Blob to SQL with error handling",
    "text": "Explain your pipeline design choices"
}

prompts = {
    "pipeline": "Evaluate pipeline structure, activities, error handling, and best practices",
    "text": "Evaluate clarity, justification, and understanding"
}

# Evaluate all aspects
result = mentor.evaluate_all(
    answer_path="path/to/student/submission/",
    questions=questions,
    prompts=prompts
)

print(f"Overall Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")

3. Parse Lesson Questions

If your lesson content uses codes like "TEXT001", "PIPELINE002", you can parse it into question blocks:

from ADFmentor.utils.question_parser import parse_lesson_questions

lesson_text = """
"TEXT001"
1. Explain the purpose of linked services in ADF.

"PIPELINE002"
2. Create a pipeline to copy data from Blob Storage to SQL Database.
"""

questions = parse_lesson_questions(lesson_text)
# -> {"text": "1. ...", "pipeline": "2. ..."}

# Map to the evaluate_all() schema
questions = {
    "pipeline": questions["pipeline"],
    "text": questions["text"],
}
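For reference, a parser with this behavior can be sketched with a single regular expression (a hypothetical reimplementation; the package's actual parser may handle more formats):

```python
import re

def parse_lesson_questions(lesson_text):
    """Split lesson text into question blocks keyed by section type.
    Assumes each block starts with a quoted code such as "TEXT001" or
    "PIPELINE002" and runs until the next code or end of text."""
    questions = {}
    # Capture the code word, then everything up to the next code or end of text
    pattern = r'"(TEXT|PIPELINE)\d+"\s*\n(.*?)(?=\n"(?:TEXT|PIPELINE)\d+"|\Z)'
    for kind, body in re.findall(pattern, lesson_text, flags=re.DOTALL):
        questions[kind.lower()] = body.strip()
    return questions
```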

4. Skipping an Evaluation Section

To skip a section, set its question to None (the key must still exist):

questions = {
    "pipeline": "Create a copy pipeline with parameterized paths",
    "text": None,
}

prompts = {
    "pipeline": "Evaluate pipeline structure and best practices",
    "text": "Evaluate clarity, justification, and understanding",
}

ADF Project Structure

ADFMentor expects submissions to follow the standard ADF project structure:

adf-project/
├── pipeline/          # Pipeline JSON definitions
│   └── CopyPipeline.json
├── dataset/           # Dataset JSON definitions
│   ├── BlobInput.json
│   └── SqlOutput.json
├── linkedService/     # Linked service definitions
│   ├── BlobStorage.json
│   └── SqlDatabase.json
├── trigger/           # Trigger definitions
│   └── DailyTrigger.json
└── dataflow/          # Dataflow definitions (optional)

Each JSON file follows the standard ADF resource format with name, type, and properties fields.
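For example, a minimal pipeline definition in this format might look like the following (illustrative only; real exports from ADF Studio contain additional fields):

```
{
  "name": "CopyPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyData",
        "type": "Copy",
        "dependsOn": [],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ],
    "parameters": {
      "inputPath": { "type": "String" }
    }
  }
}
```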

Development

Running Tests

# Run integration tests
python tests/test.py

Project Dependencies

Core:

  • google-genai>=1.0.0 - Google Gemini API client
  • python-dotenv>=1.0.0 - Environment variable management

Optional:

  • google-cloud-aiplatform>=1.0.0 - For Vertex AI support


Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details
