Amazon Nova Forge SDK
A comprehensive Python SDK for fine-tuning and customizing Amazon Nova models. This SDK provides a unified interface for training, evaluation, deployment, and monitoring of Nova models across both SageMaker Training Jobs and SageMaker HyperPod.
Table of Contents
- Installation
- Setup
- Supported Models and Training Methods
- Data Preparation
- Core Modules Overview
- Additional Features
- Telemetry
- Getting Started
- Security Best Practices for SDK Users
Installation
pip install amzn-nova-forge
- The SDK depends on sagemaker, which pip installs automatically.
Setup
In most cases, the SDK will inform you if the environment lacks the required setup to run a Nova customization job.
Below are some common requirements which you can set up in advance before trying to run a job.
Supported Python Versions
Nova Forge SDK is tested on:
- Python 3.12
IAM Roles/Policies
- You will need an IAM role with sufficient permissions in order to use the Nova Forge SDK. You can find a list of these permissions in the docs/iam_setup.md file.
Instances
Nova customization jobs also require access to enough instances of the right type:
- The requested instance type and count should be compatible with the requested job. The SDK will validate your instance configuration for you.
- The SageMaker account quotas for using the requested instance type in training jobs (for SMTJ) or HyperPod clusters (for SMHP) should allow the requested number of instances.
- (For SMHP) The selected HyperPod cluster should have a Restricted Instance Group with enough instances of the right type to run the requested job. The SDK will validate that your cluster contains a valid instance group.
- See the docs/instance_type_spec.md file for the supported instance types and combinations for specific jobs and methods.
HyperPod CLI
For HyperPod-based customization jobs, the SDK uses the SageMaker HyperPod CLI to connect to HyperPod Clusters and start jobs.
Prerequisites (required for both Forge and Non-Forge customers)
- Install Helm 3. Verify with helm version. If not installed:

    curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
    chmod 700 get_helm.sh
    ./get_helm.sh
    rm -f ./get_helm.sh

- If you are using a Python virtual environment, activate it before installing the CLI:

    source <path to venv>/bin/activate
For Non-Forge Customers
- Clone the release_v2 branch of the HyperPod CLI:

    git clone -b release_v2 https://github.com/aws/sagemaker-hyperpod-cli.git

- Install the CLI:

    cd sagemaker-hyperpod-cli
    pip install .

- Verify the installation:

    hyperpod --help
For Forge Customers
- Download the latest HyperPod CLI repo with Forge feature support from S3 and install it:

    aws s3 cp s3://nova-forge-c7363-206080352451-us-east-1/v1/ ./ --recursive
    mkdir -p src/hyperpod_cli/sagemaker_hyperpod_recipes/launcher/nemo
    git clone https://github.com/NVIDIA/NeMo-Framework-Launcher.git src/hyperpod_cli/sagemaker_hyperpod_recipes/launcher/nemo/nemo_framework_launcher --recursive
    pip install -e .

- Verify the installation:

    hyperpod --help
Supported Models and Training Methods
Models
| Model | Version | Model Type | Context Length |
|---|---|---|---|
| NOVA_MICRO | 1.0 | amazon.nova-micro-v1:0:128k | 128k tokens |
| NOVA_LITE | 1.0 | amazon.nova-lite-v1:0:300k | 300k tokens |
| NOVA_LITE_2 | 2.0 | amazon.nova-2-lite-v1:0:256k | 256k tokens |
| NOVA_PRO | 1.0 | amazon.nova-pro-v1:0:300k | 300k tokens |
Training Methods
| Method | Description | Supported Models |
|---|---|---|
| CPT | Continued Pre-Training | All models (SMHP only) |
| DPO_LORA | Direct Preference Optimization with LoRA | Nova 1.0 models |
| DPO_FULL | Full-rank Direct Preference Optimization | Nova 1.0 models |
| SFT_LORA | Supervised Fine-tuning with LoRA | All models |
| SFT_FULL | Full-rank Supervised Fine-tuning | All models |
| RFT_LORA | Reinforcement Fine-tuning with LoRA | Nova 2.0 models |
| RFT_FULL | Full-rank Reinforcement Fine-tuning | Nova 2.0 models |
| RFT_MULTITURN_LORA | Multi-turn RFT with LoRA | Nova 2.0 models |
| RFT_MULTITURN_FULL | Full-rank Multi-turn RFT | Nova 2.0 models |
| EVALUATION | Model evaluation | All models |
Platform Support
| Platform | Description | Models Supported | Supported Methods |
|---|---|---|---|
| SMTJ | SageMaker Training Jobs | All models | All methods |
| SMTJServerless | SageMaker Serverless Training (no infrastructure to manage) | All models | SFT, DPO, RFT_LORA, EVALUATION |
| SMHP | SageMaker HyperPod | All models | All methods |
| BEDROCK | Amazon Bedrock (managed service) | All models | SFT, DPO, RFT |
Data Preparation
Before launching a training job, your data needs to be in the right format. The SDK's dataset module handles loading, transforming, validating, filtering, and saving training data — supporting JSONL, JSON, CSV, Parquet, and Arrow formats from local files or S3.
A typical data preparation workflow:
from amzn_nova_forge.dataset import JSONLDatasetLoader
from amzn_nova_forge.model import TrainingMethod, Model, TransformMethod, ValidateMethod
loader = JSONLDatasetLoader()
loader.load("s3://my-bucket/raw-data.jsonl")
loader.transform(method=TransformMethod.SCHEMA, training_method=TrainingMethod.SFT_LORA, model=Model.NOVA_LITE_2)
loader.validate(method=ValidateMethod.INVALID_RECORDS, training_method=TrainingMethod.SFT_LORA, model=Model.NOVA_LITE_2)
loader.save("s3://my-bucket/prepared-data.jsonl")
For the complete guide — including column mappings, dataset splitting, filtering, chaining operations, and end-to-end examples — see Data Preparation Guide.
For a hands-on notebook walkthrough, see samples/dataprep_quickstart.ipynb.
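The SDK's validate() call is the authoritative check, but a quick local sanity pass can catch malformed records before any upload. The sketch below uses only the Python standard library; the required record keys ("input" and "output") are illustrative assumptions, so adjust them to the schema your training method actually expects:

```python
import json

def precheck_jsonl(lines, required_keys=("input", "output")):
    """Return (valid_count, errors) for an iterable of JSONL lines."""
    valid, errors = 0, []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {i}: not valid JSON ({exc.msg})")
            continue
        missing = [k for k in required_keys if k not in record]
        if missing:
            errors.append(f"line {i}: missing keys {missing}")
        else:
            valid += 1
    return valid, errors

sample = [
    '{"input": "What is S3?", "output": "An object storage service."}',
    '{"input": "No answer here"}',
    'not json at all',
]
valid, errors = precheck_jsonl(sample)
print(valid, len(errors))  # 1 valid record, 2 problems reported
```

Running this over a local copy of your dataset before calling loader.load() gives fast feedback without any AWS round trips.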
Core Modules Overview
The Nova Forge SDK is organized into the following modules:
| Module | Purpose | Key Components |
|---|---|---|
| Dataset | Data loading, transformation, filtering, and preparation | JSONLDatasetLoader, JSONDatasetLoader, CSVDatasetLoader |
| Manager | Runtime infrastructure management | SMTJRuntimeManager, SMTJServerlessRuntimeManager, SMHPRuntimeManager, BedrockRuntimeManager |
| Model | Main SDK entrypoint and orchestration | NovaModelCustomizer |
| Monitor | Job monitoring and logging | CloudWatchLogMonitor, MLflowMonitor |
| RFT Multiturn | Reinforcement fine-tuning infrastructure | RFTMultiturnInfrastructure |
- For detailed API documentation: See docs/spec.md
- For usage examples: See samples/nova_quickstart.ipynb
- For RFT Singleturn examples: See samples/rft_singleturn_quickstart.ipynb
- For RFT Multiturn documentation: See docs/rft_multiturn.md
- For RFT Multiturn examples: See samples/rft_multiturn_quickstart.ipynb
Dataset Module
Handles data loading, transformation, validation, filtering, and persistence for training datasets. Supports JSONL, JSON, CSV, Parquet, and Arrow formats from local files or S3.
See the Data Preparation section above for usage overview, or the full Data Preparation Guide for detailed documentation.
Manager Module
Manages runtime infrastructure for executing training and evaluation jobs.
For the allowed instance types for each model/method combination, see docs/instance_type_spec.md.
Main Methods:
- execute() - Start a training or evaluation job
- cleanup() - Stop and clean up a running job
- scale_cluster() - (SMHP only) Scale HyperPod cluster instance groups up or down
- get_instance_groups() - (SMHP only) View instance groups and current instance counts
Key Classes:
- SMTJRuntimeManager - For SageMaker Training Jobs
- SMTJServerlessRuntimeManager - For SageMaker Serverless Training
- SMHPRuntimeManager - For SageMaker HyperPod clusters
- BedrockRuntimeManager - For the Amazon Bedrock managed service
Cluster Scaling (SMHP):
The SMHPRuntimeManager provides a scale_cluster() method to dynamically adjust the number of instances in a HyperPod cluster instance group:
from amzn_nova_forge.manager import SMHPRuntimeManager
# Create a runtime manager for your cluster
manager = SMHPRuntimeManager(
instance_type="ml.p4d.24xlarge",
instance_count=4,
cluster_name="my-hyperpod-cluster",
namespace="default"
)
# View the available instance groups to update
available_instance_groups = manager.get_instance_groups()
# Scale up the worker group from 4 to 8 instances
result = manager.scale_cluster(
instance_group_name="worker-group",
target_instance_count=8
)
For more cluster scaling documentation, see docs/spec.md.
Model Module
Provides the main SDK entrypoint for orchestrating model customization workflows.
Main Methods:
- train() - Launch a training job
- evaluate() - Launch an evaluation job
- deploy() - Deploy a trained model to Amazon SageMaker or Bedrock
- batch_inference() - Run batch inference on a trained model
- get_logs() - Retrieve CloudWatch logs for the current job
- get_data_mixing_config() - Get the data mixing configuration
- set_data_mixing_config() - Set the data mixing configuration
Key Class:
- NovaModelCustomizer - Main orchestration class
Monitor Module
Provides job monitoring and experiment tracking capabilities.
Main Methods:
- show_logs() - Display CloudWatch logs
- get_logs() - Retrieve logs as a list
- from_job_result() - Create a monitor from a job result
- from_job_id() - Create a monitor from a job ID
Key Classes:
- CloudWatchLogMonitor - For viewing job logs
- MLflowMonitor - For experiment tracking with presigned URL generation
RFT Multiturn Module
Manages infrastructure for reinforcement fine-tuning with multi-turn conversational tasks.
Main Methods:
- setup() - Deploy the SAM stack and validate the platform
- start_training_environment() - Start a training environment
- start_evaluation_environment() - Start an evaluation environment
- get_logs() - Retrieve environment logs
- kill_task() - Stop a running task
- cleanup() - Clean up infrastructure resources
- check_all_queues() - Check message counts in all queues
- flush_all_queues() - Purge all messages from the queues
Key Classes:
- RFTMultiturnInfrastructure - Main infrastructure management class
- CustomEnvironment - For creating custom reward environments
Supported Platforms:
- LOCAL - Local development environment
- EC2 - Amazon EC2 instances
- ECS - Amazon ECS clusters
Built-in Environments:
- VFEnvId.WORDLE - Wordle game environment
- VFEnvId.TERMINAL_BENCH - Terminal benchmark environment
Iterative Training
The Nova Forge SDK supports iterative fine-tuning of Nova models.
This is done by progressively running fine-tuning jobs on the output checkpoint from the previous job:
# Stage 1: Initial training on base model
stage1_customizer = NovaModelCustomizer(
model=Model.NOVA_LITE,
method=TrainingMethod.SFT_LORA,
infra=infra,
data_s3_path="s3://bucket/stage1-data.jsonl",
output_s3_path="s3://bucket/stage1-output"
)
stage1_result = stage1_customizer.train(job_name="stage1-training")
# Wait for completion...
stage1_checkpoint = stage1_result.model_artifacts.checkpoint_s3_path
# Stage 2: Continue training from Stage 1 checkpoint
stage2_customizer = NovaModelCustomizer(
model=Model.NOVA_LITE,
method=TrainingMethod.SFT_LORA,
infra=infra,
data_s3_path="s3://bucket/stage2-data.jsonl",
output_s3_path="s3://bucket/stage2-output",
model_path=stage1_checkpoint # Use previous checkpoint
)
stage2_result = stage2_customizer.train(job_name="stage2-training")
Note: Iterative fine-tuning requires using the same model and training method (LoRA vs Full-Rank) across all stages.
Dry Run
The Nova Forge SDK supports dry_run mode for the following functions: train(), evaluate(), and batch_inference().
When calling any of the above functions, you can set the dry_run parameter to True.
The SDK will still generate your recipe and validate your inputs, but it won't start a job. This is useful when you want to test a configuration and inspect the generated recipe without committing compute resources.
# Training dry run
customizer.train(
job_name="train_dry_run",
dry_run=True,
...
)
# Evaluation dry run
customizer.evaluate(
job_name="evaluate_dry_run",
dry_run=True,
...
)
Data Mixing
Data mixing allows you to blend your custom training data with Nova's high-quality curated datasets, helping maintain the model's broad capabilities while adding your domain-specific knowledge.
Key Features:
- Available for CPT and SFT training for Nova 1 and Nova 2 (both LoRA and Full-Rank) on SageMaker HyperPod
- Mix customer data (0-100%) with Nova's curated data
- Nova data categories include general knowledge and code
- Nova data percentages must sum to 100%
Example Usage:
# Initialize with data mixing enabled
customizer = NovaModelCustomizer(
model=Model.NOVA_LITE_2,
method=TrainingMethod.SFT_LORA,
infra=SMHPRuntimeManager(...), # Must use HyperPod
data_s3_path="s3://bucket/data.jsonl",
output_s3_path="s3://bucket/output", # Optional
data_mixing_enabled=True
)
# Configure data mixing percentages
customizer.set_data_mixing_config({
"customer_data_percent": 50, # 50% your data
"nova_code_percent": 30, # 30% Nova code data (30% of Nova's 50%)
"nova_general_percent": 70 # 70% Nova general data (70% of Nova's 50%)
})
# Or use 100% customer data (no Nova mixing)
customizer.set_data_mixing_config({
"customer_data_percent": 100,
"nova_code_percent": 0,
"nova_general_percent": 0
})
Important Notes:
- The dataset_catalog field is system-managed and cannot be set by users
- Data mixing is only available on the SageMaker HyperPod platform for Forge customers
- Refer to the Get Forge Subscription page to enable a Nova subscription in your account before using this feature
Job Notifications
Get email notifications when your training jobs complete, fail, or are stopped. The SDK automatically sets up the required AWS infrastructure (CloudFormation, DynamoDB, SNS, Lambda, EventBridge) to monitor job status and send notifications.
Features:
- Automatic AWS infrastructure setup and management
- Email notifications for terminal job states (Completed, Failed, Stopped)
- Email notifications for SMHP master pods that are stuck in a crash loop
- Output artifact validation for successful jobs (manifest.json)
- Optional customer key KMS encryption for SNS topics
Platform Support:
- SMTJ (SageMaker Training Jobs): Minimal configuration required
- SMTJServerless (SageMaker Serverless): No instance type needed — SageMaker manages compute automatically
- SMHP (SageMaker HyperPod): Requires a kubectl Lambda layer and additional parameters (see docs/spec.md for more details)
- Bedrock (Amazon Bedrock): Fully managed, no infrastructure configuration required
Quick Example:
# Start a training job
result = customizer.train(job_name="my-job")
# Enable notifications (SMTJ)
result.enable_job_notifications(
emails=["user@example.com"]
)
# Enable notifications (SMHP)
result.enable_job_notifications(
emails=["user@example.com"],
namespace="kubeflow", # Required for SMHP
kubectl_layer_arn="arn:aws:lambda:<region>:123456789012:layer:kubectl:1" # Required for SMHP
)
Important Notes:
- Users must confirm their email subscription by clicking the link in the AWS SNS confirmation email
- SMHP job notifications require a kubectl Lambda layer (see the Job Notifications Guide)
- Notification infrastructure is created once per region (SMTJ) or once per cluster (SMHP) and shared across jobs
- See docs/job_notifications.md for detailed setup instructions, troubleshooting, and advanced usage
- See docs/spec.md for complete API documentation on job notifications
Telemetry
The Nova Forge SDK has telemetry enabled to help us better understand user needs, diagnose issues, and deliver new features. This telemetry tracks the usage of various SDK functions. If you prefer to opt out of telemetry, you can do so by setting the TELEMETRY_OPT_OUT environment variable to true:
export TELEMETRY_OPT_OUT=true
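If you launch jobs from Python rather than a shell, the same opt-out can be applied with os.environ. Setting the variable before importing the SDK is a conservative assumption about when it is read; the variable name itself is the documented interface:

```python
import os

# Opt out of SDK telemetry for this process and any child processes.
# Set this before importing or using the SDK.
os.environ["TELEMETRY_OPT_OUT"] = "true"
```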
Getting Started
This comprehensive SDK enables end-to-end customization of Amazon Nova models with support for multiple training methods, deployment platforms, and monitoring capabilities. Each module is designed to work together seamlessly while providing flexibility for advanced use cases.
To get started customizing Nova models, please see the following files:
- Notebook with "quick start" examples at samples/nova_quickstart.ipynb
- Specification document with detailed information about each module at docs/spec.md
Security Best Practices for SDK Users
1. IAM and Access Management
Execution Roles
- Use dedicated execution roles for SageMaker training jobs with minimal required permissions
- Avoid using admin roles - follow the principle of least privilege
- Regularly audit role permissions and remove unused policies
# Good: Explicit execution role
runtime = SMTJRuntimeManager(
instance_type="ml.p5.48xlarge",
instance_count=2,
execution_role="arn:aws:iam::123456789012:role/SageMakerNovaTrainingRole"
)
# Avoid: Using default role without validation
Required Permissions
The SDK requires specific IAM permissions. Review the IAM section and:
- Grant only the minimum permissions needed for your use case
- Use condition statements to restrict resource access
- Regularly review and rotate access keys
2. Credential Management
AWS Credentials
- Never hardcode credentials in code or configuration files
- Use IAM roles instead of access keys when possible
- Rotate credentials regularly
- Use AWS Secrets Manager for application secrets
- Enable credential monitoring through AWS Config
MLflow Integration
- Secure MLflow tracking URIs with proper authentication
- Use encrypted connections to MLflow servers
- Implement access controls on experiment data
- Regularly audit MLflow access logs
3. Data Security and Privacy
Training Data Protection
- Encrypt data at rest in S3 using KMS keys
- Use S3 bucket policies to restrict access
- Validate data sources before processing
# Ensure your S3 buckets have proper encryption and access controls
customizer = NovaModelCustomizer(
model=Model.NOVA_LITE_2,
method=TrainingMethod.SFT_LORA,
infra=runtime,
data_s3_path="s3://secure-training-bucket/encrypted-data/data.jsonl",
output_s3_path="s3://secure-output-bucket/results"
)
4. Network Security
VPC Configuration
- Deploy in private subnets when possible
- Use VPC endpoints for AWS service access
- Implement security groups with minimal required ports
- Enable VPC Flow Logs for network monitoring
5. Secure Communication
- Always use HTTPS endpoints
- Never disable SSL certificate verification
- Keep TLS libraries updated
6. Input Validation
- Always validate user inputs before passing to SDK
- Sanitize data that will be stored or processed
- Check resource quotas before job submission
- Sanitize job names and resource identifiers
# The SDK includes built-in validation
loader = JSONLDatasetLoader(question="input", answer="output")
loader.load("s3://your-bucket/training-data.jsonl")
# Always validate your data format
loader.validate(method=ValidateMethod.INVALID_RECORDS, training_method=TrainingMethod.SFT_LORA, model=Model.NOVA_LITE_2)
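Job names are another input worth sanitizing before submission. The pattern below (a leading alphanumeric character, then alphanumerics and hyphens, 63 characters at most) mirrors common SageMaker naming constraints, but treat it as an assumption rather than the SDK's exact rule:

```python
import re

# Assumed SageMaker-style naming rule: starts with an alphanumeric
# character, then alphanumerics or hyphens, 63 characters total at most.
JOB_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9-]{0,62}$")

def is_valid_job_name(name: str) -> bool:
    """Return True if the name looks safe to submit as a job name."""
    return bool(JOB_NAME_RE.match(name))

print(is_valid_job_name("stage1-training"))       # True
print(is_valid_job_name("bad name with spaces"))  # False
```

Rejecting malformed names locally gives a clearer error than a failed API call and avoids passing unvetted strings into downstream resource identifiers.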
7. Monitoring & Logging
- Enable CloudTrail for API audit logs
- Use CloudWatch for operational monitoring
- Never log sensitive data (tokens, credentials, PII)
- Monitor job logs through CloudWatch
- Set up alerts for suspicious activities
Security Monitoring
- Monitor failed authentication attempts
- Track unusual resource access patterns
- Log all model deployment activities
8. Deployment Security
Bedrock Deployment
- Use least privilege policies for Bedrock access
- Implement endpoint access controls
- Monitor model inference patterns
- Enable request/response logging when appropriate
9. Validation
The SDK includes built-in validation:
- IAM permission validation before job execution
- Input sanitization for user-provided parameters
Validation is enabled by default.