Comprehensive Python framework for managing FABRIC testbed generic clusters and slices

fabric-generic-cluster

A comprehensive, type-safe Python framework for managing FABRIC testbed slices with support for complex network topologies, DPU interfaces, multi-OS configurations, and various hardware components.

Python 3.9+ | Pydantic V2 | License: MIT

🌟 Features

Core Capabilities

  • ✅ Type-Safe Data Models - Pydantic-based topology definitions with automatic validation
  • ✅ DPU Interface Support - Full support for DPU network interfaces alongside traditional NICs
  • ✅ Multi-OS Support - Automatic detection and configuration for Rocky Linux, Ubuntu, and Debian
  • ✅ Hardware Components - Full support for GPUs, FPGAs, DPUs, NVMe, and custom NICs
  • ✅ Network Management - L2/L3 network configuration with IPv4/IPv6 support
  • ✅ SSH Automation - Passwordless SSH setup across all nodes
  • ✅ Visualization - Multiple output formats (text, ASCII, graphs, tables)
  • ✅ Easy Installation - Available on PyPI via pip install
  • ✅ Modular Design - Separated concerns for better maintainability

Hardware Support

  • GPUs - NVIDIA RTX series, Tesla T4, A30, A40
  • FPGAs - Xilinx Alveo U280, U50, U250
  • DPUs - ConnectX-7 100G/400G Data Processing Units with network interfaces
  • NVMe - Intel P4510, P4610 NVMe storage
  • NICs - Basic, ConnectX-5, ConnectX-6, SharedNICs, SmartNICs
  • Persistent Storage - Volume management

🚀 Installation

From PyPI (Recommended)

pip install fabric-generic-cluster

From Source

git clone https://github.com/mcevik0/fabric-generic-cluster.git
cd fabric-generic-cluster
pip install -e .

Prerequisites

  • Python 3.9 or higher
  • Access to FABRIC testbed
  • fabrictestbed-extensions>=1.4.0 (installed automatically)

Verify Installation

import fabric_generic_cluster
print(fabric_generic_cluster.__version__)
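
If importing the package is not desirable (for example in a quick CI check), the installed version can also be read from the distribution metadata. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "fabric-generic-cluster") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "fabric-generic-cluster is not installed")
```

Returning None instead of raising keeps the check usable in scripts that want to print installation instructions rather than a traceback.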

🎯 Quick Start

Option 1: Python Script

from fabric_generic_cluster import (
    load_topology_from_yaml_file,
    deploy_topology_to_fabric,
    configure_l3_networks,
    configure_node_interfaces,
    setup_passwordless_ssh,
)

# Load topology
topology = load_topology_from_yaml_file("topology.yaml")

# Deploy to FABRIC
slice = deploy_topology_to_fabric(topology, "my-cluster")

# Configure networks (if using L3 networks)
configure_l3_networks(slice, topology)

# Configure interfaces
configure_node_interfaces(slice, topology)

# Setup SSH
setup_passwordless_ssh(slice)

print("✅ Cluster deployed and configured!")

Option 2: Using the Example Script

# Clone the repository for examples
git clone https://github.com/mcevik0/fabric-generic-cluster.git
cd fabric-generic-cluster

# Run the complete deployment example
python examples/complete-deployment-example.py \
    --yaml path/to/topology.yaml \
    --slice-name my-test-slice
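
The source of the example script is not reproduced here. As a rough sketch, a script that accepts the two flags shown above could parse them with argparse as follows. Only the flag names `--yaml` and `--slice-name` come from the command above; the function name, help texts, and the choice to make both flags required are assumptions.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two flags used in the command above (hypothetical parser)."""
    parser = argparse.ArgumentParser(description="Deploy a topology to FABRIC")
    parser.add_argument("--yaml", required=True, help="Path to the topology YAML file")
    parser.add_argument("--slice-name", required=True, help="Name of the slice to create")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--yaml", "topology.yaml", "--slice-name", "my-test-slice"])
    # argparse exposes --slice-name as the attribute slice_name
    print(args.yaml, args.slice_name)
```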

Option 3: Jupyter Notebooks

For interactive workflows, check out the fabric-generic-cluster-notebooks repository:

git clone https://github.com/mcevik0/fabric-generic-cluster-notebooks.git
cd fabric-generic-cluster-notebooks
jupyter notebook

📦 Package Structure

fabric-generic-cluster/
├── fabric_generic_cluster/          # Main package
│   ├── __init__.py                  # Package exports
│   ├── models.py                    # Pydantic models for topology
│   ├── deployment.py                # Slice deployment functions
│   ├── network_config.py            # Network configuration
│   ├── ssh_setup.py                 # SSH management
│   ├── topology_viewer.py           # Visualization tools
│   ├── builder_compat.py            # Backward compatibility
│   └── tools/                       # Command-line tools
│       ├── __init__.py
│       └── topology_summary.py      # Topology summary generator
│
├── examples/                        # Usage examples
│   └── complete-deployment-example.py
│
├── tests/                           # Test suite
│   ├── test-dpu-support.py
│   └── test-fpga-support.py
│
├── pyproject.toml                   # Package metadata
├── setup.py                         # Setup configuration
├── MANIFEST.in                      # Package data
├── LICENSE                          # MIT License
└── README.md                        # This file

📚 Usage Examples

Example 1: Load and Explore Topology

from fabric_generic_cluster import (
    load_topology_from_yaml_file,
    print_topology_summary,
    draw_topology_graph,
)

# Load topology
topology = load_topology_from_yaml_file("topology.yaml")

# Print summary
print_topology_summary(topology)

# Create visualization
draw_topology_graph(topology, show_ip=True, save_path="topology.png")

Example 2: Deploy Multi-Site Cluster

from fabric_generic_cluster import (
    load_topology_from_yaml_file,
    deploy_topology_to_fabric,
    configure_node_interfaces,
    verify_node_interfaces,
)

# Load topology with nodes at multiple sites
topology = load_topology_from_yaml_file("multi-site-topology.yaml")

# Deploy
slice = deploy_topology_to_fabric(topology, "multi-site-cluster")

# Configure all nodes
configure_node_interfaces(slice, topology)

# Verify configuration
verify_node_interfaces(slice, topology)

Example 3: Access Type-Safe Data

from fabric_generic_cluster import load_topology_from_yaml_file

topology = load_topology_from_yaml_file("topology.yaml")

# Get specific node
node = topology.get_node_by_hostname("node-1")

print(f"Node: {node.hostname}")
print(f"Site: {node.site}")
print(f"CPU: {node.capacity.cpu} cores")
print(f"RAM: {node.capacity.ram} GB")

# Check hardware components
if node.pci.dpu:
    print(f"DPUs: {len(node.pci.dpu)}")
    for dpu_name, dpu in node.pci.dpu.items():
        print(f"  - {dpu_name}: {dpu.model}")
        print(f"    Interfaces: {len(dpu.interfaces)}")

if node.pci.fpga:
    print(f"FPGAs: {len(node.pci.fpga)}")
    for fpga_name, fpga in node.pci.fpga.items():
        print(f"  - {fpga_name}: {fpga.model}")

# Get all interfaces (NIC + DPU)
all_interfaces = node.get_all_interfaces()
print(f"\nTotal interfaces: {len(all_interfaces)}")

for device_name, iface_name, iface in all_interfaces:
    device_type = "DPU" if device_name.startswith("dpu") else "NIC"
    print(f"{device_type} {device_name}.{iface_name}: {iface.binding}")
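
The loop above relies on `get_all_interfaces()` yielding `(device_name, iface_name, iface)` tuples. To illustrate that shape without a FABRIC account, here is a self-contained sketch that flattens NIC and DPU interfaces the same way. The dataclasses are hypothetical stand-ins, not the package's Pydantic models, and the real implementation may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Interface:
    binding: str


@dataclass
class Device:
    interfaces: Dict[str, Interface] = field(default_factory=dict)


@dataclass
class NodeSketch:
    """Hypothetical stand-in for the package's Node model."""
    nics: Dict[str, Device] = field(default_factory=dict)
    dpus: Dict[str, Device] = field(default_factory=dict)

    def get_all_interfaces(self) -> List[Tuple[str, str, Interface]]:
        """Flatten NIC and DPU interfaces into (device, iface, Interface) tuples."""
        result = []
        for devices in (self.nics, self.dpus):
            for device_name, device in devices.items():
                for iface_name, iface in device.interfaces.items():
                    result.append((device_name, iface_name, iface))
        return result


node = NodeSketch(
    nics={"nic1": Device({"iface1": Interface("network1")})},
    dpus={"dpu1": Device({"iface1": Interface("network1")})},
)
for device_name, iface_name, iface in node.get_all_interfaces():
    print(f"{device_name}.{iface_name}: {iface.binding}")
```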

Example 4: Test Network Connectivity

from fabric_generic_cluster import (
    get_slice,
    load_topology_from_yaml_file,
    ping_network_from_node,
    verify_ssh_access,
)

# Get existing slice
slice = get_slice("my-cluster")
topology = load_topology_from_yaml_file("topology.yaml")

# Test ping connectivity
ping_results = ping_network_from_node(
    slice, 
    topology, 
    source_hostname="node-1", 
    network_name="network1",
    count=3
)

# An empty result set should not be reported as success
if ping_results and all(ping_results.values()):
    print("✅ All ping tests passed!")

# Test SSH access
ssh_results = verify_ssh_access(
    slice,
    topology,
    source_hostname="node-1",
    network_name="network1"
)

# An empty result set should not be reported as success
if ssh_results and all(ssh_results.values()):
    print("✅ All SSH connections successful!")
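
The example treats both result objects as dictionaries whose values are truthy on success. Under that assumption (the exact key and value types are not documented here), a small helper can list the targets that failed instead of reporting only overall success. `failed_targets` is an illustrative name, not a package function.

```python
from typing import Dict, List


def failed_targets(results: Dict[str, bool]) -> List[str]:
    """Return the keys whose check did not succeed, in sorted order."""
    return sorted(name for name, ok in results.items() if not ok)


# Example data only; real results would come from ping_network_from_node().
sample_results = {"node-2": True, "node-3": False}
print(failed_targets(sample_results))
```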

Example 5: Using Module-Style Imports

For compatibility with existing code:

from fabric_generic_cluster import deployment as sd
from fabric_generic_cluster import network_config as snc
from fabric_generic_cluster import ssh_setup as ssh
from fabric_generic_cluster import load_topology_from_yaml_file

# Load topology
topology = load_topology_from_yaml_file("topology.yaml")

# Deploy
slice = sd.deploy_topology_to_fabric(topology, "my-slice")

# Configure
snc.configure_node_interfaces(slice, topology)
ssh.setup_passwordless_ssh(slice)

🔧 API Reference

Models and Loaders

from fabric_generic_cluster import (
    SiteTopology,              # Main topology model
    Node,                      # Node model
    Network,                   # Network model
    load_topology_from_yaml_file,   # Load from YAML file
    load_topology_from_dict,        # Load from dictionary
)

Deployment Functions

from fabric_generic_cluster import (
    deploy_topology_to_fabric,   # Deploy slice to FABRIC
    configure_l3_networks,        # Configure L3 networks
    get_slice,                    # Get existing slice
    delete_slice,                 # Delete slice
    check_slices,                 # List all slices
)

# Usage
slice = deploy_topology_to_fabric(topology, "slice-name")
configure_l3_networks(slice, topology)

Network Configuration

from fabric_generic_cluster import (
    configure_node_interfaces,    # Configure all interfaces
    verify_node_interfaces,       # Verify configuration
    ping_network_from_node,       # Test connectivity
    update_hosts_file_on_nodes,   # Update /etc/hosts
)

# Usage
configure_node_interfaces(slice, topology)
verify_node_interfaces(slice, topology)

SSH Setup

from fabric_generic_cluster import (
    setup_passwordless_ssh,       # Complete SSH setup
    verify_ssh_access,            # Verify SSH connectivity
)

# Usage
setup_passwordless_ssh(slice)
results = verify_ssh_access(slice, topology, "node-1", "network1")

Visualization

from fabric_generic_cluster import (
    print_topology_summary,       # Detailed summary
    print_compact_summary,        # Brief summary
    draw_topology_graph,          # Visual graph
)

# Usage
print_topology_summary(topology)
draw_topology_graph(topology, show_ip=True, save_path="topology.png")

🛠️ Command-Line Tools

Topology Summary Generator

The package includes a command-line tool for generating topology summaries:

# Generate summary for a YAML file
fabric-topology-summary input.yaml --output output.yaml

# Just print summary without modifying file
fabric-topology-summary input.yaml --dry-run

# Include ASCII diagram
fabric-topology-summary input.yaml --ascii --output output.yaml

This tool is automatically installed when you install the package.
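
When the tool needs to be driven from Python (for instance in a batch job), the argument list can be assembled programmatically. The sketch below uses only the flags shown above (`--output`, `--dry-run`, `--ascii`); the function name and its parameters are hypothetical.

```python
import shlex
import shutil
from typing import List, Optional


def build_summary_command(
    input_yaml: str,
    output_yaml: Optional[str] = None,
    dry_run: bool = False,
    ascii_diagram: bool = False,
) -> List[str]:
    """Assemble an argument list using only the flags documented above."""
    cmd = ["fabric-topology-summary", input_yaml]
    if dry_run:
        cmd.append("--dry-run")
    if ascii_diagram:
        cmd.append("--ascii")
    if output_yaml is not None:
        cmd += ["--output", output_yaml]
    return cmd


if __name__ == "__main__":
    cmd = build_summary_command("input.yaml", dry_run=True)
    print(shlex.join(cmd))
    # Check that the tool is on PATH before handing cmd to subprocess.run()
    print("tool found" if shutil.which(cmd[0]) else "tool not found on PATH")
```

Building a list rather than a shell string avoids quoting problems when file names contain spaces.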

💻 Development

Setting Up Development Environment

# Clone repository
git clone https://github.com/mcevik0/fabric-generic-cluster.git
cd fabric-generic-cluster

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

Running Tests

# Run test suite
pytest tests/

# Run specific test
python tests/test-dpu-support.py
python tests/test-fpga-support.py

Building the Package

# Install build tools
pip install build twine

# Build distribution
python -m build

# Check package
twine check dist/*

# Test upload to TestPyPI
twine upload --repository testpypi dist/*

# Upload to PyPI
twine upload dist/*

Code Style

# Format code
black fabric_generic_cluster/

# Check style
flake8 fabric_generic_cluster/

📖 Documentation

Example Topologies

Example YAML topology files are available in the notebooks repository:

  • Basic 2-node cluster
  • Multi-site deployment
  • Storage cluster with NVMe
  • DPU/SmartNIC configurations
  • FPGA-enabled topologies
  • OpenStack deployment variants

YAML Topology Format

site_topology:
  nodes:
    node-1:
      hostname: node-1
      site: SITE1
      capacity:
        cpu: 8
        ram: 32
        disk: 100
        os: default_rocky_9
      nics:
        nic1:
          interfaces:
            iface1:
              binding: network1
              ipv4_address: 10.0.1.1
              ipv4_netmask: 255.255.255.0
      pci:
        dpu:
          dpu1:
            model: NIC_ConnectX_7_100
            interfaces:
              iface1:
                binding: network1
                ipv4_address: 10.0.1.10

  networks:
    network1:
      name: network1
      type: L2Bridge
      subnet: 10.0.1.0/24

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Report bugs: Open an issue on GitHub
  2. Suggest features: Open an issue with your idea
  3. Submit PRs: Fork, make changes, and submit a pull request

Contribution Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Workflow

  1. Update code in fabric_generic_cluster/
  2. Add tests in tests/
  3. Update documentation
  4. Run tests: pytest tests/
  5. Build package: python -m build
  6. Test locally: pip install dist/*.whl

📊 Performance

  • Validation Speed: ~10ms for typical topology (3-10 nodes)
  • Deployment Time: Depends on FABRIC (typically 5-10 minutes)
  • Network Config: ~30 seconds per node
  • SSH Setup: ~1-2 minutes for 3-node cluster

🗺️ Roadmap

  • [x] Type-safe Pydantic models
  • [x] DPU interface support
  • [x] Multi-distro support (Rocky/Ubuntu/Debian)
  • [x] L2/L3 network configuration
  • [x] Automated SSH setup
  • [x] PyPI package distribution
  • [ ] Web-based topology editor
  • [ ] Ansible playbook integration
  • [ ] Monitoring and metrics collection
  • [ ] REST API endpoint

🐛 Troubleshooting

Import Issues

Problem: ModuleNotFoundError: No module named 'fabric_generic_cluster'

Solution:

pip install fabric-generic-cluster

YAML File Not Found

Problem: FileNotFoundError when loading topology

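Solution: Use an absolute path, or make sure the YAML file is in the current working directory:

from pathlib import Path

from fabric_generic_cluster import load_topology_from_yaml_file

# Resolve to an absolute path so the working directory does not matter
yaml_file = Path("path/to/topology.yaml").resolve()
topology = load_topology_from_yaml_file(str(yaml_file))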
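
To fail early with a clearer message, the path can be checked before it is handed to the loader. This is a minimal sketch; `resolve_topology_path` is a hypothetical helper, not a package function.

```python
from pathlib import Path


def resolve_topology_path(path_str: str) -> Path:
    """Return an absolute path to an existing file, or raise FileNotFoundError."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Topology file not found: {path} (working directory: {Path.cwd()})"
        )
    return path


if __name__ == "__main__":
    try:
        print(resolve_topology_path("topology.yaml"))
    except FileNotFoundError as exc:
        print(exc)
```

Including the working directory in the message makes it obvious when a relative path was resolved from an unexpected location, which is common in notebooks.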

DPU Interfaces Not Detected

Problem: DPU interfaces not showing up

Solution: Verify DPU configuration in YAML:

node = topology.get_node_by_hostname("node-1")
print(f"DPUs: {node.pci.dpu}")

# Check all interfaces
all_ifaces = node.get_all_interfaces()
print(f"Total interfaces: {len(all_ifaces)}")

Network Configuration Fails

Problem: Interface configuration errors

Solution:

  1. Configure L3 networks first: configure_l3_networks(slice, topology)
  2. Make sure the nodes are active: slice.wait()
  3. Verify OS detection: check the logs to confirm the distribution is supported (Rocky Linux, Ubuntu, or Debian)
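
These checks can be chained in a small step runner that stops at the first failure and names the step that broke. This is a sketch only: `run_steps` is a hypothetical helper, and the placeholder actions stand in for real calls such as `slice.wait()`, `configure_l3_networks(slice, topology)`, and `configure_node_interfaces(slice, topology)`.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], object]]


def run_steps(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None."""
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # stop here: later steps depend on earlier ones
            print(f"Step '{name}' failed: {exc}")
            return name
        print(f"Step '{name}' ok")
    return None


def failing_step() -> None:
    raise RuntimeError("simulated failure")


# Placeholder actions; replace with the real calls in an actual deployment.
steps = [
    ("wait for nodes", lambda: None),
    ("configure L3 networks", failing_step),
    ("configure interfaces", lambda: None),
]
first_failure = run_steps(steps)
```

Stopping at the first failure keeps the error output focused on the step that needs attention rather than on follow-on errors.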

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ for the FABRIC Community

Author: Mert Cevik (@mcevik0)
