AfriLink SDK — One-line access to GPUs, models and datasets from your notebook

AfriLink SDK

Version: 0.1.6
Last Updated: Feb 17, 2025

Finetune LLMs on HPC from your notebook

AfriLink SDK gives you one-line access to GPUs, models, and datasets, all ready to use directly from your notebook. Authenticate, submit LoRA finetune jobs, download trained weights, and run inference without leaving the notebook.

pip install afrilink-sdk

Quick Start

from afrilink import AfriLinkClient

# 1. Authenticate (prompts for DataSpires email/password, then auto-handles HPC)
client = AfriLinkClient()
client.authenticate()

# 2. Prepare your dataset (pandas DataFrame with "text" column)
import pandas as pd
data = pd.DataFrame({"text": [
    "Below is an instruction...\n\n### Response:\nHere is the answer..."
]})

# 3. Submit a finetune job
job = client.finetune(model="qwen2.5-0.5b", training_mode="low", data=data, gpus=1)
result = job.run(wait=True)          # blocks until SLURM job finishes

# 4. Download the trained adapter (only if job succeeded)
if result["status"] == "completed":
    client.download_model(result["job_id"], "./my-model")

    # 5. Load & run inference
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
    model = PeftModel.from_pretrained(base, "./my-model")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

    out = model.generate(**tokenizer("Hello!", return_tensors="pt"), max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
else:
    print(f"Job failed with status: {result['status']}")
    print("Check logs with job.get_logs()")

Installation

pip install afrilink-sdk

The package has no required dependencies. Libraries such as requests, torch, transformers, and peft are needed only when you use a feature that depends on them, and most notebook environments ship with them pre-installed.
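
Because these libraries are optional, it can help to check which ones are importable before starting a job. The following is a minimal sketch using only the standard library; the helper name `missing_optional_deps` is hypothetical and not part of the SDK.

```python
from importlib.util import find_spec

# Libraries the README names as needed only at the point of use.
OPTIONAL_DEPS = ("requests", "torch", "transformers", "peft")


def missing_optional_deps(names=OPTIONAL_DEPS):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]
```

An empty list from `missing_optional_deps()` means every optional library listed above can be imported.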


Authentication

AfriLink uses a two-phase auth flow. Both phases happen inside a single client.authenticate() call:

| Phase | What happens | User action |
|---|---|---|
| 1. DataSpires | Validates your DataSpires account for billing/telemetry | Enter email + password when prompted |
| 2. HPC | Headless Selenium browser automation gets SSH certificates via Smallstep | Fully automatic (org credentials auto-provisioned) |

from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()   # prompts for DataSpires creds, then auto-handles HPC

# Or pass credentials explicitly:
client.authenticate(
    dataspires_email="you@example.com",
    dataspires_password="...",
)
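
When passing credentials explicitly, one option is to read them from environment variables rather than typing them into a notebook cell. This sketch assumes two variable names, DATASPIRES_EMAIL and DATASPIRES_PASSWORD, which are hypothetical and not defined by the SDK; only the keyword arguments of `client.authenticate()` come from the example above.

```python
import os


def load_dataspires_credentials(env=None):
    """Build keyword arguments for client.authenticate() from environment variables.

    DATASPIRES_EMAIL and DATASPIRES_PASSWORD are hypothetical variable names
    chosen for this example; the SDK does not define them.
    """
    env = os.environ if env is None else env
    required = ("DATASPIRES_EMAIL", "DATASPIRES_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "dataspires_email": env["DATASPIRES_EMAIL"],
        "dataspires_password": env["DATASPIRES_PASSWORD"],
    }


# Usage, assuming `client` is an AfriLinkClient:
#     client.authenticate(**load_dataspires_credentials())
```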

After authentication you get:

  • SSH certificate valid for ~16 hours
  • SLURM job manager ready to submit jobs
  • SCP transfer manager ready to move files
  • Telemetry tracker logging GPU-minutes to your DataSpires account
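
Since the certificate lasts roughly 16 hours, a long notebook session may need to authenticate again. A minimal sketch of tracking this on the client side follows; the helper `needs_reauth` and the 30-minute safety margin are assumptions for illustration, and the SDK may expose its own expiry information.

```python
from datetime import datetime, timedelta, timezone

# "~16 hours" is the lifetime quoted above; treat it as approximate.
CERT_LIFETIME = timedelta(hours=16)


def needs_reauth(authenticated_at, now=None, margin=timedelta(minutes=30)):
    """Return True when the certificate is expired or within `margin` of expiring."""
    now = datetime.now(timezone.utc) if now is None else now
    return now >= authenticated_at + CERT_LIFETIME - margin
```

Record `authenticated_at = datetime.now(timezone.utc)` right after `client.authenticate()` and call `needs_reauth(authenticated_at)` before submitting a long job.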

API Reference

AfriLinkClient

Main entry point. Created once per notebook session.

| Method | Description |
|---|---|
| authenticate() | Full auth flow (DataSpires + HPC) |
| finetune(model, training_mode, data, gpus, ...) | Create a FinetuneJob |
| download_model(job_id, local_dir) | Download trained adapter weights |
| upload_dataset(local_path, dataset_name) | Upload dataset to HPC |
| list_available_models(size=None) | List models in the registry |
| list_available_datasets() | List datasets in the registry |
| get_model_requirements(model, training_mode) | GPU/memory recommendations |
| list_jobs() | List SLURM queue |
| cancel_job(job_id) | Cancel a running job |
| run_command(command) | Run arbitrary shell command on HPC login node |
| get_queue_status() | SLURM partition info |

client.finetune()

job = client.finetune(
    model="qwen2.5-0.5b",       # model ID from registry
    training_mode="low",          # "low" | "medium" | "high"
    data=my_dataframe,            # pandas DataFrame, HF Dataset, or file path
    gpus=1,                       # number of A100 GPUs
    time_limit="04:00:00",        # max wallclock
    output_dir=None,              # default: $WORK/finetune_outputs
)
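
The `time_limit` value above uses the hours:minutes:seconds form that SLURM accepts. A small helper can build that string from numbers; `slurm_time_limit` is a hypothetical name, and whether the SDK accepts other SLURM time formats (such as days-hours) is not stated here.

```python
def slurm_time_limit(hours=0, minutes=0, seconds=0):
    """Format a duration as the HH:MM:SS string used for `time_limit`."""
    total = int(hours * 3600 + minutes * 60 + seconds)
    if total <= 0:
        raise ValueError("time limit must be positive")
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```

For example, `slurm_time_limit(hours=4)` gives the same "04:00:00" string shown in the call above.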

Training modes:

| Mode | Strategy | Quantization | Typical GPUs |
|---|---|---|---|
| low | QLoRA (rank 8) | 4-bit | 1 |
| medium | LoRA (rank 16) | 8-bit / none | 1-2 |
| high | LoRA (rank 64) + DDP/FSDP | none | 2-4+ |
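
To catch a mistyped mode before a job is submitted, the table above can be kept as a local lookup. This is only a sketch: `TRAINING_MODES` and `check_training_mode` are hypothetical names, the values are copied from the table, and the SDK presumably performs its own validation.

```python
# Local copy of the training-mode table above, for validation only.
TRAINING_MODES = {
    "low": {"strategy": "QLoRA", "lora_rank": 8,
            "quantization": "4-bit", "typical_gpus": "1"},
    "medium": {"strategy": "LoRA", "lora_rank": 16,
               "quantization": "8-bit / none", "typical_gpus": "1-2"},
    "high": {"strategy": "LoRA + DDP/FSDP", "lora_rank": 64,
             "quantization": "none", "typical_gpus": "2-4+"},
}


def check_training_mode(mode):
    """Return the table row for `mode`, or raise ValueError for an unknown mode."""
    try:
        return TRAINING_MODES[mode]
    except KeyError:
        valid = ", ".join(sorted(TRAINING_MODES))
        raise ValueError(
            f"unknown training_mode {mode!r}; expected one of: {valid}"
        ) from None
```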

FinetuneJob

Returned by client.finetune().

| Method / Property | Description |
|---|---|
| run(wait=True) | Submit to SLURM. wait=True polls until done. |
| cancel() | Cancel the SLURM job |
| get_logs(tail=100) | Fetch recent log lines |
| status | Current status string |
| job_id | AfriLink job ID (8-char UUID prefix) |
| slurm_job_id | SLURM numeric job ID (set after run()) |

run() returns a dict:

{
    "job_id": "a1b2c3d4",
    "slurm_job_id": "12345678",
    "status": "completed",        # or "submitted" if wait=False
    "output_dir": "/path/...",
    "model_path": "/path/...",
}
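
With `wait=False`, `run()` returns while the status is still "submitted", so the notebook has to check on the job later. One way to do that is to poll the `status` property, sketched below. The helper `wait_for_job` is hypothetical, and only "completed" and "submitted" appear in this documentation; "failed" and "cancelled" are assumed names for the other terminal states.

```python
import time


def wait_for_job(job, poll_seconds=30.0, timeout=None,
                 terminal=("completed", "failed", "cancelled"),
                 sleep=time.sleep):
    """Poll `job.status` until it reaches a terminal value and return that value.

    "failed" and "cancelled" are assumed status names; adjust `terminal`
    to match what the SDK actually reports.
    """
    waited = 0.0
    while True:
        status = job.status
        if status in terminal:
            return status
        if timeout is not None and waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Typical use would be `job.run(wait=False)` followed later by `wait_for_job(job, timeout=4 * 3600)`.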

client.download_model()

client.download_model(result["job_id"], "./my-model")

Downloads the adapter files (adapter_config.json, adapter_model.safetensors, tokenizer files) directly into the target directory, with no subfolders, so the directory can be passed straight to PeftModel.from_pretrained().
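
Before loading the adapter, a quick check that the expected files arrived can save a confusing error later. This sketch checks only the two adapter files named above; `missing_adapter_files` is a hypothetical helper, and tokenizer file names are left out because they vary by model.

```python
from pathlib import Path

# File names taken from the description above; tokenizer files vary by model
# and are not checked here.
REQUIRED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(local_dir):
    """Return the required adapter files that are absent from `local_dir`."""
    root = Path(local_dir)
    return [name for name in REQUIRED_ADAPTER_FILES if not (root / name).is_file()]
```

An empty list from `missing_adapter_files("./my-model")` suggests the download is complete enough to try loading.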

Model & Dataset Registry

# List all models
client.list_available_models()

# Filter by size
client.list_available_models(size="tiny")   # tiny | small | medium | large

# List datasets
client.list_available_datasets()

# Resource requirements
client.get_model_requirements("qwen2.5-0.5b", "low")

Available models (v0.1.0):

| ID | Name | Type | Params | Min VRAM |
|---|---|---|---|---|
| qwen2.5-0.5b | Qwen 2.5 0.5B | text | 0.5B | 4 GB |
| gemma-3-270m | Gemma 3 270M | text | 0.27B | 2 GB |
| llama-3.2-1b | Llama 3.2 1B | text | 1.0B | 4 GB |
| deepseek-r1-1.5b | DeepSeek R1 1.5B | text | 1.5B | 6 GB |
| ministral-3b | Ministral 3B | text | 3.3B | 8 GB |
| florence-2-base | Florence 2 Base | vision | 0.23B | 4 GB |
| smolvlm-256m | SmolVLM 256M | vision | 0.26B | 2 GB |
| moondream2 | Moondream 2 | vision | 1.9B | 8 GB |
| internvl2-1b | InternVL2 1B | vision | 1.0B | 4 GB |
| llava-1.5-7b | LLaVA 1.5 7B | vision | 7.0B | 16 GB |
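
The minimum-VRAM column can be used to shortlist models for a given memory budget. The sketch below works on a local copy of the table, so it may drift from the live registry; `MODELS` and `models_fitting` are hypothetical names, and `client.list_available_models()` remains the authoritative source.

```python
# Local copy of the model table above: (id, type, minimum VRAM in GB).
MODELS = [
    ("qwen2.5-0.5b", "text", 4),
    ("gemma-3-270m", "text", 2),
    ("llama-3.2-1b", "text", 4),
    ("deepseek-r1-1.5b", "text", 6),
    ("ministral-3b", "text", 8),
    ("florence-2-base", "vision", 4),
    ("smolvlm-256m", "vision", 2),
    ("moondream2", "vision", 8),
    ("internvl2-1b", "vision", 4),
    ("llava-1.5-7b", "vision", 16),
]


def models_fitting(vram_gb, model_type=None):
    """Return IDs of models whose minimum VRAM is at most `vram_gb`."""
    return [
        model_id
        for model_id, kind, min_vram in MODELS
        if min_vram <= vram_gb and (model_type is None or kind == model_type)
    ]
```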

Data Transfer

# Upload a dataset
client.upload_dataset("./train.jsonl", dataset_name="my-data")

# Download model weights
client.download_model("a1b2c3d4", "./my-model")

# List remote files
client.transfer.list_remote_files("$WORK/finetune_outputs/")

# Run shell commands on HPC
client.run_command("squeue -u $USER")
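
If your training examples live in memory, a file such as ./train.jsonl can be written before calling `upload_dataset()`. This sketch assumes the expected layout is one JSON object per line with a single "text" field, mirroring the DataFrame "text" column used elsewhere in this README; that layout is an assumption, and `write_jsonl` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_jsonl(texts, path):
    """Write one {"text": ...} JSON object per line and return the line count."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as fh:
        for text in texts:
            fh.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count
```

After `write_jsonl(examples, "./train.jsonl")`, the file can be passed to `client.upload_dataset("./train.jsonl", dataset_name="my-data")`.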

Dataset Formats

client.finetune(data=...) accepts:

| Type | How it's handled |
|---|---|
| pandas.DataFrame | Serialised to JSONL, uploaded via SCP |
| datasets.Dataset | Saved to disk, uploaded via SCP |
| str (local path) | Uploaded via SCP |
| str (starts with $) | Treated as remote HPC path (no upload) |
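
The dispatch in the table above can be pictured as follows. This is an illustration of the documented behaviour, not the SDK's actual implementation; `describe_data_argument` is a hypothetical function, and it matches on class names so that it runs without pandas or datasets installed.

```python
def describe_data_argument(data):
    """Mirror the table above: say how a `data` value would be handled."""
    if isinstance(data, str):
        if data.startswith("$"):
            return "remote HPC path, no upload"
        return "local path, uploaded via SCP"
    # Match on the class name so this sketch runs without pandas or datasets.
    kind = type(data).__name__
    if kind == "DataFrame":
        return "serialised to JSONL, uploaded via SCP"
    if kind == "Dataset":
        return "saved to disk, uploaded via SCP"
    raise TypeError(f"unsupported data type: {kind}")
```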

Your DataFrame should have a text column with the full prompt+response formatted as a single string (Alpaca-style or chat template).
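
As an example of the single-string format, the sketch below builds Alpaca-style rows from (instruction, response) pairs. The template is the widely used Alpaca prompt without an input field; `to_text_rows` is a hypothetical helper, and a chat template from your tokenizer would be an equally valid choice.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)


def to_text_rows(pairs):
    """Turn (instruction, response) pairs into rows with a single "text" field."""
    return [
        {"text": ALPACA_TEMPLATE.format(instruction=instruction.strip(),
                                        response=response.strip())}
        for instruction, response in pairs
    ]
```

Passing the result to `pd.DataFrame(to_text_rows(pairs))` yields a DataFrame with the `text` column that `client.finetune(data=...)` expects.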


Architecture

Notebook Interface                      High Performance Compute
+--------------+      SSH/SCP          +------------------+
| AfriLink SDK | ------------------->  |  Login Node      |
|              |  (Smallstep certs)    |  +- SLURM sbatch |
| DataSpires   |                       |  +- $WORK/       |
| (billing)    |                       |  |  +- containers|
|              |                       |  |  +- datasets  |
+--------------+                       |  |  +- finetune_ |
                                       |  |     outputs/  |
                                       |  |     +- {jobid}|
                                       |  +- Singularity  |
                                       |     container    |
                                       |     (A100 GPUs)  |
                                       +------------------+

Publishing to PyPI

For maintainers:

cd afrilink-sdk
pip install build twine

# Build wheel + sdist
python -m build

# Upload to PyPI (requires PyPI API token)
twine upload dist/*

You'll need a PyPI account at https://pypi.org and an API token configured in ~/.pypirc or passed via --username __token__ --password pypi-....


License

MIT
