AfriLink SDK — One-line access to GPUs, models and datasets from your notebook

AfriLink SDK

Version: 0.8.16

Last Updated: May 29, 2026

Train & Finetune on a Dedicated OpenToken A100 from Any Notebook

AfriLink SDK gives you one-line access to a dedicated NVIDIA A100 80 GB hosted by OpenToken for training and finetuning across text, vision and multimodal models. Works on Google Colab, Kaggle, Jupyter, VS Code, and any Python environment.

| Capability | API | What It Does |
| --- | --- | --- |
| Curated finetune | client.finetune() | LoRA/QLoRA LLM fine-tuning in our pre-built afrilink-finetune container |
| Curated training | client.train() | Run any training script in our pre-built afrilink-yolo container (Ultralytics, vision) |
| Build-your-own container | client.build_image() | Define base image + pip / apt deps + model source, build on Cloud Build, push to private Artifact Registry |
| Build-and-train | client.build_and_train() | One-shot: builds (or hits the cache), runs on the A100, downloads artefacts, cleans up |
| Image cache lookup | client.find_existing_image() | Check if a matching image already exists before triggering a fresh ~5 min build |

pip install 'afrilink-sdk[build]'

Quick Start — Finetune an LLM

from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()   # reads AFRILINK_API_KEY from notebook secrets / env

import pandas as pd
data = pd.DataFrame({"text": [
    "Below is an instruction...\n\n### Response:\nHere is the answer..."
]})

job = client.finetune(
    model="qwen2.5-0.5b",
    training_mode="low",
    data=data,
    gpus=1,
    time_limit="01:00:00",
)
result = job.run(wait=True)

if result["status"] == "completed":
    client.download_model(result["job_id"], "./my-model")

Quick Start — Train a Vision Model

from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()

# Submit a YOLOv8 training job to the A100
job = client.train(
    script="train_yolo.py",        # your training script
    container="afrilink-yolo",      # pre-built container with YOLOv8 + PyTorch
    data="./dataset.tar.gz",        # dataset (uploaded automatically)
    data_config="dataset.yaml",     # YOLO dataset.yaml
    gpus=1,
    time_limit="02:00:00",
)
result = job.run(wait=True)
print(job.get_logs(tail=50))
client.download_model(result["job_id"], "./yolo-out")

Quick Start — Custom Container

from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()

# Define exactly the environment your training needs.
spec = dict(
    base_image="pytorch",                                # preset
    pip_packages=["transformers>=4.45", "accelerate>=0.34", "peft>=0.13"],
    apt_packages=["git"],
    model_source={
        "kind": "huggingface",
        "id": "Qwen/Qwen2.5-0.5B-Instruct",
    },
)

# Builds the image on Cloud Build (~5 min first time, instant on cache hit
# for the same spec on subsequent runs), runs on the A100, deletes the
# local image layer afterwards, returns the artefact directory.
result = client.build_and_train(
    script="my_train.py",
    gpus=1,
    time_limit_hours=0.5,
    reuse_existing_image=True,   # default — short-circuits identical specs
    **spec,
)
client.download_model(result["run"]["job_id"], "./output")

Installation

pip install 'afrilink-sdk[build]'

The [build] extra pulls in cryptography and requests, which the custom-container path needs to sign GCP service-account JWTs. Without [build], only the curated client.train() and client.finetune() paths work.

The core package has zero required dependencies — heavy libraries (torch, transformers, peft, etc.) are only loaded when you actually call into code that needs them, and are pre-installed in most notebook environments.


Authentication

As of v0.8.x the SDK uses stateless API-key auth — no email/password prompts, no 12-hour certificate refreshes, no SSH key management on your side.

Get an API key

  1. Sign up at dataspires.com.
  2. Go to Profile → AfriLink SDK keys, click Create new key, copy the afk_live_… value (shown once).
  3. Add it to your notebook environment as AFRILINK_API_KEY.

Set the key

| Where you run | How to set the key |
| --- | --- |
| Google Colab | 🔑 sidebar → Add secret → name AFRILINK_API_KEY, paste, enable for notebook |
| Kaggle | Add-ons → Secrets → name AFRILINK_API_KEY, paste, attach to notebook |
| Local Jupyter / VS Code | os.environ["AFRILINK_API_KEY"] = "afk_live_…" before client.authenticate() |
| Anywhere | Pass directly: client.authenticate(api_key="afk_live_…") |

from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()   # resolves from secret / env / argument in that order

What happens at auth time

| Phase | What runs |
| --- | --- |
| 1. DataSpires session | The SDK exchanges your API key at api.dataspires.com for a short-lived Supabase JWT used for billing writes (sessions, deduct_credits RPC) |
| 2. A100 reachability | Silent SSH probe to the OpenToken A100 to confirm your slot is live and pull-ready |

Both phases together take ~1–2 seconds. The session keeps the JWT in memory for the kernel lifetime — no on-disk state. To rotate the key, revoke it on the dashboard and mint a new one.
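
The key lookup that precedes these two phases can be pictured with a small stand-alone sketch. It is not the SDK's implementation: `resolve_api_key` is a hypothetical helper, the argument-then-environment order is an assumption, notebook secrets are not modelled, and the prefix check relies only on the `afk_live_` key format shown above.

```python
import os

KEY_PREFIX = "afk_live_"  # key format shown in the dashboard instructions


def resolve_api_key(api_key=None, environ=None):
    """Return an API key from an explicit argument or the environment.

    Hypothetical helper for illustration only; Colab / Kaggle secrets,
    which the real SDK also consults, are left out.
    """
    environ = os.environ if environ is None else environ
    candidate = api_key or environ.get("AFRILINK_API_KEY")
    if not candidate:
        raise RuntimeError("AFRILINK_API_KEY is not set and no api_key was passed")
    if not candidate.startswith(KEY_PREFIX):
        raise ValueError("value does not look like an afk_live_ key")
    return candidate
```

Failing early on a missing or malformed key keeps the error close to its cause instead of surfacing later as a failed exchange.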


Built-in User Guide

The SDK ships with an inline reference manual you can query from any notebook cell using a slash-style syntax:

import afrilink

afrilink/help          # top-level index of all topics
afrilink/quickstart    # step-by-step getting started guide
afrilink/auth          # authentication
afrilink/finetune      # finetune job parameters & training modes
afrilink/training      # general training jobs and containers
afrilink/specs         # A100 hardware spec sheet
afrilink/datasets      # dataset formats and upload
afrilink/billing       # rates, credits, invoices

Each page prints a formatted reference to your notebook output — no internet connection required.


API Reference

AfriLinkClient

Main entry point. Created once per notebook session.

| Method | Description |
| --- | --- |
| authenticate(api_key=None) | Resolve API key (arg / env / Colab Secrets / Kaggle Secrets), exchange at api.dataspires.com, probe the A100 |
| finetune(model, training_mode, data, gpus, ...) | Create a FinetuneJob in the curated afrilink-finetune container |
| train(script, container, data, gpus, ...) | Create a TrainJob in a curated container (afrilink-yolo) |
| find_existing_image(base_image, pip_packages, apt_packages, model_source, ...) | Check the A100 + Artifact Registry for a matching cached image; returns {"image", "source", "spec_hash"} or None |
| build_image(base_image, pip_packages, apt_packages, script, model_source, ...) | Build a custom Docker image on Cloud Build, push to private Artifact Registry |
| build_and_train(...) | One-shot: cache-check → build (or skip) → run on the A100 → ephemeral cleanup |
| delete_built_image(job_id_or_image) | Remove a built image from the A100's local Docker cache (Artifact Registry copy persists) |
| download_model(job_id, local_dir) | Download the entire output/ directory from the A100 |
| upload_dataset(local_path, dataset_name) | Upload a dataset to the A100's job-scoped staging area |
| list_containers() | List available curated training containers |
| list_available_models(size=None) | List models in the registry |
| list_available_datasets() | List datasets in the registry |
| get_model_requirements(model, training_mode) | GPU/memory recommendations |
| cancel_job(job_id) | Stop + remove a running container |
| run_command(command) | Run arbitrary shell command on the A100 |

client.finetune()

job = client.finetune(
    model="qwen2.5-0.5b",         # model ID from registry
    training_mode="low",           # "low" | "medium" | "high"
    data=my_dataframe,             # pandas DataFrame, HF Dataset, or file path
    gpus=1,                        # silently clamped to 1 (A100 backend has 1 GPU)
    time_limit="01:00:00",         # max wallclock (HH:MM:SS)
    output_dir=None,               # default: /workspace/job/output
)

Training modes:

| Mode | Strategy | Quantization |
| --- | --- | --- |
| low | QLoRA (rank 8) | 4-bit |
| medium | LoRA (rank 16) | 8-bit / none |
| high | Full LoRA (rank 64) | none |

The A100 backend has 1 GPU, so distributed training is not available yet: any gpus>1 request is clamped to 1 without raising an error, and a note is printed to the console. Multi-GPU is on the roadmap.
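
To make the mode table and the GPU clamp concrete, here is a minimal sketch that mirrors them in plain Python. `TRAINING_MODES` and `normalise_request` are hypothetical names for illustration; the SDK's internal representation may differ.

```python
# Values copied from the training-mode table; the dict layout is illustrative.
TRAINING_MODES = {
    "low": {"strategy": "QLoRA", "rank": 8, "quantization": "4-bit"},
    "medium": {"strategy": "LoRA", "rank": 16, "quantization": "8-bit / none"},
    "high": {"strategy": "Full LoRA", "rank": 64, "quantization": "none"},
}

MAX_GPUS = 1  # the A100 backend exposes a single GPU


def normalise_request(training_mode, gpus):
    """Validate the mode and clamp the GPU count to what the backend offers."""
    if training_mode not in TRAINING_MODES:
        raise ValueError(f"unknown training_mode: {training_mode!r}")
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    clamped = min(gpus, MAX_GPUS)
    if clamped != gpus:
        print(f"note: gpus={gpus} clamped to {clamped}")
    return {"training_mode": training_mode, "gpus": clamped,
            **TRAINING_MODES[training_mode]}


request = normalise_request("low", gpus=4)  # prints the clamp note
```

Here `request["gpus"]` ends up as 1 and `request["rank"]` as 8, matching the "low" row.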

client.train()

job = client.train(
    script="train_yolo.py",        # local Python script to upload and run
    container="afrilink-yolo",      # pre-built container
    data="./dataset/",              # local path, archive, DataFrame, or remote path
    data_config="dataset.yaml",     # config file (e.g. YOLO dataset.yaml)
    gpus=1,
    time_limit="04:00:00",
    script_args=["--epochs", "100"],
    extra_files=["weights.pt"],
    container_env={"KEY": "val"},
)

Curated containers (container= argument):

| Name | Frameworks | Use case |
| --- | --- | --- |
| afrilink-yolo | Ultralytics, PyTorch, torchvision | Object detection, segmentation, pose estimation |
| afrilink-finetune | PyTorch, Transformers, PEFT, bitsandbytes | LLM fine-tuning (used internally by client.finetune()) |

Need a different stack? Use client.build_image() / client.build_and_train() (next section).

Data handling:

| Input type | What happens |
| --- | --- |
| Local directory | Uploaded via SCP to /mnt/data/sdk-jobs/<job_id>/input/, mounted at /workspace/job/input/ inside the container |
| .tar.gz / .zip archive | Uploaded and extracted on the A100 |
| Single file | Uploaded to job directory |
| pandas.DataFrame | Serialised to JSONL, uploaded |
| Path starting with / | Treated as a remote A100 path (no upload) |
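
The dispatch in the table can be sketched as a small classifier. This is an illustration of the rules as written, not the SDK's code: `classify_data_input` is a hypothetical helper, and detecting a DataFrame by its `to_json` method is an assumption made to avoid importing pandas.

```python
import os

ARCHIVE_SUFFIXES = (".tar.gz", ".zip")


def classify_data_input(data):
    """Map a data= argument to one of the input types in the table."""
    if hasattr(data, "to_json"):
        return "dataframe"            # serialised to JSONL, then uploaded
    if not isinstance(data, (str, os.PathLike)):
        raise TypeError(f"unsupported data input: {type(data).__name__}")
    path = os.fspath(data)
    if path.startswith("/"):
        return "remote"               # treated as an A100 path, no upload
    if path.endswith(ARCHIVE_SUFFIXES):
        return "archive"              # uploaded and extracted on the A100
    if os.path.isdir(path):
        return "directory"            # uploaded via SCP
    return "file"                     # single file, uploaded to the job directory
```

Read literally, the last table row means an absolute local path such as /home/me/data would be taken for a remote path, so relative paths are the safer choice for local data.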

TrainJob / FinetuneJob

Returned by client.train() / client.finetune().

| Method / Property | Description |
| --- | --- |
| run(wait=True) | Submit to the A100. wait=True polls until done. |
| cancel() | Stop + remove the running container |
| get_logs(tail=100) | Fetch recent log lines |
| estimated_cost_usd() | Estimate max cost based on GPUs and time limit |
| status | Current status string |
| job_id | AfriLink job ID (8-char UUID prefix) |
| container_id | Docker container ID on the A100 (set after run()) |

run() returns a dict:

{
    "job_id": "a1b2c3d4",
    "container_id": "d9072f194771...",
    "status": "completed",        # or "submitted" / "failed" / "cancelled"
    "output_dir": "/mnt/data/sdk-jobs/a1b2c3d4/output",
    "billing": {
        "total_gpu_minutes": 5.0,
        "total_cost_usd": 0.1667,
        "rate_per_gpu_hour": 2.00,
        "billing_source": "wall-clock-docker",
    },
}
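
A short sketch of consuming that dict: it branches on the status and cross-checks the billed amount against minutes times rate. `summarise_run` is a hypothetical helper, and the assumption that a non-completed result may lack a "billing" key is mine, which is why billing is only read on the completed path.

```python
def summarise_run(result):
    """Return a one-line summary of a run() result shaped like the dict above."""
    status = result["status"]
    if status != "completed":
        return f"job {result['job_id']} ended with status {status!r}"
    billing = result["billing"]
    # Sanity check: GPU-minutes converted to hours, times the hourly rate.
    expected = billing["total_gpu_minutes"] / 60 * billing["rate_per_gpu_hour"]
    return (
        f"job {result['job_id']} completed: "
        f"{billing['total_gpu_minutes']:.1f} GPU-min, "
        f"${billing['total_cost_usd']:.4f} billed "
        f"(${expected:.4f} expected at ${billing['rate_per_gpu_hour']:.2f}/GPU-hour)"
    )


example = {
    "job_id": "a1b2c3d4",
    "status": "completed",
    "billing": {
        "total_gpu_minutes": 5.0,
        "total_cost_usd": 0.1667,
        "rate_per_gpu_hour": 2.00,
    },
}
summary = summarise_run(example)
print(summary)
```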

Custom Containers — client.build_image() / client.build_and_train()

If the curated containers don't have the framework, version, or model you need, define it yourself. Cloud Build builds the image, Artifact Registry hosts it, the A100 runs it ephemerally.

Define the spec

spec = dict(
    base_image="pytorch",                                # preset name, or full image:tag
    pip_packages=["transformers>=4.45", "accelerate>=0.34"],
    apt_packages=["git"],
    pip_index_url=None,                                  # optional alternative index
    pip_extra_index_urls=[],
    model_source={                                       # fetched at job runtime
        "kind": "huggingface",                           # huggingface | url | git | gs | s3
        "id": "Qwen/Qwen2.5-0.5B-Instruct",
        "revision": "main",
    },
    env={"WANDB_PROJECT": "demo"},                       # baked into image (non-secret)
)

Presets for base_image:

| Preset | Resolves to | Notes |
| --- | --- | --- |
| pytorch | pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime | GPU default |
| pytorch-2.4 | pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime | |
| pytorch-cpu | pytorch/pytorch:2.5.0-cpu-runtime | CPU-only build (smaller, no GPU at runtime) |
| cuda-12.4 | nvidia/cuda:12.4.0-runtime-ubuntu22.04 | bring-your-own-Python |
| ultralytics | ultralytics/ultralytics:latest | YOLOv8 ready |

You can also pass any full image:tag you want.

Model sources (model_source=):

| kind | Required fields | Example |
| --- | --- | --- |
| "huggingface" | id, optional revision, subfolder | {"kind":"huggingface","id":"meta-llama/Llama-3.2-1B","revision":"main"} |
| "url" | url | {"kind":"url","url":"https://example.com/weights.tar.gz"} |
| "git" | url, optional revision | {"kind":"git","url":"https://github.com/openai/whisper.git"} |
| "gs" | uri | {"kind":"gs","uri":"gs://bucket/checkpoints/"} |
| "s3" | uri | {"kind":"s3","uri":"s3://bucket/checkpoints/"} |
| (omitted) | | Your script handles model loading itself |
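
Before submitting a build, a spec can be checked locally against the required fields listed in the table. The sketch below is a hypothetical pre-flight check, not part of the SDK; `check_model_source` and its error strings are illustrative.

```python
# Required fields per kind, taken from the table above.
REQUIRED_FIELDS = {
    "huggingface": ("id",),
    "url": ("url",),
    "git": ("url",),
    "gs": ("uri",),
    "s3": ("uri",),
}


def check_model_source(model_source):
    """Return a list of problems; an empty list means the dict looks usable."""
    if model_source is None:
        return []  # omitted: the training script loads the model itself
    kind = model_source.get("kind")
    if kind not in REQUIRED_FIELDS:
        return [f"unknown kind: {kind!r}"]
    return [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS[kind]
        if not model_source.get(name)
    ]
```

For example, `check_model_source({"kind": "gs"})` reports the missing `uri`, while the Hugging Face example from the table passes with no problems.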

The model is fetched at container runtime, not baked at build time — that keeps user images thin (~2 GB instead of 7+ GB) and means you can iterate on dependencies without re-shipping weights. The downloaded model lands at /workspace/models/<sanitised_id>/ and the path is exposed via MODELS_DIR env var to your script.

For gated HF models (Llama, Gemma, etc.): add HUGGINGFACE_TOKEN as a notebook secret and the SDK forwards it to the container automatically.

Check the cache before building

hit = client.find_existing_image(
    base_image="pytorch",
    pip_packages=["transformers>=4.45", "accelerate>=0.34"],
    apt_packages=["git"],
    model_source={"kind": "huggingface", "id": "Qwen/Qwen2.5-0.5B-Instruct"},
)
# hit is None   → no match, will build
# hit == {"image": "...", "source": "a100" | "artifact_registry", "spec_hash": "..."}

The hash includes only the inputs that change what gets baked into the image:

| Included | Excluded |
| --- | --- |
| base_image (after preset resolution) | script / script_content (uploaded but not baked) |
| pip_packages (sorted, exact strings) | env (runtime injection, not bake-time) |
| apt_packages (sorted, exact strings) | extra_files |
| pip_index_url / pip_extra_index_urls | user_id / job_id |
| model_source (kind + id/url/uri + revision + subfolder) | Cloud Build machine_type / build_timeout |

Two specs that would produce a runtime-equivalent image hash to the same value, so the second is an instant cache hit. A version bump on any pip package, an extra apt dependency, or a different model revision produces a fresh hash and therefore a fresh build.
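
The idea of hashing only the bake-time inputs can be sketched with the standard library. This shows the general technique, not the SDK's actual algorithm: the canonical JSON layout, the SHA-256 choice, and the 16-character truncation are all assumptions, and preset resolution of base_image is not modelled.

```python
import hashlib
import json


def spec_hash(base_image, pip_packages=(), apt_packages=(),
              pip_index_url=None, pip_extra_index_urls=(), model_source=None):
    """Hash the inputs that affect the built image, ignoring list order."""
    canonical = {
        "base_image": base_image,
        "pip_packages": sorted(pip_packages),   # order must not matter
        "apt_packages": sorted(apt_packages),
        "pip_index_url": pip_index_url,
        "pip_extra_index_urls": list(pip_extra_index_urls),
        "model_source": model_source or {},
    }
    # sort_keys gives a stable serialisation regardless of dict insertion order.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


a = spec_hash("pytorch", ["transformers>=4.45", "accelerate>=0.34"], ["git"])
b = spec_hash("pytorch", ["accelerate>=0.34", "transformers>=4.45"], ["git"])
c = spec_hash("pytorch", ["transformers>=4.46", "accelerate>=0.34"], ["git"])
```

Here `a` and `b` are equal because the package lists differ only in order, while `c` differs because one version string changed. Script, env, and job identifiers are not parameters at all, so they cannot influence the result.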

Build and run

# Build only — useful if you want to inspect the image or run it multiple ways
build = client.build_image(**spec, script="my_train.py")
# build["image"] = "europe-west4-docker.pkg.dev/.../<job>:latest"
# build["status"] = "success"
# build["build_id"] = "<...>"

# Build (or skip if cached) + run + cleanup, all in one call
result = client.build_and_train(
    **spec,
    script="my_train.py",
    data="./train.jsonl",
    gpus=1,
    time_limit_hours=1.0,
    reuse_existing_image=True,   # default; False forces a fresh build
    cleanup_image_after=True,    # default; False keeps the A100's local layer
)
# result["build"]["status"] = "success" (fresh) or "cached" (reused)
# result["run"]["status"]   = "completed"
# result["run"]["output_dir"] = "/mnt/data/sdk-jobs/<job_id>/output"

client.download_model(result["run"]["job_id"], "./local-out")

Container lifecycle

| Where | Lifetime |
| --- | --- |
| A100 disk (pulled image + container) | Ephemeral: removed at end of build_and_train() unless cleanup_image_after=False |
| A100 disk (running container) | Removed at job end always |
| Artifact Registry (image) | Persistent: cache hits read from here on subsequent runs |
| Your notebook (downloaded output) | Yours to manage |

To delete an image from Artifact Registry too: gcloud artifacts docker images delete <uri>.


Working With Your Model

Once you've downloaded the adapter, the directory is ready for standard Hugging Face tooling.

GGUF Conversion & Ollama

Convert your adapter to GGUF format for use with Ollama or llama.cpp:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Merge adapter into base model
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
model = PeftModel.from_pretrained(base, "./my-model")
merged = model.merge_and_unload()
merged.save_pretrained("./my-model-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B").save_pretrained("./my-model-merged")

# 2. Convert to GGUF (requires llama.cpp built locally)
# python convert_hf_to_gguf.py ./my-model-merged --outfile my-model.gguf --outtype f16

# 3. Quantize (optional, 4-bit)
# ./llama-quantize my-model.gguf my-model-q4.gguf Q4_K_M

# 4. Run with Ollama
# Create a Modelfile:  FROM ./my-model-q4.gguf
# ollama create my-model -f Modelfile
# ollama run my-model

Publishing to Hugging Face Hub

from huggingface_hub import HfApi

api = HfApi(token="hf_...")
repo_id = "your-username/my-finetuned-model"
api.create_repo(repo_id, exist_ok=True)

# Option A — adapter only (small, loads on top of base model)
api.upload_folder(folder_path="./my-model", repo_id=repo_id)

# Option B — full merged model
api.upload_folder(folder_path="./my-model-merged", repo_id=repo_id)

# Option C — GGUF file
api.upload_file(path_or_fileobj="./my-model-q4.gguf",
                path_in_repo="my-model-q4.gguf",
                repo_id=repo_id)

Hardware Specs

OpenToken A100 80 GB (opentoken.global) — the dedicated GPU node the SDK runs on:

| Component | Specification |
| --- | --- |
| GPU | 1× NVIDIA A100 PCIe |
| GPU memory | 80 GB HBM2e |
| FP64 performance | 9.7 TFLOPS |
| FP32 performance | 19.5 TFLOPS |
| TensorFloat-32 | 156 TFLOPS |
| BF16 / FP16 (tensor cores) | 312 TFLOPS |
| CPU cores | 12 |
| System RAM | 82 GB |
| Job runtime | Containerised (Docker, CUDA-aware via --gpus all) |

Per-job memory guide for 1× A100 80 GB:

| Model size | Training mode | Fits on 1 GPU? |
| --- | --- | --- |
| 0.5B – 1B | low (QLoRA 4-bit) | yes |
| 3B – 7B | low / medium | yes |
| 7B – 13B | low (QLoRA) | yes |
| 13B | high (bf16) | tight, checkpoint-heavy |
| 30B+ | low (QLoRA) | marginal |
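
As a rough companion to the guide above, the memory taken by the weights alone is parameters times bits per parameter. The helper below is a back-of-envelope sketch with a hypothetical name; it ignores activations, gradients, optimizer state, and adapter weights, so real training usage is higher.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_param / 8


seven_b_4bit = weight_memory_gb(7, 4)       # 3.5 GB of weights
thirteen_b_bf16 = weight_memory_gb(13, 16)  # 26.0 GB of weights
thirty_b_4bit = weight_memory_gb(30, 4)     # 15.0 GB of weights
```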

Billing

$2.00 / GPU-hour, charged per completed GPU-minute (minimum 1 minute). Credits are deducted automatically from your DataSpires balance via the deduct_credits Supabase RPC at job end. Invoices appear on the DataSpires Billing dashboard in real time.

Build-time minutes on Cloud Build are absorbed by the platform — you only pay GPU-time.
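
The billing rule can be worked through in a few lines. This is a sketch under stated assumptions, not the billing backend: `job_cost_usd` and `parse_time_limit` are hypothetical helpers, and rounding down to whole minutes is my reading of "per completed GPU-minute".

```python
RATE_PER_GPU_HOUR = 2.00  # USD, from the rate quoted above


def parse_time_limit(hhmmss):
    """Convert an HH:MM:SS time limit into seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_cost_usd(elapsed_seconds, gpus=1):
    """Cost for a job: whole minutes only, with a one-minute minimum."""
    billed_minutes = max(1, int(elapsed_seconds // 60))
    return round(billed_minutes * gpus * RATE_PER_GPU_HOUR / 60, 4)


five_minutes = job_cost_usd(300)                            # 5 GPU-minutes
short_job = job_cost_usd(30)                                # billed as 1 minute
worst_case = job_cost_usd(parse_time_limit("01:00:00"))     # full time limit used
```

Passing the time limit through `job_cost_usd` gives an upper bound for a job, in the same spirit as estimated_cost_usd().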


Model & Dataset Registry

client.list_available_models()                      # all models
client.list_available_models(size="tiny")           # tiny | small | medium | large
client.list_available_datasets()
client.get_model_requirements("qwen2.5-0.5b", "low")

Curated models:

| ID | Name | Type | Params | Min VRAM |
| --- | --- | --- | --- | --- |
| qwen2.5-0.5b | Qwen 2.5 0.5B | text | 0.5B | 4 GB |
| gemma-3-270m | Gemma 3 270M | text | 0.27B | 2 GB |
| llama-3.2-1b | Llama 3.2 1B | text | 1.0B | 4 GB |
| deepseek-r1-1.5b | DeepSeek R1 1.5B | text | 1.5B | 6 GB |
| ministral-3b | Ministral 3B | text | 3.3B | 8 GB |
| florence-2-base | Florence 2 Base | vision | 0.23B | 4 GB |
| smolvlm-256m | SmolVLM 256M | vision | 0.26B | 2 GB |
| moondream2 | Moondream 2 | vision | 1.9B | 8 GB |
| internvl2-1b | InternVL2 1B | vision | 1.0B | 4 GB |
| llava-1.5-7b | LLaVA 1.5 7B | vision | 7.0B | 16 GB |

For anything outside this registry, use client.build_image() / client.build_and_train() with model_source=.


Architecture

Notebook (Colab / Kaggle / Local)         api.dataspires.com (Cloudflare Worker)
+---------------------+                   +---------------------------+
| AfriLink SDK        | --- POST -----→   | exchange afk_live_… for:  |
|  client.authenticate()                  |  - Supabase JWT (billing) |
|                     | ←-- response ---  |  - A100 SSH key (in-mem)  |
+---------------------+                   |  - GCP SA key (build)     |
     |        ↓                           |  - GHCR PAT (image pulls) |
     |   (in-memory state)                +---------------------------+
     |
     ↓
+---------------------+        SSH        +---------------------+
| docker_runner.py    | ----------------→ | OpenToken A100 80GB |
|  - prepare_job_dir  |   /mnt/data/      |  Docker daemon      |
|  - upload via SCP   |    sdk-jobs/      |  (containerd at     |
|  - docker run --gpus=all                |   /mnt/data/)       |
|  - docker inspect (poll)                +---------------------+
+---------------------+
     |
     ↓ build path
+---------------------+   Cloud Build    +---------------------+
| build.py            | --→ submit job → | europe-west4-       |
|  - generate Docker- |     (anadrome)   | docker.pkg.dev/...  |
|    file from spec   |                  |  afrilink-user-     |
|  - tar build context|                  |  images/<user>/<job>|
|  - upload to GCS    |                  +---------------------+
+---------------------+                            |
                                                   ↓ docker pull
                                              (A100 fetches image,
                                               runs it, deletes
                                               local layer at end)

The A100 backend, the Cloudflare Worker, the Cloud Build pipeline, the Artifact Registry, the Supabase backend — all of it lives behind client.authenticate(). As a user you set one notebook secret and get on with training.


License

MIT
