Skip to main content

Config-driven LLM fine-tuning with safety evaluation, EU AI Act compliance, 6 alignment methods, and one-command bundled quickstart templates.

Project description

ForgeLM

PyPI License Python 3.10+ CI

Runs on Linux, macOS, and Windows. The base pip install forgelm command is identical on every platform — no Linux-specific install path. The PyPI metadata's Operating System :: OS Independent classifier is backed by a release-tag CI matrix that builds the wheel on Linux and re-installs + tests it across 3 operating systems × 4 Python versions = 12 combinations (Ubuntu, macOS, Windows × Python 3.10, 3.11, 3.12, 3.13) — every combo must pass before PyPI publish runs. See .github/workflows/publish.yml.

Two optional extras — qlora (bitsandbytes-backed 4-bit quantization) and unsloth (fused-kernel SFT acceleration) — depend on Linux-only upstream wheels; on macOS/Windows pip install "forgelm[qlora]" and pip install "forgelm[unsloth]" complete successfully but the underlying packages are skipped via a sys_platform == 'linux' marker, so the corresponding functionality is unavailable. All other extras (eval, tracking, distributed, ingestion, ingestion-scale, ingestion-pii-ml, export, chat, merging) work cross-platform.

ForgeLM is a config-driven, enterprise-ready LLM fine-tuning toolkit. It supports the full modern post-training stack — from supervised fine-tuning to preference alignment to reasoning RL — with integrated safety evaluation, EU AI Act compliance, and CI/CD-native design.

Features

Training

  • 6 Trainer Types: SFT, DPO, SimPO, KTO, ORPO, GRPO — the complete alignment stack
  • Unsloth & Transformers: 2-5x faster training with unsloth backend, or standard transformers
  • 4-Bit QLoRA & DoRA: NF4 quantization with LoRA, DoRA, PiSSA, and rsLoRA support
  • GaLore: Optimizer-level memory optimization — full-parameter training via gradient low-rank projection (alternative to LoRA)
  • Long-Context Training: RoPE scaling, NEFTune noise injection, sliding window attention, sample packing
  • Multi-Dataset Training: Mix multiple datasets with configurable ratios
  • Synthetic Data Pipeline: Teacher-to-student distillation with --generate-data CLI flag
  • DeepSpeed & FSDP: Multi-GPU distributed training with ZeRO-2/3 presets
  • MoE Support: Fine-tune Mixture of Experts models (Qwen3, Mixtral, DeepSeek)
  • GPU Cost Estimation: Auto-detection for 16 GPU models with per-run cost tracking

Evaluation & Safety

  • Automated Benchmarking: Post-training evaluation via lm-evaluation-harness
  • Safety Evaluation: Llama Guard classifier with confidence-weighted scoring, S1-S14 harm categories, severity levels, cross-run trend tracking, and auto-revert
  • LLM-as-Judge: API-based (OpenAI) or local model scoring for quality assessment
  • Auto-Revert: Opt-in (evaluation.auto_revert: true) — automatically discards models that fail loss, benchmark, or safety thresholds before artifacts are written

Document Ingestion & Data Audit (v0.5.0 — Phases 11 + 11.5 + 12 + 12.5 consolidated)

  • Multi-Format Ingestion: forgelm ingest ./policies/ --recursive --output data/policies.jsonl — turns raw PDF / DOCX / EPUB / TXT / Markdown into the SFT-ready JSONL the trainer accepts. Optional dep: pip install forgelm[ingestion]. Includes a Markdown-aware splitter (--strategy markdown) and DOCX table preservation in Markdown table syntax.
  • Chunking Strategies: paragraph (default; preserves boundaries), sliding (fixed window with overlap), or markdown (heading-aware splitter that keeps fenced code blocks atomic and inlines a heading breadcrumb at the top of each chunk for SFT context). A semantic strategy is reserved for a follow-up phase — the implementation in forgelm.ingestion raises NotImplementedError today and the CLI hides it from --strategy choices. Token-aware mode: --chunk-tokens 1024 --tokenizer Qwen/Qwen2.5-7B-Instruct sizes chunks against your model's actual vocabulary.
  • PII Masking on Ingest: --pii-mask redacts emails, phones, credit cards (Luhn-validated), IBAN, and national IDs (TR / DE / FR / US-SSN) before chunks land in the JSONL.
  • PDF Page Header/Footer Dedup: Lines that recur on ≥ 70 % of PDF pages (watermarks, page numbers, copyright lines) are stripped automatically — the audit's near-duplicate counts stop misfiring on long policy / book PDFs.
  • Dataset Audit: forgelm audit data/sft.jsonl --output ./audit/ — produces data_audit_report.json with sample count, length distribution, top-3 language detection, LSH-banded near-duplicate rate (O(n × k) typical case, exact recall at the default Hamming threshold; optional MinHash LSH via --dedup-method minhash for >50K-row corpora), cross-split leakage check, null/empty rate, and PII flag counts with severity tiers (critical / high / medium / low + worst-tier verdict). Always-on secrets/credential scan covers the nine families enumerated in forgelm.data_audit.SECRET_TYPES — AWS access keys, GitHub tokens, Slack tokens, OpenAI API keys, Google API keys, JWTs, full OpenSSH private-key blocks, full PGP private-key blocks, and Azure storage connection strings — and an opt-in heuristic quality filter (--quality-filter). CPU-only; streaming JSONL reader keeps memory bounded on multi-million-row splits; feeds EU AI Act Article 10 governance artifact automatically when present at training time. Legacy --data-audit flag still works as a deprecation alias.
  • Secrets-Aware Ingest: forgelm ingest … --secrets-mask scrubs credentials before chunks land in the JSONL — fine-tuning on text containing real API keys memorises them at training time. Pairs with --pii-mask; secrets run first so combined detectors don't double-count overlapping spans. --all-mask is a one-flag shorthand for both.
  • Optional ML-NER PII: forgelm audit --pii-ml [--pii-ml-language LANG] layers Presidio NER on top of the regex detector via the optional [ingestion-pii-ml] extra plus a separate python -m spacy download en_core_web_lg step (or the matching model for the chosen language). Adds person / organization / location categories into the same pii_summary / pii_severity blocks under disjoint category names.
  • Croissant 1.0 dataset card: forgelm audit --croissant emits a Google Croissant 1.0 dataset card under the report's croissant key so the same JSON file doubles as both the EU AI Act Article 10 governance artifact and a Croissant-consumer dataset card.
  • Wizard "audit first": when the wizard resolves a JSONL (typed or produced by forgelm ingest) it offers to run forgelm audit inline and prints the verdict before continuing — closes the BYOD audit loop end-to-end.

Quickstart Layer (v0.4.5)

  • One-Command Templates: forgelm quickstart customer-support — 5 bundled templates (SFT customer-support, code-assistant, medical-qa-tr, domain-expert, GRPO grpo-math). Auto-downsizes models on small GPUs.
  • Conservative Defaults: Every template ships QLoRA 4-bit, rank=8, batch=1, gradient checkpointing on — designed to run on a single 12 GB GPU.
  • Wizard Integration: forgelm --wizard opens with "Start from a template?" — same code paths, same YAML schema as a hand-written config.

Post-Training (v0.4.0)

  • Interactive Chat: forgelm chat ./model — streaming REPL with /reset, /save, /temperature, /system commands; optional Llama Guard safety routing
  • GGUF Export: forgelm export ./model --quant q4_k_m — wraps llama-cpp-python converter; 6 quant levels; SHA-256 appended to integrity manifest
  • Deployment Configs: forgelm deploy ./model --target ollama|vllm|tgi|hf-endpoints — generates ready-to-use config files; does not start the server
  • VRAM Fit Check: forgelm --config my.yaml --fit-check — pre-flight memory estimator; FITS / TIGHT / OOM / UNKNOWN verdict with recommendations

Enterprise & MLOps

  • Config-Driven: Declarative YAML — built for CI/CD pipelines, not notebooks
  • EU AI Act Compliance: Auto-generated audit trails, data provenance (SHA-256), training manifests
  • ISO 27001 / SOC 2 Type II Alignment: Software cannot be ISO/SOC 2 certified — only organisations can — but ForgeLM produces the audit-trail, change-management, data-lineage, and supply-chain evidence the deployer's auditor explicitly asks for. CycloneDX 1.5 SBOM per release, pip-audit nightly, bandit CI, append-only HMAC-chained audit log (HMAC integrity is active only when FORGELM_AUDIT_SECRET is set + KMS-managed + rotated between output-dir lifecycles — see docs/qms/access_control.md §3.4), Article 14 staging gate, Article 15/17 GDPR tooling. See docs/guides/iso_soc2_deployer_guide.md for the deployer audit cookbook + 93-control coverage map.
  • Library API: from forgelm import ForgeTrainer, audit_dataset, verify_audit_log, verify_annex_iv_artifact, verify_gguf, mask_pii, mask_secrets, ... — every CLI surface has a stable Python entry point under forgelm.__all__, version-pinned via forgelm.__api_version__ (decoupled from __version__). Run python -c "import forgelm; print(sorted(forgelm.__all__))" for the full surface.
  • GDPR Tooling: forgelm purge --row-id <id> --corpus <path> (Article 17 right-to-erasure: redacts data + adapters in place; the audit log itself is append-only — Article 17(3)(b) preservation — and the erasure is recorded by appending six compensating events (data.erasure_requested, data.erasure_completed, etc.) so the hash chain remains intact) and forgelm reverse-pii --query <fragment> (Article 15 right-of-access: locates PII matches across artefacts without re-loading raw data).
  • Operational Subcommands: forgelm doctor (env / GPU / CUDA / extras pre-flight), forgelm cache-models + forgelm cache-tasks (air-gap pre-cache for HF models + lm-eval tasks), forgelm safety-eval (standalone Llama Guard run), forgelm verify-audit / verify-annex-iv / verify-gguf (compliance + artefact verification toolbelt), forgelm approvals (list pending Article 14 staging-gate runs).
  • Docker: Official Dockerfile and docker-compose for portable deployment
  • Offline / Air-Gapped: Full operation without internet for regulated industries
  • JSON Output: Machine-readable results with --output-format json for pipeline integration; the per-subcommand schema is locked in docs/usermanuals/en/reference/json-output.md.
  • Webhook Notifications: Slack/Teams alerts on training start, success, failure, reverted, or awaiting_approval (5-event vocabulary, paired with audit events)
  • W&B / MLflow / TensorBoard: Flexible experiment tracking via report_to
  • Model Card Generation: Auto-generated HF-compatible model cards with metrics and benchmarks
  • Model Merging: TIES, DARE, SLERP, linear merge of multiple adapters via --merge
  • Supply-Chain Security: CycloneDX 1.5 SBOM per release-tag matrix combo (12 SBOMs per tag), pip-audit + bandit nightly + on-tag, gitleaks pre-commit. See docs/reference/supply_chain_security.md.

Quick Start

# Install
pip install -e .
pip install -e ".[export]"   # GGUF export (optional, non-Windows)

# Fastest path: pick a bundled template (v0.4.5+)
forgelm quickstart --list
forgelm quickstart customer-support           # render config + train + chat
forgelm quickstart code-assistant --dry-run   # render config only
forgelm quickstart medical-qa-tr --model your-org/your-model  # override

# Have raw docs? Ingest them first (v0.5.0; supports token-aware sizing)
pip install -e ".[ingestion]"
forgelm ingest ./policies/ --recursive --output data/policies.jsonl
forgelm audit data/policies.jsonl --output ./audit/   # `forgelm --data-audit ...` still works as legacy alias

# Or generate config interactively
forgelm --wizard

# Validate without training
forgelm --config my_config.yaml --dry-run

# Check VRAM before a long run
forgelm --config my_config.yaml --fit-check

# Train
forgelm --config my_config.yaml

# After training: chat, export, deploy
forgelm chat ./checkpoints/final_model
forgelm export ./checkpoints/final_model --output model.gguf --quant q4_k_m
forgelm deploy ./checkpoints/final_model --target ollama --output ./Modelfile

See the Quick Start Guide for a complete walkthrough.


Guides

Guide Description
Quick Start First fine-tuned model in 5 minutes
Document Ingestion (Türkçe) Raw PDF/DOCX/EPUB → SFT-ready JSONL
Dataset Audit (Türkçe) Length, language, dedup, cross-split leakage, PII
Alignment (DPO/SimPO/KTO/GRPO) Complete post-training stack
CI/CD Pipeline Integration GitHub Actions, GitLab CI, Docker
Enterprise Deployment Docker, air-gapped, multi-GPU
Safety & Compliance EU AI Act, safety evaluation
Distributed Training DeepSpeed ZeRO, FSDP, multi-node
Troubleshooting & FAQ Common issues and solutions

Reference Documentation

  1. Architecture Overview (Türkçe)
  2. Configuration Guide (Türkçe)
  3. Usage & Execution (Türkçe)
  4. Data Preparation Format (Türkçe)
  5. Product Strategy (Türkçe)
  6. Roadmap (Türkçe)

Notebooks

Each notebook is runnable in Colab with a free T4 GPU. Data preparation runs CPU-only.

Getting started

Alignment methods (post-SFT preference / RL)

Advanced training

Post-training & safety


Installation

# From PyPI
pip install forgelm

# Or from source
git clone https://github.com/cemililik/ForgeLM.git
cd ForgeLM
pip install -e .

Prerequisites: Python 3.10+ and torch>=2.2.0. The PyTorch Foundation publishes torch wheels for Linux × {x86_64, aarch64}, macOS × {x86_64 ≤ 2.2.2, arm64 (Apple Silicon)}, and Windows × x86_64; ForgeLM's dependency floor is set to the highest torch minor available on every supported platform. If pip install forgelm resolves to an older ForgeLM version, run pip show torch to check the installed torch — pip's resolver silently downgrades ForgeLM when the local torch can't satisfy a newer floor. See docs/usermanuals/en/getting-started/installation.md for platform-specific notes.

Optional Dependencies

From PyPI (most users):

pip install "forgelm[qlora]"             # 4-bit quantization (Linux)
pip install "forgelm[unsloth]"           # Unsloth backend (Linux)
pip install "forgelm[eval]"              # lm-evaluation-harness benchmarks
pip install "forgelm[tracking]"          # W&B experiment tracking
pip install "forgelm[distributed]"       # DeepSpeed multi-GPU
pip install "forgelm[merging]"           # mergekit model merging
pip install "forgelm[ingestion]"         # PDF/DOCX/EPUB/Markdown → JSONL + langdetect + xxhash
pip install "forgelm[ingestion-scale]"   # MinHash LSH dedup (datasketch) for >50K-row corpora
pip install "forgelm[ingestion-pii-ml]"  # Presidio ML-NER for person/organization/location PII (Phase 12.5; ALSO needs `python -m spacy download en_core_web_lg`)
pip install "forgelm[export]"            # GGUF export via llama-cpp-python
pip install "forgelm[chat]"              # Rich terminal rendering for `forgelm chat`

From a local clone (contributors):

pip install -e ".[ingestion,eval,tracking]"  # Editable install, multiple extras
pip install -e ".[dev]"                      # pytest, ruff (development)

Docker

# Build (with benchmarking support)
docker build -t forgelm --build-arg INSTALL_EVAL=true .

# Train
docker run --gpus all \
  -v $(pwd)/my_config.yaml:/workspace/config.yaml \
  -v $(pwd)/output:/workspace/output \
  forgelm --config /workspace/config.yaml

# Multi-GPU
docker run --gpus all --shm-size=16g \
  forgelm:latest \
  torchrun --nproc_per_node=4 -m forgelm.cli --config /workspace/config.yaml

CLI

forgelm --config job.yaml                    # Train
forgelm --config job.yaml --dry-run          # Validate config
forgelm --config job.yaml --output-format json  # JSON output for CI/CD
forgelm --config job.yaml --resume           # Resume from checkpoint
forgelm --config job.yaml --offline          # Air-gapped mode
forgelm --config job.yaml -q                 # Quiet mode (warnings only)
forgelm --config job.yaml --benchmark-only /path/to/model  # Evaluate only
forgelm --config job.yaml --merge            # Merge models
forgelm --config job.yaml --compliance-export ./audit/  # Export audit artifacts
forgelm --wizard                             # Interactive config generator
forgelm --version                            # Show version

Project Structure

forgelm/
├── cli/              # CLI package (split out of legacy single-file cli.py)
│   ├── __main__.py   # `python -m forgelm.cli` entrypoint
│   ├── _parser.py    # argparse wiring for the top-level + every subcommand
│   ├── _dispatch.py  # subcommand router; one entry per *_subcommand* below
│   ├── _exit_codes.py# 0/1/2/3/4/5 — public CLI contract
│   └── subcommands/  # _audit, _ingest, _chat, _export, _deploy, _quickstart,
│                     # _doctor, _cache, _purge, _reverse_pii, _approve,
│                     # _approvals, _safety_eval, _verify_audit,
│                     # _verify_annex_iv, _verify_gguf, ...
├── data_audit/       # Audit package (split out of legacy single-file data_audit.py)
│   ├── _orchestrator.py   # `run_audit` entry point + parallel split walker
│   ├── _aggregator.py     # per-split metric aggregator
│   ├── _streaming.py      # streaming JSONL reader
│   ├── _simhash.py        # 64-bit simhash + LSH banding (default dedup)
│   ├── _minhash.py        # MinHash LSH dedup (`[ingestion-scale]` extra)
│   ├── _pii_regex.py      # PII regex engine (Luhn, IBAN, TC Kimlik validators)
│   ├── _pii_ml.py         # Presidio NER adapter (`[ingestion-pii-ml]` extra)
│   ├── _secrets.py        # 9-family credential scan (always-on)
│   ├── _quality.py        # Gopher / C4 / RefinedWeb-style heuristics
│   ├── _croissant.py      # Croissant 1.0 dataset card emitter
│   └── _summary.py        # `summarize_report` truncation policy
├── config.py         # Pydantic config (19 models: training, evaluation, distributed, ...)
├── data.py           # Dataset loading (SFT, DPO, KTO, GRPO formats + multi-dataset)
├── ingestion.py      # Raw docs → SFT JSONL (PDF/DOCX/EPUB/TXT/Markdown + chunking + masking) — `forgelm ingest`
├── model.py          # Model loading (transformers, unsloth, MoE, PEFT)
├── trainer.py        # Training orchestration (6 trainer types via TRL, GaLore, long-context)
├── inference.py      # Shared inference primitives (load, generate, stream, adaptive sampling)
├── chat.py           # Interactive terminal REPL with streaming and slash commands
├── export.py         # GGUF export via llama-cpp-python
├── fit_check.py      # Pre-flight VRAM estimator (FITS / TIGHT / OOM / UNKNOWN)
├── deploy.py         # Deployment config generator (Ollama, vLLM, TGI, HF Endpoints)
├── results.py        # TrainResult dataclass (AuditReport lives in data_audit/, IngestionResult in ingestion.py)
├── benchmark.py      # lm-evaluation-harness integration
├── safety.py         # Post-training safety evaluation (Llama Guard)
├── judge.py          # LLM-as-Judge evaluation (API + local)
├── compliance.py     # EU AI Act compliance export, audit log + HMAC chain, GDPR purge / reverse-pii primitives
├── model_card.py     # Auto-generated HF model cards
├── merging.py        # Model merging (TIES, DARE, SLERP, linear)
├── synthetic.py      # Synthetic data generation (teacher→student distillation)
├── grpo_rewards.py   # Built-in GRPO reward shapers (format / length fallbacks)
├── quickstart.py     # `forgelm quickstart <template>` — bundled SFT / code / domain templates
├── wizard/           # Interactive configuration wizard sub-package (Phase 22 split)
│                     # — ships ``_defaults.json`` (schema-derived defaults SOT, F1)
├── webhook.py        # Slack/Teams webhook notifications (5-event vocabulary)
├── _http.py          # Single chokepoint for outbound HTTP (SSRF guard, timeout floor, secret masking)
├── _version.py       # `__version__` + `__api_version__` constants
└── utils.py          # Authentication & checkpoint management

configs/deepspeed/    # ZeRO-2, ZeRO-3, ZeRO-3+Offload presets
forgelm/safety_prompts/  # 140 adversarial prompts × 6 categories for safety evaluation
forgelm/templates/    # Quickstart templates (SFT, code-assistant, domain-expert, medical-qa-tr, grpo-math)
notebooks/            # Colab-ready Jupyter notebooks (data curation, SFT, DPO, KTO, GRPO, ...)
tests/                # Test suite — run with `pytest tests/` (count grows over time; CI gates on green-on-every-commit)
tools/                # CI guards (check_anchor_resolution, check_bilingual_parity,
                      # check_cli_help_consistency, check_field_descriptions,
                      # check_pip_audit, check_bandit, check_site_claims,
                      # generate_sbom, build_usermanuals)
docs/guides/          # Quickstart, ingestion, audit, alignment, CI/CD, enterprise, safety, ISO/SOC 2 guides
docs/usermanuals/{en,tr}/  # 9-section user manual: compliance, concepts, data,
                           # deployment, evaluation, getting-started, operations,
                           # reference, training

Pro CLI (planned — v0.6.0-pro)

A paid tier built on top of the OSS core. Every Pro feature ships with a documented OSS workaround — Pro is for convenience and scale, not gatekeeping.

  • forgelm pro dashboard — local-first experiment browser (run list, metric comparisons, config diffs, artifact browser) backed by your existing checkpoints/ and audit_log.jsonl
  • HPO via Optuna — hpo: config block spawns N subordinate training runs and emits a best-config YAML
  • Scheduled training jobs — cron-style schedule: field with a daemon runner
  • Team config store — forgelm pro team push/pull for shared golden-config patterns
  • Live GPU cost estimation — real-time spot pricing from RunPod, Lambda Labs, vast.ai

Gated by adoption signal from v0.5.x — will not start before ≥1 K monthly PyPI installs. See docs/roadmap/phase-13-pro-cli.md.


License

Apache License 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

forgelm-0.5.7.tar.gz (724.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

forgelm-0.5.7-py3-none-any.whl (459.7 kB view details)

Uploaded Python 3

File details

Details for the file forgelm-0.5.7.tar.gz.

File metadata

  • Download URL: forgelm-0.5.7.tar.gz
  • Upload date:
  • Size: 724.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for forgelm-0.5.7.tar.gz
Algorithm Hash digest
SHA256 4a2d1224c13ff74129f88e3b7cacf095ba56ba75f6f5891c95cde6d3f4ecb96f
MD5 9d741a5f78fc112d93f4a44eb5a52c3b
BLAKE2b-256 1f624a7f6929facbf42351e011aba446ed90a344cdfadb560782c3015e359fbe

See more details on using hashes here.

Provenance

The following attestation bundles were made for forgelm-0.5.7.tar.gz:

Publisher: publish.yml on cemililik/ForgeLM

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file forgelm-0.5.7-py3-none-any.whl.

File metadata

  • Download URL: forgelm-0.5.7-py3-none-any.whl
  • Upload date:
  • Size: 459.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for forgelm-0.5.7-py3-none-any.whl
Algorithm Hash digest
SHA256 d00d2faf609ffce6e2a44ed46c1b3c29a894bf36be7528df10f36826afcfe572
MD5 6866d104b287f1f5b63085d8f3c799f4
BLAKE2b-256 0f12de503a887bc5f499213022b78e5c000e6d3005eb8792e53e00d8f948efbd

See more details on using hashes here.

Provenance

The following attestation bundles were made for forgelm-0.5.7-py3-none-any.whl:

Publisher: publish.yml on cemililik/ForgeLM

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page