
Pure-NumPy inference runtime for extracting weights, layers, and activations from LLMs and CNNs


micronnx

Weight, layer, and activation extraction layer for model-fusion systems. Part of the UFM (Unified Fusion Model) ecosystem.

micronnx is neither a training framework nor a general-purpose inference engine. It is a collection component: it loads a model in any of the supported formats, extracts its weights and activations layer by layer, and exposes them in a unified structure that UFM can use to make fusion decisions.

No PyTorch. No TensorFlow. Just NumPy.


Installation

pip install micronnx


What does micronnx do?

  • Model loading: GGUF (Q2_K–Q6_K), SafeTensors, HDF5/Keras, NPY/NPZ
  • Weight extraction: all tensors in float16, tagged with role and layer
  • Activation extraction: layer by layer (embed, attn, ffn, residual, norm, pool)
  • Unified export: one or more models into a single .npz with a full index (including schema and hp)
  • Loading from .npz without the original file: NpzModelLoader reads weights directly from the .npz
  • Vectorized ops: RMSNorm, Attention, RoPE, SwiGLU, Conv2D, etc. in pure NumPy
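
To give a sense of what "vectorized ops in pure NumPy" can look like, here is a minimal sketch of RMSNorm and a SwiGLU feed-forward block. It is not micronnx's code: the function names, the `eps` default, the `(in, out)` weight layout, and the hidden size of 1536 are all assumptions made for illustration.

```python
import numpy as np

def rmsnorm_sketch(x, weight, eps=1e-6):
    # Divide each vector on the last axis by its root mean square, then scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def silu_sketch(x):
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_sketch(x, w_gate, w_up, w_down):
    # SwiGLU FFN: (silu(x @ W_gate) * (x @ W_up)) @ W_down,
    # with weights stored as (in_features, out_features) (an assumption).
    return (silu_sketch(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 16, 576)).astype(np.float32)
normed = rmsnorm_sketch(x, np.ones(576, dtype=np.float32))

hidden = 1536  # arbitrary illustrative size
w_gate = (rng.standard_normal((576, hidden)) * 0.02).astype(np.float32)
w_up = (rng.standard_normal((576, hidden)) * 0.02).astype(np.float32)
w_down = (rng.standard_normal((hidden, 576)) * 0.02).astype(np.float32)
out = swiglu_sketch(normed, w_gate, w_up, w_down)
print(normed.shape, out.shape)
```

Everything operates on whole arrays at once, so there is no Python loop over tokens or features.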

Supported formats

  • GGUF (.gguf): Llama, Mistral, Gemma, Phi, Falcon, ChatGLM, Mixtral-MoE, DeepSeek-MoE
  • SafeTensors (.safetensors): any Hugging Face model
  • HDF5/Keras (.h5, .keras): MobileNetV1 and feed-forward CNNs
  • NumPy (.npy, .npz): raw arrays


Usage

Export models to .npz

import micronnx as nx

# A single model
nx.export_to_npz("model.gguf", "model.npz")

# Several models, one .npz per model
nx.export_to_npz(["model.gguf", "model.safetensors", "mobilenet.h5"], "outputs/")

# Several models, a single merged .npz (schema and hp are saved automatically)
nx.export_to_npz(
    ["model.gguf", "model.safetensors", "mobilenet.h5"],
    "outputs/merged.npz",
    merge=True,
)

# Comma-separated string of paths
nx.export_to_npz("model.gguf, model.safetensors", "outputs/", merge=False)
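
As a rough mental model of a merged export, the sketch below stores the tensors of several models in one .npz next to a JSON index. It is not micronnx's actual file layout: the `"<model>/<tensor>"` key scheme, the `__index__` key, and the function name `merge_to_npz_sketch` are hypothetical.

```python
import json
import os
import tempfile

import numpy as np

def merge_to_npz_sketch(models, dst):
    # models: {model_name: {tensor_name: ndarray}}
    arrays = {}
    index = {"n_models": len(models), "total_params": 0, "models": {}}
    for name, tensors in models.items():
        n_params = 0
        for tensor_name, arr in tensors.items():
            # Store every tensor as float16 under a namespaced key (hypothetical scheme).
            arrays[f"{name}/{tensor_name}"] = np.asarray(arr, dtype=np.float16)
            n_params += arr.size
        index["models"][name] = {"n_tensors": len(tensors), "n_params": n_params}
        index["total_params"] += n_params
    # Keep the index as a JSON string in a 0-d array so no pickling is needed.
    arrays["__index__"] = np.array(json.dumps(index))
    np.savez(dst, **arrays)
    return dst

models = {
    "model_a": {"embed.weight": np.ones((4, 8), dtype=np.float32)},
    "model_b": {
        "embed.weight": np.zeros((4, 8), dtype=np.float32),
        "norm.weight": np.ones(8, dtype=np.float32),
    },
}
path = os.path.join(tempfile.mkdtemp(), "merged.npz")
merge_to_npz_sketch(models, path)

with np.load(path) as data:
    index = json.loads(data["__index__"].item())
    embed_dtype = data["model_a/embed.weight"].dtype
print(index["n_models"], index["total_params"], embed_dtype)
```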


Inspect a .npz

import micronnx as nx

nx.inspect_npz("outputs/merged.npz")
# Example output:
# Merged : 2 modelos
# Total : 269,030,016 params | 513.14 MB float16
# [SmolLM2-135M] 272 tensores | 30 capas | 256.57 MB | schema: gguf_llama
# [model] 272 tensores | 30 capas | 256.57 MB | schema: hf_llama

# Read the index without loading tensors
idx = nx.load_index("outputs/merged.npz")
print(idx["n_models"])
print(idx["total_params"])
print(idx["models"].keys())

# Show the schema and hp of each model
for name, info in idx["models"].items():
    print(name, info["schema_name"], info["hp"])
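
Reading metadata without loading tensors is possible with plain NumPy because an .npz is a zip of .npy files, each with a small header that holds shape and dtype. The sketch below reads only those headers. It illustrates the idea and makes no claim about how `load_index` is implemented; `npz_shapes_sketch` is a hypothetical helper.

```python
import os
import tempfile
import zipfile

import numpy as np

def npz_shapes_sketch(path):
    # Return {array_name: (shape, dtype)} by parsing each member's .npy header only.
    info = {}
    with zipfile.ZipFile(path) as zf:
        for member in zf.namelist():
            with zf.open(member) as fp:
                version = np.lib.format.read_magic(fp)
                if version == (1, 0):
                    shape, _, dtype = np.lib.format.read_array_header_1_0(fp)
                else:
                    shape, _, dtype = np.lib.format.read_array_header_2_0(fp)
            name = member[:-4] if member.endswith(".npy") else member
            info[name] = (shape, dtype)
    return info

path = os.path.join(tempfile.mkdtemp(), "weights.npz")
np.savez(
    path,
    embed=np.zeros((100, 16), dtype=np.float16),
    norm=np.ones(16, dtype=np.float16),
)

info = npz_shapes_sketch(path)
total_params = sum(int(np.prod(shape)) for shape, _ in info.values())
print(info, total_params)
```

The tensor payloads are never read from disk, so the cost depends on the number of tensors rather than on their size.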


Load from .npz without the original file

import micronnx as nx

# List the models available in the .npz
models = nx.list_models("outputs/merged.npz")
for name, info in models.items():
    print(name, info["schema_name"], info["has_hp"], info["has_activations"])

# Load a model directly from the .npz
loader = nx.NpzModelLoader("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")
schema_name, hp = nx.detect_schema_npz("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")
runner = nx.ModelRunner(loader, schema_name, hp, max_seq=512)


Extract activations: LLM

import numpy as np
import micronnx as nx

# From the original file
loader = nx.GGUFLoader("SmolLM2-135M-Instruct-Q4_K_M.gguf")
schema_name, hp = nx.detect_schema_gguf("SmolLM2-135M-Instruct-Q4_K_M.gguf")
runner = nx.ModelRunner(loader, schema_name, hp, max_seq=512)

# Or from a .npz, without the original file
loader = nx.NpzModelLoader("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")
schema_name, hp = nx.detect_schema_npz("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")
runner = nx.ModelRunner(loader, schema_name, hp, max_seq=512)

# Extract all activations
extractor = nx.ActivationExtractor(runner)
extractor.run(np.array([[1, 2, 3, 4, 5]], dtype=np.int64))
print(extractor.activations.keys())
# embed, attn_norm_0, post_attn_0, residual_attn_0,
# ffn_norm_0, post_ffn_0, residual_ffn_0, ..., final_norm

# Only the hooks you are interested in
extractor = nx.ActivationExtractor(runner, hooks=["post_attn", "residual_ffn"])
extractor.run(np.array([[1, 2, 3, 4, 5]], dtype=np.int64))

# Reduce the sequence to the last token
extractor = nx.ActivationExtractor(runner, reduce="last")

# Reduce the sequence to its mean
extractor = nx.ActivationExtractor(runner, reduce="mean")

# Even-numbered layers only
extractor = nx.ActivationExtractor(runner, layer_fn=lambda i: i % 2 == 0)
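
The `hooks`, `reduce`, and `layer_fn` options can be understood as a filter plus a reduction over a dictionary of `(batch, seq, dim)` arrays. The sketch below mimics that behaviour on dummy data. The helper names are hypothetical, and two details are assumptions: activation names end in `_<layer index>`, and entries without a layer index (such as `embed` or `final_norm`) are not subject to `layer_fn`.

```python
import re

import numpy as np

def reduce_seq_sketch(act, reduce=None):
    # act: (batch, seq, dim). Collapse the sequence axis if requested.
    if reduce is None:
        return act
    if reduce == "last":
        return act[:, -1, :]
    if reduce == "mean":
        return act.mean(axis=1)
    raise ValueError(f"unknown reduce mode: {reduce!r}")

def select_sketch(acts, hooks=None, layer_fn=None):
    # Keep entries whose base name is in `hooks` and whose layer index passes `layer_fn`.
    selected = {}
    for name, act in acts.items():
        match = re.fullmatch(r"(.+)_(\d+)", name)
        base, layer = (match.group(1), int(match.group(2))) if match else (name, None)
        if hooks is not None and base not in hooks:
            continue
        if layer is not None and layer_fn is not None and not layer_fn(layer):
            continue
        selected[name] = act
    return selected

# Dummy activations for a 2-layer model: batch=1, seq=5, dim=8.
names = ["embed", "post_attn_0", "residual_ffn_0", "post_attn_1", "residual_ffn_1", "final_norm"]
acts = {name: np.full((1, 5, 8), i, dtype=np.float32) for i, name in enumerate(names)}

picked = select_sketch(acts, hooks=["post_attn", "residual_ffn"], layer_fn=lambda i: i % 2 == 0)
reduced = {name: reduce_seq_sketch(act, "last") for name, act in picked.items()}
print(sorted(reduced), reduced["post_attn_0"].shape)
```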


Extract activations: CNN (MobileNet)

import numpy as np
import micronnx as nx

raw = nx.H5Loader("mobilenet_1_0_224_tf.h5")
mapped = nx.map_tensors(dict.fromkeys(raw.tensor_names), fmt="h5")
loader = nx.CanonicalLoader(raw, mapped)
runner = nx.CNNRunner(loader, n_blocks=13)

# 224x224x3 image normalized to [-1, 1]
image = np.random.uniform(-1, 1, (224, 224, 3)).astype(np.float32)

# Plain forward pass
probs = runner.forward(image)
print(f"class: {probs.argmax()}, confidence: {probs.max():.3f}")

# With activation extraction
extractor = nx.CNNActivationExtractor(runner, reduce="spatial_mean")
probs = extractor.run(image)
print(extractor.activations.keys())
# stem, block_0_dw, block_0_pw, ..., block_12_pw, pooled
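
The `block_N_dw` / `block_N_pw` names correspond to MobileNet's depthwise and pointwise convolutions, and `spatial_mean` presumably averages a feature map over height and width. A minimal NumPy sketch of those three steps follows. It is not micronnx's implementation: the function names are hypothetical, padding is symmetric zero padding (TensorFlow's "same" padding differs for stride 2), and bias, batch norm, and activation functions are omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def depthwise_conv3x3_sketch(x, kernel, stride=1):
    # x: (H, W, C), kernel: (3, 3, C). Each channel is filtered independently.
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    windows = sliding_window_view(padded, (3, 3), axis=(0, 1))  # (H, W, C, 3, 3)
    windows = windows[::stride, ::stride]
    return np.einsum("hwcij,ijc->hwc", windows, kernel)

def pointwise_conv_sketch(x, weight):
    # x: (H, W, C_in), weight: (C_in, C_out). A 1x1 conv is a matmul over channels.
    return x @ weight

def spatial_mean_sketch(fmap):
    # (H, W, C) -> (C,), averaging over the two spatial axes.
    return fmap.mean(axis=(0, 1))

x = np.ones((8, 8, 4), dtype=np.float32)
dw = depthwise_conv3x3_sketch(x, np.ones((3, 3, 4), dtype=np.float32))
pw = pointwise_conv_sketch(dw, np.ones((4, 6), dtype=np.float32))
pooled = spatial_mean_sketch(pw)
print(dw.shape, pw.shape, pooled.shape)
```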


Save and read activations in the .npz

import numpy as np
import micronnx as nx

merged = nx.export_to_npz(
    ["SmolLM2-135M-Instruct-Q4_K_M.gguf", "model.safetensors"],
    "outputs/merged.npz",
    merge=True,
)

loader = nx.NpzModelLoader(merged, "SmolLM2-135M-Instruct-Q4_K_M")
schema_name, hp = nx.detect_schema_npz(merged, "SmolLM2-135M-Instruct-Q4_K_M")
runner = nx.ModelRunner(loader, schema_name, hp)
extractor = nx.ActivationExtractor(runner, reduce="last")
extractor.run(np.array([[1, 2, 3]], dtype=np.int64))

nx.save_activations(merged, extractor.activations, model_key="SmolLM2-135M-Instruct-Q4_K_M")

# Read them back later without reloading the model
acts = nx.load_activations(merged, model_key="SmolLM2-135M-Instruct-Q4_K_M")
print(acts["final_norm"].shape)


Direct loaders

import micronnx as nx

# GGUF
loader = nx.GGUFLoader("model.gguf")
print(loader.tensor_names[:5])
w = loader.load("token_embd.weight")
loader.close()

# SafeTensors
loader = nx.SafeTensorsLoader("model.safetensors")
w = loader.load("model.embed_tokens.weight")
loader.close()

# HDF5
loader = nx.H5Loader("mobilenet.h5")
w = loader.load("conv1/kernel:0")
loader.close()

# NPY/NPZ
loader = nx.NpyLoader("weights.npz")
w = loader.load("layer_0")

# Directly from a merged .npz (without the original file)
loader = nx.NpzModelLoader("outputs/merged.npz", "model")
w = loader.load("model.embed_tokens.weight")
loader.close()
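
To show why a format such as SafeTensors can be read without PyTorch, here is a minimal reader based on the published file layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. It is a sketch and not `SafeTensorsLoader` itself. It reads the whole file into memory, skips BF16 (which NumPy has no native dtype for), and includes a demo-only writer so the example is self-contained. All function names are hypothetical.

```python
import json
import os
import struct
import tempfile

import numpy as np

# SafeTensors dtype tags mapped to little-endian NumPy dtypes (subset).
_DTYPES = {"F16": "<f2", "F32": "<f4", "F64": "<f8", "I32": "<i4", "I64": "<i8", "U8": "u1"}

def read_safetensors_sketch(path):
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data = f.read()  # a real loader would likely memory-map instead
    tensors = {}
    for name, meta in header.items():
        if name == "__metadata__":
            continue
        start, end = meta["data_offsets"]  # offsets are relative to the data section
        flat = np.frombuffer(data[start:end], dtype=_DTYPES[meta["dtype"]])
        tensors[name] = flat.reshape(meta["shape"])
    return tensors

def write_safetensors_sketch(path, tensors):
    # Demo-only writer for float32 arrays.
    header, blobs, offset = {}, [], 0
    for name, arr in tensors.items():
        raw = np.ascontiguousarray(arr, dtype="<f4").tobytes()
        header[name] = {"dtype": "F32", "shape": list(arr.shape),
                        "data_offsets": [offset, offset + len(raw)]}
        blobs.append(raw)
        offset += len(raw)
    head = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(head)))
        f.write(head)
        f.write(b"".join(blobs))

path = os.path.join(tempfile.mkdtemp(), "tiny.safetensors")
original = {"embed.weight": np.arange(6, dtype=np.float32).reshape(2, 3),
            "norm.weight": np.ones(3, dtype=np.float32)}
write_safetensors_sketch(path, original)
loaded = read_safetensors_sketch(path)
print({name: arr.shape for name, arr in loaded.items()})
```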


Detect schema and architecture

import micronnx as nx

# From the original file
schema_name, hp = nx.detect_schema_gguf("model.gguf")
print(schema_name)
# gguf_llama / gguf_gemma2 / gguf_phi3 / gguf_falcon ...
print(hp)
# {"n_layers": 30, "n_heads": 9, "n_kv_heads": 3, "n_embd": 576, "vocab_size": 49152}

schema_name, hp = nx.detect_schema_safetensors("model.safetensors")

# From a .npz, without the original file
schema_name, hp = nx.detect_schema_npz("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")

print(list(nx.SCHEMAS.keys()))
# ['llama', 'gemma', 'gemma2', 'phi3', 'mistral', 'mixtral',
#  'falcon', 'bloom', 'chatglm', 'deepseek_moe', ...]
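
The `hp` dictionary shown above is enough to derive several shapes by hand. The short calculation below uses the example values from this section and assumes the usual Llama-style convention that the per-head dimension is `n_embd / n_heads`. That convention is an assumption here and is not something `hp` states explicitly.

```python
hp = {"n_layers": 30, "n_heads": 9, "n_kv_heads": 3, "n_embd": 576, "vocab_size": 49152}

# Per-head dimension (assumed convention: n_embd split evenly across heads).
head_dim = hp["n_embd"] // hp["n_heads"]

# Grouped-query attention: how many query heads share one key/value head.
group_size = hp["n_heads"] // hp["n_kv_heads"]

# Width of the key and value projections.
kv_dim = hp["n_kv_heads"] * head_dim

# Token embedding table: parameter count and size at 2 bytes per float16 value.
embed_params = hp["vocab_size"] * hp["n_embd"]
embed_mib_f16 = embed_params * 2 / 2**20

print(head_dim, group_size, kv_dim, embed_params, embed_mib_f16)
```

With these values the head dimension is 64 and three query heads share each key/value head.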


Direct NumPy ops

import numpy as np
import micronnx as nx

x = np.random.randn(1, 16, 576).astype(np.float32)
w = np.ones(576, dtype=np.float32)

x = nx.rmsnorm(x, w)
x = nx.layernorm(x, w, w)
x = nx.silu(x)
x = nx.gelu(x)
x = nx.relu(x)
x = nx.softmax(x, axis=-1)

q = np.random.randn(1, 4, 9, 64).astype(np.float32)
k = np.random.randn(1, 4, 3, 64).astype(np.float32)
v = np.random.randn(1, 4, 3, 64).astype(np.float32)
out = nx.attention(q, k, v, n_heads=9, n_kv_heads=3)

img = np.random.randn(112, 112, 32).astype(np.float32)
filt = np.random.randn(3, 3, 32, 64).astype(np.float32)
out = nx.conv2d(img, filt, stride=1, padding=1)
out = nx.global_avg_pool(out)
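
The attention call above uses 9 query heads and 3 key/value heads, which is grouped-query attention. A minimal NumPy sketch of that computation, using the same `(batch, seq, heads, head_dim)` layout as the example, is given below. It is not the code behind `nx.attention`: the function name is hypothetical, and whether `nx.attention` applies a causal mask is not stated here, so `causal` is an explicit parameter in the sketch.

```python
import numpy as np

def gqa_attention_sketch(q, k, v, causal=True):
    # q: (B, T, n_heads, D); k, v: (B, T, n_kv_heads, D)
    group = q.shape[2] // k.shape[2]
    # Repeat each key/value head so consecutive query heads share it.
    k = np.repeat(k, group, axis=2)
    v = np.repeat(v, group, axis=2)
    scale = q.shape[-1] ** -0.5
    scores = np.einsum("bthd,bshd->bhts", q, k) * scale
    if causal:
        t, s = scores.shape[-2:]
        future = np.triu(np.ones((t, s), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("bhts,bshd->bthd", weights, v)

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 9, 64)).astype(np.float32)
k = rng.standard_normal((1, 4, 3, 64)).astype(np.float32)
v = rng.standard_normal((1, 4, 3, 64)).astype(np.float32)
out = gqa_attention_sketch(q, k, v)
print(out.shape)
```

With the causal mask on, the first position can only attend to itself, so its output equals the value vector of the key/value head it shares.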


Full API

Loaders
nx.GGUFLoader(path)
nx.SafeTensorsLoader(path)
nx.H5Loader(path)
nx.NpyLoader(path)
nx.NpzModelLoader(npz_path, model_key)   ← loads from .npz without the original file

Schema detection
nx.detect_schema_gguf(path) → (schema_name, hp)
nx.detect_schema_safetensors(path) → (schema_name, hp)
nx.detect_schema_hf(path) → (schema_name, hp)
nx.detect_schema_npz(npz_path, key) → (schema_name, hp)   ← from .npz
nx.list_models(npz_path) → dict with metadata for all models
nx.SCHEMAS → dict with all schemas

LLM runtime
nx.ModelRunner(loader, schema_name, hp, max_seq=2048)
nx.ActivationExtractor(runner, hooks=None, reduce=None, layer_fn=None)
    reduce: None | "last" | "mean"
    hooks: ["embed", "post_attn", "residual_ffn", "final_norm", ...]
    layer_fn: callable(i: int) -> bool

CNN runtime
nx.CNNRunner(loader, n_blocks=13)
nx.CNNActivationExtractor(runner, reduce=None)
    reduce: None | "spatial_mean"
nx.CanonicalLoader(raw_loader, mapped)

Canonical mapping
nx.map_tensors(tensors, fmt)
nx.find_unmapped(tensors, fmt, mapped)
nx.resolve_tied_embeddings(mapped)
nx.detect_format(tensors)

Exporter
nx.export_to_npz(src, dst, fmt=None, merge=False, verbose=True)
nx.load_index(path)
nx.inspect_npz(path, n=20)
nx.save_activations(path, acts, model_key=None)
nx.load_activations(path, model_key=None)

Ops
nx.rmsnorm / nx.layernorm / nx.batchnorm
nx.softmax / nx.sigmoid / nx.relu / nx.gelu / nx.silu
nx.attention / nx.rope / nx.linear / nx.embedding
nx.swiglu / nx.swiglu_fused / nx.geglu / nx.ffn_gelu
nx.conv2d / nx.depthwise_conv2d / nx.global_avg_pool / nx.max_pool2d


Dependencies

  • numpy >= 1.24
  • pyfive (for .h5 / .keras files)


Part of the UFM ecosystem

micronnx is the collection layer of UFM (Unified Fusion Model). Its sole responsibility is to expose weights and activations in a uniform structure. Fusion logic, compatibility checks, and fine-tuning live in UFM, not here.


License

MIT
