Pure-NumPy inference runtime for extracting weights, layers, and activations from LLMs and CNNs
micronnx
Weight, layer, and activation extraction layer for model-fusion systems. Part of the UFM (Unified Fusion Model) ecosystem.
micronnx is neither a training framework nor a general-purpose inference engine. It is a collection component: it loads a model from any of the supported formats, extracts its weights and activations layer by layer, and exposes them in a unified structure that UFM can use to make fusion decisions.
No PyTorch. No TensorFlow. Just NumPy.
Installation
pip install micronnx
What does micronnx do?
- Model loading: GGUF (Q2_K–Q6_K), SafeTensors, HDF5/Keras, NPY/NPZ
- Weight extraction: every tensor as float16, tagged with its role and layer
- Activation extraction: layer by layer (embed, attn, ffn, residual, norm, pool)
- Unified export: one or more models into a single .npz with a full index (including each model's schema and hp)
- Loading from .npz without the original file: NpzModelLoader reads weights directly from the .npz
- Vectorized ops: RMSNorm, Attention, RoPE, SwiGLU, Conv2D, etc. in pure NumPy
Supported formats
- GGUF (.gguf): Llama, Mistral, Gemma, Phi, Falcon, ChatGLM, Mixtral-MoE, DeepSeek-MoE
- SafeTensors (.safetensors): any Hugging Face model
- HDF5/Keras (.h5, .keras): MobileNetV1 and feed-forward CNNs
- NumPy (.npy, .npz): raw arrays
Usage
Exporting models to .npz
import micronnx as nx

# Single model
nx.export_to_npz("model.gguf", "model.npz")

# Several models, one .npz per model
nx.export_to_npz(
    ["model.gguf", "model.safetensors", "mobilenet.h5"],
    "outputs/"
)

# Several models, a single merged .npz (schema and hp are saved automatically)
nx.export_to_npz(
    ["model.gguf", "model.safetensors", "mobilenet.h5"],
    "outputs/merged.npz",
    merge=True
)

# With a comma-separated string
nx.export_to_npz("model.gguf, model.safetensors", "outputs/", merge=False)
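The call forms above suggest that the source argument may be a single path, a list of paths, or a comma-separated string, and that without `merge=True` each model gets its own `.npz` in the destination directory. The following is a minimal sketch of how such arguments could be normalized. `normalize_sources` and `plan_outputs` are hypothetical helpers, not part of micronnx, and the `<dst>/<stem>.npz` naming is an assumption.

```python
from pathlib import Path


def normalize_sources(src):
    """Turn a path, a list of paths, or a comma-separated string into a list of paths."""
    if isinstance(src, (list, tuple)):
        return [str(s).strip() for s in src]
    return [part.strip() for part in str(src).split(",") if part.strip()]


def plan_outputs(src, dst, merge=False):
    """Map each source file to its output .npz (assumed naming: <dst>/<stem>.npz)."""
    sources = normalize_sources(src)
    if merge:
        # Every source is written into the same merged archive.
        return {s: Path(dst) for s in sources}
    return {s: Path(dst) / (Path(s).stem + ".npz") for s in sources}


plan = plan_outputs("llama.gguf, mobilenet.h5", "outputs/")
for source, target in plan.items():
    print(source, "->", target.as_posix())
```

Note that this naive stem-based naming would map `model.gguf` and `model.safetensors` to the same output file, so a real exporter has to disambiguate such names somehow.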
Inspecting a .npz
import micronnx as nx

nx.inspect_npz("outputs/merged.npz")
# Merged : 2 modelos
# Total : 269,030,016 params | 513.14 MB float16
# [SmolLM2-135M] 272 tensores | 30 capas | 256.57 MB | schema: gguf_llama
# [model] 272 tensores | 30 capas | 256.57 MB | schema: hf_llama
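The sizes in this report can be sanity-checked by hand: float16 takes 2 bytes per parameter. The sketch below assumes that "MB" in the report means 1024² bytes and that the two models have the same parameter count, which their identical per-model sizes suggest.

```python
BYTES_PER_FLOAT16 = 2
MIB = 1024 ** 2


def float16_size_mib(n_params):
    """Storage needed for n_params float16 values, in units of 1024**2 bytes."""
    return n_params * BYTES_PER_FLOAT16 / MIB


total_params = 269_030_016
per_model = total_params // 2  # assumption: both models have the same size

print(f"total:     {float16_size_mib(total_params):.2f}")
print(f"per model: {float16_size_mib(per_model):.2f}")
```

This gives about 256.57 per model and 513.13 in total. The 513.14 shown in the report would be consistent with adding the two rounded per-model figures, although that is only a guess about how the total is computed.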
# Read the index without loading tensors
idx = nx.load_index("outputs/merged.npz")
print(idx["n_models"])
print(idx["total_params"])
print(idx["models"].keys())

# Show each model's schema and hp
for name, info in idx["models"].items():
    print(name, info["schema_name"], info["hp"])
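Reading an index without touching the tensors is possible because `np.load` on a `.npz` returns a lazy archive that only decodes an array when its key is accessed. The sketch below shows one way such an index could be stored alongside the weights. The `__index__` key, the `model/tensor` key naming, and the index fields are all assumptions for illustration, not micronnx's actual on-disk layout.

```python
import json
import os
import tempfile

import numpy as np

INDEX_KEY = "__index__"  # hypothetical key name


def write_npz_with_index(path, models):
    """Save every tensor as float16 plus a JSON index describing the archive."""
    arrays = {}
    index = {"n_models": len(models), "total_params": 0, "models": {}}
    for model_name, tensors in models.items():
        n_params = 0
        for tensor_name, value in tensors.items():
            arrays[f"{model_name}/{tensor_name}"] = value.astype(np.float16)
            n_params += int(value.size)
        index["models"][model_name] = {"n_tensors": len(tensors), "n_params": n_params}
        index["total_params"] += n_params
    # A 0-d string array holds the JSON text without needing pickle.
    arrays[INDEX_KEY] = np.array(json.dumps(index))
    np.savez(path, **arrays)


def read_index(path):
    """Read only the index entry; the weight arrays are never decoded."""
    with np.load(path) as archive:
        return json.loads(archive[INDEX_KEY].item())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merged.npz")
    write_npz_with_index(path, {
        "model_a": {"embed.weight": np.ones((4, 8), dtype=np.float32)},
        "model_b": {"embed.weight": np.ones((2, 8), dtype=np.float32)},
    })
    idx = read_index(path)
    print(idx["n_models"], idx["total_params"], list(idx["models"]))
```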
Loading from .npz without the original file
import micronnx as nx

# List the models available in the .npz
models = nx.list_models("outputs/merged.npz")
for name, info in models.items():
    print(name, info["schema_name"], info["has_hp"], info["has_activations"])

# Load a model directly from the .npz
loader = nx.NpzModelLoader("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")
schema_name, hp = nx.detect_schema_npz("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")
runner = nx.ModelRunner(loader, schema_name, hp, max_seq=512)
Extracting activations: LLM
import numpy as np
import micronnx as nx

# From the original file
loader = nx.GGUFLoader("SmolLM2-135M-Instruct-Q4_K_M.gguf")
schema_name, hp = nx.detect_schema_gguf("SmolLM2-135M-Instruct-Q4_K_M.gguf")
runner = nx.ModelRunner(loader, schema_name, hp, max_seq=512)

# Or from the .npz, without the original file
loader = nx.NpzModelLoader("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")
schema_name, hp = nx.detect_schema_npz("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")
runner = nx.ModelRunner(loader, schema_name, hp, max_seq=512)

# Extract all activations
extractor = nx.ActivationExtractor(runner)
extractor.run(np.array([[1, 2, 3, 4, 5]], dtype=np.int64))
print(extractor.activations.keys())
# embed, attn_norm_0, post_attn_0, residual_attn_0,
# ffn_norm_0, post_ffn_0, residual_ffn_0, ..., final_norm

# Only the hooks you care about
extractor = nx.ActivationExtractor(runner, hooks=["post_attn", "residual_ffn"])
extractor.run(np.array([[1, 2, 3, 4, 5]], dtype=np.int64))

# Reduce the sequence to its last token
extractor = nx.ActivationExtractor(runner, reduce="last")

# Reduce the sequence to its mean
extractor = nx.ActivationExtractor(runner, reduce="mean")

# Even-numbered layers only
extractor = nx.ActivationExtractor(runner, layer_fn=lambda i: i % 2 == 0)
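To make the three options concrete, here is a rough sketch of what `hooks`, `reduce`, and `layer_fn` plausibly do to a dictionary of activations. It assumes each activation has shape (batch, seq, hidden) and that per-layer names follow the `<hook>_<layer>` pattern seen in the listing above. `filter_and_reduce` is a hypothetical stand-in written for this illustration, not micronnx code.

```python
import numpy as np


def filter_and_reduce(activations, hooks=None, reduce=None, layer_fn=None):
    """Select activations by hook name and layer index, then reduce the sequence axis."""
    selected = {}
    for name, value in activations.items():
        # Split "post_attn_3" into ("post_attn", 3); names like "embed" have no layer.
        base, _, suffix = name.rpartition("_")
        if suffix.isdigit():
            layer = int(suffix)
        else:
            base, layer = name, None
        if hooks is not None and base not in hooks:
            continue
        if layer is not None and layer_fn is not None and not layer_fn(layer):
            continue
        if reduce == "last":
            value = value[:, -1, :]      # keep only the final token
        elif reduce == "mean":
            value = value.mean(axis=1)   # average over the sequence
        selected[name] = value
    return selected


acts = {
    "embed": np.zeros((1, 5, 8), dtype=np.float32),
    "post_attn_0": np.ones((1, 5, 8), dtype=np.float32),
    "post_attn_1": np.ones((1, 5, 8), dtype=np.float32),
    "residual_ffn_0": np.ones((1, 5, 8), dtype=np.float32),
}
out = filter_and_reduce(acts, hooks=["post_attn"], reduce="last",
                        layer_fn=lambda i: i % 2 == 0)
print({name: value.shape for name, value in out.items()})
```

Reducing to the last token or to the mean turns each (batch, seq, hidden) tensor into (batch, hidden), which keeps stored activations small when only a per-sequence summary is needed.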
Extracting activations: CNN (MobileNet)
import numpy as np
import micronnx as nx

raw = nx.H5Loader("mobilenet_1_0_224_tf.h5")
mapped = nx.map_tensors(dict.fromkeys(raw.tensor_names), fmt="h5")
loader = nx.CanonicalLoader(raw, mapped)
runner = nx.CNNRunner(loader, n_blocks=13)

# 224x224x3 image normalized to [-1, 1]
image = np.random.uniform(-1, 1, (224, 224, 3)).astype(np.float32)

# Plain forward pass
probs = runner.forward(image)
print(f"class: {probs.argmax()}, confidence: {probs.max():.3f}")

# With activation extraction
extractor = nx.CNNActivationExtractor(runner, reduce="spatial_mean")
probs = extractor.run(image)
print(extractor.activations.keys())
# stem, block_0_dw, block_0_pw, ..., block_12_pw, pooled
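The example above feeds random values, but a real image has to be brought into the same [-1, 1] range first, and `reduce="spatial_mean"` presumably averages each feature map over its spatial axes. A small sketch of both steps, assuming uint8 RGB input and an (H, W, C) feature-map layout without a batch axis; the helper names are hypothetical:

```python
import numpy as np


def preprocess_uint8_image(pixels):
    """Scale uint8 RGB pixels from [0, 255] to float32 in [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0


def spatial_mean(feature_map):
    """Average an (H, W, C) feature map over H and W, leaving one value per channel."""
    return feature_map.mean(axis=(0, 1))


rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
image = preprocess_uint8_image(pixels)

# Stand-in for a late MobileNetV1 feature map (7x7 spatial, 1024 channels).
features = rng.standard_normal((7, 7, 1024)).astype(np.float32)
print(image.dtype, spatial_mean(features).shape)
```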
Saving and reading activations in the .npz
import numpy as np
import micronnx as nx

merged = nx.export_to_npz(
    ["SmolLM2-135M-Instruct-Q4_K_M.gguf", "model.safetensors"],
    "outputs/merged.npz",
    merge=True
)

loader = nx.NpzModelLoader(merged, "SmolLM2-135M-Instruct-Q4_K_M")
schema_name, hp = nx.detect_schema_npz(merged, "SmolLM2-135M-Instruct-Q4_K_M")
runner = nx.ModelRunner(loader, schema_name, hp)
extractor = nx.ActivationExtractor(runner, reduce="last")
extractor.run(np.array([[1, 2, 3]], dtype=np.int64))

nx.save_activations(merged, extractor.activations, model_key="SmolLM2-135M-Instruct-Q4_K_M")

# Read them back later without reloading the model
acts = nx.load_activations(merged, model_key="SmolLM2-135M-Instruct-Q4_K_M")
print(acts["final_norm"].shape)
Direct loaders
import micronnx as nx

# GGUF
loader = nx.GGUFLoader("model.gguf")
print(loader.tensor_names[:5])
w = loader.load("token_embd.weight")
loader.close()

# SafeTensors
loader = nx.SafeTensorsLoader("model.safetensors")
w = loader.load("model.embed_tokens.weight")
loader.close()

# HDF5
loader = nx.H5Loader("mobilenet.h5")
w = loader.load("conv1/kernel:0")
loader.close()

# NPY/NPZ
loader = nx.NpyLoader("weights.npz")
w = loader.load("layer_0")

# Directly from a merged .npz (without the original file)
loader = nx.NpzModelLoader("outputs/merged.npz", "model")
w = loader.load("model.embed_tokens.weight")
loader.close()
Detecting schema and architecture
import micronnx as nx

# From the original file
schema_name, hp = nx.detect_schema_gguf("model.gguf")
print(schema_name)
# gguf_llama / gguf_gemma2 / gguf_phi3 / gguf_falcon ...
print(hp)
# {"n_layers": 30, "n_heads": 9, "n_kv_heads": 3, "n_embd": 576, "vocab_size": 49152}

schema_name, hp = nx.detect_schema_safetensors("model.safetensors")

# From the .npz, without the original file
schema_name, hp = nx.detect_schema_npz("outputs/merged.npz", "SmolLM2-135M-Instruct-Q4_K_M")

print(list(nx.SCHEMAS.keys()))
# ['llama', 'gemma', 'gemma2', 'phi3', 'mistral', 'mixtral',
#  'falcon', 'bloom', 'chatglm', 'deepseek_moe', ...]
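The hyperparameters in the `hp` example above determine the tensor shapes used elsewhere in this README. As a worked example, and assuming the common Llama-style convention that the per-head width is `n_embd / n_heads` (some architectures set it independently), the derived quantities are:

```python
hp = {"n_layers": 30, "n_heads": 9, "n_kv_heads": 3, "n_embd": 576, "vocab_size": 49152}

head_dim = hp["n_embd"] // hp["n_heads"]              # 576 / 9 = 64
queries_per_kv = hp["n_heads"] // hp["n_kv_heads"]    # 9 / 3 = 3 query heads share one KV head
kv_width = hp["n_kv_heads"] * head_dim                # 3 * 64 = 192
embedding_params = hp["vocab_size"] * hp["n_embd"]    # 49152 * 576 = 28,311,552

print(head_dim, queries_per_kv, kv_width, embedding_params)
```

These values line up with the shapes in the ops example: 9 query heads and 3 key/value heads, each 64 wide.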
Direct NumPy ops
import numpy as np
import micronnx as nx

x = np.random.randn(1, 16, 576).astype(np.float32)
w = np.ones(576, dtype=np.float32)

x = nx.rmsnorm(x, w)
x = nx.layernorm(x, w, w)
x = nx.silu(x)
x = nx.gelu(x)
x = nx.relu(x)
x = nx.softmax(x, axis=-1)

q = np.random.randn(1, 4, 9, 64).astype(np.float32)
k = np.random.randn(1, 4, 3, 64).astype(np.float32)
v = np.random.randn(1, 4, 3, 64).astype(np.float32)
out = nx.attention(q, k, v, n_heads=9, n_kv_heads=3)

img = np.random.randn(112, 112, 32).astype(np.float32)
filt = np.random.randn(3, 3, 32, 64).astype(np.float32)
out = nx.conv2d(img, filt, stride=1, padding=1)
out = nx.global_avg_pool(out)
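For readers unfamiliar with these ops, the sketch below gives textbook NumPy versions of RMSNorm and grouped-query attention with the same head counts as the example above. It is not micronnx's implementation: the (batch, seq, heads, head_dim) layout, the `eps` value, and the causal mask are assumptions made for illustration.

```python
import numpy as np


def rmsnorm(x, weight, eps=1e-6):
    """Divide x by the root-mean-square of its last axis, then scale by weight."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def gqa_attention(q, k, v, causal=True):
    """q: (batch, seq, n_heads, d); k, v: (batch, seq, n_kv_heads, d)."""
    n_heads, n_kv_heads, d = q.shape[2], k.shape[2], q.shape[3]
    repeat = n_heads // n_kv_heads
    # Share each KV head across `repeat` consecutive query heads.
    k = np.repeat(k, repeat, axis=2)
    v = np.repeat(v, repeat, axis=2)
    scores = np.einsum("bqhd,bkhd->bhqk", q, k) / np.sqrt(d)
    if causal:
        seq_q, seq_k = scores.shape[-2:]
        future = np.triu(np.ones((seq_q, seq_k), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)  # hide later positions
    weights = softmax(scores, axis=-1)
    return np.einsum("bhqk,bkhd->bqhd", weights, v)


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 576)).astype(np.float32)
normed = rmsnorm(x, np.ones(576, dtype=np.float32))

q = rng.standard_normal((1, 4, 9, 64)).astype(np.float32)
k = rng.standard_normal((1, 4, 3, 64)).astype(np.float32)
v = rng.standard_normal((1, 4, 3, 64)).astype(np.float32)
out = gqa_attention(q, k, v)
print(normed.shape, out.shape)
```

With the causal mask, the first position can only attend to itself, so its output equals the value vector of the KV head it is mapped to. That makes a convenient correctness check.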
Full API
# Loaders
nx.GGUFLoader(path)
nx.SafeTensorsLoader(path)
nx.H5Loader(path)
nx.NpyLoader(path)
nx.NpzModelLoader(npz_path, model_key)   ← loads from .npz without the original file

# Schema detection
nx.detect_schema_gguf(path) → (schema_name, hp)
nx.detect_schema_safetensors(path) → (schema_name, hp)
nx.detect_schema_hf(path) → (schema_name, hp)
nx.detect_schema_npz(npz_path, key) → (schema_name, hp)   ← from .npz
nx.list_models(npz_path) → dict with metadata for every model
nx.SCHEMAS → dict with all schemas

# LLM runtime
nx.ModelRunner(loader, schema_name, hp, max_seq=2048)
nx.ActivationExtractor(runner, hooks=None, reduce=None, layer_fn=None)
#   reduce: None | "last" | "mean"
#   hooks: ["embed", "post_attn", "residual_ffn", "final_norm", ...]
#   layer_fn: callable(i: int) -> bool

# CNN runtime
nx.CNNRunner(loader, n_blocks=13)
nx.CNNActivationExtractor(runner, reduce=None)
#   reduce: None | "spatial_mean"
nx.CanonicalLoader(raw_loader, mapped)

# Canonical mapping
nx.map_tensors(tensors, fmt)
nx.find_unmapped(tensors, fmt, mapped)
nx.resolve_tied_embeddings(mapped)
nx.detect_format(tensors)

# Exporter
nx.export_to_npz(src, dst, fmt=None, merge=False, verbose=True)
nx.load_index(path)
nx.inspect_npz(path, n=20)
nx.save_activations(path, acts, model_key=None)
nx.load_activations(path, model_key=None)

# Ops
nx.rmsnorm / nx.layernorm / nx.batchnorm
nx.softmax / nx.sigmoid / nx.relu / nx.gelu / nx.silu
nx.attention / nx.rope / nx.linear / nx.embedding
nx.swiglu / nx.swiglu_fused / nx.geglu / nx.ffn_gelu
nx.conv2d / nx.depthwise_conv2d / nx.global_avg_pool / nx.max_pool2d
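Among the ops listed, the gated feed-forward variants are the least self-explanatory. The sketch below shows the usual SwiGLU formulation in plain NumPy, assuming weights stored as (in_features, out_features); real checkpoints often store the transpose. The function names and the toy sizes are illustrative and do not mirror the signatures of `nx.swiglu` or `nx.swiglu_fused`, which this README does not document.

```python
import numpy as np


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """x: (..., d_model); w_gate, w_up: (d_model, d_ff); w_down: (d_ff, d_model)."""
    gated = silu(x @ w_gate) * (x @ w_up)  # the gate branch modulates the up branch
    return gated @ w_down


rng = np.random.default_rng(0)
d_model, d_ff = 16, 48  # toy sizes chosen for the example
x = rng.standard_normal((1, 4, d_model)).astype(np.float32)
w_gate = rng.standard_normal((d_model, d_ff)).astype(np.float32) * 0.1
w_up = rng.standard_normal((d_model, d_ff)).astype(np.float32) * 0.1
w_down = rng.standard_normal((d_ff, d_model)).astype(np.float32) * 0.1

y = swiglu_ffn(x, w_gate, w_up, w_down)
print(y.shape)
```

GeGLU follows the same pattern with GELU in place of SiLU on the gate branch.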
Dependencies
numpy >= 1.24
pyfive (for .h5 / .keras files)
Part of the UFM ecosystem
micronnx is the collection layer of UFM (Unified Fusion Model). Its only responsibility is to expose weights and activations in a uniform structure. Fusion logic, compatibility checks, and fine-tuning live in UFM, not here.
License
MIT