
Pure-NumPy inference runtime for extracting weights, layers, and activations from LLMs and CNNs


micronnx

Weight, layer, and activation extraction layer for model-fusion systems. Part of the UFM (Unified Fusion Model) ecosystem.

micronnx is not a training framework or a general-purpose inference engine. It is a collection component: it loads a model in any of the supported formats, extracts its weights and activations layer by layer, and exposes them in a unified structure that UFM can use to make fusion decisions.

No PyTorch. No TensorFlow. Just NumPy.


Installation

```bash
pip install micronnx
```


What does micronnx do?

  • Model loading: GGUF (Q2_K through Q6_K), SafeTensors, HDF5/Keras, NPY/NPZ
  • Weight extraction: every tensor in float16, tagged with its role and layer
  • Activation extraction: layer by layer (embed, attn, ffn, residual, norm, pool)
  • Unified export: one or several models into a single .npz with a complete index
  • Vectorized ops: RMSNorm, Attention, RoPE, SwiGLU, Conv2D, and more, in pure NumPy

Supported formats

  • GGUF (.gguf): Llama, Mistral, Gemma, Phi, Falcon, ChatGLM, Mixtral-MoE, DeepSeek-MoE
  • SafeTensors (.safetensors): HuggingFace models
  • HDF5/Keras (.h5, .keras): MobileNetV1 and feed-forward CNNs
  • NumPy (.npy, .npz): raw arrays
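Each format above needs its own loader, so a front end has to choose one from the file path. A minimal sketch of suffix-based dispatch, assuming the file extension alone decides the format; `FORMAT_BY_SUFFIX` and `guess_format` are hypothetical names, and micronnx's own `nx.detect_format` takes a tensor collection rather than a path, so it may work quite differently:

```python
from pathlib import Path

# Hypothetical suffix -> format table mirroring the list above (not micronnx's own table).
FORMAT_BY_SUFFIX = {
    ".gguf": "gguf",
    ".safetensors": "safetensors",
    ".h5": "h5",
    ".keras": "h5",
    ".npy": "npy",
    ".npz": "npy",
}


def guess_format(path: str) -> str:
    """Return a format tag for `path`, based only on its (case-insensitive) suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {path!r}") from None


print(guess_format("SmolLM2-135M-Instruct-Q4_K_M.gguf"))  # gguf
```

Dispatching on the suffix is cheap but trusts the file name; reading a format's magic bytes (GGUF files start with `GGUF`, for example) would be more robust.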


Usage

Export models to .npz

```python
import micronnx as nx

# A single model
nx.export_to_npz("model.gguf", "model.npz")

# Several models, one .npz per model
nx.export_to_npz(["model.gguf", "model.safetensors", "mobilenet.h5"], "outputs/")

# Several models, one merged .npz
nx.export_to_npz(
    ["model.gguf", "model.safetensors", "mobilenet.h5"],
    "outputs/merged.npz",
    merge=True,
)

# Sources given as a comma-separated string
nx.export_to_npz("model.gguf, model.safetensors", "outputs/", merge=False)
```


Inspect a .npz file

```python
import micronnx as nx

nx.inspect_npz("outputs/merged.npz")
# Merged : 3 models
# Total  : 273,283,880 params | 521.25 MB float16
# [SmolLM2-135M] 272 tensors | 30 layers | 256.57 MB
# [model]        272 tensors | 30 layers | 256.57 MB
# [mobilenet]    137 tensors | 14 layers | 8.11 MB
```
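The figures in this listing are consistent with 2 bytes per float16 parameter and with "MB" meaning mebibytes (2^20 bytes). That reading is inferred from the numbers, not documented by micronnx. A quick check of the arithmetic:

```python
# Assumption: float16 stores each parameter in 2 bytes and "MB" means MiB (2**20 bytes).
BYTES_PER_PARAM = 2
MIB = 2 ** 20

total_params = 273_283_880
total_mib = total_params * BYTES_PER_PARAM / MIB
print(f"{total_mib:.2f}")  # 521.25

# The per-model sizes in the listing add up to the same total.
per_model_mib = [256.57, 256.57, 8.11]
print(f"{sum(per_model_mib):.2f}")  # 521.25

# Inverting a per-model size gives its approximate parameter count.
smollm_params = 256.57 * MIB / BYTES_PER_PARAM
print(f"{smollm_params / 1e6:.1f}M")  # 134.5M
```

About 134.5M parameters is in line with the "135M" in the SmolLM2 model name.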

Read the index without loading any tensors:

```python
import micronnx as nx

idx = nx.load_index("outputs/merged.npz")
print(idx["n_models"])
print(idx["total_params"])
print(idx["models"].keys())
```
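`np.load` on a `.npz` file returns a lazy `NpzFile`: a member is read and decompressed only when its key is accessed, which is presumably what lets an index be read without touching the tensors. A minimal sketch of that pattern, assuming the index is stored as a JSON string under a hypothetical `__index__` key; micronnx's actual archive layout is not documented here:

```python
import json
import os
import tempfile

import numpy as np


def write_archive(path, tensors, index):
    # Hypothetical layout: the index is a 0-d unicode array stored next to the tensors.
    np.savez(path, __index__=np.array(json.dumps(index)), **tensors)


def read_index(path):
    # Only the "__index__" member is read; tensor members stay on disk.
    with np.load(path) as archive:
        return json.loads(archive["__index__"].item())


path = os.path.join(tempfile.mkdtemp(), "demo.npz")
write_archive(
    path,
    {"m0/token_embd.weight": np.zeros((8, 4), dtype=np.float16)},
    {"n_models": 1, "total_params": 32, "models": {"m0": {"n_tensors": 1}}},
)
idx = read_index(path)
print(idx["n_models"], idx["total_params"], list(idx["models"]))  # 1 32 ['m0']
```

Storing the index as a unicode array rather than a pickled object keeps `np.load` usable with its default `allow_pickle=False`.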


Extract activations: LLM

```python
import numpy as np
import micronnx as nx

# Load a GGUF model
loader = nx.GGUFLoader("SmolLM2-135M-Instruct-Q4_K_M.gguf")
schema_name, hp = nx.detect_schema_gguf("SmolLM2-135M-Instruct-Q4_K_M.gguf")
runner = nx.ModelRunner(loader, schema_name, hp, max_seq=512)

# Extract every activation for a batch of one token sequence
extractor = nx.ActivationExtractor(runner)
extractor.run(np.array([[1, 2, 3, 4, 5]], dtype=np.int64))
print(extractor.activations.keys())
# embed, attn_norm_0, post_attn_0, residual_attn_0,
# ffn_norm_0, post_ffn_0, residual_ffn_0, ..., final_norm
```
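The names above follow a pre-norm transformer layout: normalize, run the sublayer, add the residual, once for attention and once for the FFN. A toy sketch of how such named hooks could be recorded and filtered; `toy_forward` and its name-matching rule are hypothetical, not micronnx internals, and plain matrix products stand in for real attention and FFN sublayers:

```python
import numpy as np


def rmsnorm(x, eps=1e-6):
    # Weightless RMSNorm over the last axis (toy version).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def toy_forward(tokens, embed, layers, hooks=None):
    """Hypothetical toy pre-norm stack returning {activation_name: array}."""
    acts = {}

    def record(name, value):
        # Assumed rule: hook "post_attn" matches "post_attn_3"; layerless names match as-is.
        base = name.rsplit("_", 1)[0] if name[-1].isdigit() else name
        if hooks is None or base in hooks:
            acts[name] = value

    h = embed[tokens]                      # (batch, seq, d)
    record("embed", h)
    for i, (w_attn, w_ffn) in enumerate(layers):
        n = rmsnorm(h)
        record(f"attn_norm_{i}", n)
        a = n @ w_attn                     # stand-in for attention
        record(f"post_attn_{i}", a)
        h = h + a
        record(f"residual_attn_{i}", h)
        n = rmsnorm(h)
        record(f"ffn_norm_{i}", n)
        f = n @ w_ffn                      # stand-in for the FFN
        record(f"post_ffn_{i}", f)
        h = h + f
        record(f"residual_ffn_{i}", h)
    record("final_norm", rmsnorm(h))
    return acts


rng = np.random.default_rng(0)
d = 8
embed = rng.standard_normal((16, d))
layers = [(rng.standard_normal((d, d)), rng.standard_normal((d, d))) for _ in range(2)]
tokens = np.array([[1, 2, 3, 4, 5]], dtype=np.int64)
acts = toy_forward(tokens, embed, layers, hooks=["post_attn", "residual_ffn"])
print(list(acts))
# ['post_attn_0', 'residual_ffn_0', 'post_attn_1', 'residual_ffn_1']
```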

```python
# Only the activations you care about
extractor = nx.ActivationExtractor(runner, hooks=["post_attn", "residual_ffn"])
extractor.run(np.array([[1, 2, 3, 4, 5]], dtype=np.int64))

# Reduce the sequence to its last token
extractor = nx.ActivationExtractor(runner, reduce="last")

# Reduce the sequence to its mean
extractor = nx.ActivationExtractor(runner, reduce="mean")

# Only even-numbered layers
extractor = nx.ActivationExtractor(runner, layer_fn=lambda i: i % 2 == 0)
```
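`reduce` collapses the sequence axis and `layer_fn` filters by layer index. A minimal sketch of both behaviors, assuming activations shaped (batch, seq, d) and layer-specific names ending in `_<layer>`; `reduce_seq`, `layer_index`, and `keep` are hypothetical helpers, not the library's code:

```python
import numpy as np


def reduce_seq(x, mode=None):
    """Collapse the sequence axis of a (batch, seq, d) activation."""
    if mode is None:
        return x
    if mode == "last":
        return x[:, -1, :]           # hidden state of the final token
    if mode == "mean":
        return x.mean(axis=1)        # average over all tokens
    raise ValueError(f"unknown reduce mode: {mode!r}")


def layer_index(name):
    """Return the trailing layer index of a name like 'post_attn_3', or None."""
    tail = name.rsplit("_", 1)[-1]
    return int(tail) if tail.isdigit() else None


def keep(name, layer_fn=None):
    # Assumption: layerless activations (embed, final_norm) are always kept.
    i = layer_index(name)
    return layer_fn is None or i is None or layer_fn(i)


x = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
print(reduce_seq(x, "last").shape, reduce_seq(x, "mean").shape)  # (2, 4) (2, 4)

names = ["embed", "post_attn_0", "post_attn_1", "post_attn_2", "final_norm"]
print([n for n in names if keep(n, lambda i: i % 2 == 0)])
# ['embed', 'post_attn_0', 'post_attn_2', 'final_norm']
```

"last" suits causal LMs, where the final token has attended to the whole prompt; "mean" gives a summary that weights every position equally.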


Extract activations: CNN (MobileNet)

```python
import numpy as np
import micronnx as nx

# Load the raw HDF5 weights and map their names onto canonical names
raw = nx.H5Loader("mobilenet_1_0_224_tf.h5")
mapped = nx.map_tensors(dict.fromkeys(raw.tensor_names), fmt="h5")
loader = nx.CanonicalLoader(raw, mapped)
runner = nx.CNNRunner(loader, n_blocks=13)

# 224x224x3 image normalized to [-1, 1]
image = np.random.uniform(-1, 1, (224, 224, 3)).astype(np.float32)
```

```python
# Plain forward pass
probs = runner.forward(image)
print(f"class: {probs.argmax()}, confidence: {probs.max():.3f}")

# Forward pass with activation extraction
extractor = nx.CNNActivationExtractor(runner, reduce="spatial_mean")
probs = extractor.run(image)
print(extractor.activations.keys())
# stem, block_0_dw, block_0_pw, ..., block_12_pw, pooled
```
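With `reduce="spatial_mean"`, each feature map is presumably summarized per channel. Assuming HWC activations (the layout of `image` above), that is a mean over the height and width axes, the same computation as global average pooling. A sketch with a hypothetical helper:

```python
import numpy as np


def spatial_mean(fmap):
    """Average an (H, W, C) feature map over H and W, giving shape (C,)."""
    return fmap.mean(axis=(0, 1))


# Example shape: the final 7x7x1024 feature map of MobileNetV1 at 224x224 input.
fmap = np.random.default_rng(0).standard_normal((7, 7, 1024)).astype(np.float32)
vec = spatial_mean(fmap)
print(vec.shape)  # (1024,)
```

The reduction makes activations from differently sized layers comparable as plain per-channel vectors, at the cost of discarding spatial layout.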


Save activations to the .npz and read them back

```python
import numpy as np
import micronnx as nx

merged = nx.export_to_npz(
    ["SmolLM2-135M-Instruct-Q4_K_M.gguf", "model.safetensors"],
    "outputs/merged.npz",
    merge=True,
)

loader = nx.GGUFLoader("SmolLM2-135M-Instruct-Q4_K_M.gguf")
schema_name, hp = nx.detect_schema_gguf("SmolLM2-135M-Instruct-Q4_K_M.gguf")
runner = nx.ModelRunner(loader, schema_name, hp)
extractor = nx.ActivationExtractor(runner, reduce="last")
extractor.run(np.array([[1, 2, 3]], dtype=np.int64))

nx.save_activations(merged, extractor.activations, model_key="SmolLM2-135M-Instruct-Q4_K_M")

# Read them back later without reloading the model
acts = nx.load_activations(merged, model_key="SmolLM2-135M-Instruct-Q4_K_M")
print(acts["final_norm"].shape)
```
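`np.savez` cannot append to an existing archive, so adding activations to a `.npz` means reading it and writing it out again. A minimal sketch of that, assuming activations are namespaced under a hypothetical `acts/<model_key>/` prefix; `save_acts` and `load_acts` are illustrative helpers, and micronnx's real key layout may differ:

```python
import os
import tempfile

import numpy as np


def save_acts(path, acts, model_key):
    # Read every existing member into memory, then rewrite the whole archive.
    with np.load(path) as archive:
        members = {k: archive[k] for k in archive.files}
    for name, value in acts.items():
        # Hypothetical namespacing scheme: acts/<model_key>/<activation name>.
        members[f"acts/{model_key}/{name}"] = np.asarray(value)
    np.savez(path, **members)


def load_acts(path, model_key):
    prefix = f"acts/{model_key}/"
    with np.load(path) as archive:
        return {k[len(prefix):]: archive[k] for k in archive.files if k.startswith(prefix)}


path = os.path.join(tempfile.mkdtemp(), "merged.npz")
np.savez(path, **{"m0/token_embd.weight": np.zeros((8, 4), dtype=np.float16)})
save_acts(path, {"final_norm": np.ones((1, 576))}, model_key="m0")
print(load_acts(path, "m0")["final_norm"].shape)  # (1, 576)
```

Rewriting costs time proportional to the archive size; for repeated additions, a separate activations file per model would avoid re-copying the weights.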


Direct loaders

```python
import micronnx as nx

# GGUF
loader = nx.GGUFLoader("model.gguf")
print(loader.tensor_names[:5])
w = loader.load("token_embd.weight")
loader.close()

# SafeTensors
loader = nx.SafeTensorsLoader("model.safetensors")
w = loader.load("model.embed_tokens.weight")
loader.close()

# HDF5
loader = nx.H5Loader("mobilenet.h5")
w = loader.load("conv1/kernel:0")
loader.close()

# NPY/NPZ
loader = nx.NpyLoader("weights.npz")
w = loader.load("layer_0")
```
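The loaders above are used through the same small surface: `tensor_names`, `load(name)`, and `close()`. A sketch of that contract as a `typing.Protocol`, with a hypothetical NPZ-backed implementation; it mirrors the calls shown here and is not micronnx's own class:

```python
import os
import tempfile
from typing import List, Protocol

import numpy as np


class TensorLoader(Protocol):
    """Loader surface as used in the examples above (our reading, not a documented type)."""

    tensor_names: List[str]

    def load(self, name: str) -> np.ndarray: ...

    def close(self) -> None: ...


class NpzTensorLoader:
    """Hypothetical loader that reads tensors lazily from a .npz archive."""

    def __init__(self, path: str):
        self._archive = np.load(path)
        self.tensor_names = list(self._archive.files)

    def load(self, name: str) -> np.ndarray:
        return self._archive[name]

    def close(self) -> None:
        self._archive.close()


def first_shapes(loader: TensorLoader, n: int = 5):
    # Works with any object that has tensor_names and load().
    return {name: loader.load(name).shape for name in loader.tensor_names[:n]}


path = os.path.join(tempfile.mkdtemp(), "weights.npz")
np.savez(path, layer_0=np.zeros((2, 3), dtype=np.float16))
loader = NpzTensorLoader(path)
shapes = first_shapes(loader)
loader.close()
print(shapes)  # {'layer_0': (2, 3)}
```

A structural protocol lets downstream code accept any of the four loaders without a shared base class.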


Detect schema and architecture

```python
import micronnx as nx

schema_name, hp = nx.detect_schema_gguf("model.gguf")
print(schema_name)
# llama / gemma2 / phi3 / falcon / mistral / mixtral / chatglm / deepseek_moe ...
print(hp)
# {"n_layers": 30, "n_heads": 9, "n_kv_heads": 3, "n_embd": 576, "vocab_size": 49152}

schema_name, hp = nx.detect_schema_safetensors("model.safetensors")

print(list(nx.SCHEMAS.keys()))
# ['llama', 'gemma', 'gemma2', 'phi3', 'mistral', 'mixtral',
#  'falcon', 'bloom', 'chatglm', 'deepseek_moe', ...]
```
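The `hp` printed above pins down the attention geometry. Assuming the head size is `n_embd / n_heads` (true for Llama-style models, but not for every architecture: Gemma, for example, sets the head size separately), each head is 64 wide, and each of the 3 key/value heads is shared by 3 query heads (grouped-query attention). The same 64 and 9/3 split appear in the `nx.attention` example. A quick derivation, treating `hp` as the plain dict printed:

```python
hp = {"n_layers": 30, "n_heads": 9, "n_kv_heads": 3, "n_embd": 576, "vocab_size": 49152}

# Assumption: head size = n_embd / n_heads (Llama-style models).
assert hp["n_embd"] % hp["n_heads"] == 0 and hp["n_heads"] % hp["n_kv_heads"] == 0
head_dim = hp["n_embd"] // hp["n_heads"]     # 576 / 9 = 64
group = hp["n_heads"] // hp["n_kv_heads"]    # 9 / 3 = 3 query heads per KV head
kv_width = hp["n_kv_heads"] * head_dim       # 3 * 64 = 192: output width of the K and V projections
print(head_dim, group, kv_width)  # 64 3 192
```

Sharing KV heads makes the K and V projections (and any KV cache) a third of the width they would have with 9 full heads.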


Direct NumPy ops

```python
import numpy as np
import micronnx as nx

x = np.random.randn(1, 16, 576).astype(np.float32)
w = np.ones(576, dtype=np.float32)

# Normalizations and activations, chained here only to show each call
x = nx.rmsnorm(x, w)
x = nx.layernorm(x, w, w)
x = nx.silu(x)
x = nx.gelu(x)
x = nx.relu(x)
x = nx.softmax(x, axis=-1)
```
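For reference, the standard definitions behind `rmsnorm` and `silu`: RMSNorm divides each vector by its root mean square and multiplies by a learned weight, with no mean-centering (unlike LayerNorm); SiLU is `x * sigmoid(x)`. A NumPy sketch of those definitions; the `eps` default is an assumption, and micronnx's value may differ:

```python
import numpy as np


def rmsnorm_ref(x, w, eps=1e-6):
    # Divide by the root mean square over the last axis, then apply the gain w.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * w


def silu_ref(x):
    # x * sigmoid(x), written as x / (1 + exp(-x)).
    return x / (1.0 + np.exp(-x))


x = np.random.default_rng(0).standard_normal((1, 16, 576)).astype(np.float32)
w = np.ones(576, dtype=np.float32)
y = rmsnorm_ref(x, w)
# With a unit gain, every row of y now has an RMS of about 1 (slightly below, because of eps).
print(np.sqrt(np.mean(y * y, axis=-1)).min())
```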

```python
# Grouped-query attention: q has 9 heads, k and v have 3, each of size 64
q = np.random.randn(1, 4, 9, 64).astype(np.float32)
k = np.random.randn(1, 4, 3, 64).astype(np.float32)
v = np.random.randn(1, 4, 3, 64).astype(np.float32)
out = nx.attention(q, k, v, n_heads=9, n_kv_heads=3)
```
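In the call above, 9 query heads share 3 key/value heads. A common way to implement grouped-query attention is to repeat each KV head `n_heads // n_kv_heads` times and then run ordinary scaled dot-product attention. A sketch assuming a (batch, seq, heads, head_dim) layout, which the shapes above suggest, plus an optional causal mask; it illustrates the technique and is not necessarily how `nx.attention` is written:

```python
import numpy as np


def gqa_attention(q, k, v, n_heads, n_kv_heads, causal=True):
    """q: (B, T, n_heads, D); k, v: (B, T, n_kv_heads, D) -> (B, T, n_heads, D).

    Assumed layout: (batch, seq, heads, head_dim).
    """
    group = n_heads // n_kv_heads
    # Repeat each KV head so that query head h reads KV head h // group.
    k = np.repeat(k, group, axis=2)
    v = np.repeat(v, group, axis=2)
    d = q.shape[-1]
    # Attention scores per head: (B, H, T_q, T_k).
    scores = np.einsum("bqhd,bkhd->bhqk", q, k) / np.sqrt(d)
    if causal:
        t_q, t_k = scores.shape[-2:]
        future = np.triu(np.ones((t_q, t_k), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("bhqk,bkhd->bqhd", weights, v)


rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 9, 64)).astype(np.float32)
k = rng.standard_normal((1, 4, 3, 64)).astype(np.float32)
v = rng.standard_normal((1, 4, 3, 64)).astype(np.float32)
out = gqa_attention(q, k, v, n_heads=9, n_kv_heads=3)
print(out.shape)  # (1, 4, 9, 64)
```

`np.repeat` copies the KV heads, which is simple but wasteful; a reshape of `q` into (B, T, n_kv_heads, group, D) would avoid the copy at the cost of a less readable einsum.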

```python
# 3x3 convolution on an HWC feature map, then global average pooling
img = np.random.randn(112, 112, 32).astype(np.float32)
filt = np.random.randn(3, 3, 32, 64).astype(np.float32)
out = nx.conv2d(img, filt, stride=1, padding=1)
out = nx.global_avg_pool(out)
```
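The shapes above suggest `conv2d` takes an HWC image and a (kh, kw, c_in, c_out) filter. One standard way to vectorize convolution in NumPy is im2col: gather every kh x kw patch into a row, then do a single matrix multiply. A sketch under those layout assumptions, using the deep-learning convention of cross-correlation (no kernel flip) and `sliding_window_view` (NumPy >= 1.20); `conv2d_ref` is a hypothetical name, not micronnx's implementation:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_ref(img, filt, stride=1, padding=0):
    """img: (H, W, C_in); filt: (KH, KW, C_in, C_out) -> (H_out, W_out, C_out).

    Assumed layouts: HWC input, (KH, KW, C_in, C_out) filter.
    """
    kh, kw, c_in, c_out = filt.shape
    x = np.pad(img, ((padding, padding), (padding, padding), (0, 0)))
    # Windows at every position: (H', W', C_in, KH, KW); keep every `stride`-th one.
    windows = sliding_window_view(x, (kh, kw), axis=(0, 1))[::stride, ::stride]
    # Reorder each patch to (KH, KW, C_in) so it lines up with the filter layout.
    patches = windows.transpose(0, 1, 3, 4, 2)
    h_out, w_out = patches.shape[:2]
    cols = patches.reshape(h_out * w_out, kh * kw * c_in)      # im2col matrix
    out = cols @ filt.reshape(kh * kw * c_in, c_out)
    return out.reshape(h_out, w_out, c_out)


rng = np.random.default_rng(0)
img = rng.standard_normal((112, 112, 32)).astype(np.float32)
filt = rng.standard_normal((3, 3, 32, 64)).astype(np.float32)
print(conv2d_ref(img, filt, stride=1, padding=1).shape)  # (112, 112, 64)
print(conv2d_ref(img, filt, stride=2, padding=1).shape)  # (56, 56, 64)
```

The im2col matrix trades memory (each input value is copied up to kh * kw times) for one large, BLAS-backed matrix multiply.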


Full API

```
# Loaders
nx.GGUFLoader(path)
nx.SafeTensorsLoader(path)
nx.H5Loader(path)
nx.NpyLoader(path)
```

```
# Schema detection
nx.detect_schema_gguf(path)          → (schema_name, hp)
nx.detect_schema_safetensors(path)   → (schema_name, hp)
nx.detect_schema_hf(path)            → (schema_name, hp)
nx.SCHEMAS                           → dict of every schema
```

```
# LLM runtime
nx.ModelRunner(loader, schema_name, hp, max_seq=2048)
nx.ActivationExtractor(runner, hooks=None, reduce=None, layer_fn=None)
    reduce:   None | "last" | "mean"
    hooks:    ["embed", "post_attn", "residual_ffn", "final_norm", ...]
    layer_fn: callable(i: int) -> bool
```

```
# CNN runtime
nx.CNNRunner(loader, n_blocks=13)
nx.CNNActivationExtractor(runner, reduce=None)
    reduce: None | "spatial_mean"
nx.CanonicalLoader(raw_loader, mapped)
```

```
# Canonical naming
nx.map_tensors(tensors, fmt)
nx.find_unmapped(tensors, fmt, mapped)
nx.resolve_tied_embeddings(mapped)
nx.detect_format(tensors)
```
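Canonical naming maps format-specific tensor names onto one scheme, leaves unknown names aside (the role `find_unmapped` suggests), and handles models whose output projection reuses the embedding matrix (`resolve_tied_embeddings`). A minimal sketch of those three ideas, assuming GGUF-style names as the canonical target; the rules cover only a few Llama tensors, and `RULES`, `map_names`, and `resolve_tied` are hypothetical, not micronnx's tables or functions:

```python
import re

# Hypothetical rules: HF Llama-style names -> GGUF-style names (small illustrative subset).
RULES = [
    (r"model\.embed_tokens\.weight", "token_embd.weight"),
    (r"model\.layers\.(\d+)\.input_layernorm\.weight", r"blk.\1.attn_norm.weight"),
    (r"model\.layers\.(\d+)\.self_attn\.q_proj\.weight", r"blk.\1.attn_q.weight"),
    (r"model\.layers\.(\d+)\.self_attn\.k_proj\.weight", r"blk.\1.attn_k.weight"),
    (r"model\.layers\.(\d+)\.self_attn\.v_proj\.weight", r"blk.\1.attn_v.weight"),
    (r"model\.layers\.(\d+)\.self_attn\.o_proj\.weight", r"blk.\1.attn_output.weight"),
    (r"model\.layers\.(\d+)\.post_attention_layernorm\.weight", r"blk.\1.ffn_norm.weight"),
    (r"model\.layers\.(\d+)\.mlp\.gate_proj\.weight", r"blk.\1.ffn_gate.weight"),
    (r"model\.layers\.(\d+)\.mlp\.up_proj\.weight", r"blk.\1.ffn_up.weight"),
    (r"model\.layers\.(\d+)\.mlp\.down_proj\.weight", r"blk.\1.ffn_down.weight"),
    (r"model\.norm\.weight", "output_norm.weight"),
    (r"lm_head\.weight", "output.weight"),
]


def map_names(names):
    """Return ({raw_name: canonical_name}, [raw names no rule matched])."""
    mapped, unmapped = {}, []
    for name in names:
        for pattern, template in RULES:
            m = re.fullmatch(pattern, name)
            if m:
                mapped[name] = m.expand(template)  # fills in the layer index
                break
        else:
            unmapped.append(name)
    return mapped, unmapped


def resolve_tied(sources):
    """sources: {canonical_name: raw_name}. Alias output.weight to the embedding if absent."""
    resolved = dict(sources)
    if "output.weight" not in resolved and "token_embd.weight" in resolved:
        resolved["output.weight"] = resolved["token_embd.weight"]
    return resolved


names = ["model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.weight",
         "model.norm.weight", "model.rotary_emb.inv_freq"]
mapped, unmapped = map_names(names)
print(mapped["model.layers.0.self_attn.q_proj.weight"])  # blk.0.attn_q.weight
print(unmapped)  # ['model.rotary_emb.inv_freq']
sources = resolve_tied({canon: raw for raw, canon in mapped.items()})
print(sources["output.weight"])  # model.embed_tokens.weight
```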

```
# Exporter
nx.export_to_npz(src, dst, fmt=None, merge=False, verbose=True)
nx.load_index(path)
nx.inspect_npz(path, n=20)
nx.save_activations(path, acts, model_key=None)
nx.load_activations(path, model_key=None)
```

```
# Ops
nx.rmsnorm / nx.layernorm / nx.batchnorm
nx.softmax / nx.sigmoid / nx.relu / nx.gelu / nx.silu
nx.attention / nx.rope / nx.linear / nx.embedding
nx.swiglu / nx.swiglu_fused / nx.geglu / nx.ffn_gelu
nx.conv2d / nx.depthwise_conv2d / nx.global_avg_pool / nx.max_pool2d
```
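Among the FFN ops listed, SwiGLU is the gated feed-forward used by Llama-family models: `down(silu(x @ W_gate) * (x @ W_up))`. A sketch assuming weights stored as (in, out) matrices so that `x @ w` applies them (HF checkpoints store linear weights as (out, in), so those would need a transpose); `swiglu_ffn` is a hypothetical name and the sizes are illustrative:

```python
import numpy as np


def silu(x):
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """x: (..., d); w_gate, w_up: (d, f); w_down: (f, d) -> (..., d).

    Assumed weight layout: (in, out).
    """
    # The SiLU-activated gate scales the "up" projection elementwise.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


rng = np.random.default_rng(0)
d, f = 576, 1536  # illustrative sizes
x = rng.standard_normal((1, 4, d)).astype(np.float32)
w_gate, w_up = (rng.standard_normal((d, f)).astype(np.float32) * 0.02 for _ in range(2))
w_down = rng.standard_normal((f, d)).astype(np.float32) * 0.02
print(swiglu_ffn(x, w_gate, w_up, w_down).shape)  # (1, 4, 576)
```

Because SiLU(0) = 0, a zero gate closes the whole path: the output is zero regardless of the "up" branch.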


Dependencies

  • numpy >= 1.24
  • pyfive (for .h5 / .keras files)


Part of the UFM ecosystem

micronnx is the collection layer of UFM (Unified Fusion Model). Its only responsibility is to expose weights and activations in a uniform structure. Fusion logic, compatibility checks, and fine-tuning live in UFM, not here.


License

MIT



Download files

Download the file for your platform.

Source Distribution

micronnx-0.1.2.tar.gz (43.4 kB)

Uploaded Source

Built Distribution


micronnx-0.1.2-py3-none-any.whl (44.6 kB)

Uploaded Python 3

File details

Details for the file micronnx-0.1.2.tar.gz.

File metadata

  • Download URL: micronnx-0.1.2.tar.gz
  • Upload date:
  • Size: 43.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.33.1

File hashes

Hashes for micronnx-0.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53965a8410befc38d9b4ceb3c20ec6ac679f2aa21c55e2eb7ea0738d8a71d9ae |
| MD5 | 5a68dd6915b8124988e14a0709daf8ff |
| BLAKE2b-256 | ce164fbefb5217cd1a802c83454274fe700b4f7d554bab816e99470304797fa5 |


File details

Details for the file micronnx-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: micronnx-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 44.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.33.1

File hashes

Hashes for micronnx-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fee9a49e6a8f6972e451eef17eeb2dba9a8a159e8faaa6a3434b088d82564a03 |
| MD5 | c27612235cfef851c3d194c250241685 |
| BLAKE2b-256 | 8934bd1a6053148f56159a8c43c7e2dde1a5567fe49f4c836e6e3708946cc2d3 |

