Aparecium v2: pooled MPNet embedding reversal for crypto social-media posts.

Aparecium v2 – Pooled MPNet Reverser

Aparecium v2 is a Python package for reconstructing crypto‑domain social‑media posts from a single pooled embedding vector. It is the pooled‑embedding counterpart to the original token‑level seq2seq model SentiChain/aparecium-seq2seq-reverser, but with a stricter input contract:

Input: one 768‑D pooled vector from sentence-transformers/all-mpnet-base-v2 (not a token‑level matrix).
Output: natural‑language text that matches the crypto market context of the embedding.

This package contains:

Low‑level components: EmbAdapter, Sketcher, Decoder, surrogate similarity scorer r(x, e).
Training scripts for supervised S1 and optional SCST (S2) fine‑tuning.
A high‑level Aparecium wrapper for easy use from PyPI and Hugging Face Hub.

Features

Pooled‑only embedding reversal: works directly from a single pooled MPNet embedding (size 768), no token‑level memory required.
Crypto‑domain specialization: trained on synthetic crypto social‑media posts (markets, DeFi, L2s, MEV, NFTs, governance).
Modern architecture:
- Multi‑channel EmbAdapter (pooled → pseudo‑sequence memory).
- Sketcher plan head (optional constraints from simple signals).
- Transformer decoder, surrogate similarity scorer, and beam‑search reranking.
High‑level API:
- Aparecium.invert_embedding(...) for direct vector → text.
- Aparecium.invert_text(...) for raw text → embed → invert (for diagnostics).
Service‑ready:
- FastAPI inference server with /invert endpoint, suitable for batch/online use.

Limitations & Caveats

Reconstruction is not exact: outputs preserve semantic gist and entities but may differ in wording or style.
Quality depends on:
- Encoder alignment (sentence-transformers/all-mpnet-base-v2),
- Domain match (crypto / finance social‑media posts),
- Decode settings (beam size, constraints, rerank weights).
The training data are synthetic crypto market posts, not real social‑media timelines, so expect some domain shift on real posts.
Do not use this model to attempt to reconstruct sensitive or personally identifiable content from embeddings.

Model Architecture (v2)

At a high level, Aparecium v2 reverses a pooled vector e ∈ R^{768} as follows:

EmbAdapter: e → H
- Takes a pooled MPNet embedding and produces a multi‑scale pseudo‑sequence memory H ∈ R^{B × S × D}.
Sketcher (optional at inference):
- Predicts simple crypto‑domain signals, such as presence of URLs or basic plan fields.
RealizerDecoder (Transformer decoder):
- GPT‑style transformer decoder with cross‑attention over H.
- Typical configuration:
  - d_model = 768
  - n_layer = 12
  - n_head = 8
  - d_ff = 3072
Surrogate scorer r(x, e):
- Neural surrogate that approximates cosine similarity between the MPNet embedding of the generated text and the target embedding e.
- Used for sequence‑level reranking.
Decoding:
- Deterministic beam search or stochastic sampling.
- Optional constraints (tickers/hashtags/amounts) and surrogate‑based rerank.
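
The surrogate‑based rerank can be pictured as re‑scoring each beam candidate with a weighted sum of its language‑model log‑probability and its surrogate similarity. The sketch below is only an illustration: the additive form `lm_logp + alpha * r`, the `rerank` helper, and the candidate values are assumptions, since the exact scoring formula used by the package is not stated here.

```python
from typing import List, Tuple


def rerank(
    candidates: List[Tuple[str, float, float]], alpha: float = 1.0
) -> List[Tuple[str, float]]:
    """Sort candidates by lm_logp + alpha * surrogate_score, best first.

    Each candidate is (text, lm_logp, surrogate_score). The additive
    combination is an assumption for illustration, not the package's
    documented formula.
    """
    scored = [(text, lm_logp + alpha * r) for text, lm_logp, r in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up beam candidates: (text, LM log-prob, surrogate similarity).
beam = [
    ("BTC ETF inflows hit a weekly high", -1.0, 0.5),
    ("Bitcoin ETF inflows reach new weekly high", -1.25, 0.875),
    ("ETH gas fees drop", -0.75, 0.125),
]

by_lm_only = rerank(beam, alpha=0.0)   # language-model score alone
by_combined = rerank(beam, alpha=1.0)  # LM score plus surrogate similarity
print(by_lm_only[0][0])
print(by_combined[0][0])
```

With `alpha=0.0` the ordering follows the language model alone; raising `alpha` lets a candidate that is closer to the target embedding overtake a more fluent but less faithful one.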

The v2 S1 checkpoint released on Hugging Face at SentiChain/aparecium-v2-pooled-reverser contains the EmbAdapter, Sketcher, Decoder, tokenizer name, and (optionally) surrogate r state.

Installation

From PyPI

Install the package from PyPI:

pip install aparecium

From Source (this repo)

git clone https://github.com/SentiChain/aparecium.git
cd aparecium
pip install -e .

This installs the aparecium package (v2 pooled‑only variant) in editable mode for development and experiments.

Quick Start (High‑Level API)

1. Invert a pooled embedding from Python

The HF v2 checkpoint lives at SentiChain/aparecium-v2-pooled-reverser.
The Aparecium wrapper downloads it automatically and exposes a simple interface:

from aparecium import Aparecium
from sentence_transformers import SentenceTransformer

# 1) Embed a crypto-domain social-media post with pooled MPNet
encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
text = "Bitcoin ETF inflows hit a new weekly high as markets turn risk-on."
e = encoder.encode([text], convert_to_numpy=True, normalize_embeddings=True)[0]  # shape (768,)

# 2) Load Aparecium v2 (S1 baseline) from Hugging Face
model = Aparecium()  # defaults to SentiChain/aparecium-v2-pooled-reverser, aparecium_v2_s1.pt

# 3) Invert the pooled embedding
res = model.invert_embedding(e, beam=5, max_len=64)
print("Reconstruction:", res.text)
print("Candidates:", res.candidates)

2. End‑to‑end inversion from raw text

This is mostly useful for diagnostics (how much information is lost by pooling):

from aparecium import Aparecium

model = Aparecium()
text = "Ethereum L2 blob fees spiked after EIP-4844; MEV still shapes order flow."
out = model.invert_text(text, beam=5, max_len=64)
print(out.text)

Internally this calls the same MPNet encoder you would use upstream and then runs the inversion pipeline.
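
One way to quantify the information lost by pooling is to re‑embed the reconstruction and compare it with the original embedding by cosine similarity. The sketch below shows only the comparison step in pure Python; the `cosine` helper and the 4‑D toy vectors are illustrative stand‑ins for the 768‑D MPNet vectors you would obtain from the encoder.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins: embedding of the original post and of its reconstruction.
e_original = [1.0, 0.0, 1.0, 0.0]
e_roundtrip = [1.0, 0.0, 0.0, 0.0]

print(round(cosine(e_original, e_roundtrip), 4))
```

A value close to 1.0 suggests the reconstruction lands near the original in embedding space, even when the wording differs.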

3. CLI usage

You can also use the package as a simple CLI:

echo "Macro: DXY rallies while risk assets chop; crypto narratives rotate to AI tokens." | \
  python -m aparecium

The CLI uses the default HF repo and S1 checkpoint and prints one reconstructed text to stdout.

Training & Pipeline (v2)

The training and data‑prep scripts live inside the package under aparecium.aparecium:

aparecium/aparecium/scripts/embed_mpnet.py – embed raw posts into pooled MPNet vectors.
aparecium/aparecium/train/train_s1_supervised.py – S1 supervised training.
aparecium/aparecium/train/train_surrogate_r.py – surrogate r training.
aparecium/aparecium/train/train_s2_scst.py – optional SCST RL fine‑tuning.
aparecium/aparecium/infer/service.py – FastAPI inference service.
aparecium/aparecium/data/*.py – dataset and crypto plan utilities.

Example S1 training command (from the aparecium project root):

python -m aparecium.aparecium.train.train_s1_supervised \
  --shards ./data/train \
  --val_shards ./data/val \
  --save_dir ./checkpoints \
  --batch_size 64 \
  --epochs 1 \
  --steps 6000 \
  --warmup_steps 1000 \
  --lr 3e-4 \
  --max_len 96 \
  --device cuda \
  --log_every 50

Note: most users of the PyPI package do not need to run training; the HF checkpoint can be used directly through the Aparecium wrapper.

Inference Service

For higher‑throughput use, you can run the FastAPI service:

python -m aparecium.aparecium.infer.service --ckpt checkpoints/aparecium_v2_s1.pt

Then POST to /invert:

{
  "embedding": [0.123, 0.456, "...", 0.789],
  "deterministic": true,
  "beam": 5,
  "max_len": 64,
  "constraints": true,
  "final_mpnet": true
}
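
The request body above can be assembled in Python before sending. This is a minimal sketch using only the standard library; `build_invert_payload`, `post_invert`, and the `http://localhost:8000` base URL are illustrative assumptions, not part of the package.

```python
import json
import urllib.request
from typing import Sequence


def build_invert_payload(
    embedding: Sequence[float], beam: int = 5, max_len: int = 64
) -> dict:
    """Build the JSON body for POST /invert, mirroring the fields shown above."""
    if len(embedding) != 768:
        raise ValueError(f"expected 768 values, got {len(embedding)}")
    return {
        "embedding": [float(x) for x in embedding],
        "deterministic": True,
        "beam": beam,
        "max_len": max_len,
        "constraints": True,
        "final_mpnet": True,
    }


def post_invert(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """Send the payload to /invert; base_url is an assumption, use your server's address."""
    request = urllib.request.Request(
        base_url + "/invert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder vector; in practice this is the pooled MPNet embedding.
payload = build_invert_payload([0.0] * 768)
print(sorted(payload))
```

Calling `post_invert(payload)` requires a running service, so the example only builds and inspects the payload.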

The response includes:

text: top‑1 reconstruction,
candidates: all beam candidates,
scores.lm_logp[]: language‑model log‑prob scores,
scores.cos_mpnet[] (if final_mpnet=true): MPNet cosine per candidate,
plan: optional extracted or predicted plan information.
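
A client may want to pick the candidate with the highest MPNet cosine rather than the top‑1 text. The sketch below assumes the score arrays are index‑aligned with `candidates`, which is not confirmed above; the `best_by_cosine` helper and the sample response values are made up for illustration.

```python
import json

# Illustrative response body; the texts and numbers are made up.
raw = """
{
  "text": "Bitcoin ETF inflows reach a new weekly high",
  "candidates": [
    "Bitcoin ETF inflows reach a new weekly high",
    "BTC ETF inflows hit weekly record as risk appetite returns"
  ],
  "scores": {
    "lm_logp": [-12.5, -14.0],
    "cos_mpnet": [0.81, 0.86]
  }
}
"""


def best_by_cosine(response: dict) -> str:
    """Return the candidate with the highest MPNet cosine.

    Falls back to the top-1 text when cos_mpnet is absent (for example,
    when the request did not set final_mpnet). Assumes scores and
    candidates share the same ordering.
    """
    cosines = response.get("scores", {}).get("cos_mpnet")
    if not cosines:
        return response["text"]
    best_index = max(range(len(cosines)), key=cosines.__getitem__)
    return response["candidates"][best_index]


response = json.loads(raw)
print(best_by_cosine(response))
```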

Model Input Contract & Defaults

Input to the v2 reverser is a pooled MPNet vector with shape (768,) (L2‑normalized recommended).
Recommended encoder: sentence-transformers/all-mpnet-base-v2.
Suggested decode defaults for general use:
- Beam size: beam=5
- Max length: max_len≈64–96
- Determinism: deterministic beam (via deterministic_beam_search)
- Rerank weight for surrogate r: alpha≈1.0–1.5
- Optional: use constraints for tickers/hashtags/amounts if plans are available.

This differs from the v1 seq2seq model, which expects a token‑level (seq_len, 768) matrix and uses a slightly different decode configuration (see the v1 model card at SentiChain/aparecium-seq2seq-reverser).
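
Because the v2 contract is strict about shape, it can help to validate and normalize vectors before passing them to the reverser. The helper below is a hypothetical pre‑check written in pure Python, not part of the package; it enforces the 768‑D shape and applies the recommended L2 normalization.

```python
import math
from typing import List, Sequence

EMBED_DIM = 768  # pooled all-mpnet-base-v2 dimension, per the contract above


def prepare_embedding(vec: Sequence[float]) -> List[float]:
    """Check that the vector is 768-D and finite, then L2-normalize it."""
    if len(vec) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} values, got {len(vec)}")
    values = [float(x) for x in vec]
    if not all(math.isfinite(x) for x in values):
        raise ValueError("embedding contains NaN or inf")
    norm = math.sqrt(sum(x * x for x in values))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in values]


# Toy input with norm 5; after normalization the squared norm is ~1.
unit = prepare_embedding([3.0, 4.0] + [0.0] * 766)
print(round(sum(x * x for x in unit), 6))
```

A token‑level `(seq_len, 768)` matrix intended for the v1 model fails the length check unless `seq_len` happens to be 768, so a shape check on the array itself is still advisable when using NumPy inputs.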

Requirements

At runtime, Aparecium v2 depends on:

Python ≥ 3.9
PyTorch ≥ 2.0
Transformers ≥ 4.40
sentence-transformers ≥ 2.2
huggingface-hub ≥ 0.25
NumPy ≥ 1.23
tqdm

GPU (CUDA) is auto‑detected when available; CPU works but is slower for training and beam‑search decoding.

Project Structure (v2 subset)

aparecium/
├── aparecium/              # v2 pooled-only Python package
│   ├── api.py              # High-level Aparecium wrapper
│   ├── __init__.py         # Package export
│   ├── __main__.py         # CLI entrypoint (python -m aparecium)
│   ├── config.py           # Config utilities
│   ├── data/               # Dataset + plan utilities
│   ├── infer/              # Decoding + FastAPI service
│   ├── models/             # EmbAdapter, Decoder, Sketcher, SurrogateR, Constraints
│   ├── scripts/            # Data prep, embedding, inspection
│   ├── train/              # S1/S2/r training scripts
│   └── utils/              # Common helpers, tokenization utilities
├── checkpoints/            # Local training outputs (S1/S2/r)
└── data/                   # Local data shards (train/val/test)

License

This project is licensed under the MIT License – see the LICENSE file for details.

Citation

If you use Aparecium v2 in research or production, please cite the project and, when relevant, also reference the v1 model card:

@software{apareciumv2_2025,
  author    = {SentiChain},
  title     = {Aparecium v2: Pooled MPNet Embedding Reversal for Crypto Social-Media Posts},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/SentiChain/aparecium-v2-pooled-reverser}
}

For the original token‑level seq2seq reverser, see: SentiChain/aparecium-seq2seq-reverser.

Release history

1.0.0 – Nov 17, 2025 (this version)
0.3.0 – Sep 22, 2025
0.2.3 – Apr 3, 2025
0.2.2 – Mar 31, 2025
0.2.1 – Mar 31, 2025
0.2.0 – Mar 31, 2025
0.1.4 – Mar 31, 2025
0.1.3 – Mar 30, 2025
0.1.2 – Mar 30, 2025
0.1.1 – Mar 29, 2025
0.1.0 – Mar 29, 2025 (yanked; reason: import issue fixed in v0.1.1)
0.0.1 – Mar 28, 2025

Download files

Source distribution: aparecium-1.0.0.tar.gz (34.1 kB; tags: Source). Uploaded Nov 17, 2025 via twine/6.2.0 CPython/3.12.10; Trusted Publishing: No.

Hashes for aparecium-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`8edb34d3755500037dee7eb446752620a04e0f6b7f2e9390624f9ba850faaa5f`
MD5	`9804b7b99e8fd47ac8f553e5d4fb7cf3`
BLAKE2b-256	`fe7186c6cc0a2956acd50abbd212e822f91dc8059e82cefbe8612ff8de87e34d`

Built distribution: aparecium-1.0.0-py3-none-any.whl (42.1 kB; tags: Python 3). Uploaded Nov 17, 2025 via twine/6.2.0 CPython/3.12.10; Trusted Publishing: No.

Hashes for aparecium-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3f4458db691d234cafa59b8bff7f74df5e15b0320b3801d7259def1c8cc23e65`
MD5	`040404ca1b1964fb3efba01f5874cea1`
BLAKE2b-256	`99b52a5be6f059f89318382e423b8cc390289b5a1529568e5e6086f4bd7c6476`
