Structured NLP tasks powered by a fine-tuned small language model
neural-txt
Structured NLP tasks powered by a fine-tuned 135M-parameter language model. Extract bullets, generate Q&A pairs, build knowledge graphs, and more, all running locally. The model is deliberately narrow and task-specific, so it runs cheaply in resource-constrained environments.
https://github.com/user-attachments/assets/04774af0-dc51-42e7-b2a6-d6f50bf4e258
Support
If you find this helpful, consider supporting the project on Patreon, which hosts all code, projects, slides, and write-ups from the YouTube channel.
Install
# Base (no inference backend)
pip install neural-txt
# With HuggingFace backend (torch); quotes keep zsh from expanding the brackets
pip install "neural-txt[hf]"
# With MLX backend (Apple Silicon)
pip install "neural-txt[mlx]"
NeuralTxtReward works with either backend: install neural-txt[hf] for the
Hugging Face / torch scorer, or neural-txt[mlx] for Apple Silicon MLX.
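If a script needs to run on machines with different extras installed, one option is to detect the available backend before constructing a model. The sketch below is not part of neural-txt: `pick_backend` is a hypothetical helper, and it assumes the `hf` extra makes `torch` importable and the `mlx` extra makes `mlx` importable.

```python
import importlib.util
import platform
from typing import Optional


def pick_backend() -> Optional[str]:
    """Return "mlx", "hf", or None depending on what is importable.

    Hypothetical helper, not part of neural-txt.
    """
    has_mlx = importlib.util.find_spec("mlx") is not None
    has_torch = importlib.util.find_spec("torch") is not None
    # Assumption: MLX is only worth choosing on Apple Silicon (macOS, arm64).
    on_apple_silicon = (
        platform.system() == "Darwin" and platform.machine() == "arm64"
    )
    if has_mlx and on_apple_silicon:
        return "mlx"
    if has_torch:
        return "hf"
    return None  # base install only: no inference backend available


backend = pick_backend()
print(backend)
```

The returned string can then be passed as the `backend` argument, with `None` handled as "install one of the extras first".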
Quick start
from neuraltxt import NeuralTxt
model = NeuralTxt(backend="mlx") # or backend="hf"
passage = """
Transformers have revolutionized NLP by introducing the self-attention
mechanism. Unlike RNNs, transformers process all tokens in parallel,
leading to significant training speedups.
"""
# Extract key points
bullets = model.extract_bullets(passage)
# Generate question-answer pairs
pairs = model.generate_qa_pairs(passage)
# Extract knowledge graph triplets
triplets = model.extract_triplets(passage)
Reward scoring
NeuralTxtReward scores generated responses against a reference answer with
paperbd/neuraltxt-reward-22M.
Use it to score one answer, score a batch, or rank candidate responses.
from neuraltxt import NeuralTxtReward
reward = NeuralTxtReward(backend="mlx") # or backend="hf"
reference = "The capital of France is Paris."
responses = [
"Paris is the capital of France.",
"France's capital is Lyon.",
]
score = reward.score(responses[0], reference) # float between 0 and 1
scores = reward.batch_score(responses, reference) # list[float]
ranked = reward.rank(responses, reference) # list[RankedResponse]
for item in ranked:
    print(item.index, item.score, item.response)
rank() preserves the original response index and sorts highest score first.
Pass a local model directory with NeuralTxtReward("path/to/reward-model").
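The ranking contract described above (original index preserved, highest score first) can be sketched without the reward model. Everything here is a stand-in: the `RankedResponse` dataclass only mirrors the three fields printed in the example, and `token_overlap` is a crude word-overlap scorer used so the sketch runs on its own. It is not how the 22M reward model scores text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RankedResponse:
    # Stand-in with the fields shown in the README example.
    index: int
    score: float
    response: str


def token_overlap(response: str, reference: str) -> float:
    """Toy scorer: Jaccard overlap of lowercased words, in [0, 1]."""
    a = {w.strip(".,!?").lower() for w in response.split()}
    b = {w.strip(".,!?").lower() for w in reference.split()}
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_responses(
    responses: List[str],
    reference: str,
    score_fn: Callable[[str, str], float] = token_overlap,
) -> List[RankedResponse]:
    """Score every response, keep its original index, sort best first."""
    scored = [
        RankedResponse(i, score_fn(r, reference), r)
        for i, r in enumerate(responses)
    ]
    # sorted() is stable, so tied scores keep their original order.
    return sorted(scored, key=lambda item: item.score, reverse=True)


ranked = rank_responses(
    ["France's capital is Lyon.", "Paris is the capital of France."],
    "The capital of France is Paris.",
)
for item in ranked:
    print(item.index, item.score, item.response)
```

Because the index survives sorting, callers can map each ranked item back to whatever metadata they kept alongside the original list.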
Beam candidates
Generation methods accept num_beams with a default of 1. The public methods
still return one parsed result: the first / highest-ranked candidate. With the
HuggingFace backend, num_beams is forwarded as beam search with
num_return_sequences=num_beams. With MLX, candidates are generated the same way
as the existing repeated generation path.
bullets = model.extract_bullets(passage, num_beams=4)
See examples/beam_candidates.py for a complete example, including how to inspect all raw beam candidates.
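For readers unfamiliar with beam search, the toy below shows why asking for more than one beam can change the top candidate. It is a generic illustration over a hand-written next-token table (all probabilities are made up), not the code path used by either backend; production implementations add details such as length penalties and end-of-sequence handling.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical next-token probabilities for a tiny vocabulary.
NEXT: Dict[str, Dict[str, float]] = {
    "<s>": {"transformers": 0.6, "rnns": 0.4},
    "transformers": {"use": 0.5, "process": 0.5},
    "rnns": {"process": 0.9, "use": 0.1},
    "use": {"attention": 1.0},
    "process": {"tokens": 1.0},
}

Beam = Tuple[float, List[str]]  # (cumulative log-probability, tokens)


def beam_search(num_beams: int, max_len: int = 3) -> List[Beam]:
    """Return up to num_beams candidates, highest log-probability first."""
    beams: List[Beam] = [(0.0, ["<s>"])]
    for _ in range(max_len):
        expanded: List[Beam] = []
        for logp, tokens in beams:
            options = NEXT.get(tokens[-1])
            if not options:
                expanded.append((logp, tokens))  # nothing left to extend
                continue
            for tok, p in options.items():
                expanded.append((logp + math.log(p), tokens + [tok]))
        # Keep only the num_beams best partial sequences.
        expanded.sort(key=lambda beam: beam[0], reverse=True)
        beams = expanded[:num_beams]
    return beams


greedy = beam_search(num_beams=1)[0]
best_of_two = beam_search(num_beams=2)[0]
print(" ".join(greedy[1][1:]))
print(" ".join(best_of_two[1][1:]))
```

With one beam the search commits to the locally most likely first token; with two beams it keeps the runner-up alive long enough to find a sequence with higher overall probability.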
JSON mode
Every method supports json=True for guaranteed structured output via outlines:
# Returns a BulletsOutput pydantic model
bullets = model.extract_bullets(passage, json=True)
print(bullets.bullets) # list[str]
# Returns a QAPairsOutput pydantic model
qa = model.generate_qa_pairs(passage, json=True)
for pair in qa.pairs:
    print(pair.question, pair.answer)
# Returns a TripletsOutput pydantic model
triplets = model.extract_triplets(passage, json=True)
for t in triplets.triplets:
    print(t.subject, t.relation, t.object)
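To make the shape of a JSON-mode result concrete, here is a sketch of the Q&A output as plain data. The field names (`pairs`, `question`, `answer`) come from the example above; the dataclasses are dependency-free stand-ins for the library's pydantic models, and `parse_qa_pairs` and the sample JSON string are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class QAPairsOutput:
    pairs: List[QAPair]


def parse_qa_pairs(raw: str) -> QAPairsOutput:
    """Parse a JSON string into the stand-in QAPairsOutput shape.

    QAPair(**item) raises TypeError on missing or unexpected keys,
    which is a rough substitute for schema validation.
    """
    data = json.loads(raw)
    return QAPairsOutput(pairs=[QAPair(**item) for item in data["pairs"]])


# Made-up payload in the assumed shape.
raw = '{"pairs": [{"question": "What did transformers introduce?", "answer": "Self-attention."}]}'
qa = parse_qa_pairs(raw)
for pair in qa.pairs:
    print(pair.question, pair.answer)
```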
API
Generation API
| Method | Input | Output | JSON Output |
|---|---|---|---|
| extract_bullets(passage) | passage | list[str] | BulletsOutput |
| generate_qa_pairs(passage) | passage | list[QAPair] | QAPairsOutput |
| generate_question(passage) | passage | str | QuestionOutput |
| generate_questions_list(passage) | passage | list[str] | QuestionsListOutput |
| extract_fact(passage) | passage | str | FactOutput |
| answer(question, passage) | question + passage | str | AnswerOutput |
| rephrase(passage) | passage | str | RephraseOutput |
| continue_from(passage) | passage start | str | ContinuationOutput |
| extract_triplets(passage) | passage | list[Triplet] | TripletsOutput |
| compare(passage_a, passage_b) | two passages | str | ComparisonOutput |
| find_relevant(question, passages) | question + passage list | RetrievalResult | RetrievalOutput |
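Triplets become a knowledge graph once they are grouped by subject. The sketch below assumes each triplet exposes `subject`, `relation`, and `object` attributes, as in the JSON-mode example; the `Triplet` named tuple, the `build_graph` helper, and the sample values are illustrative stand-ins rather than library output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Triplet(NamedTuple):
    # Stand-in with the three attributes used in the README.
    subject: str
    relation: str
    object: str


def build_graph(triplets: Iterable[Triplet]) -> Dict[str, List[Tuple[str, str]]]:
    """Group triplets into an adjacency map: subject -> [(relation, object)]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triplets:
        graph[t.subject].append((t.relation, t.object))
    return dict(graph)


# Illustrative triplets, not real model output.
sample = [
    Triplet("transformers", "introduced", "self-attention"),
    Triplet("transformers", "process", "tokens in parallel"),
    Triplet("parallel processing", "leads to", "training speedups"),
]
graph = build_graph(sample)
for subject, edges in graph.items():
    print(subject, edges)
```

An adjacency map like this is enough for simple lookups; for traversal or visualization the same triplets could be loaded into a graph library instead.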
Reward API
| Method | Input | Output |
|---|---|---|
| score(response, reference) | one response + reference answer | float |
| batch_score(responses, reference) | response list + reference answer | list[float] |
| rank(responses, reference) | response list + reference answer | list[RankedResponse] |
NeuralTxtReward accepts backend="hf" or backend="mlx".
Models
| Interface | Default model |
|---|---|
| NeuralTxt(backend="hf") | paperbd/neuraltxt-v1-135M |
| NeuralTxt(backend="mlx") | paperbd/neuraltxt-v1-135M-mlx |
| NeuralTxtReward(backend="hf") | paperbd/neuraltxt-reward-22M |
| NeuralTxtReward(backend="mlx") | paperbd/neuraltxt-reward-22M-mlx |
Pass a custom path: NeuralTxt("path/to/model", backend="hf")
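The table and the custom-path rule amount to a small lookup: an explicit path wins, otherwise the default depends on the interface and backend. The sketch below restates that rule; the model names are copied from the table, while `resolve_model` is a hypothetical helper and not necessarily how the package resolves models internally.

```python
from typing import Optional

# Defaults copied from the Models table.
DEFAULT_MODELS = {
    ("NeuralTxt", "hf"): "paperbd/neuraltxt-v1-135M",
    ("NeuralTxt", "mlx"): "paperbd/neuraltxt-v1-135M-mlx",
    ("NeuralTxtReward", "hf"): "paperbd/neuraltxt-reward-22M",
    ("NeuralTxtReward", "mlx"): "paperbd/neuraltxt-reward-22M-mlx",
}


def resolve_model(interface: str, backend: str, path: Optional[str] = None) -> str:
    """Return the custom path if given, else the default for the pair."""
    if path is not None:
        return path
    try:
        return DEFAULT_MODELS[(interface, backend)]
    except KeyError:
        raise ValueError(
            f"no default model for {interface!r} with backend {backend!r}"
        ) from None


print(resolve_model("NeuralTxt", "mlx"))
print(resolve_model("NeuralTxtReward", "hf", path="path/to/reward-model"))
```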
- Training dataset: paperbd/paper_instructions_300K-v1
- Synthetic data generation: text-albumentations
Gradio demo
pip install "neural-txt[app]"
# HuggingFace (default)
python app.py
# MLX (Apple Silicon)
python app.py --mlx
# Options
# --temperature 0.4 sampling temperature (default 0.4)
# --num-beams 2 beam candidates, 1-4 (default 1)
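The options listed above map naturally onto an argument parser. The source of app.py is not shown here, so the following is only a guess at how such flags could be declared with `argparse`, using the documented defaults and the 1-4 range for beams; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented demo flags (sketch, not app.py)."""
    parser = argparse.ArgumentParser(description="neural-txt Gradio demo")
    parser.add_argument(
        "--mlx",
        action="store_true",
        help="use the MLX backend instead of HuggingFace",
    )
    parser.add_argument(
        "--temperature",
        type=float,
        default=0.4,
        help="sampling temperature",
    )
    parser.add_argument(
        "--num-beams",
        type=int,
        choices=range(1, 5),  # documented range: 1-4
        default=1,
        help="number of beam candidates",
    )
    return parser


args = build_parser().parse_args(["--mlx", "--num-beams", "2"])
backend = "mlx" if args.mlx else "hf"
print(backend, args.temperature, args.num_beams)
```

Restricting `--num-beams` with `choices` makes out-of-range values fail at startup rather than during generation.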