search-expert

Natural language → structured search queries, instantly.
Fine-tuned Qwen3.5-0.8B LoRA adapters for search query parsing across 10 domains.






What it does

"Non-stop business class from JFK to Tokyo under $3,000"
{
  "domain":      "flights",
  "origin":      "JFK",
  "destination": "Tokyo",
  "cabin_class": "business",
  "stops":       "lte:0",
  "price":       "lt:3000"
}

Search Expert uses a custom fine-tuned Small Language Model to understand natural language queries and extract only the fields explicitly mentioned — never hallucinating values that aren't there. It works across 11 search verticals out of the box.


Install

pip install search-expert

Usage

Basic

from search_expert import SearchExpert, ModelFormat, ParseResult

expert = SearchExpert()  # loads the JSON adapter by default

result = expert.parse("noise cancelling headphones any colour but red or green, under $200")
print(result.fields)
# {
#     'domain':  'ecommerce',
#     'product': 'headphones',
#     'feature': 'noise cancelling',
#     'color':   ['ne:red', 'ne:green'],
#     'price':   'lt:200'
# }

Operator reference

All numeric and exclusion constraints use a consistent prefix so downstream filters need zero NLP — just parse the string.

| Query phrase | Output value |
| --- | --- |
| "under $200", "below $200" | `lt:200` |
| "up to $200", "max $200" | `lte:200` |
| "over $150k", "above $150k" | `gt:150000` |
| "at least $150k", "$150k+" | `gte:150000` |
| "around $200", "~$200" | `approx:200` |
| "$100–$200", "between $100 and $200" | `between:100:200` |
| "any colour but red or green" | `["ne:red", "ne:green"]` |
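Because the prefixes are fixed, decoding them downstream needs only string splitting. Here is a minimal sketch of such a decoder; `decode_constraint` is a hypothetical helper written for illustration, not part of the package:

```python
NUMERIC_OPS = {"lt", "lte", "gt", "gte", "approx"}

def decode_constraint(value):
    """Split an operator-prefixed string like 'lt:200' or 'between:100:200'
    into (operator, low, high). Unprefixed values come back as ('eq', value, None)."""
    parts = str(value).split(":")
    op = parts[0]
    if op == "between" and len(parts) == 3:
        return ("between", float(parts[1]), float(parts[2]))
    if op in NUMERIC_OPS and len(parts) == 2:
        return (op, float(parts[1]), None)
    if op == "ne" and len(parts) == 2:
        return ("ne", parts[1], None)  # exclusions may carry non-numeric values
    return ("eq", value, None)
```

For list-valued fields such as `['ne:red', 'ne:green']`, apply the same decoder to each element.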

Applying a filter in one line:

result = expert.parse("apartments under $2,500/month in Austin")
constraint = result.get_numeric_constraint("price")
# {'operator': 'lt', 'value': 2500.0, 'value_hi': None}

filtered = [l for l in listings if l["price"] < constraint["value"]]
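To handle every operator, not just `lt`, the dict returned by `get_numeric_constraint` can drive a generic predicate. This `matches` helper is a sketch written for illustration (the ±10% tolerance for `approx` is an arbitrary choice, not defined by the library):

```python
def matches(item_value, c):
    """Evaluate a constraint dict like
    {'operator': 'lt', 'value': 2500.0, 'value_hi': None} against a value."""
    op, lo, hi = c["operator"], c["value"], c["value_hi"]
    if op == "lt":
        return item_value < lo
    if op == "lte":
        return item_value <= lo
    if op == "gt":
        return item_value > lo
    if op == "gte":
        return item_value >= lo
    if op == "between":
        return lo <= item_value <= hi
    if op == "approx":
        return abs(item_value - lo) <= 0.1 * lo  # ±10% tolerance, arbitrary
    if op == "ne":
        return item_value != lo
    return True  # unknown operator: don't filter the item out
```

With that in place, filtering becomes `[l for l in listings if matches(l["price"], constraint)]` regardless of which operator the query produced.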

Supported domains

| Domain | Example query |
| --- | --- |
| real_estate | "2BR apartment in Austin under $1,500/month" |
| ecommerce | "Sony noise cancelling headphones under $300" |
| jobs | "Remote senior ML engineer paying over $150k" |
| flights | "Non-stop business class JFK to Tokyo under $3,000" |
| hotels | "5-star hotel in Paris with breakfast under $400/night" |
| cars | "Electric SUV with 300+ mile range under $50k" |
| restaurants | "Vegan Italian in NYC with outdoor seating under $40" |
| movies | "Thriller on Netflix with 8+ IMDB rating" |
| healthcare | "Female therapist in Chicago accepting Aetna" |
| courses | "Python ML course for beginners under $30" |
| events | "Taylor Swift concert in London in July" |
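Since every parse includes a `domain` field, results can be dispatched to a per-vertical search backend with a plain lookup table. The `route` function and handler below are hypothetical placeholders, sketched under the assumption that `result.fields` is a dict like the ones shown above:

```python
def search_flights(fields):
    # Placeholder backend: in practice this would query a real flights API.
    return f"searching flights {fields.get('origin')} -> {fields.get('destination')}"

HANDLERS = {
    "flights": search_flights,
    # ... one entry per supported domain
}

def route(fields):
    """Dispatch parsed fields to the backend for their domain."""
    handler = HANDLERS.get(fields["domain"])
    if handler is None:
        raise ValueError(f"unsupported domain: {fields['domain']}")
    return handler(fields)
```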

The model

Each adapter is a LoRA fine-tune of Qwen3.5-0.8B trained on ~1 million (query, structured output) pairs spanning all 11 domains above.

| Adapter | HuggingFace | Format |
| --- | --- | --- |
| JSON (default) | sarthakrastogi/search-expert-json-0.8b | JSON |
| YAML | sarthakrastogi/search-expert-yaml-0.8b | YAML |

Format leaderboard (held-out test set, 300 samples per format):

| Rank | Format | Key F1 | Value Acc | Parse Rate |
| --- | --- | --- | --- | --- |
| 🥇 | JSON | 0.913 | 0.874 | 98.2% |
| 🥈 | YAML | 0.901 | 0.861 | 97.6% |
| 🥉 | TOML | 0.887 | 0.843 | 96.1% |
| 4 | XML | 0.871 | 0.829 | 94.8% |
| 5 | CSV key=value | 0.856 | 0.812 | 93.3% |

Both public adapters return the same Python dict — the format only affects the model's internal generation language.


Repo structure

search-expert/
├── search_expert/        # Library source
│   ├── __init__.py
│   ├── expert.py         # SearchExpert class (main API)
│   ├── config.py         # Model IDs, prompts, format enum
│   ├── loader.py         # HF model loading (unsloth / peft / plain)
│   ├── parser.py         # Raw output → dict parsers
│   ├── result.py         # ParseResult dataclass
│   └── exceptions.py     # Custom exceptions
├── training/             # Fine-tuning pipeline
│   ├── finetune.py       # Training script
│   └── evaluate.py       # Format comparison leaderboard
├── tests/
│   └── test_search_expert.py
├── examples/
│   └── search_expert_colab.ipynb
├── pyproject.toml
└── README.md

Development

git clone https://github.com/sarthakrastogi/search-expert
cd search-expert
pip install -e ".[dev]"

pytest tests/ -v                                          # unit tests (no GPU needed)
SEARCH_EXPERT_RUN_MODEL_TESTS=1 pytest tests/ -v          # includes model inference tests

License

MIT © Sarthak Rastogi

Contributing

Contributions are very welcome! Please open an issue or submit a pull request with any improvements.

Contact

For questions, feedback, or just to say hi, you can reach me at:

Email

LinkedIn
