Build a search engine from a website sitemap.
nanosearch
Nanosearch is an in-memory search engine designed for small websites (fewer than 10,000 URLs).
With Nanosearch, you can build a search engine in a few lines of code.
Nanosearch supports the BM25 and TF/IDF algorithms.
Nanosearch also computes a link graph and uses the number of inlinks to each page as a ranking factor. This helps order the results when several pages match a query's keywords.
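To illustrate the idea of inlinks as a ranking factor, here is a minimal sketch that does not use Nanosearch itself. The link graph, the keyword scores, and the tie-breaking rule are all assumptions made for the example; the way Nanosearch actually combines inlink counts with keyword scores may differ.

```python
from collections import Counter

# Hypothetical link graph: page URL -> URLs it links to.
links = {
    "/": ["/coffee", "/about"],
    "/coffee": ["/coffee-guide"],
    "/about": ["/coffee"],
    "/coffee-guide": ["/coffee"],
}

# Count inlinks: how many distinct pages link to each URL (self-links ignored).
inlinks = Counter(
    target
    for source, targets in links.items()
    for target in set(targets)
    if target != source
)

# Hypothetical keyword scores for two pages that match "coffee" equally well.
keyword_scores = {"/coffee-guide": 1.0, "/coffee": 1.0}


def rank(scores, inlink_counts):
    # Assumed rule: sort by keyword score, then break ties by inlink count.
    return sorted(
        scores,
        key=lambda url: (scores[url], inlink_counts[url]),
        reverse=True,
    )


ranked = rank(keyword_scores, inlinks)
print(ranked)
```

In this sketch both pages have the same keyword score, so the page with more inlinks is listed first.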
Installation
pip install nanosearch
Quickstart
Build a Search Engine from a Sitemap
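from nanosearch import NanoSearchBM25

# Build an index from the URLs listed in the sitemap.
engine = NanoSearchBM25().from_sitemap(
    "https://jamesg.blog/sitemap.xml",
    # Keep only the part of each page title before the first "|".
    title_transforms=[lambda x: x.split("|")[0]]
)

# Query the index.
results = engine.search("coffee")
print(results)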
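The effect of the title transform in the quickstart can be checked on its own. The sketch below uses a made-up page title, and the `apply_transforms` helper is hypothetical: it assumes the transforms in the list are applied in order, which the source does not confirm.

```python
# Same transform as in the quickstart: keep the text before the first "|".
strip_site_name = lambda x: x.split("|")[0]

# Made-up page title, for illustration only.
title = "Coffee Brewing Guide | Example Blog"

cleaned = strip_site_name(title)
print(repr(cleaned))  # repr() makes any leftover whitespace visible


def apply_transforms(text, transforms):
    # Hypothetical helper: apply each callable in list order.
    for transform in transforms:
        text = transform(text)
    return text


# Adding str.strip as a second transform removes the leftover whitespace.
final = apply_transforms(title, [strip_site_name, str.strip])
print(repr(final))
```

Splitting on "|" leaves the space that preceded the separator, so a second transform such as `str.strip` may be worth adding if titles are displayed to users.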
Build a Search Engine from a List of URLs
from nanosearch import NanoSearchBM25

# Pages to index.
urls = [
    "https://jamesg.blog/",
    "https://jamesg.blog/coffee",
]

# Build an index from the listed URLs, then query it.
engine = NanoSearchBM25().from_urls(urls)
results = engine.search("coffee")
print(results)
Save an Index to Disk
You can save an index to disk and load it later with:
# Write the index to a JSON file.
engine.to_nanosearch_json("index.json")
# Load the saved index into a new engine.
engine = NanoSearchBM25().from_nanosearch_json("index.json")
Supported Algorithms
Nanosearch supports the following search algorithms:
- TF/IDF
- BM25
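For readers unfamiliar with the two algorithms, the following is a rough, self-contained sketch of how TF/IDF and BM25 score a single query term. It is a textbook-style illustration, not Nanosearch's implementation: the toy documents, the function names, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter

# Toy corpus of tokenized documents (made up for illustration).
docs = {
    "a": "coffee coffee coffee coffee beans".split(),
    "b": "coffee brewing guide".split(),
    "c": "tea brewing guide for beginners".split(),
}

N = len(docs)
avgdl = sum(len(tokens) for tokens in docs.values()) / N  # average length


def doc_freq(term):
    # Number of documents that contain the term at least once.
    return sum(1 for tokens in docs.values() if term in tokens)


def tf_idf(term, tokens):
    # Relative term frequency times log inverse document frequency.
    n = doc_freq(term)
    if n == 0:
        return 0.0
    tf = Counter(tokens)[term] / len(tokens)
    return tf * math.log(N / n)


def bm25(term, tokens, k1=1.5, b=0.75):
    # BM25 with a smoothed IDF, term-frequency saturation (k1),
    # and document-length normalization (b).
    f = Counter(tokens)[term]
    n = doc_freq(term)
    idf = math.log((N - n + 0.5) / (n + 0.5) + 1)
    norm = k1 * (1 - b + b * len(tokens) / avgdl)
    return idf * f * (k1 + 1) / (f + norm)


for name, tokens in docs.items():
    print(name, round(tf_idf("coffee", tokens), 3), round(bm25("coffee", tokens), 3))
```

Both functions rank document "a" above "b" for the term "coffee", but BM25 gives "a" a smaller advantage, because repeated occurrences of a term contribute less and less to the BM25 score.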
License
This project is licensed under the MIT License.