Build a search engine from a website sitemap.
nanosearch
Nanosearch is an in-memory search engine designed for small websites (fewer than 10,000 URLs).
With Nanosearch, you can build a search engine in a few lines of code.
Nanosearch supports the BM25 and TF/IDF algorithms.
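To give a sense of what BM25 ranking involves, here is a minimal, generic sketch of Okapi BM25 scoring over pre-tokenised documents. It is not Nanosearch's own code: the function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document in `docs` against `query` with Okapi BM25.

    `k1` and `b` are commonly used defaults, not values taken from Nanosearch.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is normalised by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical documents, already lower-cased and split into tokens.
docs = [
    "coffee brewing guide".split(),
    "coffee coffee coffee".split(),
    "notes on web search".split(),
]
scores = bm25_scores(["coffee"], docs)
```

In this sketch the second document scores highest because it mentions the query term most often, and the third scores zero because it never mentions it.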
Nanosearch also computes a link graph and uses the number of inlinks to a page as a ranking factor. This helps order the results when several pages match a query's keywords about equally well.
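The idea of using inlinks as a ranking factor can be sketched as follows. This is an illustration under stated assumptions, not Nanosearch's implementation: the helpers `count_inlinks` and `rerank`, the `weight` parameter, the log-damped boost formula, and the sample link graph are all hypothetical.

```python
import math
from collections import Counter


def count_inlinks(outlinks):
    """Map each URL to the number of distinct other pages that link to it."""
    inlinks = Counter()
    for source, targets in outlinks.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                inlinks[target] += 1
    return inlinks


def rerank(keyword_scores, inlinks, weight=0.1):
    """Return URLs ordered by keyword score boosted by a log-damped inlink count.

    The boost formula is an assumption made for this sketch.
    """
    boosted = {
        url: score * (1 + weight * math.log1p(inlinks[url]))
        for url, score in keyword_scores.items()
    }
    return sorted(boosted, key=boosted.get, reverse=True)


# Hypothetical link graph: page -> pages it links to.
outlinks = {
    "/": ["/coffee", "/about"],
    "/about": ["/coffee", "/about"],
    "/coffee": ["/"],
}
inlinks = count_inlinks(outlinks)

# Two pages tie on keyword relevance; the inlink count breaks the tie.
ranked = rerank({"/about": 1.0, "/coffee": 1.0}, inlinks)
```

Here `/coffee` has two inlinks and `/about` has one, so `/coffee` is ranked first even though both pages have the same keyword score.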
Installation
pip install nanosearch
Quickstart
Build a Search Engine from a Sitemap
from nanosearch import NanoSearchBM25
engine = NanoSearchBM25().from_sitemap(
    "https://jamesg.blog/sitemap.xml",
    title_transforms=[lambda x: x.split("|")[0]]
)
results = engine.search("coffee")
print(results)
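The `title_transforms` argument in the example above takes a list of callables. As a rough illustration of what the lambda does to a page title, the sketch below assumes that each transform is applied in order to the title string. The helper `apply_title_transforms` and the sample title are hypothetical and are not part of Nanosearch.

```python
def apply_title_transforms(title, transforms):
    """Apply each callable to the title in turn (assumed behaviour, for illustration)."""
    for transform in transforms:
        title = transform(title)
    return title


# The transform from the quickstart keeps only the text before the first "|".
title_transforms = [lambda x: x.split("|")[0]]
cleaned = apply_title_transforms("Coffee Notes | Example Blog", title_transforms)

# Splitting on "|" leaves the space that preceded it; adding str.strip trims it.
trimmed = apply_title_transforms(
    "Coffee Notes | Example Blog", title_transforms + [str.strip]
)
```

With these inputs, `cleaned` is `"Coffee Notes "` (with a trailing space) and `trimmed` is `"Coffee Notes"`.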
Build a Search Engine from a List of URLs
from nanosearch import NanoSearchBM25
urls = [
    "https://jamesg.blog/",
    "https://jamesg.blog/coffee",
]
engine = NanoSearchBM25().from_urls(urls)
results = engine.search("coffee")
print(results)
Save an Index to Disk
You can save an index to disk and load it later with:
engine.to_nanosearch_json("index.json")
engine = NanoSearchBM25().from_nanosearch_json("index.json")
Supported Algorithms
Nanosearch supports the following search algorithms:
- TF/IDF
- BM25
License
This project is licensed under the MIT License.