Build a search engine from a website sitemap.

nanosearch

Nanosearch is an in-memory search engine designed for small websites (fewer than 10,000 URLs).

With Nanosearch, you can build a search engine in a few lines of code.

Nanosearch supports the BM25 and TF/IDF algorithms.
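To give a sense of what BM25 ranking does, here is a minimal, self-contained sketch of the standard Okapi BM25 formula. It is not Nanosearch's own code: the function name `bm25_scores`, the whitespace tokenization, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25.

    Hypothetical helper for illustration; not part of the nanosearch API.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            # Smoothed IDF; the "1 +" keeps it non-negative for common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization: long documents need more matches to score well.
            norm = tf + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up documents, tokenized by splitting on whitespace.
docs = [
    "how to brew coffee at home".split(),
    "my favourite coffee shops and coffee beans".split(),
    "notes on web standards".split(),
]
scores = bm25_scores(["coffee"], docs)
```

In this toy corpus the second document mentions "coffee" twice and ranks above the first, while the third document does not contain the term and scores zero.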

Nanosearch also computes a link graph and uses the number of inlinks to a page as a ranking factor. This helps order results when several pages match a query's keywords.
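The idea of using inlinks as a ranking factor can be sketched as follows. This is only an illustration under stated assumptions: the helpers `count_inlinks` and `boost_by_inlinks`, the `weight` parameter, the log-scaled boost formula, and the example.com URLs are all hypothetical, and the source does not say how Nanosearch combines inlink counts with keyword scores.

```python
import math
from collections import Counter


def count_inlinks(outlinks):
    """Count how many distinct pages link to each URL, ignoring self-links."""
    inlinks = Counter()
    for source, targets in outlinks.items():
        for target in set(targets):
            if target != source:
                inlinks[target] += 1
    return inlinks


def boost_by_inlinks(keyword_scores, inlinks, weight=0.1):
    """Scale each keyword score by a log-damped inlink bonus (assumed formula)."""
    return {
        url: score * (1 + weight * math.log1p(inlinks[url]))
        for url, score in keyword_scores.items()
    }


# Made-up link graph: each page maps to the URLs it links out to.
outlinks = {
    "https://example.com/": ["https://example.com/coffee", "https://example.com/tea"],
    "https://example.com/tea": ["https://example.com/coffee"],
    "https://example.com/coffee": ["https://example.com/coffee"],
}
inlinks = count_inlinks(outlinks)

# Two pages tie on keyword score; the inlink bonus breaks the tie.
keyword_scores = {"https://example.com/coffee": 1.0, "https://example.com/tea": 1.0}
boosted = boost_by_inlinks(keyword_scores, inlinks)
ranked = sorted(boosted, key=boosted.get, reverse=True)
```

Here the coffee page has two inlinks and the tea page has one, so the coffee page ranks first even though both pages have the same keyword score.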

Installation

pip install nanosearch

Quickstart

Build a Search Engine from a Sitemap

from nanosearch import NanoSearchBM25

engine = NanoSearchBM25().from_sitemap(
    "https://jamesg.blog/sitemap.xml",
    title_transforms=[lambda x: x.split("|")[0]]
)
results = engine.search("coffee")

print(results)
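The `title_transforms` argument above takes a list of functions that rewrite page titles. The plain-Python sketch below shows what the `split("|")[0]` transform does to a title. The helper `apply_transforms`, the sample title, and the assumption that transforms run in list order on each title are illustrative only and not confirmed by the source.

```python
def keep_before_pipe(title):
    """Same logic as the quickstart lambda: keep the text before the first '|'."""
    return title.split("|")[0]


def apply_transforms(title, transforms):
    """Apply each transform in order (assumed behaviour; hypothetical helper)."""
    for transform in transforms:
        title = transform(title)
    return title


raw_title = "Brewing Coffee at Home | Example Blog"

# The split alone leaves the space that preceded the "|".
split_only = apply_transforms(raw_title, [keep_before_pipe])

# Chaining str.strip as a second transform removes that trailing space.
clean = apply_transforms(raw_title, [keep_before_pipe, str.strip])
```

If trailing whitespace matters for how titles are displayed, a second transform such as `str.strip` may be worth adding to the list.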

Build a Search Engine from a List of URLs

from nanosearch import NanoSearchBM25

urls = [
    "https://jamesg.blog/",
    "https://jamesg.blog/coffee",
]

engine = NanoSearchBM25().from_urls(urls)
results = engine.search("coffee")

print(results)

Save an Index to Disk

You can save an index to disk and load it later with:

# Write the index built above to a JSON file.
engine.to_nanosearch_json("index.json")

# Later, rebuild the engine from the saved file.
engine = NanoSearchBM25().from_nanosearch_json("index.json")

Supported Algorithms

Nanosearch supports the following search algorithms:

  • TF/IDF
  • BM25

License

This project is licensed under the MIT License.
