Content map

Content map is a way to share the content of a specific domain as a SQLite database, as an alternative to RSS feeds. The library builds a dataset of all the content on your website, using the XML sitemap as a starting point.
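To make "using the XML sitemap as a starting point" concrete, here is a minimal sketch of reading page URLs out of a sitemap with the Python standard library. It is not contentmap's own implementation; the helper name `extract_urls` and the inlined example sitemap are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> List[str]:
    """Return the page URLs found in the <loc> elements of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, inlined so the sketch runs offline.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourblog.com/first-post</loc></url>
  <url><loc>https://yourblog.com/second-post</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_urls(EXAMPLE_SITEMAP):
        print(url)
```

Each URL recovered this way is a page whose content could then be fetched and stored as one row of the dataset.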

The dataset can optionally include vector similarity search features with very little extra work.
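As a rough illustration of what vector similarity search means, the sketch below ranks documents by cosine similarity between a query vector and per-document vectors. This is a conceptual example only: the toy three-dimensional vectors, the URLs, and the helper names are hypothetical, and the metric and index actually used by the library's vector search are not stated here.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float], documents: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (url, score) pairs sorted from most to least similar."""
    scored = [(url, cosine_similarity(query, vec)) for url, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; real ones would come from an embedding model.
TOY_EMBEDDINGS = {
    "https://yourblog.com/sqlite-tips": [0.9, 0.1, 0.0],
    "https://yourblog.com/rss-history": [0.1, 0.9, 0.0],
}

if __name__ == "__main__":
    for url, score in rank_by_similarity([1.0, 0.0, 0.0], TOY_EMBEDDINGS):
        print(url, round(score, 3))
```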

An article explaining the rationale behind this type of dataset is available here.

Installation

pip install contentmap

Quickstart

To build a contentmap.db file that contains all your content and supports vector search, using your XML sitemap as a starting point, you only need to write the following:

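from contentmap.sitemap import SitemapToContentDatabase

# Point the builder at one or more XML sitemaps for your site.
database = SitemapToContentDatabase(
    sitemap_sources=["https://yourblog.com/sitemap.xml"],
    concurrency=10,
    include_vss=True  # add vector search capabilities to the database
)
# Create the contentmap.db SQLite file.
database.build()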

This automatically creates the SQLite database file with vector search capabilities, which build on the sqlite-vss integration in LangChain.
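Once the file exists, it can be opened like any other SQLite database. The sketch below lists the tables it contains using only the standard `sqlite3` module; the helper name `list_tables` is hypothetical, and no particular table or column names are assumed, since the schema is not described here.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of the tables stored in a SQLite database file."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumes database.build() has already produced contentmap.db in the
    # current directory; sqlite3 creates an empty file if it does not exist.
    print(list_tables("contentmap.db"))
```

Inspecting the table names first is a safe way to discover the layout before writing queries against it.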

Thanks to @medoror for contributing.
