Open-source, enterprise-grade web search & archiving.

Project description

Sosse 🦦

Discover Sosse — the Selenium Open Source Search Engine built for powerful web archiving, crawling, and search. Explore all its features and capabilities on the official website.

Whether you're a developer, researcher, or data enthusiast, Sosse is ready to support your projects. Join the community on GitHub or GitLab to submit feature requests, report bugs, contribute code, or start a discussion.

Key Features

  • 🌍 Web Page Search: Search the content of web pages, including dynamically rendered ones, with advanced queries. (doc)

  • 🕑 Recurring Crawling: Crawl pages at fixed intervals or adapt the rate based on content changes. (doc)

  • 🔖 Web Page Archiving: Archive HTML content, adjust links for local use, download required assets, and support dynamic content. (doc)

  • 🏷️ Tags: Organize and filter crawled or archived pages using tags for better search and management. (doc)

  • 📂 File Downloads: Batch download binary files from web pages. (doc)

  • 📡 Webhooks: Integrate with external services using highly flexible webhooks. Connect to proprietary AI platforms (doc) or locally hosted solutions (doc) to enable advanced data extraction, summarization, auto-tagging, notifications, and more.

  • 🔔 Atom Feeds: Generate content feeds for websites that don’t have them, or receive updates when a new page containing a keyword is published. (doc)

  • 🔒 Authentication: The crawler can authenticate to access private pages and retrieve content. (doc)

  • 👥 Permissions: Admins can configure crawlers and view statistics, while search is available to authenticated users or anonymously. (doc)

  • 👤 Search Features: Includes private search history (doc), external search engine shortcuts (doc), and more.
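
The adaptive option mentioned under Recurring Crawling can be pictured with a small sketch. This is not Sosse's actual policy: the function name `next_interval`, the halving/doubling factor, and the default bounds are all assumptions chosen for illustration. The idea is only that a page that keeps changing gets crawled more often, and a page that stays the same gets crawled less often, within fixed limits.

```python
from datetime import timedelta


def next_interval(current, changed,
                  minimum=timedelta(hours=1),
                  maximum=timedelta(days=30)):
    """Return the delay before the next crawl of a page.

    Hypothetical policy (an assumption, not taken from Sosse):
    halve the delay when the content changed, double it otherwise,
    and keep the result between `minimum` and `maximum`.
    """
    candidate = current / 2 if changed else current * 2
    # Clamp the candidate delay into the allowed range.
    return max(minimum, min(candidate, maximum))
```

For example, a page crawled every 8 hours would move to 4 hours after a change and to 16 hours after an unchanged crawl. The documentation linked from the feature list describes the real configuration options.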

Explore the 📚 documentation and check out some 📷 screenshots.

Sosse is written in Python and is distributed under the GNU AGPLv3 license. It uses browser-based crawling with Mozilla Firefox or Google Chromium alongside Selenium to index pages that rely on JavaScript. For faster crawling, Requests can also be used. Sosse uses PostgreSQL for data storage.

Try It Out

To quickly try the latest version with Docker:

docker run -p 8005:80 biolds/sosse:stable

Then, open http://127.0.0.1:8005/ and log in with the username admin and password admin.
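
The container may need a little time before the web interface answers. As a rough sketch (not part of Sosse), the following standard-library Python helper polls the URL above until it gets any HTTP response. The names `http_probe` and `wait_until_ready`, the number of attempts, and the delay are illustrative assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even though the status is an error.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, or timed out: not ready yet.
        return False


def wait_until_ready(url, attempts=30, delay=2.0,
                     probe=http_probe, sleep=time.sleep):
    """Poll `url` up to `attempts` times; return True once it responds."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (assumes the docker command above is running):
# wait_until_ready("http://127.0.0.1:8005/")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.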

For persistence of Docker data or alternative installation methods, please refer to the installation guide.

Stay Connected

Join the Discord server to get help, share ideas, or discuss Sosse!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sosse-1.14.2.tar.gz (5.1 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sosse-1.14.2-py3-none-any.whl (3.8 MB)

Uploaded Python 3

File details

Details for the file sosse-1.14.2.tar.gz.

File metadata

  • Download URL: sosse-1.14.2.tar.gz
  • Size: 5.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for sosse-1.14.2.tar.gz
Algorithm    Hash digest
SHA256       d247498e1892f3a72c58c57692b6660d85c8a9570d4be2f883f06f32a64eb062
MD5          4d0090eaab8bf371d1ebc4d845d3abb0
BLAKE2b-256  fb5cf2848fcf1abbf14eef4c8d9d6bd904890e262494661f74e8e78453898bef

See more details on using hashes here.
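
A downloaded file can be checked against the SHA256 digest listed for it. The following is a minimal sketch using only Python's standard `hashlib`. The helper names `sha256_of` and `matches_published` are made up for this example, and the file path in the commented call assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need little memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, using the SHA256 value listed for the source archive:
# matches_published(
#     "sosse-1.14.2.tar.gz",
#     "d247498e1892f3a72c58c57692b6660d85c8a9570d4be2f883f06f32a64eb062",
# )
```

A result of `False` means the file differs from the published one and should not be installed.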

File details

Details for the file sosse-1.14.2-py3-none-any.whl.

File metadata

  • Download URL: sosse-1.14.2-py3-none-any.whl
  • Size: 3.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for sosse-1.14.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       eac203c62e0046e636c38eec7ac8864cba0dc1e39d7f9a3aeebefb1d6e8ae131
MD5          c5a406a36b6fd5bea89e5ce4b543a2dc
BLAKE2b-256  526ef3403bb112c50963f537bd009156aea000acf51a9b6fbd063e496a91638b

