An assistant helping you to index webpages into structured datasets.

SearchFlow

SearchFlow is an assistant that helps you index webpages into structured datasets. It combines web scraping, document chunking, PostgreSQL storage, vector search, and language models to scrape, process, and store web content.

Features

  • Web Scraping: Uses trafilatura for focused crawling and web scraping.
  • Document Processing: Supports chunking and processing of various document types.
  • Database Management: Manages projects, documents, and prompts using PostgreSQL.
  • Vector Search: Utilizes vector search for document retrieval.
  • LLM Integration: Integrates with language models for question answering and document grading.
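
As an illustration of the chunking step mentioned under Document Processing, here is a minimal sketch of fixed-size chunking with overlap. It is not SearchFlow's implementation: the function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical and chosen only for this example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical helper for illustration; not part of the SearchFlow API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Overlapping windows mean that a sentence cut at a chunk boundary still appears whole in a neighbouring chunk, which tends to help retrieval. The trade-off is that more chunks are stored.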

Installation

To set up a consistent development environment, use the provided Dockerfile and .devcontainer/devcontainer.json.

Prerequisites

  • Docker
  • Python 3.11 or higher
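
To confirm the Python prerequisite before installing, a small check like the one below may help. The helper name `meets_minimum` is hypothetical and is not shipped with SearchFlow. It only compares the interpreter version against the 3.11 minimum stated above.

```python
import sys

MINIMUM_PYTHON = (3, 11)  # minimum version listed in the prerequisites


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is sufficient.")
    else:
        print("Python 3.11 or higher is required.")
```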

Steps

Usage

Install SearchFlow via pip:

pip install searchflow

Quickstart

  1. Initialize the Database
from searchflow.db.postgresql import DB
db = DB()
  2. Create a Project
db.create_project(project_name="example_project")
  3. Import Data from a URL
from searchflow.importers import WebScraper
scraper = WebScraper(project_name='MyProject', db=db)
scraper.full_import("https://example.com", max_pages=100)
  4. Upload a File to the Project
from searchflow.importers import Files
# Read the file as raw bytes
with open("path/to/your/file.pdf", "rb") as f:
    bytes_data = f.read()
files = Files()
# document_data is a list of (bytes, file name) tuples
files.upload_file(
    document_data=[(bytes_data, "file.pdf")],
    project_name="MyProject",
    inference_type="local"
)
  5. List Files in a Project
files.list_files(project_name="MyProject")
  6. Remove a File from a Project
files.remove_file(project_name="MyProject", file_name="file.pdf")
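
The upload step passes `document_data` as a list of `(bytes, file name)` tuples. When several files need uploading, that list can be built with a small helper such as the sketch below. `build_document_data` is a hypothetical name, not part of SearchFlow. The commented-out call only mirrors the `upload_file` arguments shown in the quickstart.

```python
from pathlib import Path


def build_document_data(paths):
    """Read each file and pair its raw bytes with its base file name.

    Hypothetical helper; returns a list of (bytes, file name) tuples in the
    shape the quickstart passes as `document_data`.
    """
    document_data = []
    for path in map(Path, paths):
        document_data.append((path.read_bytes(), path.name))
    return document_data


# Usage, assuming `files = Files()` as in the quickstart:
# files.upload_file(
#     document_data=build_document_data(["report.pdf", "notes.pdf"]),
#     project_name="MyProject",
#     inference_type="local",
# )
```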

Question Answering

Vector Search

To perform a similarity search:

from searchflow.db.postgresql import DB
db = DB()
results = db.similarity_search(project_name="example_project", query="example query")
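
For readers new to vector search, the sketch below shows the general idea behind a similarity search: score stored vectors against a query vector and rank them. It is a conceptual illustration only. How SearchFlow computes embeddings or ranks results is not described here, so cosine similarity, the `top_k` function, and the toy vectors are all assumptions.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" standing in for real document vectors.
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for doc_id, score in top_k([1.0, 0.0], docs, k=2):
        print(doc_id, round(score, 3))
```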
