SynapseAI

SynapseAI is a Python package for semantic search in PDF documents and web pages. It allows users to extract relevant information from PDFs and websites based on natural language queries.

Installation

You can install SynapseAI using pip:

pip install synapseai

Usage

synapseai provides two main functions: process_pdf for analyzing PDF documents and crawl_and_query for searching web pages.

Processing a PDF

To search within a PDF document:

from synapseai import process_pdf

pdf_path = "path/to/your/document.pdf"
query = "What is the main topic of this document?"

results = process_pdf(pdf_path, query)

for chunk, similarity in results:
    print(f"Similarity: {similarity:.4f}")
    print(f"Chunk: {chunk[:200]}...")
    print("-" * 50)

Crawling and Querying a Web Page

To search the content of a web page:

from synapseai import crawl_and_query

url = "https://example.com"
query = "What services does this website offer?"

results = crawl_and_query(url, query)

for chunk, similarity in results:
    print(f"Similarity: {similarity:.4f}")
    print(f"Chunk: {chunk[:200]}...")
    print("-" * 50)

Both functions return a list of tuples, where each tuple contains a relevant text chunk and its similarity score to the query.
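
Because the results are plain (chunk, similarity) tuples, they can be post-processed with ordinary Python. The sketch below filters by a minimum score; filter_results and the sample data are hypothetical and not part of synapseai.

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (text chunk, similarity score)


def filter_results(results: List[Result], min_similarity: float = 0.5) -> List[Result]:
    """Keep results at or above min_similarity, best match first.

    Hypothetical helper, not part of synapseai.
    """
    kept = [(chunk, score) for chunk, score in results if score >= min_similarity]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up sample data standing in for the output of process_pdf or crawl_and_query.
sample = [
    ("Intro paragraph", 0.42),
    ("Pricing details", 0.81),
    ("Contact page", 0.65),
]

for chunk, score in filter_results(sample, min_similarity=0.6):
    print(f"{score:.2f}  {chunk}")
```

A suitable threshold depends on the embedding model and the documents, so the 0.6 used here is only a placeholder.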

Configuration

synapseai can be customized by modifying the config.py file in the package directory. Here are the available configuration options:

Embedding Model

Variable: EMBEDDING_MODEL
Default: "sentence-transformers/all-MiniLM-L6-v2"
Description: The name of the Hugging Face model used for text embeddings.

Text Chunking

Variables: CHUNK_SIZE and CHUNK_OVERLAP
Defaults: CHUNK_SIZE = 1024, CHUNK_OVERLAP = 80
Description: Control how the text is split into chunks for processing.
- CHUNK_SIZE: Maximum number of characters in each chunk.
- CHUNK_OVERLAP: Number of characters that overlap between consecutive chunks.
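
To illustrate how the two settings interact, here is a minimal sketch of character-based chunking with overlap. It assumes a simple sliding window; synapseai's own splitter may work differently (for example, by respecting sentence boundaries), and chunk_text is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 80) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Illustrative only; not synapseai's actual implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and less than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small values make the overlap easy to see.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With a sliding window like this, a larger overlap reduces the chance that a relevant sentence is cut in half, at the cost of more chunks to embed.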

Search Results

Variable: TOP_K_RESULTS
Default: 5
Description: The number of top results to return from the semantic search.
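
As a rough picture of what a top-k selection involves, the sketch below ranks chunks by cosine similarity between embedding vectors and keeps the best k. The similarity measure is an assumption (this page does not state which one synapseai uses), and the function names and toy vectors are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [
        (chunk, cosine_similarity(query_vec, vec))
        for chunk, vec in zip(chunks, chunk_vecs)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings.
chunks = ["about cats", "about dogs", "about cars"]
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
best = top_k([1.0, 0.0], vectors, chunks, k=2)
for chunk, score in best:
    print(f"{score:.2f}  {chunk}")
```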

How to Modify Configuration

To change these settings, locate the config.py file in your synapseai installation directory and edit the values. For example:

# config.py

EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"
CHUNK_SIZE = 2048
CHUNK_OVERLAP = 100
TOP_K_RESULTS = 10

After modifying the config file, restart your Python environment or reload the synapseai package for the changes to take effect.
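
If you are unsure where the installation directory is, the standard library can look it up. This is a minimal sketch that assumes config.py sits at the top level of the package, as described above; find_package_file is a hypothetical helper, not a synapseai function.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_file(package: str, filename: str = "config.py") -> Optional[Path]:
    """Return the path of filename inside an installed package, or None.

    Hypothetical helper; it only inspects the package's top-level directories.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a single-file module rather than a package
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


# Prints the path to config.py, or None if synapseai is not installed.
print(find_package_file("synapseai"))
```

Edits made inside the installation directory are likely to be overwritten when the package is upgraded or reinstalled, so it may be worth keeping a copy of your changes.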

Viewing Current Configuration

You can view the current configuration settings in your Python script or interactive session:

import synapseai

synapseai.print_config()

This will display the current values of all configuration options.

Examples

Analyzing a Research Paper

from synapseai import process_pdf

pdf_path = "research_paper.pdf"
query = "What are the key findings of this research?"

results = process_pdf(pdf_path, query)

print("Key findings from the research paper:")
for chunk, similarity in results:
    print(f"Relevance: {similarity:.2f}")
    print(chunk)
    print("-" * 50)

Extracting Information from a Company Website

from synapseai import crawl_and_query

url = "https://www.company.com/about"
query = "What is the company's mission statement?"

results = crawl_and_query(url, query)

print("Company mission statement:")
for chunk, similarity in results:
    if similarity > 0.8:  # Only print highly relevant results
        print(chunk)
        break

Troubleshooting

If you encounter any issues:

- Ensure you have the latest version of synapseai installed.
- Check that all dependencies are correctly installed.
- Verify that the PDF file or URL you're trying to access is valid and accessible.
- If you've modified the configuration, try reverting to default settings.

For persistent issues, please open an issue on the GitHub repository: https://github.com/FastianAbdullah/Synapse-AI
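
When checking which version is installed, or when filling in a bug report, the standard library's importlib.metadata can report it. The sketch assumes the distribution is registered under the name used in the pip command above ("synapseai"); installed_version is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# The distribution name "synapseai" is an assumption taken from the install command.
print(installed_version("synapseai"))
```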

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Release history

0.5.6 (Sep 27, 2024)
0.5.5 (Sep 26, 2024, this version)
