A Python library for semantic text chunking using Sentence Transformers.

Project description

Semantic Chunking

Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.

Features

  • Splits text into semantic chunks using cosine similarity
  • Configurable chunk size and similarity thresholds
  • Based on sentence-transformers
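
The features above describe the approach only in outline. The sketch below shows one way threshold-based chunking with cosine similarity can work. It is not the library's actual implementation: the helper names `cosine_similarity` and `chunk_by_similarity`, the `embed` callable, the choice to compare each sentence with the one before it, and the choice to measure chunk size in words are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk_by_similarity(
    sentences: List[str],
    embed: Callable[[str], Sequence[float]],
    max_chunk_size: int = 128,
    similarity_threshold: float = 0.3,
) -> List[str]:
    """Group consecutive sentences into chunks (hypothetical helper).

    A new chunk starts when the next sentence is less similar to the
    previous one than `similarity_threshold`, or when adding it would push
    the current chunk past `max_chunk_size` words (an assumed unit).
    """
    if not sentences:
        return []

    chunks: List[str] = []
    current = [sentences[0]]
    current_words = len(sentences[0].split())
    prev_vec = embed(sentences[0])

    for sentence in sentences[1:]:
        vec = embed(sentence)
        words = len(sentence.split())
        similar = cosine_similarity(prev_vec, vec) >= similarity_threshold
        fits = current_words + words <= max_chunk_size
        if similar and fits:
            # Same topic and still within the size budget: extend the chunk.
            current.append(sentence)
            current_words += words
        else:
            # Topic shift or size limit reached: close the chunk, start anew.
            chunks.append(" ".join(current))
            current = [sentence]
            current_words = words
        prev_vec = vec

    chunks.append(" ".join(current))
    return chunks
```

With sentence-transformers installed, `embed` could be the `encode` method of a `SentenceTransformer("all-MiniLM-L6-v2")` instance. Raising the threshold tends to produce more, smaller chunks; lowering it tends to merge neighbouring sentences.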

Installation

pip install semantic-chunking

Usage

from semantic_chunking import SemanticChunker

def test_semantic_chunker():
    text = """Machine learning is a subset of artificial intelligence that focuses on the use of data and algorithms to imitate the way that humans learn.
    Deep learning is a more specialized version of machine learning that uses neural networks with multiple layers.
    These advanced algorithms can process complex patterns and make intelligent decisions with minimal human intervention.
    The applications of machine learning are vast, ranging from image recognition to natural language processing."""

    chunker = SemanticChunker(model_name='all-MiniLM-L6-v2', max_chunk_size=128, similarity_threshold=0.3)
    chunks = chunker.semantic_chunk(text)

    print("Semantic Chunks:")
    for i, chunk in enumerate(chunks, 1):
        print(f"Chunk {i}: {chunk}")
        print(f"Words: {len(chunk.split())}\n")

# Run the example
test_semantic_chunker()

Download files

Source Distribution

semantic_chunking-0.1.0.tar.gz (3.1 kB)

Uploaded Source

Built Distribution

semantic_chunking-0.1.0-py3-none-any.whl (3.8 kB)

Uploaded Python 3

File details

Details for the file semantic_chunking-0.1.0.tar.gz.

File metadata

  • Download URL: semantic_chunking-0.1.0.tar.gz
  • Upload date:
  • Size: 3.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for semantic_chunking-0.1.0.tar.gz

  • SHA256: effb0e292c2337c314f4d99c72e100c6937ac99475bd7890c8bdc680d2a5d6b6
  • MD5: 4ee522cac795ed70675670373d2fbf51
  • BLAKE2b-256: 8d4b314303e29e01eed05c83ade55b033a1ab64260bdb248d9d221d432835f3c
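
A downloaded file can be checked against the published SHA256 digest with Python's standard library. The sketch below assumes the file has already been saved locally; the helper name `sha256_of_file` and the local path in the usage note are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

For example, `sha256_of_file("semantic_chunking-0.1.0.tar.gz")` should equal the SHA256 value listed above if the download is intact.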

File details

Details for the file semantic_chunking-0.1.0-py3-none-any.whl.

File hashes

Hashes for semantic_chunking-0.1.0-py3-none-any.whl

  • SHA256: 82cfd43f362238ac2a7c64c04a59a16aacebc94b5d4fc437bca411aa3a3b2de8
  • MD5: b29ccdc4f9a32a14c6069bb1b1f5c20c
  • BLAKE2b-256: b13cd2761c6fcde351dd16be133ea506d745fefec13d8edc2ecf957e16a8a4a0
