Lightweight Python library that uses sentence embeddings to create naturally coherent segments of text akin to paragraphs.
Project description
cohesive
cohesive is a lightweight segmenter that uses sentence embeddings to split documents into naturally coherent segments akin to paragraphs.
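To make the idea of embedding-based segmentation concrete, here is a minimal sketch of one common approach: compare each sentence's embedding with the previous one and start a new segment when the similarity drops. This is not necessarily how cohesive works internally. The helper names `cosine` and `segment_by_similarity`, the `threshold` value, and the toy 2-D vectors standing in for real embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def segment_by_similarity(sentences, embeddings, threshold=0.5):
    """Group consecutive sentences (hypothetical helper, not cohesive's API).

    A new segment starts whenever the similarity between a sentence
    and the one before it falls below `threshold`.
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            segments.append([])
        segments[-1].append(sentences[i])
    return segments

# Toy 2-D vectors standing in for real sentence embeddings (assumed values).
sentences = ["Cats purr.", "Kittens nap.", "Stocks fell.", "Bonds rose."]
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
segments = segment_by_similarity(sentences, embeddings)
```

In this toy input the first two vectors point in nearly the same direction, as do the last two, so the sketch places a single boundary between the second and third sentences.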
Installation
You can install 'cohesive' using pip:
pip install cohesive
Using cohesive
To start using cohesive, import the CohesiveTextSegmenter and create a new instance:
from cohesive import CohesiveTextSegmenter
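# Instantiate the CohesiveTextSegmenter with the model that you want to use.
# By default, cohesive uses paraphrase-MiniLM-L6-v2, which has produced good results;
# here a different model, all-MiniLM-L6-v2, is passed explicitly.
cohesive = CohesiveTextSegmenter("all-MiniLM-L6-v2")
# Example input: a list of sentences (illustrative values).
sentences = ["The cat sat on the mat.", "It purred quietly.", "Stock markets fell on Monday."]
# Then, all you need to do is call the generate_segments method and pass in the list of sentences.
cohesive.generate_segments(sentences)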
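Because generate_segments expects a list of sentences, raw text has to be split first. The following is a rough sketch of one way to do that with a regular expression. The `split_sentences` helper is hypothetical and not part of cohesive, and a naive rule like this will mis-split abbreviations such as "e.g." or "Dr.", so a dedicated sentence tokenizer may be a better fit for real documents.

```python
import re

def split_sentences(text):
    """Naive splitter (hypothetical helper, not part of cohesive).

    Breaks the text after '.', '!' or '?' when followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

text = "Cats purr. Kittens nap! Did stocks fall? Bonds rose."
sentences = split_sentences(text)
```

The resulting `sentences` list has the shape that the usage example above passes to generate_segments.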
File details
Details for the file cohesive-0.1.tar.gz.
File metadata
- Download URL: cohesive-0.1.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | d7d3b390dd2e6b8dfbfcdf342ad5b644004b2e266d19444ae55e023852dcb068
MD5 | c978ae71edcba8c454a1a6d60d567b44
BLAKE2b-256 | e55bad366c79a5139ad06c94fa2270d279ae43c6226dd26d1d02e6c9a49c70e0
File details
Details for the file cohesive-0.1-py3-none-any.whl.
File metadata
- Download URL: cohesive-0.1-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 18865b68ff4afa98b884631c9d47881ed95a02e7133ef6fd95185e9a1370a10d
MD5 | 110d51d778e1696ba957b0dc32e267e5
BLAKE2b-256 | b289e658ee423512a8855a89b0fcf79768f7a19512b72e9f553f2b933e7b0c8c