A project that uses Sentence Transformers and embeddings to build a pocket search engine
Project description
Project: Oráculo
Oráculo is a CLI and web application for transcribing audio and running semantic search over the transcriptions. It uses Sentence Transformers to turn text into embeddings, creating a compact search engine that helps retrieve and organize important information from a collection of documents.
It is particularly useful for professionals who handle large amounts of audio and need an efficient way to transcribe it and search it semantically.
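To illustrate the idea behind embedding-based semantic search, here is a minimal sketch that ranks documents by cosine similarity between vectors. It is not Oráculo's code: the three-dimensional vectors are hypothetical stand-ins for what a Sentence Transformers model (for example, SentenceTransformer.encode) would produce, and cosine similarity is assumed as the scoring metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, corpus, top_k=2):
    """Rank (text, vector) pairs by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real model outputs hundreds of dimensions.
corpus = [
    ("Quarterly budget meeting notes", [0.9, 0.1, 0.0]),
    ("Customer interview about onboarding", [0.1, 0.8, 0.3]),
    ("Finance review of Q3 spending", [0.8, 0.2, 0.1]),
]
# Stands in for the embedding of a query such as "company expenses".
query = [1.0, 0.1, 0.0]

for score, text in semantic_search(query, corpus):
    print(f"{score:.3f}  {text}")
```

Because the query vector points in nearly the same direction as the two finance-related documents, they rank above the onboarding interview even though no keywords are compared.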
Features:
- Audio Transcription: Oráculo can transcribe audio files, either a single file or a whole folder in bulk.
- Semantic Search: A web app for running semantic searches over the transcribed audio.
Requirements:
IMPORTANT: To run Oráculo, the following must be installed on your machine:
- Python 3.10
- FFmpeg
- Git
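A quick way to confirm these requirements is a small check script like the one below. It is a hedged sketch, not part of Oráculo: it assumes FFmpeg and Git only need to be discoverable on PATH, and it treats Python 3.10 or newer as acceptable, which the README does not state explicitly.

```python
import shutil
import sys


def check_requirements() -> dict[str, bool]:
    """Report whether each listed requirement appears to be available."""
    return {
        # Assumption: 3.10 or newer is fine; the README only says "Python 3.10".
        "python>=3.10": sys.version_info >= (3, 10),
        # FFmpeg and Git are looked up as executables on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:14} {'OK' if ok else 'MISSING'}")
```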
Installation:
You can install Oráculo with pip:
pip install oraculo
Setup:
WARNING: The following steps are required before Oráculo can run. Please follow them carefully.
Initialize the Oráculo application with the following command:
oraculo init
You will be prompted to enter the following information:
Setting | Description
---|---
ChromaDB Persist Directory | The directory where ChromaDB stores its data, including the vector embeddings of the text
ChromaDB Implementation | Defaults to duckdb+parquet. For other implementations, refer to the source code
To change the configuration later, run oraculo init again.
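The persist directory matters because a persistent store lets embeddings computed in one session be reloaded in the next instead of being recomputed. The sketch below illustrates that idea with a hypothetical TinyVectorStore class that saves records as a JSON file; it is not ChromaDB's API or on-disk format (the duckdb+parquet backend stores data differently).

```python
import json
import os
import tempfile


class TinyVectorStore:
    """Hypothetical stand-in for a persistent vector store (not ChromaDB)."""

    def __init__(self, persist_directory: str):
        os.makedirs(persist_directory, exist_ok=True)
        self.path = os.path.join(persist_directory, "embeddings.json")
        self.records: dict = {}
        if os.path.exists(self.path):
            # Reload embeddings saved by a previous session.
            with open(self.path, encoding="utf-8") as f:
                self.records = json.load(f)

    def add(self, doc_id: str, text: str, embedding: list[float]) -> None:
        """Keep a document's text and embedding in memory."""
        self.records[doc_id] = {"text": text, "embedding": embedding}

    def persist(self) -> None:
        """Write all records to disk so a later session can reload them."""
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.records, f)


# Demo: save in one "session", then reopen the same directory.
with tempfile.TemporaryDirectory() as tmp:
    store = TinyVectorStore(tmp)
    store.add("audio-001", "Quarterly budget meeting notes", [0.9, 0.1, 0.0])
    store.persist()
    reopened = TinyVectorStore(tmp)
    print(reopened.records["audio-001"]["text"])
```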
Usage:
Semantic Search:
To start the Semantic Search Application, use the following command:
oraculo webapp
Single File Transcription:
To transcribe a single file:
oraculo transcribe
Multiple File Transcription:
To bulk-transcribe a folder:
oraculo bulk-transcribe
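Conceptually, bulk transcription amounts to collecting the audio files in a folder and transcribing each one. The sketch below shows only that loop. The extension list, the non-recursive scan, and the transcribe_file placeholder are all hypothetical; Oráculo's own implementation and supported formats may differ.

```python
import tempfile
from pathlib import Path

# Assumed audio extensions; Oráculo's actual list may differ.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def find_audio_files(folder: str) -> list[Path]:
    """Return audio files directly inside `folder` (no recursion), sorted."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_file(path: Path) -> str:
    """Hypothetical placeholder for a real speech-to-text call."""
    return f"<transcript of {path.name}>"


def bulk_transcribe(folder: str) -> dict[str, str]:
    """Map each audio file name to its transcript."""
    return {p.name: transcribe_file(p) for p in find_audio_files(folder)}


# Demo on a throwaway folder with two audio files and one text file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["b.mp3", "a.WAV", "notes.txt"]:
        (Path(demo_dir) / name).write_bytes(b"")
    print(bulk_transcribe(demo_dir))
```

Matching extensions case-insensitively means files such as a.WAV are picked up, while non-audio files like notes.txt are skipped.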
YouTube Video Transcription:
To transcribe YouTube videos:
oraculo transcribe-yt
Help:
To see help for the available commands, run:
oraculo --help
About
- Version: 0.1.14
- Author: Joao Tedeschi
- Contact: joaorafaelbt@gmail.com
Oráculo is developed to bring information retrieval capabilities to businesses and individual users. Feedback and suggestions for improving it are welcome.
Download files
File details
Details for the file oraculo-0.1.14.tar.gz.
File metadata
- Download URL: oraculo-0.1.14.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.0 CPython/3.10.6 Linux/5.15.90.1-microsoft-standard-WSL2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 98705621c35bb1bd30a4632f0dd28b017f76d42300bdd962c2db36c9412e74f1
MD5 | a3042e67dd99dbfaefde87d95769b54d
BLAKE2b-256 | 9cf1906d7817968151284877f22b730893aa076ddd0d277b3c55ce1e6ab9bffc
File details
Details for the file oraculo-0.1.14-py3-none-any.whl.
File metadata
- Download URL: oraculo-0.1.14-py3-none-any.whl
- Upload date:
- Size: 9.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.0 CPython/3.10.6 Linux/5.15.90.1-microsoft-standard-WSL2
File hashes
Algorithm | Hash digest
---|---
SHA256 | ba91a8714118e40b2354f70f30f93dbc7ccb9f8139cbc90bb671c584fe38831a
MD5 | 085ecc625399ede1cf8e5df6955b0b02
BLAKE2b-256 | d98e563ce36390af8cf8fab15b132642e8f380738c4d0ca0a5800d4937601de8