A project to use Sentence Transformers and embeddings to make a pocket search engine

Project description

Project: Oráculo

Oráculo is a CLI and web application for audio transcription and semantic search. It uses Sentence Transformers and embeddings to build a compact search engine that helps retrieve and organize information from a collection of documents.

It is particularly useful for professionals who handle large amounts of audio and need an efficient way to transcribe it and run semantic searches over the transcripts.

Features:

  • Audio Transcription: Oráculo can transcribe audio files. You can transcribe a single file or bulk transcribe a folder.
  • Semantic Search: A web app to perform semantic searches on the transcribed audio data.
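
To make the idea of embedding-based search concrete, here is a minimal sketch of ranking documents by cosine similarity. It is not Oráculo's implementation: the three-dimensional vectors are toy values standing in for the embeddings a Sentence Transformers model would produce, and the names `cosine_similarity`, `rank_by_similarity`, `toy_docs`, and `toy_query` are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings have hundreds of dimensions.
toy_docs = {
    "meeting_notes": [0.9, 0.1, 0.0],
    "podcast_episode": [0.1, 0.8, 0.3],
    "interview": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(toy_query, toy_docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real setup the query and each transcript would be embedded by the same model, and a vector store would handle the ranking instead of a Python loop.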

Requirements:

IMPORTANT: To run Oráculo, the following must be installed on your machine:

  • Python 3.10
  • FFmpeg
  • Git
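
Before installing, it can help to confirm that these prerequisites are available. The following is a small sketch using only the Python standard library; the function name `check_prerequisites` is hypothetical, and the check treats "Python 3.10" as exactly version 3.10, which is an assumption about what the requirement means.

```python
import shutil
import sys


def check_prerequisites():
    """Report whether each listed prerequisite appears to be available."""
    return {
        # Assumption: the requirement means exactly Python 3.10.x.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```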

Installation:

You can install Oráculo with pip:

pip install oraculo

Setup:

Warning: The following steps are required to run Oráculo. Please follow them carefully.

Initialize the Oráculo application with the following command:

oraculo init

You will be prompted to enter the following information:

Information | Description
ChromaDB Persist Directory | The directory where the ChromaDB data will be stored. It holds the vector embeddings of the text.
ChromaDB Implementation | Defaults to duckdb+parquet. For other implementations, please refer to the source code.

Whenever you want to change the config file, just run the same command again.
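
As a way to picture the two values that `oraculo init` prompts for, the sketch below models them as a small data structure. This is purely illustrative: the class name `OraculoSettings`, its field names, the example path, and the JSON serialization are all assumptions, and the actual format and location of Oráculo's config file are not documented here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class OraculoSettings:  # hypothetical name, not taken from Oráculo's source
    # Directory where the vector embeddings are persisted.
    persist_directory: str
    # The README states that the implementation defaults to duckdb+parquet.
    chroma_db_impl: str = "duckdb+parquet"


# Example path only; choose any directory you can write to.
settings = OraculoSettings(persist_directory="./chroma_store")
serialized = json.dumps(asdict(settings), indent=2)
print(serialized)
```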

Usage:

Semantic Search:

To start the Semantic Search Application, use the following command:

oraculo webapp

Single File Transcription:

To initiate a transcription for a single file:

oraculo transcribe

Multiple File Transcription:

To initiate bulk transcription for a folder:

oraculo bulk-transcribe
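
Before a bulk run, you may want to preview which files in a folder look like audio. The sketch below does that with `pathlib`. It is independent of Oráculo: the extension list in `AUDIO_EXTENSIONS` is an assumption and may not match the formats the tool actually accepts, and `find_audio_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed set of audio extensions; not taken from Oráculo's documentation.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(folder):
    """Return audio files directly inside `folder`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )


if __name__ == "__main__":
    for audio_path in find_audio_files("."):
        print(audio_path.name)
```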

YouTube Video Transcription:

To transcribe a YouTube video:

oraculo transcribe-yt

Help:

If you need help with the commands, use the following command:

oraculo --help
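
If you drive Oráculo from a script, the same help text can be captured from Python. The sketch below only relies on the `oraculo --help` command shown above and on the standard library; the wrapper name `oraculo_help` is hypothetical, and it returns `None` when the executable is not found on PATH.

```python
import shutil
import subprocess


def oraculo_help(executable="oraculo"):
    """Return the CLI's help text, or None if the executable is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--help"],
        capture_output=True,
        text=True,
        check=False,  # do not raise if the command exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__":
    text = oraculo_help()
    print(text if text is not None else "oraculo is not installed or not on PATH")
```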

About

Oráculo aims to improve information retrieval for businesses and individual users. Feedback and suggestions for improving it are welcome.


Download files

Source Distribution

oraculo-0.1.14.tar.gz (8.2 kB view details)

Uploaded Source

Built Distribution

oraculo-0.1.14-py3-none-any.whl (9.4 kB view details)

Uploaded Python 3

File details

Details for the file oraculo-0.1.14.tar.gz.

File metadata

  • Download URL: oraculo-0.1.14.tar.gz
  • Upload date:
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.0 CPython/3.10.6 Linux/5.15.90.1-microsoft-standard-WSL2

File hashes

Hashes for oraculo-0.1.14.tar.gz
Algorithm | Hash digest
SHA256 | 98705621c35bb1bd30a4632f0dd28b017f76d42300bdd962c2db36c9412e74f1
MD5 | a3042e67dd99dbfaefde87d95769b54d
BLAKE2b-256 | 9cf1906d7817968151284877f22b730893aa076ddd0d277b3c55ce1e6ab9bffc

File details

Details for the file oraculo-0.1.14-py3-none-any.whl.

File metadata

  • Download URL: oraculo-0.1.14-py3-none-any.whl
  • Upload date:
  • Size: 9.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.0 CPython/3.10.6 Linux/5.15.90.1-microsoft-standard-WSL2

File hashes

Hashes for oraculo-0.1.14-py3-none-any.whl
Algorithm | Hash digest
SHA256 | ba91a8714118e40b2354f70f30f93dbc7ccb9f8139cbc90bb671c584fe38831a
MD5 | 085ecc625399ede1cf8e5df6955b0b02
BLAKE2b-256 | d98e563ce36390af8cf8fab15b132642e8f380738c4d0ca0a5800d4937601de8
