AssemblyAI Haystack Integration

AssemblyAI Audio Transcript Loader

The AssemblyAI Audio Transcript Loader allows you to transcribe audio files with the AssemblyAI API and load the transcribed text into Haystack documents.

To use this package, set the ASSEMBLYAI_API_KEY environment variable to your API key. Alternatively, pass the API key as an argument when adding the component (see the usage example below).
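
The two ways of supplying the key can be combined in a small helper. This is a minimal sketch rather than part of the package: `resolve_api_key` is a hypothetical name, and it assumes an explicitly passed key should take precedence over the environment variable.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to ASSEMBLYAI_API_KEY.

    Hypothetical helper for illustration; not part of assemblyai-haystack.
    """
    key = api_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        # Fail early with a clear message instead of at request time.
        raise ValueError(
            "No API key found: pass api_key or set ASSEMBLYAI_API_KEY."
        )
    return key
```

The returned value could then be passed as the api_key argument when constructing the transcriber.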

Installation

First, install the assemblyai-haystack Python package.

pip install assemblyai-haystack

This package installs and uses the AssemblyAI Python SDK. You can find more info about the SDK at the assemblyai-python-sdk GitHub repo.

Usage

The AssemblyAITranscriber must be initialized with the AssemblyAI API key. Its run function requires at least the file_path argument, which can be either a URL or a local file path. In the run function you can also specify whether you want summarization and speaker diarization results.

import os

from assemblyai_haystack.transcriber import AssemblyAITranscriber
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Pipeline
from haystack.components.writers import DocumentWriter

ASSEMBLYAI_API_KEY = os.environ.get("ASSEMBLYAI_API_KEY")

# Use AssemblyAITranscriber in a pipeline
document_store = InMemoryDocumentStore()
file_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

indexing = Pipeline()
indexing.add_component("transcriber", AssemblyAITranscriber(api_key=ASSEMBLYAI_API_KEY))
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("transcriber.transcription", "writer.documents")
indexing.run(
    {
        "transcriber": {
            "file_path": file_url,
            "summarization": None,
            "speaker_labels": None,
        }
    }
)

print("Indexed Document Count:", document_store.count_documents())

Note: Calling indexing.run() blocks until the transcription is finished.
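
Because file_path accepts either a URL or a local path, and the run blocks until transcription finishes, it can be useful to check the value before starting. The following is a sketch only: `classify_audio_source` is a hypothetical pre-flight helper, and the package itself may validate its input differently.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_audio_source(file_path: str) -> str:
    """Return "url" for http(s) URLs, "file" for an existing local file.

    Hypothetical pre-flight check; not part of assemblyai-haystack.
    """
    scheme = urlparse(file_path).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if Path(file_path).is_file():
        return "file"
    raise FileNotFoundError(f"Not a URL or an existing file: {file_path}")
```

Calling this helper before indexing.run() surfaces a mistyped local path immediately instead of after the pipeline has started.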

The results of the transcription, summarization and speaker diarization are returned in separate document lists:

transcription
summarization
speaker_labels

The metadata of the transcription document contains the transcript ID and the URL of the uploaded audio file.

{
   "transcript_id":"73089e32-...-4ae9-97a4-eca7fe20a8b1",
   "audio_url":"https://storage.googleapis.com/aai-docs-samples/nbc.mp3"
}
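
To illustrate how the three document lists and the metadata might be read, the sketch below uses a stand-in class instead of a real Haystack document so that it runs without an API key. The result shape (a dict with one list per output name) is an assumption based on the description above, `StubDocument` is hypothetical, and "example-id" is a placeholder rather than a real transcript ID.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubDocument:
    """Stand-in for a Haystack document: only content and meta are modeled."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


# Assumed result shape: one list of documents per output name.
result: Dict[str, List[StubDocument]] = {
    "transcription": [
        StubDocument(
            content="(transcript text)",
            meta={
                "transcript_id": "example-id",  # placeholder value
                "audio_url": "https://storage.googleapis.com/aai-docs-samples/nbc.mp3",
            },
        )
    ],
    "summarization": [],   # empty here: summarization was not requested
    "speaker_labels": [],  # empty here: speaker diarization was not requested
}

# Read the metadata fields of each transcription document.
for doc in result["transcription"]:
    print(doc.meta["transcript_id"], doc.meta["audio_url"])
```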

Project details

Version 0.1.1, released Jan 11, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

assemblyai-haystack-0.1.1.tar.gz (7.9 kB)

Uploaded Jan 11, 2024 Source

Built Distribution

assemblyai_haystack-0.1.1-py3-none-any.whl (8.3 kB)

Uploaded Jan 11, 2024 Python 3

File details

Details for the file assemblyai-haystack-0.1.1.tar.gz.

File metadata

Download URL: assemblyai-haystack-0.1.1.tar.gz
Upload date: Jan 11, 2024
Size: 7.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for assemblyai-haystack-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`dbf6a00dbc503876e4f2d7be49cdc1297148a18b4b61f60eaa227c9b438cd1a4`
MD5	`952f1eb8d7753b526c4f2ffba4d108ed`
BLAKE2b-256	`6d7a6b125ba117a5bcd98c2178c7d6290af346e85dd7c2c110e0680a287e9dfb`

File details

Details for the file assemblyai_haystack-0.1.1-py3-none-any.whl.

File metadata

Download URL: assemblyai_haystack-0.1.1-py3-none-any.whl
Upload date: Jan 11, 2024
Size: 8.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for assemblyai_haystack-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e070b58f334776c79f9ff8607d793bab87edee393269da1aa65cff533c4f80a0`
MD5	`691b5fda7e946ac4de18893d7fc47964`
BLAKE2b-256	`a47b4b3d88a521d753d2c594e3b5ccdda409fa77f9542d42e8b74a1c40cb0ecb`
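
A downloaded file can be checked against the published SHA256 digests with the Python standard library. This is a minimal sketch: the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file is assumed to already be on disk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("assemblyai_haystack-0.1.1-py3-none-any.whl", "e070b58f334776c79f9ff8607d793bab87edee393269da1aa65cff533c4f80a0") should return True for an unmodified copy of the wheel listed above.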
