AssemblyAI Haystack Integration

AssemblyAI Audio Transcript Loader

The AssemblyAI Audio Transcript Loader allows you to transcribe audio files with the AssemblyAI API and load the transcribed text into Haystack documents.

To use this package, set the ASSEMBLYAI_API_KEY environment variable to your API key. Alternatively, pass the API key as an argument when adding the component (see the usage example below).
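
The two ways of supplying the key can be combined in a small helper. This is a minimal sketch, not part of the package: `resolve_api_key` is a hypothetical function name, and the only detail taken from this README is the ASSEMBLYAI_API_KEY variable name.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else the ASSEMBLYAI_API_KEY env var.

    Hypothetical helper for illustration; the package does not ship it.
    """
    key = explicit_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        # Fail early with a clear message instead of at transcription time.
        raise ValueError(
            "No API key found: set ASSEMBLYAI_API_KEY or pass the key explicitly."
        )
    return key
```

The returned value can then be passed as the `api_key` argument shown in the usage example.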

Installation

First, install the assemblyai-haystack Python package.

pip install assemblyai-haystack

This package installs and uses the AssemblyAI Python SDK. You can find more info about the SDK at the assemblyai-python-sdk GitHub repo.

Usage

The AssemblyAITranscriber must be initialized with the AssemblyAI API key. Its run function requires at least the file_path argument, which can be either a URL or a local file path. The run function also accepts the summarization and speaker_labels arguments, which control whether summarization and speaker diarization results are produced.

import os

from assemblyai_haystack.transcriber import AssemblyAITranscriber
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Pipeline
from haystack.components.writers import DocumentWriter

ASSEMBLYAI_API_KEY = os.environ.get("ASSEMBLYAI_API_KEY")

# Use AssemblyAITranscriber in a pipeline
document_store = InMemoryDocumentStore()
file_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

indexing = Pipeline()
indexing.add_component("transcriber", AssemblyAITranscriber(api_key=ASSEMBLYAI_API_KEY))
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("transcriber.transcription", "writer.documents")
indexing.run(
    {
        "transcriber": {
            "file_path": file_url,
            "summarization": None,
            "speaker_labels": None,
        }
    }
)

print("Indexed Document Count:", document_store.count_documents())
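
The input dictionary passed to indexing.run() can also be built for a local file with the optional results switched on. The sketch below only constructs that dictionary. `build_run_inputs` and the file path are hypothetical, and passing booleans for summarization and speaker_labels is an assumption (the example above passes None).

```python
from typing import Any, Dict


def build_run_inputs(
    file_path: str,
    summarization: bool = False,
    speaker_labels: bool = False,
) -> Dict[str, Dict[str, Any]]:
    """Build the input dict for indexing.run(), keyed by component name.

    Hypothetical helper; boolean flags are an assumption about the API.
    """
    return {
        "transcriber": {
            "file_path": file_path,  # URL or local file path
            "summarization": summarization,
            "speaker_labels": speaker_labels,
        }
    }


# Hypothetical local recording; replace with a real path.
inputs = build_run_inputs("./recordings/meeting.mp3", summarization=True)
# indexing.run(inputs)  # would be called on the pipeline built above
```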

Note: Calling indexing.run() blocks until the transcription is finished.
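
If the blocking call is a problem, one option is to submit it to a worker thread and collect the result later. This is a sketch under stated assumptions: `fake_run` is a stand-in for indexing.run so the example runs without the package or network access, its return value is illustrative, and this README does not say whether running the pipeline from a worker thread is supported.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


def run_in_background(
    executor: ThreadPoolExecutor,
    run: Callable[[Dict[str, Any]], Dict[str, Any]],
    inputs: Dict[str, Any],
) -> Future:
    """Submit a blocking run(inputs) call to the executor and return its Future."""
    return executor.submit(run, inputs)


def fake_run(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for indexing.run; sleeps briefly to mimic a blocking call."""
    time.sleep(0.1)
    return {"writer": {"documents_written": 1}}  # illustrative result only


with ThreadPoolExecutor(max_workers=1) as executor:
    future = run_in_background(
        executor, fake_run, {"transcriber": {"file_path": "audio.mp3"}}
    )
    # Other work can happen here while the call is in progress.
    result = future.result()  # blocks only at the point the result is needed
```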

The results of the transcription, summarization and speaker diarization are returned in separate document lists:

  • transcription
  • summarization
  • speaker_labels

The metadata of the transcription document contains the transcript ID and the URL of the uploaded audio file.

{
   "transcript_id":"73089e32-...-4ae9-97a4-eca7fe20a8b1",
   "audio_url":"https://storage.googleapis.com/aai-docs-samples/nbc.mp3"
}
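
The metadata fields above can be read back from the indexed documents. In the sketch below, `transcript_references` is a hypothetical helper and the sample object is a stand-in with the same `meta` attribute a Haystack document exposes, so the example runs without the package installed. The sample ID and URL are made up.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def transcript_references(documents: Iterable[Any]) -> List[Dict[str, Any]]:
    """Collect transcript_id and audio_url from each document's meta dict."""
    refs = []
    for doc in documents:
        meta = doc.meta or {}
        refs.append(
            {
                "transcript_id": meta.get("transcript_id"),
                "audio_url": meta.get("audio_url"),
            }
        )
    return refs


# Stand-in for a transcription document; values are made up for illustration.
sample = SimpleNamespace(
    content="Transcribed text goes here.",
    meta={
        "transcript_id": "example-id",
        "audio_url": "https://example.com/audio.mp3",
    },
)
refs = transcript_references([sample])
# With the pipeline above, the documents could instead come from
# document_store.filter_documents().
```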
