3 projects
opencrawl
Integrated website crawler and content analysis library
ichigo
Ichigo is an open, ongoing research experiment to extend a text-based LLM to have native listening ability. Think of it as an open data, open weight, on device Siri.
ichigo-asr
Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for the whisper-medium model, designed to enhance performance on multilingual with minimal impact on its original English capabilities. Unlike models that output continuous embeddings, Ichigo Whisper compresses speech into discrete tokens, making it more compatible with large language models (LLMs) for immediate speech understanding.