Skip to main content

AI-powered music generation engine for lyrics, vocals, and instrumentation.

Project description

SonosphereAI 🎶

Description

SonosphereAI is a unified, AI-powered music production suite designed to streamline the entire creative workflow from concept to track. Leveraging state-of-the-art models like Musicgen for instrumentals and Bark AI/Coqui XTTS for voice synthesis, the platform offers two dedicated environments: the Sonic Studio for diverse, multi-lingual AI voices, and the Vocal Studio for personalized voice cloning and pitch refinement. Core features include a Markovify-driven Lyrics Studio, an automated Quality Assurance pipeline with professional Noise Reduction and Auto-Tuning, and flexible output for generating either a complete song or Text-to-Speech narration. It is the end-to-end tool for creators seeking to transform ideas into broadcast-ready, high-fidelity musical pieces.


✨ Features

🎶 AI Music Studio: Production Suite

This application provides a complete, professional workflow for music creation, from generating lyrics and instrumentals to synthesizing vocals and mastering the final track. The entire system is built upon a unified processing pipeline, meaning that all core AI models are fully integrated and accessible across the entire platform.

The core engine is powered by an advanced suite of AI models and tools:

  • Musicgen: For generating complex, high-fidelity instrumental tracks.
  • Bark AI & Coqui XTTS: For highly expressive and multi-lingual Text-to-Speech (TTS) vocal synthesis.
  • Markovify: For generating coherent, contextually relevant song lyrics.

⚙️ Supported Capabilities Overview

The SonosphereAI platform offers extensive support across multiple dimensions for deep customization of your music.

Category Supported Items Notes
Instruments Full Music, Piano, Guitar, Violin, Drums, Bass, Flute, Bell Used by Musicgen to tailor instrumental tracks.
Genres Pop, Rock, Hip-Hop, Electronic, Country, R&B, Metal, Reggae Guides the style of both music and lyric generation.
TTS (Coqui XTTS) English, Chinese, French, German, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Turkish, Arabic Provides diverse, high-fidelity, and multi-lingual voice synthesis.
TTS (Bark AI) English, Chinese, French, German, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Turkish Used for voice cloning in the Vocal Studio (13 languages).
Lyrics Studio English, Chinese, French, German, Italian, Russian, Spanish, Arabic Languages supported by the Markovify text generator (8 languages).

🎙️ Sonic Studio & Vocal Studio: The Creation Workflow

The two main studios guide the user's input, but both feed into the same backend system for processing and mixing. This application is built on a single, powerful architecture that merges instrumental creation and vocal synthesis into one seamless process. The two Studio tabs act as input interfaces, but they both utilize the same underlying AI engine, ensuring a consistent workflow and professional-grade output.

The entire system is powered by an integrated suite of advanced AI models:

AI Model Primary Function Secondary Function (Shared Pipeline)
Musicgen Generating complex, high-fidelity Instrumental Tracks. Integrated into the final mixing process.
Bark AI Highly expressive Text-to-Speech (TTS) Synthesis. Available for vocal generation in both Sonic & Vocal Studio contexts.
Coqui XTTS Multi-lingual, Multi-speaker TTS and Voice Cloning. Available for vocal generation in both Sonic & Vocal Studio contexts.

🎹 Sonic Studio: Ready-Made Voice Generation

The Sonic Studio is focused on efficiency, allowing you to create an entire song using pre-trained, high-quality AI voices.

  • Voice Source: You choose from a variety of male and female voices across 14 supported languages (English, Chinese, French, German, Spanish, etc.).
  • Core AI Technology: Instrumental generation is primarily driven by Musicgen. The vocals are created simultaneously by the integrated Bark AI or Coqui XTTS model you select.
  • Creative Input: Guide the generation using a Text Prompt, selecting a Genre (Pop, Rock, Hip-Hop, etc.), and focusing on specific Instruments.
  • Text-to-Speech (TTS) Only Mode: In addition to full music generation, the studio offers the flexibility to generate just the synthesized vocal track without the instrumental music. This allows you to use the diverse variety of pre-set, multi-lingual voices (powered by Bark AI and Coqui XTTS) for narration, audiobooks, or focusing purely on vocal output.
  • Vocal Refinement (Auto-Tuning): Using the librosa library, the system automatically analyzes the fundamental frequency of the vocal track. It then applies an intelligent pitch shift correction to stabilize the vocal tuning, guaranteeing the voice is perfectly in key with the instrumental track.
  • Combining and Mixing: The refined vocal track and the instrumental track are combined using professional audio processing tools (FFmpeg is executed via subprocess). This step automatically balances the audio levels: the * vocals are kept prominent* (volume 1.0) over the backing instrumental (volume 0.6), delivering a balanced, broadcast-quality mix.
  • Final Output: The finished song is saved in the user's selected file format (WAV, MP3, FLAC, OPUS, and OGG).

🎤 Vocal Studio: Your Voice, Your Song

The Vocal Studio is dedicated to Vocal Cloning and Pitch Refinement, allowing you to use your own recorded voice for the track.

  • Voice Source: You record a sample directly in-app. This recording is then cloned or used as the basis for the final song.
  • Core AI Technology: While it relies on Musicgen to create the instrumental track and uses Bark AI / Coqui XTTS for internal processing and analysis, the main focus is on high-fidelity vocal processing (Noise Reduction, Pitch Correction) applied to your uploaded audio.
  • Process: Your voice is cleaned and refined before being mixed with the instrumental track.
  • Text-to-Speech (TTS) Only Mode: The studio allows you to generate just the synthesized vocal track without the instrumental accompaniment. This functionality is essential for reading text using your own cloned/recorded voice, perfect for custom narration, audio letters, or creating unique vocal samples.
  • Vocal Refinement (Auto-Tuning): Using the librosa library, the system automatically analyzes the fundamental frequency of the vocal track. It then applies an intelligent pitch shift correction to stabilize the vocal tuning, guaranteeing the voice is perfectly in key with the instrumental track.
  • Combining and Mixing: The refined vocal track and the instrumental track are combined using professional audio processing tools (FFmpeg is executed via subprocess). This step automatically balances the audio levels: the * vocals are kept prominent* (volume 1.0) over the backing instrumental (volume 0.6), delivering a balanced, broadcast-quality mix.
  • Final Output: The finished song is saved in the user's selected file format (WAV, MP3, FLAC, OPUS, and OGG).

✍️ Lyrics Studio: AI Lyric Generator

The Lyrics Studio provides a massive creative jumpstart, helping you generate custom, unique lyrics tailored to your specific song parameters.

  • Core AI Technology: Markovify: The system is powered by the markovify library, which builds a complex statistical model of word relationships from a vast database of lyrics. This allows it to construct new verses that are grammatically sound and possess the stylistic coherence of real songs.
  • Lyrics Database & Customization:
    • The generator draws from a massive, multi-genre, and multi-lingual lyrics database (e.g., lyrics_corpus.db).
    • You can provide a short Creative Bias (a few keywords or lines) to weight the AI's output, ensuring the generated lyrics revolve around your core theme.
  • Quality Control: All generated lyrics are automatically scanned and cleaned using a profanity detection library before they are displayed, guaranteeing appropriate content.

🎚️ Automated Audio Cleanup (Noise Reduction)

This section of the Application ensures professional clarity by automatically cleaning your audio, eliminating common recording issues like hiss, static, or environmental background noise.

  • Process Overview: As soon as a vocal track is created or uploaded, it is routed through an advanced denoising stage. This employs sophisticated algorithms (from libraries like noisereduce and denoiser) to analyze the audio spectrum, identify the noise profile, and surgically remove the unwanted frequencies.
  • Final Output: The finished song is saved in the user's selected file format (WAV, MP3, FLAC, OPUS, and OGG).

Demonstration of Cleanup Quality

To showcase the effectiveness of this automated process, here are two samples illustrating the difference between an uncleaned source file and the final output:

  • Original Audio: Here is a sample of the original uncleaned file that has bunch of clicks and noises.

Noise File:

https://github.com/user-attachments/assets/9a9c4ae4-a68a-438b-ac56-f92b2d3b07c6

  • Cleaned Output: This is the output after using the app's professional audio cleanup pipeline, where you hear nothing but the pure, clean vocal.

Denoised File:

https://github.com/user-attachments/assets/04e1b069-462e-48f7-9f7e-817fb0336b9f


🛠️ Setup, Installation, and Configuration


Installation

  1. Install Python:
    Download and install Python from python.org.

  2. (Optional) Anaconda:
    If you prefer using Anaconda, you can — but note: PyCharm (or a plain venv) tends to handle this project more predictably. If you do use Anaconda, make sure your environment is clean and dedicated to this project.

  3. Clone the repository:

    git clone https://github.com/MOGHADBAN/SonosphereAI.git
    
  4. Navigate to the backend directory:

    cd SonosphereAI/backend
    
  5. Create a virtual environment:

    python -m venv venv
    
  6. Activate the virtual environment:

    • Windows: venv\Scripts\activate.bat
    • Bash/macOS: source venv/bin/activate
  7. Install dependencies:

    pip install sonosphereai
    
  8. Lyrics Database Setup: The Lyrics Studio requires a large song data corpus to train the Markovify AI. Since the database file is too large to include directly in the repository, we use the dedicated gdown utility to ensure a reliable download from Google Drive.

⚙️ Option 1: Automatic Setup Script (Recommended) This is the fastest and most reliable way to get the database file.

1. Prerequisite Ensure the gdown library is installed on your system:

pip install gdown

2. Run the Setup Script Execute the script from your terminal in the directory .\SonosphereAI\backend

python setup_database.py

This script handles the Google Drive security checks, downloads the file, and places it in the correct location.

📍 Expected File Location: The database file will be placed here relative to the root directory: ../frontend/static/lyrics_corpora/lyrics_corpus.db

⚙️ Option 2: Manual Download (Backup) Use this option only if the automatic script (Option 1) fails.

File Name: lyrics_corpus.db

Download Link: https://drive.google.com/file/d/1dH4wl9wJul1zph1HD0LHsNlBw66rg9Dz/view?usp=drive_link

Required Path:
After downloading the file, you must place lyrics_corpus.db inside the following directory structure relative to your main application folder:

../frontend/static/lyrics_corpora/lyrics_corpus.db

Note:
If the lyrics_corpora folder does not exist within frontend/static/, please create it before placing the database file. If the file is not placed at this exact path, the Lyrics Studio will be unable to access the data needed for generation.

  1. Run the application:

    python3 app.py
    

Usage

Once executed, follow prompt for a link provided in the log when you run the python3 app.py command. Click on that link in a browser to launch the app.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sonosphereai-1.1.1.tar.gz (71.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sonosphereai-1.1.1-py3-none-any.whl (6.3 kB view details)

Uploaded Python 3

File details

Details for the file sonosphereai-1.1.1.tar.gz.

File metadata

  • Download URL: sonosphereai-1.1.1.tar.gz
  • Upload date:
  • Size: 71.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for sonosphereai-1.1.1.tar.gz
Algorithm Hash digest
SHA256 2d1ded657a3d31319b17b1add9c94e11263ee56018cf9348de30ed3632497492
MD5 346afac2bdf25bca12e1ecf02361c284
BLAKE2b-256 83bf043088b055778e75af907916f7475f8b8a17d48b23082cf5b70cf2ac026b

See more details on using hashes here.

File details

Details for the file sonosphereai-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: sonosphereai-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for sonosphereai-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0deb6c4e8920a5d50cb818962996c18753a89e46c993192b4750c0814284c34d
MD5 965c33511a1a75c8b24c1a476d9e4176
BLAKE2b-256 483efe6e2a607645798f14e69e89ebb00bd53eaf98bb59ea35c8dc217759a633

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page