Open-PodcastLM is inspired by NotebookLM and NotebookLlama. It transforms PDF documents into engaging podcast-style conversations using open-source language models and text-to-speech technology. The tool processes PDF content, generates natural dialogue, and produces high-quality audio featuring two distinct voices.
Built with:
- Meta LLaMA 3.1 (8B and 405B) via Nebius AI Studio
- ParlerTTS for Host Voice
- Bark for Guest Voice
Features:
- Intelligent PDF Processing: Advanced text extraction and cleaning
- Natural Dialogue Generation: Creates engaging conversations between host and guest
- Dual Voice System: Distinct voices for host and guest using state-of-the-art TTS models
- High-Quality Audio: Professional-grade audio output with natural speech patterns
Installation:
- Clone the repository:
git clone https://github.com/krishnaadithya/open-podcastlm.git
cd open-podcastlm
- Install dependencies:
pip install -r requirements.txt
- Set up your Nebius API key:
export NEBIUS_API_KEY='your_api_key_here'
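The key is read from the environment at runtime. As a minimal sketch of how it can be used against Nebius AI Studio's OpenAI-compatible API (the base URL and model identifier below are assumptions for illustration, not taken from this repository's code):

```python
import os
from openai import OpenAI  # pip install openai

# Nebius AI Studio exposes an OpenAI-compatible API; base URL assumed for illustration.
client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ["NEBIUS_API_KEY"],  # picked up from the export above
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```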
Command Line Arguments:
- --pdf, -p: Path to the input PDF file (required)
- --output, -o: Output audio file path (default: output.mp3)
python main.py --pdf path/to/document.pdf --output podcast.mp3
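A minimal sketch of how these flags could be parsed; the option names mirror the list above, but the actual main.py may differ:

```python
import argparse

def parse_args() -> argparse.Namespace:
    # Mirrors the documented flags; the real entry point may add more options.
    parser = argparse.ArgumentParser(
        description="Turn a PDF into a podcast-style audio file."
    )
    parser.add_argument("--pdf", "-p", required=True, help="Path to the input PDF file")
    parser.add_argument("--output", "-o", default="output.mp3", help="Output audio file path")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Reading {args.pdf}, writing {args.output}")
```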
Listen to Sample Generated Podcast
Project Structure:
├── src/
│   ├── processors/
│   │   ├── text_processor.py
│   │   └── pdf_processor.py
│   ├── generators/
│   │   └── audio_generator.py
│   ├── clients/
│   │   └── llm_client.py
│   └── main.py
├── assets/
├── tmp/
├── README.md
└── requirements.txt
Core Components:
- PDFProcessor: Handles PDF text extraction
- TextProcessor: Cleans and formats extracted text
- LLMClient: Manages API interactions with LLaMA models
- AudioGenerator: Generates podcast audio using dual TTS engines
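These components compose into a linear pipeline: extract, clean, script, render. The sketch below illustrates the first two stages with pypdf; the class names match the list above, but the method names and internals are assumptions rather than the repository's exact API:

```python
# Illustrative only: method names and the pypdf dependency are assumptions.
from pypdf import PdfReader  # pip install pypdf

class PDFProcessor:
    def extract_text(self, pdf_path: str) -> str:
        # Concatenate the text of every page in the PDF.
        reader = PdfReader(pdf_path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

class TextProcessor:
    def clean(self, text: str) -> str:
        # Collapse whitespace; the real processor likely handles headers,
        # hyphenation, and other PDF artifacts as well.
        return " ".join(text.split())

raw = PDFProcessor().extract_text("document.pdf")
clean = TextProcessor().clean(raw)
```

From there, LLMClient turns the cleaned text into a host/guest script, and AudioGenerator renders it with the two TTS engines described below.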
The system uses two different TTS models:
- ParlerTTS for Speaker 1 (Main host)
- Bark for Speaker 2 (Guest)
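As a rough illustration of how the two engines produce the two voices (the model checkpoints and voice preset below are common public defaults, not necessarily the ones this repository uses):

```python
import soundfile as sf
from transformers import AutoProcessor, AutoTokenizer, BarkModel
from parler_tts import ParlerTTSForConditionalGeneration  # pip install parler-tts soundfile

# Speaker 1 (host) via ParlerTTS: the voice is described in natural language.
parler = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1")
parler_tok = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")
desc_ids = parler_tok("A calm, clear host voice with a friendly tone.", return_tensors="pt").input_ids
text_ids = parler_tok("Welcome to the show!", return_tensors="pt").input_ids
host_audio = parler.generate(input_ids=desc_ids, prompt_input_ids=text_ids).cpu().numpy().squeeze()
sf.write("host.wav", host_audio, parler.config.sampling_rate)

# Speaker 2 (guest) via Bark: the voice is selected with a preset.
bark_proc = AutoProcessor.from_pretrained("suno/bark-small")
bark = BarkModel.from_pretrained("suno/bark-small")
inputs = bark_proc("Thanks for having me!", voice_preset="v2/en_speaker_6")
guest_audio = bark.generate(**inputs).cpu().numpy().squeeze()
sf.write("guest.wav", guest_audio, bark.generation_config.sample_rate)
```

Presumably the audio generator renders each dialogue turn with the matching engine, resamples the clips to a common rate, and concatenates them into the final track.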
Requirements:
- CUDA-compatible GPU with 24GB VRAM
- Nebius API access
MIT License