krishnaadithya/open-podcastlm

Open PodcastLM

Overview

Open-PodcastLM is inspired by NotebookLM and NotebookLlama. It transforms PDF documents into podcast-style conversations using open-source language models and text-to-speech technology. The tool processes PDF content, generates a dialogue between a host and a guest, and renders it as audio with two distinct voices.

Features

  • Intelligent PDF Processing: Advanced text extraction and cleaning
  • Natural Dialogue Generation: Creates engaging conversations between host and guest
  • Dual Voice System: Distinct voices for host and guest using state-of-the-art TTS models
  • High-Quality Audio: Professional-grade audio output with natural speech patterns

Installation

  1. Clone the repository:
git clone https://github.com/krishnaadithya/open-podcastlm.git
cd open-podcastlm
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up your Nebius API key:
export NEBIUS_API_KEY='your_api_key_here'
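
Because the key is read from the environment, it can help to fail early with a clear message when it is missing. The following is a minimal sketch, not code from this repository: the helper name get_nebius_api_key is hypothetical, and only the NEBIUS_API_KEY variable name comes from the step above.

```python
import os
from typing import Mapping


def get_nebius_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Nebius API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; the project may read the key differently.
    """
    key = env.get("NEBIUS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "NEBIUS_API_KEY is not set; export it before running main.py"
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise without modifying the real process environment.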

Command Line Arguments

  • --pdf, -p: Path to the input PDF file (required)
  • --output, -o: Output audio file path (default: output.mp3)
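
The two flags above map directly onto Python's standard argparse module. This is a sketch of how such a parser could be declared, assuming nothing beyond the flag names, the required status, and the default listed here; the function name build_parser and the help strings are illustrative, and the actual main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (illustrative, not the project's code)."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF into a podcast-style audio file."
    )
    # --pdf / -p is required: the run has no input without it.
    parser.add_argument("--pdf", "-p", required=True,
                        help="Path to the input PDF file")
    # --output / -o falls back to output.mp3 when omitted.
    parser.add_argument("--output", "-o", default="output.mp3",
                        help="Output audio file path")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-p", "paper.pdf"])
```

With only -p supplied, args.pdf holds the given path and args.output takes the documented default.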

Usage

python main.py --pdf path/to/document.pdf --output podcast.mp3

Result:

Listen to Sample Generated Podcast

Project Structure

├── src/
│   ├── processors/
│   │   ├── text_processor.py
│   │   └── pdf_processor.py
│   ├── generators/
│   │   └── audio_generator.py
│   ├── clients/
│   │   └── llm_client.py
│   └── main.py
├── assets/
├── tmp/
├── README.md
└── requirements.txt

Components

  • PDFProcessor: Handles PDF text extraction
  • TextProcessor: Cleans and formats extracted text
  • LLMClient: Manages API interactions with LLaMA models
  • AudioGenerator: Generates podcast audio using dual TTS engines
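
The four components suggest a linear pipeline: extract, clean, generate a script, synthesize audio. The sketch below shows that data flow only. The README does not document the classes' method names, so each stage is passed in as a plain callable; run_pipeline and its parameter names are hypothetical.

```python
from typing import Callable


def run_pipeline(
    pdf_path: str,
    output_path: str,
    extract: Callable[[str], str],            # stands in for PDFProcessor
    clean: Callable[[str], str],              # stands in for TextProcessor
    write_script: Callable[[str], str],       # stands in for LLMClient
    synthesize: Callable[[str, str], None],   # stands in for AudioGenerator
) -> str:
    """Chain the four stages in order and return the output path."""
    raw_text = extract(pdf_path)       # PDF -> raw text
    cleaned = clean(raw_text)          # raw text -> cleaned text
    script = write_script(cleaned)     # cleaned text -> dialogue script
    synthesize(script, output_path)    # dialogue script -> audio file
    return output_path
```

Keeping the stages as injected callables makes it possible to swap in stubs for testing without a GPU or API access.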

Configuration

The system uses two different TTS models:

  • ParlerTTS for Speaker 1 (Main host)
  • Bark for Speaker 2 (Guest)
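
Using one engine per speaker implies a routing step that sends each dialogue turn to the matching model. The sketch below assumes the script is a list of (speaker, text) pairs, which this README does not confirm. The engines here are stub functions standing in for ParlerTTS and Bark, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker label, spoken text); assumed format


def synthesize_turns(
    turns: List[Turn], engines: Dict[str, Callable[[str], str]]
) -> List[str]:
    """Send each turn to the engine registered for its speaker, preserving order."""
    segments = []
    for speaker, text in turns:
        if speaker not in engines:
            raise KeyError(f"No TTS engine configured for {speaker!r}")
        segments.append(engines[speaker](text))
    return segments


# Stub engines that tag the text instead of producing audio.
engines = {
    "Speaker 1": lambda text: f"[parler] {text}",  # placeholder for ParlerTTS
    "Speaker 2": lambda text: f"[bark] {text}",    # placeholder for Bark
}
segments = synthesize_turns(
    [("Speaker 1", "Welcome to the show."), ("Speaker 2", "Glad to be here.")],
    engines,
)
```

In a real run each engine would return an audio segment rather than a string, and the segments would be concatenated into the final file.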

Requirements

  • CUDA-compatible GPU with 24 GB of VRAM
  • Nebius API access

License

MIT License
