Skip to content

Latest commit

 

History

History
84 lines (50 loc) · 2.78 KB

VAD_Project_and_Research_Formatted.md

File metadata and controls

84 lines (50 loc) · 2.78 KB

Voice Activity Detection (VAD) in Noisy Environments

Joshua Ball
Edward E. Whitacre Jr. College of Engineering
Department of Computer Science
IEEE
#94164957
Lubbock, USA
[email protected]


Abstract

Voice Activity Detection (VAD) plays a pivotal role in numerous digital audio applications. Distinguishing speech from non-speech in noisy environments remains a challenge. This research aims to design an efficient VAD system that demonstrates high accuracy across various noise conditions.


1. Introduction

Voice Activity Detection, at its core, determines the presence or absence of human speech in an audio signal. Its applications span speech recognition, telecommunication systems, and audio compression. The need for a robust VAD system is accentuated in real-world scenarios where noise is omnipresent. This research delves into the design and validation of such a system.


2. Related Work (Literature Review)

Historically, VAD techniques have ranged from basic energy thresholds and zero-crossing rates to more advanced machine learning approaches. With the advent of deep learning, neural network-based VAD systems have emerged.

  • Traditional VAD techniques.
  • Recent advancements in neural network-based VAD.
  • Challenges and shortcomings of existing methods.

3. Problem Statement

Despite advancements, VAD in noisy conditions remains challenging. This research aims to:

  • Analyze the spectral and temporal characteristics of speech and non-speech.
  • Design a VAD system resilient to varying noise conditions.
  • Validate the system's performance against benchmark datasets.

4. Methodology

4.1 Dataset

A dataset comprising clean speech, non-speech, and speech with various noise backgrounds is employed.

4.2 Preprocessing

Audio signals are resampled and segmented for consistent analysis.

4.3 Feature Extraction

Features like MFCCs, spectral contrast, and chroma features are extracted.

4.4 Classification

An SVM classifier is trained on the extracted features to differentiate between speech and non-speech segments.


5. Experiments & Results

The VAD system's performance is validated against various noise conditions, including white noise, street noise, and cafe ambiance. Performance metrics such as accuracy, precision, recall, and F1-score are computed.


6. Discussion

Insights from experimental results, challenges faced during research, and comparisons with existing VAD systems are discussed.


7. Conclusion

A summary of the research, highlighting contributions and potential future work. The proposed VAD system's viability for real-world applications is discussed.


References

  1. [Typical paper on basic VAD techniques.]
  2. [Recent paper on neural networks for VAD.]
  3. [Review paper on challenges in VAD.]