Audio Processing & Speech-to-Text Pipeline¶

Audio → Transcript → Summary → Keywords¶

Audio files often contain valuable information, but manually reviewing them can be time-consuming. This notebook automates the end-to-end pipeline — transcribing audio with Whisper, cleaning the raw transcript, generating concise summaries using FLAN-T5, and extracting key keywords for quick insights.

Workflow¶

Audio File (.mp3)
    └── Whisper → Raw Transcript
                    └── Clean → FLAN-T5 Summarizer
                                    ├── Chunk Summaries → Final Summary
                                    ├── NLTK → Top Keywords

Import Libraries¶

In [1]:
import os

ffmpeg_path = r"C:\ffmpeg\ffmpeg-8.1-essentials_build\bin"
os.environ["PATH"] += os.pathsep + ffmpeg_path
In [2]:
import whisper
import re
import nltk
import string
from collections import Counter
from transformers import pipeline
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

Transcribe Audio with Whisper¶

Whisper is OpenAI's open-source speech recognition model. The base model balances speed and accuracy well for lecture audio.

In [3]:
model = whisper.load_model("base")
print("Whisper model loaded successfully.")
Whisper model loaded successfully.
In [4]:
audio_file = r"C:\Users\ADMIN\lecture_full.mp3"   # <-- update this path

result = model.transcribe(audio_file)
transcript = result["text"]
C:\Users\ADMIN\anaconda3\envs\trading_env\Lib\site-packages\whisper\transcribe.py:132: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")

Clean the Transcript¶

Raw speech-to-text output contains filler words (um, uh, you know) and irregular spacing. This step strips them out using regex so the text is ready for summarization.

In [6]:
def clean_transcript(text):
    text = re.sub(r"\b(um|uh|hmm|you know|like)\b", "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"[^\w\s.,!?]", "", text)
    return text.strip()

cleaned_text = clean_transcript(transcript)

Summarize with FLAN-T5 and Extract Top Keywords¶

FLAN-T5 (google/flan-t5-base) is an instruction-tuned seq2seq model well-suited for summarization tasks.

Using NLTK, stopwords and punctuation are filtered out, and the most frequent meaningful words are extracted. These keywords give a quick sense of the lecture's core topics.

In [8]:
from transformers import pipeline

summarizer = pipeline(
    "text-generation",
    model="google/flan-t5-base"
)

def chunk_text(text, chunk_size=1000):
    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

chunks = chunk_text(cleaned_text)

print(f"\nTotal chunks created: {len(chunks)}")

all_summaries = []

for i, chunk in enumerate(chunks):
    print(f"Summarizing chunk {i+1}/{len(chunks)}...")
    
    prompt = f"Summarize this lecture transcript clearly and concisely:\n\n{chunk}"
    
    result = summarizer(
        prompt,
        max_length=200,
        do_sample=False
    )
    
    all_summaries.append(result[0]["generated_text"])

final_summary = " ".join(all_summaries)
stop_words = set(stopwords.words("english"))

words = word_tokenize(cleaned_text.lower())

words = [
    word for word in words
    if word not in stop_words
    and word not in string.punctuation
    and len(word) > 2
]

keywords = Counter(words).most_common(10)

print("\n===== TOP KEYWORDS =====\n")
for word, freq in keywords:
    print(f"{word}: {freq}")
===== TOP KEYWORDS =====

time: 23
series: 21
data: 19
analysis: 14
component: 10
use: 7
thats: 7
one: 6
components: 6
forecasting: 6
In [9]:
print(final_summary[:1000])
Summarize this lecture transcript clearly and concisely:

My smartwatch tracks how much sleep I get each night. If Im feeling curious, I can look on my phone and see my nightly slumber plotted on a graph. It might look something this. And on the graph, on the Y axis, we have the hours of sleep. And then on the X axis, we have days. And this is an example of a time series. And what a time series is is data of the same entity, my sleep hours, collected at regular intervals, over days. And when we have time series, we can perform a time series analysis. And this is where we analyse the timestamp data to extract meaningful insights and predictions about the future. And while its super useful to forecast that I am going to probably get seven hours shut eye tonight based on the data, time series analysis plays a significant role in helping organisations drive better business decisions. So for example, using time series analysis, a retailer can use this functionality to predict future sales a

Summary¶

Step Tool Output
Transcribe audio openai-whisper Raw transcript string
Clean transcript re (regex) Cleaned text
Summarize FLAN-T5 via transformers Chunk-wise + final summary
Extract keywords NLTK Top 10 frequent terms