You can download a Youtube video transcript with
yt-dlp
.
yt-dlp --write-auto-sub --skip-download --sub-format vtt --output transcript "<video_url>"
This will output a file called transcript.en.vtt
. That file can be cleaned
like this, to remove all formatting and metadata except the transcript text.
cat transcript.en.vtt | grep : -v | awk '!seen[$0]++'
This approach is useful for simple way to pipe the contents of a Youtube video into an LLM, my motivation for finding a way to accomplish this task.
Here is a script that pulls the transcript of a video then summarizes it using
llm
.
#!/bin/zsh
if [ $# -eq 0 ]
then
echo "No arguments supplied. Please provide a YouTube URL as an argument."
exit 1
fi
yt-dlp --write-auto-sub --skip-download --sub-format vtt --output transcript "$1" >/dev/null 2>&1
cat transcript.en.vtt | grep : -v | awk '!seen[$0]++' | llm "write a short summary of the contents of this youtube video transcript"
rm transcript.en.vtt
Run it like this
./summarize.sh <youtube_url>