Need Help Setting Up Automatic YouTube Video Transcription for Workflow Automation

I’m trying to build an automated system that can take YouTube videos and create transcripts without me having to do it manually every time. Right now I have to copy video links, paste them into transcription tools, wait for results, then copy the text again. It takes forever and I do this constantly for my channel.

I already use automation tools to post my videos to my website automatically. What I want to do is extend this so it also grabs transcripts from the YouTube links and sends them to other parts of my workflow.

This would let me do cool stuff like adding transcripts to my blog posts for better search rankings, using AI to turn the transcripts into different types of content, and storing everything in cloud folders automatically.

The problem is I can’t find a smooth way to get from a YouTube video to a transcript and then into my automation workflows. Everything I’ve tried still needs manual steps or weird workarounds.

Anyone figured out how to do this? I’m open to any suggestions whether it’s automation platforms, AI services, browser tools, or even coding solutions. Just want to eliminate the boring repetitive work.

Have you looked into using YouTube’s API with Google Cloud Speech-to-Text? I set it up and it’s pretty slick. Once you wrestle with the initial setup, it runs totally automated: just throw in your YouTube links and it does the audio extraction and transcription for you.
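Whichever transcription service ends up behind it, a "throw in your links" pipeline usually starts by normalizing the links, since the same video can show up as a watch, youtu.be, shorts, or embed URL. Here is a minimal sketch of that first step using only the Python standard library. The function name `youtube_video_id` is made up for illustration, and the set of URL shapes it handles is an assumption, not an exhaustive list.

```python
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url):
    """Return the video ID from a common YouTube URL shape, or None.

    Hypothetical helper: it covers watch, youtu.be, shorts, embed and
    live links, which is an assumption about what the workflow will see.
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Drop the usual "www." / "m." prefixes so one comparison is enough.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    segments = [s for s in parsed.path.split("/") if s]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return segments[0] if segments else None
    if host == "youtube.com":
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if len(segments) >= 2 and segments[0] in ("shorts", "embed", "live"):
            return segments[1]
    return None
```

Keying everything downstream on the ID rather than the raw link also makes it easy to skip videos that already have a transcript.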

Zapier works great for this - I’ve used it for 8 months. Set a trigger to watch for new YouTube videos, then auto-send the URL to Rev.com or Otter.ai for transcription. Everything runs in the background and drops finished transcripts into Google Drive or wherever you want them. Way more accurate than free options and handles different accents and crappy audio well. Costs about $1.25 per hour but saves me 3-4 hours weekly. Much easier setup than building your own API solution.
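If you ever want the "watch for new videos" trigger without a platform, the same idea can be sketched in a few lines. This is not how Zapier does it internally (that isn't known here). It assumes you poll the public per-channel Atom feed at `https://www.youtube.com/feeds/videos.xml?channel_id=...` and that entries carry a `yt:videoId` element. The function `new_videos` and the `seen_ids` set are made up for illustration, and fetching the feed is left out so the parsing can be checked on its own.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed from the YouTube channel Atom feed format.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def new_videos(feed_xml, seen_ids):
    """Return feed entries whose video ID is not in seen_ids.

    feed_xml is the feed body as a string; seen_ids is any container of
    IDs that were already handled (hypothetical bookkeeping, e.g. loaded
    from a file or spreadsheet).
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", default="", namespaces=NS)
        if not video_id or video_id in seen_ids:
            continue
        fresh.append({
            "video_id": video_id,
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "url": f"https://www.youtube.com/watch?v={video_id}",
        })
    return fresh
```

Run it on a schedule, pass each new `url` to the transcription step, and add the ID to `seen_ids` once the transcript is saved.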

I had this exact problem six months ago. Ended up using Make.com (used to be Integromat) with AssemblyAI for transcripts. It grabs video URLs from wherever you store them, sends them to AssemblyAI which extracts audio and transcribes automatically, then dumps the results wherever you want. Takes 5-10 minutes for an hour of video and costs about $0.37 per hour of video. Took me a weekend to set up but now it’s completely hands-off. AssemblyAI’s API is way easier than Google’s speech services and works great for YouTube stuff.
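For anyone who would rather script the submit-and-wait part than build it in Make.com, here is a rough sketch against AssemblyAI's REST API: create a transcript job with `POST /v2/transcript`, then poll `GET /v2/transcript/{id}` until the status is `completed` or `error`. Some assumptions to flag: `audio_url` is taken to be a media URL the service can download, and whether a plain YouTube page link works there is not confirmed here. The helper names (`http_json`, `transcribe`) and the polling defaults are made up for illustration.

```python
import json
import time
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"


def http_json(method, url, api_key, payload=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"authorization": api_key, "content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def transcribe(audio_url, api_key, request=http_json,
               poll_seconds=5.0, max_polls=240):
    """Submit audio_url for transcription and return the transcript text.

    `request` is injectable so the flow can be exercised without network
    access; poll_seconds and max_polls are arbitrary illustrative defaults.
    """
    job = request("POST", f"{API_BASE}/transcript", api_key,
                  {"audio_url": audio_url})
    for _ in range(max_polls):
        result = request("GET", f"{API_BASE}/transcript/{job['id']}", api_key)
        if result["status"] == "completed":
            return result["text"]
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(poll_seconds)  # still queued or processing
    raise TimeoutError("transcript was not ready after polling")
```

From there the returned text can be written to a file in a synced cloud folder or handed to the next step in the workflow.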