
Automating Speech-to-Text: How to Transcribe Audio & Video with Azure Speech Services 

04 Jan 2025

Introduction 

Businesses and content creators rely on speech-to-text technology to transcribe audio and video files efficiently. Whether you're a developer, researcher, or media professional, automating transcription saves time and frees you for other work. 

This guide walks you through building an automated transcription tool with the Azure Speech-to-Text API on Ubuntu Linux. By the end of this article, you’ll be able to: 

  • Convert video files to audio for transcription 
  • Normalize audio formats for better accuracy 
  • Leverage Azure Speech-to-Text API for precise transcriptions 
  • Automate the transcription process using Python on Ubuntu 
  • Optionally, run this workflow on an Azure Virtual Machine (VM) 

Why Automate Speech-to-Text Transcription? 

Manual transcription is time-consuming and prone to errors. Automating this process enhances efficiency, ensuring accurate and swift text conversion from multimedia content. Azure Speech Services provides robust AI-powered transcription capabilities, making it a preferred choice for businesses, podcasters, and professionals. 

To learn more about AI-powered development, check out our Custom Software Development Services. 

Prerequisites 

Before setting up the transcription tool, ensure you have: 

  • A Microsoft Azure account with Speech Services enabled 
  • Python 3 installed on Ubuntu 
  • FFmpeg for media file conversion 
  • Required Python libraries: azure-cognitiveservices-speech and moviepy (argparse is part of the Python standard library and needs no separate install) 

Run the following commands to install dependencies: 

sudo apt update && sudo apt install ffmpeg -y
pip install azure-cognitiveservices-speech moviepy
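
Before running the tool, it can help to confirm that the prerequisites above are actually in place. The following is a minimal sketch using only the standard library; the function names are illustrative and not part of the original script.

```python
import importlib.util
import shutil


def module_available(name):
    """Return True if the named Python module can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (for example "azure") is not installed.
        return False


def missing_prerequisites(tools=("ffmpeg",), modules=("azure.cognitiveservices.speech",)):
    """Return the command-line tools and Python modules that could not be found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [module for module in modules if not module_available(module)]
    return missing
```

Calling missing_prerequisites() with no arguments returns an empty list when both FFmpeg and the Speech SDK are installed, and otherwise lists what still needs installing.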

Step 1: Setting Up Azure Speech Services 

  1. Create an Azure Account: Sign up at Azure Portal if you don’t have an account. 
  2. Set Up Speech Services: Navigate to Azure Speech Services, create a resource, select a pricing tier, and copy the API Key and Region from the Keys and Endpoint tab. 
  3. Configure the Speech SDK in Python: 
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="YOUR_AZURE_SPEECH_KEY",
    region="YOUR_AZURE_REGION"
)
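
Hard-coding the key in a script makes it easy to leak through version control. One common alternative is to read the key and region from environment variables. The sketch below assumes the variable names SPEECH_KEY and SPEECH_REGION; these names are a convention chosen for this example, not something the SDK requires.

```python
import os


def load_speech_settings(env=None):
    """Read the Speech key and region from environment variables.

    SPEECH_KEY and SPEECH_REGION are assumed names used for illustration.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    missing = [name for name, value in (("SPEECH_KEY", key), ("SPEECH_REGION", region)) if not value]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return key, region


# Possible usage with the SDK configuration shown above:
#   key, region = load_speech_settings()
#   speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
```

Failing early with a clear message is usually easier to debug than an authentication error returned later by the service.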

Step 2: Writing the Python Script 

Handling Command-Line Arguments 

import argparse

parser = argparse.ArgumentParser(description="Transcribe speech from video and audio files.")
parser.add_argument("media_files", nargs="+", help="Paths to video/audio files")
args = parser.parse_args()

Extract Audio from Video Files 

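import subprocess

def extract_audio(video_file):
    # Write the audio track as 16 kHz, mono, 16-bit PCM WAV next to the source file.
    audio_file = f"{video_file.rsplit('.', 1)[0]}_audio.wav"
    # -y is a global option, so it goes before the input rather than after the output.
    subprocess.run([
        "ffmpeg", "-y", "-i", video_file,
        "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1",
        audio_file
    ], check=True)
    return audio_file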

Convert Audio to the Required Format 

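def convert_audio_to_wav(input_audio):
    # Re-encode any audio file as 16 kHz, mono, 16-bit PCM WAV.
    output_wav = input_audio.rsplit('.', 1)[0] + "_fixed.wav"
    # -y is a global option, so it goes before the input rather than after the output.
    subprocess.run([
        "ffmpeg", "-y", "-i", input_audio,
        "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1",
        output_wav
    ], check=True)
    return output_wav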
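
The extraction and conversion functions run the same FFmpeg command and differ only in the output suffix. They also derive the output name with rsplit('.', 1), which misbehaves for a path such as ./clips/interview that has a dot in its directory part but no file extension. A possible refactor, sketched here with pathlib, builds the command in one place; build_ffmpeg_command is a hypothetical helper, not part of the original script.

```python
from pathlib import Path


def build_ffmpeg_command(source, suffix="_fixed"):
    """Return (command, output_path) for converting `source` to 16 kHz mono 16-bit PCM WAV."""
    source = Path(source)
    # with_name() replaces only the final path component, so dots in directories are safe.
    output = source.with_name(f"{source.stem}{suffix}.wav")
    command = [
        "ffmpeg", "-y",          # overwrite an existing output file without prompting
        "-i", str(source),
        "-acodec", "pcm_s16le",  # 16-bit PCM samples
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        str(output),
    ]
    return command, output
```

The returned command list can be passed straight to subprocess.run(command, check=True), with suffix="_audio" for video input and the default suffix for audio input.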

Transcribe Audio Using Azure Speech-to-Text 

def transcribe_audio(audio_file, speech_config):
    audio_config = speechsdk.audio.AudioConfig(filename=audio_file)
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    # recognize_once() returns after the first utterance, so a long recording
    # yields only its opening phrase; longer files need continuous recognition.
    result = speech_recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return None
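
Because recognize_once() stops at the first utterance, transcribing a full podcast or meeting generally calls for the SDK's continuous recognition mode, which delivers each finished phrase through an event callback. The following is a rough sketch of that approach rather than the original script's method; TranscriptCollector and transcribe_file_continuous are names invented for this example.

```python
import threading


class TranscriptCollector:
    """Accumulates recognized phrases delivered by the recognizer's event callbacks."""

    def __init__(self):
        self.segments = []
        self.done = threading.Event()

    def on_recognized(self, evt):
        # Each `recognized` event carries one finalized phrase in evt.result.text.
        text = evt.result.text
        if text:
            self.segments.append(text)

    def on_finished(self, evt):
        # Connected to session_stopped and canceled (end of file, or an error).
        self.done.set()

    @property
    def text(self):
        return " ".join(self.segments)


def transcribe_file_continuous(audio_file, speech_config):
    """Transcribe a whole WAV file using continuous recognition."""
    # Imported lazily so the collector can be exercised without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    audio_config = speechsdk.audio.AudioConfig(filename=audio_file)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    collector = TranscriptCollector()
    recognizer.recognized.connect(collector.on_recognized)
    recognizer.session_stopped.connect(collector.on_finished)
    recognizer.canceled.connect(collector.on_finished)

    recognizer.start_continuous_recognition()
    collector.done.wait()  # block until the service reports the end of the file
    recognizer.stop_continuous_recognition()
    return collector.text or None
```

Like transcribe_audio, this returns None when nothing was recognized, so it could be swapped in without changing the calling code.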

Save the Transcription 

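import os

def save_transcription(text, filename):
    os.makedirs("transcriptions", exist_ok=True)
    # transcribe_audio() returns None when nothing was recognized; write an empty file then.
    with open(f"transcriptions/{filename}_transcription.txt", "w", encoding="utf-8") as file:
        file.write(text or "")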

Step 3: Running the Script 

To transcribe an audio or video file, run: 

python3 transcribe.py video1.mp4 audio1.wav

This script will: 

  1. Extract audio from video (if applicable) 
  2. Convert the audio to the required format 
  3. Send it to Azure’s Speech-to-Text API 
  4. Save the transcribed text in the transcriptions/ folder 
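
The four steps above can be wired together in a small driver function. The sketch below takes the step functions as parameters so the flow can be tried without FFmpeg or an Azure key; process_media_file and the VIDEO_EXTENSIONS set are assumptions made for this example, and the extension list should be adjusted to your own media.

```python
from pathlib import Path

# Assumed set of video extensions; extend it to match your own files.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}


def process_media_file(media_file, extract_audio, convert_audio, transcribe, save):
    """Run one file through the four steps and return the transcript (or None)."""
    path = Path(media_file)
    audio_file = str(path)
    if path.suffix.lower() in VIDEO_EXTENSIONS:
        audio_file = extract_audio(audio_file)  # step 1: video -> audio
    wav_file = convert_audio(audio_file)        # step 2: normalize to 16 kHz mono WAV
    text = transcribe(wav_file)                 # step 3: send to Azure Speech-to-Text
    if text:
        save(text, path.stem)                   # step 4: write the transcript to disk
    return text


# Possible usage with the functions defined earlier in this article:
#   for media_file in args.media_files:
#       process_media_file(
#           media_file, extract_audio, convert_audio_to_wav,
#           lambda wav: transcribe_audio(wav, speech_config), save_transcription,
#       )
```

Passing the steps in as arguments also makes it straightforward to replace any single stage later, for example a different transcription function, without touching the rest of the loop.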

Advanced Features & Future Enhancements 

This workflow can be expanded to support: 

  • Live speech transcription for real-time applications 
  • Multi-speaker recognition for differentiating voices 
  • Automatic translation for multilingual content 

Looking for expert mobile and web solutions? Explore our Mobile App Development Services. 

Conclusion

Built on Azure Speech Services, this tool turns audio and video files into text using a handful of Python functions and FFmpeg. Whether you're handling podcasts, interviews, or business meetings, automating the pipeline saves time compared with transcribing by hand. 

For complete source code, visit: GitHub Repository 

Dheeraj Kumar
Technical Project Manager

Tech Lead with 8+ years of experience in software development, project management, and UI/UX design, specialising in building scalable mobile applications, leading cross-functional teams, and delivering user-centric solutions with a strong focus on performance, quality, and innovation.
