Speech to Text and Text to Speech – Speech recognition in Python


Speech to Text functionality is a highly requested feature in many kinds of software. With Python and three free libraries, it takes only a few lines of code.
by Antonio Lamorgese


Speech To Text (STT) and Text To Speech (TTS) are the two sides of a voice interface: STT is speech recognition, which turns spoken words into text, while TTS is speech synthesis, which turns text into spoken audio. In this guide you will build both in Python with very few lines of code. Voice features are in high demand in many kinds of software. They are often used to automate applications where the user has to give specific, repetitive instructions, for example in call centers, or to build automatic dictation systems that let you dictate entire speeches to the computer.

Implementing a speech recognition feature from scratch can be a very complex undertaking. Fortunately, thanks to Python and three free libraries, SpeechRecognition, pyttsx3 and pyaudio, all this can be done in a very short time and with very few lines of code.

In this guide, you will learn how to build a simple voice tool on top of these libraries: a Speech To Text (STT) and Text To Speech (TTS) converter.




1. Install the required Python modules

As in most Python projects, adding a feature like this means installing a few libraries first. The ones needed here are SpeechRecognition, pyttsx3 and pyaudio. Installation is easy: run these commands from the Windows command prompt or a Linux terminal:

pip install speechrecognition
pip install pyaudio
pip install pyttsx3
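
Before moving on, it can help to confirm that the three libraries are actually importable. The following is a minimal sketch that uses only the standard library. The helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of any of the libraries, and the sketch assumes the import names `speech_recognition`, `pyaudio` and `pyttsx3`.

```python
from importlib.util import find_spec

# Assumed mapping: package name as installed with pip -> module name used in `import`.
REQUIRED = {
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
    "pyttsx3": "pyttsx3",
}


def missing_packages(required=None):
    """Return the pip names of the packages whose module cannot be found."""
    required = REQUIRED if required is None else required
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still to install: " + ", ".join(missing))
    else:
        print("All required libraries are available.")
```

If the script lists a package, run the matching `pip install` command again and check for error messages in its output.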

1.1 Speech To Text (STT) – From Spoken Voice to Text

Specifically, the SpeechRecognition library converts spoken voice into text. The example in this guide uses the Google Speech Recognition API for the conversion, but the library supports a considerable number of other speech recognition APIs. If you search for the SpeechRecognition library on PyPI, its project page provides examples written by the library's developer, which you can copy and paste into IDLE, the development environment that ships with Python, and try out.



For example, with this code, you can immediately see how to convert your spoken voice to text format and store the conversion in a variable:


import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# recognize speech using Google Speech Recognition
try:
    # for testing purposes, we're just using the default API key
    # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
    # instead of `r.recognize_google(audio)`
    text = r.recognize_google(audio)  # store the recognized text in a variable
    print("Google Speech Recognition thinks you said " + text)
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
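
If you want to reuse the recognized text elsewhere in a program, one option is to wrap the recognition step in a small function that returns the text, or `None` when recognition fails. The sketch below is only one possible arrangement: the names `transcribe` and `demo` are hypothetical, and it assumes the `language` parameter of `recognize_google` and the `adjust_for_ambient_noise` method of the recognizer. The library is imported inside `demo()` so the helper itself has no hard dependency.

```python
def transcribe(recognizer, audio, language, recoverable_errors):
    """Return the recognized text, or None if the audio could not be transcribed.

    `recoverable_errors` is a tuple of exception classes to treat as
    "no result" instead of letting the program crash.
    """
    try:
        text = recognizer.recognize_google(audio, language=language)
    except recoverable_errors as exc:
        print(f"Could not transcribe audio: {exc!r}")
        return None
    return text.strip()


def demo():
    """Record one phrase from the microphone and print the stored text."""
    import speech_recognition as sr  # needs SpeechRecognition and pyaudio

    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise for a second so quiet speech is detected better.
        r.adjust_for_ambient_noise(source, duration=1)
        print("Say something!")
        audio = r.listen(source)

    text = transcribe(r, audio, "en-US", (sr.UnknownValueError, sr.RequestError))
    print("Stored text:", text)
```

Call `demo()` to try it with a real microphone. To recognize another language, pass a different language tag, for example `"it-IT"` for Italian.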



In this short video tutorial you can see how to make a Speech To Text converter with SpeechRecognition and Google's widely used speech recognition API (Google speech-to-text API).


Python speech to text converter

1.2 Text To Speech (TTS) – From Text to Speech

In the same way as with the SpeechRecognition library, you can do the opposite: by copying and pasting this code, you can convert text into a spoken voice. The library to use is pyttsx3, also available on PyPI:


import pyttsx3
engine = pyttsx3.init() # object creation

# RATE
rate = engine.getProperty('rate')   # getting details of current speaking rate
print(rate)                         # printing current voice rate
engine.setProperty('rate', 125)     # setting up new voice rate


# VOLUME
volume = engine.getProperty('volume')   # getting to know current volume level (min=0 and max=1)
print(volume)                           # printing current volume level
engine.setProperty('volume',1.0)        # setting up volume level  between 0 and 1

# VOICE
voices = engine.getProperty('voices')       # list of the voices installed on this system
#engine.setProperty('voice', voices[0].id)  # changing the index changes the voice; 0 is often a male voice
engine.setProperty('voice', voices[1].id)   # 1 is often a female voice (needs at least two installed voices)

engine.say("Hello World!")
engine.say('My current speaking rate is ' + str(rate))
engine.runAndWait()
engine.stop()

# Saving Voice to a file
# On linux make sure that 'espeak' and 'ffmpeg' are installed
engine.save_to_file('Hello World', 'test.mp3')
engine.runAndWait()
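
The listing above selects `voices[1]`, which raises an `IndexError` on a system with only one installed voice. A minimal sketch of a safer selection follows. The helper names `pick_voice_id` and `speak` are hypothetical, and the code assumes only that each voice object exposes an `id` attribute, as used in the listing above.

```python
def pick_voice_id(voices, preferred_index=1):
    """Return the id of voices[preferred_index], or of the first voice as a fallback.

    Returns None when no voices are installed at all.
    """
    if not voices:
        return None
    if 0 <= preferred_index < len(voices):
        return voices[preferred_index].id
    return voices[0].id


def speak(text, preferred_index=1, rate=125):
    """Speak `text` with the preferred voice if it exists."""
    import pyttsx3  # imported here so pick_voice_id has no hard dependency

    engine = pyttsx3.init()
    engine.setProperty('rate', rate)
    voice_id = pick_voice_id(engine.getProperty('voices'), preferred_index)
    if voice_id is not None:
        engine.setProperty('voice', voice_id)
    engine.say(text)
    engine.runAndWait()
```

For example, `speak("Hello World!")` uses the second voice when it is available and falls back to the first one otherwise.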

For pyttsx3, too, it is worth following this short video tutorial, where you can see how easily you can make a Text To Speech converter with the pyttsx3 library.


Python text to speech