Control Your Computer With Your Voice

Use the speech recognition library to control your PC, from opening apps to shutting it down.

Photo by Chris Lynch / Unsplash

Requirements

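To follow this tutorial you'll need Python installed on your computer. If you are on Linux, your distribution probably ships with Python by default. If it doesn't, you can install Python (and pip) on Debian-based distributions such as Ubuntu with the following commands. The apt-get commands do not work on macOS; there you can use the installer from python.org or a package manager such as Homebrew.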

$ sudo apt-get install python3.9
$ sudo apt-get install python3-pip

For Windows, you can download the installer or use winget, the new Microsoft package manager. (I really recommend installing it, along with the new Windows Terminal.)

After you successfully install Python, you'll need to install some libraries with pip.

  • Linux and macOS:
$ pip3 install speechrecognition
$ pip3 install pyaudio
  • Windows:
$ pip install speechrecognition
$ pip install pyaudio

If you encounter problems installing pyaudio on Windows, try using pipwin.

$ pip install pipwin
$ pipwin install pyaudio
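
Before moving on, it can help to confirm that both libraries are actually importable. The following is a minimal sketch that uses only the standard library. The helper name missing_modules is hypothetical, and the sketch assumes the import names are speech_recognition and pyaudio (note that the import name speech_recognition is spelled differently from the pip package name).

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the two libraries installed above
    missing = missing_modules(["speech_recognition", "pyaudio"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If the script reports a missing library, rerun the matching pip command before continuing.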

The Code

Now that we have everything we need, it's time to start coding. Create a Python file (app.py) and copy the following code.

# Import the necessary libraries
import speech_recognition as sr
import os

This snippet imports the modules the program needs. (When we import something followed by "as x", we use "x" to refer to that library in the rest of the code.)
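
As a quick illustration of how an import alias behaves, here is a tiny sketch using the standard math module. The alias m and the variable root are arbitrary names chosen for this example.

```python
import math as m  # "m" now refers to the math module

# The alias is used exactly like the original module name would be
root = m.sqrt(16)
print(root)
```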

Next, we will create a function that will translate our voice into text.

# Voice to text function
def voice_to_text():
    # Record audio from the default microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    # Default to an empty command so the return below is safe if recognition fails
    command = ""
    # Speech recognition using Google Speech Recognition
    try:
        command = r.recognize_google(audio)
        print("You said: " + command)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return command.lower()

First, we create a recognizer object from the speech recognition library.

# Voice to text function
def voice_to_text():
    r = sr.Recognizer()

Then we use a Microphone object as the audio source.

# Voice to text function
def voice_to_text():
    r = sr.Recognizer()
    with sr.Microphone() as source:

Then we print a message to the terminal to let the user know that the program is listening, and we use the recognizer object to listen through the microphone object. (You can use an audio file instead if you don't want to use your microphone.)

# Voice to text function
def voice_to_text():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
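
If you would rather work from a recording than from the microphone, the library provides sr.AudioFile together with the recognizer's record method. The following is a minimal sketch, not part of the original program: the function name record_from_file is hypothetical, and the import is placed inside the function only so that the sketch can be loaded even when the library is not installed.

```python
def record_from_file(path):
    """Load a whole audio file and return it as audio data for a recognizer."""
    # Imported here so this sketch can be loaded without the library installed
    import speech_recognition as sr

    r = sr.Recognizer()
    # sr.AudioFile reads WAV, AIFF and FLAC files
    with sr.AudioFile(path) as source:
        audio = r.record(source)  # read the entire file
    return audio
```

The returned audio can then be passed to the recognizer in the same way as the audio captured by r.listen, for example with a hypothetical recording named command.wav.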

Now we are going to use the Google speech recognition API to translate our voice into text. (The speech recognition library is just a wrapper around various APIs.)

# Voice to text function
def voice_to_text():
    # Record Audio
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    # Speech recognition using Google Speech Recognition
    try:
        command = r.recognize_google(audio)
        print("You said: " + command)

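We need to wrap the API call in a try/except block because it can raise an error: sr.UnknownValueError when the speech cannot be understood, and sr.RequestError when the service cannot be reached. We are going to handle those errors now and return the text that the API generated.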

# Voice to text function
def voice_to_text():
    # Record Audio
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    # Default to an empty command so the return below is safe if recognition fails
    command = ""
    # Speech recognition using Google Speech Recognition
    try:
        command = r.recognize_google(audio)
        print("You said: " + command)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return command.lower()
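
Recognition does not always succeed on the first try, so it can be convenient to ask again a few times. The following is a minimal sketch of such a retry loop. The name listen_until_understood and the max_attempts parameter are hypothetical, and the sketch assumes that the function passed in returns an empty string when nothing was recognized.

```python
def listen_until_understood(listen, max_attempts=3):
    """Call `listen` until it returns non-empty text or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        text = listen()
        if text:
            return text
        print("Attempt {0} of {1} failed, please try again.".format(attempt, max_attempts))
    # Nothing was understood within the allowed number of attempts
    return ""


if __name__ == "__main__":
    # Stand-in for voice_to_text so the sketch runs without a microphone
    fake_results = iter(["", "open youtube"])
    print(listen_until_understood(lambda: next(fake_results)))
```

In the real program you would pass voice_to_text itself as the listen argument.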

Now we are going to create the main function and call the voice_to_text function inside of it.

def main():
    command = voice_to_text()

After that, you can create your own commands. Here's an example:

def main():
    command = voice_to_text()
    # Commands for Windows 10
    if command == "open gmail":
        os.system("start chrome https://mail.google.com/mail/u/0/#inbox")
    elif command == "open youtube":
        os.system("start chrome https://www.youtube.com/")
    elif command == "open google":
        os.system("start chrome https://www.google.com/")
    elif command == "open facebook":
        os.system("start chrome https://www.facebook.com/")
    elif command == "open twitter":
        os.system("start chrome https://twitter.com/")
    elif command == "open instagram":
        os.system("start chrome https://www.instagram.com/")
    elif command == "open linkedin":
        os.system("start chrome https://www.linkedin.com/")
    elif command == "open github":
        os.system("start chrome https://www.github.com/")
    elif command == "open stackoverflow":
        os.system("start chrome https://stackoverflow.com/")
    elif command == "open reddit":
        os.system("start chrome https://reddit.com/")
    elif command == "shutdown":
        os.system("shutdown /s /t 1")
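
As the list of commands grows, a long if/elif chain becomes harder to maintain. One possible alternative, sketched here under stated assumptions, is a dictionary that maps each spoken phrase to a URL, combined with the standard webbrowser module. This is a different technique from the os.system calls above: webbrowser.open uses the system's default browser rather than Chrome specifically, and it also works outside Windows. The names SITES, url_for and run_command are hypothetical.

```python
import webbrowser

# Spoken phrase -> URL to open (a few of the sites from the example above)
SITES = {
    "open gmail": "https://mail.google.com/mail/u/0/#inbox",
    "open youtube": "https://www.youtube.com/",
    "open github": "https://www.github.com/",
}


def url_for(command):
    """Return the URL mapped to a spoken command, or None if there is none."""
    return SITES.get(command.strip().lower())


def run_command(command):
    """Open the site for `command`; return True if the command was known."""
    url = url_for(command)
    if url is None:
        print("Unknown command: " + command)
        return False
    # Opens the URL in the default browser, not necessarily Chrome
    webbrowser.open(url)
    return True
```

Adding a new site then only requires one more entry in the dictionary instead of another elif branch.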

You can change these commands to whatever you like. After you create your main function, don't forget to call it.

if __name__ == '__main__':
    main()

That's it. This is a very simple implementation; you can play with it and create something cool.