Repeat After Me
===================================

This tutorial walks you through generating visemes manually: you record an audio file and then play it back on the robot face.

One common criticism of text-to-speech systems is that they can sound robotic. One way to make speech sound more natural is to use actual human speech, which can convey a wide range of emotions and intentions that are difficult to capture in plain text. If your PyLips interaction relies mostly on pre-recorded speech, you can record your own voice (or hire a voice actor) for the phrases you need.

In this tutorial, we use the ``sounddevice`` and ``soundfile`` libraries to record a 3-second audio clip. We then use the ``allosaurus`` library to recognize the phonemes in the recording. Finally, we use the ``RobotFace`` class to play back the audio file while displaying the matching visemes on the robot face.

Before beginning, make sure you have run ``python3 -m pylips.face.start`` to start the robot face. You may also need to install the ``sounddevice`` and ``soundfile`` libraries with ``python3 -m pip install sounddevice soundfile``. ``allosaurus`` is included in the PyLips requirements, so you should not need to install it separately.

First, import the libraries used in this tutorial.

.. code-block:: python

   import pickle

   import sounddevice as sd
   import soundfile as sf
   from allosaurus.app import read_recognizer

   from pylips.speech import RobotFace
   from pylips.speech.system_tts import IPA2VISEME

Next, set up the parameters used later in the script. To change the length of the recording, experiment with different values of ``duration``. You may also need to change ``sd.default.samplerate`` and ``sd.default.channels`` to match your microphone's audio input.

.. code-block:: python

   # sound recording parameters
   duration = 3  # seconds
   sd.default.samplerate = 44100  # Hz
   sd.default.channels = 1  # mono

   # load allosaurus for phoneme recognition
   phoneme_model = read_recognizer()

   # create the robot face object used for speaking
   robot = RobotFace()

Next, use ``sounddevice`` to record an audio clip and save it to the ``pylips_phrases`` directory, which is created automatically when the PyLips face is instantiated.

.. code-block:: python

   # sd.rec starts recording and returns immediately; sd.wait blocks until the clip is complete
   print("Recording audio...")
   myrecording = sd.rec(int(duration * sd.default.samplerate))
   sd.wait()
   sf.write('pylips_phrases/parroted.wav', myrecording, sd.default.samplerate)

Next, use ``allosaurus`` to recognize the phonemes in the audio file. Each line of the timestamped output contains a start time, a duration, and a phone. We convert the phones to visemes with the ``IPA2VISEME`` dictionary, append a final ``IDLE`` viseme shortly after the clip ends so the mouth returns to rest, and save the result in the format PyLips expects.

.. code-block:: python

   # recognize phones with timestamps; each line is "<start> <duration> <phone>"
   out = phoneme_model.recognize('pylips_phrases/parroted.wav', timestamp=True, lang_id='eng')
   lines = [line for line in out.split('\n') if line.strip()]

   # start time of each phone (in seconds) and the matching viseme
   times = [float(line.split(' ')[0]) for line in lines]
   visemes = [IPA2VISEME[line.split(' ')[-1]] for line in lines]

   # return to the idle mouth shape 0.2 s after the recording ends
   times.append(len(myrecording) / sd.default.samplerate + 0.2)
   visemes.append('IDLE')

   with open('pylips_phrases/parroted.pkl', 'wb') as f:
       pickle.dump((times, visemes), f)

Finally, use the ``RobotFace`` class to play back the audio file and display the visemes on the robot face. The existing ``say_file`` method plays the files created in the previous step.

.. code-block:: python

   robot.say_file('parroted')
   robot.wait()

You are done! Run the script and record your own voice to play it back on the robot face.
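If you plan to record many phrases, you may want to factor the phoneme-to-viseme conversion into a reusable function. The sketch below is one possible approach, not part of the PyLips API: ``build_viseme_timeline`` and ``save_viseme_timeline`` are hypothetical helper names. It assumes each line of the allosaurus output has the form ``<start> <duration> <phone>``, and it assumes that showing ``IDLE`` for a phone missing from ``IPA2VISEME`` is acceptable, rather than letting the dictionary lookup raise a ``KeyError``.

.. code-block:: python

   import pickle


   def build_viseme_timeline(allosaurus_output, ipa_to_viseme, clip_seconds,
                             fallback="IDLE", tail=0.2):
       """Convert timestamped allosaurus output into (times, visemes) lists.

       Assumes each non-empty line looks like "<start> <duration> <phone>".
       Phones missing from ipa_to_viseme map to ``fallback`` (an assumption).
       """
       times, visemes = [], []
       for line in allosaurus_output.strip().splitlines():
           parts = line.split()
           if len(parts) < 3:
               continue  # skip blank or malformed lines
           start, phone = float(parts[0]), parts[-1]
           times.append(start)
           visemes.append(ipa_to_viseme.get(phone, fallback))
       # return to the idle mouth shape `tail` seconds after the clip ends
       times.append(clip_seconds + tail)
       visemes.append("IDLE")
       return times, visemes


   def save_viseme_timeline(path, times, visemes):
       """Pickle the (times, visemes) tuple, closing the file afterwards."""
       with open(path, "wb") as f:
           pickle.dump((times, visemes), f)

In the tutorial script, the conversion step would then reduce to ``times, visemes = build_viseme_timeline(out, IPA2VISEME, len(myrecording) / sd.default.samplerate)`` followed by ``save_viseme_timeline('pylips_phrases/parroted.pkl', times, visemes)``. Falling back to a neutral mouth shape keeps playback working on clips that contain unusual phones, at the cost of silently hiding them, so you may prefer to log unknown phones while testing.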