Teleoperate the Face

This tutorial will walk you through mimicking your facial expressions on the PyLips face.

It might be unintuitive to programmatically author facial expressions. Luckily, people make facial expressions ALL the time, so we can leverage people’s natural expressions to make PyLips expressions. We used the package MediaPipe to track people’s facial expressions. Then, we can replay this expression on the PyLips face. This makes it easier to generate unique expressions.

In this tutorial, we will be using the MediaPipe library to track a person’s face and render their expression in real time.

Prior to beginning this tutorial, ensure that you have run python3 -m pylips.face.start to start the robot face. You may also need to install the MediaPipe library using python3 -m pip install MediaPipe.

The model for face detection needs to be downloaded to be used. We provide the version we use in this tutorial at this link. Alternatively, you can download it with this command:

# For users with wget (Linux)
wget https://github.com/interaction-lab/PyLips/raw/main/tutorials/models/face_landmarker_v2_with_blendshapes.task -O face_landmarker_v2_with_blendshapes.task
# For users with curl (Windows/Mac)
curl https://github.com/interaction-lab/PyLips/raw/main/tutorials/models/face_landmarker_v2_with_blendshapes.task -o face_landmarker_v2_with_blendshapes.task

First, we will import all the necessary libraries for this tutorial.

import mediapipe as mp
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

from pylips.speech import RobotFace

import numpy as np
import cv2

These are some utility functions provided by MediaPipe to draw the face detection on the video feed.

def cv2_imshow(image):
    cv2.imshow("picture",image)
    cv2.waitKey(0)

def draw_landmarks_on_image(rgb_image, detection_result):
face_landmarks_list = detection_result.face_landmarks
annotated_image = np.copy(rgb_image)

# Loop through the detected faces to visualize.
for idx in range(len(face_landmarks_list)):
    face_landmarks = face_landmarks_list[idx]

    # Draw the face landmarks.
    face_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
    face_landmarks_proto.landmark.extend([
    landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in face_landmarks
    ])

    solutions.drawing_utils.draw_landmarks(
        image=annotated_image,
        landmark_list=face_landmarks_proto,
        connections=mp.solutions.face_mesh.FACEMESH_TESSELATION,
        landmark_drawing_spec=None,
        connection_drawing_spec=mp.solutions.drawing_styles
        .get_default_face_mesh_tesselation_style())
    solutions.drawing_utils.draw_landmarks(
        image=annotated_image,
        landmark_list=face_landmarks_proto,
        connections=mp.solutions.face_mesh.FACEMESH_CONTOURS,
        landmark_drawing_spec=None,
        connection_drawing_spec=mp.solutions.drawing_styles
        .get_default_face_mesh_contours_style())
    solutions.drawing_utils.draw_landmarks(
        image=annotated_image,
        landmark_list=face_landmarks_proto,
        connections=mp.solutions.face_mesh.FACEMESH_IRISES,
        landmark_drawing_spec=None,
        connection_drawing_spec=mp.solutions.drawing_styles
        .get_default_face_mesh_iris_connections_style())

return annotated_image

MediaPipe has some small errors in the values it returns between the right and left side, but people usually use symmetric expressions. If the left and right side expressions slightly differ, we will even them out to appear more realistic.

def even_out_expression(expression, au, threshold):
    if abs(expression[f'AU{au}l'] - expression[f'AU{au}r']) < threshold:
            expression[f'AU{au}l'] = expression[f'AU{au}r']

    return expression

Initialize the PyLips face and the MediaPipe face landmarker model.

base_options = python.BaseOptions(model_asset_path='face_landmarker_v2_with_blendshapes.task')
options = vision.FaceLandmarkerOptions(base_options=base_options,
                                    output_face_blendshapes=True,
                                    output_facial_transformation_matrixes=True,
                                    num_faces=1)
detector = vision.FaceLandmarker.create_from_options(options)

face1 = RobotFace()

Next, we create a mapping between the blendshapes provided by MediaPipe and facial action units. We define a weight for each blendshape to better mirror facial expressions.

NAME2AUWEIGHT = {
    #brows
    'browInnerUp' : ['AU1', 3],
    'browDownLeft' : ['AU4l', 5],
    'browDownRight' : ['AU4r', 5],
    'browOuterUpLeft' : ['AU2l', 3],
    'browOuterUpRight' : ['AU2r', 3],
    # eyes
    'eyeBlinkLeft' : ['AU43l', 2],
    'eyeBlinkRight' : ['AU43r', 2],
    # mouth
    'jawOpen' : ['AU26', 5],
    'mouthStretchLeft' : ['AU27l', 5],
    'mouthStretchRight' : ['AU27r', 5],
    'mouthDimpleLeft' : ['AU14l', 1],
    'mouthDimpleRight' : ['AU14r', 1],
    'mouthPucker' : ['AU18', 1.75],
    'mouthPressLeft' : ['AU24l', 1],
    'mouthPressRight' : ['AU24r', 1],
    'mouthFrownLeft' : ['AU15l', 4],
    'mouthFrownRight' : ['AU15r', 4],
}

Lastly, we combine all these functions together to allow the camera to capture and track faces in real time. Every time we receive an image from a webcam, we follow these steps:

  1. Process image with MediaPipe

  2. Draw results for user

  3. Map from blendshapes to action units

  4. Make action units more symmetric

  5. Pass detected action units through a smoothing filter to reduce jittering

When you finish, select the camera window and press the q key to end the process.

cap = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()

    frame = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame)
    detection_result = detector.detect(frame)
    annotated_image = draw_landmarks_on_image(frame.numpy_view(), detection_result)


    try:
        expression = {}
        for face_blendshape in detection_result.face_blendshapes[0]:
            if face_blendshape.category_name in NAME2AUWEIGHT:
                au, weight = NAME2AUWEIGHT[face_blendshape.category_name]
                expression[au] = face_blendshape.score * weight

        expression = even_out_expression(expression, 43, .2)
        expression = even_out_expression(expression, 14, .2)
        expression = even_out_expression(expression, 24, .2)
        expression = even_out_expression(expression, 27, .2)
        expression = even_out_expression(expression, 4, .2)
        expression = even_out_expression(expression, 15, .2)

        # blend in the current reading with the previous facial expression
        try:
            for key in expression:
                true_expression[key] = .5 * true_expression[key] + .5 * expression[key]
        except Exception as e:
            print(e)
            true_expression = expression

        face1.express(true_expression, 25)

    except Exception as e:
        print(e)


    # Display the resulting frame
    cv2.imshow('frame', annotated_image)
    if cv2.waitKey(5) == ord('q'):
        break


cap.release()
cv2.destroyAllWindows()