3-D Tracking of MPEG-4 Facial Features in Stereo Videos

 

 

Given

1.  A model of the head with MPEG-4 facial features identified on it.

2.  The pose of the speaker's face in the first video frame.

3.  A synchronized stereo video of the speaker.

 

Required

Location of the facial features in the video frames, and tracking of the features in 3-D.

 

Approach

Deform the head model to match the face recovered in each video frame. Then, map features of the model to those of the face and locate the positions of the features in 3-D.
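The feature-mapping step can be illustrated with a minimal sketch. It assumes the head model has already been deformed onto the recovered face, and it uses a simple nearest-neighbor search as a stand-in for the mapping. The page does not state the actual matching rule, and the names `nearest_point` and `map_features_to_face` are hypothetical.

```python
import math


def nearest_point(query, points):
    """Return the point in `points` closest to `query` (Euclidean distance)."""
    return min(points, key=lambda p: math.dist(query, p))


def map_features_to_face(model_features, face_points):
    """Map each model feature to its nearest recovered face point.

    model_features: dict of feature name -> (x, y, z) on the deformed model.
    face_points: list of (x, y, z) points recovered from the stereo frame.
    Nearest-neighbor matching is an assumption made for illustration only.
    """
    return {
        name: nearest_point(position, face_points)
        for name, position in model_features.items()
    }


if __name__ == "__main__":
    face = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    features = {"feature_a": (0.9, 0.1, 0.0), "feature_b": (0.1, 0.8, 0.0)}
    print(map_features_to_face(features, face))
```

A brute-force search like this is adequate for a few dozen feature points; a spatial index would be the usual choice for dense point sets.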

 

Subtasks

1.  Construct a generic head model with facial features marked on it.

2.  Construct the face of the speaker.

3.  Deform the generic head model to match the speaker's face.

4.  Calibrate the stereo camera system.

5.  Track the face of the speaker in 3-D.

 

A Four-Head Laser Range Scanner

A bust of Alexander the Great serves as the generic model of the head. The bust is scanned with a four-head laser range scanner.

   

Fig. 1. Different views of the bust of Alexander the Great used as our generic head model.

  

Fig. 2. The structure of our four-head laser range scanner (left). The structure of each head of the scanner (right).

Scanning Method

  

 

Fig. 3. One set of images captured simultaneously by the four cameras, covering the head from all sides.

Fig. 4. Reconstructed model of the bust of Alexander the Great.

 

A Generic Model of the Head

 

Fig. 5. The generic head model shown in volumetric form (left) and surface form (right).

 

Selecting the MPEG-4 Feature Points

MPEG-4 feature points are carefully marked on the bust of Alexander the Great with colored stickers. The stickers are identified in the captured texture image, which gives the corresponding 3-D coordinates on the head model. Voxels of the generic model that correspond to MPEG-4 features are assigned the value 2.

Fig. 6. MPEG-4 facial features marked on the bust of Alexander the Great.
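The voxel labeling described above can be sketched as follows. Only the feature value 2 comes from the text. The values 0 for empty space and 1 for ordinary head voxels, the grid layout, and the names `make_volume`, `point_to_voxel` and `mark_feature_voxels` are assumptions made for illustration.

```python
FEATURE_LABEL = 2  # value given in the text for MPEG-4 feature voxels
HEAD_LABEL = 1     # assumed label for ordinary head voxels
EMPTY_LABEL = 0    # assumed label for empty space


def make_volume(nx, ny, nz, fill=EMPTY_LABEL):
    """Create an nx-by-ny-by-nz voxel grid as nested lists."""
    return [[[fill for _ in range(nz)] for _ in range(ny)] for _ in range(nx)]


def point_to_voxel(point, origin, voxel_size):
    """Convert a 3-D coordinate to integer voxel indices (floor division)."""
    return tuple(int((p - o) // voxel_size) for p, o in zip(point, origin))


def mark_feature_voxels(volume, feature_points, origin, voxel_size):
    """Set the voxel under each feature point to FEATURE_LABEL.

    feature_points: dict of feature name -> (x, y, z) in model coordinates.
    Returns a dict of feature name -> voxel index for the points that fall
    inside the grid; points outside the grid are skipped.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    marked = {}
    for name, point in feature_points.items():
        i, j, k = point_to_voxel(point, origin, voxel_size)
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            volume[i][j][k] = FEATURE_LABEL
            marked[name] = (i, j, k)
    return marked


if __name__ == "__main__":
    volume = make_volume(4, 4, 4, fill=HEAD_LABEL)
    stickers = {"feature_a": (2.5, 1.0, 0.2), "outside": (-0.5, 0.0, 0.0)}
    print(mark_feature_voxels(volume, stickers, (0.0, 0.0, 0.0), 1.0))
```

Storing the feature label in the volume itself keeps the features attached to the model when it is deformed, at the cost of losing the identity of each feature unless a separate name-to-voxel table is kept, as the returned dictionary does here.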

 

Constructing a Model of the Speaker's Face

 

Fig. 7. An example face (left) and a volumetric model of the face (right).

 

Identifying MPEG-4 Features on the Speaker's Face

 

Matching of Deformable Objects

 

Stereo Camera Setup

Fig. 10. Organization of the stereo system.

 

 

Fig. 11. A raw image obtained by one of the cameras (left). The same image after color recovery (right).

 

Stereo Camera Calibration

Fig. 12. The setup used to calibrate the stereo camera system.

 

Determining the Pose of the Speaker's Face

 

Fig. 13. A laser line is swept over the speaker's face while capturing the face in stereo.

 

Fig. 14. Corresponding points on a laser line in a stereo pair.

Fig. 15. From stereo correspondence, 3-D coordinates of points on the laser lines are determined.
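Recovering a 3-D point from one pair of corresponding laser-line pixels can be sketched as below. The sketch assumes an idealized, rectified stereo pair: parallel cameras with the same focal length, so that corresponding points lie on the same image row. The page does not describe the actual camera geometry, and the function name and all parameter values are hypothetical.

```python
def triangulate_rectified(x_left, y_left, x_right, focal_px, baseline, cx, cy):
    """Triangulate one correspondence in a rectified stereo pair.

    x_left, y_left: pixel coordinates of the point in the left image.
    x_right: pixel column of the same point in the right image.
    focal_px: focal length in pixels (assumed equal for both cameras).
    baseline: distance between the camera centers, in the output unit.
    cx, cy: principal point in pixels.
    Returns (X, Y, Z) in the left camera's coordinate frame.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline / disparity   # depth from similar triangles
    x = (x_left - cx) * z / focal_px      # back-project the pixel column
    y = (y_left - cy) * z / focal_px      # back-project the pixel row
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical calibration: 800 px focal length, 0.1 m baseline.
    print(triangulate_rectified(360.0, 240.0, 320.0, 800.0, 0.1, 320.0, 240.0))
```

Applying this to every matched pixel pair along each laser line would yield a set of 3-D points on the face surface. A calibrated, non-rectified system would instead intersect the two viewing rays using the projection matrices obtained from calibration.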

 

Determining the Pose and Facial Expression of the Speaker in Each Video Frame