Dynamic 3-D Visualization of Vocal Tract Shaping During Speech
Y. Zhu, Y. Kim, M. Proctor, S. Narayanan, K. Nayak
IEEE Trans Med Imaging vol. 32, issue 5: 838-848, May 2013
The vocal tract is the universal human instrument, played with great dexterity and skill in the production of spoken language. Speech production research will be enhanced by methods for studying the shaping and dynamics of the vocal tract during speech, and characterizing the relationship between articulation and acoustics. 3D dynamic MRI with synchronized audio would be a major advance, as it would be safe, non-invasive, and provide 3D dynamic visualization of the entire vocal tract along with acoustic information. Current real-time MRI techniques are sufficient to capture 2D vocal tract motion (typically in a mid-sagittal slice), but do not meet the spatial and temporal resolution requirements for capturing 3D vocal tract dynamics.
We present a novel method for the creation of 3D dynamic movies of vocal tract shaping. Multiple parallel sagittal 2D real-time movies with synchronized audio recordings are acquired for English vowel-consonant-vowel stimuli /ala/, /aɹa/, /asa/ and /aʃa/. Audio data are aligned using mel-frequency cepstral coefficients (MFCC) extracted from windowed intervals of the speech signal. Sagittal image sequences acquired from all slices are then aligned using dynamic time warping (DTW). The temporally aligned image sequences enable creation of synthesized movies in arbitrary scan planes. This also enables dynamic 3D visualization of tissue surfaces and the vocal tract airway after manual segmentation of articulators and smoothing. The resulting volumes allow for dynamic 3D visualization of salient aspects of lingual articulation, including the formation of tongue grooves and sublingual cavities, with a temporal resolution of 78 ms.
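The alignment step described above can be illustrated with a minimal sketch of dynamic time warping in Python. It is not the authors' implementation. It assumes that per-frame audio features (for example MFCC vectors) have already been extracted for a reference slice and for a second slice, and that the warping path found on the audio features is then reused to select image frames from the second slice. The function names `dtw_path` and `warp_indices`, the toy one-dimensional features, and the choice of Euclidean frame distance are all hypothetical choices made for illustration.

```python
import numpy as np

def dtw_path(ref, qry):
    """Return a DTW warping path between two feature sequences.

    ref, qry: arrays of shape (frames, dims), e.g. MFCC vectors per frame
    (assumed to be computed elsewhere). The path is a list of
    (ref_index, qry_index) pairs from the first to the last frame.
    """
    n, m = len(ref), len(qry)
    # Pairwise Euclidean distance between every reference and query frame.
    dist = np.linalg.norm(ref[:, None, :] - qry[None, :, :], axis=2)

    # Accumulated cost with an extra border row and column of infinities.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )

    # Backtrack from the last cell to the first along the cheapest steps.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return path[::-1]

def warp_indices(path, n_ref):
    """For each reference frame, pick one matched query frame index.

    When several query frames map to the same reference frame,
    the last one on the path is kept (an arbitrary illustrative choice).
    """
    idx = np.zeros(n_ref, dtype=int)
    for r, q in path:
        idx[r] = q
    return idx

# Toy example: the query is a time-stretched copy of the reference.
ref = np.array([[0.0], [1.0], [2.0], [3.0]])
qry = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]])
path = dtw_path(ref, qry)
idx = warp_indices(path, len(ref))
# Under the stated assumption, image frames from the second slice
# would be resampled as images_qry[idx] to match the reference timing.
```

In this sketch the quadratic-time dynamic program is written with explicit loops for readability. For real recordings, a windowed or library-based DTW would likely be preferable.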