Department of Electrical & Computer Engineering Signal and Image Laboratory (SaIL) The University of Arizona®

Past Research

Mid-Sagittal Tongue Detection and Tracking Using Ultrasound Images

Student: José A. Rosado-Toro

Speech sound disorders are a common impairment in children. Currently, most interventions for these disorders rely heavily on auditory skills, i.e., people must listen to their own productions and modify them. Results have shown that interventions relying on auditory skills have a low success rate in children. To address this problem, researchers in linguistics are actively developing speech production systems that identify errors made during speech and provide real-time visual feedback of tongue movements, so that speakers can analyze and refine their tongue shape to produce the correct sounds. Knowledge about the vocal tract is also useful for providing more accurate data for acoustic modeling of speech production and for studying the mechanics of the vocal tract. In this regard, the shape and dynamics of the human tongue during speech are crucial to the analysis and modeling of the speech production system.

Ultrasound imaging is one of the most common and well-suited medical imaging modalities for acquiring sequences of tongue images during speech, for several reasons. Ultrasound is non-invasive: tongue surface scans can be collected with a transducer held under the chin, and with a compressible standoff the tongue and jaw are free to make normal speech movements. Ultrasound is safe: it uses high-frequency (~5 MHz) sound, which poses no danger to subjects in repeated trials, and it does not expose the subject to ionizing radiation. Ultrasound is fast: its high frame-acquisition rate (~30 frames per second) lets it monitor tongue movement and capture time-varying features in real time, a main advantage over other imaging modalities such as X-ray or MRI.

Because of the high acquisition rate of ultrasound imaging, manual detection of tongue surfaces is extremely tedious and time-consuming. Automatic detection of the correct tongue surface is also challenging, because ultrasound images contain noise as well as high-contrast edges and regions unrelated to the tongue. In addition, the tongue can take on complex shapes during speech, which makes it difficult to track.

In this research, we combined empirical properties of the tongue with machine learning algorithms to build a robust and efficient tongue detection and tracking system. We also investigated methods for localizing the region where the tongue appears within the ultrasound images, and explored ways to develop a mathematical model capable of tracking the detected tongue surface across ultrasound image sequences.

The video sequence on the top left shows a typical sequence of ultrasound images captured during speech. The graph on the bottom right shows the tongue surface detected by our algorithm on multiple frames from this video sequence.

This work was a collaborative effort with Prof. Diana Archangeli in the Dept. of Linguistics, College of Social and Behavioral Sciences, and with Prof. Ian Fasel in the Dept. of Computer Science, College of Science, University of Arizona.



 1230 E. Speedway Blvd., P.O. Box 210104, Tucson, AZ 85721-0104
 ©2014 All Rights Reserved.  