While experimenting with MS Text to Speech (TTS), I built a simple dialog that speaks a greeting when a button is pressed. I wish to incorporate an avatar that mouths the spoken words. Without getting into all the details of how to match the phonemes with the facial movements, I just included an avatar that moves its mouth.

I cannot figure out how to coordinate the speech and the avatar movement. My first attempt was to run them both together, but that doesn't work: whichever one runs first must complete before the other starts. My second attempt was to interleave each spoken word with a change of bitmap image. That sort of works, but it is very slow and jerky (see attached app). Perhaps a better approach would be to use multithreading and run the speech and the avatar movement as separate worker threads. That sounds good, but I do not know how to do it. After spending many hours reading books and scouring the web for information on Windows API and MFC multithreading, I am more confused than ever.
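
To make the threading idea concrete, this is roughly the shape I have in mind. It is only a sketch in portable C++, not working MFC code: std::thread stands in for an MFC worker thread, and speakBlocking and speakWithAvatar are hypothetical placeholders I made up for illustration. The "speech" here just sleeps for a while per word instead of calling the real TTS engine, and the "avatar" just toggles a flag instead of swapping bitmaps.

```cpp
#include <atomic>
#include <chrono>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the blocking TTS call: it only sleeps
// for a time proportional to each word's length.
void speakBlocking(const std::vector<std::string>& words) {
    for (const auto& w : words) {
        const int ms = 20 * static_cast<int>(w.size());
        std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    }
}

// Runs the "speech" on a worker thread while the calling thread keeps
// toggling the mouth until the speech reports that it has finished.
// Returns the number of mouth frame changes that were made.
int speakWithAvatar(const std::vector<std::string>& words,
                    std::chrono::milliseconds frameInterval) {
    std::atomic<bool> speaking{true};

    // Worker thread: speak, then clear the shared flag.
    std::thread speech([&] {
        speakBlocking(words);
        speaking.store(false);
    });

    int frames = 0;
    bool mouthOpen = false;
    while (speaking.load()) {
        mouthOpen = !mouthOpen;  // placeholder for swapping the avatar bitmap
        ++frames;
        std::this_thread::sleep_for(frameInterval);
    }

    speech.join();  // wait for the worker before returning
    return frames;
}
```

Calling speakWithAvatar({"Hello", "world"}, std::chrono::milliseconds(10)) would keep the mouth moving for as long as the simulated speech lasts. In the real dialog I assume the loop could not sit on the UI thread like this and would have to become something like a timer handler that checks the flag, but that is exactly the part I am unsure about.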

I would appreciate your thoughts on the problem and how you might go about trying to solve it. Thanks.