This paper describes the IVI Lab entry to the GENEA Challenge 2022. We formulate the gesture generation problem as a sequence-to-sequence conversion task with text, audio, and speaker identity as inputs and the body motion as the output. We use the Tacotron2 architecture as our backbone with a locality-constraint attention mechanism that guides the decoder to learn dependencies from neighboring latent features. The collective evaluation released by the GENEA Challenge 2022 indicates that our two entries (FSH and USK) for the full-body and upper-body tracks statistically outperform the audio-driven and text-driven baselines on both subjective metrics. Remarkably, our full-body entry receives the highest speech appropriateness (60.5% matched) among all submitted entries. We also conduct an objective evaluation comparing our motion acceleration and jerk with two autoregressive baselines. The result indicates that the motion distribution of our generated gestures is much closer to the distribution of natural gestures.
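The paper does not spell out the exact form of the locality-constraint attention here, but the idea of restricting each decoder step to a local neighborhood of latent features can be sketched with a simple banded attention mask. Everything below (the function name, the `window` parameter, and the use of a hard band rather than, say, a Gaussian prior) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def locality_constrained_attention(scores, window=3):
    """Illustrative sketch: mask raw attention scores so decoder step t
    can only attend to encoder positions within `window` of t, then
    apply a softmax over encoder positions.

    scores: float array of shape (T_dec, T_enc) of unnormalized scores.
    Returns: attention weights of the same shape, rows summing to 1.
    """
    T_dec, T_enc = scores.shape
    mask = np.full_like(scores, -np.inf)
    for t in range(T_dec):
        lo = max(0, t - window)
        hi = min(T_enc, t + window + 1)
        mask[t, lo:hi] = 0.0  # keep only the local band around position t

    masked = scores + mask
    # numerically stable softmax over the encoder axis
    e = np.exp(masked - masked.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

With a hard band like this, attention weights outside the window are exactly zero, which is one way to encourage the monotonic, locally smooth alignments that speech-to-gesture conversion tends to need.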