Multimodal analysis of public speaking performance by EFL learners: Applying deep learning to understanding how successful speakers use facial movement


  • Miharu Fuyuno, Kyushu University
  • Rinko Komiya, Kyushu Institute of Technology
  • Takeshi Saitoh, Kyushu Institute of Technology


multimodal corpus, public speaking, EFL learners, utterance, non-verbal behaviour, Japan


Although multimodal corpus analysis has been widely practiced in applied linguistics, few studies have investigated English public speaking performance by EFL learners. Effective public speaking is a fundamental need in a globalizing society; however, speaking publicly in English is challenging for EFL learners, and objective analyses of factors such as eye contact and speech pauses remain scarce, even though such information is crucial for efficient teaching. This study analyses public speaking performance by EFL learners using data from a multimodal corpus. Data were collected at an annual speech contest at a Japanese high school, where speakers delivered English speeches to an audience and judges. The data consist of video and digital audio recordings of the performances, together with the speech scripts and the contest judges’ evaluation scores. The study examined the characteristics of speakers’ facial movement patterns in relation to spoken content, as well as the correlation between facial movements and eye movements. Facial and eye movements were detected using motion tracking and a deep learning method. The results indicated that, among highly evaluated speakers, facial direction changes were not synchronized with speech pauses. Furthermore, facial direction changes tended to be synchronized with content words rather than function words in the spoken utterances.

Author Biographies

  • Miharu Fuyuno, Kyushu University
    Miharu Fuyuno is an assistant professor in the Faculty of Design, Kyushu University, Japan. She has an MA in TESOL from the University of Nottingham, England, and a Ph.D. in linguistics from Seinan Gakuin University, Japan. Her research interests include multimodal corpora, English language teaching and technology-enhanced language learning.
  • Rinko Komiya, Kyushu Institute of Technology
    Rinko Komiya is a master’s student at Kyushu Institute of Technology, Japan. She has a Bachelor of Information Engineering from Kyushu Institute of Technology, Japan. Her research focuses on facial image processing.
  • Takeshi Saitoh, Kyushu Institute of Technology
    Takeshi Saitoh holds B.E., M.E., and Dr.E. degrees from Toyohashi University of Technology. He is an associate professor at the Faculty of Systems Design and Informatics, Kyushu Institute of Technology. His research interests include image processing and pattern recognition.







How to Cite

Fuyuno, M., Komiya, R., & Saitoh, T. (2018). Multimodal analysis of public speaking performance by EFL learners: Applying deep learning to understanding how successful speakers use facial movement. The Asian Journal of Applied Linguistics, 5(1), 117-129.