Editorial. Special issue on Quantitative prosody modeling for natural speech description and generation

Abstract

In human communication through spoken language, segmental features play a major role in transmitting linguistic information, namely word meanings and, therefore, utterance contents. In contrast, prosodic features convey higher-level information at the syntactic and discourse levels, and dominate the expression of attitudes, emotions, and other affective information. Reflecting the importance of prosody in human verbal communication, a number of research projects are under way at academic and industrial research organizations throughout the world. Speech prosody covers a huge multidisciplinary area involving academics, scientists, and engineers with various research backgrounds, united by an interest in human communication. Prosodic features play an important role in speech and language research, and current developments in speech technology call for further interdisciplinary work that builds on this diverse range of inputs. This situation strengthened the desire among speech researchers to share their knowledge of speech prosody, and led to the first international conference on Speech Prosody, held in April 2002 in Aix-en-Provence, France. Because of its great success, many researchers working on speech prosody looked forward to a sequel, and the second conference on Speech Prosody was held in March 2004 in Nara, Japan. One hundred and seventy-three papers were presented in 6 oral and 9 poster sessions. Although they covered a wide range of topics in speech prosody, modeling prosody quantitatively and controlling it properly in speech communication systems, such as speech synthesis and recognition, were among the attendees' major concerns.
As the many advances in prosody control for speech communication that derive from generation modeling in conventional applications show, model-based acoustic analysis and quantitative description have been very important for realizing speech communication with natural prosody. This special issue was organized with these considerations in mind. We carefully selected papers related to its topic from those presented at Speech Prosody 2004 and asked the authors for extended versions. As a result, the issue includes 16 papers on the analysis and modeling of prosody, the control of prosody in speech synthesis, and the use of prosody in speech recognition. The papers cover not only read-style utterances but also emotional speech, singing voice, and interaction with vision. In the first paper, Xu viewed prosody from two aspects: the information transmitted by prosody, and the acoustic manifestations of prosody. This view leads to a comprehensive model of tone and intonation, the PENTA model. Other prosody modeling schemes often conflate these two aspects, which leads to a less clear understanding of speech prosody. Bänziger and Scherer analyzed how emotions influence prosodic features. They showed that the level and range of fundamental frequency (F0) change systematically by the degree of emotion