The project began as a six-month pilot in 1997, during which we concentrated on the prosody of discourse markers. A detailed empirical investigation was undertaken of the prosodic and lexical correlates of the cue word/discourse marker men 'but' in Swedish spontaneous speech. As is known (Hirschberg & Litman 1993, Schiffrin 1987), cue words give the hearer important information about how the speaker intends the following utterance to be related to what has preceded. It is thus of particular interest for speech understanding systems to be able to interpret cue words. A problem associated with these markers, however, is that the same word, e.g. men, can have both a 'discourse' function (introducing a new topic) and a 'sentential' function (contrasting two or more clause-like utterances within a topic-unit). One must therefore have recourse to other information (e.g. prosodic and lexical) in order to decide which function the word has.
In the empirical investigation, conducted on spontaneous monologue data, it was observed that several cues interacted to signal the difference between men's sentential function and its discourse function (see Horne et al. 1998, Horne et al. 1999). In brief, as far as the associated prosodic correlates are concerned, 'strong' discourse men tokens (i.e. tokens labelled as discourse by all 4 labellers) were characterized by 1) a duration that was significantly longer than that of sentential men and 2) a larger F0-reset than that associated with sentential men. Further, 3) 'strong' discourse men tokens, but not 'strong' sentential men tokens (i.e. tokens labelled as sentential by all 4 labellers), were observed to constitute a separate prosodic phrase (34% of the discourse men tokens).
As regards co-occurring lexical cues, 'strong' discourse men tokens were often (63% of the tokens) followed by other discourse markers, whereas 'strong' sentential men tokens were instead often (62% of the tokens) followed by pronouns.
In a test, a neural network trained on strong tokens correctly categorized 90% of the strong men tokens as to their associated boundary type (topic-shift vs topic-internal). The results show that cue words, together with their prosodic correlates and co-occurring lexical items, constitute a constellation of important information for understanding how segmentation of spoken discourse is produced and understood. The results from this study are of interest for the development of speech recognition systems as well as speech synthesis. Discourse boundary recognition can be facilitated if one has recourse to both segmental and suprasegmental cues, and synthesis of the same boundaries will also be more natural if they are synthesized together with natural prosodic correlates and co-occurring lexical items. A complete description of the study is to be found in Horne, Hansson, Bruce, Frid & Filipsson (2001).
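The way the cues described above might be combined in a classifier can be illustrated with a minimal sketch. It does not reproduce the network used in the study, whose architecture and input coding are not given here; a single perceptron stands in for it. The feature names, the units (seconds, semitones), the word-class labels and all token values below are invented for illustration and are not data from the project.

```python
from dataclasses import dataclass


@dataclass
class MenToken:
    """One token of 'men' with the cues discussed in the text (hypothetical coding)."""
    duration_s: float       # duration of the token (assumed unit: seconds)
    f0_reset_st: float      # F0 reset at the token (assumed unit: semitones)
    own_phrase: bool        # does the token form a separate prosodic phrase?
    next_word_class: str    # "marker", "pronoun" or "other" (assumed labels)


def encode(tok):
    """Turn a token into a numeric feature vector of length 5."""
    return [
        tok.duration_s,
        tok.f0_reset_st,
        float(tok.own_phrase),
        float(tok.next_word_class == "marker"),
        float(tok.next_word_class == "pronoun"),
    ]


def score(weights, bias, x):
    """Weighted sum of the features plus bias."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def train_perceptron(data, epochs=10):
    """Classic perceptron rule: adjust weights only on misclassified tokens."""
    weights, bias = [0.0] * 5, 0.0
    for _ in range(epochs):
        for tok, label in data:
            x = encode(tok)
            pred = 1 if score(weights, bias, x) > 0 else 0
            err = label - pred
            if err:
                weights = [w + err * v for w, v in zip(weights, x)]
                bias += err
    return weights, bias


def classify(tok, weights, bias):
    """Label a token as 'discourse' (topic-shift) or 'sentential' (topic-internal)."""
    return "discourse" if score(weights, bias, encode(tok)) > 0 else "sentential"


# Invented toy tokens: label 1 = discourse, label 0 = sentential.
TOY_DATA = [
    (MenToken(0.30, 4.0, True, "marker"), 1),
    (MenToken(0.28, 3.5, False, "marker"), 1),
    (MenToken(0.32, 5.0, True, "other"), 1),
    (MenToken(0.12, 1.0, False, "pronoun"), 0),
    (MenToken(0.15, 0.5, False, "pronoun"), 0),
    (MenToken(0.10, 1.5, False, "other"), 0),
]

weights, bias = train_perceptron(TOY_DATA)
correct = sum(
    classify(tok, weights, bias) == ("discourse" if label else "sentential")
    for tok, label in TOY_DATA
)
accuracy = correct / len(TOY_DATA)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The toy set is deliberately separable, so the figure printed says nothing about the 90% result reported for the real data. The sketch only shows how durational, tonal, phrasing and lexical cues can enter a single decision as one feature vector.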
Within the framework of the project, the results from the spontaneous monologue data have also been compared with dialogue data. In an initial study, Hansson (1998) made a detailed analysis of men and så 'so' as well as a number of other markers in the dialogue data (ja/jo 'yes', nej 'no', och 'and', då 'then', sedan 'then'). As regards men, Hansson showed that it patterns somewhat differently in the travel dialogue data than in the spontaneous monologues. In contrast to the monologue data, where about half of the 'strong' men tokens were classified as discourse and half as sentential, most of the tokens of men in the dialogue data were classified as discourse-related. Men's discourse functions in dialogue seem, moreover, to be more varied than in monologues. In addition to introducing new topics, men in dialogues also introduces Explain and Query-YN moves (Carletta et al. 1997). Further, due to the relatively short duration of the turns in the travel dialogue data, most of the tokens of men occurred in turn-initial position, in contrast to the situation in the monologues, where there was no turn-taking. Thus, it would seem to be the turn-initial position, and perhaps not men's prosodic characteristics (duration, associated F0 reset), that provided the most salient identifying cue to its discourse functions in dialogue. Other acoustic cues (e.g. vowel reduction), however, should be examined in future studies on discourse markers in order to get a more complete picture of their sound structure. More detailed results on the study of discourse markers in dialogue are presented in Hansson (1999a-c).
A third study dealing with boundary marking concerned the constraints on prosodic phrasing in spontaneous speech. In Hansson (2001), an attempt is made to determine whether empirical evidence can be found for theoretical claims made about a number of universal constraints on prosodic phrasing.