Speech Data. To assess the K-ToBI labeling conventions and to demonstrate that they apply to various types of speech, we selected twenty utterances representing five discourse types: TV drama, interview, news, text reading, and story reading. Four sentences were selected from each discourse type. Together the sentences contained 153 words and lasted 78.5 seconds. They were produced by 18 speakers (8 male and 10 female). Table 1 summarizes the speech files.

Twenty-one labelers, differing in their experience with intonation transcription and in their familiarity with the ToBI model, participated in the experiment. The labelers were divided into four groups: Group 1 (Experts), Group 2 (Familiar with K-ToBI), Group 3 (Familiar with the British intonation model, but new to K-ToBI and intonation transcription), and Group 4 (Beginners, completely new to any model of intonation or prosodic transcription). Each group included five labelers, except for Group 2, which had six. The labelers came from four sites. Six of the labelers from Site A and all of the labelers from Sites B and C attended a 2-3 hour lecture, given by the person in charge of each site, that introduced the labeling conventions and background assumptions of K-ToBI. Site C additionally had 4 hours of group discussion and a review session after the lecture. Two of the labelers from Site A and the one labeler from Site D performed their transcriptions based on the K-ToBI manual alone. Table 2 shows the distribution of labeler groups at each site.
Table 1:

Discourse type   # of utterances   # of words   # of speakers      Total duration (ms)
drama            4                 31           2 male, 2 female   14,841
interview        4                 29           2 female           14,911
news             4                 35           2 male, 2 female   16,869
reading          4                 28           2 male, 2 female   15,849
story            4                 30           2 male, 2 female   16,086

Table 2:

Sites   # Experts (G1)   # Familiar with K-ToBI (G2)   # Familiar with other model (G3)   # Beginners (G4)

We selected 20 speech files from the data bank in Korea. The first author sent the following materials to the person in charge at each site: 1) the speech files in wave format, 2) the K-ToBI manual, version 3, together with the example sentence files mentioned in the manual, and 3) a Hangul file in which the sentences from each speech file were written in Hangul orthography, with empty space below each sentence for writing tones and break indices. The wave files and the Hangul files were necessary because not all sites used the same speech analysis software (the tools ranged from xwaves to PitchWorks, CSL, and Multispeech). Each labeler was given a copy of the K-ToBI manual and the Hangul file and was asked to use their own software to transcribe two tiers: the phonetic tone tier and the break index tier. We did not ask labelers to transcribe a phonological tone tier because the information in that tier, i.e., AP and IP boundaries, can be extracted from the phonetic tone tier. Labelers were encouraged to discuss the examples in the manual with others, but not the transcription sentences. After the labelers completed the transcription, their Hangul files were collected and labeler agreement statistics were computed on the data.

Following the stringent metric used for English ToBI evaluation [14, 12], inter-transcriber consistency was measured as the number of transcriber pairs agreeing on the label of each particular word.
As described in [12], “transcriber pair-word agreement is a stringent metric because when three of four transcribers agree on a label, agreement of that label is reported to be just 50% because only three of the six pairs drawn from the set of four transcribers agree”. Our data contain a total of 32,130 pairs for comparison: 210 comparison pairs for each word (21 × 20 / 2, from 21 labelers) multiplied by 153 words.
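The pair-word metric and the pair counts above can be illustrated with a minimal Python sketch. This is not the evaluation script used in the study; the function names `pair_agreement` and `overall_agreement` and the placeholder labels "A" and "B" are assumptions introduced here for illustration, and pooling pairs across words is only one plausible way to aggregate.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_agreement(labels):
    """Fraction of transcriber pairs that gave one word the same label.

    `labels` holds one label per transcriber (hypothetical input format).
    """
    pairs = list(combinations(labels, 2))
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


def overall_agreement(words):
    """Agreeing pairs divided by all pairs, pooled over every word.

    Pooling across words is an assumption of this sketch.
    """
    agreeing = 0
    total = 0
    for labels in words:
        # n transcribers sharing a label contribute C(n, 2) agreeing pairs.
        agreeing += sum(comb(n, 2) for n in Counter(labels).values())
        total += comb(len(labels), 2)
    return agreeing / total


# Three of four transcribers agree: 3 of the 6 pairs match.
print(pair_agreement(["A", "A", "A", "B"]))  # 0.5

# Pair counts reported in the text: 21 labelers, 153 words.
pairs_per_word = comb(21, 2)
print(pairs_per_word, pairs_per_word * 153)  # 210 32130
```

The first call reproduces the 50% case quoted from [12], and the last line reproduces the 210 pairs per word and 32,130 total pairs stated in the text.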