The role of time in phonetic spaces: Temporal resolution in Cantonese tone perception


The role of temporal resolution in speech perception (e.g. whether tones are parameterized with fundamental frequency sampled every 10 ms, or just twice in the syllable) is sometimes overlooked, and the temporal resolution relevant for tonal perception is still an open question. The choice of temporal resolution matters because how we understand the recognition, dispersion, and learning of phonetic categories is entirely predicated on what parameters we use to define the phonetic space that they lie in. Here, we present a tonal perception experiment in Cantonese where we used interrupted speech in trisyllabic stimuli to study the effect of temporal resolution on human tonal identification. We also performed acoustic classification of the stimuli with support vector machines. Our results show that just a few samples per syllable are enough for humans and machines to classify Cantonese tones with reasonable accuracy, without much difference in performance from having the full speech signal available. The confusion patterns and machine classification results suggest that loss of detailed information about the temporal alignment and shape of fundamental frequency contours was a major cause of decreasing accuracy as resolution decreased. Moreover, machine classification experiments show that for accurate identification of rising tones in Cantonese, it is crucial to extend the temporal window for sampling to the following syllable, due to peak delay.

Journal of Phonetics
Kristine M. Yu
Kristine M. Yu
Associate Professor

Linguist @ UMass Amherst