This is supplemental material for "The experimental state of mind in elicitation: illustrations from tonal fieldwork". It describes how the fundamental frequency (f0) contours in the paper were extracted and plotted for the sections illustrating toneme discovery in Kiriri (Sections 2.2.1 and 2.3.3).

As a starting point, I assume that the raw audio has been processed as described in the tutorials Processing audio (with Praat) and Processing audio (with SoX), i.e. downsampled to 16 kHz, with Channel 1 (the consultant's channel) extracted.

Processing and segmenting of the sound file were done in Praat, a free and open-source program for sound file analysis, and f0 extraction was done with get_f0, an implementation of the RAPT algorithm (Talkin 1995).

The procedure for processing the data for each elicitation session described in the paper is given in the following sections:

  1. 20111207-1-kiy-ap-wordlist and 20111207-2-kiy-ap-wordlist
  2. 20111208-6-kiy-ap-nps-vps
  3. 20111213-1-kiy-ap-framedwordlist

A references section follows.

All files used in analysis are available for download in the preparing-data-for-ldc-paper/ sub-directory in the tutorials/ directory.

20111207-1-kiy-ap-wordlist and 20111207-2-kiy-ap-wordlist

Data from the elicitation sessions 20111207-1-kiy-ap-wordlist and 20111207-2-kiy-ap-wordlist appear in the paper in Figure 5 in Section 2.2.1. All files used in analysis are available for download here. Below, I detail the procedure for audio file segmenting, annotation, and f0 extraction for 20111207-1-kiy-ap-wordlist only. The procedure for 20111207-2-kiy-ap-wordlist was identical.

Initial audio file segmenting and annotation

First, I annotated the audio recording file 20111207-1-kiy-ap-wordlist.wav and produced the TextGrid 20111207-1-kiy-ap-wordlist.TextGrid, following procedures in the tutorial on annotating sound files.

Some of the annotation was automated. First, I marked empty intervals, i.e. intervals without recorded material of interest, with XXX, so that scripts would ignore those intervals. Then, using the text file 20111207-1-kiy-ap-wordlist-kiy.txt, I automatically populated TextGrid intervals with labels using a Praat script originally authored by Mietta Lennes, label_from_text_file. I then used another Lennes script, save_intervals_to_wav_sound_files, to save each of the intervals to separate, short WAV files, one per elicitation item. These files were automatically named with incrementing integers, e.g. 20111207-1-kiy-ap-wordlist-kiy-1 for item 1; the file name for each of these individual files refers to the item code, which can be found in the dictionary file. All of these files are in the tokens directory.
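The labelling and naming logic described above can be sketched in Python. This is only an illustration of the idea, not the Lennes Praat scripts themselves: the function names, the use of an empty string for an unlabelled interval, and the .wav suffix on the generated names are all assumptions made for the example.

```python
def label_intervals(existing_labels, new_labels, skip_mark="XXX"):
    """Fill every interval not marked skip_mark with the next label, in order.

    existing_labels: current interval labels, e.g. "XXX" or "" (assumed layout).
    new_labels: one label per line of the text file, in elicitation order.
    """
    labels_iter = iter(new_labels)
    filled = []
    for label in existing_labels:
        if label == skip_mark:
            # Intervals marked XXX hold no material of interest; leave them alone.
            filled.append(label)
            continue
        try:
            filled.append(next(labels_iter))
        except StopIteration:
            raise ValueError("more unmarked intervals than labels") from None
    leftover = list(labels_iter)
    if leftover:
        raise ValueError(f"{len(leftover)} labels were left unused")
    return filled


def token_names(labels, prefix, skip_mark="XXX"):
    """Build one file name per labelled interval, numbered 1, 2, 3, ..."""
    names = []
    for label in labels:
        if label != skip_mark:
            names.append(f"{prefix}-{len(names) + 1}.wav")
    return names
```

Raising an error when the number of unmarked intervals and the number of labels disagree is a design choice for the sketch: a silent mismatch would shift every later label onto the wrong item.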

Finally, I trimmed each of these files with the help of the script trim_ends. For each file, I manually indicated boundaries to cut off silence at the beginning and end of the file; the script then moved these boundaries to the nearest zero crossing (where the audio amplitude was 0) and trimmed the files accordingly. These trimmed files are in the trimmed directory.
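Moving a boundary to the nearest zero crossing can be illustrated with a short Python sketch. It is not the trim_ends script: it works on a plain list of sample values and on sample indices rather than on times in a Praat Sound object, and the function names are hypothetical.

```python
def zero_crossings(samples):
    """Indices where the signal is exactly zero, or where it changes sign
    between one sample and the next (the earlier index is reported)."""
    crossings = [i for i, s in enumerate(samples) if s == 0]
    for i in range(len(samples) - 1):
        if samples[i] * samples[i + 1] < 0:
            crossings.append(i)
    return sorted(crossings)


def snap_to_zero_crossing(samples, index):
    """Return the zero crossing closest to index (index itself if none exist)."""
    crossings = zero_crossings(samples)
    if not crossings:
        return index
    return min(crossings, key=lambda c: abs(c - index))


def trim(samples, start, end):
    """Snap both manually chosen boundaries, then keep what lies between them."""
    start = snap_to_zero_crossing(samples, start)
    end = snap_to_zero_crossing(samples, end)
    return samples[start:end + 1]
```

Cutting at a zero crossing avoids an abrupt jump in amplitude at the edge of the file, which would otherwise be audible as a click.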

f0 extraction with RAPT (get_f0)

I performed f0 extraction with the following scripts:

  1. [script]
  2. [script]
  3. [script]

A wrapper shell script called a second shell script to perform f0 extraction and then a Python script to aggregate the f0 data into a single file and to assign time stamps to f0 values (see footnote 1).

F0 extraction was done at 10 ms increments with get_f0; the parameter file used for extraction was Pget_f0. Output files have .f0 file extensions and are located in the f0 directory. These files were converted into plain text files with pplain, producing output files with .f0.p file extensions, also located in the f0 directory. The aggregated f0 data produced by the Python script is espsData.txt, located here.
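The aggregation step can be sketched as follows. This is not the Python script used for the paper. It assumes that each plain-text file holds one frame per line with the f0 estimate in the first whitespace-separated field, and that frame i sits at time i times the frame step, starting from 0; the true time of the first get_f0 frame may differ, so treat the time stamps here as illustrative. All function names are hypothetical.

```python
def parse_f0_text(text):
    """Read one f0 value per line, taken from the first field (an assumption
    about the plain-text layout)."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            values.append(float(fields[0]))
    return values


def add_timestamps(f0_values, step=0.010):
    """Pair each f0 value with a time stamp, assuming frame i is at i * step."""
    return [(round(i * step, 3), f0) for i, f0 in enumerate(f0_values)]


def aggregate(tokens, step=0.010):
    """Combine several tokens into one table of (token, time, f0) rows.

    tokens: mapping from token name to the plain-text contents of its file.
    """
    rows = []
    for name in sorted(tokens):
        for time, f0 in add_timestamps(parse_f0_text(tokens[name]), step):
            rows.append((name, time, f0))
    return rows
```

Keeping the token name in every row gives a long-format table, which is the layout that plotting packages such as ggplot2 work with most directly.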

Plot of f0 contours in R

I used the free and open-source statistical software R for further data processing, in particular Hadley Wickham's ggplot2 package for plotting.

The scripts for generating the plots are available as:

  1. plot-20111207-knitr.R: pure R code [file]
  2. plot-20111207-knitr.Rnw: Sweave file [file], with interspersed $\LaTeX$ code and R code, compiled with the R package knitr using the Makefile here
  3. plot-20111207-knitr.pdf: Output from Sweave file [file]


20111208-6-kiy-ap-nps-vps

Data from the elicitation session 20111208-6-kiy-ap-nps-vps appears in the paper in Tables 2 and 3 and Figures 6 and 7 in Section 2.2.1, as well as in Table A.1 in the Appendix. All files used in analysis are available for download here.

The procedure for analyzing data for 20111208-6-kiy-ap-nps-vps follows the procedure for 20111207 detailed above. Files from analysis are:

  1. Dictionary file: 20111208-6-kiy-ap-nps-vps-hash.txt
  2. Original wav file: 20111208-6-kiy-ap-nps-vps.wav
  3. TextGrid file: 20111208-6-kiy-ap-nps-vps.TextGrid.trim
  4. Individual item files: tokens/ directory
  5. Trimmed individual files: trimmed/ directory
  6. f0 output files: f0/ directory
  7. Code for plotting figures: [R code, Sweave, Makefile, pdf]


20111213-1-kiy-ap-framedwordlist

Data from the elicitation session 20111213-1-kiy-ap-framedwordlist appears in the paper in Tables 5-7 and Figures 9-11 in Section 2.3.3, as well as in Table A.2 in the Appendix. All files used in analysis are available for download here:

  1. Dictionary file: 20111213-1-kiy-ap-framedwordlist.txt
  2. Original wav file: 20111213-1-kiy-ap-framedwordlist.wav
  3. TextGrid file: 20111213-1-kiy-ap-framedwordlist.TextGrid
  4. Individual item files: tokens/ directory
  5. f0 output files: f0/ directory
  6. Code for plotting figures: [R code, Sweave, Makefile, pdf]

The analysis procedure for 20111213-1-kiy-ap-framedwordlist closely followed the procedures given above, except that no trimming occurred. Instead, a shell script (here) collected timestamps for the beginning and end of the intervals corresponding to the first and second words in the disyllabic items. The script extracted the time points marking the boundaries of the segmented words from the TextGrid file 20111213-1-kiy-ap-framedwordlist.TextGrid, and its output was word-timestamps.txt (here). In R, these timestamps were used to split each extracted f0 contour into the f0 contour over word 1 and the f0 contour over word 2 in each disyllabic utterance.
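The two steps, reading word boundaries from a TextGrid and splitting an f0 contour at those boundaries, can be sketched in Python. This is neither the shell script nor the R code used for the paper. It assumes a TextGrid saved in Praat's long text format, treats every interval whose label is neither empty nor XXX as a word, and uses hypothetical function names.

```python
import re

# In Praat's long text format, each interval lists xmin, xmax and text in turn.
INTERVAL_RE = re.compile(
    r'xmin = (?P<xmin>[\d.eE+-]+)\s+'
    r'xmax = (?P<xmax>[\d.eE+-]+)\s+'
    r'text = "(?P<text>[^"]*)"'
)


def word_intervals(textgrid_text, skip_mark="XXX"):
    """Return (label, start, end) for every labelled interval that is not skipped."""
    intervals = []
    for match in INTERVAL_RE.finditer(textgrid_text):
        label = match.group("text")
        if label and label != skip_mark:
            start = float(match.group("xmin"))
            end = float(match.group("xmax"))
            intervals.append((label, start, end))
    return intervals


def split_contour(frames, intervals):
    """Assign (time, f0) frames to intervals, keeping frames with start <= time < end."""
    pieces = []
    for label, start, end in intervals:
        piece = [(t, f0) for t, f0 in frames if start <= t < end]
        pieces.append((label, piece))
    return pieces
```

The half-open test start <= time < end means that a frame falling exactly on the boundary between word 1 and word 2 is counted once, as part of word 2.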

Time-normalized f0 plots were also produced: each word was divided into 30 frames of equal duration, and mean f0 was calculated over each frame. The functions in the R code that performed this were extract.samples, extract.rapt.mean.f0.samp, and cast.f0. Warning: these functions were not written for efficiency and could certainly be improved. See the R code files for details on the functions.
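The time normalization can be illustrated with a minimal Python sketch. It is not a port of extract.samples, extract.rapt.mean.f0.samp, or cast.f0, and it makes two assumptions that the original R code may not share: frames with an f0 of 0 are treated as unvoiced and left out of the mean, and a bin that contains no voiced frames is reported as None. The function name is hypothetical.

```python
def time_normalized_means(frames, start, end, n_bins=30):
    """Mean f0 in each of n_bins equal-duration bins spanning [start, end).

    frames: list of (time, f0) pairs for one token.
    """
    width = (end - start) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for time, f0 in frames:
        # Skip frames outside the word and frames assumed to be unvoiced.
        if f0 <= 0 or not (start <= time < end):
            continue
        # Guard against floating-point rounding pushing a frame past the last bin.
        bin_index = min(int((time - start) / width), n_bins - 1)
        sums[bin_index] += f0
        counts[bin_index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

With f0 values every 10 ms, a word shorter than 300 ms has fewer frames than bins, so some of the 30 bins will be empty under this scheme.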


References

  1. Talkin, David. 1995. A robust algorithm for pitch tracking (RAPT). In Speech coding and synthesis, ed. W. B. Kleijn and K. K. Paliwal, 495–518. Elsevier Science Inc.

Footnotes

  1. get_f0 has a known round-off error that the script does not account for, but the error does not occur for the sampling rate of the recorded file and the time step chosen here for f0 extraction.

