PraatSauce workflow and parameters
Basic workflow
When you download praatsauce, all source code is in the src directory. This directory contains the files params.csv, praatsauce.praat, and a whole bunch of other files with Praat scripts that are called by praatsauce.praat. Unlike in previous versions of PraatSauce, you don’t set parameters directly in Praat, but instead by editing the params.csv file, which you can do in any text editor or in a spreadsheet editor like Excel if you prefer. When you run praatsauce.praat, a form will appear asking you only for the location of your parameters file. We think this is advantageous in terms of reproducibility, as you can simply share this parameters file when sharing the code for a study.
This set-up also means that calling PraatSauce from the command line is very simple:
<praat-location> <praatsauce-location> <params-location>
where <praat-location> is the location of your Praat executable, <praatsauce-location> is the path to praatsauce.praat, and <params-location> is the path to your parameters file. If Praat is on your PATH and you are running the code directly from the src directory, it could look like this:
praat praatsauce.praat params.csv
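If you want to launch PraatSauce from another program, the same call can be wrapped in a few lines of Python. This is only a sketch: the helper names build_command and run_praatsauce are hypothetical, and the --run switch is an assumption based on Praat's general command-line documentation for calling scripts from other programs, not something PraatSauce itself prescribes.

```python
import subprocess
from pathlib import Path


def build_command(praat, script, params):
    """Assemble the argument list for one PraatSauce run (hypothetical helper)."""
    # --run asks Praat to execute the script without opening its GUI; this is
    # an assumption based on Praat's command-line documentation.
    return [str(praat), "--run", str(Path(script)), str(Path(params))]


def run_praatsauce(praat="praat", script="praatsauce.praat", params="params.csv"):
    """Run PraatSauce; raises CalledProcessError if Praat exits with an error."""
    return subprocess.run(build_command(praat, script, params), check=True)
```

Calling run_praatsauce() with the defaults corresponds to the situation described above, where Praat is on your PATH and you are working from the src directory.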
Parameters
The provided params.csv file has three “columns”:
variable
the name of the parameter
input
the value for this parameter
comment
a brief description of the parameter
Here, we provide a little more detail about each parameter.
inputDir
The directory where your sound files are located.
outputDir
The directory where your output file should be saved. The default value is . (a single period), i.e. the same directory as the one where praatsauce.praat is located.
outputFile
The name of the output file. Default is out.tsv.
channel
The channel you want to analyze. Default is 1, i.e. the first channel. If your sound files have multiple channels, e.g. if the audio is stereo or you have an EGG channel, one of them is extracted and used for processing. You can input 0 if you want to use both channels of a stereo file.
intervalEquidistant
If measures should be taken at equidistant intervals, how many measures should be taken? This parameter is only relevant if TextGrids are used to determine where to take measures in a sound file. Default is 0, in which case measures are taken at fixed intervals.
intervalFixed
How often should measures be taken (in seconds)? Default is 0.005, i.e. every 5 ms. This is coarser than the VoiceSauce default of taking measures every 1 ms. This should be set to 0 if you are taking equidistant measures; either intervalEquidistant or intervalFixed should be set to 0, otherwise you will see an error message.
pitch
Should pitch measures be reported in the output file? Default is 1; set to 0 if you are not interested in pitch. Note that pitch will typically be measured anyway, as most of the other measures rely on pitch.
formant
Should formant measures be reported in the output file? Default is 1; set to 0 if you are not interested in formants. Note that formants will often be measured anyway, as many of the other measures rely on formants.
harmonicAmplitude
Should corrected harmonic amplitude measures (H1*, H2*, H4*, A1*, A2*, A3*, H2K, H5K) be reported in the output file? Default is 1; set to 0 if you are not interested in corrected harmonic amplitudes. Note that corrected harmonic amplitudes are calculated anyway if you want spectral slope measures.
harmonicAmplitudeUncorrected
Should uncorrected harmonic amplitude measures (H1, H2, H4, A1, A2, A3) be reported in the output file? Default is 1; set to 0 if you are not interested in uncorrected harmonic amplitudes. Note that uncorrected harmonic amplitudes are calculated anyway if you want corrected harmonic amplitudes.
bw
Should formant bandwidths be reported? Default is 1; set to 0 if you are not interested in bandwidths. Note that bandwidths will be measured or estimated anyway if you want corrected harmonic amplitudes.
bwHawksMiller
Should formant bandwidths be estimated using the Hawks–Miller formula (Hawks & Miller 1995)? Default is 1; set to 0 if you want to use Praat’s empirical bandwidth measures for harmonic amplitude corrections.
slope
Should corrected spectral slope measures (H1*-H2*, H2*-H4*, H1*-A1*, H1*-A2*, H1*-A3*, H2K-H5K) be reported? Default is 1; set to 0 if you are not interested in corrected spectral slope measures.
slopeUncorrected
Should uncorrected spectral slope measures (H1-H2, H2-H4, H1-A1, H1-A2, H1-A3) be reported? Default is 1; set to 0 if you are not interested in uncorrected spectral slope measures.
cpp
Should cepstral peak prominence (CPP) measures be reported? Default is 1; set to 0 if you are not interested in CPP.
hnr
Should harmonics-to-noise ratio (HNR) measures be reported? Default is 1; set to 0 if you are not interested in HNR.
intensity
Should intensity (root-mean-squared amplitude) measures be reported? Default is 1; set to 0 if you are not interested in intensity.
soe
Should strength of (harmonic) excitation (SoE) measures be reported? Default is 1; set to 0 if you are not interested in SoE.
resample16khz
Should audio be resampled to 16 kHz? Default is 0; set to 1 if you want resampling. This may speed up the processing of some measures, and is otherwise unlikely to influence the results much.
windowLength
Length of the analysis window for taking formant measurements and generating spectrograms, in seconds. Default is 0.025, i.e. 25 ms.
f0min
Pitch floor, i.e. the minimum frequency in Hz to look for when estimating pitch. Default is 50. Note that the output may have some pitch measures below f0min if Praat’s algorithm for removing octave jumps is used.
f0max
Pitch ceiling, i.e. the maximum frequency in Hz to look for when estimating pitch. Default is 300. Note that the output may have some pitch measures above f0max if Praat’s algorithm for removing octave jumps is used.
pitchMethod
Which algorithm should be used for measuring pitch? The default value is 0, in which case the filtered autocorrelation method is used; alternatives are 1 for raw crosscorrelation and 2 for raw autocorrelation.
pitchWindowShape
Which analysis window shape should be used for pitch estimation? The default value is 0, in which case a Hanning window is used; the alternative is 1, in which case a Gaussian window is used. This controls the Very accurate setting in Praat’s pitch tracking options window; Gaussian windows are considered “very accurate”, but are not used by default, as they are slower. In practice, the difference between the two settings appears to be marginal.
pitchMaxNoCandidates
How many pitch candidates should be estimated? Default is 15.
silenceThreshold
What should be the silence threshold when measuring pitch? Default is 0.01; frames that do not contain amplitude values above this threshold relative to the global maximum are considered silent and thus not tracked.
voicingThreshold
Fractional strength of pitch candidates in the autocorrelation or crosscorrelation function required for a frame to be considered voiced. The default is 0, in which case the value will be the Praat default for the chosen pitchMethod, i.e. 0.5 when filtered autocorrelation is used and 0.55 when raw crosscorrelation is used. With a higher voicingThreshold, more pitch candidates will be considered unvoiced.
octaveCost
How much should high-frequency pitch candidates be favored, in terms of fractional strength of candidates in the autocorrelation or crosscorrelation function per octave? Higher values will favor high pitch candidates more. The default is 0, in which case the value will be the Praat default for the chosen pitchMethod, i.e. 0.055 when filtered autocorrelation is used and 0.01 when raw crosscorrelation is used.
octaveJumpCost
How much should sudden pitch changes be disfavored, in terms of fractional strength of candidates in the autocorrelation or crosscorrelation function? Default is 0.35; increasing this value will reduce the number of octave jumps.
voicedUnvoicedCost
How much should transitions from voiced to voiceless frames be disfavored, in terms of fractional strength of candidates in the autocorrelation or crosscorrelation function? The default is 0.14; increasing this value will reduce the number of transitions.
killOctaveJumps
Should Praat’s algorithm for removing octave jumps be applied? Default is 0; set to 1 if you want to use this algorithm. Note that this can have unwanted and unexpected consequences if analyzing full sound files instead of individual tokens.
maxNumFormants
How many formants should be estimated? Default is 5. In any case, only three formants are reported.
preEmphFrom
Pre-emphasize from which frequency when estimating formants? Default is 50.
f1ref
Reference F1 value used for formant tracking. Default is 500.
f2ref
Reference F2 value used for formant tracking. Default is 1500.
f3ref
Reference F3 value used for formant tracking. Default is 2500.
maxFormantHz
Frequency ceiling when tracking formants. Default is 5000.
pitchSynchronous
Should spectral slope measures be taken using a pitch-synchronous window length, i.e. with a window length corresponding to 3 glottal pulses? Default is 0, in which case the window length is constant; set to 1 to get a pitch-synchronous window length. This will be somewhat slower if analyzing individual tokens and somewhat faster if analyzing full sound files. The difference appears to be marginal.
cppTrendType
Which type of regression model should be used to calculate CPP? Default is 0, in which case an exponential decay model is used; change to 1 to use linear regression.
cppFast
How should the regression model used to calculate CPP be fitted? Default is 1, which corresponds to incomplete Theil regression; the alternative is 0, which uses Theil’s robust line fit and is more robust but significantly slower.
pitchSave
Should pitch files be written to disk? Default is 0; change to 1 if you want to store the resulting pitch files.
pitchSaveDir
If pitch files are written to disk, where should they be saved?
pitchRead
Should existing pitch files be used? Default is 0; change to 1 if you have existing pitch files which you want to use.
pitchSaveDir
If existing pitch files should be used, where are they saved?
formantSave
Should formant files be written to disk? Default is 0; change to 1 if you want to store the resulting formant files.
formantSaveDir
If formant files are written to disk, where should they be saved?
formantRead
Should existing formant files be used? Default is 0; change to 1 if you have existing formant files which you want to use.
formantSaveDir
If existing formant files should be used, where are they saved?
useTextGrid
Should TextGrids be used to determine where in a sound file to take measures? Default is 0, i.e. measures are taken over the entire sound file. Set to 1 if you wish to take measures only from select parts of sound files based on information in TextGrids.
tgDir
The directory where your TextGrids are located (if useTextGrid is set to 1).
filelist
Path to a text file which specifies which sound files to analyze. Should be a plain text file where each line is a relative path to a sound file. These paths are relative to inputDir. By default this argument is ignored; replace 0 with a path if you want to use it.
intervalTier
If using a TextGrid, which is the interval tier of interest? Default is 1.
includeTheseLabels
If using a TextGrid, the code will search for intervals with these labels. Should be a well-formed regex – the default is ^()!\s*$ which refers to any interval that isn’t empty.
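Because the parameters live in a plain text file, they are easy to inspect before a run. The sketch below reads a parameters file into a dictionary and checks the documented rule that intervalEquidistant and intervalFixed must not both be non-zero. It assumes a comma-delimited file whose header row is variable,input,comment; the function names and the comment texts in the example are made up for illustration.

```python
import csv
import io


def read_params(text):
    """Parse params.csv-style text into {variable: input}, keeping values as strings."""
    # Assumption: comma-delimited, with a header row 'variable,input,comment'.
    return {row["variable"]: row["input"] for row in csv.DictReader(io.StringIO(text))}


def intervals_are_valid(params):
    """Check the documented rule: at least one of the two interval settings is 0."""
    equidistant = float(params.get("intervalEquidistant", 0))
    fixed = float(params.get("intervalFixed", 0))
    return not (equidistant != 0 and fixed != 0)


# Two-row example with invented comment texts.
example = """variable,input,comment
intervalEquidistant,0,number of equidistant measures
intervalFixed,0.005,interval between measures in seconds
"""
params = read_params(example)
ok = intervals_are_valid(params)
```

In practice you would pass the contents of your own parameters file, e.g. read_params(open("params.csv").read()).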
Output
The output of PraatSauce is a tab-separated text file which maximally looks something like this:
file label t f0 F1 F2 F3 H1c H2c H4c A1c A2c A3c H2Ku H5Ku H1u H2u H4u...
1.wav b 0.621 0 1798.185 2299.12 3370.494 0 0 0 -42.67060159300069 -37.532...
1.wav b 0.626 0 1798.185 2299.12 3370.494 0 0 0 -42.67060159300069 -37.5...
It includes a column with a file name, a column with an interval label if TextGrids are used to determine intervals to analyze, a column giving the time stamp where the measure is taken, and columns for each measure at that time.
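If you are not working in R, the output file can be read with any tool that understands tab-separated text. Here is a minimal Python sketch; the function name load_output is hypothetical, and it assumes that every column other than file and label holds plain numbers.

```python
import csv
import io


def load_output(text):
    """Read PraatSauce-style tab-separated output into a list of dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        parsed = {}
        for key, value in row.items():
            # 'file' and 'label' stay as text; all other columns are assumed numeric.
            parsed[key] = value if key in ("file", "label") else float(value)
        rows.append(parsed)
    return rows


# Shortened version of the example output shown above.
sample = (
    "file\tlabel\tt\tf0\tF1\n"
    "1.wav\tb\t0.621\t0\t1798.185\n"
    "1.wav\tb\t0.626\t0\t1798.185\n"
)
rows = load_output(sample)
```

The label column is only looked up by name, so the same function also works for output produced without TextGrids.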
Details
Pitch (\(F_0\)) can be measured using either 1) Praat’s filtered autocorrelation method, which implements Boersma (1993)’s autocorrelation method after low-pass filtering the signal, 2) the ‘raw’ autocorrelation method, which does not low-pass filter the signal, or 3) the raw crosscorrelation method, which implements a similar routine based on crosscorrelating the signal without applying a low-pass filter. As discussed here, filtered autocorrelation is currently considered the state-of-the-art method for measuring intonation in Praat, while raw crosscorrelation is considered state of the art particularly for Praat’s internal voice analysis tools. PraatSauce users can toggle all the usual parameters, which mostly follow the Praat defaults. The exceptions are the “silence threshold” parameter, which is by default significantly reduced in PraatSauce for better performance near consonant boundaries and in sound files with relatively low volume, and the floor and ceiling parameters, which are set to a rather narrower range in PraatSauce. Good pitch floor and ceiling settings are important, as these settings have big downstream consequences for many of the measures estimated by PraatSauce. We recommend setting them on a speaker-by-speaker basis. The Kill octave jumps routine is optionally run on the resulting Pitch object. PraatSauce has the option to save the Pitch objects to disk, allowing users to potentially hand-edit them and then use these edited objects to re-run PraatSauce. See get_pitch.praat.
Formants (\(F_1\), \(F_2\), \(F_3\)) are measured using Praat’s implementation of Burg’s (1975) algorithm. The Track routine, which implements a Viterbi algorithm explained here, is run on the resulting Formant object with user-specified formant reference values. See get_formants.praat. As with \(F_0\), poor formant tracking will have important downstream consequences for other measures, and PraatSauce has the option to save the Formant objects to disk, allowing users to potentially hand-edit them and use the edited objects to re-run PraatSauce (although this is rather less intuitive than is the case for pitch). When possible, it is probably a good idea to set vowel-specific formant reference values, although we recognize that this is often not feasible.
Formant bandwidths (\(B_1\), \(B_2\), \(B_3\)) are either empirical bandwidths returned by the formant tracking process outlined above (see get_formants.praat), or bandwidths estimated from formants scaled by pitch according to the procedure outlined by Hawks & Miller (1995) (see get_bwHawksMiller.praat). Our experience is that the estimation of harmonic amplitudes is rather less stable when using Praat’s empirical bandwidths.
Harmonic amplitudes are calculated by generating long-term spectra with a frequency ceiling of 5500 Hz for each time step with a valid pitch measure. By default, the spectra are narrowband spectra from 25 ms Gaussian windows, but it is also possible to determine the window length pitch-synchronously to correspond to three glottal cycles. \(H_1\), the amplitude of the first harmonic, is determined as the energy peak within \(F_0\pm\frac{1}{10}F_0\) Hz, where \(F_0\) is the fundamental frequency measure returned for any given time step; \(H_2\), the amplitude of the second harmonic, is the energy peak within \(2F_0\pm\frac{1}{10}F_0\) Hz; and \(H_4\), the amplitude of the fourth harmonic, is the energy peak within \(4F_0\pm\frac{1}{10}F_0\) Hz. \(A_1\), the amplitude of the harmonic nearest \(F_1\), is determined as the energy peak within \(F_1\pm\frac{1}{5}F_1\) Hz, \(A_2\) is determined as the energy peak within \(F_2\pm\frac{1}{10}F_2\) Hz, and \(A_3\) is determined as the energy peak within \(F_3\pm\frac{1}{10}F_3\) Hz, where \(F_n\) is the \(n\)th formant measure returned for any given time step. \(H_{2K}\), the amplitude of the harmonic nearest 2000 Hz, is determined as the energy peak within \(2000\pm F_0\) Hz, and \(H_{5K}\), the amplitude of the harmonic nearest 5000 Hz, is determined as the energy peak within \(5000\pm F_0\) Hz. See get_spectralMeasures.praat. \(H_1^*\), \(H_2^*\), \(H_4^*\), \(A_1^*\), and \(A_2^*\) are corrected for the influence of \(F_1\) and \(F_2\) following the formula given by Iseli, Shue & Alwan (2007), while \(A_3^*\) is corrected for the influence of \(F_1\), \(F_2\), and \(F_3\) (see correctIseli.praat). \(H_{2K}\) and \(H_{5K}\) are not corrected.
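The peak search described here can be illustrated with a few lines of Python. This is a simplified sketch and not a translation of get_spectralMeasures.praat: the spectrum is a made-up list of bin frequencies and dB levels, and the function names are hypothetical. The same peak_in_band helper would cover the formant-based amplitudes by passing e.g. \(F_1\) as the center and \(\frac{1}{5}F_1\) as the half width.

```python
def peak_in_band(freqs, levels_db, center, half_width):
    """Return the highest level (dB) among bins with |freq - center| <= half_width."""
    in_band = [lvl for f, lvl in zip(freqs, levels_db) if abs(f - center) <= half_width]
    if not in_band:
        raise ValueError("no spectral bins inside the search band")
    return max(in_band)


def harmonic_amplitudes(freqs, levels_db, f0):
    """Uncorrected H1, H2, H4: peaks within n * F0 +/- F0 / 10."""
    return {f"H{n}": peak_in_band(freqs, levels_db, n * f0, f0 / 10) for n in (1, 2, 4)}


# Toy spectrum with 10 Hz bins and invented levels; peaks at 100, 200, and 400 Hz.
freqs = [10 * i for i in range(60)]
levels = [0.0] * 60
levels[10], levels[20], levels[40] = 30.0, 24.0, 12.0

amps = harmonic_amplitudes(freqs, levels, f0=100.0)
h1_h2 = amps["H1"] - amps["H2"]  # uncorrected H1-H2 for this toy frame
```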
Spectral slope measures are straightforward: \(H_1^*-H_2^*\) is the difference between \(H_1^*\) and \(H_2^*\) etc. PraatSauce returns the standard corrected measures \(H_1^*-H_2^*\), \(H_2^*-H_4^*\), \(H_1^*-A_1^*\), \(H_1^*-A_2^*\), \(H_1^*-A_3^*\), and the uncorrected measures \(H_1-H_2\), \(H_2-H_4\), \(H_1-A_1\), \(H_1-A_2\), \(H_1-A_3\), and \(H_{2K}-H_{5K}\). It is not totally straightforward how to implement correction for \(H_{2K}-H_{5K}\), which is why we don’t do it, but note that other uncorrected slope measures are only returned to facilitate comparison – it is certainly not recommended to use them for analysis. See get_spectralMeasures.praat.
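To give an impression of what the correction involves, here is a sketch of the formant correction as it is commonly implemented following Iseli, Shue & Alwan (2007): each formant is modelled as a second-order resonance, and its estimated boost at the harmonic's frequency is subtracted from the measured amplitude. The function names are hypothetical, the example values are made up, and correctIseli.praat may differ in detail.

```python
import math


def formant_gain_db(f, formant, bandwidth, sample_rate):
    """Estimated boost (dB) at frequency f from one formant resonance.

    The gain is normalized to 0 dB at f = 0.
    """
    r = math.exp(-math.pi * bandwidth / sample_rate)
    omega_f = 2 * math.pi * formant / sample_rate
    omega = 2 * math.pi * f / sample_rate
    numerator = r * r + 1 - 2 * r * math.cos(omega_f)
    a = r * r + 1 - 2 * r * math.cos(omega_f + omega)
    b = r * r + 1 - 2 * r * math.cos(omega_f - omega)
    return 20 * math.log10(numerator) - 10 * (math.log10(a) + math.log10(b))


def corrected_amplitude(level_db, f, formants, bandwidths, sample_rate):
    """Subtract the estimated boost of each listed formant from a measured level."""
    boost = sum(formant_gain_db(f, fm, bw, sample_rate) for fm, bw in zip(formants, bandwidths))
    return level_db - boost


# Invented frame: H1 and H2 at 120 and 240 Hz, corrected for F1 and F2 only.
formants, bandwidths = [500.0, 1500.0], [80.0, 120.0]
h1c = corrected_amplitude(30.0, 120.0, formants, bandwidths, 16000.0)
h2c = corrected_amplitude(27.0, 240.0, formants, bandwidths, 16000.0)
h1c_h2c = h1c - h2c
```

Passing three formants and bandwidths instead of two corresponds to the correction described for \(A_3^*\).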
Cepstral peak prominence measures are calculated by generating power cepstra with a frequency ceiling of 5000 Hz for each time step, fitting regression lines to each cepstrum (either linear or exponential), and determining how much the cepstral peak deviates from the regression line. This is the standard procedure of Praat’s Get peak prominence routine. Note that Hillenbrand, Cleveland & Erickson (1994) and VoiceSauce (Shue et al. 2011) determine the peak prominence by comparing with a linear regression line, but the default in Praat and PraatSauce is to fit an exponential regression; the theoretical arguments for this are laid out here, but we are not aware of any studies that explicitly compare the suitability of using linear versus exponential regression for computing CPP. See get_CPP.praat.
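The idea of "peak minus trend" can be sketched as follows for the linear case. Note that this sketch fits the trend line with ordinary least squares, which is a simplification of ours: Praat fits the line with Theil-type robust regression, and the exponential decay trend is not shown. The function names and the toy cepstrum are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def peak_prominence(quefrencies, cepstrum_db, f0min, f0max):
    """Height of the cepstral peak above a straight trend line, in dB."""
    slope, intercept = linear_fit(quefrencies, cepstrum_db)
    # Only search for the peak at quefrencies corresponding to plausible pitch periods.
    candidates = [
        (level, q)
        for q, level in zip(quefrencies, cepstrum_db)
        if 1.0 / f0max <= q <= 1.0 / f0min
    ]
    peak_level, peak_q = max(candidates)
    return peak_level - (slope * peak_q + intercept)


# Toy cepstrum: flat at 0 dB with a single 10 dB peak at 10 ms (i.e. 100 Hz).
quefrencies = [0.001 * i for i in range(1, 21)]
cepstrum = [0.0] * 20
cepstrum[9] = 10.0
cpp = peak_prominence(quefrencies, cepstrum, f0min=50.0, f0max=300.0)
```

The result is slightly below 10 dB because the peak itself pulls the least-squares line upwards, which is one motivation for the robust fitting options.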
Harmonics-to-noise ratio measures are calculated using Praat’s cross-correlation method (a version of Boersma (1993)’s algorithm) which estimates the relative height of the cross-correlation peak (theoretically corresponding to the amplitude of the fundamental frequency). Harmonics-to-noise ratios are taken from different low pass filtered versions of the sound file: \(HNR_{500}\), \(HNR_{1500}\), \(HNR_{2500}\), and \(HNR_{3500}\) use low pass filters of 500, 1500, 2500, and 3500 Hz, respectively. See get_HNR.praat.
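For reference, the relation between the normalized height \(r\) of the correlation peak and the harmonics-to-noise ratio in Boersma (1993) is \(10\log_{10}\frac{r}{1-r}\) dB. A small Python sketch of that conversion (the function name is hypothetical, and the low-pass filtering step is not shown):

```python
import math


def hnr_db(peak_strength):
    """Convert a normalized correlation peak (0 < r < 1) to HNR in dB."""
    if not 0.0 < peak_strength < 1.0:
        raise ValueError("peak strength must lie strictly between 0 and 1")
    return 10.0 * math.log10(peak_strength / (1.0 - peak_strength))


equal_parts = hnr_db(0.5)    # as much periodic as aperiodic energy
mostly_periodic = hnr_db(0.99)
```

A peak height of 0.5 thus corresponds to 0 dB, and a peak height of 0.99 to roughly 20 dB.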
Intensity measures, corresponding to the moving root-mean-squared amplitude, are calculated using Praat’s To Intensity procedure. See get_intensity.praat.
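As a rough illustration of what a moving root-mean-squared amplitude in dB looks like, consider the sketch below. It uses plain rectangular frames, whereas Praat applies a smooth weighting window, so the numbers will not match Praat's exactly; the function names are hypothetical, and the reference level of 2e-5 follows the convention Praat uses for sound pressure in Pa.

```python
import math


def rms_db(frame, reference=2e-5):
    """RMS level of one frame in dB relative to the reference amplitude."""
    mean_square = sum(x * x for x in frame) / len(frame)
    if mean_square == 0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square / reference ** 2)


def moving_rms_db(samples, window, step):
    """RMS level of successive frames of `window` samples, `step` samples apart."""
    return [
        rms_db(samples[start:start + window])
        for start in range(0, len(samples) - window + 1, step)
    ]


# A constant signal at ten times the reference amplitude sits at about 20 dB.
levels = moving_rms_db([2e-4] * 10, window=5, step=5)
```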
Strength of excitation measures are calculated by downsampling the speech signal, taking the difference of this signal \(x[n] = s[n] - s[n-1]\), and passing it through a cascade of two zero-frequency filters of the form
\[ y_i[n] = - \sum_{k=1}^2 a_k y_i [n-k] + x[n] \]
At each step, we apply a trend removal procedure by subtracting the moving average of a window of size \(N\) corresponding to three average pitch cycles from the filtered signal, giving the output
\[ y_2[n] = y_1[n] - \frac{1}{2N+1} \sum_{m=-N}^N y_1[n + m] \]
We find the positive-to-negative zero crossings in the output signal using Praat’s To PointProcess (zeros) routine, and calculate the slope in the immediate vicinity of these zero crossings. This mostly follows the procedure described by Yegnanarayana & Murty (2009) and Mittal, Yegnanarayana & Bhaskararao (2014) and implemented in VoiceSauce (Shue et al. 2011), although the version here is somewhat simplified, as the moving average window size \(N\) used for trend removal is determined globally from the interval or sound file rather than locally based on pitch. This does not appear to have a major influence on the results, but this should be tested further. See get_SoE.praat.
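The steps above can be sketched in plain Python as follows. This is an illustration and not a port of get_SoE.praat: downsampling is left out, the filter coefficients \(a_1 = -2\) and \(a_2 = 1\) are an assumption taken from the zero-frequency filtering literature cited above (the text does not state them), the averaging window simply shrinks at the edges, and all function names are hypothetical.

```python
def zero_frequency_filter(x):
    """One zero-frequency resonator: y[n] = 2*y[n-1] - y[n-2] + x[n]."""
    # Assumed coefficients a_1 = -2, a_2 = 1 (not stated in the text above).
    y = []
    for n, value in enumerate(x):
        prev1 = y[n - 1] if n >= 1 else 0.0
        prev2 = y[n - 2] if n >= 2 else 0.0
        y.append(2.0 * prev1 - prev2 + value)
    return y


def remove_trend(y, half_window):
    """Subtract the moving average over 2N+1 samples (fewer at the edges)."""
    out = []
    for n in range(len(y)):
        lo, hi = max(0, n - half_window), min(len(y), n + half_window + 1)
        out.append(y[n] - sum(y[lo:hi]) / (hi - lo))
    return out


def excitation_strengths(signal, half_window):
    """Return (sample index, slope) at each positive-to-negative zero crossing."""
    y = [signal[n] - signal[n - 1] for n in range(1, len(signal))]
    for _ in range(2):  # cascade of two filters, with trend removal at each step
        y = remove_trend(zero_frequency_filter(y), half_window)
    return [
        (n, abs(y[n + 1] - y[n]))  # slope per sample around the crossing
        for n in range(len(y) - 1)
        if y[n] > 0.0 >= y[n + 1]
    ]
```

Here half_window plays the role of \(N\) in the trend removal formula and would be chosen from the average pitch period of the interval or sound file.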
The time domain of PraatSauce output looks a little goofy. In order to speed up the processing, we did our best to ‘vectorize’ all the operations in PraatSauce by using Praat’s tabulating functions instead of e.g. querying the pitch value for each time step in a loop, as was done in the previous version of PraatSauce. However, tabulating means that the time series for each measure will differ – the start and end point of a table will depend on the window size used for calculating the measure, which is different for each measure, and is further rounded in some esoteric fashion that is difficult to completely figure out. We solve this by taking whichever measure has the most time points as the standard, and padding other time series with zeros; see zeroPadding.praat.
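The padding idea can be illustrated like this. The sketch is our own simplification and makes no claim about how zeroPadding.praat matches time points: it assumes that a shorter series shares its time stamps with the reference series up to a small tolerance. The second helper shows one way of turning padded zeros into missing values before analysis, with the caveat that a measure which is genuinely 0 would be discarded as well. Both function names are hypothetical.

```python
def pad_to_reference(ref_times, times, values, tolerance=1e-6):
    """Place values on the reference time axis; time points without a value get 0."""
    # Times are matched after rounding to multiples of `tolerance` (an assumption).
    lookup = {round(t / tolerance): v for t, v in zip(times, values)}
    return [lookup.get(round(t / tolerance), 0.0) for t in ref_times]


def zeros_to_missing(values):
    """Replace padded zeros with None; note that true zeros are dropped as well."""
    return [None if v == 0 else v for v in values]


ref_times = [0.0, 0.005, 0.010, 0.015]
padded = pad_to_reference(ref_times, [0.005, 0.010], [120.0, 121.0])
cleaned = zeros_to_missing(padded)
```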
A note on smoothing: VoiceSauce has a smoothing parameter, and by default smooths harmonic amplitudes and \(F_0\). Apparently spectral slope measures are computed using the smoothed harmonic amplitudes, and the spectral slope measures are then smoothed again. PraatSauce does not do any smoothing – we leave it up to the user how or if they choose to smooth the measures. This means that PraatSauce output will look a lot more jagged than VoiceSauce output (partially by design).
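If you do want to smooth a measure yourself, a running median is one simple option that is robust to isolated spikes. The sketch below is just an example of user-side smoothing; it is not what VoiceSauce does, and the function name and window size are arbitrary choices.

```python
from statistics import median


def running_median(values, half_window=2):
    """Centered running median; the window shrinks at the edges of the series."""
    out = []
    for n in range(len(values)):
        lo, hi = max(0, n - half_window), min(len(values), n + half_window + 1)
        out.append(median(values[lo:hi]))
    return out


# The isolated spike (9) in the second position is removed.
smoothed = running_median([1, 9, 2, 3, 4], half_window=1)
```

Keep in mind that zero-padded time points should be excluded or treated as missing before smoothing, as they would otherwise drag down neighbouring values.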
Workflow in R
You can use whichever analysis pipeline you prefer, but if you do your analysis in R, note that we have a few convenience tools available as part of an in-development R library called sauceshelf. sauceshelf can be downloaded like so:
devtools::install_github('rpuggaardrode/sauceshelf')
There is a function available, praatsauce(), which generates a parameters file using the arguments passed to it, runs the PraatSauce scripts, and reads in the resulting output as a data frame. The function can be a little finicky for various reasons, but has been tested on both Windows and UNIX systems. Feel free to reach out if you can’t get it to work.
Other functions available in sauceshelf are
make_params(), which will generate a valid parameters file to use with PraatSauce
load_sauce(), which will read PraatSauce output as a well-formatted data frame
sauce2ssff(), which converts a data frame with PraatSauce output into SSFF files if you are working with the EMU-SDMS infrastructure.
sauceshelf also provides various R-internal ways to estimate the measures returned by PraatSauce, and a framework for mixing and matching different sources, i.e. to estimate pitch and formants using Praat, and harmonic amplitudes and spectral slope using R based on the results from Praat. One method which we’ve tentatively found to work well and to speed up processing is to estimate harmonic amplitudes, spectral slope, and SoE in R and all other measures in Praat.
Citation information
If you use PraatSauce in your research, you can cite the current version as follows:
Kirby, James & Rasmus Puggaard-Rode (2025) PraatSauce. Praat-based tools for spectral processing. Version 1.0.0. Available from GitHub: kirbyj.github.io/praatsauce