No. 1, 2015


Lyudmila V. Sukhostat

Only a few of the existing speaker recognition methods can handle nonlinear and non-stationary speech signals. The pitch period is one of the most important features for speaker characterization. This paper presents a method for detecting the pitch period of nonlinear and non-stationary speech signals based on the empirical wavelet transform. Experiments show high relative efficiency of the proposed approach across different noise levels. (pp. 33-41)

Keywords: pitch period, empirical wavelet transform, Teager-Kaiser energy operator, intrinsic mode function, instantaneous frequency
  • Rabiner L.R., Cheng M.J., Rosenberg A.E., McGonegal C.A. A comparative performance study of several pitch detection algorithms // IEEE Trans. on Acoust., Speech and Signal Proc., 1976, no.5, pp.399–417.
  • Tan L.N., Alwan A. Multi-band summary correlogram-based pitch detection for noisy speech // Speech Communication, 2013, vol.55, no.7–8, pp.841–856.
  • Ba H., Yang N. BaNa: a hybrid approach for noise resilient pitch detection // IEEE Statistical Signal Processing Workshop, 2012, pp.369–372.
  • De Cheveigné A., Kawahara H. YIN, a fundamental frequency estimator for speech and music // J. Acoust. Soc. Am., 2002, vol.111, no.4, pp.1917–1930.
  • Kasi K., Zahorian S.A. Yet another algorithm for pitch tracking / Proc. of the ICASSP, 2002, pp.361–364.
  • Camacho A. SWIPE: a sawtooth waveform inspired pitch estimator for speech and music. Ph.D. dissertation. Florida, 2007, 116 p.
  • Gonzalez S., Brookes M. A pitch estimation filter robust to high levels of noise (PEFAC) / Proc. of EUSIPCO, 2011, pp.451–455.
  • Boashash B. Estimating and interpreting the instantaneous frequency of a signal // Proc. IEEE, 1992, vol.80, no.4, pp.520–568.
  • Maragos P., Kaiser J.F., Quatieri T.F. On amplitude and frequency demodulation using energy operators // IEEE Trans. on Signal Processing, 1993, vol.41, no.4, pp.1532–1550.
  • Abe T., Kobayashi T., Imai S. Harmonics tracking and pitch extraction based on instantaneous frequency / Proc. of ICASSP, 1995, vol.1, pp.756–759.
  • Abe T., Honda M. Sinusoidal model based on instantaneous frequency attractors // IEEE Trans. on Audio, Speech and Language Processing, 2006, vol.14, no.4, pp.1292–1300.
  • Azarov E., Petrovsky A., Parfieniuk M. Estimation of the instantaneous harmonic parameters of speech / Proc. of EUSIPCO, 2008, pp.1–5.
  • Huang N.E., Shen Z., Long S.R., Wu M.L., Shih H.H., Zheng Q., Yen N.C., Tung C.C., Liu H.H. The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis // Proc. Roy. Soc. London A, 1998, vol.454, pp.903–995.
  • Gilles J. Empirical Wavelet Transform // IEEE Transactions on Signal Processing, 2013, vol.61, no.16, pp.3999–4010.
  • Vakman D. On the analytic signal, the Teager–Kaiser energy algorithm, and other methods for defining amplitude and frequency // IEEE Trans. on Signal Process., 1996, vol.44, no.4, pp.791–797.
  • Chu W., Alwan A. Reducing f0 frame error of f0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend / Proc. of ICASSP, 2009, pp.3969–3972.
  • Varga A., Steeneken H.J. Assessment for automatic speech recognition: II. Noisex-92: a database and an experiment to study the effect of additive noise on speech recognition systems // Speech Communication, 1993, vol.12, no.3, pp.247–251.
  • Drugman T., Alwan A. Joint robust voicing detection and pitch estimation based on residual harmonics / Proc. of Interspeech, 2011, pp.1973–1976.
  • Azarov E., Vashkevich M., Petrovsky A. Instantaneous pitch estimation based on RAPT framework / Proc. of EUSIPCO, 2012, pp.2787–2791.