AZERBAIJAN NATIONAL ACADEMY OF SCIENCES
AN APPROACH TO PITCH PERIOD DETECTION OF SPEECH SIGNAL BASED ON ADAPTED WAVELETS (rus.)
Lyudmila V. Sukhostat

Few of the existing methods used for speaker recognition can handle nonlinear and non-stationary speech signals. The pitch period is one of the most important features for speaker characterization. This paper presents a method for detecting the pitch period of nonlinear and non-stationary speech signals based on the empirical wavelet transform. Experiments show the high relative efficiency of the proposed approach at different noise levels. (pp. 33-41)

Keywords: pitch period, empirical wavelet transform, Teager-Kaiser energy operator, intrinsic mode function, instantaneous frequency
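
As an illustration of one of the keywords above, the following is a minimal sketch of the discrete Teager-Kaiser energy operator and the DESA-2 frequency estimate built on it, applied to a synthetic sinusoid. It is not the method of the paper, whose algorithm the abstract does not describe. The function names `tkeo` and `desa2_frequency`, the sampling rate, and the test tone are all assumptions chosen for the example.

```python
import numpy as np


def tkeo(x):
    """Discrete Teager-Kaiser energy: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2_frequency(x, fs):
    """DESA-2 instantaneous frequency estimate in Hz.

    Valid for components below fs / 4. Assumes psi[x] is nonzero,
    which holds for a clean sinusoid but not for arbitrary signals.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]              # symmetric difference x(n+1) - x(n-1)
    psi_y = tkeo(y)                 # energy of the difference signal
    psi_x = tkeo(x)[1:-1]           # energy of x, trimmed to align with psi_y
    cos_2omega = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_2omega)   # radians per sample
    return omega * fs / (2.0 * np.pi)


# Hypothetical test tone: 200 Hz sampled at 8 kHz (values chosen for illustration).
fs = 8000
f0 = 200.0
n = np.arange(400)
x = 0.7 * np.cos(2.0 * np.pi * f0 * n / fs + 0.3)

f_est = desa2_frequency(x, fs)
period_samples = fs / np.median(f_est)   # pitch period of the tone, in samples
```

For a single clean sinusoid the estimate is exact up to rounding. Real speech is multicomponent and noisy, so an energy operator of this kind is normally applied to individual narrowband components rather than to the raw waveform.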