Mel-Frequency Cepstral Coefficients

The mel-frequency cepstrum (MFC) is widely used in sound processing. Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up an MFC. As the name suggests, the MFC combines two main ideas: the mel frequency scale and the cepstrum.

Mel Frequency Analysis

Mel-frequency analysis is based on findings from human perception experiments: the human ear acts like a set of filters and is more sensitive to sound at low frequencies than at high frequencies.

(Figure from the CMU slides on MFCC.)

The mel function maps linear frequency, as recorded by audio devices, onto a nonlinear scale that is closer to human perception.

$$\mathrm{mel}(f) = 2595 \times \log_{10}\left(1 + \frac{f}{700}\right)$$
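As a quick check of the formula, here is a minimal Python sketch. The function names `hz_to_mel` and `mel_to_hz` are my own choices for illustration, and the inverse is simply the formula solved for f.

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse mapping, obtained by solving the formula above for f."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Print a few sample points to see how the scale compresses high frequencies.
for f in (100, 200, 1000, 4000, 4100):
    print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

With these constants 1000 Hz maps to roughly 1000 mel, and a 100 Hz step near 100 Hz covers far more mels than the same step near 4000 Hz, which is the nonlinearity described above.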

(Figure: chart of the mel function.)

Cepstrum Analysis

Suppose the speech spectrum is as shown below:

The arrows point to the dominant frequencies in the spectrum (the formants), which carry the identity of the sound. These peaks trace out a smooth curve, the envelope of the spectrum, so the spectrum can be separated into two parts: envelope and detail.

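Denote the full spectrum by X[k], the envelope by H[k], and the detail by E[k]. In the source-filter view of speech, the signal is an excitation convolved with the vocal tract response, and convolution in time becomes multiplication in frequency, so we have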

$$X[k] = H[k] \times E[k]$$

Taking the logarithm of both sides:

$$\log X[k] = \log H[k] + \log E[k]$$

The figures of these functions are shown below:

Our goal is to extract the envelope from the spectrum. As the charts above show, if we treat the log spectrum as a waveform, the envelope corresponds to its low-frequency part and the detail to its high-frequency part. So we take the FFT of the log spectrum; because the input is already in the frequency domain, this step is conventionally described as an inverse FFT (IFFT). Because the transform is linear, the sum is preserved, and we obtain the "spectrum of the spectrum" (the cepstrum):

$$x[k]=h[k]+e[k]$$

To get the envelope, we pass this through a low-pass filter (keeping only the low-order coefficients) and transform the result back into a spectrum.
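The envelope extraction described here can be sketched with NumPy. This is an illustration under my own assumptions rather than a reference implementation: the function name `spectral_envelope`, the cutoff `n_keep=30`, and the synthetic test frame (a Hann-windowed sum of harmonics of 100 Hz) are all hypothetical choices.

```python
import numpy as np

def spectral_envelope(frame, n_keep=30):
    """Estimate the log-spectral envelope of one frame by low-pass filtering
    its cepstrum. n_keep is an assumed cutoff, not a value from the text."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # log X[k]; epsilon avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real         # x = h + e (real, since log_mag is even)
    # Low-pass filter: keep the first n_keep coefficients and their mirror images
    # so that the result stays real after transforming back.
    keep = np.zeros_like(cepstrum)
    keep[:n_keep] = 1.0
    if n_keep > 1:
        keep[-(n_keep - 1):] = 1.0
    envelope = np.fft.fft(cepstrum * keep).real  # approximately log H[k]
    return log_mag, envelope

# Synthetic frame: harmonics of 100 Hz with decaying amplitude, Hann-windowed.
sr = 8000
t = np.arange(512) / sr
frame = sum(np.exp(-h / 10.0) * np.sin(2 * np.pi * 100 * h * t) for h in range(1, 30))
frame = frame * np.hanning(512)
log_mag, envelope = spectral_envelope(frame)
```

Plotting `log_mag` and `envelope` on the same axes should show the envelope following the broad shape of the log spectrum while the harmonic ripples are smoothed away.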

Summary

The overall process is as follows:

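1. Transform the original audio into a spectrum with the FFT.
2. Map the spectrum onto the mel frequency scale.
3. Take the logarithm of the spectrum X[k].
4. Compute the "spectrum of the spectrum" x[k] with the IFFT.
5. Filter out the high-frequency part, keeping the low-order coefficients (these are the MFCCs); transforming them back gives the smoothed spectrum.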
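The steps above can be sketched for a single frame with NumPy. Several things here are my assumptions and not taken from the text: the triangular filter bank, the power spectrum in step 1, the parameter values (`n_filters=20`, `n_coeffs=13`), and all function names. The sketch also swaps in a DCT-II for the IFFT in step 4, which is a common choice for MFCCs because the log mel energies are real.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2                 # 1. FFT (power spectrum, assumed)
    mel_energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power  # 2. mel scale
    log_mel = np.log(mel_energies + 1e-10)                  # 3. logarithm
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(n, n + 0.5) / n_filters)
    cepstrum = basis @ log_mel                              # 4. DCT-II in place of the IFFT
    return cepstrum[:n_coeffs]                              # 5. keep low-order coefficients

# Hypothetical input: one Hann-windowed frame of a 440 Hz tone at 8 kHz.
sr = 8000
t = np.arange(512) / sr
frame = np.sin(2 * np.pi * 440 * t) * np.hanning(512)
coeffs = mfcc_frame(frame, sr)
```

A real pipeline would also split the audio into overlapping frames and typically apply pre-emphasis; those parts are left out to keep the sketch close to the five steps listed.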

For speech synthesis, we can compute the MFCCs of two speech segments and join them at the point that minimizes the Euclidean distance between their MFCC vectors.
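One way to read this joining rule is as a search over pairs of frames. The sketch below assumes each segment is already represented as a (frames × coefficients) array of MFCCs; the helper `best_join_point` and the toy values are hypothetical and only show the distance computation.

```python
import numpy as np

def best_join_point(mfcc_a, mfcc_b):
    """Return (i, j, distance) for the frame i of segment A and frame j of
    segment B whose MFCC vectors are closest in Euclidean distance."""
    diffs = mfcc_a[:, None, :] - mfcc_b[None, :, :]     # all pairwise differences
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))        # Euclidean distance matrix
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    return int(i), int(j), float(dists[i, j])

# Toy MFCC sequences with two coefficients per frame (made-up values).
seg_a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
seg_b = np.array([[9.0, 9.0], [5.0, 5.5], [2.0, 2.0]])
i, j, d = best_join_point(seg_a, seg_b)
print(i, j, d)
```

For these toy values the closest pair is frame 2 of the first segment and frame 1 of the second, at a distance of 0.5. A practical system would usually restrict the search to frames near the segment boundaries.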

For speech recognition, MFCCs serve as good input features for recognition models such as neural networks.
