
The mel-frequency cepstrum (MFC) is a popular representation in sound processing. Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up an MFC. As the name suggests, the MFC combines two main parts: the mel frequency scale and the cepstrum.
Mel Frequency Analysis
Mel-frequency analysis is based on human perception experiments showing that the human ear acts like a bank of filters and is more sensitive to changes in pitch at low frequencies than at high frequencies.

The mel-frequency function maps linear frequency (in Hz), as recorded by audio devices, to a nonlinear scale that is closer to human perception:
$$\mathrm{mel}(f) = 2595 \times \log_{10}\left(1 + \frac{f}{700}\right)$$
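The mapping and its inverse are straightforward to compute directly. Below is a minimal NumPy sketch; the names `hz_to_mel` and `mel_to_hz` are illustrative and do not come from any particular library.

```python
import numpy as np

def hz_to_mel(f):
    """Map frequency in Hz to mels: mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel: f = 700 * (10**(m / 2595) - 1)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Equal 1000 Hz steps become smaller and smaller steps on the mel scale.
freqs = np.array([0.0, 1000.0, 2000.0, 4000.0, 8000.0])
print(np.round(hz_to_mel(freqs), 1))
```

By construction, 1000 Hz maps to roughly 1000 mel, and higher frequencies are compressed. This compression matches the ear's coarser resolution at high frequencies.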

Cepstrum Analysis
Consider the spectrum of a speech segment, such as the one below:

The arrows point to the dominant frequencies in the spectrum (the formants), which carry the identity of the sound. Connecting these peaks gives a smooth curve, the envelope of the spectrum. The spectrum can therefore be separated into two parts: the envelope and the detail.
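Denote the full spectrum by X[k], the envelope by H[k], and the detail by E[k]. In the source-filter model of speech, the signal is the excitation (source) convolved with the impulse response of the vocal tract (filter). Because convolution in time becomes multiplication in frequency, we have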
$$X[k] = H[k] \times E[k]$$
Taking the logarithm of both sides turns the product into a sum:
$$\log X[k] = \log H[k] + \log E[k]$$
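Both steps can be checked numerically. The NumPy sketch below builds a hypothetical source-filter signal: a pulse train `e` stands in for the glottal excitation, and a short decaying response `h` stands in for the vocal tract. Both signals are toy values chosen for illustration, not real speech. After zero-padding to the full convolution length, the DFT of the convolution equals the product of the DFTs, and the log magnitudes add.

```python
import numpy as np

# Hypothetical excitation: one pulse every 32 samples (toy glottal pulses).
e = np.zeros(256)
e[::32] = 1.0
# Hypothetical vocal-tract impulse response: a short decaying exponential.
h = np.exp(-np.arange(20) / 4.0)

# "Speech" = excitation convolved with the filter.
x = np.convolve(e, h)

# Zero-pad every DFT to the full convolution length, so the DFT product
# corresponds to linear (not circular) convolution.
n = len(x)
X = np.fft.rfft(x, n)
H = np.fft.rfft(h, n)
E = np.fft.rfft(e, n)
print(np.allclose(X, H * E))        # True: X[k] = H[k] * E[k]

# On magnitudes, the logarithm turns the product into a sum.
log_X = np.log(np.abs(X))
log_sum = np.log(np.abs(H)) + np.log(np.abs(E))
print(np.allclose(log_X, log_sum))  # True: log X[k] = log H[k] + log E[k]
```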
These functions are plotted below:

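Our goal is to extract the envelope from the spectrum. The charts above show that if we treat the log spectrum as a signal in its own right, the envelope varies slowly (the low-"frequency" part) and the detail varies quickly (the high-"frequency" part). We therefore take a Fourier transform of the log spectrum. Because the spectrum already lives in the frequency domain, this transform is conventionally computed as an inverse FFT (IFFT). The result is the "spectrum of the spectrum", called the cepstrum: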
$$x[k]=h[k]+e[k]$$
To get the envelope, we keep only the low-quefrency part of the cepstrum (a low-pass "lifter") and transform the result back to a log spectrum.
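The NumPy sketch below applies this liftering step to a single frame. It assumes the real cepstrum, that is, the IFFT of the log magnitude spectrum. The function name `cepstral_envelope` and the cutoff `n_keep` are hypothetical. The cutoff should stay below the pitch period in samples, because the harmonic detail produces a cepstral peak at the pitch period and that peak must be removed.

```python
import numpy as np

def cepstral_envelope(frame, n_keep=30):
    """Estimate the smooth log-magnitude envelope of one frame by liftering.

    n_keep (hypothetical): number of low-quefrency bins to keep; it should be
    smaller than the pitch period in samples.
    """
    n_fft = len(frame)
    log_spec = np.log(np.abs(np.fft.fft(frame)) + 1e-10)  # log X[k]; offset avoids log(0)
    cep = np.fft.ifft(log_spec).real                     # cepstrum x[k] = h[k] + e[k]

    # Low-pass lifter: keep the low quefrencies and their mirror image, since
    # the real cepstrum of a real frame is symmetric (c[n] == c[N - n]).
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0
    lifter[n_fft - n_keep + 1:] = 1.0

    # Transform the liftered cepstrum back to a smoothed log spectrum.
    return np.fft.fft(cep * lifter).real

# Usage on a synthetic voiced frame (all parameters are illustrative).
n = 1024
pulses = np.zeros(n)
pulses[::100] = 1.0                                        # pitch period of 100 samples
tract = np.exp(-np.arange(40) / 6.0) * np.cos(0.3 * np.arange(40))  # toy resonance
frame = np.convolve(pulses, tract)[:n] * np.hanning(n)
envelope = cepstral_envelope(frame, n_keep=30)
```

Here `n_keep=30` is well below the 100-sample pitch period, so the harmonic ripple is removed and the smooth resonance shape remains.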
Summary
The whole process is as follows:
- transform the original audio into a spectrum with the FFT
- map it onto the mel-frequency scale
- take the logarithm of the spectrum X[k]
- compute the 'spectrum of the spectrum' x[k] with the IFFT
- filter out the high-frequency part and transform the rest back to a spectrum
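The pipeline can be sketched end to end in NumPy. This sketch follows the common MFCC practice of using a DCT-II instead of the IFFT in the cepstrum step. It keeps only the first `n_ceps` coefficients, which plays the role of the low-pass filter. As is usual for MFCC features, those low-order coefficients are returned directly rather than transformed back to a spectrum. Several details are assumptions not stated in the text above: the framing and Hamming window, the triangular filter shape, and the defaults of 25 ms frames with a 10 ms hop at 16 kHz, 26 filters, and 13 coefficients. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out outputs."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # Split the signal into overlapping frames and window each one (assumed step).
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 1. FFT -> power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 2. Map onto the mel scale with the triangular filterbank
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # 3. Logarithm (small offset avoids log(0))
    log_mel = np.log(mel_energies + 1e-10)
    # 4-5. DCT-II as the "spectrum of the spectrum", keeping low-order terms
    return dct_ii(log_mel, n_ceps)
```

For one second of audio at 16 kHz, `mfcc(signal, 16000)` returns a `(98, 13)` array, with one row of coefficients per frame.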
For speech synthesis, we can compute the MFCCs of two speech segments and join the segments at the point where the Euclidean distance between their MFCC frames is smallest.
For speech recognition, MFCCs are widely used as input features for recognition models such as neural networks.
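Finding such a join point reduces to a nearest-pair search over MFCC frames. The sketch below assumes each segment is given as a NumPy array of shape `(n_frames, n_ceps)`, for example the output of an MFCC routine. The function `best_join` is hypothetical and searches exhaustively over all frame pairs.

```python
import numpy as np

def best_join(mfcc_a, mfcc_b):
    """Return (i, j, dist) such that frame i of segment A and frame j of
    segment B have the smallest Euclidean distance between MFCC vectors.

    mfcc_a: array of shape (n_a, n_ceps); mfcc_b: array of shape (n_b, n_ceps).
    """
    diff = mfcc_a[:, None, :] - mfcc_b[None, :, :]   # (n_a, n_b, n_ceps)
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))       # pairwise distances
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j), float(dist[i, j])
```

The exhaustive search costs O(n_a · n_b) distance computations. A practical system would likely restrict it to the end of the first segment and the start of the second.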
See also: CMU slides 03 (MFCC).