

Audio embedding

Audio features

Given an audio fragment, the following features are computed:

Frequency statistics: mean, median, first and third quartiles, and interquartile range
Pitch statistics: mean, median, first and third quartiles, and interquartile range
Chroma: short-term pitch profile
Linear Predictor Coefficients (LPC)
Line Spectral Frequency (LSF) coefficients
Mel-Frequency Cepstral Coefficients (MFCC)
Octave Band Signal Intensity (OBSI)
Spectral crest factor per band
Decrease: average spectral slope
Flatness: spectral flatness, the ratio of the geometric mean to the arithmetic mean of the spectrum
Flux: flux of the spectrum between consecutive frames
Rolloff: frequency below which 99% of the spectral energy is contained
Variation: normalized correlation of spectrum between consecutive frames
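
The frequency and pitch statistics above can be sketched as follows, assuming each statistic is computed over a per-frame track of values (one frequency or pitch estimate per frame). The function name `quartile_stats` is hypothetical, and NumPy's default linear interpolation for percentiles is an assumption; the actual pipeline may use a different quantile definition.

```python
import numpy as np

def quartile_stats(track):
    """Summarize a 1-D per-frame track (e.g. frequency or pitch estimates)
    with mean, median, first/third quartiles and interquartile range."""
    x = np.asarray(track, dtype=float)
    # Linear-interpolation percentiles (NumPy default); an assumed convention.
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return {
        "mean": float(x.mean()),
        "median": float(med),
        "q1": float(q1),
        "q3": float(q3),
        "iqr": float(q3 - q1),
    }

# Example: a toy track of five per-frame frequency estimates in Hz.
stats = quartile_stats([100.0, 200.0, 300.0, 400.0, 500.0])
```

The interquartile range is reported alongside the quartiles because it summarizes spread while being insensitive to a few outlier frames, which is common when pitch tracking fails on silent or noisy frames.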
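
Several of the frame-wise spectral descriptors in the list (flatness, flux, rolloff, variation, decrease, crest factor per band) can be sketched directly on short-time magnitude spectra. This is a minimal NumPy sketch under stated assumptions: the frame length, hop size, Hann window, number of crest-factor bands, and the exact formulas (e.g. flux as the Euclidean distance between consecutive magnitude spectra) are illustrative choices, and all function names are hypothetical. Feature-extraction toolkits differ in normalization details; some, for instance, define spectral variation as one minus the normalized correlation computed here.

```python
import numpy as np

def frame_spectra(signal, frame_len=1024, hop=512):
    """Magnitude spectra of Hann-windowed frames, shape (n_frames, frame_len // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def flatness(mag, eps=1e-12):
    # Geometric mean over arithmetic mean of the power spectrum, per frame.
    power = mag ** 2 + eps
    return np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)

def rolloff(mag, sr, frame_len, fraction=0.99):
    # Lowest bin frequency below which `fraction` of the spectral energy lies.
    energy = np.cumsum(mag ** 2, axis=1)
    idx = np.argmax(energy >= fraction * energy[:, -1:], axis=1)
    return idx * sr / frame_len

def flux(mag):
    # Euclidean distance between consecutive magnitude spectra.
    return np.linalg.norm(np.diff(mag, axis=0), axis=1)

def variation(mag, eps=1e-12):
    # Normalized correlation between consecutive spectra (1.0 = same shape).
    a, b = mag[:-1], mag[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den

def decrease(mag, eps=1e-12):
    # Average slope relative to the first bin: sum_k (a_k - a_0) / k, normalized.
    k = np.arange(1, mag.shape[1])
    num = np.sum((mag[:, 1:] - mag[:, :1]) / k, axis=1)
    return num / (np.sum(mag[:, 1:], axis=1) + eps)

def crest_per_band(mag, n_bands=4, eps=1e-12):
    # Peak-to-mean ratio inside each of `n_bands` equal-width frequency bands.
    bands = np.array_split(mag, n_bands, axis=1)
    return np.stack([b.max(axis=1) / (b.mean(axis=1) + eps) for b in bands], axis=1)

# Example: one second of a 1 kHz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)
mag = frame_spectra(tone)
```

For a stationary pure tone, consecutive frames have nearly identical spectra, so flux stays close to zero, variation close to one, and rolloff lands just above the tone frequency; white noise, by contrast, yields a much higher flatness than the tone.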
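
Linear Predictor Coefficients are commonly estimated per frame with the autocorrelation method and the Levinson-Durbin recursion. The sketch below assumes that method, the sign convention x[n] ≈ a1·x[n−1] + … + ap·x[n−p], and a hypothetical function name `lpc`; the actual extractor may window the frame first, use a different order, or use the opposite sign convention (A(z) = 1 + Σ a_k z^−k).

```python
import numpy as np

def lpc(frame, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.

    Returns (coeffs, err): coeffs[k - 1] weights x[n - k] in the linear
    prediction of x[n], and err is the final prediction-error energy.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Biased autocorrelation for lags 0..order.
    r = np.array([np.dot(x[: n - lag], x[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused; a[1..order] hold the coefficients
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        # Update lower-order coefficients: a_j <- a_j - k * a_{i-j}.
        a[1:i] = a[1:i] - k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err

# Example: recover the coefficients of a synthetic AR(2) process.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = 1.3 * x[t - 1] - 0.4 * x[t - 2] + e[t]
coeffs, err = lpc(x, 2)
```

Levinson-Durbin exploits the Toeplitz structure of the autocorrelation matrix, solving the normal equations in O(p²) instead of O(p³); the intermediate reflection coefficients are also what LSF conversion typically starts from.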

Dimensionality reduction

Dimensionality reduction techniques reduce the number of variables by projecting the data into a lower-dimensional space. Our aim is to retain as much of the original information as possible while still being able to explore the data in a familiar 2D space. We're looking at the following methods: