Audio embedding

Audio features

Given an audio fragment, the following features are computed:

  • Frequency statistics: mean, median, first and third quartiles, and interquartile range
  • Pitch statistics: mean, median, first and third quartiles, and interquartile range
  • Chroma: short-term pitch profile
  • Linear Predictor Coefficients (LPC)
  • Line Spectral Frequency (LSF) coefficients
  • Mel-Frequency Cepstral Coefficients (MFCC)
  • Octave Band Signal Intensity (OBSI)
  • Spectral crest factor per band
  • Decrease: average spectral slope
  • Flatness: spectral flatness, the ratio of the geometric mean to the arithmetic mean of the spectrum
  • Flux: spectral flux between consecutive frames
  • Rolloff: frequency below which 99% of the spectral energy is contained
  • Variation: normalized correlation of spectrum between consecutive frames
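
To make a few of the spectral descriptors above concrete, here is a minimal sketch of flatness, rolloff, flux and variation in plain Python. It assumes each frame has already been converted to a magnitude spectrum (for example by an FFT). The function names, the `eps` guard, the use of squared magnitudes as energy and the Euclidean form of flux are assumptions of this sketch, not necessarily what the actual feature extractor does.

```python
import math


def spectral_flatness(mag, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a magnitude spectrum."""
    # eps keeps log() defined for zero-valued bins (an assumption of this sketch).
    log_mean = sum(math.log(m + eps) for m in mag) / len(mag)
    arithmetic_mean = sum(mag) / len(mag)
    return math.exp(log_mean) / (arithmetic_mean + eps)


def spectral_rolloff(mag, freqs, fraction=0.99):
    """Frequency of the first bin at which cumulative energy reaches `fraction`."""
    energy = [m * m for m in mag]  # energy taken as squared magnitude
    threshold = fraction * sum(energy)
    cumulative = 0.0
    for f, e in zip(freqs, energy):
        cumulative += e
        if cumulative >= threshold:
            return f
    return freqs[-1]


def spectral_flux(prev_mag, mag):
    """Euclidean distance between two consecutive magnitude spectra."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(prev_mag, mag)))


def spectral_variation(prev_mag, mag, eps=1e-12):
    """Normalized correlation between two consecutive magnitude spectra."""
    dot = sum(a * b for a, b in zip(prev_mag, mag))
    norm = math.sqrt(sum(a * a for a in prev_mag)) * math.sqrt(sum(b * b for b in mag))
    return dot / (norm + eps)


# Two toy frames with four bins each (made-up values for illustration).
freqs = [0.0, 1000.0, 2000.0, 3000.0]
frame_a = [1.0, 1.0, 1.0, 1.0]   # flat, noise-like spectrum
frame_b = [0.0, 10.0, 0.0, 0.0]  # single peak at 1000 Hz

print("flatness:", spectral_flatness(frame_a), spectral_flatness(frame_b))
print("rolloff:", spectral_rolloff(frame_b, freqs))
print("flux:", spectral_flux(frame_a, frame_b))
print("variation:", spectral_variation(frame_a, frame_b))
```

A flat spectrum gives a flatness close to 1, while a single sharp peak gives a value close to 0, which is why flatness is often read as a rough noisiness indicator.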

Dimensionality reduction

Dimensionality reduction techniques reduce the number of variables by projecting the data onto a lower-dimensional space. In our case, the aim is to retain as much of the original information as possible while making the data explorable in a familiar 2D space. We're looking at the following methods: