CONVOLUTIONAL OPERATORS IN THE TIME-FREQUENCY DOMAIN

Year : 2008
Author : VINCENT LOSTANLEN
Publisher : PSL RESEARCH UNIVERSITY PARIS
Description : In the realm of machine listening, audio classification is the problem of automatically retrieving the source of a sound according to a predefined taxonomy. This dissertation addresses audio classification by designing signal representations that satisfy appropriate invariants while preserving inter-class variability. First, we study time-frequency scattering, a representation that extracts modulations at various scales and rates in a similar way to idealized models of spectrotemporal receptive fields in auditory neuroscience. We report state-of-the-art results in the classification of urban and environmental sounds, thus outperforming short-term audio descriptors and deep convolutional networks. Second, we introduce spiral scattering, a representation that combines wavelet convolutions along time, along log-frequency, and across octaves, thus following the geometry of the Shepard pitch spiral, which makes one full turn at every octave. We study voiced sounds as a nonstationary source-filter model in which both the source and the filter are transposed in frequency through time, and show that spiral scattering disentangles and linearizes these transpositions. In practice, spiral scattering reaches state-of-the-art results in musical instrument classification of solo recordings. Aside from audio classification, time-frequency scattering and spiral scattering can be used as summary statistics for audio texture synthesis. We find that, unlike the previously existing temporal scattering transform, time-frequency scattering is able to capture the coherence of spectrotemporal patterns, such as those arising in bioacoustics or speech, up to a scale of about 500 ms. Based on this analysis-synthesis framework, an artistic collaboration with composer Florian Hecker has led to the creation of five computer music pieces.
Category : OPERATIONAL RESEARCH