CONVOLUTIONAL OPERATORS IN THE TIME-FREQUENCY DOMAIN
Year : 2008 Author : VINCENT LOSTANLEN Publisher : PSL RESEARCH UNIVERSITY PARIS Description : In the realm of machine listening, audio classification is the problem
of automatically retrieving the source of a sound according to a predefined taxonomy. This dissertation addresses audio classification by
designing signal representations which satisfy appropriate invariants
while preserving inter-class variability. First, we study time-frequency
scattering, a representation which extracts modulations at various
scales and rates in a similar way to idealized models of spectrotemporal receptive fields in auditory neuroscience. We report state-of-the-art results in the classification of urban and environmental sounds,
thus outperforming short-term audio descriptors and deep convolutional networks. Secondly, we introduce spiral scattering, a representation which combines wavelet convolutions along time, along log-frequency, and across octaves, thus following the geometry of the
Shepard pitch spiral which makes one full turn at every octave. We
study voiced sounds as a nonstationary source-filter model where
both the source and the filter are transposed in frequency through
time, and show that spiral scattering disentangles and linearizes these
transpositions. In practice, spiral scattering reaches state-of-the-art results in musical instrument classification of solo recordings. Aside
from audio classification, time-frequency scattering and spiral scattering can be used as summary statistics for audio texture synthesis. We find that, unlike the previously existing temporal scattering
transform, time-frequency scattering is able to capture the coherence
of spectrotemporal patterns, such as those arising in bioacoustics or
speech, up to a scale of about 500 ms. Based on this analysis-synthesis
framework, an artistic collaboration with composer Florian Hecker
has led to the creation of five computer music pieces. Category : OPERATIONAL RESEARCH