SUBJECT: Invitation to the public presentation of a doctoral dissertation,
Tzagarakis Christos, 1/7/2014, 17:00, K206, UoC - oral defense of PhD by
Tzagarakis Christos, Tuesday 1 July, 17-19 (GR)
SENDER: Gramateia Metaptyxiakou CSD [mailto:pgram@xxxxxxxxxx]

Please see the related file: http://news.uoc.gr/news/2014/23-06/Diatrivh.Tzagarakis.Christos.pdf

Invitation to the Public Presentation of the Doctoral Dissertation of
Mr. Christos Tzagarakis

On Tuesday, 1 July 2014, at 17:00, in Teleconference Room K206 of the Computer
Science Department of the University of Crete in Heraklion, Mr. Tzagarakis,
PhD candidate of the Computer Science Department, will publicly present and
defend his doctoral dissertation, entitled:

“Sparse and Low-Rank Techniques for Robust Speaker Recognition and
Missing-Features Reconstruction”

Abstract

Speaker recognition is the process of automatically recognizing a speaker based
on specific features extracted from the speech signal. A broad range of
applications relies at its core on speaker recognition, where the presence of
environmental noise in the speech signal usually impedes correct decisions. An
additional factor that makes it harder to recognize a speaker correctly is the
limited amount of available training and evaluation data.

Focusing on
overcoming the above limitations, this dissertation is divided into two main
parts. In the first part, the problem of speaker recognition is reduced to an
equivalent classification problem. To this end, we develop classification
techniques based on the framework of sparse representations and study their
performance, focusing on the task of speaker identification with highly limited
amounts of training and evaluation data in environments with high levels of
noise. The main assumption that governs these techniques is that the speech
signal to be identified, and specifically the features extracted from it, can
be expressed as a sparse linear combination of the columns of an overcomplete
matrix, which is often referred to in the literature as a “dictionary”. The
optimally estimated sparse weights of the linear combination, the so-called
sparse codes, are obtained as the solution of an optimization problem and are
then used for the final identification of the speaker based on a minimum
reconstruction error criterion.

Extending the
previous classification method based on sparse representations, we study the
efficiency of a method for discriminative dictionary learning. This method
jointly estimates the dictionary comprising the training data and an
appropriate linear classifier. The advantage of this approach is that it yields
sparse codes with enhanced discriminative capability. Extensive comparisons
with probabilistic models, which are based on the hypothesis that the extracted
speech features follow a generalized Gaussian distribution, as well as with
state-of-the-art classification methods such as Gaussian mixture models and
joint factor analysis, revealed the superiority of the proposed method.

The second part of
this dissertation focuses on the use of low-rank techniques as a powerful tool
for extracting reliable features from a speech signal. More specifically, we
design a technique for recovering a low-rank matrix and employ it to
reconstruct those spectral regions of a speech signal that are unreliable due
to the presence of noise. The reconstruction of the unreliable spectral regions
is performed with the Singular Value Thresholding (SVT) algorithm, based on the
assumption that the logarithmic magnitude representation of a speech signal in
the time-frequency domain, obtained via the short-time Fourier transform
(STFT), is of low rank. A comparison against the widely used method of sparse
imputation, which is based on sparse representations, reveals the superiority
of our proposed approach in terms of producing more reliable features. Finally, we
propose an extension of the matrix completion method that exploits both the
prior knowledge that the data matrix is low rank and the knowledge that the
data can be represented efficiently in terms of a dictionary. In particular, we
propose an algorithm for joint low-rank representation and matrix completion
(J-SVT). J-SVT is superior to the standard SVT in computing the low-rank
representation of a data matrix in terms of a given dictionary from a small
number of observations of the original matrix. Through extensive simulations,
we observed that J-SVT achieves a lower reconstruction error than the standard
SVT in several distinct experimental scenarios.
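
The two core ideas of the abstract, identification by minimum reconstruction
error over sparse codes and reconstruction of unreliable entries with SVT, can
be sketched roughly as follows. This is a minimal illustration under stated
assumptions and not the dissertation's implementation: it assumes NumPy, the
names omp, src_classify, svt_complete and the parameters n_nonzero, tau, step,
n_iter are hypothetical, and a greedy orthogonal matching pursuit stands in for
the optimization problem that the abstract leaves unspecified.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy sparse coding: approximate y with at most n_nonzero (>= 1) columns of D."""
    support, residual = [], y.astype(float)
    for _ in range(n_nonzero):
        # Select the dictionary atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code


def src_classify(D, labels, y, n_nonzero=3):
    """Return the label whose atoms reconstruct y with the smallest error."""
    labels = np.asarray(labels)
    code = omp(D, y, n_nonzero)
    errors = {}
    for speaker in np.unique(labels):
        mask = labels == speaker
        # Reconstruct y using only the sparse-code entries of this speaker.
        errors[speaker] = np.linalg.norm(y - D[:, mask] @ code[mask])
    return min(errors, key=errors.get)


def svt_complete(M, observed, tau, step, n_iter=500):
    """Singular Value Thresholding iteration for matrix completion.

    M        : data matrix; entries where observed is False are ignored
    observed : boolean mask of the reliable entries
    tau      : soft threshold applied to the singular values
    step     : step size, assumed to lie in (0, 2)
    """
    Y = np.zeros_like(M, dtype=float)
    X = np.zeros_like(M, dtype=float)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        # Shrink the singular values; small ones drop to zero (low rank).
        X = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Correct the estimate only on the observed entries.
        Y += step * np.where(observed, M - X, 0.0)
    return X
```

In the setting of the abstract, the columns of D would hold training feature
vectors with one label per column, M would be the log-magnitude STFT of the
noisy signal, and observed would mark the time-frequency cells considered
reliable. The values of tau and step need tuning per data set.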

Panagiotis Trahanias
Chair, Computer Science Department

--
Postgraduate Secretariat
Computer Science Department
Voutes University Campus
Heraklion, Crete GR-70013, Greece
tel: +30 2810 393592, 393504
fax: +30 2810 393804
e-mail: pgram@xxxxxxxxxx
Url: http://www.csd.uoc.gr