Research Area

Robust speech detection 

Robust speech detection (also known as voice activity detection, VAD) is a speech processing technique that detects the presence or absence of human speech in regions of audio, which may also contain music, noise, or other sounds. The main uses of VAD are in speech coding and speech recognition. It can facilitate speech processing and can also be used to deactivate some processes during non-speech segments: for example, it avoids unnecessary coding and transmission of silence packets in VoIP, saving computation and network bandwidth. When speech is acquired without noise, a suitable threshold on the signal level allows the speech period to be detected relatively easily. Real speech, however, is corrupted by background noise such as computer fans, air conditioners, and many other environmental sounds, especially in distant-talking situations. Inaccurate detection of the speech period causes serious problems such as degraded recognition performance and deteriorated speech quality. It is therefore highly desirable to develop a robust and reliable VAD method.
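
The simple level-threshold detector mentioned above can be sketched as follows. This is only a minimal illustration, not a robust VAD method: the function names, the 160-sample frame length, the 8 kHz sampling rate, and the -30 dB threshold are all assumptions chosen for the example.

```python
import math

def frame_energies_db(samples, frame_len):
    """Return the log energy (in dB) of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # A small floor avoids log10(0) on perfectly silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies

def energy_vad(samples, frame_len=160, threshold_db=-30.0):
    """Mark a frame as speech when its energy exceeds a fixed threshold."""
    return [e > threshold_db for e in frame_energies_db(samples, frame_len)]

# Synthetic test signal (assumed 8 kHz): silence, a 200 Hz tone, silence.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
signal = silence * 3 + tone * 2 + silence * 3

decisions = energy_vad(signal)
```

A fixed threshold like this works only when the noise floor is low and stable, which is why noisy and distant-talking conditions call for more robust methods.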

Speech recognition

Speech recognition refers to the process of translating an input speech signal, obtained from a microphone or telephone, into a word or a sentence. The recognition results can be used to command or control a system, or they can serve as input to a system that understands speech. Speech recognition technology thus enables people to communicate more naturally with computers, and it has become an essential technology in everyday life. Speech recognition systems can be categorized by speaker into speaker-dependent, speaker-independent, and speaker-adaptive systems. According to the type of input utterance, they can be categorized into isolated word recognition, keyword recognition, connected word recognition, and continuous speech recognition systems. There are also word recognizers, which recognize speech in units of words, and variable-vocabulary word recognizers, which recognize speech in units of phones. Applications include speech-controlled machines and computers, speech guidance systems, car navigation systems, and home automation systems.

Speaker recognition 

Speaker, or voice, recognition is a biometric modality that uses an individual's voice for recognition purposes. (It is a different technology from "speech recognition", which recognizes words as they are articulated and is not a biometric.) The speaker recognition process relies on features influenced by both the physical structure of an individual's vocal tract and the behavioral characteristics of the individual.

Speaker adaptation

In speech recognition, the environment in which a recognition system is used often differs considerably from the environment in which it was trained. Speaker adaptation is a technology that compensates for the mismatch between the training speakers and the actual user. To perform speaker adaptation, the models of a speech recognition system are adapted using the maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) criterion. With adaptation, the speech recognition system performs more reliably for the user.
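
As a rough illustration of the MAP idea, the sketch below adapts the mean of a single Gaussian toward a speaker's data. It assumes every adaptation frame belongs to that Gaussian (in an HMM-based system the counts would be soft occupancy probabilities), and the function name and the prior weight `tau` are hypothetical choices for the example.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP update of a Gaussian mean.

    new_mean = (tau * prior_mean + sum of frames) / (tau + number of frames)

    tau controls how strongly the speaker-independent prior is trusted:
    with little adaptation data the result stays near prior_mean, and with
    a lot of data it approaches the sample mean of the speaker's frames.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]

# Speaker-independent mean and ten (identical, synthetic) adaptation frames.
prior = [0.0, 0.0]
data = [[1.0, 2.0]] * 10
adapted = map_adapt_mean(prior, data, tau=10.0)
```

With `tau` equal to the number of frames, the adapted mean lies halfway between the prior mean and the sample mean of the adaptation data.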

Keyword spotting

Keyword spotting is the process of recognizing, in input speech, a word or phrase that is of interest to the user. Keyword spotting was originally approached using dynamic programming; later systems are based on hidden Markov models, neural networks, and other methods. Because it recognizes only the relevant parts of the input rather than all of it, keyword spotting can be used in areas such as information retrieval and classification.
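
The dynamic programming approach can be illustrated with dynamic time warping (DTW), which aligns a keyword template against a segment of the input. The sketch below is a simplification under stated assumptions: it uses one-dimensional feature values instead of real acoustic feature vectors, a fixed-width sliding window, and hypothetical function names and threshold.

```python
def dtw_distance(template, segment):
    """Accumulated DTW cost between two sequences of scalar features."""
    inf = float("inf")
    rows, cols = len(template), len(segment)
    # cost[i][j] = best cost of aligning template[:i] with segment[:j].
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(template[i - 1] - segment[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # template advances
                                     cost[i][j - 1],      # segment advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[rows][cols]

def spot_keyword(template, stream, threshold):
    """Return window start positions whose DTW cost is within threshold."""
    width = len(template)
    hits = []
    for start in range(len(stream) - width + 1):
        window = stream[start:start + width]
        if dtw_distance(template, window) <= threshold:
            hits.append(start)
    return hits

keyword = [1.0, 3.0, 2.0]
stream = [0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0]
hits = spot_keyword(keyword, stream, threshold=0.5)
```

Because DTW allows frames to be repeated during alignment, a template still matches an utterance of the same word spoken at a different speed.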

Music signal processing

The technologies in this area include music summarization, musical instrument identification, music recommendation, music genre classification, speech/music discrimination, and mood classification. They are based on digital signal processing (DSP), pattern recognition, and artificial intelligence, combined with practical knowledge of music.

Audio indexing & retrieval

Audio indexing & retrieval have emerged as research topics with the development of the Internet. A large amount of data, including audio data, is currently not indexed by web search engines. In this context, audio indexing consists of finding good descriptors of audio documents that can be used as indexes for archiving and search: speech/music segments, speaker identity, language, keywords, key sounds, and so on. Audio retrieval means searching audio data using the kinds of indexes mentioned above.
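
Once descriptors have been extracted, indexing and retrieval can be pictured as an inverted index from descriptors to documents. The sketch below assumes the descriptors are already available as plain strings; the file names, descriptor labels, and function names are made up for illustration.

```python
from collections import defaultdict

def build_index(documents):
    """Map each descriptor to the set of audio documents that carry it."""
    index = defaultdict(set)
    for doc_id, descriptors in documents.items():
        for descriptor in descriptors:
            index[descriptor].add(doc_id)
    return index

def search(index, *descriptors):
    """Return the documents that match every requested descriptor."""
    if not descriptors:
        return []
    matches = [index.get(d, set()) for d in descriptors]
    return sorted(set.intersection(*matches))

# Hypothetical descriptors produced by earlier analysis stages.
documents = {
    "news_01.wav": {"speech", "lang:en", "keyword:weather"},
    "song_07.wav": {"music", "lang:en"},
    "talk_03.wav": {"speech", "lang:ko", "keyword:weather"},
}
index = build_index(documents)
results = search(index, "speech", "keyword:weather")
```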

Multi-modal interface

With advances in speech, image, and video technology, human-computer interaction (HCI) is entering a new phase. In recent years, HCI has been extended to human-machine communication (HMC) and the perceptual user interface (PUI). The ultimate goal of HMC is for communication between humans and machines to resemble human-to-human communication. Moreover, the machine can support human-to-human communication (e.g. an interface for the disabled). For this reason, various aspects of human communication must be considered in HMC. The HMC interface, called a multimodal interface, includes different types of input methods, such as natural language, gestures, faces, and handwritten characters.

Multimedia contents search

Multimedia contents search is a research topic that has received attention recently as the demand for searching multimedia has increased. It searches a variety of multimedia documents using images, speech, and other media. In our lab, we are interested in spoken document search of the kind used in search engines such as Google, Yahoo, and Nate. Its applications include voice-based digital photo album management and dialog classification of ARS agent speech.

Recommended Courses

Ph.D. courses

- Pattern recognition theory
- Optimization theory
- Adaptive signal processing
- Digital image processing
- Special topics on DSP
- Special topics on speech processing
- Other computer & software-related courses
- Thesis course

M.S. courses

- Probability and random process
- Digital signal processing
- Speech signal processing
- Speech & Audio coding theory
- Speech recognition system
- Machine learning
- Thesis course