An Opportunity to Grow as a Speech Recognition AI Expert, Made Possible by Smilegate

2024-03-08

Winner of the Grand Prize at the 2023 Korean Language AI Competition

Junggyun Park | Master’s Program in DHE, Sogang University Graduate School

Next-generation AI experts supported by Smilegate have achieved another meaningful milestone. Junggyun Park, a master’s student in the “Digital Human & Entertainment” (DHE) track—jointly operated by the Smilegate AI Center and the Graduate School of AI Convergence at Sogang University—was awarded the grand prize at the 2023 Korean Language Artificial Intelligence Competition.

Smilegate AI Center partnered with Sogang University in September 2021 to nurture AI talent through an industry-academic cooperation agreement. The DHE track was newly created with the goal of cultivating future leaders in AI and enabling collaboration between companies and universities to research and commercialize innovative AI technologies. This marks the first time a DHE student has won the competition’s top prize. We spoke with Park about his aspirations as a next-generation AI expert and the unique strengths of the DHE program.

DHE Track Student Wins Grand Prize at the 2023 Korean Language AI Competition

With the growing attention on generative AI, speech recognition technology is also in the spotlight. It is a core capability that allows AI to understand natural language more quickly and accurately—an essential step toward effective problem-solving. The competition aims to discover and support outstanding professionals in Korean-language speech recognition.

Park attracted attention with a speech recognition AI model specifically designed for the elderly and children—groups often classified as “information-vulnerable.” Most existing models are trained on data from average adults, leaving a gap in accuracy for these underrepresented populations. To address this, Park extracted relevant data subsets and fine-tuned a speech recognition model to suit their needs. As a result, his model significantly reduced the character error rate (CER) and word error rate (WER), earning high marks from judges.

*Fine-tuning: The process of further training a pre-trained AI model on new data to improve its performance on a specific task, which takes far less time than training a model from scratch.
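
For readers unfamiliar with the two metrics mentioned above, both are commonly computed as an edit distance between the reference transcript and the model's output, divided by the length of the reference. The following is a minimal sketch of that calculation, not the competition's scoring code. The function names, the example sentences, and the choice to ignore spaces when computing CER are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (counts substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces are ignored; some evaluations count them instead."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    reference = "the cat sat"
    hypothesis = "the cat sit"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Lower values are better for both metrics. In this toy example a single wrong word out of three gives a WER of one third, while the single wrong character out of nine gives a CER of one ninth.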

“My research focus in the DHE program is speech recognition. Most models use data from average individuals, so there’s limited research and data for the elderly or young children. This competition presented an opportunity to address that issue. Despite the data constraints, I applied the Whisper model and achieved strong results. I had participated in the same category last year and received a commendation award, so I’m thrilled to have won the top prize this time.”

*Whisper: An open-source speech recognition model developed by OpenAI, the company known for ChatGPT.

Using Speech Recognition to Support the Elderly, Children, and People with Disabilities

Park is especially focused on “audiovisual speech recognition,” which improves accuracy in noisy environments by analyzing lip movements in addition to sound. This multimodal approach, using both audio and video input, enhances recognition performance.

*Multimodal: An approach that processes information from multiple input channels, such as visual and auditory cues.

More recently, Park has been exploring “context-aware speech recognition.” This approach collects data for specific situations and trains the model on it, enabling the system to better understand context. For instance, recognizing that a user is in a seminar allows the AI to adapt its language model accordingly, since vocabulary differs significantly between casual and professional settings.
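
One simple way to picture the seminar example is to re-rank a recognizer's candidate transcripts so that words expected in the current setting get a small boost. The sketch below illustrates that general idea only and is not Park's method. The vocabularies, the scores, the `bonus` value, and the function name `pick_transcript` are all hypothetical.

```python
# Hypothetical per-context vocabularies (illustrative only).
CONTEXT_VOCAB = {
    "seminar": {"gradient", "transformer", "dataset"},
    "casual": {"lunch", "movie", "weekend"},
}


def pick_transcript(candidates, context, bonus=0.5):
    """Choose the best transcript from (text, score) pairs, where a higher
    score means the recognizer considered the text more likely.
    Each word found in the context vocabulary adds `bonus` to the score."""
    vocab = CONTEXT_VOCAB.get(context, set())

    def adjusted(item):
        text, score = item
        hits = sum(1 for word in text.lower().split() if word in vocab)
        return score + bonus * hits

    return max(candidates, key=adjusted)[0]


if __name__ == "__main__":
    # Two acoustically similar candidates with made-up scores.
    candidates = [
        ("the great in descent converged", -4.0),
        ("the gradient descent converged", -4.2),
    ]
    print(pick_transcript(candidates, "seminar"))
    print(pick_transcript(candidates, "casual"))
```

With the seminar context, the candidate containing “gradient” overtakes the one the recognizer originally preferred. With the casual context, no word is boosted and the original ranking stands.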

“Speech recognition models will benefit a wide range of industries in the future. Context-aware speech recognition offers personalized services. It can be applied to AI assistants, professional fields, and service industries. I also believe that audiovisual speech recognition will be valuable in kiosk environments. Current kiosks rely on touch, but with robust speech recognition less affected by noise, people could use kiosks by voice even in loud areas. This could greatly help seniors, children, and visually impaired individuals.”

DHE Program Opened the Door to Becoming a Speech Recognition Expert

Park originally majored in journalism and broadcasting. As a non-STEM major, he was able to transition into AI development thanks to the DHE program.

“After deciding to pursue development, I wasn’t sure which direction to take. I learned about Sogang University’s new AI Interdisciplinary Major and enrolled in the DHE track. The curriculum offered a deep dive into both foundational topics like optimization theory and specialized areas such as speech recognition. As someone without a technical background, it was a huge help.”

He highlighted DHE’s strengths, including personalized research mentorship and hands-on industry-linked projects in areas like generative AI and deep learning for creative work. The field internship with Smilegate AI Center, he added, gave students valuable exposure to commercializing AI technologies.

“What I learned in the DHE program was incredibly helpful in preparing for the competition. The program itself has many benefits. The chance to exchange ideas with other labs, classmates, and peers is very meaningful. These interactions often lead to interdisciplinary projects, and internships provide opportunities to quickly test ideas in real-world settings. You also gain a clear understanding of what kinds of AI technologies are needed in industry. In the future, I want to develop AI models that are not only useful to professionals but also accessible to the public in their daily lives.”

All content published on the Smilegate Newsroom is available for use by the media.
However, when quoting content in articles, please credit it as “Smilegate Newsroom.”