News

Online presentation information:

If you are an Oral/Poster presenter and cannot attend the conference in person due to visa application issues, you must prepare a pre-recorded video (10 minutes) and send the download link to sarc@nycu.edu.tw before October 10th, 2024. In addition, please stay online during your presentation session on the online meeting platform (Cisco Webex) for Q&A.

Poster preparation information:

The poster size is A0 (118.9 cm x 84.1 cm) and the layout is portrait. All poster presenters can send the PDF file of their final poster to sarc@nycu.edu.tw before October 10th, 2024. We will print it and mount it on the poster stands free of charge.

Program information has been updated.

VISA information. If you need an invitation letter or any other assistance, contact the conference organizers by email at sarc@nycu.edu.tw.

Registration. Each accepted paper needs at least a Regular registration. One Regular registration can cover a maximum of one paper.

Welcome to O-COCOSDA 2024!

O-COCOSDA 2024: The 27th International Conference of the Oriental COCOSDA will take place from October 17 to 19, 2024, in Hsinchu, Taiwan. Hosted once again by the National Yang Ming Chiao Tung University (NYCU), this conference will be held in person. We kindly encourage your physical attendance to fully experience and engage with the event.

Oriental COCOSDA (O-COCOSDA) is the Oriental chapter of COCOSDA, the International Committee for the Coordination and Standardisation of Speech Databases and Assessment Techniques, and was established in 1997. Its purpose is to exchange ideas, share information, and discuss regional matters concerning the creation, utilization, and dissemination of spoken language corpora of oriental languages and the assessment methods of speech recognition/synthesis systems, as well as to promote speech research on oriental languages.

The annual Oriental COCOSDA international conference is the flagship conference of Oriental COCOSDA. The first preparatory meeting was held in Hong Kong in 1997, and the past 26 workshops were held in Japan, Taiwan, China, Korea, Thailand, Singapore, India, Indonesia, Malaysia, Vietnam, Japan, China, Nepal, Taiwan, Macau, India, Thailand, China, Malaysia, Korea, Japan, Philippines, Myanmar, Singapore, Vietnam, and India. The 27th Oriental COCOSDA Conference returns to Taiwan and will be held on October 17-19, 2024, in Hsinchu, hosted again by the National Yang Ming Chiao Tung University (NYCU).

The organizers of O-COCOSDA 2024 invite all researchers, practitioners, industry partners and sponsors to join the conference. The O-COCOSDA venue provides a regular forum for the presentation of international cooperation in developing speech corpora and coordinating assessment methods of speech input/output systems for both academic and industry researchers. Continuing a series of 26 successful previous meetings, this conference spans the research interest areas of database development and assessment methods. We thank you for your support and look forward to welcoming you to the conference. Stay safe and healthy!

We are pleased to inform you that the conference will be held in person. Your physical attendance is highly encouraged so that you can experience the event in full. However, if you encounter special circumstances (e.g., issues with visa applications), please contact the organizers to seek approval for alternative arrangements.

Program

Day 1: 10/17 (Thu.)

  • 08:45-09:00 Opening Ceremony
  • 09:00-10:00 Keynote Speech #1: Prof. Jen-Tzung Chien
  • 10:00-10:30 Coffee Break
  • 10:30-11:45 Oral Session #1: Speech Corpora
  • 11:45-13:00 Lunch Break
  • 13:00-14:00 Oral Session #2: Speech Corpora
  • 14:00-15:00 Poster Session #1: Language Acquisition, Pronunciation, and Speech Learning
  • 15:00-15:30 Coffee Break
  • 15:30-16:30 Oral Session #3: Speech Generation
  • 16:30-17:30 Poster Session #2: Speech Technology, Machine Learning, and AI in Speech Processing
  • 18:30-20:30 Welcome Reception

Day 2: 10/18 (Fri.)

  • 09:00-10:00 Keynote Speech #2: Prof. Chin-Hui Lee
  • 10:00-10:30 Coffee Break
  • 10:30-11:45 Oral Session #4: Best Paper Candidates
  • 11:45-13:00 Lunch Break
  • From 13:00 SanXia+YingGe Cultural Tour
  • 18:30-20:30 Banquet

Day 3: 10/19 (Sat.)

  • 09:00-10:00 Keynote Speech #3: Prof. Satoshi Nakamura
  • 10:00-10:30 Coffee Break
  • 10:30-11:45 Country Report
  • 11:45-12:15 Closing Ceremony

Oral Session #1
Time: Thursday, October 17, 2024, 10:30-11:45

Speech Corpora

10:30-10:45
Check Your Audio Data: Nkululeko for Bias Detection

Felix Burkhardt, Bagus Tris Atmaja, Anna Derington, Florian Eyben, and Bjoern Schuller

10:45-11:00
CAO Robot for Taiwanese/English Knowledge Graph Application

Chang-Shing Lee, Mei-Hui Wang, Guan-Ying Tseng, Chao-Cyuan Yue, Hao-Chun Hsieh, and Marek Z Reformat

11:00-11:15
Instant-EMDB: A Multi Model Spontaneous English and Malayalam Speech Corpora For Depression Detection

Reni K Cherian, Chiranjeevi Yarra, Anjali Mathew, Raniya M, Harsha Sanjan, Amjith S B, Alex Starlet Ben, and Priyanka Srivastava

11:15-11:30
Chinese Psychological Counseling Corpus Construction for Valence-Arousal Sentiment Intensity Prediction

Hsiu-Min Shih, Tzu-Mi Lin, Yu-Wen Tzeng, Jung-Ying Chang, Kuo-Kai Shyu, and Lung-Hao Lee

11:30-11:45
UCSYSpoof: A Myanmar Language Dataset for Voice Spoofing Detection

Hay Mar Soe Naing, Win Pa Pa, Aye Mya Hlaing, Myat - Aye Aye Aung, Kasorn Galajit, and Candy Olivia Mawalim

Oral Session #2
Time: Thursday, October 17, 2024, 13:00-14:00

Speech Corpora

13:00-13:15
Speech Watermarking for Tampering Detection using Singular Spectrum Analysis with Quantization Index Modulation and Psychoacoustic Model

Pantarat Vichathai, Puchit Bunpleng, Patharapol Laolakkana, Sasiporn Usanavasin, Phondanai Khanti, Kasorn Galajit, and Jessada Karnjana

13:15-13:30
IIITSaint-EmoMDB: Carefully Curated Malayalam Speech Corpus With Emotion And Self-reported Depression Ratings

Reni K Cherian, Talit Sara George, Chiranjeevi Yarra, Priyanka Srivastava, Guneesh Vats, Christa Thomas, Aravind Johnson, and Ashin George

13:30-13:45
CL-CHILD Corpus: The Phonological Development of Putonghua in Children from Dialect-speaking Regions

Jiewen Zheng, Tianxin Zheng, and Mengxue Cao

13:45-14:00
WikiTND24: A Chinese Text Normalization Database

Wu-Hao Li and Chen Yu Chiang

Oral Session #3
Time: Thursday, October 17, 2024, 15:30-16:30

Speech Generation

15:30-15:45
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka

Li-Wei Chen, Hung-Shin Lee, and Chen-Chi Chang

15:45-16:00
Learning Contrastive Emotional Nuances in Speech Synthesis

Bryan Gautama Ngo, Mahdin Rohmatillah, and Jen-Tzung Chien

16:00-16:15
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and BERT LID

Ahmad Alfani Handoyo, Chung Quang Tran, Dessi Puji Lestari, and Sakriani Sakti

16:15-16:30
Exemplar-based Methods For Mandarin Electrolaryngeal Speech Voice Conversion

Hsin-Te Hwang, Chia-Hua Wu, Ming-Chi Yen, Yu Tsao, and Hsin-Min Wang

Oral Session #4
Time: Friday, October 18, 2024, 10:30-11:45

Best Paper Candidates

10:30-10:45
Proposal of Protocols For Speech Materials Acquisition And Presentation Assisted By Tools Based On Structured Test Signals

Hideki Kawahara, Ken-Ichi Sakakibara, Mitsunori Mizumachi, and Kohei Yatabe

10:45-11:00
Exploring Impact of Prioritizing Intra-Singer Acoustic Variations on Singer Embedding Extractor Construction for Singer Verification

Sayaka Toma, Ariga Tomoki, Yosuke Higuchi, Ichiju Hayasaka, Rie Shigyo, and Tetsuji Ogawa

11:00-11:15
A Feedback-driven Self-improvement Strategy And Emotion-aware Vocoder For Emotional Voice Conversion

Zhang Zhanhang and Sakriani Sakti

11:15-11:30
ConvCounsel: A Conversational Dataset for Student Counseling

Po-Chuan Chen, Mahdin Rohmatillah, You Teng Lin, and Jen-Tzung Chien

11:30-11:45
Construction of Large Language Models for Taigi and Hakka Using Transfer Learning

Yen-Chun Lai, Yi-Jun Zheng, Wen-Han Hsu, Yan-Ming Lin, Cheng-Hsiu Cho, Chao-Shih Huang, Chih-Chung Kuo, and Yuan-Fu Liao

Poster Session #1
Time: Thursday, October 17, 2024, 14:00-15:00

Language Acquisition, Pronunciation, and Speech Learning

P1-1
A Parameter-efficient Multi-step Fine-tuning of Multilingual And Multi-task Learning Model for Japanese Dialect Speech Recognition

Yuta Kamiya, Shogo Miwa, and Atsuhiko Kai

P1-2
A Study on The Acquisition of Triphthong Vowels by Altaic Chinese Learners Under The ‘Belt and Road’ Initiative

Yuan Jia and Linjiao Pan

P1-3
Acoustic Realization of /s/ Across Accents Of Urdu

Iram Fatima and Sahar Rauf

P1-4
Age-related and Gender-related Differences in Cantonese Vowels

Wai-Sum Lee

P1-5
An Investigation of Chinese Speech Under Alcohol Influence: Database Construction and Phonetic Analysis

Peppina Po-Lun Lee, Mosi He, and Bin Li

P1-6
Analysis of Pathological Features for Spoof Detection

Win Pa Pa, Myat - Aye Aye Aung, Hay Mar Soe Naing, Aye Mya Hlaing, Kasorn Galajit, and Candy Olivia Mawalim

P1-7
Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture

Chen-Chi Chang, Ching-Yuan Chen, Hung-Shin Lee, and Chi-Cheng Lee

P1-8
Clapping Hands To Word Stress Improves Children’s L2 English Pronunciation Accuracy in A Word Imitation Task: Evidence from A Classroom Study

Chen Meiyun

P1-9
Comparative Study on The Phonetic Characteristics of Chinese Vowels Between Kyrgyz and Kirgiz Learners

Yuan Jia and Mingshuai Yin

P1-10
Computer-assisted Pronunciation Training System for Atayal, An Indigenous Language in Taiwan

Yu-Lan Chuang, Hsiu-Ray Hsu, Di Tam Luu, Yi-Wen Liu, and Ching-Ting Hsin

P1-11
Continual Gated Adapter for Bilingual Codec Text-to-speech

Li-Jen Yang and Jen-Tzung Chien

P1-12
Continual Learning in Machine Speech Chain Using Gradient Episodic Memory

Geoffrey Tyndall, Kurniawati Azizah, Dipta Tanaya, Ayu Purwarianti, Dessi Puji Lestari, and Sakriani Sakti

P1-13
Developing A Robust Mispronunciation Detection by Data Augmentation Based on Automatic Phone Annotation

Jong In Kim, Sunhee Kim, and Minhwa Chung

P1-14
Developing A Thai Name Pronunciation Dictionary from Road Signs and Naming Websites

Ausdang Thangthai

P1-15
Development of An English Oral Assessment System with The GEPT Dataset

Hao Chien Lu, Chung-Chun Wang, Jacob Lin, and Berlin Chen

P1-16
Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities

Aulia Adila, Dessi Puji, Ayu Purwarianti, Dipta Tanaya, Kurniawati Azizah, and Sakriani Sakti

P1-17
Enhancing Phoneme Recognition in The Bengali Language Through Fine-tuning of Multilingual Model

Akash Deep, Puja Bharati, Sabyasachi Chandra, Debolina Pramanik, and Shyamal Kumar Das Mandal

P1-18
Exploration of Mongolian Word Stress Research Methods Based on Intonation Synthesis Technology

Ao Min

P1-19
Fusion of Multiple Audio Descriptors for The Recognition of Dysarthric Speech

Komal Bharti and Pradip K. Das

P1-20
Gated Adapters with Balanced Activation for Effective Contextual Speech Recognition

Yu-Chun Liu, Yi-Cheng Wang, Li-Ting Pai, Jia-Liang Lu, and Berlin Chen

P1-21
Improving Speech Recognition by Enhancing Accent Discrimination

Hao-Tian Zheng and Berlin Chen

P1-22
Research on the Temporal Effect of Focus on Trisyllabic Sequences in Leizhou Min

Maolin Wang

P1-23
Right-prominent Trisyllabic Tone Sandhi in Taifeng Chinese

Xiaoyan Zhang, Aijun Li, and Zhiqiang Li

P1-24
The Development of LOTUS-TRD: A Thai Regional Dialect Speech Corpus

Sumonmas Thatphithakkul, Kwanchiva Thangthai, and Vataya Chunwijitra

P1-25
The Effectiveness of Audio-visual Feedback for L2 Chinese Sentence Stress Perception And Production

Xingzi Gao, Yujie Gao, and Sichang Gao

P1-26
Unified Spoken Language Proficiency Assessment System

Sunil Kumar Kopparapu and Ashish Panda

P1-27
Using Automatic Speech Recognition for Speech Comprehension Evaluation in The Cochlear Implant

Hsin-Li Chang, Enoch Hsin-Ho Huang, Yi-Ching Wang, and Yu Tsao

Poster Session #2
Time: Thursday, October 17, 2024, 16:30-17:30

Speech Technology, Machine Learning, and AI in Speech Processing

P2-1
A Deep Learning Based Approach with Data Augmentation For Infant Cry Sound Verification

Namita Nagappa Gokavi, Sri Ramulu Padala, Nanda Kishore Kandregula, Sunil Saumya, and Deepak T

P2-2
A Preliminary Study on End-to-end Multimodal Subtitle Recognition for Taiwanese TV Programs

Pei-Chung Su, Cheng-Hsiu Cho, Chih-Chung Kuo, Yen-Chun Lai, Yan-Ming Lin, Chao-Shih Huang, and Yuan-Fu Liao

P2-3
A Preliminary Study on Taiwanese POS Taggers: Leveraging Chinese in The Absence Of Taiwanese POS Annotation Datasets

Chao-Yang Chang, Yan-Ming Lin, Chih-Chung Kuo, Yen-Chun Lai, Chao-Shih Huang, and Yuan-Fu Liao

P2-4
Agent-Driven Large Language Models for Mandarin Lyric Generation

Hong-Hsiang Liu and Yi-Wen Liu

P2-5
An Evaluation of Neural Vocoder-based Voice Cloning System for Dysphonia Speech Disorder

Dhiya Ulhaq Dewangga, Dessi Puji, Ayu Purwarianti, Dipta Tanaya, Kurniawati Azizah, and Sakriani Sakti

P2-6
An N-best List Selection Framework for ASR N-best Rescoring

Chen-Han Wu and Kuan-Yu Chen

P2-7
Analysis and Detection of Differences in Spoken User Behaviors Between Autonomous and Wizard-of-Oz Systems

Mikey Elmers, Koji Inoue, Divesh Lala, Keiko Ochi, and Tatsuya Kawahara

P2-8
Analysis and Discussion of Feature Extraction Technology for Musical Genre Classification

Shu-Hua Chen, Wei-Ting Huang, Cheng-Hao Lai, Yu-Lun Lin, and Ming-Hsiang Su

P2-9
Annotation of Addressing Behavior in Multi-party Conversation

Keisuke Kadota, Seima Oyama, and Yasuharu Den

P2-10
Benchmarking Clickbait Detection from News Headlines

Ying Lung Lin, Liang Chih Yu, and Shao Ying Lu

P2-11
Chunk Size Scheduling for Optimizing The Quality-latency Trade-off in Simultaneous Speech Translation

Iqbal Pahlevi Amin, Haotian Tan, Kurniawati Azizah, and Sakriani Sakti

P2-12
Comprehensive Benchmarking and Analysis of Open Pre-trained Thai Speech Recognition Models

Pattara Tipakasorn, Oatsada Chatthong, Ren Yonehana, and Kwanchiva Thangthai

P2-13
Depression Classification Using Log-mel Spectrograms: A Comparative Analysis of Window Size-based Data Augmentation and Deep Learning Models

Lokesh Kumar, Kumar Kaustubh, Shashaank Aswatha Mattur, and Mahadeva Prasanna

P2-14
Effects of Multiple Japanese Datasets for Training Voice Activity Projection Models

Yuki Sato, Yuya Chiba, and Ryuichiro Higashinaka

P2-15
Exploring Branchformer-based End-to-end Speaker Diarization with Speaker-wise VAD Loss

Pei-Ying Lee, Hau-Yun Guo, Tien-Hong Lo, and Berlin Chen

P2-16
IIIT-speech Twins 1.0: An English-Hindi Parallel Speech Corpora for Speech-to-speech Machine Translation And Automatic Dubbing

Anindita Mondal, Anil Vuppala, and Chiranjeevi Yarra

P2-17
Improving Real-Time Music Accompaniment Separation with MMDenseNet

Chun-Hsiang Wang, Chung-Che Wang, Jun-You Wang, Roger Jang, and Yen-Hsun Chu

P2-18
Infant Cry Verification With Multi-view Self-attention Vision Transformers

Kartik Jagtap, Namita Nagappa Gokavi, and Sunil Saumya

P2-19
Modeling Response Relevance Using Dialog Act and Utterance-design Features: A Corpus-based Analysis

Mika Enomoto, Yuichi Ishimoto, and Yasuharu Den

P2-20
Multi-resolution Singing Voice Separation

Yih-Liang Shen, Ya-Ching Lai, and Tai-Shih Chi

P2-21
Multilingual Speech Translator for Medical Consultation

Zhe-Jia Xu, Yeou-Jiunn Chen, and Qian-Bei Hong

P2-22
Overcoming The Impact Of Different Materials On Optical Microphones for Speech Capture Using Deep Learning Integrated Training Data

Yi-Hao Jiang, Jia-Hui Li, Jia-Wei Chen, Yi-Chang Wu, and Ying-Hui Lai

P2-23
Robust Audio-visual Speech Enhancement: Correcting Misassignments in Complex Environments With Advanced Post-processing

Wenze Ren, Kuo-Hsuan Hung, Rong Chao, You-Jin Li, Hsin-Min Wang, and Yu Tsao

P2-24
Singer Separation for Karaoke Content Generation

Hsuan-Yu Lin

P2-25
Uncertainty-based Ensemble Learning for Speech Classification

Bagus Tris Atmaja, Akira Sasou, and Felix Burkhardt

P2-26
A Neural Machine Translation System for The Low-resource Sixian Hakka Language

Yi-Hsian Hung and Yi-Chin Huang

Keynote Speakers

Jen-Tzung Chien


Learning Towards Generative and Conversational AI
Speaker: Prof. Jen-Tzung Chien

Lifetime Chair Professor, National Yang Ming Chiao Tung University

Biography:

Jen-Tzung Chien received his Ph.D. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 1997, and is currently the Lifetime Chair Professor at National Yang Ming Chiao Tung University, Hsinchu, Taiwan. He has authored more than 250 peer-reviewed articles in machine learning, deep learning, and Bayesian learning with applications to speech and natural language processing, and three books: Bayesian Speech and Language Processing (Cambridge University Press, 2015), Source Separation and Machine Learning (Academic Press, 2018), and Machine Learning for Speaker Recognition (Cambridge University Press, 2020). He has been a Tutorial Speaker at AAAI, IJCAI, ACL, KDD, ICASSP, COLING, and Interspeech. He received the Best Paper Award at the IEEE Workshop on Automatic Speech Recognition and Understanding in 2011 and at the IEEE International Workshop on Machine Learning for Signal Processing in 2023.

Abstract:

Spoken dialogue systems have become crucial for building a wide range of virtual assistants for customer service, entertainment, and health. These systems are composed of various components, including automatic speech recognition, text-to-speech, and natural language generation, which involve delicate multimodal machine learning towards multilingual generative models. This talk will focus on state-of-the-art generative models in individual components and address how pre-trained foundation models are utilized to re-shape the architecture via adapters, re-function the foundation via prompting, and re-program the dialogue via flow control. We will also explore the challenges and opportunities in learning and integrating these components into a comprehensive conversation system.

Chin-Hui Lee


Language-Universal Speech Processing: Lessons learned from ASAT and Large Pre-trained Models with Extensions to Multilingual ASR

Speaker: Prof. Chin-Hui Lee

IEEE & ISCA Fellow, Georgia Tech

Biography:

Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as the Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published 30 patents and about 600 papers, with more than 55,000 citations and an h-index of 80 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award for speech recognition products in 1998. He won the SPS's 2006 Technical Achievement Award for “Exceptional Contributions to the Field of Automatic Speech Recognition”. In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal in Scientific Achievement for “pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition”. His two pioneering papers on deep regression have accumulated over 2200 citations and won a Best Paper Award from the IEEE Signal Processing Society in 2019.

Abstract:

With recent advances in deep neural networks and large pre-trained models, the baseline performance of automatic speech recognition (ASR) for resource-rich languages has improved a great deal. However, only a few applications have been deployed in our daily life. Part of the reason is that past black-box approaches to ASR did not leverage speech knowledge sources, resulting in unsatisfactory recognition results in many situations. On the other hand, knowledge-based approaches, such as automatic speech attribute transcription (ASAT), were not practiced in the machine-learning community because of the difficulty of integrating speech knowledge into building ASR systems. Since speech attributes are usually language-universal, they serve as an ideal set of fundamental units for building speech models. They also share common distinctive features among different languages, so that good models can be established for speech processing to detect the speech cues needed to correct unexpected ASR results. In this talk, we will discuss ways the O-COCOSDA community can contribute to developing robust multilingual ASR systems for many resource-limited languages in this region.

Satoshi Nakamura


Recent trends in speech translation
Speaker: Prof. Satoshi Nakamura

Professor, Chinese University of Hong Kong, Shenzhen

Biography:

Dr. Satoshi Nakamura is a full professor at The Chinese University of Hong Kong, Shenzhen. He is also a professor emeritus at Nara Institute of Science and Technology (NAIST) and Honorarprofessor of Karlsruhe Institute of Technology, Germany. He received his B.S. from Kyoto Institute of Technology in 1981 and his Ph.D. from Kyoto University in 1992. He was an Associate Professor in the Graduate School of Information Science at NAIST from 1994 to 2000. He was Department Head and Director of ATR Spoken Language Communication Research Laboratories in 2000-2004 and 2005-2008, respectively, and Vice President of ATR in 2007-2008. He was Director General of Keihanna Research Laboratories and the Executive Director of Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, Japan, from 2009 to 2010. He moved to NAIST as a full professor in 2011. He established the Data Science Center at NAIST and served as its director from 2017 to 2021. He also served as a team leader of the Tourism Information Analytics Team at the AIP center of RIKEN Institute, Japan, from 2017 to 2021. His research interests include modeling and systems of spoken language processing, speech processing, spoken language translation, spoken dialog systems, natural language processing, and data science. He is one of the world leaders in speech-to-speech translation research. He has been involved in various speech-to-speech translation research projects, including C-Star, A-Star, and the International Workshop on Spoken Language Translation (IWSLT). He is currently the chairperson of ISCA SIG SLT (Spoken Language Translation). He also contributed to the standardization of network-based speech translation at ITU-T. He was a committee member of IEEE SLTC from 2016 to 2018. He was an Elected Board Member of the International Speech Communication Association (ISCA) from 2012 to 2019. He received the Antonio Zampolli Prize in 2012 and holds the titles of IEEE Fellow, ISCA Fellow, IPSJ Fellow, and ATR Fellow.

Abstract:

After many years of research, speech translation technology has reached the level at which it can be provided as a service on a smartphone. However, there are still various problems in realizing automatic simultaneous interpretation, which produces the interpretation output before the end of the utterance. In this talk, I will introduce recent research activities on automatic simultaneous speech translation and the simultaneous speech translation system developed for the IWSLT shared tasks. The talk also covers research on speech translation that preserves para-linguistic information and on the use of pre-trained Large Language Models.

SanXia+YingGe Cultural Tour

Registration

Registration is officially open

The Full Registration fee covers the following:

  • Admission to all conference sessions
  • Lunch, plus morning and afternoon coffee breaks (with light snacks)
  • Admission to the Welcome Reception on Thursday, 17th of October located at the conference venue
  • A half day cultural tour on Friday, 18th of October.
  • Admission to the Banquet on Friday, 18th of October located at the Ambassador Hsinchu
  • Please note: accommodation and transportation to/from the airport are not included.
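
Registration fees by period: Early (Sept. 1 - Sept. 30), Late (Oct. 1 - Oct. 10), Onsite (Oct. 17-19).

  • Regular, Member: NTD 10,000 (Early), NTD 11,500 (Late), NTD 13,000 (Onsite)
  • Regular, Non-Member: NTD 13,000 (Early), NTD 14,500 (Late), NTD 16,000 (Onsite)
  • Student, Member: NTD 5,000 (Early), NTD 6,500 (Late), NTD 8,000 (Onsite)
  • Student, Non-Member: NTD 6,500 (Early), NTD 8,000 (Late), NTD 9,500 (Onsite)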

Notes:

  1. One USD is approximately equal to 33 NTD.
  2. Each accepted paper needs at least a Regular registration. One Regular registration can cover a maximum of one paper.
  3. Member fees apply to participants who hold a current IEEE/ACLCLP/ISCA membership.
  4. Participants from least developed countries will be provided with a discounted fee. If you require financial assistance, please contact sarc@nycu.edu.tw before registering.
  5. Although O-COCOSDA 2024 is planned to be held in person, we understand that there may be special circumstances (such as visa issues). In such cases, please feel free to discuss with sarc@nycu.edu.tw to arrange alternative means of presenting your paper (such as online with the corresponding fees).

Cancellation Policy

Registrations cancelled one month or more prior to the conference (September 17th, 2024, or before) will be refunded the full amount paid (minus up to NTD 500 processing fee). Registrations cancelled up to two weeks prior to the conference (October 3rd, 2024, or before) will be refunded half of the amount paid (minus up to NTD 500 processing fee). Registrations cancelled less than two weeks prior to the start of the conference (October 4th, 2024, or after) and all no-shows will not be refunded.

Notes:

  1. Registration fees can be transferred to an alternative participant. Please feel free to contact sarc@nycu.edu.tw for this request if needed.
  2. Cancellation due to governmental travel restrictions, failure to obtain a visa, or serious illness that prevents travel to the conference will be refunded the full amount paid (minus up to NTD 500 processing fee), regardless of when the notification is received before the start of the conference. Documentation supporting the cancellation exceptions is required.

VISA Information

Introduction

Citizens from over 30 foreign nations can enter Taiwan with visa-exempt status or with landing visas.

For information regarding obtaining a Taiwanese visa, please refer to the website of the Bureau of Consular Affairs, Ministry of Foreign Affairs.

Visa-Exempt Entry

  1. The nationals of the following countries are eligible for the visa exemption program, which permits a duration of stay up to 90 days: Austria, Belgium, Bulgaria, Canada, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Japan, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Monaco, the Netherlands, New Zealand, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, U.K., and Vatican City State.
  2. The nationals of the following countries are eligible for the visa exemption program, which permits a duration of stay up to 30 days: Australia, Republic of Korea, Malaysia, Singapore, and U.S.A.
  3. The nationals of India, Thailand, Philippines, Vietnam, and Indonesia who also possess a valid visa or permanent residence certificate issued by the U.S.A., Canada, Japan, U.K., Schengen Convention countries, Australia, or New Zealand are eligible for the visa exemption program, which permits a duration of stay up to 30 days. Those who meet the above qualification and have never been employed in Taiwan as blue-collar workers must apply for an Entry Permit through the “Advance Online Registration System for the Visitors of Nationals from Four Southeast Asian Countries to Taiwan” of the National Immigration Agency before coming to Taiwan. Upon completion, the printed confirmation is used to validate the traveler during boarding and the immigration check. During the immigration check, travelers who cannot show a valid visa or permanent residence certificate issued by one of the aforementioned countries will not be admitted into the country.
  4. For detailed information, please see https://www.boca.gov.tw/lp-149-2.html

Landing Visas

  1. Nationals of countries eligible for visa-exempt entry who hold emergency or temporary passports with a validity of more than six months.
  2. Holders of U.S.A. passports with a validity of less than six months.
  3. For detailed information, please see https://www.boca.gov.tw/mp-2.html

Other Visa Information

Venue

Location

Microelectronics and Information Systems Research Center (MIRC) at National Yang Ming Chiao Tung University, Hsinchu, Taiwan

Transportation

From TaoYuan International Airport to NYCU

  1. High Speed Rail Between TaoYuan Airport and HsinChu:
    Airport (take Taoyuan Airport Metro, fee ~35 TWD, 20 minutes) → TaoYuan High Speed Rail station (take high-speed train, fee ~130 TWD, 10 minutes) → HsinChu High Speed Rail station → Bus or Taxi to NYCU or Hsinchu downtown (~20 TWD, 20 minutes)
  2. Taxi to NYCU and Hsinchu downtown (fee ~1500 TWD, 50 minutes)

From Taipei Songshan International Airport to NYCU

  • Airport (take MRT Muzha-Neihu line, change to MRT Tucheng Bannan line at Zhongxiao Fuxing station) → Taipei Main station (take High Speed Rail, fee ~290 TWD) → HsinChu High Speed Rail station → Free bus to NYCU and Hsinchu downtown

Climate

Taiwan's climate is subtropical, with average annual temperatures of 19℃ (66℉) in the north and 21℃ (69℉) in the south. Autumn is from September through November and is usually cool, with an average temperature of 20℃ to 24℃ (68℉ to 75℉).

The weather report of Taiwan can be found at the website of Central Weather Administration.

Currency

The Taiwan currency unit is the New Taiwan Dollar (NTD). (1 US dollar ~ 33 NTD as of August 2024.) Major credit cards such as VISA, MasterCard, American Express, Diners Club, and JCB are accepted at most hotels and many stores.

Tipping and Tax

There is no tipping in Taiwan. Customers pay the exact amount that appears on the bill. Taiwan has a 5% consumer tax on all goods and services. All published prices (including restaurant menus, taxi fares, supermarket prices) include consumer tax.

Electricity

Taiwan has the same electrical standards as the US and Canada. The power plugs and sockets are of type A and B. The standard voltage is 110 V and the standard frequency is 60 Hz.

Accommodation

Introduction

Hsinchu City is an international high-tech community where many conferences take place. There are plenty of hotels to choose from, and we suggest the following ones for comfort and convenience during your stay in Taiwan. All four hotels below have coordinated with the O-COCOSDA 2024 Conference to offer special rates and provide scheduled free shuttle bus transportation to and from the conference venue on each conference day. For the best deals, please reserve accommodation well in advance.

Ambassador Hsinchu (Banquet venue, 12-Minute drive)

Star: ★★★★★

Tel: +886-3-515-1111

Address: No.188, Section 2, Zhonghua Rd., HsinChu 30060, Taiwan

https://www.ambassador-hotels.com/en/hsinchu/

If you have a reservation request, please send the reservation form to rsvn.hc@ambassador-hotels.com before 10/15

THE HO HOTEL (10-Minute walk)

Star: ★★★★★

Tel: +886-3-571-5888

Address: No.16, Daxue Rd., East Dist., Hsinchu City 300, Taiwan

https://www.thehohotel.com.tw/index.php?lang=en

If you have a reservation request, please send the reservation form to rsvn@thehohotel.com.tw

Hotel Royal Hsinchu (7-Minute drive)

Star: ★★★★★

Tel: +886-3-563-1122

Address: No 227 Kuan Fu Road Section 1, Hsinchu, Taiwan

https://www.hotelroyal.com.tw/en-us/hsinchu

If you have a reservation request, please send the reservation form to hotel@hc.hotelroyal.com.tw

Kingdom Hotel (8-Minute drive)

Star: ★★★

Tel: +886-3-564-6655

Address: No.238, Kuan Fu Road Section 1, Hsinchu City 300, Taiwan

http://www.kingdomhotel.com.tw/index.php?language=en

If you have a reservation request, please send the reservation form to kingdom@seed.net.tw

Call for Papers

Introduction

Papers are invited on substantial, original, and unpublished research on all aspects of speech databases, assessments, and speech I/O, including, but not limited to:

Topics

  • Speech databases and text corpora
  • Assessment of speech input and output technologies
  • Phonetic/phonological systems for oriental languages
  • Romanization of non-Roman characters
  • Segmentation and labeling
  • Speech prosody and labeling
  • Speech processing models and systems
  • Multilingual speech corpora
  • Special topics on speech databases and assessments
  • Standardization
  • Any other relevant topics

Official Language

  • The official language of O-COCOSDA 2024 is English.

Submission

Submissions should describe unpublished original work. Papers may consist of up to six (6) pages of content, including references and appendices. All submitted papers must be written in English and conform to the double-column format of IEEE Signal Processing Society conferences. The paper templates (Word and LaTeX) can be downloaded below:

Papers must be submitted through the Microsoft CMT system on or before August 31, 2024 (extended from July 15, 2024), using this hyperlink: https://cmt3.research.microsoft.com/OCOCOSDA2024

Accepted papers will be presented orally or as posters as determined by the committee. The conference proceedings will be submitted to the IEEE Xplore digital library. All the accepted papers must go through the file conversion offered by IEEE PDF eXpress. You can refer to the link here: https://ieee-pdf-express.org/account/login?ReturnUrl=%2F. The Conference ID is 64382X.

General Information

  • All submitted papers should include “Index Terms” following the Abstract.
  • All submissions should be camera-ready PDF files of up to six (6) pages in length. There must be no password protection on the PDF file, and all fonts must be embedded.
  • All submitted papers must be original contributions that neither have been submitted to any other conference or journal nor will be submitted to any other conference or journal during the review process.
  • At least one author of an accepted paper must be registered by the early registration deadline; otherwise, the paper will be withdrawn and not published in the proceedings.
  • Each accepted paper must be presented by one of its authors in person at the conference.

Important Dates

  • Full Paper Submission: August 31, 2024 (extended from July 15 and July 29, 2024; this is a hard deadline)
  • Notification of Acceptance: September 20, 2024 (previously August 19 and August 24, 2024)
  • Final Manuscript Submission: September 30, 2024
  • Early Registration: September 30, 2024
  • Conference: October 17-19, 2024

All deadlines are 11:59 pm UTC-12 (Anywhere on Earth).

Organization

Convenor

Sakriani Sakti

Nara Institute of Science and Technology, Japan

Honorary Chair

Sin-Horng Chen

National Yang Ming Chiao Tung University, Taiwan

General Chairs

Yuan-Fu Liao

National Yang Ming Chiao Tung University, Taiwan

Lung-Hao Lee

National Yang Ming Chiao Tung University, Taiwan

Hsin-Min Wang

Academia Sinica, Taiwan

Shaw-Hwa Hwang

National Yang Ming Chiao Tung University, Taiwan

Program Chairs

Chi-Chun Lee

National Tsing Hua University, Taiwan

Yu Tsao

Academia Sinica, Taiwan

Publication Chairs

Ming-Hsiang Su

Soochow University, Taiwan

Jui-Feng Yeh

National Chiayi University, Taiwan

Local Arrangement Chairs

Chih-Chung Kuo

Speech AI Research Center, Taiwan

Chao-Shih Huang

Speech AI Research Center, Taiwan

Web Master

Tzu-Mi Lin

National Yang Ming Chiao Tung University, Taiwan

Steering Committee

Satoshi Nakamura

Nara Institute of Science and Technology, Japan (Chair)

Chai Wutiwiwatchai

National Electronics and Computer Technology Center, Thailand

Yong-Ju Lee

Seoul National University of Science and Technology, Korea

Aijun Li

Chinese Academy of Social Sciences, China

Haizhou Li

National University of Singapore, Singapore

Luong Chi Mai

Institute of Information Technology, Vietnam

Shyam S. Agrawal

Kalinga Institute of Industrial Technology, India

Hsin-Min Wang

Academia Sinica, Taiwan

International Advisory Committee

Shyam S. Agrawal

KIIT, Gurgaon & CDAC, Noida, India

Jai Raj Awasthi

Tribhuvan University, Nepal

Nick Campbell

Trinity College Dublin, Ireland

Pak-Chung Ching

Chinese University of Hong Kong, Hong Kong

Hiroya Fujisaki

University of Tokyo, Japan

Dafydd Gibbon

Bielefeld University, Germany

Shuichi Itahashi

NII/AIST, Japan

Lin-Shan Lee

National Taiwan University, Taiwan

Yong Ju Lee

Wonkwang University, Korea

Aijun Li

Chinese Academy of Social Sciences, China

Haizhou Li

Institute for Infocomm Research, Singapore

Luong Chi Mai

The Vietnamese Academy of Sciences, Vietnam

Joseph Mariani

LIMSI-CNRS, France

Satoshi Nakamura

Nara Institute of Science and Technology, Japan

Hammam Riza

BPPT, Indonesia

Yoshinori Sagisaka

Waseda University, Japan

Chai Wutiwiwatchai

NECTEC, Thailand

Thomas Fang Zheng

Tsinghua University, China

Program Committee

Farah Adeeba

CLE-KICS, Pakistan

Shyam Agrawal

KIIT College of Engineering, India

Masato Akagi

Japan Advanced Institute of Science and Technology, Japan

Karunesh Arora

CDAC, India

Nguyen Bach

Alibaba, USA

Khac-Hoai Nam Bui

Viettel Cyberspace Center, Vietnam

Siqi Cai

National University of Singapore, Singapore

Kuan-Yu Chen

National Taiwan University of Science and Technology, Taiwan

Sin-Horng Chen

National Yang Ming Chiao Tung University, Taiwan

Maria Art Antonette Clariño

University of the Philippines Los Baños, Philippines

Ratnadeep Deshmukh

Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India

Chenchen Ding

NICT, Japan

Tuan Dinh

Amazon Alexa, USA

Truong Do

Vietnam Artificial Intelligence Solutions, Vietnam

Van Hai Do

Thuyloi University, Vietnam

Minghui Dong

Institute for Infocomm Research, Singapore

Ngoc Duong

Interdigital, USA

Chatchawarn Hansakunbuntheung

National Science and Technology Development Agency, Thailand

Aye Mya Hlaing

UCSY, Myanmar

Huy Phan

Queen Mary University of London, UK

Aijun Li

Institute of Linguistics, CASS, China

Lantian Li

Tsinghua University, China

Yongwei Li

Institute of Automation, Chinese Academy of Sciences, China

Rui Liu

National University of Singapore, Singapore

Yanfeng Lu

Institute for Infocomm Research, Singapore

Chi Mai Luong

Institute of Information Technology, VAST, Vietnam

Dang-Khoa Mac

Vingroup Big Data Institute, Vietnam

Hsu Myat Mo

University of Computer Studies, Yangon, Myanmar

Aye Nyein Mon

University of Computer Studies, Yangon, Myanmar

Satoshi Nakamura

Nara Institute of Science and Technology, Japan

Binh Nguyen

KIT, Germany

Quang Minh Nguyen

Vietnam Artificial Intelligence Solutions, Vietnam

Thai Son Nguyen

Free University of Bolzano, Italy

Thi Minh Huyen Nguyen

Vietnam National University, Hanoi, Vietnam

Trang Nguyen

Hanoi University of Science and Technology, Vietnam

Van Huy Nguyen

Thai Nguyen University of Technology, Vietnam

Viet Son Nguyen

Hanoi University of Science and Technology, Vietnam

Nathaniel Oco

De La Salle University, Philippines

Chutamanee Onsuwan

Faculty of Liberal Arts and CILS, Thammasat University, Thailand

Yadanar Oo

University of Computer Studies, Yangon, Myanmar

Win Pa

University of Computer Studies, Yangon, Myanmar

Ronald Pascual

De La Salle University, Philippines

Van Tung Pham

TikTok Pte. Ltd., Singapore

Trung-Nghia Phung

Thai Nguyen University of Information and Communication Technology, Vietnam

Rodolfo Jr Raga

Jose Rizal University, Philippines

Hammam Riza

BPPT, Indonesia

Sakriani Sakti

Nara Institute of Science and Technology, Japan

Yimonshwe Sin

University of Computer Studies, Yangon, Myanmar

Shweta Sinha

Amity University Gurgaon, India

Hay Mar Soe Naing

University of Computer Studies, Yangon, Myanmar

Thanh T. H. Duong

Hanoi University of Mining and Geology, Vietnam

Samudra Vijaya

Koneru Lakshmaiah Education Foundation, India

Dong Wang

Tsinghua University, China

Hsin-Min Wang

Academia Sinica, Taiwan

Chai Wutiwiwatchai

Human Language Technology Laboratory, NECTEC, Thailand

Yanlu Xie

Beijing Language and Culture University, China

Chenglin Xu

National University of Singapore, Singapore

Reviewers

Farah Adeeba

University of Engineering & Technology, Pakistan

Jyoti Arora

Maharaja Surajmal Institute of Technology, India

Shweta Bansal

K. R. Mangalam University, India

Sandeep Bhongade

Shri G.S. Institute of Technology & Science, India

Siqi Cai

National University of Singapore, Singapore

Yung-Chun Chang

Taipei Medical University, Taiwan

Rong Chao

National Taiwan University, Taiwan

Fei Chen

Southern University of Science and Technology, China

Kuan-Yu Chen

National Taiwan University of Science and Technology, Taiwan

Yeou-Jiunn Chen

Southern Taiwan University of Science and Technology, Taiwan

Yu-Wen Chen

Columbia University, United States

Ratnadeep Deshmukh

Dr. Babasaheb Ambedkar Marathwada University, India

Minghui Dong

Institute for Infocomm Research, Singapore

Thanh Thi-Hien Duong

Hanoi University of Mining and Geology, Vietnam

Muhammad Umar Farooq

University of Sheffield, England

Xiaoxue Gao

A*Star, Singapore

Aye Mya Hlaing

University of Computer Studies, Myanmar

Qian-Bei Hong

Southern Taiwan University of Science and Technology, Taiwan

Hsin-Ho Huang

Academia Sinica, Taiwan

Jeih-Weih Hung

National Chi Nan University, Taiwan

Kuo-Hsuan Hung

Academia Sinica, Taiwan

Hsin-Te Hwang

Research Center for Information Technology Innovation, Academia Sinica, Taiwan

Pooja Kherwa

Maharaja Surajmal Institute of Technology, India

Ravinder Kumar

Saginaw Valley State University, United States

Lung-Hao Lee

National Yang Ming Chiao Tung University, Taiwan

Aijun Li

Institute of Linguistics, CASS, China

Yuan-Fu Liao

National Yang Ming Chiao Tung University, Taiwan

Kai-Chun Liu

Academia Sinica, Taiwan

Xugang Lu

National Institute of Information and Communications Technology, Japan

Yanfeng Lu

Institute for Infocomm Research, Singapore

Yen-Ju Lu

Johns Hopkins University, United States

Shaily Malik

Maharaja Surajmal Institute of Technology, India

Hsu Myat Mo

University of Computer Studies, Myanmar

Banaja Mohanty

Veer Surendra Sai University of Technology, India

Huy Van Nguyen

Vinbigdata, Vietnam

Chutamanee Onsuwan

Thammasat University, Thailand

Ronald M Pascual

De La Salle University, Philippines

Priyankoo Sarmah

Indian Institute of Technology Guwahati, India

Ruchi Sehrawat

Guru Gobind Singh Indraprastha University, India

Bidisha Sharma

Uniphore, United States

Hay Mar Soe Naing

University of Computer Studies, Myanmar

Ming-Hsiang Su

Soochow University, Taiwan

Yu Tsao

Academia Sinica, Taiwan

Samudra Vijaya

Koneru Lakshmaiah Education Foundation, India

Deepali Virmani

Guru Tegh Bahadur Institute of Technology, India

Hsin-Min Wang

Academia Sinica, Taiwan

Kuan-Chen Wang

National Taiwan University, Taiwan

Syu-Siang Wang

Yuan Ze University, Taiwan

Chia-Hua Wu

Academia Sinica, Taiwan

Yanlu Xie

Beijing Language and Culture University, China

Cheng Yu

The Ohio State University, United States

Ryandhimas E Zezario

Academia Sinica, Taiwan