The 27th International Conference of the Oriental COCOSDA

O-COCOSDA 2024, the 27th International Conference of the Oriental COCOSDA, will take place on October 17-19, 2024, in Hsinchu, Taiwan. Hosted once again by National Yang Ming Chiao Tung University (NYCU), the conference will be held in person, and we encourage you to attend on site to experience and engage with the event fully.

Oriental COCOSDA (O-COCOSDA) is the Oriental chapter of COCOSDA, the International Committee for the Coordination and Standardisation of Speech Databases and Assessment Techniques. O-COCOSDA was established in 1997. Its purpose is to exchange ideas, share information, and discuss regional matters concerning the creation, utilization, and dissemination of spoken language corpora of Oriental languages and the assessment methods for speech recognition and synthesis systems, and to promote speech research on Oriental languages.

The annual Oriental COCOSDA international conference is the flagship conference of Oriental COCOSDA. The first preparatory meeting was held in Hong Kong in 1997, and the 26 workshops since then were held in Japan, Taiwan, China, Korea, Thailand, Singapore, India, Indonesia, Malaysia, Vietnam, Japan, China, Nepal, Taiwan, Macau, India, Thailand, China, Malaysia, Korea, Japan, Philippines, Myanmar, Singapore, Vietnam, and India. The 27th Oriental COCOSDA Conference returns to Taiwan and will be held on October 17-19 in Hsinchu, hosted again by National Yang Ming Chiao Tung University (NYCU).

The organizers of O-COCOSDA 2024 invite all researchers, practitioners, industry partners and sponsors to join the conference. The O-COCOSDA venue provides a regular forum for the presentation of international cooperation in developing speech corpora and coordinating assessment methods of speech input/output systems for both academic and industry researchers. Continuing a series of 26 successful previous meetings, this conference spans the research interest areas of database development and assessment methods. We thank you for your support and look forward to welcoming you to the conference. Stay safe and healthy!

The conference will be held in person, and on-site attendance is strongly encouraged. If special circumstances prevent you from attending (e.g., issues with visa applications), please contact the organizers to seek approval for alternative arrangements.

Time	Day 1: 10/17 (Thu.)	Day 2: 10/18 (Fri.)	Day 3: 10/19 (Sat.)
08:45-09:00	Opening Ceremony
09:00-10:00	Keynote Speech #1 Prof. Jen-Tzung Chien	Keynote Speech #2 Prof. Chin-Hui Lee	Keynote Speech #3 Prof. Satoshi Nakamura
10:00-10:30	Coffee Break	Coffee Break	Coffee Break
10:30-11:45	Oral Session #1 Speech Corpora	Oral Session #4 Best Paper Candidates	Country Report
11:45-13:00	Lunch Break	Lunch Break	Closing Ceremony (end at 12:15)
13:00-14:00	Oral Session #2 Speech Corpora	SanXia+YingGe Cultural Tour
14:00-15:00	Poster Session #1 Language Acquisition, Pronunciation, and Speech Learning
15:00-15:30	Coffee Break
15:30-16:30	Oral Session #3 Speech Generation
16:30-17:30	Poster Session #2 Speech Technology, Machine Learning, and AI in Speech Processing
18:30-20:30	Welcome Reception	Banquet

Registration Type	Membership	Early (Sept. 1-Sept. 30)	Late (Oct. 1-Oct. 10)	Onsite (Oct. 17-19)
Regular	Member	NTD 10,000	NTD 11,500	NTD 13,000
Regular	Non-Member	NTD 13,000	NTD 14,500	NTD 16,000
Student	Member	NTD 5,000	NTD 6,500	NTD 8,000
Student	Non-Member	NTD 6,500	NTD 8,000	NTD 9,500

From Hsinchu Train Station	From NYCU	Note
06:20	06:45	10/19 service cancelled
06:50	07:25
08:05	08:40
08:45	09:20
10:35	11:10
12:25	13:00
13:10	13:45
14:25	15:00
15:30	16:00
16:30	17:05
17:55	18:30

Date	From Hsinchu High Speed Rail Station	From NYCU	Note
10/17	08:00	20:30
10/18			20:30 (after the banquet). Route: Ambassador Hsinchu -> Hsinchu Train Station -> The Ho Hotel -> NYCU (the conference venue)
10/19		13:30

News

Welcome to O-COCOSDA 2024!

Program

Oral Session #1
Time: Thursday, October 17, 2024, 10:30-11:45

Speech Corpora
Session Chair: Yi-Wen Liu, National Tsing Hua University, Taiwan

10:30-10:45 Check Your Audio Data: Nkululeko for Bias Detection

10:45-11:00 CAO Robot for Taiwanese/English Knowledge Graph Application

11:00-11:15 Instant-EMDB: A Multi Model Spontaneous English and Malayalam Speech Corpora For Depression Detection

11:15-11:30 Chinese Psychological Counseling Corpus Construction for Valence-Arousal Sentiment Intensity Prediction

11:30-11:45 UCSYSpoof: A Myanmar Language Dataset for Voice Spoofing Detection

Oral Session #2
Time: Thursday, October 17, 2024, 13:00-14:00

Speech Corpora
Session Chair: Ming-Hsiang Su, Soochow University, Taiwan

13:00-13:15 Speech Watermarking for Tampering Detection using Singular Spectrum Analysis with Quantization Index Modulation and Psychoacoustic Model

13:15-13:30 IIITSAINT-EMOMDB: Carefully Curated Malayalam Speech Corpus With Emotion And Self-reported Depression Ratings

13:30-13:45 CL-CHILD Corpus: The Phonological Development of Putonghua in Children from Dialect-speaking Regions

13:45-14:00 WikiTND24: A Chinese Text Normalization Database

Oral Session #3
Time: Thursday, October 17, 2024, 15:30-16:30

Speech Generation
Session Chair: Ying-Hui Lai, National Yang Ming Chiao Tung University, Taiwan

15:30-15:45 VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka

15:45-16:00 Learning Contrastive Emotional Nuances in Speech Synthesis

16:00-16:15 Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and BERT LID

16:15-16:30 Exemplar-based Methods For Mandarin Electrolaryngeal Speech Voice Conversion

Oral Session #4
Time: Friday, October 18, 2024, 10:30-11:45

Best Paper Candidates
Session Chair: Yu Tsao, Academia Sinica, Taiwan

10:30-10:45 Proposal of Protocols For Speech Materials Acquisition And Presentation Assisted By Tools Based On Structured Test Signals

10:45-11:00 Exploring Impact of Prioritizing Intra-Singer Acoustic Variations on Singer Embedding Extractor Construction for Singer Verification

11:00-11:15 A Feedback-driven Self-improvement Strategy And Emotion-aware Vocoder For Emotional Voice Conversion

11:15-11:30 ConvCounsel: A Conversational Dataset for Student Counseling

11:30-11:45 Construction of Large Language Models for Taigi and Hakka Using Transfer Learning

Poster Session #1
Time: Thursday, October 17, 2024, 14:00-15:00

Language Acquisition, Pronunciation, and Speech Learning

P1-1 A Parameter-efficient Multi-step Fine-tuning of Multilingual And Multi-task Learning Model for Japanese Dialect Speech Recognition

P1-2 A Study on The Acquisition of Triphthong Vowels by Altaic Chinese Learners Under The ‘Belt and Road’ Initiative

P1-3 Acoustic Realization of /s/ Across Accents Of Urdu

P1-4 Age-related and Gender-related Differences in Cantonese Vowels

P1-5 An Investigation of Chinese Speech Under Alcohol Influence: Database Construction and Phonetic Analysis

P1-6 Analysis of Pathological Features for Spoof Detection

P1-7 Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture

P1-8 Clapping Hands To Word Stress Improves Children’s L2 English Pronunciation Accuracy in A Word Imitation Task: Evidence from A Classroom Study

P1-9 Comparative Study on The Phonetic Characteristics of Chinese Vowels Between Kyrgyz and Kirgiz Learners

P1-10 Computer-assisted Pronunciation Training System for Atayal, An Indigenous Language in Taiwan

P1-11 Continual Gated Adapter for Bilingual Codec Text-to-speech

P1-12 Continual Learning in Machine Speech Chain Using Gradient Episodic Memory

P1-13 Developing A Robust Mispronunciation Detection by Data Augmentation Based on Automatic Phone Annotation

P1-14 Developing A Thai Name Pronunciation Dictionary from Road Signs and Naming Websites

P1-15 Development of An English Oral Assessment System with The GEPT Dataset

P1-16 Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities

P1-17 Enhancing Phoneme Recognition in The Bengali Language Through Fine-tuning of Multilingual Model

P1-18 Exploration of Mongolian Word Stress Research Methods Based on Intonation Synthesis Technology

P1-19 Fusion of Multiple Audio Descriptors for The Recognition of Dysarthric Speech

P1-20 Gated Adapters with Balanced Activation for Effective Contextual Speech Recognition

P1-21 Improving Speech Recognition by Enhancing Accent Discrimination

P1-22 Research on the Temporal Effect of Focus on Trisyllabic Sequences in Leizhou Min

P1-23 Right-prominent Trisyllabic Tone Sandhi in Taifeng Chinese

P1-24 The Development of LOTUS-TRD: A Thai Regional Dialect Speech Corpus

P1-25 The Effectiveness of Audio-visual Feedback for L2 Chinese Sentence Stress Perception And Production

P1-26 Unified Spoken Language Proficiency Assessment System

P1-27 Using Automatic Speech Recognition for Speech Comprehension Evaluation in The Cochlear Implant

Poster Session #2
Time: Thursday, October 17, 2024, 16:30-17:30

Speech Technology, Machine Learning, and AI in Speech Processing

P2-1 A Deep Learning Based Approach with Data Augmentation For Infant Cry Sound Verification

P2-2 A Preliminary Study on End-to-end Multimodal Subtitle Recognition for Taiwanese TV Programs

P2-3 A Preliminary Study on Taiwanese POS Taggers: Leveraging Chinese in The Absence Of Taiwanese POS Annotation Datasets

P2-4 Agent-Driven Large Language Models for Mandarin Lyric Generation

P2-5 An Evaluation of Neural Vocoder-based Voice Cloning System for Dysphonia Speech Disorder

P2-6 An N-best List Selection Framework for ASR N-best Rescoring

P2-7 Analysis and Detection of Differences in Spoken User Behaviors Between Autonomous and Wizard-of-oz Systems

P2-8 Analysis and Discussion of Feature Extraction Technology for Musical Genre Classification

P2-9 Annotation of Addressing Behavior in Multi-party Conversation

P2-10 Benchmarking Clickbait Detection from News Headlines

P2-11 Chunk Size Scheduling for Optimizing The Quality-latency Trade-off in Simultaneous Speech Translation

P2-12 Comprehensive Benchmarking and Analysis of Open Pre-trained Thai Speech Recognition Models

P2-13 Depression Classification Using Log-mel Spectrograms: A Comparative Analysis of Window Size-based Data Augmentation and Deep Learning Models

P2-14 Effects of Multiple Japanese Datasets for Training Voice Activity Projection Models

P2-15 Exploring Branchformer-based End-to-end Speaker Diarization with Speaker-wise VAD Loss

P2-16 IIIT-speech Twins 1.0: An English-Hindi Parallel Speech Corpora for Speech-to-speech Machine Translation And Automatic Dubbing

P2-17 Improving Real-Time Music Accompaniment Separation with MMDenseNet

P2-18 Infant Cry Verification With Multi-view Self-attention Vision Transformers

P2-19 Modeling Response Relevance Using Dialog Act and Utterance-design Features: A Corpus-based Analysis

P2-20 Multi-resolution Singing Voice Separation

P2-21 Multilingual speech translator for medical consultation

P2-22 Overcoming The Impact Of Different Materials On Optical Microphones for Speech Capture Using Deep Learning Integrated Training Data

P2-23 Robust Audio-visual Speech Enhancement: Correcting Misassignments in Complex Environments With Advanced Post-processing

P2-24 Singer Separation for Karaoke Content Generation

P2-25 Uncertainty-based Ensemble Learning for Speech Classification