Decoding the Ear (DeEAR): A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment

Zhiyu Lin (1, 2), Jingwen Yang (1, 2), Jiale Zhao (2), Meng Liu (2), Sunzhu Li (2), Zhengjun Yue (1, 3), Benyou Wang (1, *)

(1) The Chinese University of Hong Kong, Shenzhen, China   |   (2) Li Auto Inc., China   |   (3) Shenzhen Loop Area Institution, China

Audio Samples for DeEAR

This page provides audio demonstrations to help readers intuitively understand the DeEAR framework proposed in our paper.

The page contains two parts:

  1. DeEAR Score Examples: Audio samples across High, Medium, and Low expressiveness levels.
  2. S2S Model Fine-Tuning: Comparison before and after fine-tuning (Section 4.3).

Part 1: DeEAR Score Examples

DeEAR scores are on a 0-100 scale.

Level  | Audio Sample | Overall (expressive) | Emo    | Pros  | Spon
-------|--------------|----------------------|--------|-------|-------
Low    | [audio]      | 3.43                 | 53.72  | 49.00 | 38.52
Low    | [audio]      | 4.21                 | 38.41  | 32.83 | 34.66
Low    | [audio]      | 24.52                | 33.35  | 23.47 | 70.09
Low    | [audio]      | 29.82                | 29.51  | 26.05 | 73.69
Medium | [audio]      | 36.46                | 30.23  | 22.17 | 82.84
Medium | [audio]      | 52.15                | 54.51  | 50.59 | 93.88
Medium | [audio]      | 55.22                | 62.18  | 60.02 | 92.30
High   | [audio]      | 74.05                | 77.91  | 71.00 | 100.00
High   | [audio]      | 74.23                | 74.62  | 75.50 | 99.62
High   | [audio]      | 80.06                | 92.63  | 66.50 | 95.32
High   | [audio]      | 96.43                | 86.93  | 71.50 | 94.59
High   | [audio]      | 98.46                | 100.00 | 72.50 | 96.98
High   | [audio]      | 99.08                | 96.79  | 70.50 | 95.54
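
For readers who want to work with these numbers directly, the following is a minimal Python sketch that groups the thirteen samples listed above by level and reports the range of Overall scores observed in each group. The `SAMPLES` list and the `overall_range_by_level` helper are illustrative names introduced here, not part of any DeEAR release, and the printed ranges describe only these demo samples; they are not the level thresholds used in the paper.

```python
from collections import defaultdict

# (level, overall, emo, pros, spon), copied from the scores listed above.
SAMPLES = [
    ("Low", 3.43, 53.72, 49.00, 38.52),
    ("Low", 4.21, 38.41, 32.83, 34.66),
    ("Low", 24.52, 33.35, 23.47, 70.09),
    ("Low", 29.82, 29.51, 26.05, 73.69),
    ("Medium", 36.46, 30.23, 22.17, 82.84),
    ("Medium", 52.15, 54.51, 50.59, 93.88),
    ("Medium", 55.22, 62.18, 60.02, 92.30),
    ("High", 74.05, 77.91, 71.00, 100.00),
    ("High", 74.23, 74.62, 75.50, 99.62),
    ("High", 80.06, 92.63, 66.50, 95.32),
    ("High", 96.43, 86.93, 71.50, 94.59),
    ("High", 98.46, 100.00, 72.50, 96.98),
    ("High", 99.08, 96.79, 70.50, 95.54),
]


def overall_range_by_level(samples):
    """Return {level: (min_overall, max_overall)} for the given samples."""
    grouped = defaultdict(list)
    for level, overall, *_ in samples:
        grouped[level].append(overall)
    return {level: (min(vals), max(vals)) for level, vals in grouped.items()}


ranges = overall_range_by_level(SAMPLES)
for level in ("Low", "Medium", "High"):
    low, high = ranges[level]
    print(f"{level}: Overall from {low:.2f} to {high:.2f}")
```

In this sample set the Overall ranges of the three levels do not overlap, whereas the Emo, Pros, and Spon sub-scores vary more freely within a level (for instance, the lowest-scoring Low sample still has an Emo score of 53.72).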

Part 2: S2S Model Fine-Tuning

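Input text | Output (S2S-Base) | Output (S2S-FT)
"Mm-hm, that sounds good, I'll give it a try. Ah, but I still feel like there's never enough time." (original: 嗯嗯,听起来不错,我会试试看的。哎,但我还是觉得时间不够用。) | [audio] | [audio]
"Of course you're my master. Once a teacher, always a teacher. Master, how have you been all these years? Oh, listen to me asking. Just seeing how big Haitianyue is, I can tell you must have been doing very well these past few years." (original: 你当然是我师傅啦。一日为师,终身为师。 师傅。你这些年过得好吗? 哎呀,你看我问的。看海天悦这么大的规模,就知道师傅这几年一定过得很好啊。) | [audio] | [audio]
"Duobao? Well, things have been busy over at Xi'an Hongleyuan these past couple of days, haven't they? He went to help out." (original: 多宝啊,这不是这两天西安鸿乐园那边事儿多吗?他帮忙去了。) | [audio] | [audio]
"That's true, your cooking really is chef-level. Well then, I'll dig in." (original: 那倒是您的厨艺可是大厨级别的。 那我吃了啊。) | [audio] | [audio]
"When are we going? I'm free this weekend." (original: 我们什么时候去呢?我这个周末我没事。) | [audio] | [audio]
"Oh my, I really have quit. Luolin insisted that I quit." (original: 哎呦,我是真戒了,洛林坚持让我戒了。) | [audio] | [audio]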