The Interactive Emotional Dyadic Motion Capture (IEMOCAP)
#Metrics: WAR, UAR
Adapter:
Linear Classifier
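WAR (weighted average recall, i.e. per-class recall weighted by class frequency, which coincides with overall accuracy) and UAR (unweighted average recall, the mean of per-class recalls) are the reported metrics. A minimal sketch of how they can be computed with scikit-learn; the label lists below are placeholders, not real predictions:

```python
from sklearn.metrics import recall_score

# Placeholder gold labels and predictions over the 4-class setup
# (neu / ang / sad / hap); real values would come from the linear classifier.
y_true = ["neu", "neu", "ang", "sad", "hap", "hap"]
y_pred = ["neu", "ang", "ang", "sad", "hap", "sad"]

# WAR: per-class recall weighted by class support (equivalent to accuracy).
war = recall_score(y_true, y_pred, average="weighted")
# UAR: unweighted mean of per-class recalls (insensitive to class imbalance).
uar = recall_score(y_true, y_pred, average="macro")

print(f"WAR: {war:.3f}  UAR: {uar:.3f}")
```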
Data description:
The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal, and multispeaker emotion database. It contains approximately 12 hours of audio-visual data, including video, audio, facial motion capture, and text transcriptions, collected from conversational sessions in which actors perform improvised or scripted scenarios.
Dataset structure:
Amount of source data:
The source data comprise the following emotion categories (utterance counts): Neutral (1,708), Angry (1,103), Sad (1,084), Happy (595), Excited (1,041), Scared (40), Surprised (107), Frustrated (1,849), and Other (2,507).
Amount of evaluation data:
The evaluation set consists of 5,531 instances covering four classes (Neutral, Angry, Sad, Happy), drawn from the source data with the Excited category merged into Happy.
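A minimal sketch of this 4-class filtering step, assuming per-utterance records with the keys listed under "Data detail" below and the abbreviated label strings ("neu", "ang", "sad", "hap", "exc") shown in the sample; the loader that produces the records is hypothetical:

```python
# Merge "excited" into "happy" and keep only the four evaluation classes.
KEEP = {"neu", "ang", "sad", "hap"}
MERGE = {"exc": "hap"}

def to_four_class(utterances):
    """Filter a list of {"id", "sentence", "label"} records down to the
    4-class evaluation subset, relabeling excited utterances as happy."""
    selected = []
    for utt in utterances:
        label = MERGE.get(utt["label"], utt["label"])
        if label in KEEP:
            selected.append({**utt, "label": label})
    return selected
```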
Data detail:
| KEY | EXPLANATION |
|---|---|
| id | data id |
| sentence | the content of the speech |
| label | emotion label |
Sample of dataset:
{
  "id": "Ses01F_impro01_F000",
  "sentence": "Excuse me.",
  "label": "neu"
}
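The "Linear Classifier" adapter named above can be read as a single linear probe on top of frozen upstream features. A hedged sketch with PyTorch; the feature dimension, the pooling into one vector per utterance, and the class ordering are all assumptions:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 4     # neu, ang, sad, hap (assumed ordering)
FEATURE_DIM = 768   # assumed hidden size of a frozen upstream speech encoder

# The adapter itself: one linear layer from utterance embeddings to logits.
adapter = nn.Linear(FEATURE_DIM, NUM_CLASSES)

# `features` stands in for pooled embeddings of a batch of utterances
# produced by the frozen encoder; here it is random placeholder data.
features = torch.randn(8, FEATURE_DIM)
logits = adapter(features)              # shape: (8, NUM_CLASSES)
predictions = logits.argmax(dim=-1)     # predicted class index per utterance
```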
Citation information:
@article{busso2008iemocap,
title={IEMOCAP: Interactive emotional dyadic motion capture database},
author={Busso, Carlos and Bulut, Murtaza and Lee, Chi-Chun and Kazemzadeh, Abe and Mower, Emily and Kim, Samuel and Chang, Jeannette N and Lee, Sungbok and Narayanan, Shrikanth S},
journal={Language resources and evaluation},
volume={42},
pages={335--359},
year={2008},
publisher={Springer}
}
Licensing information:
MSP-IMPROV
#Metrics: WAR, UAR
Data description:
The MSP-IMPROV database is an acted, multimodal, and multispeaker emotion database. It is constructed similarly to the IEMOCAP dataset, but with 12 actors recorded across six sessions.
Dataset structure:
Amount of evaluation data:
The evaluation set consists of 7,798 instances covering four classes (Neutral, Angry, Sad, Happy).
Data detail:
| KEY | EXPLANATION |
|---|---|
| id | data id |
| sentence | the content of the speech |
| label | emotion label |
Sample of dataset:
{
  "id": "MSP-IMPROV-S01A-F01-P-FM01",
  "sentence": "I have to go to class. How can I not? Okay.",
  "label": "ang"
}
Citation information:
@article{busso2016msp,
title={MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception},
author={Busso, Carlos and Parthasarathy, Srinivas and Burmania, Alec and AbdelWahab, Mohammed and Sadoughi, Najmeh and Provost, Emily Mower},
journal={IEEE Transactions on Affective Computing},
volume={8},
number={1},
pages={67--80},
year={2016},
publisher={IEEE}
}
Licensing information:
EmotionTalk
#Metrics: WAR, UAR
Data description:
EmotionTalk is an interactive Chinese multimodal emotion dataset with rich annotations, released by Nankai University. It provides multimodal information from 19 actors in dyadic conversation settings, covering acoustic, visual, and textual modalities. The dataset includes 23.6 hours of speech (19,250 utterances), annotations for 7 utterance-level emotion categories (happy, surprise, sad, disgust, anger, fear, and neutral), 5-dimensional sentiment labels (negative, weakly negative, neutral, weakly positive, and positive), and 4-dimensional speech captions (speaker, speaking style, emotion, and overall).
Dataset structure:
Amount of source data:
Training set: 15,413; validation set: 1,908; test set: 1,929.
Amount of evaluation data:
Public test set: 1,929 utterances.
Data detail:
Each utterance includes video, audio, a sentence transcription, and discrete/continuous/caption emotion annotations.
data/
├── audio/*.tar
├── Text/*.tar
├── Video/*.tar
└── Multimodal/*.tar
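Each modality directory above holds tar archives. A minimal sketch of unpacking them with Python's standard tarfile module; the local paths and the choice to extract next to each archive are assumptions:

```python
import tarfile
from pathlib import Path

# Walk the data/ tree shown above and unpack every modality archive
# into a directory named after the archive (e.g. data/audio/<name>/).
data_root = Path("data")
for archive in sorted(data_root.glob("*/*.tar")):
    target = archive.parent / archive.stem
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(path=target)
```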
Sample of dataset:
{
"data": {
"A": {
"emotion": "happy",
"Confidence_degree": "9",
"Continuous_label": 1
},
"B": {
"emotion": "happy",
"Confidence_degree": "9",
"Continuous_label": 0
},
"C": {
"emotion": "happy",
"Confidence_degree": "9",
"Continuous_label": 1
},
"D": {
"emotion": "happy",
"Confidence_degree": "9",
"Continuous_label": 1
},
"E": {
"emotion": "happy",
"Confidence_degree": "7",
"Continuous_label": 1
}
},
"speaker_id": "07",
"emotion_result": "happy",
"content": "哎,发现我有什么变化没有?",
"Continuous label_result": 0.8,
"file_name": "G00002/G00002_01/G00002_01_07/G00002_01_07_001.mp4"
}
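In the example, keys "A" through "E" hold the five annotators' judgments, while emotion_result and "Continuous label_result" are aggregated fields. A hedged sketch of how such an aggregation could be reproduced; the majority-vote tie-breaking rule is our assumption, not the dataset's documented procedure:

```python
from collections import Counter
from statistics import mean

def aggregate(record):
    """Derive a majority-vote emotion and a mean continuous label from the
    per-annotator entries ("A".."E") of one EmotionTalk record."""
    annotations = list(record["data"].values())
    emotions = [a["emotion"] for a in annotations]
    continuous = [float(a["Continuous_label"]) for a in annotations]

    # Counter.most_common breaks ties by first occurrence; the official
    # aggregation rule may differ.
    majority_emotion = Counter(emotions).most_common(1)[0][0]
    return majority_emotion, mean(continuous)

# On the example above this returns ("happy", 0.8), matching its
# emotion_result and "Continuous label_result" fields.
```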
Citation information:
@article{sun2025emotiontalk,
title={EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations},
author={Sun, Haoqin and Wang, Xuechen and Zhao, Jinghua and Zhao, Shiwan and Zhou, Jiaming and Wang, Hui and He, Jiabei and Kong, Aobo and Yang, Xi and Wang, Yequan and others},
journal={arXiv preprint arXiv:2505.23018},
year={2025}
}
Licensing information:
CC BY-NC-SA 4.0