
The Interactive Emotional Dyadic Motion Capture (IEMOCAP)

#Metrics: WAR, UAR

Adapter:

Linear Classifier
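The metrics listed above are the standard pair for imbalanced emotion recognition: WAR (weighted average recall) is the overall accuracy, so frequent classes dominate it, while UAR (unweighted average recall) is the mean of per-class recalls and treats every class equally. A minimal sketch of computing both, assuming scikit-learn is available (an assumption, not something this card prescribes):

# Sketch: computing WAR and UAR for emotion predictions.
# Assumes scikit-learn; the label strings follow the samples in this card.
from sklearn.metrics import accuracy_score, recall_score

y_true = ["neu", "ang", "sad", "hap", "hap", "neu"]
y_pred = ["neu", "ang", "neu", "hap", "sad", "neu"]

# WAR: weighted average recall == overall accuracy (every utterance counts equally).
war = accuracy_score(y_true, y_pred)

# UAR: unweighted average recall == macro-averaged recall (every class counts equally).
uar = recall_score(y_true, y_pred, average="macro")

print(f"WAR: {war:.3f}  UAR: {uar:.3f}")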

Data description:

The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal, and multispeaker emotion database. It contains approximately 12 hours of audio-visual data, including video, audio, facial motion capture, and text transcriptions. It consists of dyadic conversational sessions in which actors perform improvisations or scripted scenarios.

Dataset structure:

Amount of source data:

The source data cover the following emotion categories: Neutral (1,708), Angry (1,103), Sad (1,084), Happy (595), Excited (1,041), Scared (40), Surprised (107), Frustrated (1,849), and Other (2,507).

Amount of evaluation data:

The evaluation set consists of 5,531 instances from the source dataset covering four categories (Neutral, Angry, Sad, Happy), with the Excited samples merged into the Happy category.
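A minimal sketch of deriving this four-class evaluation subset from the source labels, assuming the usual IEMOCAP abbreviations ("neu", "ang", "sad", "hap", "exc"; only "neu" is confirmed by the sample below):

# Sketch: building the 4-class evaluation subset with Excited folded into Happy.
# The label abbreviations other than "neu" are assumptions for illustration.
LABEL_MAP = {"neu": "neu", "ang": "ang", "sad": "sad", "hap": "hap", "exc": "hap"}

def four_class_subset(records):
    # Keep only the four evaluation emotions; relabel Excited as Happy.
    return [
        {**rec, "label": LABEL_MAP[rec["label"]]}
        for rec in records
        if rec["label"] in LABEL_MAP
    ]

# Usage with the record format shown in "Sample of dataset":
records = [{"id": "Ses01F_impro01_F000", "sentence": "Excuse me.", "label": "neu"}]
print(four_class_subset(records))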

Data detail:

KEYS        EXPLAIN
id          data id
sentence    the content of the speech
label       emotion label

Sample of dataset:

{
  "id": Ses01F_impro01_F000,
  "sentence": "Excuse me.",
  "label": "neu"
}

Citation information:

@article{busso2008iemocap,
  title={IEMOCAP: Interactive emotional dyadic motion capture database},
  author={Busso, Carlos and Bulut, Murtaza and Lee, Chi-Chun and Kazemzadeh, Abe and Mower, Emily and Kim, Samuel and Chang, Jeannette N and Lee, Sungbok and Narayanan, Shrikanth S},
  journal={Language resources and evaluation},
  volume={42},
  pages={335--359},
  year={2008},
  publisher={Springer}
}

Licensing information:

IEMOCAP License

MSP-IMPROV

#Metrics: WAR, UAR

Data description:

The MSP-IMPROV database is an acted, multimodal, and multispeaker emotion database. It is constructed similarly to the IEMOCAP dataset but with 12 actors across six sessions.

Dataset structure:

Amount of evaluation data:

The evaluation set consists of 7,798 instances covering four categories (Neutral, Angry, Sad, Happy).

Data detail:

KEYS        EXPLAIN
id          data id
sentence    the content of the speech
label       emotion label

Sample of dataset:

{
  "id": MSP-IMPROV-S01A-F01-P-FM01,
  "sentence": "I have to go to class. How can I not? Okay.",
  "label": "ang"
}

Citation information:

@article{busso2016msp,
  title={MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception},
  author={Busso, Carlos and Parthasarathy, Srinivas and Burmania, Alec and AbdelWahab, Mohammed and Sadoughi, Najmeh and Provost, Emily Mower},
  journal={IEEE Transactions on Affective Computing},
  volume={8},
  number={1},
  pages={67--80},
  year={2016},
  publisher={IEEE}
}

Licensing information:

MSP-IMPROV License

EmotionTalk

#Metrics: WAR, UAR

Data description:

EmotionTalk is an interactive Chinese multimodal emotion dataset with rich annotations, released by Nankai University. It provides multimodal information from 19 actors in dyadic conversation settings, incorporating acoustic, visual, and textual modalities. It includes 23.6 hours of speech (19,250 utterances), annotations for 7 utterance-level emotion categories (happy, surprise, sad, disgust, anger, fear, and neutral), 5-dimensional sentiment labels (negative, weakly negative, neutral, weakly positive, and positive), and 4-dimensional speech captions (speaker, speaking style, emotion, and overall).

Dataset structure:

Amount of source data:

Training set: 15,413; Validation set: 1,908; Test set: 1,929

Amount of evaluation data:

Public test set: 1,929 utterances

Data detail:

Includes video and audio files, sentence transcriptions, and discrete/continuous/caption emotion annotations.

data/  
├── audio/*.tar  
├── Text/*.tar  
├── Video/*.tar  
└── Multimodal/*.tar
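A rough sketch of unpacking these per-modality archives with Python's standard tarfile module (the exact archive names are not listed here, so the glob pattern is an assumption):

# Sketch: extracting the per-modality tar archives from the layout above.
import tarfile
from pathlib import Path

DATA_ROOT = Path("data")

for modality in ["audio", "Text", "Video", "Multimodal"]:
    for archive in sorted((DATA_ROOT / modality).glob("*.tar")):
        with tarfile.open(archive) as tar:
            # Extract each archive next to where it was found.
            tar.extractall(DATA_ROOT / modality)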

Sample of dataset:

{
    "data": {
        "A": {
            "emotion": "happy",
            "Confidence_degree": "9",
            "Continuous_label": 1
        },
        "B": {
            "emotion": "happy",
            "Confidence_degree": "9",
            "Continuous_label": 0
        },
        "C": {
            "emotion": "happy",
            "Confidence_degree": "9",
            "Continuous_label": 1
        },
        "D": {
            "emotion": "happy",
            "Confidence_degree": "9",
            "Continuous_label": 1
        },
        "E": {
            "emotion": "happy",
            "Confidence_degree": "7",
            "Continuous_label": 1
        }
    },
    "speaker_id": "07",
    "emotion_result": "happy",
    "content": "哎,发现我有什么变化没有?",
    "Continuous label_result": 0.8,
    "file_name": "G00002/G00002_01/G00002_01_07/G00002_01_07_001.mp4"
}
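The example carries five per-annotator judgments (A-E) together with the aggregated fields emotion_result and "Continuous label_result". A minimal sketch of reproducing such aggregates from the individual annotations, assuming a majority vote for the discrete emotion and the mean for the continuous label (the dataset's actual aggregation rule is not stated here):

# Sketch: aggregating the per-annotator labels of one EmotionTalk record.
# Majority-vote / mean aggregation is an assumption for illustration only.
from collections import Counter

def aggregate(record):
    annotations = list(record["data"].values())
    emotions = [a["emotion"] for a in annotations]
    continuous = [float(a["Continuous_label"]) for a in annotations]
    majority_emotion = Counter(emotions).most_common(1)[0][0]
    mean_continuous = sum(continuous) / len(continuous)
    return majority_emotion, mean_continuous

# With the record above this yields ("happy", 0.8).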

Citation information:

@article{sun2025emotiontalk,
  title={EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations},
  author={Sun, Haoqin and Wang, Xuechen and Zhao, Jinghua and Zhao, Shiwan and Zhou, Jiaming and Wang, Hui and He, Jiabei and Kong, Aobo and Yang, Xi and Wang, Yequan and others},
  journal={arXiv preprint arXiv:2505.23018},
  year={2025}
}

Licensing information:

CC BY-NC-SA 4.0