评测数据
BFCL
评测方式:如果模型支持函数调用,则按照其官方提供方式进行,否则使用BFCL统一的系统提示语。
数据描述:
Berkeley Function-Calling Leaderboard (BFCL)是目前最全面的大模型函数工具调用评测基准,由加州大学伯克利分校出品。
具体形式为:
- Python: 指定函数调用、函数选择调用、多函数调用(Parallel Function)、多函数选择调用
- 非Python: 聊天能力(不提供任何函数的场景)、指定函数相关性/不相关性检测、其他语言或函数调用场景(REST API, SQL, Java, Javascript)
从v3开始增加了对不同复杂程度多步或多轮对话情景的考察。
评测数据量:
BFCL v3包含1,000条数据供评测使用。详细组成参见这里。
源数据集问题样例:
{
<!-- 指定函数调用 -->
"question": [[{"role": "user", "content": "I've been playing a game where rolling a six is somehow more likely than usual, and the chance of it happening on a single roll is 60%. I'm curious, if I roll the die 20 times, what are the odds that I'll get exactly five sixes?"}]],
"function": [
{
"name": "calc_binomial_probability",
"description": "Calculates the probability of getting k successes in n trials.",
"parameters": {"type": "dict", "properties": {"n": {"type": "integer", "description": "The number of trials."}, "k": {"type": "integer", "description": "The number of successes."}, "p": {"type": "float", "description": "The probability of success."}}, "required": ["n", "k", "p"]}
}
],
"execution_result_type": ["exact_match"],
"ground_truth": ["calc_binomial_probability(n=20, k=5, p=0.6)"]
}
{
<!-- 多函数调用 -->
"question": [[{"role": "user", "content": "Could you fetch me the latest news from Paris, France, and also the current weather in Letterkenny, Ireland, in Celsius?"}]],
"function": [
{
"name": "get_news_report",
"description": "Fetches the latest news based on a specific location, typically a city and state.",
"parameters": {"type": "dict", "required": ["location"],"properties": {"location": {"type": "string", "description": "The location for which to get the news, in the format of 'City, State (abbr)' or 'City, Country', such as 'San Francisco, CA', 'Paris, France', or 'New York, NY'."}}}
},
{
"name": "get_current_weather",
"description": "Retrieves the current weather conditions for a specified location, with options for units of temperature measurement.",
"parameters": {"type": "dict", "required": ["location"], "properties": {"location": {"type": "string", "description": "The location for which weather data is to be fetched, in the format of 'City, State (abbr)', such as 'San Francisco, CA'. Or 'City, Country'"}, "unit": {"type": "string", "description": "The unit of temperature for the weather report.", "enum": ["celsius", "fahrenheit"], "default": "fahrenheit"}}}
}
]
}
{
<!-- 不相关性检测 -->
"question": [[{"role": "user", "content": "Find the integral of x^3 from 1 to 5"}]],
"function": [
{
"name": "str_to_int",
"description": "Converts string value to integer.",
"parameters": {"type": "dict", "properties": {"value": {"type": "string", "description": "String value to be converted to integer"}}, "required": ["value"]}}
]
}
论文引用:
@misc{berkeley-function-calling-leaderboard,
title={Berkeley Function Calling Leaderboard},
author={Fanjia Yan and Huanzhi Mao and Charlie Cheng-Jie Ji and Tianjun Zhang and Shishir G. Patil and Ion Stoica and Joseph E. Gonzalez},
howpublished={\url{https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html}},
year={2024},
}
数据集版权使用说明:
Apache 2.0