Skip to content

评测数据

BFCL

#评测指标-抽象句法树或执行结果匹配

评测方式:如果模型支持函数调用,则按照其官方提供方式进行,否则使用BFCL统一的系统提示语

数据描述:

Berkeley Function-Calling Leaderboard (BFCL)是目前最全面的大模型函数工具调用评测基准,由加州大学伯克利分校出品。

具体形式为:

  • Python: 指定函数调用、函数选择调用、多函数调用(Parallel Function)、多函数选择调用
  • 非Python: 聊天能力(不提供任何函数的场景)、指定函数相关性/不相关性检测、其他语言或函数调用场景(REST API, SQL, Java, Javascript)

从v3开始增加了对不同复杂程度多步或多轮对话情景的考察。

评测数据量:

BFCL v3包含1,000条数据供评测使用。详细组成参见这里

源数据集问题样例:

{
  <!-- 指定函数调用 -->
  "question": [[{"role": "user", "content": "I've been playing a game where rolling a six is somehow more likely than usual, and the chance of it happening on a single roll is 60%. I'm curious, if I roll the die 20 times, what are the odds that I'll get exactly five sixes?"}]],
  "function": [
    {
      "name": "calc_binomial_probability",
      "description": "Calculates the probability of getting k successes in n trials.",
      "parameters": {"type": "dict", "properties": {"n": {"type": "integer", "description": "The number of trials."}, "k": {"type": "integer", "description": "The number of successes."}, "p": {"type": "float", "description": "The probability of success."}}, "required": ["n", "k", "p"]}
    }
  ],
  "execution_result_type": ["exact_match"],
  "ground_truth": ["calc_binomial_probability(n=20, k=5, p=0.6)"]
}
{
  <!-- 多函数调用 -->
  "question": [[{"role": "user", "content": "Could you fetch me the latest news from Paris, France, and also the current weather in Letterkenny, Ireland, in Celsius?"}]],
  "function": [
    {
      "name": "get_news_report",
      "description": "Fetches the latest news based on a specific location, typically a city and state.",
      "parameters": {"type": "dict", "required": ["location"],"properties": {"location": {"type": "string", "description": "The location for which to get the news, in the format of 'City, State (abbr)' or 'City, Country', such as 'San Francisco, CA', 'Paris, France', or 'New York, NY'."}}}
    },
    {
      "name": "get_current_weather",
      "description": "Retrieves the current weather conditions for a specified location, with options for units of temperature measurement.",
      "parameters": {"type": "dict", "required": ["location"], "properties": {"location": {"type": "string", "description": "The location for which weather data is to be fetched, in the format of 'City, State (abbr)', such as 'San Francisco, CA'. Or 'City, Country'"}, "unit": {"type": "string", "description": "The unit of temperature for the weather report.", "enum": ["celsius", "fahrenheit"], "default": "fahrenheit"}}}
    }
  ]
}
{
  <!-- 不相关性检测 -->
  "question": [[{"role": "user", "content": "Find the integral of x^3 from 1 to 5"}]],
  "function": [
    {
      "name": "str_to_int",
      "description": "Converts string value to integer.",
      "parameters": {"type": "dict", "properties": {"value": {"type": "string", "description": "String value to be converted to integer"}}, "required": ["value"]}}
  ]
}

论文引用:

@misc{berkeley-function-calling-leaderboard,
  title={Berkeley Function Calling Leaderboard},
  author={Fanjia Yan and Huanzhi Mao and Charlie Cheng-Jie Ji and Tianjun Zhang and Shishir G. Patil and Ion Stoica and Joseph E. Gonzalez},
  howpublished={\url{https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html}},
  year={2024},
}

数据集版权使用说明:

Apache 2.0