Evaluation Data
BFCL
# Evaluation Metrics - Abstract Syntax Tree or Execution Result Matching
Evaluation method: if the model supports function calls, follow its official provision, otherwise use the BFCL unified system prompts.
Data Description:
Berkeley Function-Calling Leaderboard (BFCL) is the most comprehensive large model function tool-calling evaluation benchmark available, produced by the University of California, Berkeley.
The specific form is:
- Python: Specified Function Calling, Function Selection Calling, Multiple Function Calling (Parallel Function), Multiple Function Selection Calling
- Non-Python: Chat capability (scenarios where no function is provided), Specified Function Relevance/Irrelevance Detection, other languages or function calling scenarios (REST API, SQL, Java, Javascript)
Examination of multi-step or multi-round dialogue scenarios of varying complexity has been added from v3 onwards.
Assessment Data Volume:
BFCL v3 contains 1,000 pieces of data for evaluation purposes. See here for the detailed composition.
Source Dataset Example Questions:
{
<!-- Designated Function Calling -->
"question": [[{"role": "user", "content": "I've been playing a game where rolling a six is somehow more likely than usual, and the chance of it happening on a single roll is 60%. I'm curious, if I roll the die 20 times, what are the odds that I'll get exactly five sixes?"}]],
"function": [
{
"name": "calc_binomial_probability",
"description": "Calculates the probability of getting k successes in n trials.",
"parameters": {"type": "dict", "properties": {"n": {"type": "integer", "description": "The number of trials."}, "k": {"type": "integer", "description": "The number of successes."}, "p": {"type": "float", "description": "The probability of success."}}, "required": ["n", "k", "p"]}
}
],
"execution_result_type": ["exact_match"],
"ground_truth": ["calc_binomial_probability(n=20, k=5, p=0.6)"]
}
{
<!-- Multiple Function Calling -->
"question": [[{"role": "user", "content": "Could you fetch me the latest news from Paris, France, and also the current weather in Letterkenny, Ireland, in Celsius?"}]],
"function": [
{
"name": "get_news_report",
"description": "Fetches the latest news based on a specific location, typically a city and state.",
"parameters": {"type": "dict", "required": ["location"],"properties": {"location": {"type": "string", "description": "The location for which to get the news, in the format of 'City, State (abbr)' or 'City, Country', such as 'San Francisco, CA', 'Paris, France', or 'New York, NY'."}}}
},
{
"name": "get_current_weather",
"description": "Retrieves the current weather conditions for a specified location, with options for units of temperature measurement.",
"parameters": {"type": "dict", "required": ["location"], "properties": {"location": {"type": "string", "description": "The location for which weather data is to be fetched, in the format of 'City, State (abbr)', such as 'San Francisco, CA'. Or 'City, Country'"}, "unit": {"type": "string", "description": "The unit of temperature for the weather report.", "enum": ["celsius", "fahrenheit"], "default": "fahrenheit"}}}
}
]
}
{
<!-- Irrelevance Detection -->
"question": [[{"role": "user", "content": "Find the integral of x^3 from 1 to 5"}]],
"function": [
{
"name": "str_to_int",
"description": "Converts string value to integer.",
"parameters": {"type": "dict", "properties": {"value": {"type": "string", "description": "String value to be converted to integer"}}, "required": ["value"]}}
]
}
Paper Citation:
@misc{berkeley-function-calling-leaderboard,
title={Berkeley Function Calling Leaderboard},
author={Fanjia Yan and Huanzhi Mao and Charlie Cheng-Jie Ji and Tianjun Zhang and Shishir G. Patil and Ion Stoica and Joseph E. Gonzalez},
howpublished={\url{https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html}},
year={2024},
}
Dataset Copyright and Usage Instructions:
Apache 2.0