Evaluation Data

BFCL

# Evaluation Metrics - Abstract Syntax Tree or Execution Result Matching

Evaluation method: if the model supports function calls, follow its official provision, otherwise use the BFCL unified system prompts.

Data Description:

Berkeley Function-Calling Leaderboard (BFCL) is the most comprehensive large model function tool-calling evaluation benchmark available, produced by the University of California, Berkeley.

The specific form is:

Python: Specified Function Calling, Function Selection Calling, Multiple Function Calling (Parallel Function), Multiple Function Selection Calling
Non-Python: Chat capability (scenarios where no function is provided), Specified Function Relevance/Irrelevance Detection, other languages or function calling scenarios (REST API, SQL, Java, Javascript)

Examination of multi-step or multi-round dialogue scenarios of varying complexity has been added from v3 onwards.

Assessment Data Volume:

BFCL v3 contains 1,000 pieces of data for evaluation purposes. See here for the detailed composition.

Source Dataset Example Questions:

{
  <!-- Designated Function Calling -->
  "question": [[{"role": "user", "content": "I've been playing a game where rolling a six is somehow more likely than usual, and the chance of it happening on a single roll is 60%. I'm curious, if I roll the die 20 times, what are the odds that I'll get exactly five sixes?"}]],
  "function": [
    {
      "name": "calc_binomial_probability",
      "description": "Calculates the probability of getting k successes in n trials.",
      "parameters": {"type": "dict", "properties": {"n": {"type": "integer", "description": "The number of trials."}, "k": {"type": "integer", "description": "The number of successes."}, "p": {"type": "float", "description": "The probability of success."}}, "required": ["n", "k", "p"]}
    }
  ],
  "execution_result_type": ["exact_match"],
  "ground_truth": ["calc_binomial_probability(n=20, k=5, p=0.6)"]
}
{
  <!-- Multiple Function Calling -->
  "question": [[{"role": "user", "content": "Could you fetch me the latest news from Paris, France, and also the current weather in Letterkenny, Ireland, in Celsius?"}]],
  "function": [
    {
      "name": "get_news_report",
      "description": "Fetches the latest news based on a specific location, typically a city and state.",
      "parameters": {"type": "dict", "required": ["location"],"properties": {"location": {"type": "string", "description": "The location for which to get the news, in the format of 'City, State (abbr)' or 'City, Country', such as 'San Francisco, CA', 'Paris, France', or 'New York, NY'."}}}
    },
    {
      "name": "get_current_weather",
      "description": "Retrieves the current weather conditions for a specified location, with options for units of temperature measurement.",
      "parameters": {"type": "dict", "required": ["location"], "properties": {"location": {"type": "string", "description": "The location for which weather data is to be fetched, in the format of 'City, State (abbr)', such as 'San Francisco, CA'. Or 'City, Country'"}, "unit": {"type": "string", "description": "The unit of temperature for the weather report.", "enum": ["celsius", "fahrenheit"], "default": "fahrenheit"}}}
    }
  ]
}
{
  <!-- Irrelevance Detection -->
  "question": [[{"role": "user", "content": "Find the integral of x^3 from 1 to 5"}]],
  "function": [
    {
      "name": "str_to_int",
      "description": "Converts string value to integer.",
      "parameters": {"type": "dict", "properties": {"value": {"type": "string", "description": "String value to be converted to integer"}}, "required": ["value"]}}
  ]
}

Paper Citation：

@misc{berkeley-function-calling-leaderboard,
  title={Berkeley Function Calling Leaderboard},
  author={Fanjia Yan and Huanzhi Mao and Charlie Cheng-Jie Ji and Tianjun Zhang and Shishir G. Patil and Ion Stoica and Joseph E. Gonzalez},
  howpublished={\url{https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html}},
  year={2024},
}

Dataset Copyright and Usage Instructions：

Apache 2.0

Evaluation Data ​

BFCL ​

Data Description: ​

Assessment Data Volume: ​

Source Dataset Example Questions: ​

Paper Citation： ​