Experiments with FunctionGemma

Experiments with FunctionGemma, a 300MB LLM that converts natural language into structured function calls, running entirely locally with ollama and Python.

FunctionGemma is a new, very lightweight LLM from Google. It has a footprint of around 300MB for its 270M parameters, which is very small by modern standards. It achieves this tiny size by specializing. Unlike typical chatbots, it acts purely as a model that turns text inputs into structured function calls.

What is “Function Calling”?

A function generally consists of a name, input parameters (the arguments), and an output.

For example, a function that fetches the current temperature for a city might call an external API or run a calculation. In Python, it would look like this:

def get_weather(city: str) -> float: ...

If I want to execute this, I have to choose the function and explicitly write out the parameters, like this:

>>> get_weather("Augsburg")
4.0

I had to think about the right function name for my goal, understand its parameters, and format them correctly. This might be trivial for a programmer, but it is a barrier for non-programmers.

The Natural Language Interface

That’s exactly where FunctionGemma comes in. It serves as a natural language interface for code. Instead of manually writing the function call, you can just write a sentence describing the action you want to perform:

“What’s the temperature in Augsburg?”

FunctionGemma determines the function name and arguments. Its output for the previous example looks like this:

ToolCall(function=Function(name='get_weather', arguments={'city': 'Augsburg'}))

And just like that, it is possible to call a function using plain English (or other languages)!

Wiring it Up

To test FunctionGemma, I wanted to implement this weather calling in real code.

First of all, I had to get the model running locally. Thanks to its small size, my laptop handled it without problems.

ollama provides a simple CLI for pulling models. With `ollama pull functiongemma`, I had the model ready to go. Then I set up a simple Python project with `uv init`, added the ollama Python client with `uv add ollama`, and an HTTP client with `uv add httpx` to call an external weather API.
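Collected in one place, the setup steps described above look like this (assuming ollama and uv are already installed):

```shell
# pull the model locally
ollama pull functiongemma

# create a new Python project and add the two dependencies
uv init
uv add ollama   # Python client for the local ollama server
uv add httpx    # HTTP client for the external weather API
```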

The basic get_weather function looks like this:

import httpx

def get_weather(city: str) -> float:
    """
    Get the current weather for a city.

    Args:
        city: The name of the city

    Returns:
        A float representing the current temperature in Celsius.
    """
    r = httpx.get(f"https://wttr.in/{city}?format=j1")
    data = r.json()
    # wttr.in returns the temperature as a string, so convert it
    return float(data["current_condition"][0]["FeelsLikeC"])

The model then needs to be able to inspect the function, which is easy in Python because functions are first-class objects. Functions are passed as `tools` to the ollama client, which can then see each function's name, typed parameters, and return type. It also sees the docstring, which provides a human-readable description of what the function does. This is not only helpful to humans, but also gives the LLM more context.

from ollama import chat

messages = [{"role": "user", "content": "What's the weather in Augsburg?"}]
tools = [get_weather]

response = chat(
    "functiongemma",
    messages,
    tools=tools,
)
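The introspection the client relies on is easy to see in isolation; a minimal sketch (the function body is stubbed out here):

```python
import inspect

def get_weather(city: str) -> float:
    """Get the current weather for a city."""
    ...

# name, typed signature, and docstring are all available at runtime,
# which is what lets a client build a tool description from a plain function
print(get_weather.__name__)                                          # get_weather
print(inspect.signature(get_weather).parameters["city"].annotation)  # <class 'str'>
print(get_weather.__doc__)                                           # Get the current weather for a city.
```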

To make the function callable by the string of its name, I used a simple registry:

tool_registry = {f.__name__: f for f in tools}

# Retrieve and execute the function
tool_call = response.message.tool_calls[0]
func = tool_registry[tool_call.function.name]
result = func(**tool_call.function.arguments)
print(f"Result: {result} C")
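End to end, the dispatch is just a dictionary lookup plus argument unpacking. A self-contained sketch with a stubbed weather function and a stand-in for the model's tool call (in the real script, the tool call comes from the ollama response):

```python
from types import SimpleNamespace

def get_weather(city: str) -> float:
    """Stubbed weather lookup for illustration."""
    return 6.0

tools = [get_weather]
tool_registry = {f.__name__: f for f in tools}

# stand-in for a ToolCall object returned by the model
tool_call = SimpleNamespace(
    function=SimpleNamespace(name="get_weather", arguments={"city": "Augsburg"})
)

# look up the function by name and unpack the model's arguments
func = tool_registry[tool_call.function.name]
result = func(**tool_call.function.arguments)
print(f"Result: {result} C")  # Result: 6.0 C
```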

After this I was able to run the script:

uv run examples/weather.py
Prompt: What's the weather in Augsburg?
Calling: get_weather({'city': 'Augsburg'})
Result: 6 C

Pretty neat! :)

The repo with the full code is available on Codeberg.

There are two other examples in the repo that show how to use FunctionGemma with different types of tools: one for controlling my Linux desktop environment using ydotool, and another for automating git commands.

Reflection

First of all, I was amazed that I could run this model locally on my laptop! No API fees, no cloud dependency, no privacy issues, just plain old Python and ollama.

I also think this is where LLMs really shine. They are very good at understanding natural language and flexible enough to be trained on custom data to pick up domain-specific phrasing, like the weather queries in the weather.py script. But LLMs (at least at the scale of FunctionGemma) are poor at logic. Functions, on the other hand, are great at logic. Combining both gives us the best of both worlds: mapping human language to machine actions.

Another idea I had was that this is not so different from Claude Code or other coding agents. They basically use frontier models to call "tools", which are essentially functions like edit_file, read_file, find_file, search_web, etc., and feed the results back into the model for further processing. Claude Code relies on massive, cloud-based frontier models to call tools, but FunctionGemma shows that we might not need supercomputers for everyday tasks. If a 300MB model can reliably trigger git commands or control my Linux desktop locally, we are very close to having lightweight, capable, and private agents.

But FunctionGemma is still a small model, and its accuracy is not great. I repeated some prompts and got different results than expected. For example, when I once asked "whats the weather in augsburg?", it produced Function(name='get_weather', arguments={'city': 'sburg'}). The FunctionGemma docs state that accuracy is only around 58%, but fine-tuning can improve it to up to 85%.
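One cheap guard against malformed model output is to validate the proposed arguments against the function's signature before executing anything. This would not catch the 'sburg' typo, but it rejects missing or invented parameters; a minimal sketch (not part of the original script):

```python
import inspect

def validate_tool_call(func, arguments: dict) -> bool:
    """Check that model-proposed arguments match the function's
    signature before executing the call."""
    try:
        # bind() raises TypeError on missing or unexpected arguments
        inspect.signature(func).bind(**arguments)
    except TypeError:
        return False
    return True

def get_weather(city: str) -> float:
    ...

print(validate_tool_call(get_weather, {"city": "Augsburg"}))  # True
print(validate_tool_call(get_weather, {"town": "Augsburg"}))  # False
```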

What’s next?

The idea of “Agentic Coding” with a 300MB model really stuck with me. It brings up a fascinating question: what if we don’t always need the massive “cognitive capabilities” of frontier models to build agents?

Right now, frontier models spend a lot of their reasoning power just navigating messy, unstructured tools and syntax. My hypothesis is that if we provide a tiny model with highly intuitive, well-structured tools, we might drastically reduce its cognitive load. It wouldn't need to learn complex, historical programming quirks; it could just map natural language to clean APIs.

A good testing ground for this might be nushell. Shells like bash or zsh (which almost all coding agents use as of now) rely on a flat, string-based syntax from the 70s that requires a lot of logic to parse. For example, filtering a CSV in bash looks like this: awk -F',' 'NR==1 || $3 < 10' data.csv

Nushell, on the other hand, uses a structured API for its commands, passing structured data instead of raw text: open data.csv | where revenue < 10000

I often find myself intuitively guessing the right commands in Nushell. If it's more intuitive for humans, it could also be easier for an LLM. Instead of training a model on complex awk logic, the model can learn a more structured approach that maps more directly to natural language. What if a pattern-learning machine gets easier patterns to learn? I think the answer could be that less learning, and less "knowledge" (fewer parameters), is needed.

Conveniently, Nushell has a built-in way to export all of its commands, descriptions, and typed signatures as JSON: scope commands | select name category description search_terms signatures | to json
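If that export works as hoped, each entry could be translated into the tool-definition format the ollama client accepts. A speculative sketch with a single hand-written sample entry (the field names follow the `scope commands` columns above; the schema mapping and the empty `parameters` placeholder are my assumptions, not a finished translation):

```python
# hand-written sample shaped like one entry of the `scope commands` export
sample = {
    "name": "where",
    "category": "filters",
    "description": "Filter values based on a row condition.",
}

def to_tool_schema(cmd: dict) -> dict:
    """Map a nushell command entry to a JSON-schema-style tool definition."""
    return {
        "type": "function",
        "function": {
            "name": cmd["name"],
            "description": cmd["description"],
            # the typed signatures would be translated here; left open for now
            "parameters": {"type": "object", "properties": {}},
        },
    }

tool = to_tool_schema(sample)
print(tool["function"]["name"])  # where
```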

For my next experiment, I want to feed these into FunctionGemma. If the hypothesis holds up, perhaps with a bit of fine-tuning, I might be able to build a fully local shell agent that runs in a fraction of a gigabyte.

References