This project started as something much smaller than it became.
A few months ago I wanted a clean way to give Claude access to Google Sheets and Google Docs. That was it. I use those tools every day, and the existing options either didn't work the way I wanted or mapped so directly to the underlying API that the LLM ended up doing all the heavy lifting itself: fighting font metrics, retrying calls, hallucinating field names.
Once I had a working framework for one toolkit, I realized adding another wasn't much work. Then a third. I started thinking less about individual integrations and more about what an aggregator could look like if it were built specifically for the things consumers actually use (fitness data, listening history, prediction markets, calendars), rather than the developer and enterprise APIs that everyone else has already covered well.
That's toolforest.io. As of today it's in open beta.
What's different
There are excellent services out there, like Zapier and Composio, that connect AI assistants to hundreds of APIs. For a lot of use cases they're a perfect fit. The thing I kept running into, though, was that many of these integrations are essentially thin wrappers. The LLM gets handed the same surface the API exposes, with all of its quirks intact.
I think there's real value in adding an intermediate layer between the toolkit and the underlying API. A few examples of what I mean:
- Google Slides. The raw API has no concept of font metrics, which means LLMs routinely generate slides where text overflows its text box. Toolforest's Google Slides toolkit measures fonts properly and gives the model the tools it needs to lay things out correctly.
- Google Sheets. The default auto-resize-column behavior doesn't measure fonts accurately. We compute widths properly so the output actually looks right.
- Polymarket and Kalshi. The raw APIs expose markets, events, prices, and order books, but they don't have a built-in concept of "what's moving." The toolkits add that layer by continuously snapshotting markets, computing price and volume changes over multiple windows, filtering out low-volume noise, and normalizing the quirks between venues. The model can ask for meaningful movers directly instead of trying to assemble that analysis from a pile of raw API calls.
- ListenBrainz. Toolforest maintains replicas of the MusicBrainz and ListenBrainz databases, so the toolkit can do more than proxy the public API. It resolves missing MBIDs, normalizes time ranges, paginates large histories, explains empty or truncated results, cleans up playlist metadata, and answers database-backed questions the public API doesn't expose directly. The result is that an assistant can answer questions like "which Pink Floyd tracks has this user listened to most over time?" or "which similar artists should I explore?" without stitching together brittle raw API calls.
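The "what's moving" layer for Polymarket and Kalshi is the most algorithmic piece of the list above. This post doesn't spell out how Toolforest implements it, so what follows is only a minimal Python sketch of the general idea. It assumes each market has a history of snapshots holding a price (an implied probability) and a cumulative traded volume. `Snapshot`, `top_movers`, `movers_by_window`, the window sizes, and the `min_volume` threshold are all hypothetical names and values chosen for illustration. They are not Toolforest's API or either venue's API.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical periodic snapshot of one market."""
    ts: int             # Unix timestamp in seconds
    price: float        # implied probability, 0.0 to 1.0
    cum_volume: float   # cumulative traded volume at this snapshot


def change_over_window(snaps, now, window):
    """Return (price_delta, volume_delta) between the latest snapshot and the
    last snapshot taken at or before `now - window`, or None if the history
    doesn't reach back that far."""
    snaps = sorted(snaps, key=lambda s: s.ts)
    latest = snaps[-1]
    cutoff = now - window
    baseline = None
    for s in snaps:
        if s.ts <= cutoff:
            baseline = s  # keep the most recent snapshot before the cutoff
        else:
            break
    if baseline is None:
        return None
    return latest.price - baseline.price, latest.cum_volume - baseline.cum_volume


def top_movers(history, now, window, min_volume, limit=5):
    """Rank markets by absolute price change over one window, dropping
    markets whose traded volume in that window is below `min_volume`."""
    rows = []
    for market_id, snaps in history.items():
        if not snaps:
            continue
        delta = change_over_window(snaps, now, window)
        if delta is None:
            continue
        price_delta, volume_delta = delta
        if volume_delta < min_volume:
            continue  # low-volume noise: a big move on little volume
        rows.append((market_id, price_delta, volume_delta))
    rows.sort(key=lambda r: abs(r[1]), reverse=True)
    return rows[:limit]


def movers_by_window(history, now, windows, min_volume, limit=5):
    """Compute top movers for several named windows, e.g. {"1h": 3600}."""
    return {
        name: top_movers(history, now, seconds, min_volume, limit)
        for name, seconds in windows.items()
    }
```

Ranking by absolute price change after a volume floor is only one reasonable choice. A fuller version might also scale moves by each market's usual volatility, or reconcile how the two venues report prices and volume before comparing them.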
In each case the LLM gets to the right answer faster, and the user never sees the plumbing.
Why I'm announcing this to ListenBrainz first
All of the toolkits on the site are live, but I wanted to introduce Toolforest somewhere specific rather than everywhere at once. ListenBrainz felt like the obvious place.
I've spent some time on the community forums recently, and what struck me was how much of what people want to build is exactly the kind of thing an LLM with structured access to listening data is good at. Taste twins. Year-end summaries that are actually personalized. Reconstructing the shape of a specific day a year ago. Connecting the dots across years of scrobbles.
The dataset is open, the community is generous, and the use cases are genuinely fun. It felt like the right room to walk into first.
If you want to try it, the easiest path is to connect your ListenBrainz account to Toolforest and ask Claude (or whichever assistant you use) something like "According to my ListenBrainz data, who are my three closest taste twins?" The examples in the Cookbook section will give you a few more ideas.
Where this goes
This is a personal project. I have no plans to commercialize it. I'll support the infrastructure myself for as long as that's reasonable; if usage ever gets to the point where the costs become a problem, I'll figure something out then.
What I'm most interested in, though, is closing the loop on toolkit development itself. The same LLMs that use these toolkits are pretty good at evaluating them: finding rough edges, suggesting better tool shapes, inventing use cases I wouldn't have thought of. I've been building a pipeline where models do exactly that. Pick a toolkit, invent a use case, execute it, evaluate the result, and write the findings back as structured feedback. The end state is something close to LLM-guided toolkit development, where agents propose new toolkits, build intermediate layers, and roll them out with minimal hand-holding from me. I'll write more about that in a future post.
If you have ideas for toolkits or enhancements you'd like to see, you can reach me at gerrit@toolforest.io or through the request form on the homepage. And if you're a ListenBrainz user, thanks for taking a look. I hope it's useful.
Regards,
Gerrit