Death of the Toolbar

How AI Agents Change What Software Is Worth Building

Published: March 29, 2026
Modified: March 29, 2026

The other day I tried using Claude Cowork to create a PowerPoint presentation. I absolutely hate PowerPoint (among other Microsoft products), so I was pretty happy when Claude was able to generate a decent slide deck from my original LaTeX presentation. But what really amazed me was how the AI agent created the PPT document: by using a Python script!

This got me thinking about what our software tools will look like in a future where agents increasingly take over the tasks that graphical user interfaces (GUIs) were originally designed for. Here’s my attempt at a framework for reasoning about which kind of software will survive — and thrive — in the age of AI agents.

Disclaimer: This article is meant to spark ideas, not be exhaustive. It won’t cover every type of software, and your specific product might not fit neatly into the categories below. Furthermore, adjacent questions (pricing models, security frameworks, liability when agents act on a user’s behalf) are important but deserve their own treatment.

Software Purpose

The first thing to consider when evaluating to what degree a software product will be affected by the advent of AI agents is the purpose of the software:

  • Experience: This is software where the whole point is the human experience when using the software, for example, playing a game, interacting with other humans (social media), or consuming content (Netflix, Spotify, news pages). The main use case of this software (the actual watching, playing, scrolling) will be largely unaffected by AI agents: if you’re using the software for fun, why let an agent do it for you? That said, AI is already reshaping the discovery and curation layer around these experiences (through personalized feeds and playlists or summarized news), so even experience-focused products should not completely ignore these new trends.
  • Production: This is software you use to create something, but where you care more about the final result than the process of creation, like PowerPoint, Photoshop, audio production tools, and code editors. While their GUIs were optimized over the years to make creation smoother, you mostly use these tools because you need to produce a result and this is the way to get it. This is the kind of software most at risk of being replaced by agentic workflows. Note: if you’re an artist who genuinely enjoys the creation process with these tools and you’re not just interested in the final result, for you the software falls under the “experience” category. But I doubt anyone genuinely enjoys dragging text boxes around in PowerPoint.
  • Decision-Making: This is software that displays results for humans to make a decision. A pure example is a dashboard with data visualizations. But decision-making is often embedded in production tools too: you need to look at the created artifact and judge whether it’s done or needs more work. This will still be necessary when agents generate the results, as you need to understand whether the agent changed the right things or introduced mistakes. IDEs already do this well with features to examine diffs of version-controlled files, but for non-text artifacts like images, music, slides, or 3D models, any automated edit currently risks destroying your progress in subtle ways you might miss when checking only the final result.

Most software products serve multiple purposes, for example, based on the number of features for each category, PowerPoint could be considered 70% production (all the tools used to build slides), 10% decision-making (ability to view and comment on slides), and 20% experience (features to present a slideshow to an audience). And some products don’t fit neatly into any of these buckets (e.g., coordination tools like Jira or Slack, which are also heavily affected by agents).
We’ll focus on production and decision-making in what follows, but the general ideas should be relevant to other software too.

Semantic Diff and Comparison Tools

The comparative lack of tools to support decision-making for AI-generated changes reveals a huge opportunity for a new class of software features: semantic diff and comparison tools. These address a real need right now: users need to be able to verify not just the final output of an agent’s creation, but also the individual steps, to trust subsequent edits. IDEs with AI integration can serve as inspiration here: they’ve shown diffs of text files for a long time and now also allow users to select individual lines of code to point the agent to the parts it should change. For other file types, creative solutions are needed that offer similar capabilities:

  • Visual diffs that highlight what changed between versions of an image, slide deck, or 3D model.
  • Partial reverts so users can accept some of the agent’s changes while rejecting others.
  • Targeted instructions to point the agent to specific parts that should be changed, instead of it touching everything.
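For text that can be extracted from an artifact (slide titles, captions, labels), a semantic diff doesn’t have to be exotic. A minimal sketch using Python’s standard difflib, assuming the slide texts have already been extracted into plain lists (the extraction itself is format-specific and omitted here):

```python
import difflib

def diff_slide_texts(old_texts, new_texts):
    """Return a unified diff of the text content of two deck versions.

    `old_texts` / `new_texts` are lists of strings, one entry per slide --
    a simplified stand-in for whatever text your format lets you extract.
    """
    return list(difflib.unified_diff(old_texts, new_texts,
                                     fromfile="before", tofile="after",
                                     lineterm=""))

# Example: the agent rewrote the second slide.
old = ["Q3 Results", "Revenue up 5%", "Outlook"]
new = ["Q3 Results", "Revenue up 12%", "Outlook"]
for line in diff_slide_texts(old, new):
    print(line)
```

A real product would of course go further (layout and image changes, partial accept/reject), but even a text-level diff like this already tells the user which slides the agent touched.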

And there’s a subtler problem here: if users stop creating artifacts by hand entirely, they might gradually lose the expertise to judge whether the AI output is any good, which makes good decision-making tooling even more critical.

Agent-Accessible File Formats

The next requirement for production tools to stay relevant is agent-accessible file formats. For an agent to create or edit artifacts on a user’s behalf, it needs to be able to work with the underlying file format. If that format is text-based and readable (like XML, HTML, JSON, or markdown), an agent can edit it directly, the same way it edits source code. For more complex operations, it can write a script and possibly use targeted libraries to produce or manipulate the file, as my PowerPoint example illustrates. Either way, text-based formats like the XML behind Word and PowerPoint documents allow targeted, surgical edits: change one part without risking corruption of unrelated elements.
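To make “surgical edit” concrete, here is a heavily simplified sketch: PowerPoint slides store visible text in <a:t> elements of the DrawingML namespace, so an agent can rewrite one string while leaving every other element byte-for-byte untouched. The XML fragment below is a toy stand-in, not real slide markup:

```python
import xml.etree.ElementTree as ET

# DrawingML namespace that PowerPoint uses for text runs.
A = "http://schemas.openxmlformats.org/drawingml/2006/main"

def replace_slide_text(slide_xml, old, new):
    """Replace matching text runs without touching any other element."""
    root = ET.fromstring(slide_xml)
    for t in root.iter(f"{{{A}}}t"):  # <a:t> elements hold visible text
        if t.text and old in t.text:
            t.text = t.text.replace(old, new)
    return ET.tostring(root, encoding="unicode")

# Toy fragment; a real slide part has far more surrounding structure.
slide = f'<p:sp xmlns:p="urn:example" xmlns:a="{A}"><a:t>Hello Board</a:t></p:sp>'
print(replace_slide_text(slide, "Board", "Team"))
```

The point is not this particular helper but the property it relies on: because the format is text-based XML, an edit can be scoped to exactly one element.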

Optimized binary representations with internal cross-references and offsets, common in software that requires fast rendering of complex graphics, are a different story. Modifying one thing can invalidate the whole file. These formats work great when users manipulate artifacts through the software’s GUI, but an agent can’t read or edit them directly, and programmatic access through libraries is brittle at best.

The key question for your product is: can an agent reliably read, produce, and manipulate your file format? If not, your competitors who make their files agent-accessible (through text-based formats, well-documented libraries, CLIs, or APIs) will leave you behind, and your only hope for staying relevant will be to make your software so fun to use that users stay for the experience.

Impact of Hallucinations

When we let an AI generate our outputs, we need to address an important limitation of the underlying large language models (LLMs): the tendency to make stuff up, also referred to as “hallucinations”. How problematic this is depends on the use case. Hallucinations can be …

  • A feature: If you want the LLM to get creative (e.g., generate a fictional short story, a new song, a cool image), then a result that deviates from what you originally had in mind often sparks new ideas and can result in a better outcome than if the LLM followed your instructions to the letter.
  • Partially tolerable: For most creation tasks we don’t want the LLM to go too far off script. For example, when drafting an email to your colleague, the AI agent should keep your tone of voice and not make up incorrect progress updates. Or when coding, we need the resulting script to be syntactically correct, to compute the right stuff, and to follow our coding style guidelines. But we don’t care about small details like exactly which variable names the agent used. So while the final output might have looked a bit different if we had created it by hand, the time savings from letting an AI do the work are big enough that we’re fine with small variations as long as the result overall gets the job done.
  • A bug: For some tasks there is exactly one correct result and deviations are not acceptable. For example, calculating your tax returns, computing the hash value of an object, running a physics simulation for some initial conditions, or maybe just counting the number of “R”s in the word “strawberry”. In these cases you absolutely do not want an LLM to just make up an answer; you want a deterministic algorithm that is guaranteed to give the correct result and produces the same output every time it runs on the same inputs.
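The strawberry case shows what that looks like in practice: instead of answering from the model’s weights, the agent should write and run trivial deterministic code, for instance:

```python
def count_letter(word: str, letter: str) -> int:
    """Deterministic: the same inputs always give the same, correct answer."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3, every single time
```

No sampling temperature, no tokenizer quirks; the answer is correct by construction.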

Tools That Implement Complex Algorithms

This means tools that do complex calculations will stay relevant, provided they are easier for the AI to use than the next best alternative, whether that’s reimplementing the functionality from scratch or using a competing tool. As Steve Yegge describes in his blog post, making your tool agent-friendly requires attention to three things:

  • Discoverability: If your tool was created after the LLM’s training data cut-off, the agent won’t get the idea to use it on its own. Make it easy to discover, e.g., by distributing a skill that describes how to use it and making the documentation accessible in an agent-friendly format (e.g., as markdown files).
  • Intuitive interface: Since, like human users, agents sometimes just skim the documentation, the tool should be as intuitive to use as possible. For example, if you watch an agent interact with the tool and it consistently provides an incorrect argument name, you could add this as an alias for your original parameter. This can also help prevent “slopsquatting”, where malicious actors register malware under frequently hallucinated names to trick agents into installing it.
  • Token-conscious output: Avoid noise in the output and return only the results the agent really needs. CLI tools are often better than MCP servers here, since they come with fewer instructions that pollute the agent’s context window and can be chained more easily with other command line operations to filter and transform the output further.
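The alias idea from the second bullet is easy to sketch with Python’s argparse (the tool name and flags below are made up for illustration): a single argument can be registered under its original name plus the name agents keep guessing, and a quiet mode keeps the output token-conscious.

```python
import argparse

parser = argparse.ArgumentParser(prog="deckcheck")
# Suppose agents keep guessing "--output" although the original flag
# was "--out"; registering both as option strings makes either work.
parser.add_argument("--out", "--output", dest="out", help="result file")
# Token-conscious mode: print only the result, no banners or progress bars.
parser.add_argument("--quiet", action="store_true")

args = parser.parse_args(["--output", "report.json", "--quiet"])
print(args.out)
```

Watching an agent fumble your CLI in a trace is cheap user research: every consistently hallucinated flag name is a candidate alias.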

Validators and Verifiers

Another class of tools that becomes even more critical with AI agents is validators and verifiers. Software development is far ahead here: compilers, type checkers, linters, and automated tests all catch non-tolerable violations like syntax errors or unexpected results, and what remains in terms of deviations from human-ideal output is mostly tolerable, so manual rework is limited.

But for other data formats such automated checks are still missing. When I opened my AI-generated slide deck in PowerPoint, it told me the file needed to be “repaired”. This wasn’t a big deal, but I wish this kind of feedback and the provided fix were available to the agent as a command line tool so it could address these trivial mistakes before I looked at the output. This means when you design your programmatic interfaces for creating and editing your new agent-accessible file formats, also think about how the output can be validated and what meaningful error messages you could provide to an agent to help it fix any mistakes without additional user input.
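A “repair” dialog like PowerPoint’s could instead be a scriptable check the agent runs itself. As a minimal sketch: .pptx and .docx files are zip archives of XML parts, so even two cheap checks (is the container a valid zip, is every XML part well-formed) already catch some of the trivial mistakes, and the error messages are phrased so an agent can act on them. A real validator would of course check the OOXML schemas too:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def validate_ooxml_zip(data):
    """Return agent-readable error messages for a .pptx/.docx-style archive."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["container: not a valid zip archive"]
    errors = []
    for name in archive.namelist():
        if name.endswith(".xml"):
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as exc:
                errors.append(f"{name}: malformed XML ({exc})")
    return errors

# Build a tiny archive with one good part and one broken part.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("slides/slide1.xml", "<sld><txt>ok</txt></sld>")
    z.writestr("slides/slide2.xml", "<sld><txt>unclosed</sld>")
print(validate_ooxml_zip(buf.getvalue()))
```

Exposed as a CLI, this kind of check lets the agent fix its own output before the user ever opens the file.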

Some mistakes, however, are not structural and may even come down to personal preferences. For these I expect LLM-as-a-judge setups to become more prevalent: a second agent that criticizes the work of the first. Products could support this by asking users about their style guidelines when setting up a project and providing custom skills for agents to simplify such a review workflow.

Your Data as a Moat

Finally, to accomplish tasks, agents often need access to up-to-date information. Since they were trained on data that is usually at least a few months behind, they can’t (reliably) answer questions such as “what’s the weather forecast for tomorrow?” without consulting external data sources. This data will be valuable context when the agent does its work: the files on your computer when editing a function in an existing codebase, a CRM or ERP database when compiling a report, or sensor logs when diagnosing a problem with a machine.

If the data you can provide is fresh and exclusive, charging for access will remain a legitimate business model. However, just like with computation tools, you need to provide this access in an agent-friendly format: make it easy to discover, intuitive to use, and token-conscious in its output.


My AI-generated PowerPoint presentation was not perfect: I had to fix some alignment issues, there were text boxes I didn’t ask for, and it didn’t fully conform to our corporate design. But I’d still choose Claude over wrestling with PowerPoint’s toolbar any day.

I believe the shift from using tools to describing intent and reviewing results is coming for a lot more software than just presentation apps. And the products that will thrive are the ones that make themselves useful to agents — through accessible file formats, validation tools, and exclusive data — rather than clinging to a clunky GUI that is no fun to use.

So I’m really looking forward to a future where I waste less time dragging around text boxes!