Computer Use Bots may not be a one-size-fits-all solution.
For example, consider the task of searching and deleting a post from Reddit.
The browser agent (comp use) takes very long because of complex website structure
Instead if you have access to APIs to delete (Tools), it is only a few lines of code.
So, which one to use - Browser Agent or Tool Use?
This paper argues that Hybrid is needed. At each step, decide and pick one of them using an LLM.
"This case highlights the Hybrid Agent’s ability to efficiently combine API calls with web interactions, allowing it to tackle complex multi-step tasks that would be difficult or impossible for solely browsing or solely API-based agents."
Browsing agent - gets 15% on web arena
API agent - 29% on webarena.
Hybrid agent +5% over API agent
The catch is that many websites may not have well-documented or undocumented APIs.
How do you recover such APIs?
——-
Interesting quotes/prompts from the paper:
"Also, do NOT ask the user for any clarification, they cannot clarify anything and you need to do it yourself."
"Be careful about pagination of the API response, the response might only contain the first few instances, so make sure you look at all instances."