4 News Express: Show HN: Autotab Instruct – Claude Computer Use with Guardrails for Reliability https://ift.tt/8vETZnR

Friday, November 1, 2024

Hi HN,

We’ve built a desktop app to create highly reliable AI agents that use a computer with mouse and keyboard. Until last week, we had tried many different approaches to open-ended agentic features, but none of them had met our reliability bar. With Anthropic’s Computer Use this finally changed, and we just shipped a feature we’re calling Instruct. Instruct lets users create agentic blocks as part of a larger Autotab skill, which provides the structured logical flow that keeps the automation on track.

If you haven’t had a chance to try Computer Use yet, it is an impressive leap from the last generation of vision models (e.g. GPT-4o struggles with relative positions, let alone coordinates). At the same time, it is still not good enough to be given a prompt and let loose.

One of the big surprises to us early on was just how much intent specification most real-world workflows need in order to run reliably. What looks at first like a simple form-filling task usually turns out to have dozens of edge cases and very specific, hidden rules. Even human employees need to be shown how to perform these tasks, and then get better at them through questions and feedback over time.

We wanted to build a tool for specifying intent and iterating with the model to make it reliable enough for real work.

- Automations run on top of an action scaffold, which works kind of like a very fuzzy programming language with strict types. This gives the model a high-level plan that guides execution, and makes it easy to break out discrete steps to get the reliability you need. (Interestingly, this has proven useful not just for the agent, but also for the human trying to create, verify and edit the automation.)
- When the model is unsure, it asks for clarification. For example, if you are in editing mode and the model thinks that an element looks meaningfully different than before, it will ask you to verify that it is the same element.
- The agent has access to a memory system that lets it recall information from past runs, as well as instructions and feedback from the user.

Here’s a short video of Autotab Instruct in action: https://ift.tt/xOpDGTj?...

We’d love to hear what you think!

November 1, 2024 at 10:26PM
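To make the three bullet points above a bit more concrete, here is a rough Python sketch of how a typed step plan, a clarification request, and a memory of past feedback could fit together. This is not Autotab's implementation: `Step`, `Memory`, `run_scaffold`, `toy_model`, the confidence score, and the 0.8 threshold are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One discrete step in the plan (hypothetical structure)."""
    name: str
    instruction: str    # the fuzzy part: natural-language intent
    output_type: type   # the strict part: the result must be of this type


@dataclass
class Memory:
    """Keeps user feedback per step so later runs can reuse it."""
    notes: Dict[str, str] = field(default_factory=dict)

    def recall(self, step_name: str) -> Optional[str]:
        return self.notes.get(step_name)

    def remember(self, step_name: str, note: str) -> None:
        self.notes[step_name] = note


# Assumed model interface: (step, optional hint) -> (value, confidence).
Model = Callable[[Step, Optional[str]], Tuple[Any, float]]


def run_scaffold(
    steps: List[Step],
    model: Model,
    ask_user: Callable[[Step], str],
    memory: Memory,
    threshold: float = 0.8,
) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    for step in steps:
        hint = memory.recall(step.name)
        value, confidence = model(step, hint)
        if confidence < threshold:
            # The model is unsure: ask for clarification, store the answer,
            # and retry once. A fuller version might loop or abort here.
            hint = ask_user(step)
            memory.remember(step.name, hint)
            value, confidence = model(step, hint)
        if not isinstance(value, step.output_type):
            raise TypeError(
                f"{step.name}: expected {step.output_type.__name__}, "
                f"got {type(value).__name__}"
            )
        results[step.name] = value
    return results


def toy_model(step: Step, hint: Optional[str]) -> Tuple[Any, float]:
    # Stand-in for a real model: confident only once it has a hint.
    if hint is None:
        return "", 0.3
    return f"{step.instruction} ({hint})", 0.95


questions: List[str] = []


def ask_user(step: Step) -> str:
    questions.append(step.name)
    return "use the billing address"


memory = Memory()
steps = [Step("fill_address", "Fill in the address field", str)]
first = run_scaffold(steps, toy_model, ask_user, memory)
second = run_scaffold(steps, toy_model, ask_user, memory)
```

In this toy run the first pass asks the user once, and the second pass reuses the remembered answer without asking again, which is roughly the behavior the memory bullet describes.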
