Inside ActIO

How Orby’s Large Action Model Powers AI Agents

Bella Liu
August 14, 2025

Large language models (LLMs) like ChatGPT and Claude have transformed how businesses think about and interact with information. LLMs help teams generate content, summarize research, and brainstorm ideas. But they haven’t been able to execute tasks inside business software without human intervention — until now.

That’s where AI agents and Large Action Models (LAMs) come in.

Where LLMs focus on generating outputs based on prompts, LAMs like Orby’s ActIO AI model are designed to complete tasks inside real applications, marking a turning point in enterprise automation AI. LAMs bridge the gap between understanding intent and acting on it. 

ActIO represents a new kind of enterprise automation AI model: one that can see, reason, and execute work from end to end.

How ActIO’s architecture powers modern AI agents

Our ActIO AI model has the power to completely automate enterprise-level tasks in a way that’s never been done before.

Its capabilities are built on a hierarchical agent architecture that separates decision-making from interface execution. This structure enables more reliable AI workflow automation by using two types of AI agents.

  1. The Planner Agent
    • This top-level agent interprets a goal, such as “approve vendor invoices”, and breaks it down into logical sub-tasks, such as logging into the ERP, locating and clicking interface elements, and entering data into form fields. It manages the overall strategy and task state to keep everything on track.

  2. The Grounder Agent
    • The lower-level agent takes input from the planner and executes each step within software applications. It operates visually, scanning the screen and interacting with buttons, fields, and menus as a human would.

This separation enables ActIO to adjust its course during execution and handle dynamic scenarios with accuracy as it monitors progress toward completion.
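
To make the planner/grounder split concrete, here is a minimal sketch in Python. It is an illustration only, not ActIO's implementation: the `Step`, `Planner`, `Grounder`, and `run` names are hypothetical, the planner reads from a hard-coded playbook rather than reasoning about the goal, and a plain dictionary stands in for the screenshot a real grounder would inspect.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    """One sub-task the planner hands to the grounder (hypothetical shape)."""
    action: str      # e.g. "click" or "type"
    target: str      # description of the on-screen element to act on
    text: str = ""   # text to enter, used only by "type" steps


# Stand-in for a screenshot: visible element description -> (x, y) position.
Screen = Dict[str, Tuple[int, int]]


class Planner:
    """Toy planner: maps a goal to ordered steps and tracks progress."""

    def __init__(self, playbook: Dict[str, List[Step]]) -> None:
        self.playbook = playbook
        self.completed: List[Step] = []

    def plan(self, goal: str) -> List[Step]:
        return list(self.playbook[goal])


class Grounder:
    """Toy grounder: locates each target on the screen and acts on it."""

    def __init__(self, screen: Screen) -> None:
        self.screen = screen
        self.actions: List[str] = []

    def execute(self, step: Step) -> bool:
        position = self.screen.get(step.target)
        if position is None:
            return False  # element not visible; report failure to the caller
        self.actions.append(f"{step.action} '{step.target}' at {position}")
        return True


def run(goal: str, planner: Planner, grounder: Grounder) -> Optional[Step]:
    """Run every step; return the step that failed, or None on success."""
    for step in planner.plan(goal):
        if not grounder.execute(step):
            return step
        planner.completed.append(step)
    return None


# Assumed example data: the goal comes from the article, the steps are made up.
playbook = {
    "approve vendor invoices": [
        Step("click", "Invoices tab"),
        Step("type", "Search field", text="pending"),
        Step("click", "Approve button"),
    ]
}
screen = {
    "Invoices tab": (120, 40),
    "Search field": (300, 90),
    "Approve button": (640, 410),
}
planner = Planner(playbook)
grounder = Grounder(screen)
failed_step = run("approve vendor invoices", planner, grounder)
```

Because `run` returns the step that could not be executed, the planning layer knows exactly where execution stopped and could choose a different route. That is the kind of course correction the separation is meant to allow.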

Why visual grounding makes AI agents more resilient

Traditional automation tools rely heavily on code-based structures like the Document Object Model (DOM) or accessibility layers to identify and interact with software elements. These methods are fast but fragile. Any minor change in the underlying code can disrupt the automation, resulting in high maintenance costs and frequent system failures.

ActIO takes a different approach, incorporating UGround technology and its vision-first strategy. UGround was co-developed with Ohio State University and trained on over 10 million interface elements across 1.3 million screenshots to perceive the software interface the way a human would: by looking at the screen. This technology allows our ActIO AI model to:

  • Identify buttons, fields, menus, and dynamic elements as they appear on-screen, regardless of how they are coded underneath
  • Operate across web, desktop, and mobile platforms, even when underlying code changes
  • Improve resilience and reduce the cost of ongoing automation maintenance

Since it focuses on visual appearance rather than brittle underlying code, ActIO’s grounding mechanism can interpret a redesigned dashboard, a slightly altered layout, or a new third-party tool, and continue to function, even in environments where interfaces evolve frequently.
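
The difference between code-based and appearance-based lookup can be shown with a toy example. The sketch below is not UGround and does not use its API. The `Element` type and both lookup functions are hypothetical, and matching on the visible label is a simple stand-in for what a vision model would infer from pixels. It assumes a release that renames a button's identifier and moves it slightly, while the button still reads "Approve" on screen.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    dom_id: str              # identifier in the underlying code
    label: str               # text a person sees on screen
    center: Tuple[int, int]  # where the element is drawn


def find_by_dom_id(ui: List[Element], dom_id: str) -> Optional[Tuple[int, int]]:
    """Code-based lookup: depends on the identifier staying the same."""
    for element in ui:
        if element.dom_id == dom_id:
            return element.center
    return None


def find_by_appearance(ui: List[Element], label: str) -> Optional[Tuple[int, int]]:
    """Appearance-based lookup: depends only on what is visible (toy stand-in)."""
    wanted = label.strip().lower()
    for element in ui:
        if element.label.strip().lower() == wanted:
            return element.center
    return None


# Same button before and after a hypothetical release that renames its id.
before = [Element("btn-approve", "Approve", (640, 410))]
after = [Element("action-7f3a", "Approve", (700, 380))]

old_selector_hit = find_by_dom_id(after, "btn-approve")   # None: selector broke
visual_hit = find_by_appearance(after, "Approve")         # still finds the button
```

The identifier-based lookup fails after the rename, while the appearance-based lookup still returns the button's new position.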

Our proprietary Self-Adaptive Interface Learning (SAIL) system complements UGround, allowing ActIO to operate on completely new interfaces without prior training or manual customization. It uses pattern recognition and learned context from a wide variety of software interactions to generalize across unseen applications. This makes it possible to scale automation without the delays and costs traditionally associated with onboarding new tools.

Together, UGround and SAIL help ActIO operate like a digital teammate: intuitive, flexible, and ready to work in any environment.

Real-world use case: How ActIO is automating finance and operations workflows

ActIO isn’t just theory. It’s already in use across major enterprises that handle high-volume, multi-step workflows in finance and operations. These tasks often involve navigating multiple applications, interpreting dynamic interfaces, and following strict compliance and reporting protocols.

One of the clearest examples comes from a top tech company. Manually extracting data from supplier contracts was leading to slow processing, data entry errors, and compliance gaps. ActIO helped the company:

  • Reduce average handling time by 60%
  • Improve data accuracy by 90%

The result was faster payment approvals, stronger supplier relationships, and significant cost savings without increasing headcount.

In another case, a leading marketplace company is using Orby for its cash collection process. Previously, the team had to reconcile uncollected payments across spreadsheets and internal systems twice a week, a highly manual and error-prone process. After adopting ActIO, they were able to:

  • Fully automate the process
  • Achieve a zero error rate
  • Reconcile daily automatically
  • Enhance compliance through 24/7 automation

These are high-stakes tasks where accuracy, speed, and compliance all matter. And they’re where ActIO shines, delivering measurable results in productivity and reliability. The result is faster decisions, improved forecasting, and more time spent on high-value work.

These gains are supported by real-world testing and actual customer deployment of ActIO. Because ActIO adapts visually and plans intelligently, enterprises can go from identification to execution in days rather than months.

Still, success depends on more than technology. Organizations must align on governance, process ownership, and employee roles. Change management is essential, as is upskilling teams to work alongside AI agents.

ActIO sets a new standard for automation benchmarks

ActIO’s performance stands out in a market crowded with general-purpose AI. In head-to-head comparisons with leading models, ActIO has shown exceptional results, including:

  • ScreenSpot (visual grounding accuracy): 89.4% accuracy in GUI visual grounding, beating models from OpenAI, Google DeepMind, and Salesforce
  • MiniWoB (web task automation): 74.9% success rate, outperforming ServiceNow
  • WebArena (complex workflows): 37.5% success rate across 812 tasks, ahead of ServiceNow
  • VisualWebBench (understanding web interfaces): ActIO scored higher than leading multimodal models

These benchmarks matter because they test real interactions, not just simulated scenarios. For enterprises seeking to automate complex tasks across multiple platforms, ActIO sets a new standard.
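
For readers less familiar with these metrics, a task success rate is usually read as the share of benchmark tasks completed correctly. The short sketch below assumes that simple definition. The exact scoring protocol behind each benchmark is not described here, and the sample outcomes are invented purely for illustration.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Return the percentage of tasks marked as completed successfully."""
    if not outcomes:
        raise ValueError("no task outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)


# Invented outcomes for eight hypothetical tasks (True = task completed).
sample_outcomes = [True, True, False, True, True, False, True, True]
sample_rate = success_rate(sample_outcomes)  # 6 of 8 tasks succeeded
```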

Purpose-built AI agents deliver the ROI retrofitted models can’t match

Many AI tools try to bolt on task execution to LLMs, but those models were built to generate language, not execute tasks. 

ActIO is different. It was designed from the ground up as a Large Action Model to handle digital work in real environments — and actions speak louder than words.

Its visual-first grounding, adaptive learning, and hierarchical planning system allow it to operate where retrofitted tools fall short: across complex user interfaces, in workflows that evolve, and at enterprise scale.

For businesses serious about automation, the future is action. And ActIO is already delivering it.

Learn more about ActIO and the future of enterprise automation AI

Ready to see what enterprise-grade AI agents can really do? Get the full report to see how ActIO outperforms traditional automation tools and delivers real business value.

Want to understand the shift from content generation to intelligent action? Download the Orby white paper to explore how Large Action Models are redefining enterprise AI.