Evaluating the Future of Agentic Automation

Beyond Manus AI

Mar 24, 2025

The Rise of Agentic Automation

Artificial intelligence is rapidly advancing toward a new era of agentic automation, in which AI systems proactively perceive their environment, make informed decisions, and autonomously execute complex tasks. Recent research indicates that AI capabilities are expanding quickly. The length of tasks that models can complete with a 50% success rate, measured by how long those tasks take human professionals, has doubled approximately every seven months since 2019.
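The sketch below shows what this doubling rate implies. It projects the task-length horizon under a pure exponential model with a seven-month doubling time. Only the doubling period comes from the research cited above. The one-hour starting horizon is a hypothetical value chosen for illustration, and real progress need not follow a clean exponential.

```python
def projected_horizon(start_minutes: float, months_ahead: float,
                      doubling_months: float = 7.0) -> float:
    """Task length (minutes) completable at 50% success after `months_ahead`,
    assuming pure exponential growth with the given doubling time."""
    return start_minutes * 2 ** (months_ahead / doubling_months)


def annual_growth_factor(doubling_months: float = 7.0) -> float:
    """Multiplicative growth in task length over 12 months."""
    return 2 ** (12 / doubling_months)


if __name__ == "__main__":
    start = 60.0  # hypothetical starting horizon: a one-hour task
    for months in (7, 14, 28):
        # prints 120, 240 and 960 minutes respectively
        print(f"after {months:2d} months: {projected_horizon(start, months):.0f} minutes")
    print(f"growth per year: x{annual_growth_factor():.2f}")  # x3.28
```

A seven-month doubling time corresponds to a growth factor of 2^(12/7), or roughly 3.3x per year.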

The emergence of Manus AI, a powerful proprietary agent, did much to popularize this agentic paradigm. However, access to Manus AI remains invite-only. This has prompted widespread interest in accessible, open-source alternatives, two particularly notable examples being OpenManus and OWL. Meanwhile, h2oGPTe has held a leading position on the GAIA benchmark. Although it is accessible, it is closed-source and ran into limitations in our testing.

Why Framework Selection is Crucial

Choosing the appropriate agentic automation framework is vital yet complex, as solutions differ significantly in their design philosophies, real-world effectiveness, supported use cases, and community ecosystems. This report examines the practical reliability of leading open-source frameworks in executing complex interactive tasks such as hotel bookings and other intricate web-based automations.

Frameworks Evaluated

This analysis primarily investigates two prominent open-source agentic automation frameworks:

OpenManus: A modular, flexible, open-source project inspired by Manus AI, providing general-purpose automation through a highly customizable architecture.
OWL (Optimized Workforce Learning): A framework that emphasizes structured multi-agent collaboration. It is built on the established CAMEL-AI framework, and its authors highlight its top-ranked benchmark performance.

Additionally, we briefly reference:

h2oGPTe: A prominent closed-source agent, notable for strong GAIA benchmark performance, though being closed-source limits its transparency and customizability.

In-Depth Analysis and Practical Insights

We conducted a practical evaluation using the following realistic task scenario:

"Search Booking.com for hotels in Medellin for one person, three nights from March 31, 2025, rated 4-stars or higher, under USD 80/night, including breakfast, preferably in Laureles or El Poblado."
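Encoding the task's criteria explicitly shows how many constraints the agent must track at once. The sketch below represents them as a small data structure and filters candidate listings against them. The `Listing` fields and the sample data are hypothetical and do not reflect Booking.com's actual data model. Neighbourhood is treated as a ranking preference rather than a hard filter, matching the word "preferably" in the prompt.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class HotelQuery:
    city: str
    check_in: date
    nights: int
    adults: int
    min_stars: int
    max_price_usd: float  # per night
    require_breakfast: bool
    preferred_areas: tuple[str, ...] = ()

    @property
    def check_out(self) -> date:
        return self.check_in + timedelta(days=self.nights)


@dataclass
class Listing:  # hypothetical listing shape, for illustration only
    name: str
    area: str
    stars: int
    price_usd: float
    breakfast: bool


def matches(q: HotelQuery, h: Listing) -> bool:
    """Hard constraints: star rating, nightly price, breakfast."""
    return (h.stars >= q.min_stars
            and h.price_usd <= q.max_price_usd
            and (h.breakfast or not q.require_breakfast))


def rank(q: HotelQuery, listings: list[Listing]) -> list[Listing]:
    """Keep matching listings; preferred areas first, then cheapest first."""
    hits = [h for h in listings if matches(q, h)]
    return sorted(hits, key=lambda h: (h.area not in q.preferred_areas, h.price_usd))


query = HotelQuery(
    city="Medellin", check_in=date(2025, 3, 31), nights=3, adults=1,
    min_stars=4, max_price_usd=80.0, require_breakfast=True,
    preferred_areas=("Laureles", "El Poblado"),
)

sample = [  # made-up listings, not real search results
    Listing("Hotel A", "El Poblado", 4, 75.0, True),
    Listing("Hotel B", "Centro", 4, 60.0, True),
    Listing("Hotel C", "Laureles", 3, 50.0, True),   # too few stars
    Listing("Hotel D", "Laureles", 5, 95.0, True),   # over budget
    Listing("Hotel E", "Laureles", 4, 70.0, False),  # no breakfast
]

if __name__ == "__main__":
    print(query.check_out)                        # 2025-04-03
    print([h.name for h in rank(query, sample)])  # ['Hotel A', 'Hotel B']
```

A web agent has to enforce these same constraints through a live, changing UI rather than over a clean list, which is where the frameworks below diverge.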

OWL: The OWL project emphasizes its high GAIA validation score (58.18 average). However, this figure warrants caution because of possible overfitting. Two things suggest this: OWL reports no test-set score, and other benchmark submissions report implausibly perfect validation scores. In real-world testing, OWL's browser automation struggled significantly and ultimately failed to retrieve relevant hotel listings from Booking.com.
h2oGPTe: Despite a strong GAIA test-set score (74.75%), h2oGPTe proved vulnerable to bot-detection systems in practice. Our Booking.com test failed because anti-bot measures blocked execution.
OpenManus: OpenManus seeks to democratize sophisticated AI automation with a modular, developer-friendly architecture. Its open-source design supports diverse applications including SEO automation, software development, conversational AI, personalized travel planning, and creative tasks across multiple sectors.
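The overfitting concern raised about OWL's validation-only GAIA score is a form of selection bias. If many configurations are tried and only the best validation result is reported, that result overstates true performance. The simulation below is a generic illustration under simplifying assumptions: every configuration has identical true accuracy, and questions are independent coin flips. The parameters are arbitrary and are not derived from GAIA or from OWL's actual development process.

```python
import random
from statistics import mean


def accuracy(rng: random.Random, true_rate: float, n_questions: int) -> float:
    """Score on n_questions independent items, each answered correctly
    with probability true_rate."""
    return sum(rng.random() < true_rate for _ in range(n_questions)) / n_questions


def selection_gap(n_configs: int = 50, true_rate: float = 0.5,
                  n_questions: int = 100, trials: int = 200,
                  seed: int = 0) -> tuple[float, float]:
    """Average best-of-n validation score vs. the chosen config's fresh test score."""
    rng = random.Random(seed)
    best_vals, test_vals = [], []
    for _ in range(trials):
        # Every configuration has the same true skill; differences are pure noise.
        val_scores = [accuracy(rng, true_rate, n_questions) for _ in range(n_configs)]
        best_vals.append(max(val_scores))
        # Re-scoring the "winner" on a held-out set reveals its real skill.
        test_vals.append(accuracy(rng, true_rate, n_questions))
    return mean(best_vals), mean(test_vals)


if __name__ == "__main__":
    best_val, test = selection_gap()
    print(f"reported (best validation): {best_val:.3f}")
    print(f"held-out test:              {test:.3f}")
```

In this setup the reported validation figure lands well above 0.5, while the held-out score stays close to the true 0.5. A test-set result is the natural safeguard against this bias.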

Recent enhancements, added within the last few days at the time of writing, include initial support for running as an MCP (Model Context Protocol) server. Because the project is open source, I was able to submit a Pull Request that exposes its main “Manus Agent” execution as an MCP tool. This enables broad interoperability, since any MCP client can then invoke the agent.
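For readers unfamiliar with MCP: a server advertises its tools in response to a `tools/list` request and executes them on `tools/call`, using JSON-RPC 2.0 messages. The sketch below shows that request/response shape for a single tool wrapping an agent run. It is not the OpenManus code or the official MCP SDK. The tool name `run_manus_agent` and its stubbed body are hypothetical. A real server would also handle initialization, a transport such as stdio, and errors raised by the agent.

```python
import json
from typing import Any


def run_manus_agent(prompt: str) -> str:
    """Hypothetical stub: a real server would run the agent on the prompt."""
    return f"(stub) agent finished task: {prompt}"


TOOLS: dict[str, dict[str, Any]] = {
    "run_manus_agent": {  # hypothetical tool name
        "description": "Run the general-purpose agent on a natural-language task.",
        "inputSchema": {  # JSON Schema describing the tool's arguments
            "type": "object",
            "properties": {"prompt": {"type": "string"}},
            "required": ["prompt"],
        },
        "handler": run_manus_agent,
    }
}


def _response(req_id: Any, *, result: Any = None, error: Any = None) -> str:
    msg: dict[str, Any] = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        msg["error"] = error
    else:
        msg["result"] = result
    return json.dumps(msg)


def handle(request_json: str) -> str:
    """Answer one JSON-RPC 2.0 request for the MCP tools/list and tools/call methods."""
    req = json.loads(request_json)
    req_id, method, params = req.get("id"), req.get("method"), req.get("params", {})
    if method == "tools/list":
        tools = [{"name": name, "description": t["description"],
                  "inputSchema": t["inputSchema"]} for name, t in TOOLS.items()]
        return _response(req_id, result={"tools": tools})
    if method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            return _response(req_id, error={"code": -32602, "message": f"Unknown tool: {name}"})
        text = TOOLS[name]["handler"](**params.get("arguments", {}))
        return _response(req_id, result={"content": [{"type": "text", "text": text}],
                                         "isError": False})
    return _response(req_id, error={"code": -32601, "message": f"Method not found: {method}"})
```

Because every MCP client speaks this same protocol, wrapping the agent behind one tool definition is enough to make it callable from any of them.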

During practical tests on Trilogy’s Ephor platform, OpenManus successfully completed the hotel search task. It was occasionally tripped up by interactive UI elements or unexpected redirects, but it consistently showed creativity in problem-solving (e.g., constructing direct URL queries for flights in earlier tests). It ran the search with the relevant criteria and applied the filters I specified.
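The direct-URL strategy OpenManus used for flights can be sketched for the hotel task as well. It encodes the search criteria in the query string and loads the results page directly, which sidesteps fragile form filling. The base URL and the parameter names (`ss`, `checkin`, `checkout`, `group_adults`, `no_rooms`) are assumptions based on Booking.com's publicly visible search URLs. They are not a documented interface and may change at any time. Star rating, price, and breakfast filters are omitted here. An agent would still apply those through the page UI or check them against the results.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# ASSUMPTION: base URL and parameter names mirror Booking.com's public search
# URLs; this is not an official API and may break without notice.
BASE_URL = "https://www.booking.com/searchresults.html"


def booking_search_url(destination: str, check_in: date, nights: int,
                       adults: int = 1, rooms: int = 1) -> str:
    """Build a search-results URL for the given stay (hypothetical helper)."""
    params = {
        "ss": destination,                                          # search string
        "checkin": check_in.isoformat(),                            # YYYY-MM-DD
        "checkout": (check_in + timedelta(days=nights)).isoformat(),
        "group_adults": adults,
        "no_rooms": rooms,
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(booking_search_url("Medellin", date(2025, 3, 31), nights=3))
```

Computing the checkout date from the number of nights avoids one common source of agent error: misreading or mis-clicking a date-picker widget.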

Current Insights and Strategic Recommendations

Given these test outcomes, OpenManus currently offers the best flexibility and practical performance among the open-source agentic automation frameworks we evaluated. Its modular design and innovative approaches to overcoming real-world web automation barriers, such as CAPTCHAs and redirects, give it a significant competitive advantage.

However, broader issues persist across the web automation landscape, notably widespread anti-bot measures and inconsistent web interfaces. Addressing these challenges will likely require a shift toward "bot-friendly" web standards. One compelling proposal is "llms.txt," a markdown file served at a website's root. It was originally intended to help large language models (LLMs) understand and process site content by summarizing key information, pointing to important resources, and offering alternative renderings. Extending this concept to cover interactive, transactional operations could make it viable for automated agents to perform complex, economically significant tasks, such as booking flights and accommodations, on their own.
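The sketch below parses a small llms.txt file to make the proposal concrete. It follows the structure described in the llms.txt proposal: an H1 title, a blockquote summary, and H2 sections containing markdown link lists. The site, URLs, and the `## Actions` section are hypothetical. That section only gestures at the transactional extension discussed above and is not part of the proposal. The parser handles only single-line summaries and ignores free-form detail paragraphs.

```python
import re
from dataclasses import dataclass, field

# Matches list items like "- [Title](url): optional notes"
LINK_RE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<notes>.*))?$")


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (link title, url, notes)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the H2 section being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and current is None:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None and (m := LINK_RE.match(line)):
            doc.sections[current].append((m["title"], m["url"], m["notes"] or ""))
    return doc


SAMPLE = """\
# Example Hotels

> Boutique hotels in Medellin with online booking.

## Docs

- [Rooms and rates](https://example.com/rooms.md): Room types, nightly prices in USD
- [Policies](https://example.com/policies.md)

## Actions

- [Search availability](https://example.com/actions/search.md): Hypothetical transactional extension
"""

if __name__ == "__main__":
    doc = parse_llms_txt(SAMPLE)
    print(doc.title, "|", doc.summary)
    for name, links in doc.sections.items():
        print(name, [title for title, _, _ in links])
```

Because the format is plain markdown with a predictable shape, an agent can locate the resources it needs without scraping rendered pages. The same property would make a transactional extension straightforward to consume.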

Embracing the Agentic Automation Future

While OpenManus currently leads in practical, real-world application, continued progress depends heavily on standardized, reliable interfaces for AI-web interaction. Successful adoption of such standards would unlock the transformative potential of fully autonomous agentic systems, redefining how complex, economically impactful tasks are delegated and executed.