simon_luv_pho

This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:

- GitHub: https://github.com/alibaba/page-agent

- Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)

- Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...

I'd be really interested in feedback on the security model of giving client-side agents extension-bridge access, and I'm happy to take questions on the implementation!

arjunchint

Oh whoa, we are working in parallel on a similar angle!

We just launched Rover (https://rover.rtrvr.ai/) as the first Embeddable Web Agent.

Similar principles: just embed a script tag and you get an agent that can type/click/select to onboard/demo/checkout users.

I tried on your website and it was reeaaaally slow. Quick question:

- You are injecting numbering onto the UI. Are you taking screenshots? I don't see any screenshots in the requests being sent, so what is the point of the numbering?

I don't think building on browser-use is the way to go; it was the worst-performing harness of all those we tested [https://www.rtrvr.ai/blog/web-bench-results]. We built our own logic to construct custom Action Trees that don't require any ARIA or accessibility setup from websites.

Would love to meet and trade notes, if possible (rtrvr.ai/request-demo)!

moehj

Interesting architecture: embedding the agent inside the app context rather than outside it makes sense for session-aware tasks. One question: how do you handle output validation before the agent acts on the DOM? Client-side agents acting on live state without a certification layer seems like a reliability risk in production. We've been building ARU (aru-runtime.com) as a runtime certification layer for exactly this, and I'm curious whether you've thought about that boundary.

mentalgear

> Data processed via servers in Mainland China

Appreciate the transparency, but maybe you could add some alternatives, preferably European ones?

general_reveal

I’ve been thinking about something like this. If it’s just a one-line script import, how the heck are you trusting natural language to translate to commands for an arbitrary UI?

The only thing I can think of is that you had the AI rewrite the entire build file to embed selectors, and then work with that?

dzink

Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?

pscanf

Very cool!

I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!

jadbox

Firefox support?

Mnexium

Curious - how does it perform with captchas and other "are you human" stuff on the web?

coreylane

Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?

MeteorMarc

Confusing name because of the existence of Pageant, the PuTTY agent.

popalchemist

Does it support long-click / click-and-drag?

jauntywundrkind

Not exactly the same, but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. It's a very nice browser extension:

> Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.

https://github.com/PaulKinlan/NotebookLM-Chrome

https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...
