RafalWilinski/cloudflare-agentic-ai-browser

AI-Controlled Browser on Cloudflare

This is an experiment to create an AI agent that can crawl and interact with web pages to achieve a desired goal. Fully on Cloudflare (almost).

Services used:

Cloudflare Infra

Prerequisites

Setup & Usage

pnpm i
npx wrangler secret put OPENAI_API_KEY # and fill with your OpenAI key
# You can also put it inside .dev.vars (copy .dev.vars.template as a reference)

# Run migrations on local SQLite instance
npx wrangler d1 execute ai-agent-jobs --local --file=migrations/0000_init.sql

# Run migrations on remote SQLite instance
npx wrangler d1 execute ai-agent-jobs --remote --file=migrations/0000_init.sql

# You can use `pnpm run dev` as well, but Browser Rendering does not work locally
pnpm run deploy

# Replace the worker URL, baseUrl and goal with your own
curl -X POST \
  <URL to your deployed worker> \
  -d '{"baseUrl": "https://chatwithcloud.ai", "goal": "Extract pricing data" }' \
  --no-buffer
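
The same request can be sent from TypeScript. This is a minimal sketch, not code from the repository: `JobRequest`, `buildJobInit` and `runJob` are hypothetical names, the `Content-Type` header is an assumption (the curl command does not set one), and reading the body as a stream assumes the Worker sends its output incrementally, as the `--no-buffer` flag suggests.

```typescript
// Hypothetical request shape, mirroring the JSON body of the curl command.
interface JobRequest {
  baseUrl: string; // page the agent starts from
  goal: string; // natural-language goal for the agent
}

// Build the options for the POST request (assumes the Worker accepts JSON).
function buildJobInit(job: JobRequest): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  };
}

// Send the job and hand each received chunk to onChunk as soon as it arrives.
async function runJob(
  workerUrl: string,
  job: JobRequest,
  onChunk: (text: string) => void,
): Promise<void> {
  const response = await fetch(workerUrl, buildJobInit(job));
  if (!response.ok || response.body === null) {
    throw new Error(`Worker responded with HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```

Calling `runJob(workerUrl, { baseUrl: "https://chatwithcloud.ai", goal: "Extract pricing data" }, console.log)` with your deployed Worker's URL would print each chunk as it is received.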

The loop

  1. User sends a request to the Cloudflare Worker
  2. Cloudflare Worker passes it to the Durable Object
  3. Durable Object starts or reuses a Browser and loads baseUrl from the request body
  4. The goal and HTML are passed via AI Gateway to the LLM. The LLM responds with one of:
  • The goal is met and the final answer is returned
  • The LLM decides to do one of three things:
    • Click something on the page
    • Type something
    • Select something
  5. After each interaction, a screenshot of the current browser window is stored in R2. The resulting HTML (or error) is passed to the LLM to generate the next step (back to 4).
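
The control flow of the loop can be sketched roughly as follows. This is an illustration, not the repository's implementation: the `Step` type, the `Deps` interface and the `maxSteps` cap are all hypothetical, and the browser, LLM and R2 calls are injected as functions so that only the decision logic is shown.

```typescript
// Hypothetical shape of what the LLM returns on each turn.
type Step =
  | { kind: "done"; answer: string }
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "select"; selector: string; value: string };

// Side effects are injected; these names are assumptions for illustration.
interface Deps {
  askLlm(goal: string, htmlOrError: string): Promise<Step>; // e.g. via AI Gateway
  act(step: Exclude<Step, { kind: "done" }>): Promise<void>; // click, type or select
  currentHtml(): Promise<string>;
  saveScreenshot(stepIndex: number): Promise<void>; // e.g. to R2
}

// Returns the final answer, or null if maxSteps (an assumed cap) is reached.
async function runLoop(goal: string, deps: Deps, maxSteps = 10): Promise<string | null> {
  let observation = await deps.currentHtml();
  for (let i = 0; i < maxSteps; i++) {
    const step = await deps.askLlm(goal, observation);
    if (step.kind === "done") return step.answer;
    try {
      await deps.act(step);
      observation = await deps.currentHtml();
    } catch (err) {
      // Feed the error back to the LLM instead of the HTML.
      observation = `Error: ${err instanceof Error ? err.message : String(err)}`;
    }
    await deps.saveScreenshot(i);
  }
  return null;
}
```

Modelling the LLM response as a discriminated union keeps the "goal met" case and the three interaction types explicit, so the compiler can check that each one is handled.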

Limitations

  • To prevent huge bills, Cloudflare Worker is capped at 2 requests per 10 seconds (adjustable in wrangler.toml)
  • GPT-4o context window allows up to 128K tokens. The HTML of many pages exceeds that
  • Browser Rendering session is limited to 180 seconds (adjustable in code via KEEP_BROWSER_ALIVE_IN_SECONDS)
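
To get a feel for the context-window limitation, the sketch below strips obvious non-content markup and estimates the token count. None of this is known to be in the repository: the function names are hypothetical, the regex-based stripping is deliberately crude, and 4 characters per token is only a rough heuristic, not a real tokenizer.

```typescript
const CONTEXT_LIMIT_TOKENS = 128_000; // GPT-4o context window, as stated above
const CHARS_PER_TOKEN = 4; // rough heuristic (assumption), not a tokenizer

// Remove scripts, styles and comments, then collapse whitespace.
function stripNoise(html: string): string {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "")
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Approximate the token count from the character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// reservedTokens leaves room for the goal, instructions and the reply.
function fitsContext(html: string, reservedTokens = 4_000): boolean {
  return estimateTokens(stripNoise(html)) + reservedTokens <= CONTEXT_LIMIT_TOKENS;
}
```

Under this heuristic, a page with roughly 600,000 characters of remaining markup would already be over the limit.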

Using Drizzle Studio to view and edit the database

  1. Create a .env file and fill it with your Cloudflare account ID, D1 database ID and a Cloudflare D1 token with edit permissions. Use .env.template as a reference:
cp .env.template .env
  2. Run npx drizzle-kit studio
  3. Head to https://local.drizzle.studio
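
For reference, a Drizzle Kit configuration that connects to D1 over HTTP typically looks like the sketch below. The environment variable names and the schema path are assumptions and may not match this repository's .env.template or file layout.

```typescript
// drizzle.config.ts (sketch; variable names and schema path are assumptions)
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  dialect: "sqlite",
  driver: "d1-http", // talk to the remote D1 database via the Cloudflare API
  schema: "./src/db/schema.ts", // hypothetical path
  out: "./migrations",
  dbCredentials: {
    accountId: process.env.CLOUDFLARE_ACCOUNT_ID!,
    databaseId: process.env.CLOUDFLARE_DATABASE_ID!,
    token: process.env.CLOUDFLARE_D1_TOKEN!,
  },
});
```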

Learn more about Drizzle Studio and its D1 configuration

Todo

  • nice frontend to display the results from the DB, maybe using Hono?
