mdros/browser-annotator

Browser Annotation Tool

A web-based annotation tool for teaching AI models how to use a web browser. Annotators record browser interactions (clicks and scrolls), attach an explanation to each one, and export the annotation data as JSON.

Demo:

annotator-demo.mov

Features

  • URL Input: Enter any URL and a task prompt
  • Screenshot Capture: Automatically captures and displays webpage screenshots
  • Action Recording: Record three types of actions:
    • Click: Record x,y coordinates on the screenshot
    • Scroll: Scroll up or down on the page
    • Stop: Finish annotation and provide the final answer
  • Action Explanations: Each action requires a text explanation
  • Action History: View all recorded actions with timestamps
  • JSON Export: Export annotation sessions as JSON files

Prerequisites

  • Node.js 20 or higher
  • npm or yarn
  • Docker (optional, for containerized deployment)

Local Development Setup

1. Install Dependencies

npm install

2. Install Playwright Browsers

npx playwright install chromium

3. Run Development Server

npm run dev

The application will be available at http://localhost:3000

4. Build for Production

npm run build
npm start

Docker Setup

Using Docker Compose (Recommended)

  1. Build and start the container:

docker-compose up --build

  2. The application will be available at http://localhost:3000

  3. To stop the container:

docker-compose down

Using Docker Directly

  1. Build the Docker image:

docker build -t browser-annotator .

  2. Run the container:

docker run -p 3000:3000 browser-annotator

Usage

  1. Start a Session:

    • Enter a URL (e.g., https://example.com)
    • Enter a task prompt describing what you want to accomplish
    • Click "Start Annotation"
  2. Record Actions:

    • Click Action:
      • Click "Enable Click Selection"
      • Click on the screenshot to select coordinates
      • Enter an explanation
      • Click "Execute Action"
    • Scroll Action:
      • Click "Scroll Up" or "Scroll Down"
      • Enter an explanation
      • Click "Execute Action"
    • Stop Action:
      • Click "Stop"
      • Enter an explanation and final answer
      • Click "Execute Action"
  3. View History:

    • All actions are displayed in the Action History panel
    • Each action shows type, coordinates (if applicable), explanation, and timestamp
  4. Export Data:

    • Click "Export JSON" to download the annotation session
    • The JSON file contains:
      • URL and prompt
      • All actions with coordinates, explanations, and timestamps
      • Final answer (if stop action was performed)
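
The workflow above (start a session, record actions, stop, export) can be sketched as a small in-memory model. This is an illustration only, not the project's implementation: `startSession`, `recordAction`, and `exportSession` are hypothetical names, the `"scroll-up"` type is assumed by analogy with the `"scroll-down"` value shown in the export example, and rejecting actions after a stop is an assumption about the intended behavior.

```typescript
// Action inputs, following the fields shown in the Export Format section.
// "scroll-up" is an assumption; only "scroll-down" appears in the README example.
type ActionInput =
  | { type: "click"; x: number; y: number; explanation: string }
  | { type: "scroll-up" | "scroll-down"; explanation: string }
  | { type: "stop"; explanation: string; finalAnswer: string };

type RecordedAction = ActionInput & { timestamp: string };

interface AnnotationSession {
  url: string;
  prompt: string;
  actions: RecordedAction[];
  createdAt: string;
}

// Step 1: start a session for a URL and a task prompt.
function startSession(url: string, prompt: string, now: Date = new Date()): AnnotationSession {
  return { url, prompt, actions: [], createdAt: now.toISOString() };
}

// Step 2: record an action; returns a new session rather than mutating the old one.
function recordAction(
  session: AnnotationSession,
  action: ActionInput,
  now: Date = new Date()
): AnnotationSession {
  if (action.explanation.trim() === "") {
    throw new Error("Every action requires an explanation");
  }
  const last = session.actions[session.actions.length - 1];
  // Assumption: a stop action ends the session, so nothing may follow it.
  if (last !== undefined && last.type === "stop") {
    throw new Error("Session already stopped");
  }
  const recorded: RecordedAction = { ...action, timestamp: now.toISOString() };
  return { ...session, actions: [...session.actions, recorded] };
}

// Step 4: serialize the session in the same layout as the export example.
function exportSession(session: AnnotationSession): string {
  return JSON.stringify(session, null, 2);
}
```

The `now` parameter exists only so that timestamps can be fixed in tests; callers would normally omit it.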

API Endpoints

POST /api/screenshot

Captures a screenshot of the specified URL.

Request Body:

{
  "url": "https://example.com",
  "sessionId": "session-123"
}

Response:

{
  "screenshot": "base64-encoded-image",
  "width": 1280,
  "height": 720
}
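
As a rough illustration of how a client might talk to this endpoint, the TypeScript sketch below builds the request and checks the response shape. The helper names (`buildScreenshotRequest`, `parseScreenshotResponse`) and the `baseUrl` parameter are hypothetical and not part of the project; only the endpoint path and field names come from the examples above. The endpoint's error behavior is not documented here, so the validation rules are assumptions.

```typescript
// Response shape as documented above.
interface ScreenshotResponse {
  screenshot: string; // base64-encoded image data
  width: number;
  height: number;
}

// Plain description of an HTTP request, usable as fetch(req.url, req.init).
interface JsonPostRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the POST /api/screenshot request.
function buildScreenshotRequest(
  baseUrl: string,
  targetUrl: string,
  sessionId: string
): JsonPostRequest {
  // Assumption: only http(s) pages are meaningful to capture.
  if (!/^https?:\/\//i.test(targetUrl)) {
    throw new Error(`Expected an http(s) URL, got: ${targetUrl}`);
  }
  return {
    // Strip trailing slashes so the path is not doubled.
    url: `${baseUrl.replace(/\/+$/, "")}/api/screenshot`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: targetUrl, sessionId }),
    },
  };
}

// Hypothetical helper: checks that a parsed JSON body matches the documented shape.
function parseScreenshotResponse(data: unknown): ScreenshotResponse {
  const d = data as Record<string, unknown> | null;
  if (d === null || typeof d !== "object") {
    throw new Error("Unexpected /api/screenshot response: not an object");
  }
  const { screenshot, width, height } = d;
  if (
    typeof screenshot !== "string" ||
    typeof width !== "number" ||
    typeof height !== "number"
  ) {
    throw new Error("Unexpected /api/screenshot response shape");
  }
  return { screenshot, width, height };
}
```

With a local development server, a caller could pass `req.url` and `req.init` to `fetch` and feed the parsed JSON body to `parseScreenshotResponse`.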

POST /api/action

Executes a browser action (click, scroll).

Request Body:

{
  "sessionId": "session-123",
  "actionType": "click",
  "x": 100,
  "y": 200,
  "explanation": "Clicked on the login button"
}

Response:

{
  "screenshot": "base64-encoded-image",
  "width": 1280,
  "height": 720
}
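
Before sending an action, a client may want to check the request body locally. The sketch below is one way to do that under stated assumptions: `validateActionRequest` and `Viewport` are hypothetical names, the `"scroll-up"` and `"scroll-down"` values for `actionType` are borrowed from the export format (this endpoint only documents `"click"`), and the bounds check assumes coordinates are expressed in screenshot pixels.

```typescript
// Assumption: scroll type names mirror the export format.
type ActionType = "click" | "scroll-up" | "scroll-down";

interface ActionRequestBody {
  sessionId: string;
  actionType: ActionType;
  x?: number;
  y?: number;
  explanation: string;
}

// Screenshot dimensions, as returned in the `width` and `height` response fields.
interface Viewport {
  width: number;
  height: number;
}

// Hypothetical helper: returns a list of problems; an empty list means the body looks valid.
function validateActionRequest(body: ActionRequestBody, viewport: Viewport): string[] {
  const problems: string[] = [];
  if (body.sessionId.trim() === "") {
    problems.push("sessionId is empty");
  }
  if (body.explanation.trim() === "") {
    problems.push("explanation is required");
  }
  if (body.actionType === "click") {
    const { x, y } = body;
    if (typeof x !== "number" || typeof y !== "number" || !isFinite(x) || !isFinite(y)) {
      problems.push("click needs numeric x and y");
    } else if (x < 0 || y < 0 || x >= viewport.width || y >= viewport.height) {
      problems.push("click is outside the screenshot");
    }
  } else if (body.x !== undefined || body.y !== undefined) {
    problems.push(`${body.actionType} should not carry coordinates`);
  }
  return problems;
}
```

Returning a list instead of throwing lets a UI show every problem at once next to the explanation field.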

POST /api/export

Exports an annotation session as JSON.

Request Body:

{
  "url": "https://example.com",
  "prompt": "Task description",
  "actions": [...],
  "createdAt": "2024-01-01T00:00:00.000Z"
}

Export Format

The exported JSON file follows this structure:

{
  "url": "https://example.com",
  "prompt": "Find the contact information",
  "actions": [
    {
      "type": "click",
      "x": 100,
      "y": 200,
      "explanation": "Clicked on the contact link",
      "timestamp": "2024-01-01T00:00:00.000Z"
    },
    {
      "type": "scroll-down",
      "explanation": "Scrolled down to see more content",
      "timestamp": "2024-01-01T00:01:00.000Z"
    },
    {
      "type": "stop",
      "explanation": "Found the contact information",
      "finalAnswer": "The contact email is [email protected]",
      "timestamp": "2024-01-01T00:02:00.000Z"
    }
  ],
  "createdAt": "2024-01-01T00:00:00.000Z"
}
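
Consumers of exported files may want a quick structural check before using them as training data. The following TypeScript sketch validates a parsed export against the structure shown above. `checkExport` is a hypothetical helper, not part of the tool; accepting `"scroll-up"` and requiring a stop action to be last are assumptions inferred from the feature list, not documented guarantees.

```typescript
// Hypothetical checker for a parsed export file.
// Returns human-readable problems; an empty array means the structure looks right.
function checkExport(data: unknown): string[] {
  if (typeof data !== "object" || data === null) {
    return ["export is not a JSON object"];
  }
  const session = data as Record<string, unknown>;
  const problems: string[] = [];

  // Top-level string fields from the example above.
  ["url", "prompt", "createdAt"].forEach((key) => {
    if (typeof session[key] !== "string") {
      problems.push(`${key} must be a string`);
    }
  });

  const actions = session["actions"];
  if (!Array.isArray(actions)) {
    problems.push("actions must be an array");
    return problems;
  }

  actions.forEach((raw: unknown, i: number) => {
    const a = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
    if (typeof a["explanation"] !== "string" || typeof a["timestamp"] !== "string") {
      problems.push(`actions[${i}] needs explanation and timestamp strings`);
    }
    const type = a["type"];
    if (type === "click") {
      if (typeof a["x"] !== "number" || typeof a["y"] !== "number") {
        problems.push(`actions[${i}] click needs numeric x and y`);
      }
    } else if (type === "stop") {
      if (typeof a["finalAnswer"] !== "string") {
        problems.push(`actions[${i}] stop needs a finalAnswer string`);
      }
      // Assumption: stop finishes the annotation, so it should come last.
      if (i !== actions.length - 1) {
        problems.push(`actions[${i}] stop must be the last action`);
      }
    } else if (type !== "scroll-up" && type !== "scroll-down") {
      problems.push(`actions[${i}] has unknown type ${String(type)}`);
    }
  });
  return problems;
}
```

A typical use would be `checkExport(JSON.parse(fileContents))`, skipping or flagging any file for which the returned list is non-empty.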
