Browser Annotation Tool

A web-based annotation tool for teaching AI models how to use a web browser. Annotators record browser interactions (clicks and scrolls), attach an explanation to each one, and export the annotation data as JSON.

Demo:

annotator-demo.mov

Features

URL Input: Enter any URL and a task prompt
Screenshot Capture: Automatically captures and displays webpage screenshots
Action Recording: Record three types of actions:
- Click: Record x,y coordinates on the screenshot
- Scroll: Scroll up or down on the page
- Stop: Finish annotation and provide the final answer
Action Explanations: Each action requires a text explanation
Action History: View all recorded actions with timestamps
JSON Export: Export annotation sessions as JSON files

Prerequisites

Node.js 20 or higher
npm or yarn
Docker (optional, for containerized deployment)

Local Development Setup

1. Install Dependencies

npm install

2. Install Playwright Browsers

npx playwright install chromium

3. Run Development Server

npm run dev

The application will be available at http://localhost:3000

4. Build for Production

npm run build
npm start

Docker Setup

Using Docker Compose (Recommended)

Build and start the container:

docker-compose up --build

The application will be available at http://localhost:3000

To stop the container:

docker-compose down

Using Docker Directly

Build the Docker image:

docker build -t browser-annotator .

Run the container:

docker run -p 3000:3000 browser-annotator

Usage

Start a Session:
- Enter a URL (e.g., https://example.com)
- Enter a task prompt describing what you want to accomplish
- Click "Start Annotation"
Record Actions:
- Click Action:
  - Click "Enable Click Selection"
  - Click on the screenshot to select coordinates
  - Enter an explanation
  - Click "Execute Action"
- Scroll Action:
  - Click "Scroll Up" or "Scroll Down"
  - Enter an explanation
  - Click "Execute Action"
- Stop Action:
  - Click "Stop"
  - Enter an explanation and final answer
  - Click "Execute Action"
View History:
- All actions are displayed in the Action History panel
- Each action shows type, coordinates (if applicable), explanation, and timestamp
Export Data:
- Click "Export JSON" to download the annotation session
- The JSON file contains:
  - URL and prompt
  - All actions with coordinates, explanations, and timestamps
  - Final answer (if a stop action was performed)
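
The click step above selects coordinates on the screenshot. If the screenshot is displayed at a different size than it was captured, the click position has to be scaled back to screenshot pixels. The following is a minimal sketch of that mapping, not the tool's actual implementation: `toScreenshotCoordinates`, `Size`, and `Point` are hypothetical names, and the assumption that the image is scaled uniformly to fit its container is ours.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Map a click inside the displayed image to pixel coordinates in the
// captured screenshot. Assumes the image is stretched to `displayed`
// with no letterboxing or cropping (an assumption, not stated in this README).
function toScreenshotCoordinates(click: Point, displayed: Size, screenshot: Size): Point {
  if (displayed.width <= 0 || displayed.height <= 0) {
    throw new Error("Displayed image has no size");
  }
  const scaleX = screenshot.width / displayed.width;
  const scaleY = screenshot.height / displayed.height;
  // Keep the result inside the screenshot, even for clicks on the very edge.
  const clamp = (value: number, max: number): number => Math.min(Math.max(value, 0), max);
  return {
    x: clamp(Math.round(click.x * scaleX), screenshot.width - 1),
    y: clamp(Math.round(click.y * scaleY), screenshot.height - 1),
  };
}
```

In a browser, the click position relative to the image can be derived from the mouse event's `clientX`/`clientY` minus the image's `getBoundingClientRect()` left and top offsets, and then passed to this function together with the `width` and `height` returned by the API.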

API Endpoints

POST `/api/screenshot`

Captures a screenshot of the specified URL.

Request Body:

{
  "url": "https://example.com",
  "sessionId": "session-123"
}

Response:

{
  "screenshot": "base64-encoded-image",
  "width": 1280,
  "height": 720
}
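
As a rough illustration of calling this endpoint from TypeScript, the sketch below posts the request body shown above and reads the response. The field names come from the examples in this section; everything else is an assumption: `captureScreenshot`, `buildScreenshotRequest`, `toDataUrl`, and `FetchLike` are hypothetical helpers, the base URL assumes the local development server, and the PNG MIME type is a guess because the image format is not stated here.

```typescript
// Shapes taken from the request and response examples above.
interface ScreenshotRequest {
  url: string;
  sessionId: string;
}

interface ScreenshotResponse {
  screenshot: string; // base64-encoded image
  width: number;
  height: number;
}

interface JsonPostInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal subset of the fetch API used here, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: JsonPostInit,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the options for a JSON POST request.
function buildScreenshotRequest(body: ScreenshotRequest): JsonPostInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Hypothetical client helper; baseUrl assumes the dev server on port 3000.
async function captureScreenshot(
  fetchFn: FetchLike,
  request: ScreenshotRequest,
  baseUrl: string = "http://localhost:3000",
): Promise<ScreenshotResponse> {
  const response = await fetchFn(`${baseUrl}/api/screenshot`, buildScreenshotRequest(request));
  if (!response.ok) {
    throw new Error(`Screenshot request failed with status ${response.status}`);
  }
  return (await response.json()) as ScreenshotResponse;
}

// Turn the base64 payload into a data URL for an <img> element.
// Assumption: PNG; pass another MIME type if the server returns a different format.
function toDataUrl(result: ScreenshotResponse, mimeType: string = "image/png"): string {
  return `data:${mimeType};base64,${result.screenshot}`;
}
```

Passing the global `fetch` as `fetchFn` (available in Node.js 18 and later and in browsers) sends a real request. Injecting the function keeps the helper easy to test without a running server.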

POST `/api/action`

Executes a browser action (click or scroll) in the given session. The response has the same shape as the `/api/screenshot` response.

Request Body:

{
  "sessionId": "session-123",
  "actionType": "click",
  "x": 100,
  "y": 200,
  "explanation": "Clicked on the login button"
}

Response:

{
  "screenshot": "base64-encoded-image",
  "width": 1280,
  "height": 720
}
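
Since every action requires an explanation and click coordinates refer to the screenshot, a client may want to check a request before sending it. The sketch below is one possible pre-flight check, not the server's validation logic. `validateActionRequest` is a hypothetical helper, and the scroll values `"scroll-up"` and `"scroll-down"` are assumed from the export format; this section only documents `"click"` for `actionType`.

```typescript
interface ClickActionRequest {
  sessionId: string;
  actionType: "click";
  x: number;
  y: number;
  explanation: string;
}

// Assumption: scroll requests reuse the type names seen in the export format.
interface ScrollActionRequest {
  sessionId: string;
  actionType: "scroll-up" | "scroll-down";
  explanation: string;
}

type ActionRequest = ClickActionRequest | ScrollActionRequest;

// Return a list of problems; an empty list means the request looks sendable.
// `viewport` is the width/height reported by the most recent screenshot response.
function validateActionRequest(
  request: ActionRequest,
  viewport: { width: number; height: number },
): string[] {
  const problems: string[] = [];
  if (request.sessionId.trim() === "") {
    problems.push("sessionId is required");
  }
  if (request.explanation.trim() === "") {
    problems.push("explanation is required");
  }
  if (request.actionType === "click") {
    if (!Number.isFinite(request.x) || request.x < 0 || request.x >= viewport.width) {
      problems.push(`x must be between 0 and ${viewport.width - 1}`);
    }
    if (!Number.isFinite(request.y) || request.y < 0 || request.y >= viewport.height) {
      problems.push(`y must be between 0 and ${viewport.height - 1}`);
    }
  }
  return problems;
}
```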

POST `/api/export`

Exports an annotation session as JSON.

Request Body:

{
  "url": "https://example.com",
  "prompt": "Task description",
  "actions": [...],
  "createdAt": "2024-01-01T00:00:00.000Z"
}

Export Format

The exported JSON file follows this structure:

{
  "url": "https://example.com",
  "prompt": "Find the contact information",
  "actions": [
    {
      "type": "click",
      "x": 100,
      "y": 200,
      "explanation": "Clicked on the contact link",
      "timestamp": "2024-01-01T00:00:00.000Z"
    },
    {
      "type": "scroll-down",
      "explanation": "Scrolled down to see more content",
      "timestamp": "2024-01-01T00:01:00.000Z"
    },
    {
      "type": "stop",
      "explanation": "Found the contact information",
      "finalAnswer": "The contact email is [email protected]",
      "timestamp": "2024-01-01T00:02:00.000Z"
    }
  ],
  "createdAt": "2024-01-01T00:00:00.000Z"
}
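
For consumers of the exported file, the structure above can be modeled as a discriminated union on `type`. This is a sketch based only on the example shown here, not a schema published by the project: the type and function names are hypothetical, `"scroll-up"` is assumed by analogy with `"scroll-down"`, and `parseSession` checks only the top-level fields.

```typescript
interface ClickAction {
  type: "click";
  x: number;
  y: number;
  explanation: string;
  timestamp: string;
}

// Assumption: "scroll-up" mirrors the "scroll-down" value shown in the example.
interface ScrollAction {
  type: "scroll-up" | "scroll-down";
  explanation: string;
  timestamp: string;
}

interface StopAction {
  type: "stop";
  explanation: string;
  finalAnswer: string;
  timestamp: string;
}

type RecordedAction = ClickAction | ScrollAction | StopAction;

interface AnnotationSession {
  url: string;
  prompt: string;
  actions: RecordedAction[];
  createdAt: string;
}

// Parse an exported file and verify the top-level fields only.
function parseSession(json: string): AnnotationSession {
  const data = JSON.parse(json) as Partial<AnnotationSession>;
  if (
    typeof data.url !== "string" ||
    typeof data.prompt !== "string" ||
    typeof data.createdAt !== "string" ||
    !Array.isArray(data.actions)
  ) {
    throw new Error("Not an annotation session export");
  }
  return data as AnnotationSession;
}

// One-line summary of an action; the switch narrows on the `type` field.
function describeAction(action: RecordedAction): string {
  switch (action.type) {
    case "click":
      return `click at (${action.x}, ${action.y}): ${action.explanation}`;
    case "scroll-up":
    case "scroll-down":
      return `${action.type}: ${action.explanation}`;
    case "stop":
      return `stop: ${action.explanation}`;
  }
}

// Return the final answer, or undefined if no stop action was recorded.
function getFinalAnswer(session: AnnotationSession): string | undefined {
  for (const action of session.actions) {
    if (action.type === "stop") {
      return action.finalAnswer;
    }
  }
  return undefined;
}
```

Narrowing on `type` lets the compiler enforce that `x` and `y` are read only from click actions and `finalAnswer` only from stop actions.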
