A web-based annotation tool for teaching AI models how to use a web browser. This tool allows annotators to record browser interactions (clicks, scrolls) with explanations, and export the annotation data as JSON.
annotator-demo.mov
- URL Input: Enter any URL and a task prompt
- Screenshot Capture: Automatically captures and displays webpage screenshots
- Action Recording: Record three types of actions:
- Click: Record x,y coordinates on the screenshot
- Scroll: Scroll up or down on the page
- Stop: Finish annotation and provide the final answer
- Action Explanations: Each action requires a text explanation
- Action History: View all recorded actions with timestamps
- JSON Export: Export annotation sessions as JSON files
- Node.js 20 or higher
- npm or yarn
- Docker (optional, for containerized deployment)
npm installnpx playwright install chromiumnpm run devThe application will be available at http://localhost:3000
npm run build
npm start- Build and start the container:
docker-compose up --build-
The application will be available at
http://localhost:3000 -
To stop the container:
docker-compose down- Build the Docker image:
docker build -t browser-annotator .- Run the container:
docker run -p 3000:3000 browser-annotator-
Start a Session:
- Enter a URL (e.g.,
https://example.com) - Enter a task prompt describing what you want to accomplish
- Click "Start Annotation"
- Enter a URL (e.g.,
-
Record Actions:
- Click Action:
- Click "Enable Click Selection"
- Click on the screenshot to select coordinates
- Enter an explanation
- Click "Execute Action"
- Scroll Action:
- Click "Scroll Up" or "Scroll Down"
- Enter an explanation
- Click "Execute Action"
- Stop Action:
- Click "Stop"
- Enter an explanation and final answer
- Click "Execute Action"
- Click Action:
-
View History:
- All actions are displayed in the Action History panel
- Each action shows type, coordinates (if applicable), explanation, and timestamp
-
Export Data:
- Click "Export JSON" to download the annotation session
- The JSON file contains:
- URL and prompt
- All actions with coordinates, explanations, and timestamps
- Final answer (if stop action was performed)
Captures a screenshot of the specified URL.
Request Body:
{
"url": "https://example.com",
"sessionId": "session-123"
}Response:
{
"screenshot": "base64-encoded-image",
"width": 1280,
"height": 720
}Executes a browser action (click, scroll).
Request Body:
{
"sessionId": "session-123",
"actionType": "click",
"x": 100,
"y": 200,
"explanation": "Clicked on the login button"
}Response:
{
"screenshot": "base64-encoded-image",
"width": 1280,
"height": 720
}Exports annotation session as JSON.
Request Body:
{
"url": "https://example.com",
"prompt": "Task description",
"actions": [...],
"createdAt": "2024-01-01T00:00:00.000Z"
}The exported JSON file follows this structure:
{
"url": "https://example.com",
"prompt": "Find the contact information",
"actions": [
{
"type": "click",
"x": 100,
"y": 200,
"explanation": "Clicked on the contact link",
"timestamp": "2024-01-01T00:00:00.000Z"
},
{
"type": "scroll-down",
"explanation": "Scrolled down to see more content",
"timestamp": "2024-01-01T00:01:00.000Z"
},
{
"type": "stop",
"explanation": "Found the contact information",
"finalAnswer": "The contact email is [email protected]",
"timestamp": "2024-01-01T00:02:00.000Z"
}
],
"createdAt": "2024-01-01T00:00:00.000Z"
}