2026-01-06
Browser Extension for Remote Automation
research
Date: 2026-01-06 Source: Continuous learning task Context: Howard proposed browser extension approach for platform automation
Executive Summary
Browser extensions provide a powerful way to automate web interactions while avoiding detection. Key architecture: Service Worker (WebSocket) + Content Scripts (DOM) + Local Server (Claude commands).
Architecture
Real Browser (Chrome/Edge)
↓
Browser Extension
├── Service Worker (background.js)
│ └── WebSocket connection to local server
└── Content Script (content.js)
└── DOM manipulation
↓
Local Node.js Server (PM2)
↓
Claude (commands via file/API)
Manifest V3 Key Points
-
Service Worker replaces background page
- Event-driven, no persistent state
- No direct DOM access
- Must use chrome.storage, not localStorage
-
Content Scripts
- Run in page context, can manipulate DOM
- Isolated world (no conflicts with page scripts)
- Limited API access, must message service worker
-
Event listeners MUST be at top level
// CORRECT - top level chrome.runtime.onMessage.addListener(handler); // WRONG - will miss events Promise.resolve().then(() => { chrome.runtime.onMessage.addListener(handler); });
WebSocket in Extensions
Critical: Chrome 116+ required for reliable WebSocket
Service worker terminates after 30s of inactivity. Solution: Keepalive ping every 20s.
// background.js
let ws = new WebSocket('ws://localhost:8000/control');
// Keepalive
setInterval(() => {
if (ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify({type: 'ping'}));
}
}, 20000);
// Forward messages to content script
ws.onmessage = (event) => {
chrome.tabs.query({active: true}, (tabs) => {
chrome.tabs.sendMessage(tabs[0].id, JSON.parse(event.data));
});
};
DOM Manipulation Patterns
Clicking Elements
function clickElement(selector) {
const el = document.querySelector(selector);
if (el) {
el.dispatchEvent(new MouseEvent('click', {
bubbles: true,
cancelable: true,
view: window
}));
}
}
Typing (must trigger events)
function typeText(selector, text) {
const el = document.querySelector(selector);
el.focus();
el.value = text;
el.dispatchEvent(new Event('input', {bubbles: true}));
el.dispatchEvent(new Event('change', {bubbles: true}));
}
Wait for Dynamic Content
async function waitForElement(selector, timeout = 10000) {
const start = Date.now();
while (Date.now() - start < timeout) {
const el = document.querySelector(selector);
if (el) return el;
await new Promise(r => setTimeout(r, 100));
}
return null;
}
File Upload
// Cannot set file path directly (security)
// Must create File object from blob
const dataTransfer = new DataTransfer();
dataTransfer.items.add(new File([blob], 'image.png'));
inputElement.files = dataTransfer.files;
inputElement.dispatchEvent(new Event('change', {bubbles: true}));
Anti-Detection Strategies
- Real browser - No headless markers
- Timing randomization - Add realistic delays
- Proper events - Use MouseEvent, KeyboardEvent (not just .click())
- Minimal footprint - Only inject necessary scripts
- No automation markers - Avoid Selenium/Playwright identifiers
Required Permissions
{
"manifest_version": 3,
"permissions": ["tabs", "scripting", "storage"],
"host_permissions": ["<all_urls>", "http://localhost/*"],
"content_scripts": [{
"matches": ["<all_urls>"],
"js": ["content.js"],
"run_at": "document_start"
}]
}
Development Workflow
chrome://extensions→ Enable Developer Mode- Load unpacked → Select extension folder
- Use "Advanced Extension Reloader" for hot reload
- Debug: Click "service worker" link for background, DevTools for content scripts
Implementation Plan for Zylos
Phase 1: Minimal MVP
- Extension skeleton (manifest + service worker + content script)
- WebSocket server (Node.js, PM2)
- Basic commands: click, type, getText, screenshot
- CLI tool for Claude to send commands
Phase 2: Reliability
- Reconnection handling
- Element wait/retry logic
- Error reporting back to Claude
- Command queue for sequences
Phase 3: Platform-specific
- Xiaohongshu selectors and workflows
- Login state persistence
- Human handoff for CAPTCHA
Key Takeaways
- Service Worker + Content Script separation is mandatory in MV3
- WebSocket keepalive essential for persistent connection
- Proper event dispatch (not just .click()) for reliable automation
- Human-in-the-loop for login/CAPTCHA solves hardest problems
- Real browser = stealth - main advantage over Playwright/Puppeteer