docs: add workspace workflow to browser control skill

Commit 2981f72e authored May 15, 2026 by Lisa (Hermes AI)

Showing with 609 additions and 0 deletions

SKILL.md skills/browser-control/SKILL.md +609 -0

--- a/skills/browser-control/SKILL.md
+++ b/skills/browser-control/SKILL.md
+---
+name: browser-control
+description: Control browsers remotely via Hermes Node Protocol
+version: 1.0.0
+author: Lisa
+tags: [automation, browser, playwright, testing, scraping]
+---
+
+# Browser Control via Hermes Node Protocol
+
+Control Chrome/Firefox/WebKit on remote nodes (sissy, zeiss, etc.) via Playwright.
+
+## Prerequisites
+
+- Node must have browser control capability registered
+- Gateway running on zeiss:8766
+- Node agent with Playwright installed
+
+## Quick Start
+
+```python
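+import requests
+
+GATEWAY = "http://zeiss:8766"
+NODE = "sissy"
+
+def browser_cmd(action, **params):
+    """Send a browser command to the node and return the parsed JSON reply."""
+    payload = {"action": action, "params": params}
+    # An explicit timeout keeps a stuck gateway from hanging the caller forever.
+    resp = requests.post(f"{GATEWAY}/nodes/{NODE}/browser", json=payload, timeout=60)
+    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
+    return resp.json()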
+
+# Launch browser
+browser_cmd("launch", config={"headless": False})
+
+# Create context and get page_id
+result = browser_cmd("create_context", config={"name": "work"})
+page_id = result["page_id"]
+
+# Navigate
+browser_cmd("navigate", url="https://github.com", page_id=page_id)
+
+# Take screenshot
+result = browser_cmd("screenshot", page_id=page_id, full_page=True)
+screenshot_b64 = result["screenshot"]
+```
+
+## Common Workflows
+
+### 1. Simple Navigation and Screenshot
+
+```python
+from hermes_tools import terminal
+import json
+
+# Launch + navigate + screenshot
+commands = [
+    '{"action": "launch", "params": {"config": {"headless": true}}}',
+    '{"action": "create_context"}',
+    '{"action": "navigate", "page_id": "page_0", "params": {"url": "https://example.com"}}',
+    '{"action": "screenshot", "page_id": "page_0", "params": {"full_page": true}}'
+]
+
+for cmd in commands:
+    result = terminal(
+        f'curl -s -X POST http://zeiss:8766/nodes/sissy/browser -H "Content-Type: application/json" -d \'{cmd}\''
+    )
+    print(json.loads(result["output"]))
+```
+
+### 2. Form Automation
+
+```python
+page_id = "page_0"
+
+# Fill login form
+terminal(f'''curl -X POST http://zeiss:8766/nodes/sissy/browser \\
+  -H "Content-Type: application/json" \\
+  -d '{{"action": "fill", "page_id": "{page_id}", "params": {{"selector": "#username", "value": "user@example.com"}}}}'
+''')
+
+terminal(f'''curl -X POST http://zeiss:8766/nodes/sissy/browser \\
+  -H "Content-Type: application/json" \\
+  -d '{{"action": "fill", "page_id": "{page_id}", "params": {{"selector": "#password", "value": "secret"}}}}'
+''')
+
+# Submit
+terminal(f'''curl -X POST http://zeiss:8766/nodes/sissy/browser \\
+  -H "Content-Type: application/json" \\
+  -d '{{"action": "click", "page_id": "{page_id}", "params": {{"selector": "button[type=submit]"}}}}'
+''')
+```
+
+### 3. Data Extraction
+
+```python
+# Get page content
+result = terminal(f'''curl -s -X POST http://zeiss:8766/nodes/sissy/browser \\
+  -H "Content-Type: application/json" \\
+  -d '{{"action": "evaluate", "page_id": "page_0", "params": {{"expression": "Array.from(document.querySelectorAll(\\"h2\\")).map(h => h.textContent)"}}}}'
+''')
+
+data = json.loads(result["output"])
+headings = data["result"]
+print(f"Found {len(headings)} headings: {headings}")
+```
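
Hand-escaping nested quotes, as in the `evaluate` example above, is easy to get wrong. The following is a minimal sketch of building the same curl command from a Python dict instead. `build_curl` and `BROWSER_URL` are illustrative names, not part of `hermes_tools`, and the endpoint is assumed to be the one used throughout this document.

```python
import json
import shlex

# Assumption: same gateway/node endpoint as the curl examples in this document.
BROWSER_URL = "http://zeiss:8766/nodes/sissy/browser"


def build_curl(payload: dict, url: str = BROWSER_URL) -> str:
    """Return a curl command line whose JSON body is safely shell-quoted."""
    body = json.dumps(payload)
    return " ".join([
        "curl", "-s", "-X", "POST", shlex.quote(url),
        "-H", shlex.quote("Content-Type: application/json"),
        "-d", shlex.quote(body),  # survives single and double quotes in the body
    ])


payload = {
    "action": "evaluate",
    "page_id": "page_0",
    "params": {
        "expression": "Array.from(document.querySelectorAll('h2')).map(h => h.textContent)"
    },
}
cmd = build_curl(payload)
```

The resulting `cmd` string can be passed to `terminal(...)` in place of the hand-written f-string.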
+
+### 4. Incognito Session
+
+```python
+# Create incognito context
+result = terminal('''curl -s -X POST http://zeiss:8766/nodes/sissy/browser \\
+  -H "Content-Type: application/json" \\
+  -d '{"action": "create_context", "params": {"config": {"incognito": true}}}'
+''')
+
+page_id = json.loads(result["output"])["page_id"]
+
+# Use incognito page
+terminal(f'''curl -X POST http://zeiss:8766/nodes/sissy/browser \\
+  -H "Content-Type: application/json" \\
+  -d '{{"action": "navigate", "page_id": "{page_id}", "params": {{"url": "https://example.com"}}}}'
+''')
+```
+
+### 5. Multi-Step Workflow with execute_code
+
+```python
+from hermes_tools import terminal, write_file
+import json
+import base64
+import shlex
+
+def browser_request(action, page_id=None, **params):
+    """Helper to send browser commands through curl"""
+    payload = {"action": action}
+    if page_id:
+        payload["page_id"] = page_id
+    if params:
+        payload["params"] = params
+
+    # shlex.quote keeps the JSON body intact even if it contains single quotes
+    body = shlex.quote(json.dumps(payload))
+    cmd = f'curl -s -X POST http://zeiss:8766/nodes/sissy/browser -H "Content-Type: application/json" -d {body}'
+    result = terminal(command=cmd)
+    return json.loads(result["output"])
+
+# Launch browser
+browser_request("launch", config={"headless": True})
+
+# Create context
+ctx = browser_request("create_context", config={"name": "scraper"})
+page_id = ctx["page_id"]
+
+# Navigate to target
+browser_request("navigate", page_id, url="https://news.ycombinator.com")
+
+# Extract titles
+titles_result = browser_request(
+    "evaluate", 
+    page_id, 
+    expression='Array.from(document.querySelectorAll(".titleline > a")).slice(0, 10).map(a => a.textContent)'
+)
+
+print("Top 10 HN stories:")
+for i, title in enumerate(titles_result["result"], 1):
+    print(f"{i}. {title}")
+
+# Screenshot
+screenshot_result = browser_request("screenshot", page_id, full_page=False)
+screenshot_data = base64.b64decode(screenshot_result["screenshot"])
+write_file("/tmp/hn_screenshot.png", screenshot_data)
+print("Screenshot saved to /tmp/hn_screenshot.png")
+
+# Cleanup
+browser_request("close")
+```
+
+## Launch Modes
+
+### Headless (background)
+```json
+{"action": "launch", "params": {"config": {"headless": true}}}
+```
+
+### Headed (visible window)
+```json
+{"action": "launch", "params": {"config": {"headless": false}}}
+```
+
+### Attach to existing browser
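+```bash
+# First, start Chrome with remote debugging (run this on the node):
+# chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-cdp-profile
+
+# Then attach by sending the launch command through the gateway:
+curl -X POST http://zeiss:8766/nodes/sissy/browser \
+  -H "Content-Type: application/json" \
+  -d '{"action": "launch", "params": {"config": {"attach": true, "cdp_url": "http://localhost:9222"}}}'
+```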
+
+**Important Chrome caveat:** modern Google Chrome may refuse DevTools remote debugging on the default everyday profile and show an error like:
+
+```text
+DevTools remote debugging requires a non-default data directory.
+Specify this using --user-data-dir.
+```
+
+When that happens:
+- do **not** assume the browser is controllable just because the process command line contains `--remote-debugging-port=9222`
+- verify the listener is actually up with `curl http://127.0.0.1:9222/json/version`
+- if it refuses connection, Chrome never brought CDP up
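
The listener check above can also be scripted. This is a minimal sketch using only the standard library. `cdp_version` is an illustrative name, and the default address is assumed to match the `curl` check in the list above.

```python
import http.client
import json
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: this probe targets a loopback listener.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def cdp_version(base_url: str = "http://127.0.0.1:9222", timeout: float = 2.0):
    """Return the parsed /json/version document, or None if CDP is not answering."""
    try:
        with _OPENER.open(f"{base_url}/json/version", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat CDP as down.
        return None
```

A `None` result corresponds to the "Chrome never brought CDP up" case, whatever the process command line says.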
+
+**Recommended workaround for a "near-real session" without touching the live default profile:**
+- make a **one-time clone** of the default Chrome profile
+- launch Chrome with `--user-data-dir` pointed at the clone
+- bind CDP to `127.0.0.1`, not `0.0.0.0`
+- remove stale `SingletonLock`, `SingletonSocket`, and `SingletonCookie` files from the clone after the initial copy if present
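
The one-time clone can be sketched in Python as follows. `clone_profile` is an illustrative helper, not part of Hermes. Note one deliberate difference from the list above: it skips the `Singleton*` entries during the copy rather than deleting them afterward, which leaves the clone in the same state.

```python
import shutil
from pathlib import Path

# Lock artifacts named in the workaround list.
SINGLETON_FILES = ("SingletonLock", "SingletonSocket", "SingletonCookie")


def clone_profile(src, dst) -> Path:
    """Copy a Chrome user-data directory once, leaving stale Singleton* entries behind."""
    src, dst = Path(src), Path(dst)
    if dst.exists():
        # Refuse to reuse an existing directory so the clone stays a known baseline.
        raise FileExistsError(f"{dst} already exists; pick a new clone path")
    shutil.copytree(
        src,
        dst,
        symlinks=True,  # copy symlinks as symlinks instead of following them
        ignore=shutil.ignore_patterns(*SINGLETON_FILES),
    )
    return dst
```

The returned path is what `--user-data-dir` should point at. Refusing to overwrite an existing directory is a design choice that fits the profile-drift advice in this document.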
+
+**Important profile-drift lesson:**
+- do **not** assume any existing `google-chrome-cdp-clone` directory is a clean baseline for later attach debugging
+- if attach hangs or times out against a reused clone, create a **new distinct wrapper/profile pair** (for example `~/.config/google-chrome-cdp-hermes`) rather than silently reusing the old clone
+- A/B test attach against the old clone and the new profile explicitly; if the new profile attaches immediately while the old clone times out after websocket connect, treat the issue as **profile/session-state specific**, not generic CDP failure
+- once the new profile is proven good, keep it as the canonical Hermes CDP profile instead of inheriting unknown historical session state from the old clone
+
+Example wrapper shape:
+
+```sh
+#!/bin/sh
+set -eu
+
+REAL_CHROME="${REAL_CHROME:-/usr/bin/google-chrome-stable}"
+PORT="${CHROME_CDP_PORT:-9222}"
+ADDR="${CHROME_CDP_ADDR:-127.0.0.1}"
+USER_DATA_DIR="${CHROME_USER_DATA_DIR:-$HOME/.config/google-chrome-cdp-clone}"
+
+exec "$REAL_CHROME" \
+  --user-data-dir="$USER_DATA_DIR" \
+  --remote-debugging-address="$ADDR" \
+  --remote-debugging-port="$PORT" \
+  "$@"
+```
+
+**Do not use a symlinked fake profile** that points back at the live default profile. It is brittle and risks profile-lock conflicts or profile corruption.
+
+## Important integration pitfall: CDP healthy, but attach still hangs or times out
+
+If all of these are true:
+- `curl http://127.0.0.1:9222/json/version` works
+- `curl http://127.0.0.1:9222/json/list` returns targets
+- Hermes `browser_control` can at least round-trip simple commands like `list_pages`
+- but `launch` with `attach: true` fails with something like:
+
+```text
+BrowserType.connect_over_cdp: Timeout 30000ms exceeded.
+Call log:
+  - <ws preparing> retrieving websocket url from http://127.0.0.1:9222
+  - <ws connecting> ws://127.0.0.1:9222/devtools/browser/...
+  - <ws connected> ws://127.0.0.1:9222/devtools/browser/...
+```
+
+then the remaining bug is probably **inside the node browser controller's Playwright/CDP attach path**, not Chrome startup and not basic node/gateway routing.
+
+### What this pattern means
+
+That call log proves more than it looks like:
+- the CDP HTTP endpoint is reachable
+- Playwright successfully fetched the websocket URL
+- the websocket connection itself opened
+- the stall happens *after* transport connect, during attach completion inside Playwright or immediately after in controller-side attach handling
+
+So stop blaming:
+- Chrome not running
+- wrong `cdp_url`
+- firewall/network reachability
+- missing browser-control dispatch, if simple browser commands already succeed
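
As a rough illustration of reading that call log mechanically, the sketch below extracts the furthest transport stage from an error string. It only recognizes the three `<ws ...>` markers quoted above. The function names and hint wording are illustrative, not part of Playwright or Hermes.

```python
import re

# Transport stages in the order they appear in the call log quoted above.
STAGES = ("ws preparing", "ws connecting", "ws connected")


def furthest_ws_stage(error_text: str):
    """Return the last transport stage mentioned in the error text, or None."""
    seen = set(re.findall(r"<(ws [a-z]+)>", error_text))
    reached = [stage for stage in STAGES if stage in seen]
    return reached[-1] if reached else None


def attach_timeout_hint(error_text: str) -> str:
    """Map the furthest stage to the next place worth looking."""
    stage = furthest_ws_stage(error_text)
    if stage == "ws connected":
        return "transport is up; look at attach completion or controller-side handling"
    if stage is None:
        return "no websocket stages logged; check the CDP endpoint and cdp_url first"
    return f"stalled at '{stage}'; check reachability of the websocket URL"
```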
+
+### High-value narrowing sequence
+
+1. **Prove browser-control transport separately**
+   - Run a cheap command like `list_pages` first.
+   - If `list_pages` succeeds but attach times out, routing is alive and the fault is attach-specific.
+
+2. **Check the attach implementation in `browser_controller.py`**
+   - Inspect the `launch(... attach=True ...)` path.
+   - Look for `connect_over_cdp(...)` followed by any attach-state hydration such as `_ingest_attached_browser_state()`.
+   - If logs never reach the "attached successfully" line, the timeout is occurring before or during that step.
+
+3. **Add attach-specific observability**
+   - Log attach success with counts of discovered contexts/pages.
+   - Return discovered `contexts` / `page_ids` in the attach result.
+   - Make CDP connect timeout configurable (for example `connect_timeout_ms`) so retries are faster while debugging.
+
+4. **Differentiate Playwright-browser launch failures from attach failures**
+   - Missing Playwright browser binaries affect fresh `launch(headless=...)`.
+   - They do **not** explain a CDP attach path that already shows websocket connection success.
+
+5. **If Chrome uses a real everyday profile, suspect profile- or session-specific state next**
+   - A clean `--user-data-dir` test is a good A/B check after transport is ruled out.
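
Steps 1 and 2 of the sequence above can be sketched as a small probe. The `send` callable is a hypothetical transport (payload in, parsed reply out), for example a wrapper around the curl helper used elsewhere in this document. The `success` and `error` reply keys are assumptions about the reply shape.

```python
from typing import Callable

Send = Callable[[dict], dict]  # hypothetical transport: payload in, parsed reply out


def narrow_attach_fault(send: Send, cdp_url: str = "http://localhost:9222") -> str:
    """Run list_pages, then an attach launch, and report which layer looks faulty."""
    try:
        send({"action": "list_pages"})  # cheap command: proves routing only
    except Exception as exc:
        return f"routing: list_pages failed ({exc})"

    attach = {"action": "launch",
              "params": {"config": {"attach": True, "cdp_url": cdp_url}}}
    try:
        reply = send(attach)
    except Exception as exc:
        return f"attach: routing works but attach raised ({exc})"
    if not reply.get("success"):
        return f"attach: routing works but attach failed ({reply.get('error')})"
    return "ok: routing and attach both succeeded"
```

An `attach:` result with a healthy `list_pages` is the attach-specific case described in step 1.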
+
+### Concrete debugging cues
+
+If node logs show:
+```text
+Playwright initialized
+browser ▶️ launch
+Failed to launch/attach browser: BrowserType.connect_over_cdp: Timeout ...
+```
+and do **not** show your post-attach success log, then the timeout is occurring before your controller can finish attach bookkeeping.
+
+### Useful controller hardening pattern
+
+```python
+connect_timeout_ms = int(config.get("connect_timeout_ms", 10000))
+self.browser = await self.playwright.chromium.connect_over_cdp(
+    endpoint,
+    timeout=connect_timeout_ms,
+)
+self._ingest_attached_browser_state()
+logger.info(
+    f"Attached to existing chromium browser at {endpoint} "
+    f"(contexts={len(self.contexts)}, pages={len(self.pages)})"
+)
+result = {"success": True, "mode": "attach"}
+result["contexts"] = list(self.contexts.keys())
+result["page_ids"] = list(self.pages.keys())
+return result
+```
+
+This does not guarantee a fix, but it sharply narrows where the stall lives.
+
+### High-probability checks
+
+1. **BrowserController entrypoint mismatch**
+   - Inspect the live `browser_controller.py` used by the node agent.
+   - If the node agent calls something like `self.browser.execute(command, params)` or `self.browser.run(command, params)`, verify the controller class actually implements that dispatcher.
+   - A controller with only method-per-action (`launch`, `navigate`, `click`, etc.) but no `execute()` / `run()` adapter will advertise capability successfully yet fail every real browser command.
+
+2. **Gateway response-schema mismatch**
+   - Inspect the gateway/browser response handler.
+   - If the node sends payloads shaped like:
+     - `type: browser_control_result`
+     - `success: true/false`
+   - but the gateway decides success using something like `msg.get("result") == "ok"`, the waiter logic is stale.
+   - That mismatch can make the browser path look like a timeout even when the node replied.
+
+3. **Verify the live runtime copy, not just repo source**
+   - Browser-control bugs often survive because the repo file is fixed but the installed node/gateway runtime still uses stale code.
+   - Inspect the exact installed file on the node and the live gateway plugin copy.
+   - On packaged Linux nodes, useful live paths are often:
+     - `/home/nextime/.local/bin/hermes-node-agent`
+     - `/home/nextime/.local/bin/browser_controller.py`
+
+4. **Prove where the timeout lives with a split-brain check**
+   - If `node_status` shows `browser_control` available, *and* raw CDP works via:
+     - `curl http://127.0.0.1:9222/json/version`
+     - `curl http://127.0.0.1:9222/json/list`
+   - *and* the installed live node files already contain the expected fixes (for example the agent uses `await self.browser.execute(command, params)` and the controller has `if self.playwright is None: await self.initialize()`),
+   - but Hermes `browser_control(...)` still returns `Gateway timeout`,
+   - then the highest-probability remaining fault is the **live gateway/plugin browser-control path**, not Chrome and not the node runtime.
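
Checks 1 and 2 can be illustrated with a small sketch. `DispatchMixin`, `DemoController`, and `reply_succeeded` are hypothetical names; the real `browser_controller.py` and gateway handler may be shaped differently. The point is only what an `execute()` adapter and a `success`-based reply check look like.

```python
import asyncio
import inspect
from typing import Optional


class DispatchMixin:
    """Hypothetical adapter: routes execute(command, params) to the method of that name."""

    async def execute(self, command: str, params: Optional[dict] = None) -> dict:
        handler = getattr(self, command, None)
        # Reject private names, the dispatcher itself, and unknown commands.
        if command.startswith("_") or command == "execute" or not callable(handler):
            return {"success": False, "error": f"Unknown browser command: {command}"}
        result = handler(**(params or {}))
        if inspect.isawaitable(result):
            result = await result
        return result


class DemoController(DispatchMixin):
    """Stand-in for a method-per-action controller."""

    async def get_title(self, page_id: str) -> dict:
        return {"success": True, "title": f"title of {page_id}"}


def reply_succeeded(msg: dict) -> bool:
    """Decide success from the node's `success` flag, not a legacy `result == "ok"`."""
    return msg.get("type") == "browser_control_result" and msg.get("success") is True
```

With the adapter in place, `asyncio.run(DemoController().execute("get_title", {"page_id": "page_0"}))` reaches the per-action method instead of failing at dispatch.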
+
+### Diagnostic lesson
+
+Once raw CDP is healthy, stop blaming Chrome. Shift to:
+- node-agent browser dispatcher wiring
+- gateway waiter/response handling
+- runtime copy drift between repo and installed plugin/node files
+- and, after ruling those out on the live node, the live Hermes gateway/plugin browser-control path itself
+
+## Important attach-mode pitfall: attach succeeds, but `list_pages` is empty
+
+If all of these are true:
+- `launch` with `attach: true` succeeds
+- `curl http://127.0.0.1:9222/json/list` shows real Chrome targets/pages
+- but Hermes `list_pages` returns `[]`
+- and follow-up commands fail with `Page not found: page_X`
+
+then the likely bug is **inside `browser_controller.py` attach-state hydration**, not Chrome startup and not necessarily the gateway.
+
+### Root cause pattern
+
+`connect_over_cdp(...)` can successfully attach to an existing browser session, but the controller's own internal registries may still be empty:
+- `self.contexts`
+- `self.default_context`
+- `self.pages`
+
+If the controller only stores pages it created itself via `create_context()` / `new_page()`, then attached real-session tabs never become addressable by Hermes high-level commands.
+
+### What to inspect
+
+1. Verify live CDP state first:
+   - `curl http://127.0.0.1:9222/json/version`
+   - `curl http://127.0.0.1:9222/json/list`
+2. Attach via Hermes browser control.
+3. Immediately test `list_pages`.
+4. Inspect the live `browser_controller.py` used by the node and look for logic that ingests existing browser contexts/pages after `connect_over_cdp(...)`.
+
+### Correct fix shape
+
+After successful attach, hydrate the controller from the attached browser object:
+- enumerate `self.browser.contexts`
+- set a usable default context
+- register existing pages into `self.pages` with stable synthetic IDs like `page_0`, `page_1`, ...
+
+Example fix shape:
+
+```python
+def _ingest_attached_browser_state(self) -> None:
+    """Rebuild the controller registries from an already-attached browser."""
+    self.contexts = {}
+    self.pages = {}
+    self.default_context = None
+
+    if not self.browser:
+        return
+
+    all_contexts = list(getattr(self.browser, "contexts", []) or [])
+
+    page_index = 0
+    for idx, ctx in enumerate(all_contexts):
+        ctx_name = "default" if idx == 0 else f"attached_ctx_{idx}"
+        self.contexts[ctx_name] = ctx
+        if idx == 0:
+            # The first attached context becomes the default one.
+            self.default_context = ctx
+        # Give every existing tab a stable synthetic ID: page_0, page_1, ...
+        for page in list(getattr(ctx, "pages", []) or []):
+            self.pages[f"page_{page_index}"] = page
+            page_index += 1
+
+and call it immediately after `connect_over_cdp(...)` succeeds.
+
+### Verification
+
+After patching the live runtime copy:
+1. attach again
+2. run `list_pages` — it should return real existing tabs
+3. run `get_title` / `get_url` against one returned `page_id`
+4. only after that test richer navigation/evaluate flows
+
+### Related request-shape footgun
+
+If a command like `get_title` fails with a missing `page_id` positional-argument error even though you supplied a top-level tool `page_id`, inspect how the node protocol passes arguments into `browser_controller.execute(...)`.
+
+In this stack, the controller filters accepted args from `params`, so high-level commands that require `page_id` may need it present inside the message params payload, not only at the outer tool wrapper layer.
+
+That request-shape mismatch is secondary, but it can mask the main attach-state bug.
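
To make the mismatch concrete, here is a minimal sketch of signature-based filtering and one possible normalization step. All names (`filter_params`, `normalize_message`, the stand-in `get_title`) are illustrative, and the real node protocol code may filter differently.

```python
import inspect


def filter_params(handler, params: dict) -> dict:
    """Keep only the keys that the handler's signature accepts."""
    accepted = inspect.signature(handler).parameters
    return {k: v for k, v in params.items() if k in accepted}


def normalize_message(message: dict) -> dict:
    """Copy a top-level page_id into params without overriding an explicit one."""
    params = dict(message.get("params") or {})
    if "page_id" in message:
        params.setdefault("page_id", message["page_id"])
    return {**message, "params": params}


def get_title(page_id: str) -> str:  # stand-in for a controller method
    return f"title of {page_id}"


msg = {"action": "get_title", "page_id": "page_0"}

# Without normalization the handler receives no page_id at all.
raw_args = filter_params(get_title, msg.get("params") or {})

# After normalization the top-level page_id reaches the handler.
fixed_args = filter_params(get_title, normalize_message(msg)["params"])
```

Here `raw_args` is empty, which is what produces the missing positional-argument error, while `fixed_args` carries `page_id`.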
+
+## Virtual desktops / workspaces on X11
+
+On X11 desktops like `sissy` (XFCE), workspace switching can already be done without adding new protocol actions.
+
+### Read current workspace
+Use node exec on the remote node:
+
+```bash
+xdotool get_desktop
+```
+
+### Read workspace count
+```bash
+xdotool get_num_desktops
+```
+
+### Switch workspace with current computer_control
+Use keyboard shortcuts through `computer_control`:
+
+- next workspace:
+  - `key_press` with `ctrl+alt+Right`
+- previous workspace:
+  - `key_press` with `ctrl+alt+Left`
+
+Example:
+```json
+{
+  "action": "key_press",
+  "params": {"key": "ctrl+alt+Right"}
+}
+```
+
+### Verify the switch
+After switching, verify with:
+
+```bash
+xdotool get_desktop
+```
+
+### Caveats
+- This is an operational pattern, not a first-class `desktop_observe` / `computer_control` workspace API.
+- Works for X11/xdotool-friendly desktops.
+- Do not assume the same method works unchanged on Wayland or Windows.
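
The read, switch, and verify steps can be combined into one helper. This is a sketch under stated assumptions: `run_exec` and `press_key` are hypothetical callables standing in for node exec and the `computer_control` `key_press` action, and only the `xdotool` commands and key combinations come from this section.

```python
from typing import Callable

KEYS = {"next": "ctrl+alt+Right", "previous": "ctrl+alt+Left"}


def switch_workspace(run_exec: Callable[[str], str],
                     press_key: Callable[[str], None],
                     direction: str = "next") -> dict:
    """Switch one workspace over and report the index before and after."""
    if direction not in KEYS:
        raise ValueError(f"direction must be one of {sorted(KEYS)}")
    before = int(run_exec("xdotool get_desktop").strip())
    count = int(run_exec("xdotool get_num_desktops").strip())
    press_key(KEYS[direction])
    after = int(run_exec("xdotool get_desktop").strip())
    # 'switched' only says the index changed; wrap-around behaviour is desktop-specific.
    return {"before": before, "after": after, "count": count, "switched": after != before}
```

A `switched` value of `False` suggests the shortcut did not take effect, for example because the key binding differs on that desktop.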
+
+## Different browser types
+```json
+{"action": "launch", "params": {"config": {"browser_type": "firefox"}}}
+{"action": "launch", "params": {"config": {"browser_type": "webkit"}}}
+```
+
+## Context Management
+
+### Persistent (default)
+Cookies and localStorage are preserved across commands:
+```json
+{"action": "create_context", "params": {"config": {"name": "session1"}}}
+```
+
+### Incognito
+Fresh context, discarded after use:
+```json
+{"action": "create_context", "params": {"config": {"incognito": true}}}
+```
+
+### Multiple named contexts
+```json
+{"action": "create_context", "params": {"config": {"name": "work"}}}
+{"action": "create_context", "params": {"config": {"name": "personal"}}}
+```
+
+## Command Reference
+
+### High-Level Commands
+
+| Action | Description | Params |
+|--------|-------------|--------|
+| `launch` | Start browser | `config` (optional) |
+| `create_context` | New context | `config` (optional) |
+| `new_page` | Create page | `context_name` (optional) |
+| `navigate` | Go to URL | `url`, `page_id` |
+| `click` | Click element | `selector`, `page_id` |
+| `fill` | Fill input | `selector`, `value`, `page_id` |
+| `type_text` | Type text | `selector`, `text`, `page_id` |
+| `screenshot` | Capture page | `page_id`, `full_page` (optional) |
+| `execute_script` | Run JS (no return) | `script`, `page_id` |
+| `evaluate` | Run JS (return value) | `expression`, `page_id` |
+| `get_content` | Get HTML | `page_id` |
+| `get_title` | Get title | `page_id` |
+| `wait_for_selector` | Wait for element | `selector`, `page_id` |
+| `close_page` | Close page | `page_id` |
+| `close_context` | Close context | `context_name` |
+| `close` | Close browser | - |
+| `list_pages` | List all pages | - |
+| `list_contexts` | List contexts | - |
+
+### Playwright Layer
+
+Direct Playwright API access:
+```json
+{
+  "layer": "playwright",
+  "command": "locator.click",
+  "page_id": "page_0",
+  "params": {
+    "args": ["button#submit"],
+    "kwargs": {"timeout": 5000}
+  }
+}
+```
+
+### CDP Layer
+
+Raw Chrome DevTools Protocol:
+```json
+{
+  "layer": "cdp",
+  "command": "Network.enable",
+  "page_id": "page_0",
+  "params": {}
+}
+```
+
+## Troubleshooting
+
+### Check node capabilities
+```bash
+curl http://zeiss:8766/nodes/sissy/status
+# Should show "capabilities": ["exec", "browser_control"]
+```
+
+### Check if browser is running
+```bash
+curl -X POST http://zeiss:8766/nodes/sissy/browser \
+  -H "Content-Type: application/json" \
+  -d '{"action": "list_pages"}'
+```
+
+### Playwright not installed
+```bash
+# On sissy:
+cd ~/hermes-node-protocol/node-agent
+source venv/bin/activate
+pip install playwright
+playwright install chromium
+```
+
+### Browser launch fails
+Check node agent logs:
+```bash
+tail -f /var/log/hermes-node-agent.log
+```
+
+## Security Notes
+
+- Browser control bypasses the sexec permission system
+- Only use on trusted nodes
+- Incognito mode doesn't persist cookies/localStorage
+- Screenshots may contain sensitive data
+- CDP layer has full browser access
+
+## See Also
+
+- `/home/lisa/hermes-node-protocol/BROWSER_CONTROL.md` - Full documentation
+- `/home/lisa/hermes-node-protocol/node-agent/browser_controller.py` - Implementation
+- `/home/lisa/hermes-node-protocol/node-agent/test_browser.py` - Test examples