Commit 2981f72e authored by Lisa (Hermes AI)

docs: add workspace workflow to browser control skill

parent 41095399
---
name: browser-control
description: Control browsers remotely via Hermes Node Protocol
version: 1.0.0
author: Lisa
tags: [automation, browser, playwright, testing, scraping]
---
# Browser Control via Hermes Node Protocol
Control Chrome/Firefox/WebKit on remote nodes (sissy, zeiss, etc.) via Playwright.
## Prerequisites
- Node must have browser control capability registered
- Gateway running on zeiss:8766
- Node agent with Playwright installed
## Quick Start
```python
import requests

GATEWAY = "http://zeiss:8766"
NODE = "sissy"


def browser_cmd(action, **params):
    """Send a browser command to the node and return the decoded JSON reply."""
    payload = {"action": action, "params": params}
    resp = requests.post(f"{GATEWAY}/nodes/{NODE}/browser", json=payload)
    return resp.json()


# Launch browser
browser_cmd("launch", config={"headless": False})

# Create context and get page_id
result = browser_cmd("create_context", config={"name": "work"})
page_id = result["page_id"]

# Navigate
browser_cmd("navigate", url="https://github.com", page_id=page_id)

# Take screenshot (returned as a base64 string)
result = browser_cmd("screenshot", page_id=page_id, full_page=True)
screenshot_b64 = result["screenshot"]
```
## Common Workflows
### 1. Simple Navigation and Screenshot
```python
from hermes_tools import terminal
import json

# Launch + navigate + screenshot
commands = [
    '{"action": "launch", "params": {"config": {"headless": true}}}',
    '{"action": "create_context"}',
    '{"action": "navigate", "page_id": "page_0", "params": {"url": "https://example.com"}}',
    '{"action": "screenshot", "page_id": "page_0", "params": {"full_page": true}}'
]

for cmd in commands:
    result = terminal(
        f'curl -s -X POST http://zeiss:8766/nodes/sissy/browser -H "Content-Type: application/json" -d \'{cmd}\''
    )
    print(json.loads(result["output"]))
```
### 2. Form Automation
```python
from hermes_tools import terminal

page_id = "page_0"
# Fill login form
terminal(f'''curl -X POST http://zeiss:8766/nodes/sissy/browser \\
-H "Content-Type: application/json" \\
-d '{{"action": "fill", "page_id": "{page_id}", "params": {{"selector": "#username", "value": "user@example.com"}}}}'
''')
terminal(f'''curl -X POST http://zeiss:8766/nodes/sissy/browser \\
-H "Content-Type: application/json" \\
-d '{{"action": "fill", "page_id": "{page_id}", "params": {{"selector": "#password", "value": "secret"}}}}'
''')
# Submit
terminal(f'''curl -X POST http://zeiss:8766/nodes/sissy/browser \\
-H "Content-Type: application/json" \\
-d '{{"action": "click", "page_id": "{page_id}", "params": {{"selector": "button[type=submit]"}}}}'
''')
```
### 3. Data Extraction
```python
import json
from hermes_tools import terminal

# Get page content
result = terminal(f'''curl -s -X POST http://zeiss:8766/nodes/sissy/browser \\
-H "Content-Type: application/json" \\
-d '{{"action": "evaluate", "page_id": "page_0", "params": {{"expression": "Array.from(document.querySelectorAll(\\"h2\\")).map(h => h.textContent)"}}}}'
''')
data = json.loads(result["output"])
headings = data["result"]
print(f"Found {len(headings)} headings: {headings}")
```
### 4. Incognito Session
```python
import json
from hermes_tools import terminal

# Create incognito context
result = terminal('''curl -s -X POST http://zeiss:8766/nodes/sissy/browser \\
-H "Content-Type: application/json" \\
-d '{"action": "create_context", "params": {"config": {"incognito": true}}}'
''')
page_id = json.loads(result["output"])["page_id"]
# Use incognito page
terminal(f'''curl -X POST http://zeiss:8766/nodes/sissy/browser \\
-H "Content-Type: application/json" \\
-d '{{"action": "navigate", "page_id": "{page_id}", "params": {{"url": "https://example.com"}}}}'
''')
```
### 5. Multi-Step Workflow with execute_code
```python
from hermes_tools import terminal, write_file
import base64
import json
import shlex


def browser_request(action, page_id=None, **params):
    """Helper to send browser commands"""
    payload = {"action": action}
    if page_id:
        payload["page_id"] = page_id
    if params:
        payload["params"] = params
    # shlex.quote keeps the JSON body intact even if it contains single quotes.
    body = shlex.quote(json.dumps(payload))
    cmd = f'curl -s -X POST http://zeiss:8766/nodes/sissy/browser -H "Content-Type: application/json" -d {body}'
    result = terminal(command=cmd)
    return json.loads(result["output"])


# Launch browser
browser_request("launch", config={"headless": True})

# Create context
ctx = browser_request("create_context", config={"name": "scraper"})
page_id = ctx["page_id"]

# Navigate to target
browser_request("navigate", page_id, url="https://news.ycombinator.com")

# Extract titles
titles_result = browser_request(
    "evaluate",
    page_id,
    expression='Array.from(document.querySelectorAll(".titleline > a")).slice(0, 10).map(a => a.textContent)'
)
print("Top 10 HN stories:")
for i, title in enumerate(titles_result["result"], 1):
    print(f"{i}. {title}")

# Screenshot
screenshot_result = browser_request("screenshot", page_id, full_page=False)
screenshot_data = base64.b64decode(screenshot_result["screenshot"])
write_file("/tmp/hn_screenshot.png", screenshot_data)
print("Screenshot saved to /tmp/hn_screenshot.png")

# Cleanup
browser_request("close")
```
## Launch Modes
### Headless (background)
```json
{"action": "launch", "params": {"config": {"headless": true}}}
```
### Headed (visible window)
```json
{"action": "launch", "params": {"config": {"headless": false}}}
```
### Attach to existing browser
```bash
# First, start Chrome with remote debugging:
# chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-cdp-profile
# Then attach:
{"action": "launch", "params": {"config": {"attach": true, "cdp_url": "http://localhost:9222"}}}
```
**Important Chrome caveat:** modern Google Chrome may refuse DevTools remote debugging on the default everyday profile and show an error like:
```text
DevTools remote debugging requires a non-default data directory.
Specify this using --user-data-dir.
```
When that happens:
- do **not** assume the browser is controllable just because the process command line contains `--remote-debugging-port=9222`
- verify the listener is actually up with `curl http://127.0.0.1:9222/json/version`
- if it refuses connection, Chrome never brought CDP up
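The listener check above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `probe_cdp` and its defaults are illustrative assumptions, not part of Hermes.

```python
import http.client
import json
import urllib.error
import urllib.request


def probe_cdp(base_url="http://127.0.0.1:9222", timeout=2.0):
    """Return the parsed /json/version payload, or None if CDP is not answering.

    probe_cdp is a hypothetical helper name used only for this sketch.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply:
        # in every case, treat the CDP listener as not up.
        return None
```

A `None` result means Chrome never brought CDP up, regardless of what the process command line says.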
**Recommended workaround for a "near-real session" without touching the live default profile:**
- make a **one-time clone** of the default Chrome profile
- launch Chrome with `--user-data-dir` pointed at the clone
- bind CDP to `127.0.0.1`, not `0.0.0.0`
- remove stale `SingletonLock`, `SingletonSocket`, and `SingletonCookie` files from the clone after the initial copy if present
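As a rough illustration of the clone step, here is a minimal Python sketch. It assumes Chrome is fully closed while copying, and the helper name `clone_profile` is hypothetical; the source and destination paths are supplied by the caller.

```python
import shutil
from pathlib import Path

# Lock files named in the workaround above.
SINGLETON_FILES = ("SingletonLock", "SingletonSocket", "SingletonCookie")


def clone_profile(source: Path, clone: Path) -> Path:
    """Copy a Chrome profile once and strip stale singleton files from the copy."""
    if clone.exists():
        # One-time clone: never overwrite or merge into an existing directory.
        raise FileExistsError(f"{clone} already exists; refusing to overwrite it")
    # symlinks=True copies links as links instead of following them.
    shutil.copytree(source, clone, symlinks=True)
    for name in SINGLETON_FILES:
        stale = clone / name
        # is_symlink() also catches dangling links, which exists() reports as missing.
        if stale.is_symlink() or stale.exists():
            stale.unlink()
    return clone
```

The original profile is only read, never modified, which keeps the live default profile untouched.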
**Important profile-drift lesson:**
- do **not** assume any existing `google-chrome-cdp-clone` directory is a clean baseline for later attach debugging
- if attach hangs or times out against a reused clone, create a **new distinct wrapper/profile pair** (for example `~/.config/google-chrome-cdp-hermes`) rather than silently reusing the old clone
- A/B test attach against the old clone and the new profile explicitly; if the new profile attaches immediately while the old clone times out after websocket connect, treat the issue as **profile/session-state specific**, not generic CDP failure
- once the new profile is proven good, keep it as the canonical Hermes CDP profile instead of inheriting unknown historical session state from the old clone
Example wrapper shape:
```sh
#!/bin/sh
set -eu
REAL_CHROME="${REAL_CHROME:-/usr/bin/google-chrome-stable}"
PORT="${CHROME_CDP_PORT:-9222}"
ADDR="${CHROME_CDP_ADDR:-127.0.0.1}"
USER_DATA_DIR="${CHROME_USER_DATA_DIR:-$HOME/.config/google-chrome-cdp-clone}"
exec "$REAL_CHROME" \
--user-data-dir="$USER_DATA_DIR" \
--remote-debugging-address="$ADDR" \
--remote-debugging-port="$PORT" \
"$@"
```
**Do not use a symlinked fake profile** that points back at the live default profile. That is brittle and risks profile-lock or corruption weirdness.
## Important integration pitfall: CDP healthy, but attach still hangs or times out
If all of these are true:
- `curl http://127.0.0.1:9222/json/version` works
- `curl http://127.0.0.1:9222/json/list` returns targets
- Hermes `browser_control` can at least round-trip simple commands like `list_pages`
- but `launch` with `attach: true` fails with something like:
```text
BrowserType.connect_over_cdp: Timeout 30000ms exceeded.
Call log:
- <ws preparing> retrieving websocket url from http://127.0.0.1:9222
- <ws connecting> ws://127.0.0.1:9222/devtools/browser/...
- <ws connected> ws://127.0.0.1:9222/devtools/browser/...
```
then the remaining bug is probably **inside the node browser controller's Playwright/CDP attach path**, not Chrome startup and not basic node/gateway routing.
### What this pattern means
That call log proves more than it looks like:
- the CDP HTTP endpoint is reachable
- Playwright successfully fetched the websocket URL
- the websocket connection itself opened
- the stall happens *after* transport connect, during attach completion inside Playwright or immediately after in controller-side attach handling
So stop blaming:
- Chrome not running
- wrong `cdp_url`
- firewall/network reachability
- missing browser-control dispatch, if simple browser commands already succeed
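The reading of the call log above can be made mechanical. This is a small sketch that only parses the `<ws ...>` markers as they appear in the quoted log; the function names are illustrative assumptions.

```python
import re

# Stage markers in the order they appear in the call log quoted above.
_STAGES = ("ws preparing", "ws connecting", "ws connected")


def last_ws_stage(call_log: str):
    """Return the furthest websocket stage mentioned in a call log, or None."""
    seen = set(re.findall(r"<(ws \w+)>", call_log))
    reached = None
    for stage in _STAGES:
        if stage in seen:
            reached = stage
    return reached


def transport_is_ruled_out(call_log: str) -> bool:
    """True when the websocket opened, so the stall is after transport connect."""
    return last_ws_stage(call_log) == "ws connected"
```

If `transport_is_ruled_out` returns `True`, the items in the "stop blaming" list no longer apply and attention should move to the attach path.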
### High-value narrowing sequence
1. **Prove browser-control transport separately**
   - Run a cheap command like `list_pages` first.
   - If `list_pages` succeeds but attach times out, routing is alive and the fault is attach-specific.
2. **Check the attach implementation in `browser_controller.py`**
   - Inspect the `launch(... attach=True ...)` path.
   - Look for `connect_over_cdp(...)` followed by any attach-state hydration such as `_ingest_attached_browser_state()`.
   - If logs never reach the "attached successfully" line, the timeout is occurring before or during that step.
3. **Add attach-specific observability**
   - Log attach success with counts of discovered contexts/pages.
   - Return discovered `contexts` / `page_ids` in the attach result.
   - Make CDP connect timeout configurable (for example `connect_timeout_ms`) so retries are faster while debugging.
4. **Differentiate Playwright-browser launch failures from attach failures**
   - Missing Playwright browser binaries affect fresh `launch(headless=...)`.
   - They do **not** explain a CDP attach path that already shows websocket connection success.
5. **If Chrome uses a real everyday profile, suspect profile/session-specific weirdness next**
   - A clean `--user-data-dir` test is a good A/B check after transport is ruled out.
### Concrete debugging cues
If node logs show:
```text
Playwright initialized
browser ▶️ launch
Failed to launch/attach browser: BrowserType.connect_over_cdp: Timeout ...
```
and do **not** show your post-attach success log, then the timeout is occurring before your controller can finish attach bookkeeping.
### Useful controller hardening pattern
```python
connect_timeout_ms = int(config.get("connect_timeout_ms", 10000))
self.browser = await self.playwright.chromium.connect_over_cdp(
    endpoint,
    timeout=connect_timeout_ms,
)
self._ingest_attached_browser_state()
logger.info(
    f"Attached to existing chromium browser at {endpoint} "
    f"(contexts={len(self.contexts)}, pages={len(self.pages)})"
)
result = {"success": True, "mode": "attach"}
result["contexts"] = list(self.contexts.keys())
result["page_ids"] = list(self.pages.keys())
return result
```
This does not guarantee a fix, but it sharply narrows where the stall lives.
### High-probability checks
1. **BrowserController entrypoint mismatch**
   - Inspect the live `browser_controller.py` used by the node agent.
   - If the node agent calls something like `self.browser.execute(command, params)` or `self.browser.run(command, params)`, verify the controller class actually implements that dispatcher.
   - A controller with only method-per-action (`launch`, `navigate`, `click`, etc.) but no `execute()` / `run()` adapter will advertise capability successfully yet fail every real browser command.
2. **Gateway response-schema mismatch**
   - Inspect the gateway/browser response handler.
   - If the node sends payloads shaped like:
     - `type: browser_control_result`
     - `success: true/false`
   - but the gateway decides success using something like `msg.get("result") == "ok"`, the waiter logic is stale.
   - That mismatch can make the browser path look like a timeout even when the node replied.
3. **Verify the live runtime copy, not just repo source**
   - Browser-control bugs often survive because the repo file is fixed but the installed node/gateway runtime still uses stale code.
   - Inspect the exact installed file on the node and the live gateway plugin copy.
   - On packaged Linux nodes, useful live paths are often:
     - `/home/nextime/.local/bin/hermes-node-agent`
     - `/home/nextime/.local/bin/browser_controller.py`
4. **Prove where the timeout lives with a split-brain check**
   - If `node_status` shows `browser_control` available, *and* raw CDP works via:
     - `curl http://127.0.0.1:9222/json/version`
     - `curl http://127.0.0.1:9222/json/list`
   - *and* the installed live node files already contain the expected fixes (for example the agent uses `await self.browser.execute(command, params)` and the controller has `if self.playwright is None: await self.initialize()`),
   - but Hermes `browser_control(...)` still returns `Gateway timeout`,
   - then the highest-probability remaining fault is the **live gateway/plugin browser-control path**, not Chrome and not the node runtime.
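For check 2, one possible shape for a waiter-side success test that tolerates both reply schemas is sketched below. The function name `reply_succeeded` is hypothetical, and the two message shapes are taken only from the description above; the real gateway handler may differ.

```python
def reply_succeeded(msg: dict) -> bool:
    """Decide success from a node reply, preferring the explicit boolean field."""
    if "success" in msg:
        # Current shape: {"type": "browser_control_result", "success": true/false}
        return msg["success"] is True
    # Legacy shape that the stale waiter logic was written against.
    return msg.get("result") == "ok"
```

Checking `success` first means a reply that carries a real `result` payload is not misread as a failure.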
### Diagnostic lesson
Once raw CDP is healthy, stop blaming Chrome. Shift to:
- node-agent browser dispatcher wiring
- gateway waiter/response handling
- runtime copy drift between repo and installed plugin/node files
- and, after ruling those out on the live node, the live Hermes gateway/plugin browser-control path itself
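Runtime copy drift is cheap to detect by comparing file digests. A minimal sketch follows, assuming both files are readable from the machine running the check; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def has_drifted(repo_file: Path, installed_file: Path) -> bool:
    """True when the installed runtime copy differs from the repo source."""
    return file_digest(repo_file) != file_digest(installed_file)
```

For a remote node, the same comparison can be done by running `sha256sum` on the node and comparing the output with `file_digest` of the repo copy.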
## Important attach-mode pitfall: attach succeeds, but `list_pages` is empty
If all of these are true:
- `launch` with `attach: true` succeeds
- `curl http://127.0.0.1:9222/json/list` shows real Chrome targets/pages
- but Hermes `list_pages` returns `[]`
- and follow-up commands fail with `Page not found: page_X`
then the likely bug is **inside `browser_controller.py` attach-state hydration**, not Chrome startup and not necessarily the gateway.
### Root cause pattern
`connect_over_cdp(...)` can successfully attach to an existing browser session, but the controller's own internal registries may still be empty:
- `self.contexts`
- `self.default_context`
- `self.pages`
If the controller only stores pages it created itself via `create_context()` / `new_page()`, then attached real-session tabs never become addressable by Hermes high-level commands.
### What to inspect
1. Verify live CDP state first:
   - `curl http://127.0.0.1:9222/json/version`
   - `curl http://127.0.0.1:9222/json/list`
2. Attach via Hermes browser control.
3. Immediately test `list_pages`.
4. Inspect the live `browser_controller.py` used by the node and look for logic that ingests existing browser contexts/pages after `connect_over_cdp(...)`.
### Correct fix shape
After successful attach, hydrate the controller from the attached browser object:
- enumerate `self.browser.contexts`
- set a usable default context
- register existing pages into `self.pages` with stable synthetic IDs like `page_0`, `page_1`, ...
Example fix shape:
```python
def _ingest_attached_browser_state(self) -> None:
    self.contexts = {}
    self.pages = {}
    self.default_context = None
    if not self.browser:
        return
    all_contexts = list(getattr(self.browser, "contexts", []) or [])
    if all_contexts:
        self.default_context = all_contexts[0]
    page_index = 0
    for idx, ctx in enumerate(all_contexts):
        ctx_name = "default" if idx == 0 else f"attached_ctx_{idx}"
        self.contexts[ctx_name] = ctx
        if idx == 0:
            self.default_context = ctx
        for page in list(getattr(ctx, "pages", []) or []):
            self.pages[f"page_{page_index}"] = page
            page_index += 1
```
and call it immediately after `connect_over_cdp(...)` succeeds.
### Verification
After patching the live runtime copy:
1. attach again
2. run `list_pages` — it should return real existing tabs
3. run `get_title` / `get_url` against one returned `page_id`
4. only after that test richer navigation/evaluate flows
### Related request-shape footgun
If a command like `get_title` fails with a missing `page_id` positional-argument error even though you supplied a top-level tool `page_id`, inspect how the node protocol passes arguments into `browser_controller.execute(...)`.
In this stack, the controller filters accepted args from `params`, so high-level commands that require `page_id` may need it present inside the message params payload, not only at the outer tool wrapper layer.
That request-shape mismatch is secondary, but it can mask the main attach-state bug.
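The interaction between the dispatcher and the request shape can be illustrated with a toy controller. Everything below is a hypothetical sketch: `ControllerSketch` and `merge_page_id` are invented names, and the real `browser_controller.py` may filter arguments differently.

```python
import inspect


class ControllerSketch:
    """Hypothetical stand-in for a method-per-action controller."""

    async def get_title(self, page_id: str) -> dict:
        return {"success": True, "title": f"title of {page_id}"}

    async def execute(self, command: str, params: dict) -> dict:
        """Dispatch a command name to the matching coroutine method."""
        handler = getattr(self, command, None)
        is_action = inspect.iscoroutinefunction(handler)
        if command.startswith("_") or command == "execute" or not is_action:
            return {"success": False, "error": f"Unknown command: {command}"}
        # Keep only the params the handler actually accepts.
        accepted = inspect.signature(handler).parameters
        kwargs = {k: v for k, v in params.items() if k in accepted}
        return await handler(**kwargs)


def merge_page_id(message: dict) -> dict:
    """Copy a top-level page_id into params when params does not carry one."""
    params = dict(message.get("params") or {})
    if "page_id" in message:
        params.setdefault("page_id", message["page_id"])
    return params
```

With this shape, passing `message["params"]` straight to `execute("get_title", ...)` raises a missing-argument `TypeError` when `page_id` sits only at the top level, while passing `merge_page_id(message)` succeeds.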
## Virtual desktops / workspaces on X11
On X11 desktops like `sissy` (XFCE), workspace switching can already be done without adding new protocol actions.
### Read current workspace
Use node exec on the remote node:
```bash
xdotool get_desktop
```
### Read workspace count
```bash
xdotool get_num_desktops
```
### Switch workspace with current computer_control
Use keyboard shortcuts through `computer_control`:
- next workspace:
  - `key_press` with `ctrl+alt+Right`
- previous workspace:
  - `key_press` with `ctrl+alt+Left`
Example:
```json
{
  "action": "key_press",
  "params": {"key": "ctrl+alt+Right"}
}
```
### Verify the switch
After switching, verify with:
```bash
xdotool get_desktop
```
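The read, switch, verify sequence can be wrapped in one helper. In this sketch the two callables are assumptions supplied by the caller: `read_desktop` would run `xdotool get_desktop` through node exec and return the index, and `press_key` would send a `key_press` through `computer_control`. Neither is a real Hermes function name.

```python
def parse_desktop_index(output: str) -> int:
    """Parse the integer that `xdotool get_desktop` prints."""
    return int(output.strip())


def switch_and_verify(read_desktop, press_key, key="ctrl+alt+Right"):
    """Press a workspace shortcut and report whether the desktop index changed.

    read_desktop and press_key are caller-supplied callables (hypothetical).
    """
    before = read_desktop()
    press_key(key)
    after = read_desktop()
    return {"before": before, "after": after, "switched": after != before}
```

On a real desktop a short delay between the key press and the second read may be needed, since the window manager switches asynchronously.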
### Caveats
- This is an operational pattern, not a first-class `desktop_observe` / `computer_control` workspace API.
- Works for X11/xdotool-friendly desktops.
- Do not assume the same method works unchanged on Wayland or Windows.
## Different browser types
```json
{"action": "launch", "params": {"config": {"browser_type": "firefox"}}}
{"action": "launch", "params": {"config": {"browser_type": "webkit"}}}
```
## Context Management
### Persistent (default)
Cookies and localStorage preserved across commands:
```json
{"action": "create_context", "params": {"config": {"name": "session1"}}}
```
### Incognito
Fresh context, discarded after use:
```json
{"action": "create_context", "params": {"config": {"incognito": true}}}
```
### Multiple named contexts
```json
{"action": "create_context", "params": {"config": {"name": "work"}}}
{"action": "create_context", "params": {"config": {"name": "personal"}}}
```
## Command Reference
### High-Level Commands
| Action | Description | Required Params |
|--------|-------------|-----------------|
| `launch` | Start browser | `config` (optional) |
| `create_context` | New context | `config` (optional) |
| `new_page` | Create page | `context_name` (optional) |
| `navigate` | Go to URL | `url`, `page_id` |
| `click` | Click element | `selector`, `page_id` |
| `fill` | Fill input | `selector`, `value`, `page_id` |
| `type_text` | Type text | `selector`, `text`, `page_id` |
| `screenshot` | Capture page | `page_id`, `full_page` (optional) |
| `execute_script` | Run JS (no return) | `script`, `page_id` |
| `evaluate` | Run JS (return value) | `expression`, `page_id` |
| `get_content` | Get HTML | `page_id` |
| `get_title` | Get title | `page_id` |
| `wait_for_selector` | Wait for element | `selector`, `page_id` |
| `close_page` | Close page | `page_id` |
| `close_context` | Close context | `context_name` |
| `close` | Close browser | - |
| `list_pages` | List all pages | - |
| `list_contexts` | List contexts | - |
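The required-params column above lends itself to a client-side pre-flight check. The sketch below simply transcribes the table; `missing_params` is a hypothetical helper, not part of the protocol.

```python
# Required params per action, transcribed from the table above.
# Actions with only optional params (launch, create_context, new_page)
# or no params (close, list_pages, list_contexts) are omitted.
REQUIRED_PARAMS = {
    "navigate": ("url", "page_id"),
    "click": ("selector", "page_id"),
    "fill": ("selector", "value", "page_id"),
    "type_text": ("selector", "text", "page_id"),
    "screenshot": ("page_id",),
    "execute_script": ("script", "page_id"),
    "evaluate": ("expression", "page_id"),
    "get_content": ("page_id",),
    "get_title": ("page_id",),
    "wait_for_selector": ("selector", "page_id"),
    "close_page": ("page_id",),
    "close_context": ("context_name",),
}


def missing_params(action: str, params: dict) -> list:
    """Return the required param names that are absent from params."""
    return [name for name in REQUIRED_PARAMS.get(action, ()) if name not in params]
```

For example, `missing_params("fill", {"selector": "#username"})` reports `value` and `page_id` as missing before any request is sent.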
### Playwright Layer
Direct Playwright API access:
```json
{
  "layer": "playwright",
  "command": "locator.click",
  "page_id": "page_0",
  "params": {
    "args": ["button#submit"],
    "kwargs": {"timeout": 5000}
  }
}
```
### CDP Layer
Raw Chrome DevTools Protocol:
```json
{
  "layer": "cdp",
  "command": "Network.enable",
  "page_id": "page_0",
  "params": {}
}
```
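Both layered message shapes can be produced by one small builder. This sketch only reproduces the two JSON shapes shown above; `layer_command` is a hypothetical helper, and passing CDP params as keyword arguments is a convenience assumed here.

```python
def layer_command(layer: str, command: str, page_id: str, *args, **kwargs) -> dict:
    """Build a Playwright-layer or CDP-layer message in the shapes shown above."""
    if layer == "playwright":
        # Playwright layer: positional and keyword arguments travel separately.
        params = {"args": list(args), "kwargs": kwargs}
    elif layer == "cdp":
        if args:
            raise ValueError("CDP commands take named params only")
        # CDP layer: params is the flat parameter object of the CDP method.
        params = dict(kwargs)
    else:
        raise ValueError(f"Unknown layer: {layer}")
    return {"layer": layer, "command": command, "page_id": page_id, "params": params}
```

`layer_command("playwright", "locator.click", "page_0", "button#submit", timeout=5000)` rebuilds the Playwright example, and `layer_command("cdp", "Network.enable", "page_0")` rebuilds the CDP one.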
## Troubleshooting
### Check node capabilities
```bash
curl http://zeiss:8766/nodes/sissy/status
# Should show "capabilities": ["exec", "browser_control"]
```
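The same check can be applied to the decoded status JSON. This sketch assumes the status reply carries a top-level `capabilities` list, as the comment above suggests; `has_capability` is a hypothetical helper.

```python
def has_capability(status: dict, name: str = "browser_control") -> bool:
    """True when the node status lists the named capability.

    Assumes a top-level "capabilities" list in the status reply.
    """
    return name in (status.get("capabilities") or [])
```

A `False` result for `browser_control` means the node agent never registered the capability, so no browser command will be dispatched to it.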
### Check if browser is running
```bash
curl -X POST http://zeiss:8766/nodes/sissy/browser \
-H "Content-Type: application/json" \
-d '{"action": "list_pages"}'
```
### Playwright not installed
```bash
# On sissy:
cd ~/hermes-node-protocol/node-agent
source venv/bin/activate
pip install playwright
playwright install chromium
```
### Browser launch fails
Check node agent logs:
```bash
tail -f /var/log/hermes-node-agent.log
```
## Security Notes
- Browser control bypasses sexec permission system
- Only use on trusted nodes
- Incognito mode doesn't persist cookies/localStorage
- Screenshots may contain sensitive data
- CDP layer has full browser access
## See Also
- `/home/lisa/hermes-node-protocol/BROWSER_CONTROL.md` - Full documentation
- `/home/lisa/hermes-node-protocol/node-agent/browser_controller.py` - Implementation
- `/home/lisa/hermes-node-protocol/node-agent/test_browser.py` - Test examples