Commit 8479b9e3 authored by Lisa

feat(node-agent): add computer_control desktop automation and capability-based tool discovery

Major integration of desktop automation and structured capability reporting for Hermes Node Gateway.
This enables a unified node execution framework with optional tool support that is
automatically detected and advertised.

Features:
- ComputerController class (Linux/X11) using xdotool and import (ImageMagick)
  - Screenshot (full screen, optional path or base64-return)
  - Mouse: move(x,y), click(button), position()
  - Keyboard: type_text(text), key_press(key like 'Return', 'Ctrl+c')
  - Window: active_window() returns focused window info
- Capability-based registration with the gateway
  - Agent sends its tool list, e.g. ["exec", "browser_control", "computer_control"], on connect
  - Gateway filters the tools it exposes based on the declared capabilities
  - Missing dependencies are handled: the agent checks for xdotool/import and for the browser extension
- Universal installer updated:
  - Prompts: enable browser control? enable computer control?
  - Optional per-node sexec permissions quick-edit (comma-separated allow/deny/ask patterns)
  - Writes config.json with enable_browser, enable_computer_control and permissions JSON
- Fixes:
  - Agent formerly advertised capability key 'browser'; gateway expects 'browser_control' → aligned
  - Installers now embed the agent, compressed and base64-encoded, in the shell script, so no external files are needed
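
The dependency checks behind capability reporting could look roughly like the sketch below. It is an illustration only, not the code in hermes_node_agent.py: the function name `detect_capabilities` and the `browser_extension_present` flag are hypothetical, while the config keys `enable_browser` and `enable_computer_control` and the tool names are taken from this commit message.

```python
import shutil
from typing import Callable, Dict, List, Optional


def detect_capabilities(
    config: Dict[str, bool],
    browser_extension_present: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tool names a node would advertise on connect (hypothetical helper)."""
    capabilities = ["exec"]  # shell execution via sexec is always offered

    if config.get("enable_browser") and browser_extension_present:
        capabilities.append("browser_control")

    # Desktop control needs both xdotool and ImageMagick's `import` on PATH.
    has_desktop_deps = all(which(tool) for tool in ("xdotool", "import"))
    if config.get("enable_computer_control") and has_desktop_deps:
        capabilities.append("computer_control")

    return capabilities
```

The `which` parameter defaults to `shutil.which` and exists only so the lookup can be replaced in tests; a node without X11 tooling would then simply omit "computer_control" from its list.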

Files:
- node-agent/: new ComputerController class; enhanced hermes_node_agent.py
- ~/.hermes/plugins/hermes-node-gateway/: added COMPUTER_CONTROL_SCHEMA + handlers + routing
- install_hermes_node_universal.sh (merged in repo dist/ location)

Node agent: handles "computer_control" commands forwarded by the gateway
Gateway: registers the 'computer_control' tool with its schema and handles the cc_result responses
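
Gateway-side filtering by declared capability can be sketched as follows. This is a minimal illustration, not the plugin's code: the function `tools_for_node`, the assumption that `node_list` and `node_status` are always available, and the mapping of `node_exec` to the "exec" capability are all hypothetical.

```python
from typing import Dict, Iterable, List

# Assumption: listing and status tools do not depend on node capabilities.
ALWAYS_AVAILABLE: List[str] = ["node_list", "node_status"]

# Tool name -> capability a node must declare (mapping is assumed).
REQUIRED_CAPABILITY: Dict[str, str] = {
    "node_exec": "exec",
    "browser_control": "browser_control",
    "computer_control": "computer_control",
}


def tools_for_node(declared: Iterable[str]) -> List[str]:
    """Return the tools usable against a node that declared these capabilities."""
    declared_set = set(declared)
    gated = [
        tool
        for tool, capability in REQUIRED_CAPABILITY.items()
        if capability in declared_set
    ]
    return ALWAYS_AVAILABLE + gated
```

Under this sketch, a node still advertising the old key 'browser' would not match 'browser_control', which is the kind of mismatch the fix in this commit aligns.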

Deployed nodes can now:
  - Execute shell commands via sexec (preserves existing allow/ask/deny)
  - Control browsers (if extension installed)
  - Control desktop (if X11 + xdotool + ImageMagick installed)
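
For orientation, the desktop actions listed above map onto xdotool and ImageMagick command lines along these lines. The helper names below are hypothetical and are not the ComputerController methods; only the command-line shapes (`xdotool mousemove`, `click`, `type`, `key`, and `import -window root`) are standard tool usage.

```python
from typing import List

# X11 pointer button numbers as used by `xdotool click`.
_BUTTONS = {"left": 1, "middle": 2, "right": 3}


def move_cmd(x: int, y: int) -> List[str]:
    """argv that moves the pointer to absolute screen coordinates."""
    return ["xdotool", "mousemove", str(x), str(y)]


def click_cmd(button: str = "left") -> List[str]:
    """argv that clicks the named mouse button."""
    return ["xdotool", "click", str(_BUTTONS[button])]


def type_cmd(text: str) -> List[str]:
    """argv that types literal text; "--" ends option parsing."""
    return ["xdotool", "type", "--", text]


def key_cmd(key: str) -> List[str]:
    """argv that presses a key or chord; the key name is passed through unchanged."""
    return ["xdotool", "key", key]


def screenshot_cmd(path: str) -> List[str]:
    """argv that captures the whole screen (root window) to a file."""
    return ["import", "-window", "root", path]
```

Each list can be handed to `subprocess.run(cmd, check=True)`; building argv lists rather than shell strings avoids quoting problems with typed text.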

Hermes Gateway plugin now exposes 5 tools:
  node_list, node_status, node_exec, browser_control, computer_control ✓

Co-authored-by: Lisa <lisa@nexlab.net>
Signed-off-by: Lisa <lisa@nexlab.net>
#!/bin/sh
# /etc/init.d/hermes-node-agent
# SysVinit script for Hermes Node Agent
#
# chkconfig: 2345 95 05
# description: Hermes Node Agent reverse-connected WebSocket client
### BEGIN INIT INFO
# Provides: hermes-node-agent
# Required-Start: $network $remote_fs $syslog
# Required-Stop: $network $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Hermes Node Agent
# Description: Reverse-connected WebSocket node agent.
### END INIT INFO
NAME="hermes-node-agent"
DAEMON="/usr/bin/python3"
SCRIPT_DIR="/usr/local/bin"
DAEMON_SCRIPT="${SCRIPT_DIR}/hermes_node_agent.py"
PIDFILE="/var/run/${NAME}.pid"
LOGFILE="/var/log/${NAME}.log"
USER="root"
GROUP="root"
# Check daemon exists
if [ ! -x "$DAEMON" ]; then
    echo "$DAEMON not found or not executable."
    exit 5
fi
if [ ! -f "$DAEMON_SCRIPT" ]; then
    echo "$DAEMON_SCRIPT not found."
    exit 5
fi
# Ensure config exists
if [ ! -f "/etc/hermes-node/config.json" ]; then
    echo "/etc/hermes-node/config.json not found."
    exit 6
fi
. /lib/lsb/init-functions 2>/dev/null || true
start() {
    echo "Starting $NAME..."
    if [ -f "$PIDFILE" ]; then
        PID=$(cat "$PIDFILE")
        if kill -0 "$PID" 2>/dev/null; then
            echo "$NAME is already running (PID $PID)."
            return 0
        else
            # Stale PID file left by a dead process
            rm -f "$PIDFILE"
        fi
    fi
    touch "$LOGFILE"
    chown "$USER:$GROUP" "$LOGFILE" 2>/dev/null || chmod 644 "$LOGFILE"
    # Quote the paths so they survive whitespace; run in the background and record the PID
    "$DAEMON" "$DAEMON_SCRIPT" >> "$LOGFILE" 2>&1 &
    echo $! > "$PIDFILE"
    sleep 1
    PID=$(cat "$PIDFILE")
    if kill -0 "$PID" 2>/dev/null; then
        echo "$NAME started (PID $PID)."
    else
        echo "$NAME failed to start. Check $LOGFILE"
        exit 1
    fi
}
stop() {
    echo "Stopping $NAME..."
    if [ -f "$PIDFILE" ]; then
        PID=$(cat "$PIDFILE")
        if kill -0 "$PID" 2>/dev/null; then
            kill "$PID" 2>/dev/null
            for i in $(seq 1 30); do
                if ! kill -0 "$PID" 2>/dev/null; then
                    break
                fi
                sleep 0.5
            done
            if kill -0 "$PID" 2>/dev/null; then
                echo "Force killing..."
                kill -9 "$PID" 2>/dev/null
                sleep 1
            fi
        fi
        rm -f "$PIDFILE"
        echo "$NAME stopped."
    else
        echo "$NAME is not running."
    fi
    pkill -f "hermes_node_agent.py" 2>/dev/null || true
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        sleep 1
        start
        ;;
    reload|force-reload)
        echo "Reload not supported, restarting..."
        stop
        sleep 1
        start
        ;;
    status)
        RUNNING=0
        if [ -f "$PIDFILE" ]; then
            PID=$(cat "$PIDFILE")
            if kill -0 "$PID" 2>/dev/null; then
                echo "$NAME is running (PID $PID)."
                RUNNING=1
            else
                echo "$NAME is not running (stale PID file)."
                RUNNING=0
            fi
        else
            PID=$(pgrep -f "hermes_node_agent.py" | head -1)
            if [ -n "$PID" ]; then
                echo "$NAME is running (PID $PID) but no PID file."
                RUNNING=1
            else
                echo "$NAME is not running."
                RUNNING=0
            fi
        fi
        # Exit 0 when running, 1 otherwise
        exit $(( 1 - RUNNING ))
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|reload|force-reload|status}"
        exit 1
        ;;
esac
exit 0
[Unit]
Description=Hermes Node Agent
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/usr/bin/python3 /usr/local/bin/hermes-node-agent
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/tmp
[Install]
WantedBy=default.target
// Hermes Extension Background Service Worker
// Provides bidirectional communication between CDP and content scripts
const PORT_NAME = 'hermes_agent_port';
let ports = new Map();
let messageQueue = [];
chrome.runtime.onConnect.addListener((port) => {
  if (port.name === PORT_NAME) {
    const tabId = port.sender?.tab?.id;
    console.log('[Hermes] Port connected from tab:', tabId);
    if (tabId) ports.set(tabId, port);
    port.onMessage.addListener((msg) => {
      console.log('[Hermes] Message from tab', tabId, ':', msg);
      if (msg.type === 'hermes_content_ready') {
        // Content script reported ready (content.js posts this right after connecting)
      }
    });
    port.onDisconnect.addListener(() => {
      console.log('[Hermes] Port disconnected from tab:', tabId);
      ports.delete(tabId);
    });
  }
});
// Message from external (CDP Runtime.evaluate)
chrome.runtime.onMessageExternal.addListener((msg, sender, sendResponse) => {
if (msg.type && msg.type.startsWith('hermes_')) {
handleHermesMessage(msg, sender, sendResponse);
return true; // async response
}
});
function handleHermesMessage(msg, sender, sendResponse) {
  switch (msg.type) {
    case 'hermes_eval_in_page':
      // Execute JS in all tabs
      chrome.tabs.query({}, (tabs) => {
        tabs.forEach(tab => {
          if (ports.has(tab.id)) {
            ports.get(tab.id).postMessage({
              type: 'hermes_exec',
              script: msg.script,
              id: msg.id
            });
          }
        });
        sendResponse({status: 'sent'});
      });
      break;
    case 'hermes_get_info':
      sendResponse({
        status: 'ok',
        extension: 'hermes_browser_agent',
        version: '1.0',
        tabs: Array.from(ports.keys())
      });
      break;
    default:
      sendResponse({status: 'unknown'});
  }
}
// CDP can call chrome.runtime.sendMessage via Runtime.evaluate
// This allows remote commands to trigger extension actions
console.log('[Hermes] Background service worker initialized');
// Hermes Content Script - runs in every page
// Establishes communication channel with injected code
const PORT_NAME = 'hermes_agent_port';
const port = chrome.runtime.connect({name: PORT_NAME});
port.onMessage.addListener((msg) => {
  if (msg.type === 'hermes_exec') {
    // Execute the script in the content script's isolated world
    // (it shares the page DOM but not the page's JavaScript globals)
    try {
      eval(msg.script);
      // The result is not sent back over the port;
      // use CDP Runtime.evaluate when a return value is needed
    } catch (e) {
      console.error('[Hermes] Script execution error:', e);
    }
  }
});
// Signal that content script is loaded
port.postMessage({type: 'hermes_content_ready'});
console.log('[Hermes] Content script loaded');
// Hermes Injected API - exposed to page JavaScript context
// Allows page scripts to communicate with the extension and agent
window.HermesAgent = {
  version: '1.0',
  // Placeholder: logs the request and resolves immediately; the script is not
  // executed here. CDP Runtime.evaluate is the primary channel, this is secondary.
  execute: async function(script) {
    console.log('[Hermes] execute called:', script);
    return {ok: true, note: 'use CDP Runtime.evaluate for results'};
  },
  // Get page info
  getInfo: async function() {
    return {
      url: window.location.href,
      title: document.title,
      domain: window.location.hostname,
      referrer: document.referrer,
      timestamp: Date.now()
    };
  },
  // Helper: wait for selector (polls once per animation frame until timeout, in ms)
  waitForSelector: function(selector, timeout = 5000) {
    return new Promise((resolve, reject) => {
      const start = Date.now();
      const check = () => {
        const el = document.querySelector(selector);
        if (el) {
          resolve(el);
        } else if (Date.now() - start > timeout) {
          reject(new Error('Timeout waiting for: ' + selector));
        } else {
          requestAnimationFrame(check);
        }
      };
      check();
    });
  },
  // Helper: fill form field and fire the events frameworks listen for
  fillForm: function(selector, value) {
    const el = document.querySelector(selector);
    if (el) {
      el.value = value;
      el.dispatchEvent(new Event('input', {bubbles: true}));
      el.dispatchEvent(new Event('change', {bubbles: true}));
      return true;
    }
    return false;
  }
};
console.log('[Hermes] Injected API loaded - window.HermesAgent available');
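// Notify background script; guard with typeof because `chrome` may be
// undefined in the page's JavaScript context
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.sendMessage) {
  chrome.runtime.sendMessage({
    type: 'hermes_injected_ready',
    url: window.location.href
  });
}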
{
  "manifest_version": 3,
  "name": "Hermes Node Agent Extension",
  "version": "1.0",
  "description": "Hermes agent helper - provides CDP communication and JS injection utilities for remote browser automation.",
  "permissions": [
    "storage",
    "scripting",
    "activeTab",
    "tabs",
    "webNavigation"
  ],
  "host_permissions": [
    "<all_urls>"
  ],
  "background": {
    "service_worker": "background.js",
    "type": "module"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["content.js"],
      "run_at": "document_start"
    }
  ],
  "web_accessible_resources": [
    {
      "resources": ["injected.js"],
      "matches": ["<all_urls>"]
    }
  ]
}
#!/bin/bash
# Hermes Node Agent Installer for sissy (laptop)
# Run as: sudo ./install-on-sissy.sh
set -e
echo "=== Hermes Node Agent Installer for sissy ==="
# Must run as root for /etc/hermes-node config and init script
if [ "$EUID" -ne 0 ]; then
echo "ERROR: Run with sudo"
exit 1
fi
# Configuration
USER_NAME="nextime"
USER_HOME="/home/$USER_NAME"
NODE_AGENT_DIR="$USER_HOME/hermes-node-protocol/node-agent"
CONFIG_PATH="/etc/hermes-node/config.json"
NODE_NAME="sissy"
GATEWAY_HOST="lisa" # VPS where Hermes gateway runs
GATEWAY_PORT="8765"
TOKEN="dbed0834bfc502f3017add9be902c9d321c9cd62f09732a55ee2f8b2b633622f"
SECEXC_PATH="/home/openclaw/.openclaw/skills/sexec/sexec.sh"
echo "Installing for user: $USER_NAME (home: $USER_HOME)"
echo "Target node: $NODE_NAME"
echo "Gateway: $GATEWAY_HOST:$GATEWAY_PORT"
echo ""
# Step 1: Create node config directory
echo "[1/6] Creating /etc/hermes-node/config.json..."
mkdir -p /etc/hermes-node
cat > "$CONFIG_PATH" <<EOF
{
  "gateway_url": "ws://$GATEWAY_HOST:$GATEWAY_PORT",
  "node_name": "$NODE_NAME",
  "token": "$TOKEN",
  "sexec_path": "$SECEXC_PATH",
  "reconnect_interval": 5,
  "heartbeat_interval": 30
}
EOF
# The agent runs as $USER_NAME (see --chuid in the init script), so that user must own the config
chown "$USER_NAME" "$CONFIG_PATH"
chmod 600 "$CONFIG_PATH"
echo "✅ Config created:"
# Mask the token so the secret is not echoed to the terminal
sed 's/"token": *"[^"]*"/"token": "********"/' "$CONFIG_PATH"
# Step 2: Ensure node agent code exists
echo ""
echo "[2/6] Checking node agent code..."
if [ ! -d "$NODE_AGENT_DIR" ]; then
echo "ERROR: Node agent not found at $NODE_AGENT_DIR"
echo "Please copy hermes-node-protocol/ to $USER_HOME first"
echo ""
echo "On lisa (or wherever the code lives):"
echo " rsync -av ~/hermes-node-protocol/ $USER_NAME@sissy:$USER_HOME/hermes-node-protocol/"
exit 1
fi
cd "$NODE_AGENT_DIR"
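# Step 3: Setup venv and deps
echo ""
echo "[3/6] Setting up Python virtual environment..."
if [ ! -d "venv" ]; then
    sudo -H -u "$USER_NAME" python3 -m venv venv
fi
# Install as $USER_NAME so the venv and the Playwright browser cache
# belong to the account the agent runs under, not to root
sudo -H -u "$USER_NAME" venv/bin/pip install --upgrade pip 2>&1 | tail -1
sudo -H -u "$USER_NAME" venv/bin/pip install -r requirements.txt 2>&1 | tail -1
sudo -H -u "$USER_NAME" venv/bin/playwright install chromium 2>&1 | tail -1
echo "✅ Dependencies installed"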
# Step 4: Create SysV init script
echo ""
echo "[4/6] Installing SysV init script..."
cat > /etc/init.d/hermes-node <<INITSCRIPT
#!/bin/sh
### BEGIN INIT INFO
# Provides: hermes-node
# Required-Start: \$network \$local_fs \$remote_fs
# Required-Stop: \$network \$local_fs \$remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Hermes Node Agent
# Description: Reverse-connection node agent for Hermes
### END INIT INFO
DAEMON="$NODE_AGENT_DIR/venv/bin/python3"
DAEMON_OPTS="$NODE_AGENT_DIR/hermes_node_agent.py"
PIDFILE="/var/run/hermes-node.pid"
LOGFILE="/var/log/hermes-node-agent.log"
USER="$USER_NAME"
. /lib/lsb/init-functions
start() {
    log_daemon_msg "Starting Hermes Node Agent"
    # --background detaches stdio unless --no-close is given, so redirect
    # output to the log file instead of piping it through tee
    start-stop-daemon --start --quiet --background --no-close \\
        --pidfile "\$PIDFILE" \\
        --make-pidfile \\
        --chuid "\$USER" \\
        --exec "\$DAEMON" -- \$DAEMON_OPTS \\
        >> "\$LOGFILE" 2>&1
    log_end_msg \$?
}
stop() {
    log_daemon_msg "Stopping Hermes Node Agent"
    start-stop-daemon --stop --quiet --pidfile "\$PIDFILE" --retry 10
    log_end_msg \$?
}
status() {
if [ -f "\$PIDFILE" ]; then
PID=\$(cat "\$PIDFILE")
if kill -0 "\$PID" 2>/dev/null; then
log_success_msg "Hermes Node Agent is running (PID \$PID)"
return 0
else
log_failure_msg "Hermes Node Agent PID file exists but process not running"
return 1
fi
else
log_failure_msg "Hermes Node Agent is not running"
return 3
fi
}
case "\$1" in
start)
start
;;
stop)
stop
;;
restart|force-reload)
stop
sleep 2
start
;;
status)
status
;;
*)
echo "Usage: \$0 {start|stop|restart|status}"
exit 1
;;
esac
exit 0
INITSCRIPT
chmod +x /etc/init.d/hermes-node
# Add to default runlevels (Debian/Ubuntu)
if command -v update-rc.d &>/dev/null; then
update-rc.d hermes-node defaults
elif command -v rc-update &>/dev/null; then
rc-update add hermes-node default
fi
echo "✅ Init script installed: /etc/init.d/hermes-node"
# Step 5: Create log file with proper ownership
echo ""
echo "[5/6] Setting up log file..."
touch /var/log/hermes-node-agent.log
chown "$USER_NAME:$USER_NAME" /var/log/hermes-node-agent.log 2>/dev/null || true
chmod 644 /var/log/hermes-node-agent.log
echo "✅ Log file at /var/log/hermes-node-agent.log"
# Step 6: Start the service
echo ""
echo "[6/6] Starting Hermes Node Agent..."
/etc/init.d/hermes-node start
sleep 3
# Verify
echo ""
echo "═══════════════════════════════════════════════"
echo "Installation complete!"
echo "═══════════════════════════════════════════════"
echo ""
echo "Node name: $NODE_NAME"
echo "Config: $CONFIG_PATH"
echo "Service: /etc/init.d/hermes-node"
echo "Logs: sudo tail -f /var/log/hermes-node-agent.log"
echo ""
echo "Check status: sudo /etc/init.d/hermes-node status"
echo ""
echo "Wait 5-10 seconds for connection to gateway ($GATEWAY_HOST)..."
echo "Then test from $GATEWAY_HOST:"
echo ""
echo " curl http://$GATEWAY_HOST:8766/nodes/$NODE_NAME/status"
echo " curl -X POST http://$GATEWAY_HOST:8766/nodes/$NODE_NAME/browser \\"
echo " -H 'Content-Type: application/json' \\"
echo " -d '{\"action\": \"list_pages\"}'"
echo ""
echo "Browser control is ready!"
#!/bin/bash
# Hermes Node Agent Installation Script
# Installs the node agent on a remote machine
set -e
echo "=== Hermes Node Agent Installer ==="
echo ""
# Check if running as root
if [ "$EUID" -eq 0 ]; then
echo "ERROR: Do not run as root. Run as the user who will run the agent."
exit 1
fi
# Check for Python 3
if ! command -v python3 &> /dev/null; then
echo "ERROR: Python 3 is required but not found."
exit 1
fi
# Check for pip
if ! command -v pip3 &> /dev/null; then
echo "ERROR: pip3 is required but not found."
echo "Install with: sudo apt install python3-pip"
exit 1
fi
# Install websockets library (the script refuses to run as root, so use sudo)
echo "[1/6] Installing Python dependencies..."
sudo apt-get update
sudo apt-get install -y python3-websockets
# Create config directory
echo "[2/6] Creating config directory..."
sudo mkdir -p /etc/hermes-node
sudo chown $USER:$USER /etc/hermes-node
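# Copy agent script
echo "[3/6] Installing agent script..."
sudo cp hermes_node_agent.py /usr/local/bin/hermes-node-agent
sudo chmod +x /usr/local/bin/hermes-node-agent
# The SysV init script looks for /usr/local/bin/hermes_node_agent.py, so link that name too
sudo ln -sf /usr/local/bin/hermes-node-agent /usr/local/bin/hermes_node_agent.py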
# Create example config if it doesn't exist
if [ ! -f /etc/hermes-node/config.json ]; then
echo "[4/6] Creating example config..."
cat > /etc/hermes-node/config.json <<EOF
{
"gateway_url": "ws://192.168.42.115:8765",
"node_name": "$(hostname)",
"token": "CHANGE-ME-$(openssl rand -hex 16)",
"sexec_path": "$HOME/.openclaw/skills/sexec/sexec.sh",
"reconnect_interval": 5,
"heartbeat_interval": 30
}
EOF
echo " ⚠️ Config created at /etc/hermes-node/config.json"
echo " ⚠️ EDIT THIS FILE: Set gateway_url, node_name, and token"
else
echo "[4/6] Config already exists, skipping..."
fi
# Install SysV init service
echo "[5/6] Installing SysV init service..."
sudo cp hermes-node-agent.init.d /etc/init.d/hermes-node-agent
sudo chmod +x /etc/init.d/hermes-node-agent
sudo update-rc.d hermes-node-agent defaults 2>/dev/null || true
# Enable but don't start (user needs to configure first)
echo "[6/6] Service configured..."
echo ""
echo "✅ Installation complete!"
echo ""
echo "Next steps:"
echo " 1. Edit /etc/hermes-node/config.json with your gateway URL and token"
echo " 2. Ensure sexec.sh is installed at the configured path"
echo " 3. Start the agent: sudo /etc/init.d/hermes-node-agent start"
echo " 4. Check status: /etc/init.d/hermes-node-agent status"
echo " 5. View logs: tail -f /var/log/hermes-node-agent.log"
echo ""
#!/usr/bin/env python3
"""Test browser controller"""
import asyncio
from browser_controller import BrowserController


async def test():
    controller = BrowserController()
    try:
        print("Initializing Playwright...")
        await controller.initialize()
        print("✅ Playwright initialized")

        print("\nLaunching browser (headless)...")
        result = await controller.launch({"headless": True})
        print(f"✅ {result}")

        print("\nCreating context...")
        result = await controller.create_context({"name": "test_ctx"})
        print(f"✅ {result}")
        page_id = result.get("page_id")

        print(f"\nNavigating to example.com (page_id={page_id})...")
        result = await controller.navigate(page_id, "https://example.com")
        print(f"✅ {result}")

        print("\nGetting title...")
        result = await controller.get_title(page_id)
        print(f"✅ {result}")

        print("\nTaking screenshot...")
        result = await controller.screenshot(page_id, full_page=True)
        print(f"✅ Screenshot captured ({len(result.get('screenshot', ''))} chars base64)")

        print("\nClosing browser...")
        await controller.close()
        print("✅ Browser closed")

        print("\n🎉 All tests passed!")
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    asyncio.run(test())