Commit 29a308f6 authored by Lisa

feat(node-agent): add computer_control desktop automation and capability-based tool discovery

Major integration of desktop automation and structured capability reporting for Hermes Node Gateway.
This enables a unified node execution framework with optional tool support that is
automatically detected and advertised.

Features:
- ComputerController class (Linux/X11) using xdotool and import (ImageMagick)
  - Screenshot (full screen, optional path or base64-return)
  - Mouse: move(x,y), click(button), position()
  - Keyboard: type_text(text), key_press(key like 'Return', 'Ctrl+c')
  - Window: active_window() returns focused window info
- Capability-based registration with the gateway
  - Agent sends its tool list, e.g. ["exec", "browser_control", "computer_control"], on connect
  - Gateway filters tools based on the declared capabilities
  - Missing dependencies are handled: the agent checks for xdotool/import and for the browser extension
- Universal installer updated:
  - Prompts: enable browser control? enable computer control?
  - Optional quick-edit of per-node sexec permissions (comma-separated allow/deny/ask patterns)
  - Writes config.json with enable_browser, enable_computer_control and permissions JSON
- Fixes:
  - The agent formerly advertised the capability key 'browser', but the gateway expects 'browser_control'; the two are now aligned
  - Installers now embed the agent (compressed, base64-encoded) in the shell script, so no external files are needed
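
The capability detection described above could look roughly like the sketch below. The function name `detect_capabilities` and the `has_display` parameter are hypothetical; only the config keys `enable_browser` and `enable_computer_control` and the xdotool/import check come from this commit message. The browser extension presence check is omitted because its mechanism is not described here.

```python
import shutil
from typing import Callable, List, Optional


def detect_capabilities(
    config: dict,
    which: Callable[[str], Optional[str]] = shutil.which,
    has_display: bool = True,
) -> List[str]:
    """Return the capability list a node would advertise on connect.

    Hypothetical helper: the real agent may structure this differently.
    """
    caps = ["exec"]  # assumption: shell execution is always advertised
    if config.get("enable_browser"):
        caps.append("browser_control")
    # computer_control needs an X11 display plus both external binaries
    tools_present = all(which(tool) for tool in ("xdotool", "import"))
    if config.get("enable_computer_control") and has_display and tools_present:
        caps.append("computer_control")
    return caps
```

Passing `which` as a parameter keeps the check testable on machines without xdotool installed; a caller would typically pass `has_display=bool(os.environ.get("DISPLAY"))`.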

Files:
- node-agent/: new ComputerController class; enhanced hermes_node_agent.py
- ~/.hermes/plugins/hermes-node-gateway/: added COMPUTER_CONTROL_SCHEMA + handlers + routing
- install_hermes_node_universal.sh (merged in repo dist/ location)

Node Agent endpoint type: "computer_control" commands (gateway forwards to node agent)
Gateway: registers the 'computer_control' tool with its schema, forwards commands, and handles the cc_result responses
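
As an illustration of the gateway-side filtering, here is a minimal sketch. The mapping from tool to required capability is an assumption (in particular that `node_exec` requires `exec` and that `node_list`/`node_status` are always available); the tool and capability names themselves are taken from this commit message.

```python
from typing import Iterable, List, Optional

# Assumed mapping: tool name -> capability a node must declare (None = always available)
REQUIRED_CAPABILITY = {
    "node_list": None,
    "node_status": None,
    "node_exec": "exec",
    "browser_control": "browser_control",
    "computer_control": "computer_control",
}


def tools_for_node(declared: Iterable[str]) -> List[str]:
    """Return the tools usable against a node that declared these capabilities."""
    declared_set = set(declared)
    usable = []
    for tool, needed in REQUIRED_CAPABILITY.items():
        required: Optional[str] = needed
        if required is None or required in declared_set:
            usable.append(tool)
    return usable
```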

Deployed nodes can now:
  - Execute shell commands via sexec (preserves existing allow/ask/deny)
  - Control browsers (if extension installed)
  - Control desktop (if X11 + xdotool + ImageMagick installed)
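
For orientation, the desktop actions listed in this commit map onto xdotool and ImageMagick invocations roughly as follows. This is a sketch only: `build_argv` is a hypothetical helper, not the ComputerController API, and it only builds argument lists without running anything.

```python
from typing import List


def build_argv(action: str, **params) -> List[str]:
    """Translate a desktop action into an xdotool / ImageMagick command line."""
    if action == "move":
        return ["xdotool", "mousemove", str(params["x"]), str(params["y"])]
    if action == "click":
        # xdotool numbers buttons: 1 = left, 2 = middle, 3 = right
        return ["xdotool", "click", str(params.get("button", 1))]
    if action == "position":
        return ["xdotool", "getmouselocation", "--shell"]
    if action == "type_text":
        # "--" stops option parsing so text starting with "-" is typed literally
        return ["xdotool", "type", "--", params["text"]]
    if action == "key_press":
        return ["xdotool", "key", params["key"]]
    if action == "active_window":
        return ["xdotool", "getactivewindow", "getwindowname"]
    if action == "screenshot":
        # ImageMagick's import captures the X11 root window (the full screen)
        return ["import", "-window", "root", params["path"]]
    raise ValueError(f"unsupported action: {action}")
```

On a node with X11, a caller could run the result with `subprocess.run(build_argv("move", x=10, y=20), check=True)`.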

Hermes Gateway plugin now exposes 5 tools:
  node_list, node_status, node_exec, browser_control, computer_control ✓
Co-authored-by: Lisa <lisa@nexlab.net>
Signed-off-by: Lisa <lisa@nexlab.net>
#!/bin/sh
# /etc/init.d/hermes-node-agent
# SysVinit script for Hermes Node Agent
#
# chkconfig: 2345 95 05
# description: Hermes Node Agent reverse-connected WebSocket client
### BEGIN INIT INFO
# Provides: hermes-node-agent
# Required-Start: $network $remote_fs $syslog
# Required-Stop: $network $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Hermes Node Agent
# Description: Reverse-connected WebSocket node agent.
### END INIT INFO
NAME="hermes-node-agent"
DAEMON="/usr/bin/python3"
SCRIPT_DIR="/usr/local/bin"
DAEMON_SCRIPT="${SCRIPT_DIR}/hermes_node_agent.py"
PIDFILE="/var/run/${NAME}.pid"
LOGFILE="/var/log/${NAME}.log"
USER="root"
GROUP="root"
# Check daemon exists
if [ ! -x "$DAEMON" ]; then
echo "$DAEMON not found or not executable."
exit 5
fi
if [ ! -f "$DAEMON_SCRIPT" ]; then
echo "$DAEMON_SCRIPT not found."
exit 5
fi
# Ensure config exists
if [ ! -f "/etc/hermes-node/config.json" ]; then
echo "/etc/hermes-node/config.json not found."
exit 6
fi
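# Source LSB helpers only if present; sourcing a missing file aborts a POSIX sh
if [ -r /lib/lsb/init-functions ]; then . /lib/lsb/init-functions; fi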
start() {
echo "Starting $NAME..."
if [ -f "$PIDFILE" ]; then
PID=$(cat "$PIDFILE")
if kill -0 "$PID" 2>/dev/null; then
echo "$NAME is already running (PID $PID)."
return 0
else
rm -f "$PIDFILE"
fi
fi
touch "$LOGFILE"
chown "$USER:$GROUP" "$LOGFILE" 2>/dev/null || chmod 644 "$LOGFILE"
"$DAEMON" "$DAEMON_SCRIPT" >> "$LOGFILE" 2>&1 &
echo $! > "$PIDFILE"
sleep 1
if kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
echo "$NAME started (PID $(cat "$PIDFILE"))."
else
echo "$NAME failed to start. Check $LOGFILE"
exit 1
fi
}
stop() {
echo "Stopping $NAME..."
if [ -f "$PIDFILE" ]; then
PID=$(cat "$PIDFILE")
if kill -0 "$PID" 2>/dev/null; then
kill "$PID" 2>/dev/null
for i in $(seq 1 30); do
if ! kill -0 "$PID" 2>/dev/null; then
break
fi
sleep 0.5
done
if kill -0 "$PID" 2>/dev/null; then
echo "Force killing..."
kill -9 "$PID" 2>/dev/null
sleep 1
fi
fi
rm -f "$PIDFILE"
echo "$NAME stopped."
else
echo "$NAME is not running."
fi
pkill -f "hermes_node_agent.py" 2>/dev/null || true
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
sleep 1
start
;;
reload|force-reload)
echo "Reload not supported, restarting..."
stop
sleep 1
start
;;
status)
RUNNING=0
if [ -f "$PIDFILE" ]; then
PID=$(cat "$PIDFILE")
if kill -0 "$PID" 2>/dev/null; then
echo "$NAME is running (PID $PID)."
RUNNING=1
else
echo "$NAME is not running (stale PID file)."
RUNNING=0
fi
else
PID=$(pgrep -f "hermes_node_agent.py" | head -1)
if [ -n "$PID" ]; then
echo "$NAME is running (PID $PID) but no PID file."
RUNNING=1
else
echo "$NAME is not running."
RUNNING=0
fi
fi
exit $(( 1 - RUNNING ))
;;
*)
echo "Usage: $0 {start|stop|restart|reload|force-reload|status}"
exit 1
;;
esac
exit 0
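
The PID-file logic used by `start`, `stop`, and `status` above can be restated in Python as a sketch. The function names are hypothetical. One difference is noted in the comments: the shell's `kill -0` fails on a permission error, whereas this sketch treats that case as "running".

```python
import os
from typing import Optional


def read_pidfile(path: str) -> Optional[int]:
    """Return the PID stored in path, or None if the file is missing or garbled."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def pid_is_running(pid: int) -> bool:
    """Equivalent of `kill -0 PID`: signal 0 checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user (shell kill -0 would fail here)
        return True
    return True


def service_state(pidfile: str) -> str:
    """Classify the service as 'running', 'stale' (dead PID on file), or 'stopped'."""
    pid = read_pidfile(pidfile)
    if pid is None:
        return "stopped"
    return "running" if pid_is_running(pid) else "stale"
```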
[Unit]
Description=Hermes Node Agent
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/usr/bin/python3 /usr/local/bin/hermes-node-agent
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/tmp
[Install]
WantedBy=default.target
// Hermes Extension Background Service Worker
// Provides bidirectional communication between CDP and content scripts
const PORT_NAME = 'hermes_agent_port';
let ports = new Map();
let messageQueue = [];
chrome.runtime.onConnect.addListener((port) => {
if (port.name === PORT_NAME) {
const tabId = port.sender?.tab?.id;
console.log('[Hermes] Port connected from tab:', tabId);
if (tabId !== undefined) ports.set(tabId, port);
port.onMessage.addListener((msg) => {
console.log('[Hermes] Message from tab', tabId, ':', msg);
if (msg.type === 'hermes_injected_ready') {
// Inject script reported ready
}
});
port.onDisconnect.addListener(() => {
console.log('[Hermes] Port disconnected from tab:', tabId);
ports.delete(tabId);
});
}
});
// Message from external (CDP Runtime.evaluate)
chrome.runtime.onMessageExternal.addListener((msg, sender, sendResponse) => {
if (msg.type && msg.type.startsWith('hermes_')) {
handleHermesMessage(msg, sender, sendResponse);
return true; // async response
}
});
function handleHermesMessage(msg, sender, sendResponse) {
switch (msg.type) {
case 'hermes_eval_in_page':
// Execute JS in all tabs
chrome.tabs.query({}, (tabs) => {
tabs.forEach(tab => {
if (ports.has(tab.id)) {
ports.get(tab.id).postMessage({
type: 'hermes_exec',
script: msg.script,
id: msg.id
});
}
});
sendResponse({status: 'sent'});
});
break;
case 'hermes_get_info':
sendResponse({
status: 'ok',
extension: 'hermes_browser_agent',
version: '1.0',
tabs: Array.from(ports.keys())
});
break;
default:
sendResponse({status: 'unknown'});
}
}
// CDP can call chrome.runtime.sendMessage via Runtime.evaluate
// This allows remote commands to trigger extension actions
console.log('[Hermes] Background service worker initialized');
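
To make the CDP path mentioned in the comments above concrete, the sketch below builds the JSON frame a client would send over a DevTools WebSocket. `Runtime.evaluate` and its `expression`, `awaitPromise`, and `returnByValue` parameters are part of the Chrome DevTools Protocol; the helper names and the extension ID argument are hypothetical, and whether the message reaches `onMessageExternal` depends on the calling context being allowed to message the extension, which this diff does not show.

```python
import json


def runtime_evaluate_message(msg_id: int, expression: str) -> str:
    """Serialize a CDP Runtime.evaluate request for the given JS expression."""
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {
            "expression": expression,
            "awaitPromise": True,   # wait for the promise returned by the expression
            "returnByValue": True,  # return the resolved value as JSON
        },
    })


def hermes_get_info_expression(extension_id: str) -> str:
    """Build a JS expression that sends 'hermes_get_info' to the extension.

    extension_id is a placeholder: the real ID depends on how the extension is installed.
    """
    payload = json.dumps({"type": "hermes_get_info"})
    target = json.dumps(extension_id)
    return (
        "new Promise(resolve => "
        f"chrome.runtime.sendMessage({target}, {payload}, resolve))"
    )
```

A client would send `runtime_evaluate_message(1, hermes_get_info_expression(ext_id))` over the page's DevTools WebSocket and read the reply with the matching `id`.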
// Hermes Content Script - runs in every page
// Establishes communication channel with injected code
const PORT_NAME = 'hermes_agent_port';
const port = chrome.runtime.connect({name: PORT_NAME});
port.onMessage.addListener((msg) => {
if (msg.type === 'hermes_exec') {
// Evaluate the script in the content script's isolated world:
// it shares the page's DOM but not the page's JavaScript globals
try {
eval(msg.script);
// The result is not sent back over the port;
// use CDP Runtime.evaluate when a return value is needed
} catch (e) {
console.error('[Hermes] Script execution error:', e);
}
}
});
// Signal that content script is loaded
port.postMessage({type: 'hermes_content_ready'});
console.log('[Hermes] Content script loaded');
// Hermes Injected API - exposed to page JavaScript context
// Allows page scripts to communicate with the extension and agent
window.HermesAgent = {
version: '1.0',
// Placeholder: logs the request and resolves immediately; nothing is executed here
execute: async function(script) {
return new Promise((resolve) => {
const msg = {type: 'hermes_exec', script, id: Date.now()};
// CDP Runtime.evaluate is the primary channel, this is secondary
console.log('[Hermes] execute called:', script);
resolve({ok: true, note: 'use CDP Runtime.evaluate for results'});
});
},
// Get page info
getInfo: async function() {
return {
url: window.location.href,
title: document.title,
domain: window.location.hostname,
referrer: document.referrer,
timestamp: Date.now()
};
},
// Helper: wait for selector
waitForSelector: function(selector, timeout = 5000) {
return new Promise((resolve, reject) => {
const start = Date.now();
const check = () => {
const el = document.querySelector(selector);
if (el) {
resolve(el);
} else if (Date.now() - start > timeout) {
reject(new Error('Timeout waiting for: ' + selector));
} else {
requestAnimationFrame(check);
}
};
check();
});
},
// Helper: fill form
fillForm: function(selector, value) {
const el = document.querySelector(selector);
if (el) {
el.value = value;
el.dispatchEvent(new Event('input', {bubbles: true}));
el.dispatchEvent(new Event('change', {bubbles: true}));
return true;
}
return false;
}
};
console.log('[Hermes] Injected API loaded - window.HermesAgent available');
// Notify background script
if (typeof chrome !== 'undefined' && chrome.runtime) {
chrome.runtime.sendMessage({
type: 'hermes_injected_ready',
url: window.location.href
});
}
{
"manifest_version": 3,
"name": "Hermes Node Agent Extension",
"version": "1.0",
"description": "Hermes agent helper - provides CDP communication and JS injection utilities for remote browser automation.",
"permissions": [
"storage",
"scripting",
"activeTab",
"tabs",
"webNavigation"
],
"host_permissions": [
"<all_urls>"
],
"background": {
"service_worker": "background.js",
"type": "module"
},
"content_scripts": [
{
"matches": [
"<all_urls>"
],
"js": [
"content.js"
],
"run_at": "document_start"
}
],
"web_accessible_resources": [
{
"resources": [
"injected.js"
],
"matches": [
"<all_urls>"
]
}
]
}
\ No newline at end of file
#!/bin/bash
# Hermes Node Agent Installation Script
# Installs the node agent on a remote machine
set -e
echo "=== Hermes Node Agent Installer ==="
echo ""
# Check if running as root
if [ "$EUID" -eq 0 ]; then
echo "ERROR: Do not run as root. Run as the user who will run the agent."
exit 1
fi
# Check for Python 3
if ! command -v python3 &> /dev/null; then
echo "ERROR: Python 3 is required but not found."
exit 1
fi
# Check for pip
if ! command -v pip3 &> /dev/null; then
echo "ERROR: pip3 is required but not found."
echo "Install with: sudo apt install python3-pip"
exit 1
fi
# Install the websockets library (this script refuses to run as root, so use sudo)
echo "[1/6] Installing Python dependencies..."
sudo apt-get update
sudo apt-get install -y python3-websockets
# Create config directory
echo "[2/6] Creating config directory..."
sudo mkdir -p /etc/hermes-node
sudo chown "$USER:$USER" /etc/hermes-node
# Copy agent script
echo "[3/6] Installing agent script..."
sudo cp hermes_node_agent.py /usr/local/bin/hermes-node-agent
sudo chmod +x /usr/local/bin/hermes-node-agent
# Create example config if it doesn't exist
if [ ! -f /etc/hermes-node/config.json ]; then
echo "[4/6] Creating example config..."
cat > /etc/hermes-node/config.json <<EOF
{
"gateway_url": "ws://192.168.42.115:8765",
"node_name": "$(hostname)",
"token": "CHANGE-ME-$(openssl rand -hex 16)",
"sexec_path": "$HOME/.openclaw/skills/sexec/sexec.sh",
"reconnect_interval": 5,
"heartbeat_interval": 30
}
EOF
echo " ⚠️ Config created at /etc/hermes-node/config.json"
echo " ⚠️ EDIT THIS FILE: Set gateway_url, node_name, and token"
else
echo "[4/6] Config already exists, skipping..."
fi
# Install SysV init service
echo "[5/6] Installing SysV init service..."
sudo cp hermes-node-agent.init.d /etc/init.d/hermes-node-agent
sudo chmod +x /etc/init.d/hermes-node-agent
sudo update-rc.d hermes-node-agent defaults 2>/dev/null || true
# Enable but don't start (user needs to configure first)
echo "[6/6] Service configured..."
echo ""
echo "✅ Installation complete!"
echo ""
echo "Next steps:"
echo " 1. Edit /etc/hermes-node/config.json with your gateway URL and token"
echo " 2. Ensure sexec.sh is installed at the configured path"
echo " 3. Start the agent: sudo /etc/init.d/hermes-node-agent start"
echo " 4. Check status: /etc/init.d/hermes-node-agent status"
echo " 5. View logs: tail -f /var/log/hermes-node-agent.log"
echo ""
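
Because the installer writes a placeholder token and asks the user to edit the file, a pre-start sanity check may be useful. The sketch below is an assumption about what such a check could look like; `config_problems` is a hypothetical helper, and treating every key written by the installer as required is also an assumption.

```python
from typing import List

# Keys the installer above writes into /etc/hermes-node/config.json
REQUIRED_KEYS = (
    "gateway_url",
    "node_name",
    "token",
    "sexec_path",
    "reconnect_interval",
    "heartbeat_interval",
)


def config_problems(config: dict) -> List[str]:
    """Return human-readable problems found in a parsed config.json."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    url = str(config.get("gateway_url", ""))
    if url and not url.startswith(("ws://", "wss://")):
        problems.append("gateway_url should start with ws:// or wss://")
    if str(config.get("token", "")).startswith("CHANGE-ME"):
        problems.append("token still has the installer's CHANGE-ME placeholder")
    return problems
```

A caller would load the file with `json.load` and refuse to start, or at least warn, when the returned list is non-empty.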
#!/usr/bin/env python3
"""Test browser controller"""
import asyncio

from browser_controller import BrowserController


async def test():
    controller = BrowserController()
    try:
        print("Initializing Playwright...")
        await controller.initialize()
        print("✅ Playwright initialized")

        print("\nLaunching browser (headless)...")
        result = await controller.launch({"headless": True})
        print(f"✅ {result}")

        print("\nCreating context...")
        result = await controller.create_context({"name": "test_ctx"})
        print(f"✅ {result}")
        page_id = result.get("page_id")

        print(f"\nNavigating to example.com (page_id={page_id})...")
        result = await controller.navigate(page_id, "https://example.com")
        print(f"✅ {result}")

        print("\nGetting title...")
        result = await controller.get_title(page_id)
        print(f"✅ {result}")

        print("\nTaking screenshot...")
        result = await controller.screenshot(page_id, full_page=True)
        print(f"✅ Screenshot captured ({len(result.get('screenshot', ''))} chars base64)")

        print("\nClosing browser...")
        await controller.close()
        print("✅ Browser closed")
        print("\n🎉 All tests passed!")
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    asyncio.run(test())