This is the Android sibling of the Linux `hermes-node-agent`: an outbound-only node that connects to the Hermes Node Gateway over WSS, registers Android-safe capabilities, and executes supported actions without requiring inbound firewall rules, SSH, or rooted devices.
The goal is **functional parity where Android permits it**, not a toy “status-only” agent.
The gateway should not need a new “phone_control” tool just because the node is Android. Where the existing capability names already describe the intent, Android should reuse them and implement Android-native semantics behind the same wire contract.
## Current status
Early Android/Kotlin scaffold with protocol/capability classes. The project already has a GitLab remote:
```text
git@git.nexlab.net:lisa/hermes-node-android.git
```
## Capability naming decision
Use existing gateway capability names where practical:
- `computer_control` = interactive device control.
  - On Linux: mouse, keyboard, X11 desktop actions.
  - On Android: touch/input abstraction, launch apps, open intents, press Back/Home/Recents, type text, tap/swipe, optionally accessibility-backed UI actions.
- `desktop_observe` = read-only observation of screen and window state.
  - On Linux: active window, cursor, screen geometry, screenshots.
  - On Android: foreground app/activity where available, display metrics, orientation, screen state, screenshots where permission/API allows.
- `audio_control` = device/media audio operations.
  - On Android: media session controls, volume, audio route/status, playback of provided audio, mic capture only with permission and visible UX.
- `camera_control` = camera read/write paths where supported.
  - On Android: list cameras, capture frame/video with CameraX/Camera2 and explicit permission/visible UX. Virtual camera injection is not an Android baseline feature.
- `browser_control` = browser automation if implemented later via Chrome DevTools/WebView/intent flows. For the first pass, launching a browser or opening a URL can live under `computer_control` actions.
- `exec` = Android shell execution if available, disabled by default and honest about non-root limitations.

This avoids gateway-side churn while still letting the gateway inspect `capability_info.platform == "android"` when it needs platform-specific hints.
## Design goals
- **Protocol-compatible** with the existing Hermes Node Gateway JSON/WebSocket protocol.
- **Outbound-only** connection from Android to gateway: `wss://gateway:8765`.
- **No root requirement** for baseline Android operation.
- **Capability-first architecture** so Android can expose what is safe/available:
  - `android_status` / device info
  - notifications bridge, if explicitly enabled
  - media/audio controls where Android APIs permit it
  - camera capture only with explicit permission and visible UX
  - file operations limited to app-accessible storage / SAF grants
  - exec only for non-root shell where available, and disabled by default
- **User-owned configuration** inside app storage, never hardcoded tokens.
- **Visible foreground service** for persistent connectivity, because Android kills hidden background agents.
- **Functional parity with Linux capabilities where Android permits it.**
- **No hidden capture or spyware behavior.** If Android requires a visible permission prompt or foreground notification, we respect that.
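
The configuration goals above (outbound `wss://` only, user-supplied token, exec off by default) can be checked before the agent ever connects. The following is a minimal sketch, not the project's actual config schema: `NodeConfig`, its field names, and `validate` are hypothetical names introduced for illustration.

```kotlin
import java.net.URI

// Hypothetical config holder; field names are illustrative assumptions.
data class NodeConfig(
    val gatewayUrl: String,
    val nodeName: String,
    val token: String,
    val enableExec: Boolean = false, // exec stays off unless the user opts in
)

// Returns human-readable problems; an empty list means the config is usable.
fun validate(config: NodeConfig): List<String> {
    val problems = mutableListOf<String>()
    // URI() throws on malformed input, so treat a parse failure as "no URI".
    val uri = runCatching { URI(config.gatewayUrl) }.getOrNull()
    if (uri == null || uri.scheme?.lowercase() != "wss") {
        problems += "gatewayUrl must be a wss:// URL"
    } else if (uri.host.isNullOrBlank()) {
        problems += "gatewayUrl must include a host"
    }
    if (config.nodeName.isBlank()) problems += "nodeName must not be blank"
    if (config.token.isBlank()) problems += "token must be supplied by the user"
    return problems
}
```

Returning a list of problems rather than throwing lets a setup screen show every issue at once.
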
## Non-goals for the first milestone
- Root-only device control.
- Silent spyware-style capture.
- Full Linux `exec` parity.
- Bypassing Android permission prompts or background limits.

Android is a different security model, not just Linux with a touchscreen. We'll keep this boring and explicit rather than building a brittle hack pile.
## Proposed architecture
Android capabilities must map to Android's permission and lifecycle model. Feature-for-feature Linux parity is not the goal; protocol compatibility with honest capability advertisement is.

The Android agent should expose **the same gateway capability names** as Linux wherever the gateway semantics are already good enough. The implementation behind those names is Android-native.
## First implementation targets
### `computer_control`
Android-native actions:
- `launch_app` by package name.
- `open_url` via browser intent.
- `open_intent` for explicit/implicit intents.
- `key_press` for Android navigation keys where permitted: Back, Home, Recents, Enter, Volume Up/Down.
- `type_text` through accessibility/IME path where available.
- `tap`, `swipe`, `long_press` through accessibility gesture dispatch where enabled.
- `get_active_window` mapped to foreground app/activity when available.
### `audio_control`
- `play_audio`: play gateway-provided audio through MediaPlayer/ExoPlayer.
- `capture_input`: microphone recording with runtime permission and foreground UX.
- `capture_output`: return unsupported unless Android API/app role permits it.
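
One way to keep `computer_control` handling explicit is to parse the wire-level action name and params into typed values before touching any Android API. The sketch below covers only a few of the listed actions. The `package` and `url` param names come from the action examples in this document; the `key`, `x`, and `y` param names, the key spellings, and all type names are assumptions made for illustration.

```kotlin
// Hypothetical typed model for a subset of `computer_control` actions.
sealed interface ControlAction {
    data class LaunchApp(val packageName: String) : ControlAction
    data class OpenUrl(val url: String) : ControlAction
    data class KeyPress(val key: String) : ControlAction
    data class Tap(val x: Int, val y: Int) : ControlAction
}

sealed interface ParseResult {
    data class Ok(val action: ControlAction) : ParseResult
    data class Error(val reason: String) : ParseResult
}

// Assumed spellings for the navigation keys named in the list above.
val allowedKeys = setOf("back", "home", "recents", "enter", "volume_up", "volume_down")

fun parseAction(action: String, params: Map<String, Any?>): ParseResult {
    fun str(name: String) = params[name] as? String
    fun int(name: String) = (params[name] as? Number)?.toInt()
    return when (action) {
        "launch_app" -> str("package")?.let { ParseResult.Ok(ControlAction.LaunchApp(it)) }
            ?: ParseResult.Error("missing param: package")
        "open_url" -> str("url")?.let { ParseResult.Ok(ControlAction.OpenUrl(it)) }
            ?: ParseResult.Error("missing param: url")
        "key_press" -> {
            val key = str("key")?.lowercase()
            if (key != null && key in allowedKeys) ParseResult.Ok(ControlAction.KeyPress(key))
            else ParseResult.Error("unsupported key: ${params["key"]}")
        }
        "tap" -> {
            val x = int("x")
            val y = int("y")
            if (x != null && y != null) ParseResult.Ok(ControlAction.Tap(x, y))
            else ParseResult.Error("missing param: x/y")
        }
        // Anything not modelled is reported, never silently ignored.
        else -> ParseResult.Error("unsupported action: $action")
    }
}
```

Unknown actions and bad params both end up as an explicit `Error`, which matches the goal of honest capability reporting.
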
### `camera_control`
Actions to support:
- `list_cameras`: Camera2/CameraX camera inventory.
- `get_camera_status`: backend/permission state.
- `capture_frame`: explicit camera capture.
- `capture_video`: explicit video capture.
- `inject_video`: unsupported on normal Android; no fake parity.
### `browser_control`
Not first milestone. Basic browser launching/open URL belongs in `computer_control.open_url`. Full browser automation can be added later if there is a clean Android backend.
### `exec`
Disabled by default. Android shell execution is SELinux/app-sandbox constrained and not Linux-equivalent. If enabled later, it must implement the same allow/deny/ask model as Linux and clearly report limitations.
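
The allow/deny/ask model mentioned above could take roughly the following shape. This document does not show the Linux agent's actual rule format, so everything here is an assumption: `ExecPolicy`, `ExecDecision`, matching on the program name only, and the "deny wins, unknown asks" ordering are all illustrative.

```kotlin
enum class ExecDecision { ALLOW, DENY, ASK }

// Hypothetical policy shape; the real rule format may differ.
data class ExecPolicy(
    val enabled: Boolean = false,        // exec is disabled by default
    val allow: Set<String> = emptySet(), // program names that run without asking
    val deny: Set<String> = emptySet(),  // program names that never run
)

fun decide(policy: ExecPolicy, commandLine: String): ExecDecision {
    if (!policy.enabled) return ExecDecision.DENY
    // Match on the first whitespace-delimited token only (an assumption).
    val program = commandLine.trim().substringBefore(' ')
    return when {
        program.isEmpty() -> ExecDecision.DENY
        program in policy.deny -> ExecDecision.DENY // deny takes precedence
        program in policy.allow -> ExecDecision.ALLOW
        else -> ExecDecision.ASK                    // unknown commands need user approval
    }
}
```
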
## Permission model
Android requires explicit grants for the interesting bits:
- AccessibilityService for gestures, key actions, UI inspection.
- UsageStats or Accessibility for foreground app/activity.
- MediaProjection for screenshots/screen recording.
- Camera permission for capture.
- Microphone permission for input capture.
- Notification listener permission for notification bridge if added.
The app must surface these as setup/status checks, not pretend they are available.
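
The setup/status checks can be modelled as a table from feature to the grants that unlock it. This sketch is plain Kotlin with no Android API calls; in a real app the `granted` set would be filled by querying the platform. The `Grant` enum and the feature labels are hypothetical names, but the pairings follow the list above.

```kotlin
enum class Grant {
    ACCESSIBILITY, USAGE_STATS, MEDIA_PROJECTION, CAMERA, MICROPHONE, NOTIFICATION_LISTENER
}

// Feature -> grants, any one of which is sufficient. Feature labels are illustrative.
val requirements: Map<String, Set<Grant>> = linkedMapOf(
    "gestures_and_keys" to setOf(Grant.ACCESSIBILITY),
    "foreground_app" to setOf(Grant.USAGE_STATS, Grant.ACCESSIBILITY),
    "screenshots" to setOf(Grant.MEDIA_PROJECTION),
    "camera_capture" to setOf(Grant.CAMERA),
    "mic_capture" to setOf(Grant.MICROPHONE),
    "notification_bridge" to setOf(Grant.NOTIFICATION_LISTENER),
)

// Reports, per feature, whether at least one sufficient grant is present.
fun setupStatus(granted: Set<Grant>): Map<String, Boolean> =
    requirements.mapValues { (_, anyOf) -> anyOf.any { it in granted } }
```

A status screen can render this map directly, and the same result can decide which capabilities are advertised at registration.
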
Hermes Node Android should speak the same JSON/WebSocket protocol as the Linux node agent where the semantics match.
## Shared messages
- `register`
- `register_ack`
- `heartbeat`
- `heartbeat_ack`
- command messages routed by capability
## Android-specific principle
The Android agent should advertise only what it can actually do on Android rather than pretending to support Linux behavior it cannot deliver. Gateway-side tools can route to these capabilities once implemented.
## Registration
The Android node registers existing tool names:
```json
{
  "type": "register",
  "node_name": "phone",
  "version": "0.1.0-android",
  "capabilities": ["android_status"],
  "capability_info": {
    "tools": [
      "computer_control",
      "desktop_observe",
      "audio_control",
      "camera_control"
    ],
    "capabilities": {
      "platform": "android",
      "enable_exec": false,
      "enable_camera_control": false,
      "enable_audio_control": false,
      "computer_control": {
        "available": true,
        "platform_semantics": "android_phone_control"
      }
    }
  }
}
```
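
A registration message like the one above can be assembled from plain maps and serialized with a small encoder. The sketch below avoids any JSON library so that it stays self-contained; a real agent would more likely use one. `toJson`, `quote`, and `registerMessage` are hypothetical helpers, and only a subset of the example's fields is reproduced.

```kotlin
// Escapes a string as a JSON string literal.
fun quote(s: String): String = buildString {
    append('"')
    for (c in s) when (c) {
        '"' -> append("\\\"")
        '\\' -> append("\\\\")
        '\n' -> append("\\n")
        '\r' -> append("\\r")
        '\t' -> append("\\t")
        else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
    }
    append('"')
}

// Minimal encoder for maps, lists, strings, booleans and numbers; just enough for this sketch.
fun toJson(value: Any?): String = when (value) {
    null -> "null"
    is String -> quote(value)
    is Boolean, is Number -> value.toString()
    is Map<*, *> -> value.entries.joinToString(",", "{", "}") { (k, v) -> quote(k.toString()) + ":" + toJson(v) }
    is List<*> -> value.joinToString(",", "[", "]") { toJson(it) }
    else -> error("unsupported JSON value: $value")
}

// Builds a register message using field names from the example above.
fun registerMessage(nodeName: String, version: String, tools: List<String>): Map<String, Any?> =
    linkedMapOf<String, Any?>(
        "type" to "register",
        "node_name" to nodeName,
        "version" to version,
        "capability_info" to linkedMapOf<String, Any?>(
            "tools" to tools,
            "capabilities" to linkedMapOf<String, Any?>(
                "platform" to "android",
                "enable_exec" to false,
            ),
        ),
    )
```

Using `linkedMapOf` keeps keys in insertion order, so the serialized output is stable and easy to compare in tests.
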
## Wire compatibility rule
Use existing message/result types:
- `computer_control` → `computer_control_result`
- `desktop_observe` → `desktop_observe_result`
- `audio_control` → `audio_control_result`
- `camera_control` → `camera_control_result`
- `exec` → existing exec output/complete flow, if later enabled
Do not introduce `phone_control` unless the gateway needs a user-facing alias. The Android node can expose Android-only actions inside `computer_control` while preserving the gateway-side tool name.
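
The command-to-result mapping can live in one table, which also gives a single place to build explicit "unsupported" replies. The result type names below are taken from the list above. The `CommandResult` envelope is hypothetical: this document does not specify the result fields, so `id`, `success`, and `error` are assumptions.

```kotlin
// Command type -> result type, as listed above.
val resultTypes = mapOf(
    "computer_control" to "computer_control_result",
    "desktop_observe" to "desktop_observe_result",
    "audio_control" to "audio_control_result",
    "camera_control" to "camera_control_result",
)

// Hypothetical result envelope; field names are assumptions.
data class CommandResult(
    val type: String,
    val id: String,
    val success: Boolean,
    val error: String? = null,
)

// Builds an explicit failure for an action Android cannot perform.
// Returns null when the command type itself is not part of the wire contract.
fun unsupported(commandType: String, id: String, action: String): CommandResult? {
    val resultType = resultTypes[commandType] ?: return null
    return CommandResult(resultType, id, success = false, error = "unsupported on android: $action")
}
```

For example, `unsupported("camera_control", "cmd-9", "inject_video")` yields a `camera_control_result` carrying an error string, while an unknown type such as `phone_control` yields `null`.
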
## Android-specific action examples
### Launch an app
```json
{
  "type": "computer_control",
  "id": "cmd-1",
  "action": "launch_app",
  "params": {
    "package": "org.telegram.messenger"
  }
}
```
### Open a URL
```json
{
  "type": "computer_control",
  "id": "cmd-2",
  "action": "open_url",
  "params": {
    "url": "https://lisa.nexlab.net/"
  }
}
```
### Get screen info
```json
{
  "type": "desktop_observe",
  "id": "cmd-3",
  "action": "screen_info",
  "params": {}
}
```
## Compatibility principle
Same capability names, honest platform metadata, explicit unsupported errors for Android-impossible actions. That keeps the gateway stable while giving Android real functionality.