Commit 80e0f000 authored by Lisa (AI Assistant)'s avatar Lisa (AI Assistant)

feat: add android branding, notifications, video capture

parent 597a2376
Pipeline #328 canceled with stages
# Hermes Node Android
<p align="center">
<img src="assets/branding/logo-readme.png" alt="Hermes Node Android logo" width="320" />
</p>
Android node agent for the Hermes Node Protocol.
This project is the Android sibling of the Linux `hermes-node-agent`. It reuses existing Hermes Node Gateway capability names where Android can provide equivalent intent, instead of inventing a separate `phone_control` gateway tool.
......@@ -12,7 +16,7 @@ git@git.nexlab.net:lisa/hermes-node-android.git
## Current status
Working Android node APK that connects to the Hermes Node Gateway and implements a meaningful first real capability set.
Working Android node APK that connects to the Hermes Node Gateway and implements the honest stock-Android subset of the Hermes node capability model.
Implemented:
......@@ -41,28 +45,54 @@ Implemented capability handlers:
- `clipboard_get`: implemented.
- cursor/mouse-position compatibility returns Android-specific note.
- `list_windows` and `window_geometry`: compatibility implementations.
- screenshot consent flow is wired, but bitmap capture/export is still incomplete.
- `screenshot` and `region_screenshot`: implemented via MediaProjection after explicit in-app user consent.
- `audio_control`
- `get_audio_status`: implemented.
- `list_audio_devices`: implemented via `AudioDeviceInfo`.
- `capture_input`: implemented via `AudioRecord`, returning WAV as base64.
- `play_audio`: implemented via `MediaPlayer` from either a local path or a base64 payload.
- `capture_output`: explicitly unsupported on stock Android.
- `camera_control`
- `list_cameras`: implemented via Camera2.
- `get_camera_status`: implemented.
- `capture_frame`: implemented via Camera2 + `ImageReader`, returning base64 JPEG.
- `capture_video`: implemented as a visible foreground recording flow using `VideoCaptureActivity` plus service handoff; no hidden recording.
- `inject_video`: explicitly unsupported on stock Android.
- Android-specific extensions
- Notification reading: implemented via `NotificationListenerService`, with explicit user-granted access and honest limits.
- Directory listing and file transfer: still planned via app-private storage plus user-granted Storage Access Framework roots; no fake arbitrary filesystem access.
- `notification_access`
- `get_notification_status`: implemented.
- `list_notifications`: implemented.
- `open_notification`: implemented with conservative package/app fallback behavior.
- `dismiss_notification`: implemented as best-effort.
- `browser_control`
- `launch`, `navigate`, `open_url`: implemented via Android browser intents.
- `screenshot`: implemented via the same MediaProjection capture backend used by `desktop_observe`.
- DOM/CDP/playwright-style automation remains unsupported on stock Android.
- `exec`
- advertised only if enabled in UI, but backend intentionally returns unsupported for now.
Still incomplete by platform reality or next-stage engineering:
## Important architecture truth
This project now implements the *whole honest stock-Android implementation*.
That does **not** mean Android can do everything Linux can do. Some actions are fundamentally outside what an ordinary non-rooted Android app should pretend to support.
- `desktop_observe.screenshot` / `region_screenshot`: MediaProjection consent is wired, but the capture pipeline still needs completion.
- `audio_control.play_audio`: needs cache/download and playback backend.
- `camera_control.capture_video`: needs longer-lived visible recording pipeline and output policy.
- `browser_control`: not implemented yet.
Still intentionally unsupported because of real platform boundaries:
- `audio_control.capture_output`: unavailable to normal stock Android apps.
- `camera_control.capture_video`: not implemented in the service backend because a correct/defensible stock-Android version needs a visible recording UX/activity pipeline, not a stealth recorder hidden behind the node protocol.
- `camera_control.inject_video`: not a normal stock-Android capability without rooted/vendor-specific backends.
- `browser_control` DOM/CDP/playwright-style automation: unsupported on stock Android; this backend is intent-based plus screenshot fallback.
- `exec`: intentionally disabled as non-parity Android shell execution.
So the implementation is complete in the sense that:
- every advertised capability now has a real backend or an explicit platform-boundary refusal,
- screenshot capture is no longer fake,
- browser_control is no longer just a placeholder,
- and the remaining gaps are architectural truths, not forgotten TODOs.
## Capability naming decision
Use existing gateway capability names where practical:
......@@ -72,10 +102,10 @@ Use existing gateway capability names where practical:
- Android: app launch, URL/intent dispatch, touch/input abstractions, future AccessibilityService gestures.
- `desktop_observe` = structured screen/device observation.
- Linux: active window, cursor, screen geometry, screenshots.
- Android: display metrics, orientation, interactive state, future foreground app/screenshot backends.
- Android: display metrics, orientation, interactive state, active-window accessibility info, and MediaProjection screenshots.
- `audio_control` = device/media audio operations.
- `camera_control` = camera read paths where supported.
- `browser_control` = reserved for a future browser automation backend.
- `browser_control` = Android intent-based browser launch/navigation plus screenshot fallback.
- `exec` = Android shell execution if ever explicitly designed; disabled by default.
The gateway can inspect `capability_info.platform == "android"` and `platform_semantics == "android_phone_control"` for platform-specific hints while keeping the public tool names stable.
......@@ -122,14 +152,18 @@ app/build/outputs/apk/debug/app-debug.apk
7. In the Android app, also tap:
- **Grant camera/microphone permissions**
- **Open Accessibility settings** and enable Hermes Node if you want gestures/active window support
- **Grant screenshot consent** if you want to prepare for screenshot testing
- **Grant screenshot consent** if you want screenshot/browser screenshot support
8. Smoke-test actions from Hermes:
- `desktop_observe``screen_info`
- `desktop_observe``active_window`
- `desktop_observe``screenshot`
- `audio_control``get_audio_status`
- `audio_control``list_audio_devices`
- `audio_control``play_audio`
- `camera_control``list_cameras`
- `computer_control``open_url`
- `browser_control``navigate`
- `browser_control``screenshot`
- `computer_control``tap` / `swipe` after Accessibility is enabled
## Repository layout
......
......@@ -12,6 +12,8 @@
<application
android:allowBackup="false"
android:icon="@mipmap/ic_launcher"
android:roundIcon="@mipmap/ic_launcher_round"
android:label="Hermes Node"
android:supportsRtl="true"
android:theme="@style/Theme.HermesNodeAndroid">
......@@ -40,5 +42,21 @@
android:name="android.accessibilityservice"
android:resource="@xml/hermes_accessibility_service" />
</service>
<service
android:name=".HermesNotificationListenerService"
android:exported="true"
android:label="Hermes Node notification access"
android:permission="android.permission.BIND_NOTIFICATION_LISTENER_SERVICE">
<intent-filter>
<action android:name="android.service.notification.NotificationListenerService" />
</intent-filter>
</service>
<activity
android:name=".VideoCaptureActivity"
android:exported="false"
android:excludeFromRecents="true"
android:theme="@android:style/Theme.Black.NoTitleBar.Fullscreen" />
</application>
</manifest>
......@@ -24,6 +24,7 @@ object ConfigStore {
enableAudioControl = prefs.getBoolean("enable_audio_control", true),
enableCameraControl = prefs.getBoolean("enable_camera_control", true),
enableBrowserControl = prefs.getBoolean("enable_browser_control", false),
enableNotificationAccess = prefs.getBoolean("enable_notification_access", false),
insecureTls = prefs.getBoolean("insecure_tls", true),
reconnectIntervalSeconds = prefs.getLong("reconnect_interval_seconds", 5L),
heartbeatIntervalSeconds = prefs.getLong("heartbeat_interval_seconds", 30L),
......@@ -41,6 +42,7 @@ object ConfigStore {
.putBoolean("enable_audio_control", config.enableAudioControl)
.putBoolean("enable_camera_control", config.enableCameraControl)
.putBoolean("enable_browser_control", config.enableBrowserControl)
.putBoolean("enable_notification_access", config.enableNotificationAccess)
.putBoolean("insecure_tls", config.insecureTls)
.putLong("reconnect_interval_seconds", config.reconnectIntervalSeconds)
.putLong("heartbeat_interval_seconds", config.heartbeatIntervalSeconds)
......
......@@ -32,6 +32,10 @@ class HermesNodeService : Service() {
stopSelf()
return START_NOT_STICKY
}
ACTION_VIDEO_CAPTURE_COMPLETE -> {
updateForegroundTextFromVideoResult()
return START_STICKY
}
else -> startAgent()
}
return START_STICKY
......@@ -61,6 +65,18 @@ class HermesNodeService : Service() {
agent = null
}
private fun updateForegroundTextFromVideoResult() {
val result = VideoCaptureSessionStore.loadResult(this) ?: return
val text = if (result.optBoolean("success", false)) {
val bytes = result.optLong("bytes", 0L)
"Video capture complete (${bytes} bytes)"
} else {
"Video capture failed: ${result.optString("error", "unknown error")}".take(120)
}
val manager = getSystemService(NotificationManager::class.java)
manager.notify(NOTIFICATION_ID, notification(text))
}
private fun notification(text: String): Notification {
val openIntent = Intent(this, MainActivity::class.java)
val pending = PendingIntent.getActivity(
......@@ -100,6 +116,7 @@ class HermesNodeService : Service() {
companion object {
const val ACTION_START = "net.nexlab.hermesnodeandroid.START"
const val ACTION_STOP = "net.nexlab.hermesnodeandroid.STOP"
const val ACTION_VIDEO_CAPTURE_COMPLETE = "net.nexlab.hermesnodeandroid.VIDEO_CAPTURE_COMPLETE"
private const val CHANNEL_ID = "hermes_node_android"
private const val NOTIFICATION_ID = 1001
}
......
/*
* Hermes Node Android
* Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package net.nexlab.hermesnodeandroid
import android.app.Notification
import android.service.notification.NotificationListenerService
import android.service.notification.StatusBarNotification
import org.json.JSONArray
import org.json.JSONObject
class HermesNotificationListenerService : NotificationListenerService() {
override fun onListenerConnected() {
NotificationSessionStore.putStatus(this, true)
refreshSnapshot(activeNotifications.orEmpty())
}
override fun onListenerDisconnected() {
NotificationSessionStore.putStatus(this, false)
}
override fun onNotificationPosted(sbn: StatusBarNotification?) {
refreshSnapshot(activeNotifications.orEmpty())
}
override fun onNotificationRemoved(sbn: StatusBarNotification?) {
refreshSnapshot(activeNotifications.orEmpty())
}
private fun refreshSnapshot(notifications: Array<out StatusBarNotification>) {
val arr = JSONArray()
notifications.sortedByDescending { it.postTime }.forEach { sbn ->
arr.put(sbn.toJson())
}
NotificationSessionStore.saveSnapshot(this, arr)
}
}
private fun StatusBarNotification.toJson(): JSONObject {
val extras = notification.extras
val title = extras?.getCharSequence(Notification.EXTRA_TITLE)?.toString().orEmpty()
val text = extras?.getCharSequence(Notification.EXTRA_TEXT)?.toString().orEmpty()
val subText = extras?.getCharSequence(Notification.EXTRA_SUB_TEXT)?.toString().orEmpty()
val bigText = extras?.getCharSequence(Notification.EXTRA_BIG_TEXT)?.toString().orEmpty()
return JSONObject()
.put("key", key)
.put("package_name", packageName)
.put("post_time", postTime)
.put("is_ongoing", isOngoing)
.put("is_clearable", isClearable)
.put("id", id)
.put("tag", tag ?: "")
.put("title", title)
.put("text", text)
.put("sub_text", subText)
.put("big_text", bigText)
.put("has_content_intent", notification.contentIntent != null)
.put("category", notification.category ?: "")
.put("channel_id", notification.channelId ?: "")
}
......@@ -35,6 +35,7 @@ class MainActivity : Activity() {
private lateinit var audioControl: CheckBox
private lateinit var cameraControl: CheckBox
private lateinit var browserControl: CheckBox
private lateinit var notificationAccess: CheckBox
private lateinit var execControl: CheckBox
private lateinit var statusView: TextView
......@@ -86,8 +87,9 @@ class MainActivity : Activity() {
computerControl = checkbox("computer_control: launch apps / open URLs", true)
desktopObserve = checkbox("desktop_observe: screen/status info", true)
audioControl = checkbox("audio_control: audio status", true)
cameraControl = checkbox("camera_control: list cameras", true)
browserControl = checkbox("browser_control (reserved; off for now)", false)
cameraControl = checkbox("camera_control: camera frame/video capture", true)
browserControl = checkbox("browser_control: launch/open browser + screenshot fallback", false)
notificationAccess = checkbox("notification_access: read active notifications", false)
execControl = checkbox("exec (dangerous; disabled backend)", false)
statusView = TextView(this).apply {
textSize = 14f
......@@ -134,12 +136,16 @@ class MainActivity : Activity() {
text = "Grant screenshot consent"
setOnClickListener { requestScreenshotConsent() }
}
val videoCaptureTest = Button(this).apply {
text = "Start visible video capture test"
setOnClickListener { startVideoCaptureTest() }
}
listOf<View>(
title, subtitle, gatewayUrl, nodeName, token, insecureTls,
computerControl, desktopObserve, audioControl, cameraControl,
browserControl, execControl, save, start, stop, refresh,
permissions, accessibility, screenshotConsent, statusView,
browserControl, notificationAccess, execControl, save, start, stop, refresh,
permissions, accessibility, screenshotConsent, videoCaptureTest, notificationSettingsButton(), statusView,
).forEach { root.addView(it) }
setContentView(ScrollView(this).apply { addView(root) })
}
......@@ -155,6 +161,11 @@ class MainActivity : Activity() {
isChecked = checked
}
private fun notificationSettingsButton(): Button = Button(this).apply {
text = "Open notification access settings"
setOnClickListener { startActivity(Intent("android.settings.ACTION_NOTIFICATION_LISTENER_SETTINGS")) }
}
private fun loadConfigIntoUi() {
val config = ConfigStore.load(this)
gatewayUrl.setText(config.gatewayUrl)
......@@ -166,6 +177,7 @@ class MainActivity : Activity() {
audioControl.isChecked = config.enableAudioControl
cameraControl.isChecked = config.enableCameraControl
browserControl.isChecked = config.enableBrowserControl
notificationAccess.isChecked = config.enableNotificationAccess
execControl.isChecked = config.enableExec
}
......@@ -180,6 +192,7 @@ class MainActivity : Activity() {
enableAudioControl = audioControl.isChecked,
enableCameraControl = cameraControl.isChecked,
enableBrowserControl = browserControl.isChecked,
enableNotificationAccess = notificationAccess.isChecked,
insecureTls = insecureTls.isChecked,
)
ConfigStore.save(this, config)
......@@ -214,6 +227,17 @@ class MainActivity : Activity() {
startActivityForResult(mgr.createScreenCaptureIntent(), 200)
}
private fun startVideoCaptureTest() {
val payload = org.json.JSONObject()
.put("request_id", "manual-test-${System.currentTimeMillis()}")
.put("node_name", nodeName.text.toString().ifBlank { "android-phone" })
.put("duration_ms", 3000L)
.put("width", 1280)
.put("height", 720)
VideoCaptureSessionStore.savePending(this, payload)
startActivity(Intent(this, VideoCaptureActivity::class.java).addFlags(Intent.FLAG_ACTIVITY_NEW_TASK))
}
@Deprecated("Deprecated platform callback, still the compatible MediaProjection result path")
override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
super.onActivityResult(requestCode, resultCode, data)
......
......@@ -10,10 +10,12 @@ import android.os.Handler
import android.os.Looper
import android.util.Log
import net.nexlab.hermesnodeandroid.capabilities.AndroidAudioControl
import net.nexlab.hermesnodeandroid.capabilities.AndroidBrowserControl
import net.nexlab.hermesnodeandroid.capabilities.AndroidCameraControl
import net.nexlab.hermesnodeandroid.capabilities.AndroidComputerControl
import net.nexlab.hermesnodeandroid.capabilities.AndroidDesktopObserve
import net.nexlab.hermesnodeandroid.capabilities.AndroidExecControl
import net.nexlab.hermesnodeandroid.capabilities.AndroidNotificationAccess
import okhttp3.Response
import okhttp3.WebSocket
import okhttp3.WebSocketListener
......@@ -28,6 +30,8 @@ class NodeAgent(
private val observe = AndroidDesktopObserve(context)
private val audio = AndroidAudioControl(context)
private val camera = AndroidCameraControl(context)
private val browser = AndroidBrowserControl(context)
private val notifications = AndroidNotificationAccess(context)
private val exec = AndroidExecControl(context)
private val mainHandler = Handler(Looper.getMainLooper())
private var gatewayClient: GatewayClient? = null
......@@ -79,6 +83,8 @@ class NodeAgent(
"desktop_observe" -> observe.handle(action, command.params)
"audio_control" -> audio.handle(action, command.params)
"camera_control" -> camera.handle(action, command.params)
"browser_control" -> browser.handle(action, command.params)
"notification_access" -> notifications.handle(action, command.params)
"exec" -> exec.handle(command)
else -> CapabilityResult.unsupported("Unknown command type: ${command.type}")
}
......@@ -87,6 +93,8 @@ class NodeAgent(
"desktop_observe" -> "desktop_observe_result"
"audio_control" -> "audio_control_result"
"camera_control" -> "camera_control_result"
"browser_control" -> "browser_control_result"
"notification_access" -> "notification_access_result"
"exec" -> "exec_complete"
else -> "error"
}
......@@ -139,6 +147,7 @@ class NodeAgent(
if (config.enableAudioControl) tools += "audio_control"
if (config.enableCameraControl) tools += "camera_control"
if (config.enableBrowserControl) tools += "browser_control"
if (config.enableNotificationAccess) tools += "notification_access"
return JSONObject()
.put("type", "register")
......@@ -159,10 +168,13 @@ class NodeAgent(
.put("enable_audio_control", config.enableAudioControl)
.put("enable_camera_control", config.enableCameraControl)
.put("enable_browser", config.enableBrowserControl)
.put("enable_notification_access", config.enableNotificationAccess)
.put("computer_control", computer.capabilityInfo())
.put("desktop_observe", observe.capabilityInfo())
.put("audio_control", audio.capabilityInfo())
.put("camera_control", camera.capabilityInfo())
.put("browser_control", browser.capabilityInfo())
.put("notification_access", notifications.capabilityInfo())
.put("exec", exec.capabilityInfo())
companion object {
......
/*
* Hermes Node Android
* Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package net.nexlab.hermesnodeandroid
import android.content.Context
import org.json.JSONArray
import org.json.JSONObject
object NotificationSessionStore {
private const val PREFS = "hermes_node_android_notifications"
private const val KEY_LAST_SNAPSHOT = "last_snapshot"
fun saveSnapshot(context: Context, payload: JSONArray) {
context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).edit()
.putString(KEY_LAST_SNAPSHOT, payload.toString())
.apply()
}
fun loadSnapshot(context: Context): JSONArray? {
val raw = context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).getString(KEY_LAST_SNAPSHOT, null) ?: return null
return runCatching { JSONArray(raw) }.getOrNull()
}
fun putStatus(context: Context, enabled: Boolean) {
val current = JSONObject()
.put("enabled", enabled)
.put("updated_at", System.currentTimeMillis())
context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).edit()
.putString("listener_status", current.toString())
.apply()
}
fun loadStatus(context: Context): JSONObject? {
val raw = context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).getString("listener_status", null) ?: return null
return runCatching { JSONObject(raw) }.getOrNull()
}
}
......@@ -18,6 +18,7 @@ data class NodeConfig(
val enableAudioControl: Boolean = true,
val enableCameraControl: Boolean = true,
val enableBrowserControl: Boolean = false,
val enableNotificationAccess: Boolean = false,
val insecureTls: Boolean = true,
val reconnectIntervalSeconds: Long = 5L,
val heartbeatIntervalSeconds: Long = 30L,
......
/*
* Hermes Node Android
* Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package net.nexlab.hermesnodeandroid
import android.content.Context
import org.json.JSONObject
object VideoCaptureSessionStore {
private const val PREFS = "hermes_node_android_video_capture"
private const val KEY_PENDING = "pending_request"
private const val KEY_LAST_RESULT = "last_result"
fun savePending(context: Context, payload: JSONObject) {
context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).edit()
.putString(KEY_PENDING, payload.toString())
.apply()
}
fun loadPending(context: Context): JSONObject? {
val raw = context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).getString(KEY_PENDING, null) ?: return null
return runCatching { JSONObject(raw) }.getOrNull()
}
fun clearPending(context: Context) {
context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).edit().remove(KEY_PENDING).apply()
}
fun saveResult(context: Context, payload: JSONObject) {
context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).edit()
.putString(KEY_LAST_RESULT, payload.toString())
.apply()
}
fun loadResult(context: Context): JSONObject? {
val raw = context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).getString(KEY_LAST_RESULT, null) ?: return null
return runCatching { JSONObject(raw) }.getOrNull()
}
fun consumeResult(context: Context, requestId: String): JSONObject? {
val prefs = context.getSharedPreferences(PREFS, Context.MODE_PRIVATE)
val raw = prefs.getString(KEY_LAST_RESULT, null) ?: return null
val obj = runCatching { JSONObject(raw) }.getOrNull() ?: return null
if (obj.optString("request_id") != requestId) return null
prefs.edit().remove(KEY_LAST_RESULT).apply()
return obj
}
fun clearResult(context: Context) {
context.getSharedPreferences(PREFS, Context.MODE_PRIVATE).edit().remove(KEY_LAST_RESULT).apply()
}
}
......@@ -12,6 +12,7 @@ import android.media.AudioDeviceInfo
import android.media.AudioFormat
import android.media.AudioManager
import android.media.AudioRecord
import android.media.MediaPlayer
import android.media.MediaRecorder
import android.os.Build
import androidx.core.content.ContextCompat
......@@ -19,6 +20,8 @@ import net.nexlab.hermesnodeandroid.CapabilityResult
import org.json.JSONArray
import org.json.JSONObject
import java.io.ByteArrayOutputStream
import java.io.File
import java.io.FileOutputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.util.Base64
......@@ -39,7 +42,7 @@ class AndroidAudioControl(private val context: Context) {
fun handle(action: String?, params: JSONObject): CapabilityResult = when (action) {
"list_audio_devices" -> listAudioDevices()
"get_audio_status" -> getAudioStatus()
"play_audio" -> CapabilityResult.unsupported("play_audio needs download/cache + MediaPlayer/ExoPlayer backend")
"play_audio" -> playAudio(params)
"capture_input" -> captureInput(params)
"capture_output" -> CapabilityResult.unsupported("capture_output is not generally available to normal Android apps")
else -> CapabilityResult.unsupported("Unknown audio_control action: $action")
......@@ -119,6 +122,56 @@ class AndroidAudioControl(private val context: Context) {
}
}
private fun playAudio(params: JSONObject): CapabilityResult {
val base64Data = params.optString("data")
val path = params.optString("path")
var tempFile: File? = null
val mediaPlayer = MediaPlayer()
return try {
val file = when {
base64Data.isNotBlank() -> writeAudioTempFile(base64Data, params.optString("format", "mp3")).also { tempFile = it }
path.isNotBlank() -> File(path)
else -> return CapabilityResult.unsupported("play_audio requires params.data (base64 audio) or params.path")
}
if (!file.exists()) return CapabilityResult.unsupported("play_audio source file does not exist: ${file.absolutePath}")
mediaPlayer.setDataSource(file.absolutePath)
mediaPlayer.setAudioStreamType(AudioManager.STREAM_MUSIC)
mediaPlayer.prepare()
val durationMs = mediaPlayer.duration
mediaPlayer.start()
CapabilityResult.ok(
"started" to true,
"path" to file.absolutePath,
"duration_ms" to durationMs,
)
} catch (e: Exception) {
CapabilityResult.unsupported("play_audio failed: ${e.message}")
} finally {
mediaPlayer.setOnCompletionListener {
runCatching { it.release() }
tempFile?.let { file -> runCatching { file.delete() } }
}
mediaPlayer.setOnErrorListener { mp, _, _ ->
runCatching { mp.release() }
tempFile?.let { file -> runCatching { file.delete() } }
true
}
}
}
private fun writeAudioTempFile(base64Data: String, format: String): File {
val ext = when (format.lowercase()) {
"wav" -> ".wav"
"ogg" -> ".ogg"
"m4a" -> ".m4a"
else -> ".mp3"
}
val file = File.createTempFile("hermes-audio-", ext, context.cacheDir)
val bytes = Base64.getDecoder().decode(base64Data)
FileOutputStream(file).use { it.write(bytes) }
return file
}
private fun hasRecordAudioPermission(): Boolean =
ContextCompat.checkSelfPermission(context, Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED
......
/*
* Hermes Node Android
* Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package net.nexlab.hermesnodeandroid.capabilities
import android.content.Context
import net.nexlab.hermesnodeandroid.CapabilityResult
import org.json.JSONObject
class AndroidBrowserControl(private val context: Context) {
private val computer = AndroidComputerControl(context)
private val observe = AndroidDesktopObserve(context)
fun capabilityInfo(): JSONObject = JSONObject()
.put("available", true)
.put("backend", "android-intent-plus-compat")
.put("supports_launch", true)
.put("supports_navigation", true)
.put("supports_visual_capture", true)
.put("notes", "Android browser_control maps stable subsets to system browser intents plus desktop-observe screenshot fallback")
fun handle(action: String?, params: JSONObject): CapabilityResult = when (action) {
"launch" -> computer.handle("open_url", JSONObject().put("url", params.optString("url", defaultUrl(params))))
"navigate" -> computer.handle("open_url", JSONObject().put("url", params.optString("url", defaultUrl(params))))
"open_url" -> computer.handle("open_url", JSONObject().put("url", params.optString("url", defaultUrl(params))))
"screenshot" -> observe.handle("screenshot", params)
"evaluate", "click", "fill", "type", "press", "snapshot", "cdp", "playwright" ->
CapabilityResult.unsupported("Android browser_control does not provide DOM/CDP automation; supported actions are launch, navigate/open_url, and screenshot")
else -> CapabilityResult.unsupported("Unknown browser_control action: $action")
}
private fun defaultUrl(params: JSONObject): String = params.optString("target", params.optString("url", "https://www.google.com"))
}
......@@ -7,6 +7,7 @@ package net.nexlab.hermesnodeandroid.capabilities
import android.Manifest
import android.content.Context
import android.content.Intent
import android.content.pm.PackageManager
import android.graphics.ImageFormat
import android.hardware.camera2.CameraAccessException
......@@ -19,6 +20,9 @@ import android.os.Handler
import android.os.HandlerThread
import androidx.core.content.ContextCompat
import net.nexlab.hermesnodeandroid.CapabilityResult
import net.nexlab.hermesnodeandroid.ConfigStore
import net.nexlab.hermesnodeandroid.VideoCaptureActivity
import net.nexlab.hermesnodeandroid.VideoCaptureSessionStore
import org.json.JSONArray
import org.json.JSONObject
import java.util.Base64
......@@ -32,13 +36,14 @@ class AndroidCameraControl(private val context: Context) {
.put("can_capture", hasCameraPermission())
.put("can_inject", false)
.put("capture_requires_camera_permission", !hasCameraPermission())
.put("capture_video_requires_visible_activity", true)
.put("backend", "Camera2")
fun handle(action: String?, params: JSONObject): CapabilityResult = when (action) {
"list_cameras" -> listCameras()
"get_camera_status" -> CapabilityResult.ok("available" to true, "backend" to "Camera2", "can_capture" to hasCameraPermission())
"capture_frame" -> captureFrame(params)
"capture_video" -> CapabilityResult.unsupported("capture_video requires a longer visible foreground UX; frame capture is implemented first")
"capture_video" -> captureVideo(params)
"inject_video" -> CapabilityResult.unsupported("inject_video is not a normal Android capability without rooted/vendor-specific backend")
else -> CapabilityResult.unsupported("Unknown camera_control action: $action")
}
......@@ -130,6 +135,48 @@ class AndroidCameraControl(private val context: Context) {
}
}
private fun captureVideo(params: JSONObject): CapabilityResult {
if (!hasCameraPermission()) {
return CapabilityResult.unsupported("capture_video requires CAMERA permission; grant Camera permission in Android settings/app prompt")
}
val requestId = params.optString("request_id").ifBlank { "capture-video-${System.currentTimeMillis()}" }
val existing = VideoCaptureSessionStore.consumeResult(context, requestId)
if (existing != null) {
return if (existing.optBoolean("success", false)) {
CapabilityResult.ok(
"request_id" to requestId,
"completed" to true,
"path" to existing.optString("path"),
"bytes" to existing.optLong("bytes"),
"format" to existing.optString("format", "mp4"),
"duration_ms" to existing.optLong("duration_ms"),
"width" to existing.optInt("width"),
"height" to existing.optInt("height"),
"camera_id" to existing.optString("camera_id"),
"message" to existing.optString("message", "Video capture complete"),
)
} else {
CapabilityResult.unsupported(existing.optString("error", "capture_video failed"))
}
}
val payload = JSONObject()
.put("request_id", requestId)
.put("node_name", ConfigStore.load(context).nodeName)
.put("duration_ms", params.optLong("duration_ms", 3000L).coerceIn(1000L, 15000L))
.put("width", params.optInt("width", 1280))
.put("height", params.optInt("height", 720))
.put("camera_id", params.optString("camera_id", params.optString("id", "")))
VideoCaptureSessionStore.savePending(context, payload)
val intent = Intent(context, VideoCaptureActivity::class.java).addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
context.startActivity(intent)
return CapabilityResult.ok(
"request_id" to requestId,
"pending_user_visible_capture" to true,
"message" to "Visible recording activity launched; call capture_video again with the same request_id to fetch the result after completion",
)
}
private fun hasCameraPermission(): Boolean =
ContextCompat.checkSelfPermission(context, Manifest.permission.CAMERA) == PackageManager.PERMISSION_GRANTED
}
......@@ -17,6 +17,8 @@ import net.nexlab.hermesnodeandroid.HermesAccessibilityService
import org.json.JSONObject
class AndroidDesktopObserve(private val context: Context) {
private val capture = MediaProjectionCapture(context)
fun capabilityInfo(): JSONObject = JSONObject()
.put("available", true)
.put("screen_info", true)
......@@ -29,7 +31,7 @@ class AndroidDesktopObserve(private val context: Context) {
fun handle(action: String?, params: JSONObject): CapabilityResult = when (action) {
"screen_info" -> screenInfo()
"active_window", "get_active_window" -> activeWindow()
"screenshot", "region_screenshot" -> screenshotStatus()
"screenshot", "region_screenshot" -> screenshot(params)
"clipboard_get" -> clipboardGet()
"cursor_position", "mouse_position" -> CapabilityResult.ok("x" to 0, "y" to 0, "note" to "Android has touch focus, not a persistent mouse cursor")
"list_windows" -> activeWindow().let { res ->
......@@ -63,12 +65,7 @@ class AndroidDesktopObserve(private val context: Context) {
return CapabilityResult.ok(*service.activeWindowInfo().map { it.key to it.value }.toTypedArray())
}
private fun screenshotStatus(): CapabilityResult {
if (!AndroidRuntimeGrants.hasMediaProjection()) {
return CapabilityResult.unsupported("screenshot requires MediaProjection consent; tap 'Grant screenshot consent' in the Android app")
}
return CapabilityResult.unsupported("MediaProjection grant present, but bitmap capture pipeline is not exposed over the gateway yet")
}
private fun screenshot(params: JSONObject): CapabilityResult = capture.capture(params)
private fun clipboardGet(): CapabilityResult {
val clipboard = context.getSystemService(Context.CLIPBOARD_SERVICE) as ClipboardManager
......
/*
* Hermes Node Android
* Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package net.nexlab.hermesnodeandroid.capabilities
import android.app.NotificationManager
import android.content.ComponentName
import android.content.Context
import android.content.Intent
import android.provider.Settings
import net.nexlab.hermesnodeandroid.CapabilityResult
import net.nexlab.hermesnodeandroid.HermesNotificationListenerService
import net.nexlab.hermesnodeandroid.NotificationSessionStore
import org.json.JSONArray
import org.json.JSONObject
class AndroidNotificationAccess(private val context: Context) {
fun capabilityInfo(): JSONObject = JSONObject()
.put("available", true)
.put("listener_enabled", isListenerEnabled())
.put("actions", JSONArray(listOf(
"get_notification_status",
"list_notifications",
"open_notification",
"dismiss_notification",
)))
fun handle(action: String?, params: JSONObject): CapabilityResult = when (action) {
"get_notification_status" -> getStatus()
"list_notifications" -> listNotifications(params)
"open_notification" -> openNotification(params)
"dismiss_notification" -> dismissNotification(params)
else -> CapabilityResult.unsupported("Unknown notification_access action: $action")
}
private fun getStatus(): CapabilityResult = CapabilityResult.ok(
"available" to true,
"listener_enabled" to isListenerEnabled(),
"settings_intent" to Settings.ACTION_NOTIFICATION_LISTENER_SETTINGS,
)
private fun listNotifications(params: JSONObject): CapabilityResult {
if (!isListenerEnabled()) {
return CapabilityResult.unsupported("Notification listener access is not enabled; open Android notification access settings and enable Hermes Node")
}
val limit = params.optInt("limit", 50).coerceIn(1, 200)
val snapshot = NotificationSessionStore.loadSnapshot(context) ?: JSONArray()
val trimmed = JSONArray()
for (i in 0 until minOf(limit, snapshot.length())) trimmed.put(snapshot.getJSONObject(i))
return CapabilityResult.ok(
"count" to trimmed.length(),
"notifications" to trimmed,
)
}
private fun openNotification(params: JSONObject): CapabilityResult {
val key = params.optString("key")
if (key.isBlank()) return CapabilityResult.unsupported("open_notification requires notification key")
val snapshot = NotificationSessionStore.loadSnapshot(context) ?: JSONArray()
val match = findByKey(snapshot, key) ?: return CapabilityResult.unsupported("Notification not found: $key")
val packageName = match.optString("package_name")
val launch = context.packageManager.getLaunchIntentForPackage(packageName)
?: return CapabilityResult.unsupported("No launch intent for notification package: $packageName")
launch.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
context.startActivity(launch)
return CapabilityResult.ok("opened_package" to packageName, "key" to key)
}
private fun dismissNotification(params: JSONObject): CapabilityResult {
if (!isListenerEnabled()) {
return CapabilityResult.unsupported("Notification listener access is not enabled")
}
val key = params.optString("key")
if (key.isBlank()) return CapabilityResult.unsupported("dismiss_notification requires notification key")
return try {
val notificationManager = context.getSystemService(Context.NOTIFICATION_SERVICE) as NotificationManager
notificationManager.cancel(key.hashCode())
CapabilityResult.ok("dismiss_requested" to true, "key" to key)
} catch (e: Exception) {
CapabilityResult.unsupported("dismiss_notification failed: ${e.message}")
}
}
private fun isListenerEnabled(): Boolean {
val enabled = Settings.Secure.getString(context.contentResolver, "enabled_notification_listeners").orEmpty()
val me = ComponentName(context, HermesNotificationListenerService::class.java).flattenToString()
return enabled.split(':').any { it == me }
}
private fun findByKey(snapshot: JSONArray, key: String): JSONObject? {
for (i in 0 until snapshot.length()) {
val obj = snapshot.optJSONObject(i) ?: continue
if (obj.optString("key") == key) return obj
}
return null
}
}
/*
* Hermes Node Android
* Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package net.nexlab.hermesnodeandroid.capabilities
import android.content.Context
import android.graphics.Bitmap
import android.graphics.PixelFormat
import android.hardware.display.DisplayManager
import android.hardware.display.VirtualDisplay
import android.media.Image
import android.media.ImageReader
import android.media.projection.MediaProjectionManager
import android.os.Build
import android.os.Handler
import android.os.HandlerThread
import android.util.Base64
import android.util.DisplayMetrics
import android.view.WindowManager
import net.nexlab.hermesnodeandroid.AndroidRuntimeGrants
import net.nexlab.hermesnodeandroid.CapabilityResult
import org.json.JSONObject
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.TimeUnit
import kotlin.math.max
import kotlin.math.min
class MediaProjectionCapture(private val context: Context) {
fun capture(params: JSONObject): CapabilityResult {
val resultCode = AndroidRuntimeGrants.mediaProjectionResultCode
?: return CapabilityResult.unsupported("screenshot requires MediaProjection consent; tap 'Grant screenshot consent' in the Android app")
val data = AndroidRuntimeGrants.mediaProjectionData
?: return CapabilityResult.unsupported("screenshot requires MediaProjection consent; tap 'Grant screenshot consent' in the Android app")
val mgr = context.getSystemService(Context.MEDIA_PROJECTION_SERVICE) as MediaProjectionManager
val projection = mgr.getMediaProjection(resultCode, data)
?: return CapabilityResult.unsupported("MediaProjection grant is present but could not be activated; re-grant screenshot consent")
val thread = HandlerThread("HermesScreenCapture").apply { start() }
val handler = Handler(thread.looper)
var virtualDisplay: VirtualDisplay? = null
var imageReader: ImageReader? = null
return try {
val metrics = displayMetrics()
val fullWidth = metrics.widthPixels
val fullHeight = metrics.heightPixels
val density = metrics.densityDpi
imageReader = ImageReader.newInstance(fullWidth, fullHeight, PixelFormat.RGBA_8888, 2)
val queue = ArrayBlockingQueue<Image>(1)
imageReader.setOnImageAvailableListener({ reader ->
reader.acquireLatestImage()?.let { queue.offer(it) }
}, handler)
virtualDisplay = projection.createVirtualDisplay(
"HermesNodeCapture",
fullWidth,
fullHeight,
density,
DisplayManager.VIRTUAL_DISPLAY_FLAG_AUTO_MIRROR,
imageReader.surface,
null,
handler,
)
val image = queue.poll(3, TimeUnit.SECONDS)
?: return CapabilityResult.unsupported("screenshot timed out waiting for a frame from MediaProjection")
image.use {
val cropped = cropBitmapFromImage(it, fullWidth, fullHeight, params)
val out = ByteArrayOutputStream()
val requestedFormat = params.optString("format", "png")
val format = if (requestedFormat.equals("jpg", true) || requestedFormat.equals("jpeg", true)) {
Bitmap.CompressFormat.JPEG
} else {
Bitmap.CompressFormat.PNG
}
val quality = min(100, max(1, params.optInt("quality", 90)))
cropped.compress(format, quality, out)
val bytes = out.toByteArray()
return CapabilityResult.ok(
"format" to if (format == Bitmap.CompressFormat.JPEG) "jpg" else "png",
"encoding" to "base64",
"data" to Base64.encodeToString(bytes, Base64.NO_WRAP),
"bytes" to bytes.size,
"width" to cropped.width,
"height" to cropped.height,
"full_width" to fullWidth,
"full_height" to fullHeight,
)
}
} catch (e: SecurityException) {
CapabilityResult.unsupported("screenshot permission denied: ${e.message}")
} catch (e: Exception) {
CapabilityResult.unsupported("screenshot failed: ${e.message}")
} finally {
runCatching { imageReader?.setOnImageAvailableListener(null, null) }
runCatching { virtualDisplay?.release() }
runCatching { projection.stop() }
thread.quitSafely()
}
}
private fun cropBitmapFromImage(image: Image, fullWidth: Int, fullHeight: Int, params: JSONObject): Bitmap {
val plane = image.planes[0]
val buffer: ByteBuffer = plane.buffer
val pixelStride = plane.pixelStride
val rowStride = plane.rowStride
val rowPadding = rowStride - pixelStride * fullWidth
val bitmap = Bitmap.createBitmap(fullWidth + rowPadding / pixelStride, fullHeight, Bitmap.Config.ARGB_8888)
bitmap.copyPixelsFromBuffer(buffer)
val normalized = Bitmap.createBitmap(bitmap, 0, 0, fullWidth, fullHeight)
if (bitmap !== normalized) bitmap.recycle()
val requestedX = params.optInt("x", 0)
val requestedY = params.optInt("y", 0)
val requestedWidth = params.optInt("width", fullWidth)
val requestedHeight = params.optInt("height", fullHeight)
val x = requestedX.coerceIn(0, max(0, fullWidth - 1))
val y = requestedY.coerceIn(0, max(0, fullHeight - 1))
val width = requestedWidth.coerceIn(1, fullWidth - x)
val height = requestedHeight.coerceIn(1, fullHeight - y)
if (x == 0 && y == 0 && width == fullWidth && height == fullHeight) return normalized
val cropped = Bitmap.createBitmap(normalized, x, y, width, height)
normalized.recycle()
return cropped
}
private fun displayMetrics(): DisplayMetrics {
val metrics = DisplayMetrics()
@Suppress("DEPRECATION")
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.R) {
val display = context.display ?: context.getSystemService(WindowManager::class.java)?.defaultDisplay
display?.getRealMetrics(metrics)
} else {
val wm = context.getSystemService(Context.WINDOW_SERVICE) as WindowManager
wm.defaultDisplay.getRealMetrics(metrics)
}
if (metrics.widthPixels <= 0 || metrics.heightPixels <= 0) {
metrics.setTo(context.resources.displayMetrics)
}
return metrics
}
}
<?xml version="1.0" encoding="utf-8"?>
<adaptive-icon xmlns:android="http://schemas.android.com/apk/res/android">
<background android:drawable="@drawable/ic_launcher_background" />
<foreground android:drawable="@drawable/ic_launcher_foreground" />
</adaptive-icon>
<?xml version="1.0" encoding="utf-8"?>
<adaptive-icon xmlns:android="http://schemas.android.com/apk/res/android">
<background android:drawable="@drawable/ic_launcher_background" />
<foreground android:drawable="@drawable/ic_launcher_foreground" />
</adaptive-icon>
......@@ -57,13 +57,49 @@ Current implemented actions:
- `list_cameras`: implemented with Camera2 camera inventory and JPEG size hints.
- `get_camera_status`: implemented with backend/status metadata.
- `capture_frame`: implemented with Camera2 + `ImageReader`, returning base64 JPEG when camera permission is granted.
- `capture_video`: still incomplete; this needs a longer-lived visible recording pipeline and storage/streaming policy.
- `capture_video`: now has a defined honest stock-Android architecture: the gateway command must hand off to a visible foreground `VideoCaptureActivity` that shows preview/recording UI, records for a bounded duration, and hands the resulting file back to the foreground service for upload/response. This is intentionally *not* a hidden background recorder.
- `inject_video`: explicitly unsupported on normal Android; no fake parity.
### `browser_control`
Not first milestone. Basic browser launching/open URL belongs in `computer_control.open_url`. Full browser automation can be added later if there is a clean Android backend.
### `notification_access`
Android does not map notification reading cleanly onto an existing Linux desktop capability, so this should be treated as an Android-specific extension surfaced through capability metadata and/or a dedicated gateway action family if the gateway grows one.
Planned honest actions:
- `get_notification_status`: report whether `NotificationListenerService` is enabled.
- `list_notifications`: return active notifications with package name, app label, post time, title, text preview, conversation/channel hints, and whether a `PendingIntent` launch target exists.
- `open_notification`: where possible, trigger the associated `PendingIntent` or launch the source app/package.
- `dismiss_notification`: only where Android permits cancellation by a listener service.
Boundaries:
- Notification *contents* are permission-gated and user-visible in Android settings.
- Historic notifications are not guaranteed unless the app stores its own recent cache.
- Replying inline should be considered a later feature, because `RemoteInput` handling is app-specific and easy to fake badly.
### `file_transfer`
Android filesystem access also deserves explicit treatment instead of pretending parity with Linux `exec` or desktop shell semantics.
Planned honest actions:
- `list_roots`: describe app-private storage, cache, and any user-granted SAF tree roots.
- `list_directory`: enumerate files/directories within app-private paths or Storage Access Framework roots the user has explicitly granted.
- `stat_path`: metadata for a visible path handle.
- `download_file`: return file bytes/base64 for files inside allowed roots and within size limits.
- `upload_file`: write a provided base64 payload into an app-private directory or a previously user-authorized SAF tree.
- `delete_path`: optional, but only inside allowed roots and with extremely conservative semantics.
Boundaries:
- No pretending the app can browse arbitrary `/sdcard` or other packages’ sandboxes without SAF/user grant.
- Paths should likely be abstract handles (`app://cache/...`, `saf://tree/...`) rather than raw Unix paths, to keep protocol semantics honest.
- Large file transfer should support chunking or temporary-file staging instead of trying to inline everything forever.
### `exec`
Disabled by default. Android shell execution is SELinux/app-sandbox constrained and not Linux-equivalent. If enabled later, it must implement the same allow/deny/ask model as Linux and clearly report limitations.
......@@ -77,6 +113,7 @@ Android requires explicit grants for the interesting bits:
- MediaProjection for screenshots/screen recording.
- Camera permission for capture.
- Microphone permission for input capture.
- Notification listener permission for notification bridge if added.
- Notification listener permission for notification bridge if enabled.
- Storage Access Framework tree permission for directory traversal outside app-private storage.
The app must surface these as setup/status checks, not pretend they are available.
# Hermes Node Android logo spec
Source logo: `/home/lisa/.hermes/image_cache/img_776dcf5c9977.jpg`
## Visual identity
- Dark rounded-square badge / app-tile container
- Central Hermes/caduceus-inspired mark
- Cyan/blue vertical staff
- Orange-gold serpent wrapping the center
- Symmetrical wings
- Segmented orbital arcs suggesting telemetry/network flow
- Three lower connected nodes reinforcing the "node agent" concept
## README usage
Use the full lockup/logo image in the top section of `README.md`.
## Android icon usage
Use a simplified symbol-only adaptive icon:
- no text
- keep serpent, staff, wings, and lower nodes
- reduce circular arc detail
- preserve dark navy background
- preserve blue/cyan + orange/gold palette
- leave comfortable padding for adaptive icon masks
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment