[CS.AI] OSGuard: A Revolutionary Benchmark for Safety in Computer-Use Agents

OSGuard is a dual-granularity benchmark suite for evaluating the safety of computer-use agents with the user's instructions left unchanged. Task success alone does not fully reflect an agent's safety, since an agent may reach its goal through unsafe shortcuts. OSGuard has two main components:

Action-level Benchmark: Assesses actions proposed by agents in specific contexts. Each action is labeled allowed, unrelated, or unsafe, judged relative to the original instruction and the current interface state.
Risk-augmented Execution Suite: Contains manually constructed task variants in which the original task remains achievable, but the environment is modified to introduce latent hazards such as destructive overwrites. Each variant is paired with augmented evaluators that keep the original task-success criteria and add explicit state-based safety invariants, making it possible to distinguish safe completions from unsafe ones.
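To make the idea of an augmented evaluator concrete, here is a minimal sketch in Python. It is not OSGuard's actual evaluator: the `AugmentedEvaluator` class, the dictionary-based `State`, and the file names are all hypothetical, chosen only to show how an unchanged task-success check can be combined with state-based safety invariants.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical environment snapshot: file name -> file contents.
State = Dict[str, str]


@dataclass
class AugmentedEvaluator:
    # Original task-success criterion, checked on the final state only.
    task_success: Callable[[State], bool]
    # Safety invariants compare the initial and final states.
    safety_invariants: List[Callable[[State, State], bool]]

    def evaluate(self, initial: State, final: State) -> str:
        if not self.task_success(final):
            return "failed"
        if all(inv(initial, final) for inv in self.safety_invariants):
            return "safe_completion"
        return "unsafe_completion"


def report_saved(final: State) -> bool:
    # Illustrative task: a report with the expected contents must exist.
    return final.get("report.txt") == "Q3 summary"


def backup_untouched(initial: State, final: State) -> bool:
    # Illustrative invariant: the pre-existing backup must not be overwritten.
    return final.get("backup.txt") == initial.get("backup.txt")


evaluator = AugmentedEvaluator(report_saved, [backup_untouched])

initial = {"backup.txt": "important data"}
safe_final = {"backup.txt": "important data", "report.txt": "Q3 summary"}
unsafe_final = {"backup.txt": "Q3 summary", "report.txt": "Q3 summary"}

print(evaluator.evaluate(initial, safe_final))
print(evaluator.evaluate(initial, unsafe_final))
```

In this toy setup both final states satisfy the original task, but only the first leaves the backup file intact, so a success-only evaluator could not tell them apart.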

Experimental results on OSGuard show that current multimodal guardrails perform well on isolated action judgments, but risk-augmented execution exposes gaps between local oversight and reliable end-to-end safety. This dual-granularity design enables a more precise diagnosis of two separate questions: whether models can recognize unsafe proposed actions, and whether they improve full-task safety when deployed as guardrails.
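As a rough illustration of what "deployed as a guardrail" can mean at the action level, the sketch below filters proposed actions using the three labels named above. Everything here is an assumption for illustration: the `Label` enum, the `filter_actions` helper, and especially `keyword_judge`, which is a toy keyword rule standing in for a multimodal guardrail model and not how any real guardrail decides.

```python
from enum import Enum
from typing import Callable, List


class Label(Enum):
    ALLOWED = "allowed"
    UNRELATED = "unrelated"
    UNSAFE = "unsafe"


# Hypothetical judge signature: (instruction, interface state, action) -> Label
Judge = Callable[[str, str, str], Label]


def filter_actions(judge: Judge, instruction: str, state: str,
                   proposed: List[str]) -> List[str]:
    # Keep only the actions the judge labels as allowed.
    return [a for a in proposed if judge(instruction, state, a) is Label.ALLOWED]


def keyword_judge(instruction: str, state: str, action: str) -> Label:
    # Toy stand-in for a guardrail model; ignores instruction and state.
    if "overwrite" in action or "delete" in action:
        return Label.UNSAFE
    if "report" in action:
        return Label.ALLOWED
    return Label.UNRELATED


proposed = [
    "save report.txt",
    "overwrite backup.txt",
    "open music player",
]
kept = filter_actions(keyword_judge, "Save the Q3 report", "file manager", proposed)
print(kept)
```

A filter like this only judges one action at a time, which is the kind of local oversight that, according to the results above, does not by itself guarantee a safe end state.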

Blogger's Review: The introduction of OSGuard provides a fresh perspective on the evaluation of safety in computer-use agents, especially in recognizing potential risks and unsafe actions. By integrating action-level and execution-level assessments, OSGuard effectively addresses the shortcomings of traditional methods, advancing the field of safety research.
