junkuan

vision-language models

Notes on multimodal models that perceive and act.

VLM + RL + Data (Environment) = GUI Agent

Why reinforcement learning, tailored environments, and data pipelines are converging to make reliable GUI agents possible.

September 18, 2025 · 8 min read