Xiangru (Edward) Jian

@edwardjian.bsky.social

CS PhD student at University of Waterloo. Visiting Researcher at ServiceNow Research. Working on AI and DB.

7 Followers  |  8 Following  |  10 Posts  |  Joined: 17.11.2024

Latest posts by edwardjian.bsky.social on Bluesky

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, @edwardjian.bsky.social, Xi He, @philiptorr.bsky.social

tl;dr: great stuff!

arxiv.org/abs/2505.21497

28.05.2025 14:43 · 👍 3    🔁 2    💬 1    📌 0

🚀 Excited to share that UI-Vision has been accepted at ICML 2025! 🎉

We have also released the UI-Vision grounding dataset. Test your agents on it now! 🚀

🤗 Dataset: huggingface.co/datasets/Ser...

#ICML2025 #AI #DatasetRelease #Agents

15.05.2025 14:14 · 👍 0    🔁 1    💬 0    📌 0
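
For anyone who wants to try it, here is a minimal sketch of pulling the grounding data with the Hugging Face `datasets` library. The dataset ID and split name below are placeholders (the link above is truncated), so substitute the full ID from the post:

    from datasets import load_dataset

    # Placeholder: fill in the full dataset name from the truncated
    # Hugging Face link in the post above.
    DATASET_ID = "ServiceNow/..."

    ds = load_dataset(DATASET_ID, split="test")  # split name is an assumption
    print(ds[0])  # inspect one grounding example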

Huge thanks to my wonderful coauthor
@shravannayak.bsky.social and the amazing team at @servicenowresearch.bsky.social and @mila-quebec.bsky.social.

24.03.2025 17:08 · 👍 1    🔁 0    💬 0    📌 0

We want UI-Vision to be the go-to benchmark for desktop GUI agents.
📢 Data, benchmarks & code coming soon!
💡 Next: scaling training data & models for long-horizon tasks.
Let’s build, benchmark & push GUI agents forward 🚀

24.03.2025 17:08 · 👍 1    🔁 0    💬 1    📌 0

Interacting with desktop GUIs remains a challenge.
🖱️ Models struggle with click & drag actions due to poor grounding and limited motion understanding.
🏆 UI-TARS leads across models!
🧠 Closed models (GPT-4o, Claude, Gemini) excel at planning but fail to localize.

24.03.2025 17:08 · 👍 0    🔁 0    💬 1    📌 0

Detecting functional UI regions is tough!
🤖 Even top GUI agents miss functional regions.
🏆 Closed-source VLMs shine with stronger visual understanding.
📉 Cluttered UIs bring down IoU.
🚀 We’re the first to propose this task.

24.03.2025 17:08 · 👍 0    🔁 0    💬 1    📌 0
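
(Aside: IoU here is the standard intersection-over-union score between a predicted region box and the ground-truth box. A minimal sketch, not the official UI-Vision evaluation code:)

    def iou(a, b):
        """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union else 0.0

    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143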

Grounding UI elements is challenging!
🤖 Even top VLMs struggle with fine-grained GUI grounding.
📊 GUI agents like UI-TARS (25.5%) & UGround (23.2%) do better but still fall short.
⚠️ Small elements, dense UIs, and limited domain/spatial understanding are major hurdles.

24.03.2025 17:08 · 👍 0    🔁 0    💬 1    📌 0
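
(Aside: element-grounding accuracy is commonly scored by checking whether the predicted click point lands inside the ground-truth bounding box. A minimal sketch under that assumption; function names are illustrative, not from the UI-Vision release:)

    def point_in_box(point, box):
        """True if a predicted (x, y) click falls inside an (x1, y1, x2, y2) box."""
        (x, y), (x1, y1, x2, y2) = point, box
        return x1 <= x <= x2 and y1 <= y <= y2

    def grounding_accuracy(pred_points, gold_boxes):
        """Fraction of predictions whose click point hits the target element."""
        hits = sum(point_in_box(p, b) for p, b in zip(pred_points, gold_boxes))
        return hits / len(gold_boxes)

    # One hit out of two predictions -> 0.5
    print(grounding_accuracy([(5, 5), (50, 50)],
                             [(0, 0, 10, 10), (0, 0, 10, 10)]))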

We propose three key benchmark tasks to evaluate GUI Agents
🔹 Element Grounding – Identify a UI element from a text description
🔹 Layout Grounding – Understand UI layout structure & group elements
🔹 Action Prediction – Predict the next action given a goal, past actions & screen state

24.03.2025 17:08 · 👍 1    🔁 0    💬 1    📌 0
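
(Aside: to make the action-prediction task above concrete, here is a purely illustrative sketch of how one example could be laid out. Field names are hypothetical and not taken from the UI-Vision release:)

    # Hypothetical record layout for an action-prediction example:
    # goal + past actions + current screen in, next action out.
    example = {
        "goal": "Export the current document as a PDF",
        "history": [{"action": "click", "target": "File menu"}],
        "screenshot": "screen_0001.png",  # current screen state
        "next_action": {"action": "click", "target": "Export as PDF"},
    }
    print(example["next_action"])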

UI-Vision consists of:
✅ 83 open-source desktop apps across 6 domains
✅ 450 human demonstrations of computer-use workflows
✅ Dense human-annotated bounding boxes for UI elements and rich action trajectories

24.03.2025 17:08 · 👍 0    🔁 0    💬 1    📌 0

Most GUI benchmarks focus on web or mobile.
🖥️ But what about desktop software, where most real work happens?
UI-Vision fills this gap by providing a large-scale benchmark with diverse and dense annotations to systematically evaluate GUI agents.

24.03.2025 17:08 · 👍 0    🔁 0    💬 1    📌 0

🚀 Super excited to announce UI-Vision: the largest and most diverse benchmark for evaluating GUI agents on real-world desktop software in offline settings.

📄 Paper: arxiv.org/abs/2503.15661
🌐 Website: uivision.github.io

🧵 Key takeaways 👇

24.03.2025 17:08 · 👍 3    🔁 1    💬 1    📌 2
