 
                        
                StarFlow: Generating Structured Workflow Outputs From Sketch Images
                Workflows are a fundamental component of automation in enterprise platforms, enabling the orchestration of tasks, data processing, and system integrations. Despite being widely used, building workflow...
            
        
    
    
            From notebook to workflow, just by sketching.
That's the vision.
arxiv.org/abs/2503.21889
tinyurl.com/3utdbn97
Thanks to @joanrod.bsky.social, @perouz.bsky.social, @spandanagella.bsky.social and all co-authors!
#AI #VLM #WorkflowAutomation #Sketch2Flow #arXiv
               
            
            
                29.05.2025 03:34
            
         
            
        
            
            
            
            
            
    
    
    
    
            Extra findings:
• Models struggle most with handwritten & whiteboard sketches
• UI screenshots are easiest
• End-to-end generation beats decomposed pipelines
• Finetuning on diverse sketch data is key to generalization
               
            
            
                29.05.2025 03:34
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            We benchmarked top VLMs (GPT-4o, Claude, Gemini) vs. open-weight models (Qwen, LLaMA, Pixtral).
Finetuned open models outperform proprietary ones:
Qwen2.5-VL-7B → FlowSim: 0.614
GPT-4o → FlowSim: 0.786
Qwen2.5-VL-7B (finetuned) → FlowSim: [value unrecoverable]
               
            
            
                29.05.2025 03:34
            
         
            
        
            
            
            
            
            
    
    
    
    
            🧠 We built a large dataset (22K+ samples) of workflow diagrams:
• Synthetic (Graphviz)
• Manual (hand-drawn)
• Whiteboard
• Digital
• UI screenshots
These were paired with structured JSON workflow outputs for training and evaluation.
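
As a rough illustration of what one image/JSON pair could look like, here is a minimal Python sketch. The field names (`image_path`, `source`, `target`, `trigger`, `steps`, `inputs`), the source labels, and the example flow are all assumptions made for illustration, not the dataset's actual schema.

```python
import json

# Hypothetical source labels mirroring the five diagram types listed above.
SOURCES = {"synthetic", "manual", "whiteboard", "digital", "ui"}

def make_sample(image_path, source, workflow):
    """Pair a diagram image with its target workflow, serialized as JSON."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    return {
        "image_path": image_path,
        "source": source,
        # The model is trained to emit this string given the image.
        "target": json.dumps(workflow, sort_keys=True),
    }

# Illustrative workflow; step and field names are made up.
workflow = {
    "trigger": {"type": "record_created", "inputs": {"table": "incident"}},
    "steps": [
        {"type": "send_email", "inputs": {"to": "owner"}},
    ],
}

sample = make_sample("sketches/0001.png", "whiteboard", workflow)
print(sample["target"])
```

Serializing with `sort_keys=True` keeps targets stable across runs, which makes string-level comparisons during evaluation less noisy.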
               
            
            
                29.05.2025 03:34
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Why?
Workflow automation is powerful, but authoring flows is still complex, even with low-code tools.
StarFlow explores a simpler interface: just draw it.
Imagine sketching a workflow on a whiteboard and getting a runnable flow in return.
               
            
            
                29.05.2025 03:34
            
         
            
        
            
            
                            
            
            
            
    
    
    
    
            New paper from our team at @servicenowresearch.bsky.social!

StarFlow: Generating Structured Workflow Outputs From Sketch Images
We use VLMs to turn hand-drawn sketches and diagrams into executable workflows.

arxiv.org/abs/2503.218...
tinyurl.com/3utdbn97%E2%...
#Sketch2Flow #AI #VLM
               
            
            
                29.05.2025 03:34
            
         
            
        
            
        
            
            
            
            
            
    
    
    
    
            Key Features:
* One retriever for many use cases
* Works across languages!
* Handles structured data like workflows
* Lightweight & fast for production
* Generalizes to new domains & tasks
               
            
            
                09.01.2025 15:46
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Our Results:
Multi-task instruction fine-tuning FTW! Our approach beats both BM25 and strong off-the-shelf encoder models across all retrieval tasks (in-distribution and out-of-distribution).
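
One common way to set up multi-task instruction fine-tuning for a single retriever is to prepend a task-specific instruction to every query, so one encoder can tell the tasks apart. The sketch below only shows that data-formatting step; the instruction strings, task names, and the `format_query` and `build_pairs` helpers are hypothetical and may differ from what the paper does.

```python
# Hypothetical task instructions; the actual prompts are not given in the post.
INSTRUCTIONS = {
    "steps": "Retrieve workflow steps relevant to the request.",
    "tables": "Retrieve table names relevant to the request.",
    "fields": "Retrieve field names relevant to the request.",
}

def format_query(task: str, query: str) -> str:
    """Prefix the query with its task instruction so one encoder serves all tasks."""
    return f"{INSTRUCTIONS[task]} Query: {query}"

def build_pairs(examples):
    """Turn (task, query, positive) triples into (text, positive) training pairs."""
    return [(format_query(task, query), positive)
            for task, query, positive in examples]

# Toy examples mixing two retrieval tasks in one training set.
pairs = build_pairs([
    ("steps", "notify the manager", "Send Email"),
    ("tables", "open incidents", "incident"),
])
for text, positive in pairs:
    print(text, "->", positive)
```

The resulting pairs could then feed a standard contrastive training loop; that part is omitted here because the post does not describe it.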
               
            
            
                09.01.2025 15:46
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            💡 The Challenge:
* RAG needs domain-specific knowledge
* Multiple apps = multiple retrievers = 💰
* Different types of data (steps, tables, fields, ...)
               
            
            
                09.01.2025 15:46
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Excited to share our new work on making RAG actually work for enterprise applications!
We present a recipe to build a custom retriever that handles multiple retrieval tasks simultaneously for domain-specific RAG applications 🧵
               
            
            
                09.01.2025 15:46
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            We're really excited to release this large collaborative work, which unifies web agent benchmarks under one roof.
In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected performance results from Claude 3.5 Sonnet.
               
            
            
                12.12.2024 17:55
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
• Documents
• Web content
• GUI understanding
• Code generation from images
We're also launching BigDocs-Bench:
➡️ Document, Web, GUI Visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!
               
            
            
                10.12.2024 18:34
            
         
            
        
            
        
            
            
            
            
            
    
    
    
    
            Finally, we outline trade-offs and practical considerations, from latency improvements to deployment strategies. If you're designing GenAI systems, this is a goldmine of insights!
               
            
            
                03.12.2024 15:15
            
         
            
        
            
            
            
            
            
    
    
    
    
            Evaluation was key: we developed a novel tree-based metric, Flow Similarity, to assess workflow correctness. Plus, we measured each sub-task and RAG component separately for fine-grained insights.
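
The post does not define Flow Similarity, so the following is only a toy stand-in to show the general idea of scoring two workflows as trees. It walks the trees in parallel, counts nodes whose names agree at the same position, and normalizes with a Dice-style ratio. The node format (`name`, `children`) and the function names are assumptions, and this is not the paper's metric.

```python
def size(node):
    """Number of nodes in a tree of {"name": ..., "children": [...]} dicts."""
    return 1 + sum(size(child) for child in node.get("children", []))

def matched(a, b):
    """Count nodes that agree when both trees are walked in parallel.

    A mismatch at a node discards its whole subtree, and children are
    compared by position only (no reordering or edit-distance alignment).
    """
    if a["name"] != b["name"]:
        return 0
    pairs = zip(a.get("children", []), b.get("children", []))
    return 1 + sum(matched(x, y) for x, y in pairs)

def toy_similarity(a, b):
    """Dice-style overlap: 2 * matched nodes / total nodes in both trees."""
    return 2 * matched(a, b) / (size(a) + size(b))

# Illustrative flows that differ only in the action inside the branch.
gold = {"name": "flow", "children": [
    {"name": "look_up_record"},
    {"name": "if", "children": [{"name": "send_email"}]},
]}
pred = {"name": "flow", "children": [
    {"name": "look_up_record"},
    {"name": "if", "children": [{"name": "send_sms"}]},
]}

print(toy_similarity(gold, gold))  # 1.0
print(toy_similarity(gold, pred))  # 0.75
```

A real tree metric would typically align children more flexibly (for example via tree edit distance) and also compare step inputs; the positional walk here is kept deliberately simple.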
               
            
            
                03.12.2024 15:15
            
         
            
        
            
            
            
            
            
    
    
    
    
            We dive deep into dataset creation, discussing how Task Decomposition guided our labeling efforts. By focusing on smaller tasks, we sped up labeling, reduced costs, and iteratively improved our system.
               
            
            
                03.12.2024 15:15
            
         
            
        
            
            
            
            
            
    
    
    
    
            RAG enhances the system by grounding the generation process in real-time data from the environment. This reduces hallucinations and ensures that the generated workflows are accurate and context-aware.
               
            
            
                03.12.2024 15:15
            
         
            
        
            
            
            
            
            
    
    
    
    
            Task Decomposition allows us to split the workflow generation into two sub-tasks:
1. Outlining the workflow structure
2. Populating inputs for each step
Each sub-task is easier to solve and test, boosting the system's modularity and maintainability.
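
The two sub-tasks can be sketched as a small pipeline in which each stage is a swappable function. This is a minimal illustration under stated assumptions: `generate_workflow`, `fake_outline`, and `fake_populate` are hypothetical names, and the two stand-in functions replace what would be model calls in a real system.

```python
from typing import Callable, Dict, List

Outline = List[str]
Step = Dict[str, object]

def generate_workflow(
    requirement: str,
    outline_fn: Callable[[str], Outline],
    populate_fn: Callable[[str, str], Dict[str, str]],
) -> List[Step]:
    """Stage 1 outlines step names; stage 2 fills inputs one step at a time."""
    outline = outline_fn(requirement)
    return [
        {"name": name, "inputs": populate_fn(requirement, name)}
        for name in outline
    ]

# Stand-ins for model calls, so each stage can be tested in isolation.
def fake_outline(requirement: str) -> Outline:
    return ["look_up_record", "send_email"]

def fake_populate(requirement: str, step_name: str) -> Dict[str, str]:
    return {"note": f"inputs for {step_name}"}

flow = generate_workflow(
    "Email the owner when an incident is opened",
    fake_outline,
    fake_populate,
)
print(flow)
```

Because the stages are passed in as functions, either one can be replaced or evaluated on its own, which is the modularity benefit the post describes.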
               
            
            
                03.12.2024 15:15
            
         
            
        
            
            
            
            
            
    
    
    
    
            We tackle a real-world use case: Workflow Generation. Given a user requirement in natural language, our system generates complex workflows step by step. This involves breaking the problem into smaller, manageable tasks.
               
            
            
                03.12.2024 15:15
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Looking to build an LLM-powered app but finding it hard to make it robust? We've got you covered! Our new paper explores how Task Decomposition and Retrieval-Augmented Generation (RAG) can help you create reliable systems. 🧵
               
            
            
                03.12.2024 15:15
            
         
    
         
        
            
        
                            
                    
                    
                                    
                            
                    
                    
                                            The 2025 Conference on Language Modeling will take place at the Palais des Congrès in Montreal, Canada from October 7-10, 2025
                                     
                            
                    
                    
                                            International Conference on Learning Representations  https://iclr.cc/
                                     
                            
                    
                    
                                            EMNLP 2025 - The annual Conference on Empirical Methods in Natural Language Processing
Dates: November 5-9, 2025 in Suzhou, China
Hashtags: #EMNLP2025 #NLP
Submission Deadline: May 19th, 2025
                                     
                            
                    
                    
                                            Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics
                                     
                            
                    
                    
                                            The world's largest academic research center in deep learning.
                                     
                            
                    
                    
                                            Writer http://jalammar.github.io. O'Reilly Author http://LLM-book.com. LLM Builder Cohere.com.
                                     
                            
                    
                    
                                            AI Researcher. Working on Multimodal AI at ServiceNow, Mila
joanrod.github.io
                                     
                            
                    
                    
                                            Google Chief Scientist, Gemini Lead. Opinions stated here are my own, not those of Google.  Gemini, TensorFlow, MapReduce, Bigtable, Spanner, ML things, ...
                                     
                            
                    
                    
                                            Prof (CS @Stanford), Co-Director @StanfordHAI, Cofounder/CEO @theworldlabs, CoFounder @ai4allorg #AI #computervision #robotics #AI-healthcare
                                     
                            
                    
                    
                                            Professor at NYU; Chief AI Scientist at Meta.
Researcher in AI, Machine Learning, Robotics, etc.
ACM Turing Award Laureate. 
http://yann.lecun.com
                                     
                            
                    
                    
                                    
                            
                    
                    
                                            Lead Research Scientist @servicenowresearch.bsky.social. All opinions my own.
                                     
                            
                    
                    
                                            Parker Distinguished Professor, @UNC. Program Chair #EMNLP2024. Director http://MURGeLab.cs.unc.edu (@uncnlp). @Berkeley_AI @TTIC_Connect @IITKanpur
 #NLP #CV #AI #ML
https://www.cs.unc.edu/~mbansal/
                                     
                            
                    
                    
                                            Associate professor at CMU, studying natural language processing and machine learning. Co-founder All Hands AI
                                     
                            
                    
                    
                                            Researcher in NLP, ML, computer music. Prof @uwcse @uwnlp & helper @allen_ai @ai2_allennlp & familiar to two cats.  Single reeds, tango, swim, run, cocktails, mame-loshn, GenX. Opinions not your business.
                                     
                            
                    
                    
                                            Professor at UW; Researcher at Meta. LMs, NLP, ML. PNW life.
                                     
                            
                    
                    
                                    
                            
                    
                    
                                            Research Scientist at Meta • ex Cohere, Google DeepMind • https://www.ruder.io/
                                     
                            
                    
                    
                                            Sr Mgr & Research Scientist @ServiceNowRSRCH, Montreal