Patrice Bechard

@patricebechard.bsky.social

Applied Research Scientist working on LLMs at @ServiceNow. Opinions are my own.

12 Followers  |  86 Following  |  19 Posts  |  Joined: 13.11.2024

Latest posts by patricebechard.bsky.social on Bluesky

StarFlow: Generating Structured Workflow Outputs From Sketch Images
Workflows are a fundamental component of automation in enterprise platforms, enabling the orchestration of tasks, data processing, and system integrations. Despite being widely used, building workflow...

From notebook to workflow, just by sketching.
That's the vision.

🔗 arxiv.org/abs/2503.21889
📝 tinyurl.com/3utdbn97

Thanks to @joanrod.bsky.social, @perouz.bsky.social, @spandanagella.bsky.social and all co-authors!
#AI #VLM #WorkflowAutomation #Sketch2Flow #arXiv

29.05.2025 03:34

🔍 Extra findings:

• Models struggle most with handwritten & whiteboard sketches
• UI screenshots are easiest
• End-to-end generation beats decomposed pipelines
• Finetuning on diverse sketch data is key to generalization

29.05.2025 03:34

📊 We benchmarked leading proprietary VLMs (GPT-4o, Claude, Gemini) against open-weight models (Qwen, LLaMA, Pixtral).

📈 Finetuned open models outperform proprietary ones:

Qwen2.5-VL-7B → FlowSim: 0.614
GPT-4o → FlowSim: 0.786
Qwen2.5-VL-7B (finetuned) → FlowSim: 0.957

29.05.2025 03:34

🧠 We built a large dataset (22K+ samples) of workflow diagrams:

• Synthetic (Graphviz)
• Manual (hand-drawn)
• Whiteboard
• Digital
• UI screenshots

These were paired with structured JSON workflow outputs for training and evaluation.

29.05.2025 03:34
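To make the image-to-JSON pairing concrete, here is a minimal Python sketch of what one training sample might look like. The field names (`trigger`, `steps`, `inputs`), the step names, and the image path are all hypothetical; the dataset's real JSON schema is defined in the paper and is not reproduced here. `parse_workflow` (also hypothetical) shows the reverse direction: turning a model's JSON output back into typed objects, failing loudly on malformed output.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    name: str                                           # hypothetical step identifier
    inputs: dict[str, str] = field(default_factory=dict)


@dataclass
class Workflow:
    trigger: str                                        # hypothetical trigger name
    steps: list[Step] = field(default_factory=list)


@dataclass
class TrainingSample:
    image_path: str   # path to the sketch, whiteboard photo, or screenshot
    source: str       # e.g. "synthetic", "manual", "whiteboard", "digital", "ui"
    target: Workflow  # structured output the model is trained to emit


def parse_workflow(text: str) -> Workflow:
    """Parse a model's JSON output into a Workflow (raises on bad JSON or missing keys)."""
    data = json.loads(text)
    return Workflow(
        trigger=data["trigger"],
        steps=[Step(s["name"], s.get("inputs", {})) for s in data["steps"]],
    )


sample = TrainingSample(
    image_path="sketches/approval_flow.png",
    source="whiteboard",
    target=Workflow(
        trigger="record_created",
        steps=[
            Step("ask_for_approval", {"approver": "manager"}),
            Step("send_email", {"to": "requester"}),
        ],
    ),
)

# Serialize the target the way a training label might be stored.
target_json = json.dumps(asdict(sample.target), indent=2)
print(target_json)
```

Round-tripping through `parse_workflow` is one cheap way to check that a generated output is at least structurally valid before scoring it.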

Why?

Workflow automation is powerful, but authoring flows is still complex, even with low-code tools.
💫StarFlow explores a simpler interface: just draw it.

Imagine sketching a workflow on a whiteboard and getting a runnable flow in return.

29.05.2025 03:34

🚀 New paper from our team at @servicenowresearch.bsky.social!

💫StarFlow: Generating Structured Workflow Outputs From Sketch Images
We use VLMs to turn hand-drawn sketches and diagrams into executable workflows 🖍️→⚙️

🔗 arxiv.org/abs/2503.218...
📝 tinyurl.com/3utdbn97%E2%...
#Sketch2Flow #AI #VLM

29.05.2025 03:34
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Retrieval-Augmented Generation (RAG) has become ubiquitous when deploying Large Language Models (LLMs), as it can address typical limitations such as generating hallucinated or outdated information. H...

🔍 Want to learn more? Check out our paper to see how to:

* Build balanced training datasets for real-world tasks
* Handle data imbalance
* Design for at-scale deployment

arxiv.org/abs/2501.04652

09.01.2025 15:46
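The post lists handling data imbalance but does not say how it is done. One common technique for multi-task training, shown here as an assumption rather than the paper's method, is temperature-based sampling: task i is drawn with probability proportional to n_i^(1/T), where n_i is its number of examples. The task names and sizes below are made up for illustration.

```python
import random


def sampling_probs(task_sizes: dict[str, int], temperature: float = 2.0) -> dict[str, float]:
    """Temperature-based sampling: p_i is proportional to n_i ** (1 / T).

    T = 1 keeps the natural (imbalanced) proportions; larger T flattens them
    toward uniform, so small tasks are seen more often during training.
    """
    weights = {task: n ** (1.0 / temperature) for task, n in task_sizes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


def sample_task_sequence(probs: dict[str, float], k: int, seed: int = 0) -> list[str]:
    """Draw which task each of the next k training batches comes from."""
    rng = random.Random(seed)
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=k)


# Hypothetical task sizes, for illustration only.
sizes = {"step_retrieval": 90_000, "table_retrieval": 9_000, "field_retrieval": 1_000}
natural = sampling_probs(sizes, temperature=1.0)
flattened = sampling_probs(sizes, temperature=2.0)
print(natural)
print(flattened)
print(sample_task_sequence(flattened, k=8))
```

With T = 1 the smallest task gets 1% of batches; with T = 2 it gets about 7%, at the cost of revisiting its examples more often.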

🌟 Key Features:

* One retriever for many use cases
* Works across languages! 🌍
* Handles structured data like workflows
* Lightweight & fast for production
* Generalizes to new domains & tasks

09.01.2025 15:46
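The key features above include one retriever serving many use cases. A common way to set that up, assumed here rather than taken from the paper, is to prefix every query with a task-specific instruction so a single encoder and a single index handle all tasks. The instruction strings are hypothetical, and `embed` is a toy bag-of-words stand-in for a fine-tuned neural encoder, so only the plumbing is meaningful, not the scores.

```python
import math
import re
from collections import Counter

# Hypothetical per-task instructions; real instruction wording is an assumption.
INSTRUCTIONS = {
    "step": "retrieve the workflow step for: ",
    "table": "retrieve the database table for: ",
}


def format_query(task: str, query: str) -> str:
    """Prepend the task instruction so one shared encoder can serve every task."""
    return INSTRUCTIONS[task] + query


def embed(text: str) -> Counter:
    """Toy bag-of-words 'encoder' standing in for a fine-tuned neural model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(task: str, query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents for any task with the same encoder and the same index."""
    q = embed(format_query(task, query))
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


corpus = ["send email notification", "incident table", "create approval step"]
print(retrieve("table", "incident records", corpus))
print(retrieve("step", "send an email", corpus))
```

Swapping `embed` for a real fine-tuned encoder leaves the rest unchanged, which is the appeal of a single multi-task retriever: one model and one index instead of one per application.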

📊 Our Results:

Multi-task instruction fine-tuning FTW! Our approach beats both BM25 and strong off-the-shelf encoder models across all retrieval tasks (in-distribution and out-of-distribution).

09.01.2025 15:46

💡 The Challenge:

* RAG needs domain-specific knowledge
* Multiple apps = multiple retrievers = 💰
* Different types of data (steps, tables, fields, ...)

09.01.2025 15:46

🚀 Excited to share our new work on making RAG actually work for enterprise applications!
We present a recipe to build a custom retriever that handles multiple retrieval tasks simultaneously for domain-specific RAG applications 🧵

09.01.2025 15:46

We're really excited to release this large collaborative work unifying web agent benchmarks under one roof.

In this TMLR paper, we dive in depth into #BrowserGym and #AgentLab. We also report some unexpected results from Claude 3.5 Sonnet.

12.12.2024 17:55

🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We're also launching BigDocs-Bench:
➡️ Document, web, and GUI visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!

10.12.2024 18:34
Generating a Low-code Complete Workflow via Task Decomposition and RAG
AI technologies are moving rapidly from research to production. With the popularity of Foundation Models (FMs) that generate text, images, and video, AI-based systems are increasing their complexity. ...

Ready to learn more? Check out our full paper here: arxiv.org/abs/2412.00239

If this sounds exciting, follow us! We've got more papers and insights on the way, so don't miss out! 🚀

03.12.2024 15:15

Finally, we outline trade-offs and practical considerations, from latency improvements to deployment strategies. If you're designing GenAI systems, this is a goldmine of insights!

03.12.2024 15:15

Evaluation was key: we developed a novel tree-based metric, Flow Similarity, to assess workflow correctness. Plus, we measured each sub-task and RAG component separately for fine-grained insights.

03.12.2024 15:15
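The post does not define Flow Similarity, so the sketch below is not FlowSim. It is a simpler, swapped-in tree-based score, shown only to illustrate how a predicted workflow tree can be compared with a reference: each node is identified by its root-to-node label path, and the score is the F1 between the two multisets of paths. The node labels and the `(label, children)` tree encoding are hypothetical.

```python
from collections import Counter

# A workflow tree node: (label, children). Children hold nested steps, e.g. under an "if".
Tree = tuple[str, list["Tree"]]


def node_paths(tree: Tree, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the root-to-node label path of every node in the tree."""
    label, children = tree
    path = prefix + (label,)
    paths = [path]
    for child in children:
        paths.extend(node_paths(child, path))
    return paths


def path_f1(predicted: Tree, reference: Tree) -> float:
    """F1 over the multisets of node paths: 1.0 for identical trees, 0.0 for disjoint ones."""
    pred = Counter(node_paths(predicted))
    ref = Counter(node_paths(reference))
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = ("flow", [("ask_for_approval", []),
                      ("if", [("send_email", []), ("update_record", [])])])
predicted = ("flow", [("ask_for_approval", []),
                      ("if", [("send_email", [])])])
print(path_f1(predicted, reference))
```

Here the prediction misses one nested step, so precision is 1.0, recall is 0.8, and the score is 8/9 (about 0.889). Unlike an edit-distance metric, this score ignores sibling order.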

We dive deep into dataset creation, discussing how Task Decomposition guided our labeling efforts. By focusing on smaller tasks, we sped up labeling, reduced costs, and iteratively improved our system.

03.12.2024 15:15

RAG enhances the system by grounding the generation process in real-time data from the environment. This reduces hallucinations and ensures that the generated workflows are accurate and context-aware.

03.12.2024 15:15
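A minimal sketch of retrieval grounding, assuming the environment exposes a catalog of available step types: the requirement is matched against the catalog, and only the retrieved steps are placed in the prompt, so the model chooses from steps that actually exist. The catalog entries, the prompt wording, and the word-overlap scoring are hypothetical stand-ins; a production system would typically use a trained retriever instead.

```python
import re

# Hypothetical catalog of step types available in the user's environment.
CATALOG = {
    "send_email": "Send an email notification to a user or group",
    "ask_for_approval": "Request approval from a manager or approver",
    "create_incident": "Create a new incident record",
    "update_record": "Update fields on an existing record",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_steps(requirement: str, k: int = 2) -> list[str]:
    """Rank catalog entries by word overlap with the requirement (toy retriever)."""
    query = tokens(requirement)
    ranked = sorted(CATALOG, key=lambda name: len(query & tokens(CATALOG[name])), reverse=True)
    return ranked[:k]


def build_prompt(requirement: str, k: int = 2) -> str:
    """Ground the generation prompt in the retrieved, actually available steps."""
    options = "\n".join(f"- {name}: {CATALOG[name]}" for name in retrieve_steps(requirement, k))
    return (
        "Build a workflow using only these available steps:\n"
        f"{options}\n\nRequirement: {requirement}\nWorkflow:"
    )


requirement = "Ask the manager for approval, then email the requester"
print(build_prompt(requirement))
```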

Task Decomposition allows us to split the workflow generation into two sub-tasks:

1. Outlining the workflow structure
2. Populating inputs for each step

Each sub-task is easier to solve and test, boosting the system's modularity and maintainability.

03.12.2024 15:15
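The two sub-tasks described above can be sketched as two separate model calls: one that only outlines the ordered steps, and one that fills in the inputs for a single step. The `LLM` callable, the prompts, and `fake_llm` are hypothetical; the fake model just returns canned replies so each stage can be run and tested on its own, which is the modularity benefit the post describes.

```python
import json
from typing import Callable

# Hypothetical LLM interface: takes a prompt string, returns the model's text.
LLM = Callable[[str], str]


def outline_workflow(requirement: str, llm: LLM) -> list[str]:
    """Sub-task 1: ask only for the ordered step names (the workflow structure)."""
    reply = llm(f"List the workflow steps, one per line, for: {requirement}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def populate_inputs(requirement: str, step: str, llm: LLM) -> dict:
    """Sub-task 2: fill in the inputs of one step, returned as a JSON object."""
    reply = llm(f"Requirement: {requirement}\nStep: {step}\nReturn the step inputs as JSON.")
    return json.loads(reply)


def generate_workflow(requirement: str, llm: LLM) -> list[dict]:
    """Chain the two sub-tasks: outline first, then populate each step."""
    steps = outline_workflow(requirement, llm)
    return [{"step": s, "inputs": populate_inputs(requirement, s, llm)} for s in steps]


def fake_llm(prompt: str) -> str:
    """Canned replies so each stage can be exercised without a real model."""
    if prompt.startswith("List the workflow steps"):
        return "ask_for_approval\nsend_email\n"
    if "Step: ask_for_approval" in prompt:
        return '{"approver": "manager"}'
    return '{"to": "requester"}'


print(generate_workflow("Get manager approval, then email the requester", fake_llm))
```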

We tackle a real-world use case: Workflow Generation. Given a user requirement in natural language, our system generates complex workflows step by step. This involves breaking the problem into smaller, manageable tasks.

03.12.2024 15:15

Looking to build an LLM-powered app but finding it hard to make it robust? We've got you covered! Our new paper explores how Task Decomposition and Retrieval-Augmented Generation (RAG) can help you create reliable systems. 🧵👇

03.12.2024 15:15