Niladri Shekhar Dutt

@niladridutt.bsky.social

Research Intern @adobe.com | PhD @ucl.ac.uk | @ellis.eu | ex-Nvidia, Berkeley | Interested in generative modelling in vision and graphics + reasoning (LLMs) https://niladridutt.com/

1,915 Followers  |  742 Following  |  40 Posts  |  Joined: 16.11.2024  |  1.639

Latest posts by niladridutt.bsky.social on Bluesky


MonetGPT

🧵10/10 Lastly, huge thanks to my co-advisors Niloy and Duygu!
For more details, check out our paper below:

🌐 Project Website: monetgpt.github.io
📄 arXiv: arxiv.org/abs/2505.06176

27.05.2025 15:13 - 👍 1    🔁 0    💬 0    📌 0

🧵9/10 We quantitatively evaluate on the Adobe5k dataset and conduct user studies with expert and novice users. Our evaluations show that MonetGPT outperforms open-source alternatives and performs comparably to Google Photos AutoEnhance (closed-source).

27.05.2025 15:13 - 👍 2    🔁 0    💬 1    📌 0

🧵8/10 Photo editing is subjective 🎨. Our framework adapts to user preferences through natural language tags like 'vibrant' or 'retro vibe', producing personalized and stylistically distinct retouching plans from the same input image.

27.05.2025 15:13 - 👍 0    🔁 0    💬 1    📌 0

🧵7/10 Our puzzle-based training with a 'reasoning as a pathway' approach allows MonetGPT to generate detailed justifications for each edit, delivering truly explainable image retouching.

27.05.2025 15:13 - 👍 0    🔁 0    💬 1    📌 0

🧵6/10 🧩 Puzzle C builds planning capabilities. The model learns to generate a complete, multi-step retouching plan to enhance a photo, structuring its reasoning as a sequence of discrete issues and solutions for clarity and control.

27.05.2025 15:13 - 👍 0    🔁 0    💬 1    📌 0

🧵5/10 🧩 Puzzle B imparts aesthetic judgement. By ranking professionally edited photos against altered versions, the MLLM learns to recognize the visual characteristics of an optimally adjusted image for any given operation, building an internal aesthetic model.

27.05.2025 15:13 - 👍 0    🔁 0    💬 1    📌 0

🧵4/10 🧩 Puzzle A builds an understanding of individual operations. The MLLM learns to map visual changes in before/after images to a specific tool and its precise parameter value, effectively learning the semantics of our procedural library.

27.05.2025 15:13 - 👍 0    🔁 0    💬 1    📌 0

🧵3/10 Our key recipe: MLLMs struggle to predict edit values directly. We solve this by generating rich textual reasoning for each puzzle ✍️. We then fine-tune MonetGPT on this data, creating a 'reasoning pathway' that enables it to regress final adjustment values.

27.05.2025 15:13 - 👍 0    🔁 0    💬 1    📌 0

🧵2/10 MLLMs lack the visual understanding to plan edits. 🧠 So, we use expert photos as our ground truth and work backward, procedurally creating puzzles by assuming any change to an expert edit makes it less optimal.

27.05.2025 15:13 - 👍 0    🔁 0    💬 1    📌 0

🧵1/10 Excited to share our #SIGGRAPH paper "MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills" 🌟
We explore how to make MLLMs operation-aware by solving visual puzzles and propose a procedural framework for image retouching.
#MLLM

27.05.2025 15:13 - 👍 4    🔁 2    💬 1    📌 0
Hi Londoners

Join us on April 15 for an evening on Gen AI for 3D at UCL! We have an amazing list of keynote speakers and lightning talks. Register at londongenai.github.io

Very excited to co-organize this with Michael and Preddy!

10.04.2025 09:58 - 👍 3    🔁 0    💬 0    📌 0

Amazon came pretty late to India, and we already had some homegrown companies like Flipkart, which now competes with Amazon and is valued at $40B.
I think big tech's early access killed homegrown companies. China and, to an extent, South Korea (Naver) have some great tech companies because of barriers.

27.02.2025 11:50 - 👍 2    🔁 0    💬 1    📌 0

Who will tell the Silicon Valley tech bros that it wasn't them alone?

07.02.2025 20:34 - 👍 4    🔁 0    💬 0    📌 0

Haha exactly what I did today!

24.01.2025 17:27 - 👍 1    🔁 0    💬 0    📌 0
The image illustrates the evolution of cleaning tasks, balancing time (ATUS) and well-being (ATUS-WB). A central figure considers two options: Manual Labor (vacuum cleaner) and Automated Labor (robotic vacuum), connected by an orange arrow labeled B1-K.

🤔 What tasks do we want robots to handle? Are these preferences based on saved time or feelings we associate with the tasks?

Introducing "Why Automate This?", a study exploring automation preferences across social groups, using feelings & time spent as key factors. 👇 (1/5)

17.01.2025 16:13 - 👍 15    🔁 4    💬 2    📌 1

Have a great time in Seattle!

02.12.2024 19:40 - 👍 1    🔁 0    💬 0    📌 0

All the papers I've reviewed still have some reviewers who haven't participated in the discussion/replied to the rebuttal at all. This is true even after the authors have nudged the reviewers a few times already :(

25.11.2024 18:29 - 👍 4    🔁 0    💬 0    📌 0

Added you!

24.11.2024 22:37 - 👍 1    🔁 0    💬 0    📌 0

Added you!

24.11.2024 19:34 - 👍 0    🔁 0    💬 0    📌 0

Added you!

24.11.2024 19:30 - 👍 1    🔁 0    💬 0    📌 0

👋

24.11.2024 12:55 - 👍 0    🔁 0    💬 0    📌 0

@niladridutt is following 18 prominent accounts