
Jonathan Lorraine

@jonlorraine.bsky.social

Research scientist @NVIDIA | PhD in machine learning @UofT. Previously @Google / @MetaAI. Opinions are my own. πŸ€– πŸ’» β˜•οΈ

561 Followers  |  1,905 Following  |  44 Posts  |  Joined: 20.11.2024

Latest posts by jonlorraine.bsky.social on Bluesky

NVIDIA 2026 Internships: PhD Generative AI Research - US | NVIDIA Corporation By submitting your resume, you're expressing interest in one of our 2026 Generative AI focused Research Internships. We'll review resumes on an ongoing basis, and a recruiter may reach out if your exp...

Apply here: nvidia.eightfold.ai/careers?star...

I'm personally interested in multimodal generation and the tools that power it.

09.10.2025 04:57 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
NVIDIA Spatial Intelligence Lab (SIL) Advancing foundational technologies enabling AI systems to perceive, model, and interact with the world.

πŸ” New NVIDIA Spatial Intelligence Lab internship postings for 2026.

Come work with us to advance foundational technologies that enable AI systems to model and interact meaningfully with the world!

Topics on our homepage: research.nvidia.com/labs/sil/

Application link below

09.10.2025 04:56 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Video thumbnail

Join us at #CVPR2025 for a preview of this #NVIDIA tech during a live-coding session. A #GPU backend will be reserved for all attendees – just don’t forget to bring your laptop for some hands-on fun!
Wed, Jun 11, 8am-noon, or join in at 10:20 after the break. tinyurl.com/nv-kaolin-cv...

06.06.2025 21:20 β€” πŸ‘ 3    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0

We find a new set of use cases for Stable Audio Open ( @jordiponsdotme.bsky.social, @stabilityai.bsky.social, @hf.co) and other large pretrained audio generative models, like AudioLDM and beyond!

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Our work is inspired by and builds on the SDS update of DreamFusion (dreamfusion3d.github.io/, @benmpoole.bsky.social, @ajayjain9.bsky.social, @jonbarron.bsky.social), and related updates (VSD, SDI @vincentsitzmann.bsky.social, SJC, and many more!)

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸ’‘ SDS treats any differentiable parameter set as optimizable from a prompt. Source-guided separation emerged when we brainstormed novel uses. We hope for similarly practical tasks to surfaceβ€”e.g., automatic Foley layering?β€”as the community experiments.

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸš€ Vision of the Future: Content designers easily use one video + audio diffusion backbone with SDS-style updates to nudge any differentiable taskβ€”impacts, lighting, cloth, fluidsβ€”until the joint model says β€œlooks & sounds right” given powerful user controls, like text.

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

⚠️ Limitations ⚠️

Clip-Length Budget: We optimized on ≀10 s clips; minute-scale audio may have artifacts or blow up memory. A hierarchical/windowed Audio-SDS could help here.

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

⚠️ Limitations ⚠️

Audio-Model Bias: We rely on Stable Audio Open, so where it struggles, e.g., on rare instruments, speech, audio without silence at the end, or out-of-domain SFX, our method inherits those difficulties. Swapping in other diffusion models can help here.

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

This project was led by @jrichterpowell.bsky.social, in collaboration with Antonio Torralba.

See more work from the NVIDIA Spatial Intelligence Lab: research.nvidia.com/labs/toronto...

Work supported indirectly by MIT CSAIL, @vectorinstitute.ai

#nvidia #mit

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

Results on Prompt-Guided Source Separation:

We report improved SDR against ground-truth sources when they are available, and show improved CLAP scores after training.
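
For reference, SDR here is the standard signal-to-distortion ratio (the exact variant used in the paper may differ, e.g., SI-SDR), and CLAP scores measure audio-text agreement via embedding similarity. A minimal sketch of the SDR computation:

```python
import torch

def sdr_db(ref, est, eps=1e-8):
    """Signal-to-distortion ratio in dB between a reference source and an
    estimate; higher means the estimate better matches the reference."""
    return 10 * torch.log10(ref.pow(2).sum() / ((ref - est).pow(2).sum() + eps))
```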

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

Results on Tuning FM Synthesizers & Impact Synthesis:

CLAP scores for the target prompts improve over training, alongside qualitative results. Impact synthesis shows improved performance on impact-oriented prompts.

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

Results on Fully-Automatic In-the-Wild Source Separation:

We demonstrate a pipeline that takes a video from the internet, captions its audio with a captioning model (like AudioCaps), and provides the caption to an LLM assistant that suggests source decompositions. We then run our method on the suggested decompositions.
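
A hypothetical sketch of this pipeline; every callable below is an illustrative stand-in (audio extractor, captioning model, LLM call, and the separation procedure), not the exact components used:

```python
def auto_separate(video_path, extract_audio, caption_audio, suggest_sources, separate):
    """End-to-end sketch: video -> soundtrack -> caption -> LLM-suggested
    prompts -> prompt-guided separation (all callables are stand-ins)."""
    mix = extract_audio(video_path)        # soundtrack of the internet video
    caption = caption_audio(mix)           # e.g., "jazz music with passing cars"
    prompts = suggest_sources(caption)     # LLM proposes a source decomposition
    return separate(mix, prompts)          # one optimized channel per prompt
```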

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

Modifications to SDS for Audio Diffusion:

πŸ…° We use an augmented Decoder-SDS in audio space, πŸ…± a spectrogram emphasis to better weight transients, and πŸ…² multiple denoising steps to increase fidelity.

This image highlights these in red in the detailed overview of our update.
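
As a rough illustration of πŸ…±, here is one way a spectrogram-emphasis term could be combined with a waveform distance (an assumption about the form, not the paper's exact weighting); πŸ…² would correspond to iterating the denoiser several times before taking this distance:

```python
import torch

def emphasis_distance(audio, target, n_fft=1024, hop=256, alpha=1.0):
    """Waveform L2 plus an STFT-magnitude term that up-weights transients
    (illustrative; the paper's exact emphasis may differ)."""
    win = torch.hann_window(n_fft)
    stft_mag = lambda x: torch.stft(x, n_fft, hop, window=win,
                                    return_complex=True).abs()
    wav_term = ((audio - target) ** 2).mean()
    spec_term = (stft_mag(audio) - stft_mag(target)).abs().mean()
    return wav_term + alpha * spec_term
```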

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

β‘’ Prompt-Guided Source Separation:

Prompt-conditioned source separation of given audio, such as separating a β€œsax …” and β€œcars …” track from a music recording made on a road, by applying the Audio-SDS update to each channel while constraining the channels to sum to the original audio.
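
A minimal sketch of this setup, assuming an SDS-style loss callable (names, weights, and learning rate are illustrative assumptions):

```python
import torch

def separate(mix, prompts, sds_loss, steps=1000, recon_weight=10.0):
    """Each source is an optimizable waveform updated by its prompt's
    SDS-style loss, while the channels must sum to the original mix."""
    sources = [torch.zeros_like(mix, requires_grad=True) for _ in prompts]
    opt = torch.optim.Adam(sources, lr=1e-2)
    for _ in range(steps):
        loss = sum(sds_loss(s, p) for s, p in zip(sources, prompts))
        loss = loss + recon_weight * ((torch.stack(sources).sum(0) - mix) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return sources
```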

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

β‘‘ Physical Impact Synthesis:

We generate impacts consistent with prompts like β€œhitting pot with wooden spoon” by convolving an impact excitation with learned object and reverb impulses, whose parametrized forms we optimize.
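
A sketch of one plausible renderer for this task, assuming the object impulse is a sum of damped sinusoids (an illustrative parametrization; the paper's exact form may differ):

```python
import torch

def modal_impulse(freqs, damps, amps, n, sr=44100):
    """Parametrized object impulse response: a sum of damped sinusoids."""
    t = torch.arange(n) / sr
    return sum(a * torch.exp(-d * t) * torch.sin(2 * torch.pi * f * t)
               for f, d, a in zip(freqs, damps, amps))

def render_impact(excitation, freqs, damps, amps, reverb_ir, sr=44100):
    """Excitation convolved with the object and reverb impulses via FFT
    convolution, so gradients reach the impulse parameters."""
    obj_ir = modal_impulse(freqs, damps, amps, excitation.numel(), sr)
    n = excitation.numel() + obj_ir.numel() + reverb_ir.numel() - 2
    spec = (torch.fft.rfft(excitation, n) * torch.fft.rfft(obj_ir, n)
            * torch.fft.rfft(reverb_ir, n))
    return torch.fft.irfft(spec, n)
```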

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

β‘  FM Synthesis:

A toy setup where we optimize synthesizer settings to align with prompts like β€œkick drum, bass, reverb”, using sine oscillators that modulate each other’s frequencies, as in a synthesizer.

We visualize the final optimized parameters as the dial settings on a synthesizer instrument's user interface.
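
A minimal two-operator FM sketch (values and names are illustrative) showing how the β€œdials” can be plain tensors that the Audio-SDS update optimizes:

```python
import torch

def fm_synth(carrier_hz, mod_hz, mod_index, amp, n=44100, sr=44100):
    """Two-operator FM: a modulator sine drives the carrier's phase; with
    tensor parameters the output is differentiable w.r.t. the dials."""
    t = torch.arange(n) / sr
    modulator = torch.sin(2 * torch.pi * mod_hz * t)
    return amp * torch.sin(2 * torch.pi * carrier_hz * t + mod_index * modulator)

# Illustrative starting dials (made up), to be optimized from a prompt:
dials = {k: torch.tensor(v, requires_grad=True)
         for k, v in [("carrier_hz", 55.0), ("mod_hz", 110.0),
                      ("mod_index", 3.0), ("amp", 0.8)]}
audio = fm_synth(**dials)
```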

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

We propose three novel audio tasks: β‘  FM Synthesis, β‘‘ Physical Impact Synthesis, and β‘’ Prompt-Guided Source Separation.

This image briefly summarizes the use case, optimizable parameters, rendering function, and parameter update.

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

Intuitively, our update finds a direction that moves the audio toward higher probability under the prompt: we noise and denoise the rendered audio with our diffusion model, then β€œnudge” the audio toward the denoised result by propagating the update through our differentiable rendering to the audio parameters.
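
In code, one step of this loop might look as follows (a minimal sketch assuming a generic denoiser interface; not the paper's exact implementation):

```python
import torch

def audio_sds_step(params, render, denoise, prompt, opt, sigma=0.5):
    """render: differentiable params -> waveform; denoise: pretrained audio
    diffusion denoiser returning a clean-audio estimate (interface assumed)."""
    audio = render(params)                       # differentiable rendering
    noised = audio + sigma * torch.randn_like(audio)
    with torch.no_grad():
        target = denoise(noised, sigma, prompt)  # model's on-prompt guess
    # Nudge the audio toward the denoised target; as in standard SDS,
    # gradients flow only through render(params).
    loss = 0.5 * ((audio - target) ** 2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```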

09.05.2025 16:06 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Video thumbnail

πŸ”Š New NVIDIA paper: Audio-SDS πŸ”Š
We repurpose Score Distillation Sampling (SDS) for audio, turning any pretrained audio diffusion model into a tool for diverse tasks, including source separation, impact synthesis & more.

🎧 Demos, audio examples, paper: research.nvidia.com/labs/toronto...

🧡below

09.05.2025 16:06 β€” πŸ‘ 6    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0
Video thumbnail

What if you could control the weather in any video β€” just like applying a filter?
Meet WeatherWeaver, a video model for controllable synthesis and removal of diverse weather effects β€” such as 🌧️ rain, β˜ƒοΈ snow, 🌁 fog, and ☁️ clouds β€” for any input video.

02.05.2025 14:19 β€” πŸ‘ 3    πŸ” 1    πŸ’¬ 1    πŸ“Œ 1
Video thumbnail

πŸš€ Announcing meshgen (AI in Blender) Update 0.6

πŸ€— Now supports remote backends via litellm, e.g., Hugging Face, Ollama (see the sketch below)
🎨 UI/UX overhaul

This lays the foundation for Blender agents and more advanced 3D AI models (coming this year)
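
For context, a remote-backend call through litellm looks roughly like this (model names and endpoint are illustrative; this shows the library's routing, not meshgen's internal wiring):

```python
from litellm import completion

response = completion(
    model="ollama/llama3",                 # or e.g. "huggingface/<repo-id>"
    messages=[{"role": "user", "content": "Create a low-poly chair mesh."}],
    api_base="http://localhost:11434",     # local Ollama server (assumption)
)
print(response.choices[0].message.content)
```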

14.03.2025 20:09 β€” πŸ‘ 4    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0
Post image

🧊 New dataset of 50k+ low-poly OBJ meshes

🎒 Objaverse subsampled to <500-poly models, converted to untextured OBJs (see the sketch below)

πŸ”§ suitable for training autoregressive transformer-based 3D models with limited context length, such as LLaMA-Mesh
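
A filter in the spirit of this construction might look like the following, using trimesh (an illustrative sketch, not the dataset's exact pipeline):

```python
import trimesh

def keep_low_poly(path, max_faces=500):
    """Keep only meshes under the face budget."""
    mesh = trimesh.load(path, force="mesh")
    return len(mesh.faces) < max_faces

def export_untextured_obj(path, out_path):
    """Drop texture/material info and export as OBJ."""
    mesh = trimesh.load(path, force="mesh")
    mesh.visual = trimesh.visual.ColorVisuals(mesh)  # strip textures/materials
    mesh.export(out_path)                            # format inferred from .obj extension
```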

04.03.2025 22:28 β€” πŸ‘ 3    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0
Video thumbnail

Here's a recap of what happened in AI 3D in 2024

30.12.2024 22:44 β€” πŸ‘ 4    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0

We envision a future where LLMs are universal generative tools capable of seamlessly producing content across multiple modalities, including text, images, videos, and 3D structures.

12.12.2024 19:10 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Integrating 3D mesh generation into LLMs opens exciting possibilities for interactive design. Users can converse with a model to create and manipulate 3D objects in real time.

12.12.2024 19:10 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

We're excited to scale LLaMA-Mesh to handle more complex and detailed meshes by extending context lengths. Integrating textures and physical properties, exploring larger base models, part-based generation, and enabling dynamic generation are interesting ways forward!

12.12.2024 19:10 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Due to context length constraints, we're currently limited to meshes with up to 500 faces. We generate one 3D object per dialog due to our fine-tuning dataset construction. We see a slight degradation in language ability, perhaps due to using UltraChat in fine-tuning.

12.12.2024 19:10 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

This project was led by Zhengyi Wang with Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, and Xiaohui Zeng.

See more work from the #NVIDIA Toronto AI Lab here: research.nvidia.com/labs/toronto...

Work supported by Tsinghua University, @vectorinst.bsky.social, @uoft.bsky.social #UofT #Tsinghua

12.12.2024 19:10 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

We generate diverse and high-quality 3D meshes directly from textual prompts without expanding the vocabulary or introducing new tokenizers.
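
This works because OBJ is already plain text, so a stock LLM tokenizer can read and emit meshes directly. A minimal illustrative parser for LLM-emitted OBJ:

```python
def parse_obj_text(obj_text):
    """Parse 'v x y z' vertex lines and 'f i j k' face lines (1-indexed)."""
    vertices, faces = [], []
    for line in obj_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "v":
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts and parts[0] == "f":
            faces.append(tuple(int(p.split("/")[0]) for p in parts[1:]))
    return vertices, faces

verts, faces = parse_obj_text("v 0 0 0\nv 1 0 0\nv 0 1 0\nf 1 2 3")  # one triangle
```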

12.12.2024 19:10 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
