Maxim

@backpropaganda.bsky.social

Doing Computer Vision stuff with ML.

19 Followers  |  40 Following  |  14 Posts  |  Joined: 18.11.2024

Latest posts by backpropaganda.bsky.social on Bluesky

This is just your standard gradient accumulation?

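for i, (input, target) in enumerate(data):
    output = model(input)
    loss = loss_fn(output, target)
    # Scale so the accumulated gradient is an average over the micro-batches
    loss = loss / iters_to_accumulate
    loss.backward()

    if (i + 1) % iters_to_accumulate == 0:
        optimizer.step()  # apply the accumulated gradients before clearing them
        optimizer.zero_grad()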

19.12.2024 19:02 | 👍 2    🔁 0    💬 2    📌 0
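
To make the arithmetic behind the loop above concrete, here is a minimal framework-free sketch. It assumes a toy model `y = w * x` with a mean squared error loss and equally sized micro-batches; the helper names (`grad_mse`, `train_step_accumulated`) are hypothetical and only for illustration. It shows that summing micro-batch gradients, each scaled by `1 / iters_to_accumulate`, should match the gradient of the full batch.

```python
# Framework-free sketch of gradient accumulation (illustrative only).
# Assumed toy setup: model y = w * x, mean squared error loss.

def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_step_accumulated(w, micro_batches, lr):
    """One optimizer step built from several equally sized micro-batches."""
    iters_to_accumulate = len(micro_batches)
    accumulated = 0.0  # plays the role of the parameter's .grad buffer
    for batch in micro_batches:
        # Scale each micro-batch gradient so the running sum is an average.
        accumulated += grad_mse(w, batch) / iters_to_accumulate
    # Equivalent of optimizer.step(); the buffer is then dropped (zero_grad).
    return w - lr * accumulated, accumulated


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]

w_new, accumulated_grad = train_step_accumulated(0.0, micro_batches, lr=0.01)
full_batch_grad = grad_mse(0.0, data)
```

The equivalence holds here because every micro-batch has the same size; with uneven micro-batches the per-batch scaling would need to be weighted by batch size.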

Still a bit confused on when to use Illuminate vs NotebookLM for getting an audio overview of papers. Currently using Illuminate.

14.12.2024 22:51 | 👍 1    🔁 0    💬 1    📌 0

I find 'uv sync' so fast that I would just change the version in pyproject.toml and .python-version and sync again. The fact that the env is tied to a directory may sometimes be a negative, but it also ensures all package versions are tracked in git.

04.12.2024 06:25 | 👍 0    🔁 0    💬 0    📌 0

What currently seems to be the best approach for depth estimation: diffusion models or "old-school" discriminative models? Both seem to claim SOTA results nowadays.

02.12.2024 14:33 | 👍 1    🔁 0    💬 1    📌 0
Advent of Code 2024

This was my tenth(!) year building 25 days of puzzles for #AdventOfCode. You can solve them all for free! Most people write code to solve them, but you can solve them however you like. I hope they help people become better programmers. 🌟

The first puzzle comes out in two hours: adventofcode.com

01.12.2024 02:57 | 👍 1129    🔁 208    💬 61    📌 22

Every once in a while I re-read Joseph Redmon's YOLOv3 paper. That was really a work of art.
"Sometimes you just kinda phone it in for a year, you know? I didn't do a whole lot of research this year. [...] I managed to make some improvements to YOLO. But, honestly, nothing like super interesting"

29.11.2024 15:39 | 👍 4    🔁 0    💬 1    📌 0

Would be interesting to see how it would perform for BOP dynamic onboarding.

As you can tell, I've started sharing interesting 6D pose estimation papers I come across. I already track these for myself, so why not share them with all of you?

26.11.2024 14:02 | 👍 0    🔁 0    💬 0    📌 0
PickScan: Object discovery and reconstruction from handheld interactions
Reconstructing compositional 3D representations of scenes, where each object is represented with its own 3D model, is a highly desirable capability in robotics and augmented reality. However, most...

Authors: Vincent van der Brugge, Marc Pollefeys, Joshua B. Tenenbaum, Ayush Tewari, Krishna Murthy Jatavallabhula

Arxiv: arxiv.org/abs/2411.1...
Code: github.com/vincentva...

26.11.2024 14:02 | 👍 0    🔁 0    💬 1    📌 0

PickScan: Object discovery and reconstruction from handheld interactions
(IROS 2024)

tl;dr: an interaction-guided and class-agnostic pipeline for scene reconstruction. The method lets a user move objects around and outputs the object masks, 3D models, and per-frame poses.

26.11.2024 14:02 | 👍 0    🔁 0    💬 1    📌 0

As you can tell, I've started sharing interesting 6D pose estimation papers I come across. I already track these for myself, so why not share them with all of you?

25.11.2024 14:05 | 👍 0    🔁 0    💬 0    📌 0
Vision Foundation Model Enables Generalizable Object Pose Estimation
Object pose estimation plays a crucial role in robotic manipulation, however, its practical applicability still suffers from limited generalizability. This paper addresses the challenge of...

Authors: Kai Chen, Yiyao Ma, Xingyu Lin, Stephen James, Jianshu Zhou, Yun-Hui Liu, Pieter Abbeel, Qi Dou

Openreview: https://openreview.net/forum?id=FTpKGuxEfy
Project page: https://vfm-6d.github.io/

25.11.2024 14:05 | 👍 0    🔁 0    💬 1    📌 0

Vision Foundation Model Enables Generalizable Object Pose Estimation
(NeurIPS 2024)

tl;dr: The paper uses existing foundation models to perform object pose estimation in two stages: category-level object viewpoint estimation and object coordinate map estimation.

25.11.2024 14:05 | 👍 2    🔁 0    💬 1    📌 0

The GPU poor do not have it easy ;). But usually it is just multiple notebooks and nvidia-smi is handy to see how much each notebook is taking up.

21.11.2024 15:04 | 👍 1    🔁 0    💬 0    📌 0
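
For checking how much GPU memory each notebook process holds, a small wrapper around `nvidia-smi` can help. The sketch below assumes the `--query-compute-apps=pid,process_name,used_memory` query with CSV output and memory reported in MiB; the function names and the sample text are hypothetical, and rows with non-numeric memory values (which some setups report) are simply skipped.

```python
import subprocess

# Assumed query: one CSV row per compute process, memory in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse CSV rows into (pid, process_name, used_mib) tuples, largest first."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split from both ends so a comma inside the process name is tolerated.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        if not used.strip().isdigit():
            continue  # e.g. "[N/A]" on setups that do not report memory
        rows.append((int(pid), name.strip(), int(used)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def gpu_memory_by_process():
    """Run nvidia-smi (requires an NVIDIA driver) and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


# Hypothetical sample text so the parser can be tried without a GPU.
sample = "5678, /usr/bin/python3, 512\n1234, /usr/bin/python3, 2048\n"
usage = parse_compute_apps(sample)
```

On a machine with a GPU, calling `gpu_memory_by_process()` would return the same kind of list for the live processes; matching a PID to a notebook kernel is left to the reader.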

Haven't switched yet, is there an easy way to see which programs take up how much gpu memory like in nvidia-smi?

21.11.2024 09:16 | 👍 3    🔁 0    💬 2    📌 0

That's why all my PyTorch code looks like:
```
from torchvision.transforms.v2.functional import to_dtype, to_image
# `image` is a PIL image or ndarray; scale=True rescales uint8 [0, 255] to float [0, 1]
img_tensor = to_dtype(to_image(image), scale=True)
```

20.11.2024 19:19 | 👍 4    🔁 0    💬 1    📌 0

@backpropaganda is following 20 prominent accounts