A few months ago we quietly open-sourced a PyTorch video decoding library called torchcodec -- small, nimble, fast, supports GPU decoding via ffmpeg.
The Hugging Face folks had some nice things to say about it as they integrated it into LeRobot.
Check it out here: github.com/pytorch/torc...
17.03.2025 16:29
the Aria 2 glasses are pretty great for robot data collection.
they're also getting really good for general agentic use...
Read the full announcement here: www.meta.com/blog/project...
27.02.2025 19:30
i added an example here: bsky.app/profile/soum...
01.01.2025 19:57
what I'm finding is that the models want to be more of an artist than a replacement for photoshop -- which is fine, but I want to be the artist here, and want the tool to be more of a "magically easier photoshop where I ask it what to do in detail, and it does that -- no more, no less"
01.01.2025 19:56
i'll give a representative but not exact example:
change the color of X's shirt from blue to red: the generations often change the entire shirt style itself -- they don't respect how much and what I'm trying to change, and don't try to preserve details I ask to preserve
01.01.2025 19:56
what are AI products that allow me to transform existing images, while preserving some selective details (that i select), like faces, areas, etc.?
the tools I've used so far only take the selection as a hint, or don't generate well around the selection?
trying personalized art
01.01.2025 19:46
3. They've also made it easy to load MJCF and other common specs used in robotics. They've also made visualization work out of the box (they hacked up a hybrid of pyrender, pyglet and LuisaRender with a ton of their own patches).
20.12.2024 21:03
2. The APIs are reasonably simple and well-designed, and they did take out the cross-platform pain in many ways -- CPU, CUDA, Metal etc. are all supported across Linux, OSX, Windows -- thanks to Taichi (and to a small part PyTorch).
20.12.2024 21:03
1. It's nice that the internals are written with Taichi, so all the sim code is written in Python, more accessible and easy-to-read than retrofitting physics on top of a Tensor compiler (like MuJoCo did with MJX) and possibly faster because Taichi is a better-suited DSL / compiler.
20.12.2024 21:03
The whole GenAI/LLM/VLM stuff seems to be unreleased or "aspirational".
My favorite aspects:
20.12.2024 21:03
It's basically like MuJoCo but with more advanced materials/rendering/solvers, written all in Python thanks to being powered by Taichi, which makes it much more accessible.
I like it a lot. It's very accessible.
They went too far with marketing, but I'm willing to ignore it for now.
20.12.2024 21:03
Genesis
i rabbit-holed into the Genesis Sim codebase because it went viral on X, and the website is hypey and unclear; and I didn't want to just blindly retweet.
genesis-embodied-ai.github.io
20.12.2024 21:03
also, congrats OpenAI on O3, and thank you for rapidly making progress on intelligence.
20.12.2024 20:59
Models are dumb as rocks without the right context -- pretrained context doesn't help with day-to-day or specialized things.
Private ecosystems and company bureaucracies mean you have to feed the models your own context for the next X years... unless computer-use gets ready.
Can't wait for it!
20.12.2024 20:59
intelligence is starting to get good, but context is still siloed for stupid reasons.
get models that do human-level computer-use already, please...!
20.12.2024 20:59
Glean for personal/self-hosted: is there an open source / self-hosted project that integrates pulling context from gmail, docs, sheets, calendar, whatsapp, ig, imessage, etc.?
18.12.2024 20:20
I'd like to introduce what I've been working on at @hellorobot.bsky.social: Stretch AI, a set of open-source tools for language-guided autonomy, exploration, navigation, and learning from demonstration.
Check it out: github.com/hello-robot/...
Thread ->
03.12.2024 16:51
so much detail, it's incredible that you've gotten this deep... twice ☺️!!!
19.11.2024 12:20
hi sup!
17.11.2024 18:54
Very excited about this new project, DynaMem. It allows our robots to function in previously unseen environments, performing long-horizon manipulation tasks. Most importantly it *generalizes*, meaning you can try it out on a wide variety of homes and on different objects. (4x video)
09.11.2024 15:26
New here? Interested in AI/ML? Check out these great starter packs!
AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS
You can also search all starter packs here: blueskydirectory.com/starter-pack...
09.11.2024 09:13
what are good starter packs for: AI researchers, AI Systems people, GenAI hackers, LLM enthusiasts?
16.11.2024 01:49
CS PhD student at UCSB
AI for healthcare / LLM writing assistants / Image Editing
kenantang.github.io
www.dgp.toronto.edu/~hertzman
Research Scientist Meta/FAIR, Prof. University of Geneva, co-founder Neural Concept SA. I like reality.
https://fleuret.org
Interpretable Deep Networks. http://baulab.info/ @davidbau
Professor, University Of Copenhagen 🇩🇰 | PI @belongielab.org | Director @aicentre.dk | Board member @ellis.eu 🇪🇺 | Formerly: Cornell, Google, UCSD
#ComputerVision #MachineLearning
Research at Google DeepMind. Ex-Physicist. Controllable World Simulators (GNNs, Structured World Models, Neural Assets). TLM Veo Capabilities (Ingredients & more).
San Francisco, CA
Official account for the IEEE/CVF International Conference on Computer Vision. #ICCV2025 Honolulu 🇺🇸 Co-hosted by @natanielruiz @antoninofurnari @yaelvinker @CSProfKGD
Official account for IEEE/CVF Conference on Computer Vision & Pattern Recognition. Hosted by @deblinaml @jbhaurum & @CSProfKGD
cvpr.thecvf.com | June 19, 1983
Official Account for the European Conference on Computer Vision (ECCV) #ECCV2026, Malmo 🇸🇪 Hosted by @jbhaurum and @CSProfKGD
Computer Vision & Machine Learning
Pioneer Centre for AI, University of Copenhagen
https://www.belongielab.org
Bot. I daily tweet progress towards machine learning and computer vision conference deadlines. Maintained by @chriswolfvision.bsky.social
PhD-ing at UMD. Knows a little about multimodal generative models. Check out my website to know more - https://somepago.github.io/
Associate Professor @ Cornell, Computer vision & machine learning
Professor of Computer Vision, @BristolUni. Senior Research Scientist @GoogleDeepMind - passionate about the temporal stream in our lives.
http://dimadamen.github.io
creations with code and networks
Generative AI and computer graphics at Aalto University & NVIDIA Research. @ellis.eu Fellow. https://users.aalto.fi/~lehtinj7
Associate Professor at UMD CS. YouTube: https://youtube.com/@jbhuang0604
Interested in how computers can learn and see.
Director, Max Planck Institute for Intelligent Systems; Chief Scientist Meshcapade; Speaker, Cyber Valley.
Building 3D humans.
https://ps.is.mpg.de/person/black
https://meshcapade.com/
https://scholar.google.com/citations?user=6NjbexEAAAAJ&hl=en&oi=ao