In the meantime, here's a rule of thumb: "if your project can be vibecoded in an hour, amounts to O(10) LoC edits on something existing, or is a convergence proof that o4-mini can do with a bit of guidance, DO NOT write a paper about it" :D
16.05.2025 14:57
I think the most bulletproof peer review we currently have is "people will read/try your stuff, and if it works they build on it". But because it's not attached to a formal process on OpenReview, we discard it as non-scientific.
16.05.2025 14:57
It seems to me that this is totally misaligned with scientific discovery and progress. I don't believe this is a result of bad actors, btw. It's just that huge, complex systems that are O(100) years old take a long time to change and readjust to new realities. We'll eventually figure it out.
16.05.2025 14:57
It seems to me that it's mostly ML academia (I am part of it!) that is a proponent of keeping peer review and mega ML conferences going & the bean counter running. We've not found a solution to reviews converging to random coin tosses, at a huge expense of human work hours.
16.05.2025 14:57
If that's indeed the case (I believe we can measure it), and their key function is social, a way for people to connect (that's great!), what's the point of having peer review and using # of NeurIPS papers as a bean counter?
16.05.2025 14:57
My post is a direct criticism of the 100k NeurIPS submissions issue. It's beyond clear that research dissemination, for the most part, does not happen through conferences anymore.
16.05.2025 14:57
What if, for most of your findings, you just post a thread and share a GitHub repo, rather than submitting a 15-page NeurIPS paper with < 1/100 the reach?
16.05.2025 14:57
LLMs learn world models, beyond a reasonable doubt. This has been the case since GPT-3, but now it should be even clearer. Without them, "Guess and Check" would not work.
The fact that these "world models" are approximate/incomplete does not disqualify them.
12.05.2025 18:38
Working on the yapping part :)
08.05.2025 03:58
hmm.. temp has to be 0.6-0.8, this looks like very low temp outputs
08.05.2025 02:31
I don't see at all how this is intellectually close to what Shannon wrote. Can you clarify? I read it as computing statistics and how these are compatible with theoretical conjectures. There's no language generation implicit in the article. Am I misreading it?
07.05.2025 23:02
can you share the paper?
07.05.2025 22:05
BTW, for historical context, 1948 is very, very, very early to have these thoughts. So I actually think that every single sentence written is profound. This is kinda random, but here is how Greece looked back then. IT WAS SO EARLY :) x.com/DimitrisPapa...
07.05.2025 18:44
It's not that profound. It just says there's no wall, if all the stars are aligned. It's an optimistic read of the setting.
07.05.2025 17:24
Is 1948 widely acknowledged as the birth of language models and tokenizers?
In "A Mathematical Theory of Communication", almost as an afterthought Shannon suggests the N-gram for generating English, and that word level tokenization is better than character level tokenization.
07.05.2025 12:05
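Since the post points to Shannon's word-level N-gram idea, here is a minimal sketch of what a word-level bigram generator looks like; the toy corpus and all names below are my own illustration, not from the post or from Shannon's paper.

```python
import random
from collections import defaultdict, Counter

# Toy corpus; Shannon estimated such word statistics from large text samples.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog on the mat").split()

# Word-level bigram counts: how often each word follows the previous one.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(start, length=12):
    """Sample words one at a time, each drawn in proportion to how often it
    followed the previous word in the corpus (a second-order word approximation)."""
    out = [start]
    for _ in range(length):
        counts = bigrams.get(out[-1])
        if not counts:  # dead end: the last word never appears mid-corpus
            break
        out.append(random.choices(list(counts), weights=list(counts.values()))[0])
    return " ".join(out)

print(generate("the"))
```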
The Phi-4 reasoning models have landed on HF and Azure AI Foundry. The new models are competitive and often outperform much larger frontier models. It is exciting to see the reasoning capabilities extend to more domains beyond math, including algorithmic reasoning, calendar planning, and coding.
01.05.2025 00:50
I am afraid to report, RL works.
I think 2-3 years ago I said I would not work on two ML sub-areas. RL was one of them. I am happy to say that I am not strongly attached to my beliefs.
30.04.2025 20:08
researchers
30.04.2025 16:43
Re: The Chatbot Arena Illusion
Every eval chokes under hill climbing. If we're lucky, there's an early phase where *real* learning (both model and community) can occur. I'd argue that a benchmark's value lies entirely in that window. So the real question is: what did we learn?
30.04.2025 16:38
Also, a sycophant etymologically means "the one who shows the figs"; the origin of the meaning is kinda debated: it either refers to illegally importing figs, or to falsely accusing someone of hiding illegally imported figs.
28.04.2025 13:58
Fun trivia now that "sycophant" became common language to describe LLMs flattering users:
In Greek, συκοφάντης (sykophántēs) most typically refers to a malicious slanderer, someone spreading lies, not flattery!
Every time you use it, you're technically using it wrong :D
28.04.2025 13:58
Come work with us at MSR AI Frontiers and help us figure out reasoning!
We're hiring at the Senior Researcher level (e.g., post-PhD).
Please drop me a DM if you're interested!
jobs.careers.microsoft.com/us/en/job/17...
21.02.2025 15:48
bsky doesn't like GIFs, here they are from the other site x.com/DimitrisPapa...
13.02.2025 13:41
Oh btw, self-improvement can become exponentially faster in some settings, e.g. when we apply it to pretrained models (again, this is all for add/mul/maze etc.)
13.02.2025 13:33
An important aspect of the method is that you need to
1) generate problems of appropriate hardness
2) be able to filter out negative examples using a cheap verifier.
Otherwise the benefit of self-improvement collapses. (A toy sketch of the loop follows below.)
13.02.2025 13:33
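Here is a minimal, hypothetical sketch of the self-improvement loop as I read it from the thread: train on data up to some difficulty, have the model label slightly harder problems, keep only the labels a cheap verifier accepts, retrain, and repeat. The helper names (train_model, sample_problems, model_label, cheap_verifier) are placeholders of my own, not the paper's API.

```python
# Toy sketch of STaR/ReST-style self-improvement over difficulty levels.
# The real method trains a transformer on tasks like reverse addition or mazes;
# here everything is stubbed out so the loop structure is visible.

def train_model(model, labeled_data):
    """Placeholder: fine-tune the current model on (problem, answer) pairs."""
    model["train_set"].extend(labeled_data)
    return model

def sample_problems(difficulty, n=100):
    """Placeholder: generate n problems of the requested hardness,
    e.g. `difficulty`-digit addition instances."""
    return [f"difficulty-{difficulty} problem #{i}" for i in range(n)]

def model_label(model, problem):
    """Placeholder: the model produces a candidate answer for the problem."""
    return f"candidate answer for {problem}"

def cheap_verifier(problem, answer):
    """Placeholder: a cheap check (e.g. answer length/format) that rejects bad
    self-labels; without this filtering the loop collapses, per the thread."""
    return True  # toy sketch: accept everything

def self_improve(model, start_difficulty, rounds):
    difficulty = start_difficulty
    for _ in range(rounds):
        # 1) generate problems only slightly harder than what the model handles
        problems = sample_problems(difficulty + 1)
        # 2) let the model label its own (harder) training data
        candidates = [(p, model_label(model, p)) for p in problems]
        # 3) keep only the self-labels the cheap verifier accepts
        accepted = [(p, a) for p, a in candidates if cheap_verifier(p, a)]
        # 4) retrain on the accepted data and move the difficulty up
        model = train_model(model, accepted)
        difficulty += 1
    return model

model = self_improve({"train_set": []}, start_difficulty=10, rounds=3)
print(len(model["train_set"]))  # 300 accepted self-labeled examples in this toy run
```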
We test self-improvement across diverse algorithmic tasks:
- Arithmetic: Reverse addition, forward (yes forward!) addition, multiplication (with CoT)
- String Manipulation: Copying, reversing
- Maze Solving: Finding shortest paths in graphs.
It always works
13.02.2025 13:33
Self-improvement is not new; this idea has been explored in various contexts and domains (like reasoning, mathematics, coding, and more).
Our results suggest that self-improvement is a general and scalable solution to length & difficulty generalization!
13.02.2025 13:33
What if we leverage this?
What if we let the model label slightly harder data... and then train on it?
Our key idea is to use Self-Improving Transformers, where a model iteratively labels its own training data and learns from progressively harder examples (inspired by methods like STaR and ReST).
13.02.2025 13:33
I was kind of done with length generalization, but then I took a closer look at that figure above...
I noticed that there is a bit of transcendence, i.e., the model trained on n-digit addition can solve slightly harder problems, e.g. n+1 digits, but not much more.
(cc on transcendence and chess arxiv.org/html/2406.11741v1)
13.02.2025 13:33
AI Architect | North Carolina | AI/ML, IoT, science
WARNING: I talk about kids sometimes
Chief Models Officer @ Stealth Startup; Inria & MVA - Ex: Llama @AIatMeta & Gemini and BYOL @GoogleDeepMind
I am an assistant professor of robotics at Michigan. I run the fluent robotics lab with the mission of enabling robots to fluently work with and around people. fluent.robotics.umich.edu
Researcher @ Microsoft | ex. PhD @ CISPA | Neurodivergent | AI safety & security | life and peace for all, permanent ceasefire
Opinions my own.
prev: @BrownUniversity, @uwcse/@uw_wail phd, ex-@cruise, RS @waymo. 0.1x engineer, 10x friend.
spondyloarthritis, cars ruin cities, open source
Senior Lecturer #USydCompSci at the University of Sydney. Postdocs: IBM Research and Stanford; PhD at Columbia. Converts coffee into puns: sometimes theorems. He/him.
Assist. Prof. at Cornell University.
I like cycling, mathematics, statistics and computer science.
LLMs together (co-created model merging, BabyLM, textArena.ai)
Spreading science over hype in #ML & #NLP
Proud shareLM Donor
@IBMResearch & @MIT_CSAIL
Machine Learning Researcher | PhD Candidate @ucsd_cse | @trustworthy_ml
chhaviyadav.org
Associate Professor, CMU. Researcher, Google. Evaluation and design of information retrieval and recommendation systems, including their societal impacts.
Sterling Professor of Social and Natural Science at Yale University. Sociologist. Network Scientist. Physician. Author of Apollo's Arrow; Blueprint; Connected; and Death Foretold. Director of the Human Nature Lab: https://humannaturelab.net
Building generative models for high-dimensional science and engineering.
Assistant prof. @CarnegieMellon & affiliated faculty @mldcmu, previously instructor @NYU_Courant, PhD jointly @Harvard and @MIT
https://nmboffi.github.io
Google Chief Scientist, Gemini Lead. Opinions stated here are my own, not those of Google. Gemini, TensorFlow, MapReduce, Bigtable, Spanner, ML things, ...
Number theorist at UMontreal, author of "The Distribution of Prime Numbers" http://bookstore.ams.org/gsm-203