Lukasz Kaiser

@lukaszkaiser.bsky.social

53 Followers 11 Following 14 Posts Joined Nov 2024
1 week ago
Post image Post image

If you ever want to see a really interesting AI thinking trace, push it really hard on literature or poetry suggestions.

Here is Claude 4.6 Opus working through poetry in its reasoning, when I asked it to find something that captures the feeling of AI while avoiding its usual favorites (e.g., Rilke).

48 4 4 0
1 month ago

Don't tell anyone, but such courses are the one place where I find AI browsers like Atlas from OpenAI very useful: it may take them for you ;)

0 0 0 0
2 months ago
Post image

Benchmarks from historians show that AI transcription of handwriting is now better than human transcription, and even a very cheap model matches human performance.

There are now massive troves of documents that could be made available for research that would have been impossible or prohibitive to transcribe before.

93 10 3 3
3 months ago
Preview
rl-wrong-about-rewards.md

I complain a lot about RL lately, and here we go again.

The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment.

More at length here:

gist.github.com/yoavg/3eb3e7...
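To make the setup-level distinction concrete, here is a minimal toy sketch of my own (not code from the gist): the environment returns only observations, and the reward is computed inside the agent rather than handed back by the environment.

```python
import random

class Env:
    """Toy environment: state is an integer; actions move it by +1 or -1.

    Note: step() returns only an observation -- no reward, unlike the
    usual Gym-style (obs, reward, done, info) tuple."""
    def __init__(self):
        self.state = 0
    def step(self, action):
        self.state += action
        return self.state

class Agent:
    """Agent that carries its own reward model instead of reading one from the env."""
    def __init__(self, reward_fn):
        self.reward_fn = reward_fn  # reward computation lives inside the agent
    def act(self, obs):
        return random.choice([-1, 1])
    def evaluate(self, obs):
        return self.reward_fn(obs)

env = Env()
agent = Agent(reward_fn=lambda s: -abs(s - 5))  # agent "wants" the state near 5

obs = env.state
for _ in range(100):
    obs = env.step(agent.act(obs))
    r = agent.evaluate(obs)  # reward computed by the agent, not the environment
```

The contrast with the standard formulation is just where `reward_fn` lives: here the environment's interface carries no notion of reward at all.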

14 2 2 0
4 months ago
YouTube
Geordie Williamson: Neural Networks for Mathematical Discovery (October 29, 2025) YouTube video by Simons Foundation

How can we use neural networks to bolster mathematical discovery? Geordie Williamson's @simonsfoundation.org Presidential Lecture is online, catch up now:
www.youtube.com/watch?v=Uxr_...

7 2 0 0
5 months ago
Post image

Fresh on the arXiv: @booleananalysis.bsky.social, Kewen Wu, and I present new classical algorithms for the Short Integer Solution problem (under the infinity norm) that outperform the elegant Chen-Liu-Zhandry quantum algorithm, showing that there is no longer an exponential quantum speedup.
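For readers who haven't seen it, the Short Integer Solution problem in the infinity norm is usually stated as follows (the standard definition, not quoted from the paper):

```latex
% SIS_{n,m,q,beta} under the infinity norm:
% find a short nonzero integer solution to a random linear system mod q.
\text{Given a uniformly random } A \in \mathbb{Z}_q^{n \times m},
\text{ find } x \in \mathbb{Z}^m \setminus \{0\}
\text{ such that } Ax \equiv 0 \pmod{q} \ \text{ and } \ \|x\|_\infty \le \beta.
```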

18 3 3 0
6 months ago
Post image Post image Post image

We are starting to see some nuanced discussions of what it means to work with advanced AI in its current state

In this case, GPT-5 Pro was able to do novel math, but only when guided by a math professor (though the paper also noted the speed of advance since GPT-4)

The reflection is worth reading.

91 14 3 1
9 months ago

A fully autonomous robot which, every morning, sets plates on the table, fetches ingredients in the kitchen, and prepares avocado toast.

"Move things and breakfast."

53 6 3 1
9 months ago

(In case you hadn't been following, the environmental impact of current AI models is now much lower, generating 100,000 words with AI uses less power than watching Netflix for 45 minutes on your TV)
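The comparison is easy to sanity-check with a back-of-envelope calculation. All numbers below are rough illustrative assumptions of mine (per-response energy and TV power draw), not figures from the post:

```python
# Back-of-envelope check, with all constants being rough assumptions:
WH_PER_RESPONSE = 0.3     # assumed energy per ~500-word chatbot response (Wh)
WORDS_PER_RESPONSE = 500  # assumed length of one response
TV_WATTS = 100            # assumed power draw of a large TV (W)

ai_wh = (100_000 / WORDS_PER_RESPONSE) * WH_PER_RESPONSE  # 200 responses
tv_wh = TV_WATTS * 45 / 60                                # 45 minutes of TV

print(f"AI: {ai_wh} Wh, TV: {tv_wh} Wh")  # AI: 60.0 Wh, TV: 75.0 Wh
```

Under these assumptions the 100,000 words come in under the 45 minutes of TV, though the real per-response figure varies a lot by model and deployment.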

30 7 2 0
10 months ago

If you haven't done this with o3, you haven't really seen what these models can do.

30 1 1 0
10 months ago
Skeet from Ann Leckie reading: "Say it after me: Chat GPT is not a search engine. It does not scan the web for information, it just generates statistically likely sentences. You cannot use it as a search engine, or as a substitute for searching.

Now. Please never use an LLM for information searches ever again."

This is one of the most-shared posts on Bluesky in the past day and it's just completely false. You might think ChatGPT is a *bad* search engine, or prefer another search engine. But it has had integrated web search since last year.

1,891 202 88 261
10 months ago
Post image Post image Post image

"o3, You are a consultant hired by the Dark Lord, analyze the org chart of Mordor. How would you improve it for today's changing Middle Earth"

o3 does some actual satire, ending with: “One Org to rule them all, One Org to find them, One Org to bring them all, And in the darkness, align them.”

151 17 9 6
10 months ago
Preview
GPT Finally Jumped Out of the System: OpenAI’s o4-mini-high Model Solves the MU Puzzle and Demonstrates Why

For years I've been throwing the same puzzle challenge at new GPT models. Every one has failed, until now.

matthodges.com/posts/2025-0...
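For context, the MU puzzle is Hofstadter's MIU string-rewriting system: starting from "MI", the four rules can never produce "MU". A small sketch of my own (not code from the linked post) that enumerates reachable strings and checks the classic invariant:

```python
# Hofstadter's MIU system: starting from "MI", apply the four rewrite rules.
def successors(s):
    out = set()
    if s.endswith("I"):
        out.add(s + "U")                    # rule 1: xI -> xIU
    out.add(s + s[1:])                      # rule 2: Mx -> Mxx
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])  # rule 3: III -> U
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])        # rule 4: drop UU
    return out

def reachable(max_len=10):
    """All strings derivable from "MI" up to a length bound (BFS)."""
    seen, frontier = {"MI"}, {"MI"}
    while frontier:
        nxt = set()
        for s in frontier:
            for t in successors(s):
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    nxt.add(t)
        frontier = nxt
    return seen

# Invariant: the I-count starts at 1 and the rules only double it (rule 2)
# or subtract 3 (rule 3), so it is never divisible by 3 -- and "MU" has zero
# I's, so it is unreachable. That is the "jump out of the system" insight.
assert "MU" not in reachable()
assert all(s.count("I") % 3 != 0 for s in reachable())
```

Solving the puzzle honestly requires noticing this invariant rather than searching harder, which is why it has been a nice stress test for reasoning models.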

45 7 4 1
11 months ago

Oh, I see!! Yes, the id is totally unnecessary in this place. Probably a leftover, or kept for compatibility with the other API where you don't repeat everything on each call. Sorry for the confusion!!

1 0 0 0
11 months ago
Preview
OpenAI Platform

Which API exactly is this? Is it function calling in OpenAI Responses API? Do you really need to send the whole history? The weather example doesn't seem to do it? platform.openai.com/docs/guides/function-calling?api-mode=responses

0 0 1 0
11 months ago

Now that's a good reason to ask why...

1 0 0 0
11 months ago

Isn't it because it's happening asynchronously over the network, on different machines, possibly for many tool invocations and chats in parallel, and the id makes sure you find the path to the right place?

0 0 1 0
1 year ago
Post image

Exciting news: @waymo.bsky.social is beginning public service on the Peninsula, starting with Palo Alto, Mountain View, and Los Altos! Initial service area below.

71 6 4 3
1 year ago
Preview
Opinion | There Is a Liberal Answer to Elon Musk (Gift Article) Right-wing populism thrives on scarcity. The answer is abundance. But a politics of abundance will work only if Democrats confront where their approach has failed.

www.nytimes.com/2025/03/09/o...

305 58 31 19
1 year ago
Preview
D&D Guild Hall Simulator

Try it or improve it: chatgpt.com/canvas/share...

11 1 0 0
1 year ago
Video thumbnail

This was fun: "o1, build a simulator of a D&D guild hall. Persistent characters come in, get quests, interact with each other, leave & return, make it procedurally generated"

I kept asking it to add other ideas (relationships, etc) 8 times, got no errors, just worked each time. Desire-based coding!

62 5 10 0
1 year ago

The NIH overhead cut doesn't just hurt universities.

It's deadly to the US economy.

The US is a world leader in tech due to the ecosystem that NIH and NSF propel. It drives innovation for tech transfer, creates a highly skilled sci/tech workforce, and fosters academic/industry cross-fertilization.

1,345 512 30 20
1 year ago
Line graph time series of 2025's daily Arctic sea ice extent compared to decadal averages from the 1980s to the 2010s. The decadal averages are shown with different colored lines: purple for the 1980s, blue for the 1990s, green for the 2000s, and white for the 2010s. Thin white lines are also shown for each year from 2000 to 2024; 2025 is shown with a thick gold line. The positions of the decadal average lines show a long-term decreasing trend in ice extent for every day of the year covered by this graph (January through April).

Saturday ice update - #Arctic sea ice extent is currently the *lowest* on record (JAXA data)

• about 790,000 km² below the 2010s mean
• about 1,450,000 km² below the 2000s mean
• about 2,040,000 km² below the 1990s mean
• about 2,430,000 km² below the 1980s mean

Plots: zacklabe.com/arctic-sea-i...

241 158 11 15
1 year ago

Neither read nor wrote, no illegal access at all!!

1 0 0 0
1 year ago
Post image Post image Post image Post image

OpenAI’s deep research is very good. Unlike Google’s version, which is mostly a good summarizer of many sources, OpenAI’s is more like engaging an opinionated (often almost PhD-level!) researcher who follows leads.

Look at how it hunts down a concept in the literature (& works around problems)

86 9 7 2
1 year ago

In all seriousness how batshit is it that a Chinese AI bot is censoring a book THAT HASN'T EVEN BEEN PUBLISHED YET. What dystopia are we all living in.

188 25 0 5
1 year ago

This post is trending in my feed but it does not make sense. I don't see any reasonable interpretation by which DeepSeek demonstrates that model scaling is not the best way to develop AI. Their model is very large, and their training corpus is very large. They were just scaling more efficiently.

54 3 8 0
1 year ago

This post mostly argues about variants of training on the test set: maybe only a verifier, maybe only validation on test. None of that happened. The other point, more general, is that hiding funding is a bad idea. I personally agree very much, and am unsure why it happened, as it's an especially bad idea here.

0 0 0 0
1 year ago

It's certainly a weird one - but I only learned about it from the press, as I did about that dataset, I didn't realize OpenAI was involved until after they published their first paper. As I said - researchers may not agree (or even know) about many things, but that doesn't mean we train on test.

0 0 1 0
1 year ago

Also, as far as I can tell (I'm not a lawyer), there's nothing very non-standard in OpenAI work contracts. I have one and certainly have never agreed to lie or deceive. Not only that, but I actually find the culture internally very open to debate and criticism, and very opposed to cheating of any kind.

0 0 1 0