
Levi Lelis

@programsynthesis.bsky.social

Associate Professor, University of Alberta | Canada CIFAR AI Chair with Amii | Machine Learning and Program Synthesis | he/him; ele/dele 🇨🇦 🇧🇷 | https://www.cs.ualberta.ca/~santanad

305 Followers  |  478 Following  |  81 Posts  |  Joined: 18.11.2024

Latest posts by programsynthesis.bsky.social on Bluesky

Eugene Vinitsky

Was talking to a student who wasn't sure about why one would get a PhD. So I wrote up a list of reasons!
www.eugenevinitsky.com/posts/reason...

27.07.2025 19:30 | 👍 49    🔁 11    💬 7    📌 0

Previous work has shown that programmatic policies (computer programs written in a domain-specific language) generalize to out-of-distribution problems more easily than neural policies.

Is this really the case? 🧵

02.07.2025 22:12 | 👍 8    🔁 4    💬 2    📌 0
Common Benchmarks Undervalue the Generalization Power of Programmatic Policies: Algorithms for learning programmatic representations for sequential decision-making problems are often evaluated on out-of-distribution (OOD) problems, with the common conclusion that programmatic pol...

Sometimes, neural networks (with small tweaks) are enough. Other times, solving the task requires a programmatic representation to capture algorithmic structure.

Preprint: arxiv.org/abs/2506.14162

02.07.2025 22:12 | 👍 2    🔁 0    💬 0    📌 0

1. Is the representation expressive enough to find solutions that generalize?
2. Can our search procedure find a policy that generalizes?

02.07.2025 22:12 | 👍 2    🔁 0    💬 1    📌 0

So, when should we use neural vs. programmatic policies for OOD generalization?

Rather than treating programmatic policies as the default, we should ask:

02.07.2025 22:12 | 👍 1    🔁 0    💬 1    📌 0

As an illustrative example, we changed the grid-world navigation task so that a solution policy must use a queue or a stack. FunSearch found a Python program that provably generalizes. As one would expect, neural nets couldn't solve the problem.

02.07.2025 22:12 | 👍 1    🔁 0    💬 1    📌 0
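The exact task and the program FunSearch found are in the preprint; purely as a hypothetical illustration of the kind of algorithmic structure involved, a programmatic policy can carry an explicit stack across steps and backtrack out of dead ends. All names below are made up for this sketch.

```python
# Hypothetical sketch (not the program from the preprint): a grid-world policy
# that keeps an explicit stack so it can backtrack out of dead ends, the kind
# of structure the thread says plain neural nets failed to learn.

def stack_policy(obs, stack, visited):
    """obs: (position, free_neighbors); returns the next cell to move to."""
    pos, free_neighbors = obs
    visited.add(pos)
    unvisited = [c for c in free_neighbors if c not in visited]
    if unvisited:                 # explore a new neighbor, remember the way back
        stack.append(pos)
        return unvisited[0]
    if stack:                     # dead end: pop the stack and backtrack
        return stack.pop()
    return pos                    # nothing left to explore

# usage: the caller keeps `stack` and `visited` across steps
stack, visited = [], set()
action = stack_policy(((0, 0), [(0, 1), (1, 0)]), stack, visited)
```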

Are neural and programmatic policies similar in terms of OOD generalization? We don't think so. We think that the benchmark problems used in previous work actually undervalue what programmatic representations can do.

02.07.2025 22:12 | 👍 1    🔁 0    💬 1    📌 0

Programmatic policies appeared to generalize better in previous work because they never learned to drive fast on the easy training tracks. Neural nets optimized speed well, which made it difficult for them to generalize to tracks with sharp curves.

02.07.2025 22:12 | 👍 1    🔁 0    💬 1    📌 0

In a car-racing task, we adjusted the reward to encourage cautious driving. Neural nets generalized just as well as programmatic policies.

02.07.2025 22:12 | 👍 1    🔁 0    💬 1    📌 0
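The thread does not spell out the reward change, so the following is only a guess at the shape of such an adjustment: penalize speed above a soft limit so that fast but risky driving is no longer optimal. Every name and constant here is an assumption, not the paper's reward.

```python
# Illustrative only: trade off track progress against overspeed so the agent
# is rewarded for cautious driving rather than raw speed.

def cautious_reward(progress, speed, speed_limit=0.5, penalty=0.1):
    """Reward track progress, but penalize driving above a soft speed limit."""
    overspeed = max(0.0, speed - speed_limit)
    return progress - penalty * overspeed

print(cautious_reward(progress=1.0, speed=0.9))  # 0.96: fast driving earns less
```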

We only had to make simple changes to the neural policies' training pipeline to attain OOD generalization similar to that exhibited by programmatic ones.

In a grid-world problem, we used the same sparse observation space as the programmatic policies, augmented with the agent's last action.

02.07.2025 22:12 | 👍 1    🔁 0    💬 1    📌 0
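A minimal sketch of the kind of change described, assuming a standard reset/step environment interface (this is not the paper's code): the observation handed to the neural policy is extended with a one-hot encoding of the agent's last action.

```python
# Assumed interface, for illustration: wrap an environment so the observation
# fed to the neural policy also contains the agent's last action.

import numpy as np

class LastActionObsWrapper:
    def __init__(self, env, num_actions):
        self.env = env
        self.num_actions = num_actions

    def reset(self):
        obs = self.env.reset()
        return np.concatenate([obs, np.zeros(self.num_actions)])  # no action yet

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        one_hot = np.eye(self.num_actions)[action]                 # encode last action
        return np.concatenate([obs, one_hot]), reward, done, info
```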

In a preprint led by my Master's student Amirhossein Rajabpour, we revisit some of these OOD generalization claims and show that neural policies generalize just as well as programmatic ones on benchmark problems used in previous work.

Preprint: arxiv.org/abs/2506.14162

02.07.2025 22:12 | 👍 1    🔁 0    💬 1    📌 0

If, like me, your Discover feed has been even worse lately and you are here for ML/AI news and discussion, check out these two feeds:

- Paper Skygest
- ML Feed: Trending

Links below 👇

29.06.2025 01:26 | 👍 33    🔁 4    💬 3    📌 1

As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing.

But how do we discover such temporal structure?

Hierarchical RL provides a natural formalism, yet many questions remain open.

Here's our overview of the field 🧵

27.06.2025 20:15 | 👍 34    🔁 10    💬 1    📌 3
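For readers new to the area, the "natural formalism" referred to here is usually the options framework: a subtask is packaged as an initiation set, an intra-option policy, and a termination condition. A bare-bones sketch of that standard textbook definition (not code from the linked overview) follows; the interfaces are assumptions.

```python
# A minimal version of the standard "option" abstraction from hierarchical RL:
# an option bundles an initiation set, an internal policy, and a termination
# condition, so a high-level policy can pick temporally extended subtasks.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    can_start: Callable[[object], bool]      # initiation set I(s)
    policy: Callable[[object], int]          # intra-option policy pi(s) -> action
    should_stop: Callable[[object], float]   # termination probability beta(s)

def run_option(env, state, option, rng, max_steps=100):
    """Execute one option until it terminates, returning the resulting state."""
    for _ in range(max_steps):
        state, _, done, _ = env.step(option.policy(state))   # assumed step interface
        if done or rng.random() < option.should_stop(state):
            break
    return state
```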
The Heat Dome Wants a Word With Climate-Change Deniers: The temperatures gripping the US this week were made up to five times more likely by the fact that the atmosphere is simply hotter.

As hot as this summer is, it's also one of the coolest we'll ever enjoy again.

Just how much hotter and deadlier summers will get is still up to us. Right now we're working hard to make them worse.

๐ŸŽ link to my @opinion.bloomberg.com column:

www.bloomberg.com/opinion/arti...

24.06.2025 12:32 | 👍 86    🔁 42    💬 3    📌 0
Robust Autonomy Emerges from Self-Play: Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy in another domain. We show that robust and naturalistic drivi...

Hiring a postdoc to scale up and deploy RL-based planning onto some self-driving cars! We'll be building on arxiv.org/abs/2502.03349 and learning what the limits and challenges of RL planning are. Shoot me a message if interested, and please help spread the word!

Full posting to come in a bit.

21.06.2025 17:14 | 👍 60    🔁 24    💬 4    📌 1

In addition to Sat's pointers, I would also take a look at the following recent paper by @swarat.bsky.social:

www.cs.utexas.edu/~swarat/pubs...

Also, the following paper covers most of the recent works on neuro-guided bottom-up synthesis algorithms:

webdocs.cs.ualberta.ca/~santanad/pa...

17.06.2025 03:19 | 👍 2    🔁 0    💬 0    📌 0

We're extending the AIIDE deadline! Partially due to author requests, and partially due to a significant increase in submissions, which means I need to expand the PC!

16.06.2025 18:21 | 👍 5    🔁 4    💬 1    📌 1

I wanted to thank the folks who reviewed our paper. Your feedback helped us improve our work, especially by asking us to include experiments on more difficult instances and the TSP. Thank you!

13.06.2025 03:29 | 👍 0    🔁 0    💬 0    📌 0

Still, many important problems with real-world applications, such as the TSP and program synthesis, share some of the properties we assume in this work.

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0

The work has a few limitations. The policy learning scheme was evaluated only on needle-in-the-haystack deterministic problems. Also, since we are using tree search algorithms, we assume the agent has access to an efficient forward model.

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0

In other cases, where clustering seems unable to find relevant structure, such as in Sokoban problems, the subgoal-based policies do not seem to harm the search.

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0

The empirical results are strong when clustering effectively detects the problem's underlying structure.

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0

To illustrate, this approach allows the agent to learn how to navigate between cities in a variant of the traveling salesman problem before it solves any TSP instance: the agent learns from failed searches.

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0

The approach learns policies for solving subgoals and a policy to mix the subgoal policies. Ultimately, we have a policy that can be used with any Levin-based algorithm, thus retaining those algorithms' strong guarantees.

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0
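The paper defines the mixture precisely; as a rough, assumed sketch, a learned weighting over subgoals can blend the per-subgoal action distributions into one distribution, which is all a Levin-style search needs from the policy. The function names and shapes below are illustrative.

```python
# Hypothetical sketch of a mixture policy: per-subgoal policies produce action
# distributions, and a learned weighting over subgoals blends them.

import numpy as np

def mixture_policy(state, subgoal_policies, subgoal_weights):
    """
    subgoal_policies: list of functions state -> action-probability vector
    subgoal_weights:  function state -> probability vector over subgoals
    """
    weights = subgoal_weights(state)                        # w(g | state)
    action_probs = np.stack([pi(state) for pi in subgoal_policies])
    return weights @ action_probs                           # sum_g w(g|s) * pi_g(a|s)
```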

Instead, Jake invented a method that learns subgoals from the discarded data. It builds a graph from the expanded search trees and uses a clustering algorithm to break the problem into subproblems, forming subgoals.

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0
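The clustering algorithm actually used is described in the paper; as a stand-in sketch, one could build the graph with networkx, run a community-detection routine, and treat states on edges that cross clusters as candidate subgoals. Every name below is an assumption.

```python
# Rough sketch (the paper's method may differ): treat every state expanded
# during failed searches as a graph node, connect parent-child pairs, cluster
# the graph, and take states on cluster boundaries as subgoals.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def discover_subgoals(expanded_edges):
    """expanded_edges: iterable of (parent_state, child_state) pairs."""
    graph = nx.Graph()
    graph.add_edges_from(expanded_edges)
    clusters = greedy_modularity_communities(graph)
    cluster_of = {s: i for i, cluster in enumerate(clusters) for s in cluster}
    subgoals = {s for u, v in graph.edges                  # boundary states
                if cluster_of[u] != cluster_of[v] for s in (u, v)}
    return clusters, subgoals
```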

Jake's paper offers a solution to something that has always bugged me in previous work combining learning and search. Previous approaches to learning a policy and/or a heuristic discarded failed searches. This includes the original 2011 Bootstrap paper.

www.sciencedirect.com/science/arti...

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0

If you are not familiar with policy-guided tree search, here are some papers. Read these, and you might never use MCTS again to solve single-agent problems. ;-)

webdocs.cs.ualberta.ca/~santanad/pa...
webdocs.cs.ualberta.ca/~santanad/pa...
webdocs.cs.ualberta.ca/~santanad/pa...

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0

LTS is a tree search algorithm with guarantees on the number of nodes it expands before finding a solution. This property is cool because it allows us to learn policies that minimize the tree size.

13.06.2025 03:29 | 👍 0    🔁 0    💬 1    📌 0
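Concretely, LTS expands nodes in increasing order of d(n) / pi(n), the node's depth divided by the policy's probability of the path to it, and the number of expansions before reaching a solution node is bounded (roughly) by that same quantity, which is why training the policy to put probability mass on solution paths shrinks the tree. A minimal sketch of the priority computation, not the authors' implementation:

```python
# Minimal sketch of the Levin tree search (LTS) expansion order. Node fields
# and callback names are assumptions made for this illustration.

import heapq

def lts_cost(depth, path_prob):
    """LTS expands nodes in increasing order of d(n) / pi(n)."""
    return depth / path_prob if path_prob > 0 else float("inf")

def levin_tree_search(root, expand, is_goal, policy_prob):
    """expand(node) -> children; policy_prob(parent, child) -> pi(child | parent)."""
    counter = 0  # tie-breaker so heapq never has to compare node objects
    heap = [(lts_cost(0, 1.0), counter, 0, 1.0, root)]
    while heap:
        _, _, depth, prob, node = heapq.heappop(heap)
        if is_goal(node):
            return node
        for child in expand(node):
            counter += 1
            child_prob = prob * policy_prob(node, child)
            heapq.heappush(heap, (lts_cost(depth + 1, child_prob), counter,
                                  depth + 1, child_prob, child))
    return None
```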

🧵 1/ New paper! 📄 Subgoal-Guided Policy Heuristic Search with Learned Subgoals, led by my PhD student @tuero.ca.

arxiv.org/pdf/2506.07255

This paper follows the Levin tree search (LTS) research line and focuses on learning subgoal-based policies.

13.06.2025 03:29 | 👍 4    🔁 3    💬 1    📌 0
