Andreas Zeller (@andreaszeller) — Bluesky Profile

4 days ago

FLAT: Formal Languages as Types
And Their Applications in Testing
FENGMIN ZHU, CISPA Helmholtz Center for Information Security, Germany
ANDREAS ZELLER, CISPA Helmholtz Center for Information Security, Germany
Programmers regularly use strings to encode many types of data, such as Unix file paths, URLs, and email addresses. They are conceptually different, but existing mainstream programming languages treat them as the same string type. This is problematic: the type system allows, for instance, malicious HTML text to be passed to a function expecting an email address.
To distinguish conceptually different string types and to avoid potential vulnerabilities, we regard formal languages as types (FLAT), thereby restricting the set of valid strings using context-free grammars and, if needed, semantic constraints. Applying this type-based approach, we offer a unified solution for string API documentation, input validation, malicious input detection, language-based fuzzing, and test oracles, all at once, based on user-annotated formal language types and, if necessary, pre- and post-conditions. We implement this idea and present FLAT-PY, a testing framework for Python. By attaching annotations directly to Python code, FLAT-PY automatically performs runtime type checking via code instrumentation and reports any detected type errors as soon as possible. We conducted case studies on real Python code fragments: FLAT-PY can detect logical bugs from random inputs generated by a language-based fuzzer, relying on a reasonable number of user annotations.

In a call "retrieve(account: string)", nobody checks the contents of "account". What if we could specify its type not just as a string, but as a formal language - say, a regex "[0-9]+"? In our new paper, we do exactly this - for better type checking and even test generation: doi.acm.org?doi=3799978
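To make the idea concrete, here is a minimal Python sketch of a string type defined by a formal language, restricted to the regular-expression case from the post (FLAT itself uses context-free grammars plus semantic constraints). The names `LangType`, `typed`, and `AccountId` are hypothetical and do not reflect FLAT-PY's actual annotation syntax; the decorator simply checks language membership of the arguments at call time, as a runtime type check would.

```python
import inspect
import re
from functools import wraps


class LangTypeError(TypeError):
    """Raised when a string argument is not in its declared language."""


class LangType:
    """Hypothetical string type: the set of strings matching a regular expression."""

    def __init__(self, name: str, pattern: str) -> None:
        self.name = name
        self.regex = re.compile(pattern)

    def contains(self, value: object) -> bool:
        # Membership requires the *whole* string to match, not just a prefix.
        return isinstance(value, str) and self.regex.fullmatch(value) is not None


def typed(**param_types: LangType):
    """Hypothetical decorator: check annotated parameters on every call."""
    def decorator(func):
        sig = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for param, lang in param_types.items():
                value = bound.arguments.get(param)
                if not lang.contains(value):
                    raise LangTypeError(f"{param}={value!r} is not a valid {lang.name}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# The language from the post: account identifiers are nonempty digit strings.
AccountId = LangType("AccountId", r"[0-9]+")


@typed(account=AccountId)
def retrieve(account: str) -> str:
    return f"record for {account}"


print(retrieve("4711"))  # prints: record for 4711
try:
    retrieve("<script>alert(1)</script>")
except LangTypeError as err:
    print("rejected:", err)
```

The same membership test that rejects bad inputs can also serve as a test oracle for fuzzer-generated inputs, which is the "all at once" benefit the abstract describes.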

5 days ago

Brad Pitt in front of a classroom (AI-generated)

My successor as a professor will be some AI video tutor with the appearance of Brad Pitt, available 24/7, unlimited patience, personalized towards each student, the ability to teach any subject ever discussed in a textbook, and a cost of less than $1/hour. Good thing I can still do research! (Now wait...)

1 week ago

IEEE Computer Society Harlan D. Mills Award and Talk by Andreas Zeller
Should Computer Scientists Experiment Less? On the past, present, and future of software engineering research

More information at conf.researchr.org/details/icse...

1 week ago

IEEE Computer Society Harlan D. Mills Award and Talk by Andreas Zeller: Should Computer Scientists Experiment Less? On the past, present, and future of software engineering research (ICSE 2026 - Main ... This year, ICSE 2026 innovates with an expanded Main Plenaries program—bringing a total of four exceptional keynote talks to the main conference stage. Across Wednesday to Friday, these sessions gathe...

"Should Computer Scientists Experiment Less?" This is the title of my upcoming Harlan D. Mills Award Talk at ICSE 2026 on the past, present, and future of Software Engineering research. Looking forward to lots of productive discussions!
conf.researchr.org/details/icse...

1 week ago

Mining metrics to predict component failures | Proceedings of the 28th international conference on Software engineering

Impact award! I am happy to report that my ICSE 2006 paper "Mining metrics to predict component failures," with Nachi Nagappan and Thomas Ball, has been selected to receive a retrospective ICSE SEIP Most Influential Paper Award. Read it here: dl.acm.org/doi/10.1145/...

1 week ago

Over the past decade, the automated generation of test inputs has made significant advances. Modern fuzzers and test generators easily produce complex input formats that do systematically cover the input and execution space. Testing protocols, though, has remained a frontier for automated testing, as a test generator has to interact with the program under test, producing messages that conform to the current state of the system.
In this paper, we introduce language-based protocol testing, the first approach to specify, automatically test, and systematically cover the full state and input space of protocol implementations. We specify protocols as interaction grammars—an extension of context-free grammars that tag each message element with the communication party that is in charge of producing it. Interaction grammars embed classical state models by unifying states, messages, and transitions all into nonterminals, and can be used for producing interactions as well as parsing them, making them ideally suited for testing protocols. Additional constraints over grammar elements allow us to specify and test semantic features such as binary message formats, checksums, encodings, and the many ways that message features induce states and vice versa.
To evaluate the effectiveness of language-based protocol testing, we have implemented it as part of the FANDANGO test generator. We specify several protocols as interaction grammars, including features such as human-readable interactions (SMTP), bit-level encodings (DNS), and dynamic port assignments (FTP), and use them to test the corresponding protocol implementations. By systematically covering the interaction grammar and solving the associated constraints, FANDANGO achieves comprehensive coverage of the protocol interactions, resulting in high code coverage and a thorough assessment of the program under test.

With more and more AI-generated code, comprehensive system testing becomes more important than ever. Our new paper "Language-Based Protocol Testing" (with Alexander Liggesmeyer and Pepe Zamudio) shows how to specify and test all details of how programs interact: arxiv.org/abs/2509.20308
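As a rough illustration of an interaction grammar, the Python sketch below tags every terminal with the party that produces it and treats protocol states (`<helo>`, `<mail>`, `<quit>`) as nonterminals. The toy SMTP-like grammar, the tuple encoding, and the functions `expand` and `script_for` are assumptions made for this example; Fandango's own grammar notation and constraint solving look different.

```python
import random
from typing import Dict, List, Tuple, Union

Message = Tuple[str, str]     # (party, text): which side sends this message
Symbol = Union[str, Message]  # a nonterminal name or a party-tagged terminal

# Toy SMTP-like interaction grammar; protocol states are nonterminals.
GRAMMAR: Dict[str, List[List[Symbol]]] = {
    "<session>": [["<greeting>", "<helo>", "<mail>", "<quit>"]],
    "<greeting>": [[("server", "220 ready")]],
    "<helo>": [[("client", "HELO example.org"), ("server", "250 hello")]],
    "<mail>": [
        [("client", "MAIL FROM:<a@example.org>"), ("server", "250 ok"), "<mail>"],
        [],  # no (more) mail: move on to the <quit> state
    ],
    "<quit>": [[("client", "QUIT"), ("server", "221 bye")]],
}


def expand(symbol: Symbol, rng: random.Random,
           depth: int = 0, max_depth: int = 10) -> List[Message]:
    """Produce one interaction (a sequence of tagged messages) from the grammar."""
    if isinstance(symbol, tuple):
        return [symbol]
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest alternative to end recursion.
        alternatives = [min(alternatives, key=len)]
    result: List[Message] = []
    for sym in rng.choice(alternatives):
        result.extend(expand(sym, rng, depth + 1, max_depth))
    return result


def script_for(interaction: List[Message], tester: str = "client") -> List[Tuple[str, str]]:
    """From the tester's view: messages it must send vs. messages it expects."""
    return [("send" if party == tester else "expect", text)
            for party, text in interaction]


rng = random.Random(1)
for action, text in script_for(expand("<session>", rng)):
    print(f"{action:6} {text}")
```

Because the same tagged grammar describes both sides, a tester playing the client can send its own messages and check every "expect" line against what the server under test actually returns.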

1 week ago

On my way to Savannah, Georgia, for an IFIP WG 4.3 meeting, where I’ll present our work on Parameterized Compiler Testing (joint work with my fantastic co-workers Addison Crump and Alexi Turcotte)

2 weeks ago

#Fandango 1.1 is now available! With this release, #Fandango becomes a full-fledged _protocol fuzzer_, happily exploring states and messages of protocols such as FTP or DNS. Thanks to José, Valentin, Alexander, and Marius for their hard work!
Find Fandango at fandango-fuzzer.github.io

3 weeks ago

About time: A multi-celebration for becoming a member of Academia Europaea, my SIGSOFT Influential Educator Award, my 60th birthday, becoming an IEEE Fellow, _and_ getting the 2026 IEEE Harlan D. Mills Award. With cake and fizzy drinks!

1 month ago

Reviewer-Author Collusion Rings and How to Fight Them In 2012, I attended a physical meeting of the program committee responsible for selecting the best scientific papers for the ESEC/FSE 2013 conference in Saint Petersburg, Russia. This meeting was part...

Starting this year, I will only review for conferences that get rid of a "bidding" phase, as allowing reviewers to bid on papers they want to review opens too many opportunities for manipulation and collusion. For details, see andreas-zeller.info/2025/12/07/R... #nobidding

1 month ago

I am happy to report that I have been named the recipient of the 2026 Harlan D. Mills Award "for sustained contributions to software debugging, program analysis, mining software repositories, and automated test generation." This is a big award – thanks to all!
www.computer.org/volunteering...

1 month ago

Fault localization aims to identify code regions responsible for failures. Traditional techniques primarily correlate statement execution with failures; however, program behavior involves diverse execution features, including variable values, branch conditions, and definition-use pairs, which can provide richer diagnostic insights.
This paper comprehensively investigates execution features for fault understanding, addressing two complementary goals. First, we conduct an empirical study of 310 bugs across 20 projects, analyzing 17 execution features and assessing their correlation with failure outcomes. Our findings suggest that fault localization benefits from a broader range of execution features: (1) Scalar pairs exhibit the strongest correlation with failures; (2) Beyond line executions, def-use pairs and functions executed are key indicators for fault localization; and (3) Combining multiple features enhances effectiveness compared to relying on individual features.
Second, building on these insights, we introduce a debugging approach that learns relevant features from labeled test outcomes. The approach extracts fine-grained execution features and trains a decision tree to differentiate passing and failing runs. The trained model generates fault diagnoses that explain the underlying causes of failures.
Our evaluation demonstrates that the generated diagnoses achieve high predictive accuracy. These interpretable diagnoses empower developers to debug software efficiently by providing deeper insights into failures.

How do execution features relate to failures? In this new ACM TOSEM paper, Marius Smytzek, Martin Eberlein, Lars Grunske, and I analyze which execution features beyond code coverage correlate best with failures and lead to accurate explanations of failure causes: dl.acm.org/doi/10.1145/...
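The Python sketch below illustrates the core idea of learning a diagnosis from labeled runs, simplified to a single-split decision tree (a "stump") that picks the boolean execution feature with the lowest weighted Gini impurity. The feature names and runs are made up for illustration; the paper's approach extracts real execution features and trains a full decision tree.

```python
from typing import Dict, List, Tuple

# One test run: the boolean execution features observed, and whether it failed.
Run = Tuple[Dict[str, bool], bool]


def gini(labels: List[bool]) -> float:
    """Gini impurity of pass/fail outcomes (0.0 means all outcomes agree)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def partition(runs: List[Run], feature: str) -> Tuple[List[bool], List[bool]]:
    """Outcomes of runs where the feature holds, and of runs where it does not."""
    holds = [failed for feats, failed in runs if feats.get(feature, False)]
    absent = [failed for feats, failed in runs if not feats.get(feature, False)]
    return holds, absent


def best_split(runs: List[Run]) -> Tuple[str, float]:
    """Pick the feature whose split yields the lowest weighted Gini impurity."""
    features = sorted({f for feats, _ in runs for f in feats})
    best_feature, best_impurity = "", float("inf")
    for feature in features:
        holds, absent = partition(runs, feature)
        impurity = (len(holds) * gini(holds) + len(absent) * gini(absent)) / len(runs)
        if impurity < best_impurity:
            best_feature, best_impurity = feature, impurity
    return best_feature, best_impurity


def failure_rate(outcomes: List[bool]) -> float:
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def diagnose(runs: List[Run]) -> str:
    """Turn the chosen split into a human-readable diagnosis."""
    feature, _ = best_split(runs)
    holds, absent = partition(runs, feature)
    if failure_rate(holds) >= failure_rate(absent):
        return f"fails when {feature}"
    return f"fails when not ({feature})"


# Hypothetical runs: a line-coverage feature, a variable-value predicate, a def-use pair.
RUNS: List[Run] = [
    ({"line 12 executed": True, "x < 0 at line 12": True, "def-use(y@3, y@9)": True}, True),
    ({"line 12 executed": True, "x < 0 at line 12": True, "def-use(y@3, y@9)": False}, True),
    ({"line 12 executed": True, "x < 0 at line 12": False, "def-use(y@3, y@9)": True}, False),
    ({"line 12 executed": False, "x < 0 at line 12": False, "def-use(y@3, y@9)": False}, False),
]

print(diagnose(RUNS))  # fails when x < 0 at line 12
```

In this toy data, line coverage alone cannot separate passing from failing runs (line 12 also executes in a passing run), while the value predicate separates them perfectly, which is the kind of effect the study reports for features beyond coverage.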

1 month ago

Four hours later, I _think_ I have fixed things again - reinstalled Python and all its packages, rebuilt Spotlight and Mail indexes, cleared macOS caches, subscribed to Creator Studio, and now back to these lost mails… Today I hate you, Apple.

1 month ago

* Mail has lost all my emails sent since Monday
* Mail search is broken too
* Search in reminders cannot find anything
* New Keynote is full of ads!?
* Invoke Python-3.13, get 3.14 instead - venvs are messed up
* LaTeX "minted" crashes (likely b/c Python)

So glad I'm an expert in debugging /sarcasm

1 month ago

Inferring Input Grammars from Code with Symbolic Parsing | ACM Transactions on Software Engineering and Methodology Generating effective test inputs for a software system requires that these inputs be valid, as they will otherwise be rejected without reaching actual functionality. In the absence of a specification ...

Fuzzing software becomes much more effective if you can generate _valid_ inputs. We have now built the first approach to _statically_ extract complete and precise input grammars from parser code, producing syntactically valid and diverse inputs by construction. Enjoy! dl.acm.org/doi/10.1145/...
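Why do grammars yield valid inputs by construction? The Python sketch below pairs a small recursive-descent parser with the expression grammar it implements. Here the grammar is written by hand, as a stand-in for what a grammar-extraction approach would aim to recover automatically from the parser code; this sketch does not perform symbolic parsing itself. All names (`EXPR_GRAMMAR`, `generate`, `parse`) are hypothetical.

```python
import random
from typing import Dict, List, Optional

# Grammar implemented by parse() below: sums of products of digits and parenthesized expressions.
EXPR_GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<term>"], ["<term>", "+", "<expr>"]],
    "<term>": [["<factor>"], ["<factor>", "*", "<term>"]],
    "<factor>": [["<digit>"], ["(", "<expr>", ")"]],
    "<digit>": [[d] for d in "0123456789"],
}


def generate(symbol: str = "<expr>", rng: Optional[random.Random] = None,
             depth: int = 0, max_depth: int = 8) -> str:
    """Derive a random string from the grammar; every result is valid."""
    if rng is None:
        rng = random.Random()
    if symbol not in EXPR_GRAMMAR:
        return symbol  # terminal
    alternatives = EXPR_GRAMMAR[symbol]
    if depth >= max_depth:
        alternatives = alternatives[:1]  # first alternatives never recurse back to <expr>
    return "".join(generate(s, rng, depth + 1, max_depth) for s in rng.choice(alternatives))


class ParseError(Exception):
    pass


def parse(text: str) -> int:
    """Recursive-descent parser (and evaluator) for the grammar above."""
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise ParseError(f"expected {ch!r} at {pos}")
        pos += 1

    def expr() -> int:
        value = term()
        if peek() == "+":
            expect("+")
            value += expr()
        return value

    def term() -> int:
        value = factor()
        if peek() == "*":
            expect("*")
            value *= term()
        return value

    def factor() -> int:
        nonlocal pos
        if peek() == "(":
            expect("(")
            value = expr()
            expect(")")
            return value
        if "0" <= peek() <= "9":  # false for "" at end of input
            pos += 1
            return int(text[pos - 1])
        raise ParseError(f"unexpected {peek()!r} at {pos}")

    result = expr()
    if pos != len(text):
        raise ParseError(f"trailing input at {pos}")
    return result


rng = random.Random(42)
for _ in range(3):
    sample = generate(rng=rng)
    print(sample, "=", parse(sample))
```

A purely random fuzzer would mostly produce strings like `)(*9` that `parse` rejects immediately; deriving inputs from the parser's own grammar sends every input past the syntax checks into the actual functionality.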

1 month ago

After a visit to Max Planck Institute for Security and Privacy (MPI-SP) in Bochum, seeing my awesome colleagues @thorstenholz.bsky.social, @mboehme.bsky.social, Mathias Payer, and many more, now on my way to Paris to celebrate ten years of @softwareheritage.org with the great Roberto Di Cosmo

2 months ago

Correction: It's 2,000+ *en*-dashes ("--"), but actually 5,800 *em*-dashes ("---")

2 months ago

$ cd ~/Papers/
$ grep -e '[ ~]-- ' */*.tex | wc -l
2258
$

A researcher used more than 2,000 em-dashes in his papers, revealing AI-based manipulation in 400+ papers since 1985. Professor Zeller claims he "typed" these dashes into the paper by using "two hyphens" and a "typesetting" system.
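For the record: `grep ... | wc -l` counts matching *lines*, not individual dashes, and the pattern `'[ ~]-- '` only catches spaced en-dashes. A small Python sketch that counts en-dashes (`--`) and em-dashes (`---`) separately might look like this; the `~/Papers/*/*.tex` layout is assumed from the shell command, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict

# In LaTeX, "---" is an em-dash and "--" an en-dash; the lookarounds keep a
# "---" from also being counted as an en-dash (and skip runs of four or more).
EM_DASH = re.compile(r"(?<!-)---(?!-)")
EN_DASH = re.compile(r"(?<!-)--(?!-)")


def count_dashes(tex: str) -> Dict[str, int]:
    """Count individual en- and em-dashes (not lines) in LaTeX source."""
    return {"en": len(EN_DASH.findall(tex)), "em": len(EM_DASH.findall(tex))}


def count_in_papers(root: Path) -> Dict[str, int]:
    """Sum dash counts over root/*/*.tex, mirroring the grep command's glob."""
    totals = {"en": 0, "em": 0}
    for path in sorted(root.glob("*/*.tex")):
        counts = count_dashes(path.read_text(encoding="utf-8", errors="replace"))
        for kind, n in counts.items():
            totals[kind] += n
    return totals


if __name__ == "__main__":
    print(count_in_papers(Path.home() / "Papers"))
```

Like the grep one-liner, this also counts dashes inside comments and verbatim blocks; it is a rough tally, not a LaTeX parser.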

2 months ago

Fun fact: This is my tenth test of time award :-) We will give a keynote at the FSE 2026 conference. @acm.org @sigsoft.bsky.social

2 months ago

When do changes induce fixes? | ACM SIGSOFT Software Engineering Notes As a software system evolves, programmers make changes that sometimes cause problems. We analyze CVS archives for fix-inducing changes---changes that lead to problems, indicated by fixes. We show how ...

Happy New Year! I am thrilled to report that Jacek Śliwerski, Tom Zimmermann, and I won the ACM SIGSOFT 2026 Impact Award 🏆 for "When do changes induce fixes?" (MSR 2005). The paper introduced the popular SZZ algorithm for linking change histories and bug databases: dl.acm.org/doi/10.1145/...
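A simplified Python sketch of the two SZZ-style steps, under stated assumptions: first, link a commit to bug reports via numbers mentioned in its log message; second, trace the lines the fix changed back to the commits that last touched them, discarding changes made after the bug was reported. The regular expression, the `Commit`/`Bug` records, and the precomputed `blame` mapping are hypothetical stand-ins for real version-control and bug-database queries.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Set

# Hypothetical heuristic: "bug", "fix", "fixed", or "fixes" followed by a number
# is taken as a reference to that bug report.
BUG_REF = re.compile(r"\b(?:bug|fix(?:ed|es)?)\s*#?(\d+)", re.IGNORECASE)


@dataclass
class Commit:
    sha: str
    date: datetime
    message: str


@dataclass
class Bug:
    number: int
    reported: datetime


def linked_bugs(commit: Commit, bugs: Dict[int, Bug]) -> List[Bug]:
    """Step 1: link a commit to the known bug reports its log message refers to."""
    numbers = {int(n) for n in BUG_REF.findall(commit.message)}
    return [bugs[n] for n in sorted(numbers) if n in bugs]


def fix_inducing(fix: Commit, bugs: Dict[int, Bug],
                 changed_lines: List[int], blame: Dict[int, Commit]) -> Set[str]:
    """Step 2: commits that last touched the lines the fix changed.

    `blame` maps line numbers of the pre-fix version to the commit that last
    changed each line (what an annotate/blame command would report).
    """
    reports = [bug.reported for bug in linked_bugs(fix, bugs)]
    if not reports:
        return set()  # not recognized as a fix
    latest_report = max(reports)
    # Keep only changes made before a linked bug was reported;
    # a later change cannot have induced that bug.
    return {blame[line].sha for line in changed_lines
            if line in blame and blame[line].date < latest_report}


c1 = Commit("c1", datetime(2004, 1, 5), "Add parser")
c2 = Commit("c2", datetime(2004, 3, 1), "Refactor loop")
c3 = Commit("c3", datetime(2004, 5, 2), "Tweak comments")
fix = Commit("c4", datetime(2004, 5, 10), "Fixed bug 42: off-by-one in loop")
bugs = {42: Bug(42, datetime(2004, 4, 1))}
blame = {10: c1, 11: c2, 12: c3}
print(fix_inducing(fix, bugs, [11, 12], blame))  # {'c2'}
```

Here `c3` also touched a line the fix changed, but it was committed after bug 42 was reported, so it is filtered out as a fix-inducing candidate.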

2 months ago

Problem: Reviewers did not read the paper.
Solution: Write a detailed rebuttal and point to all the places in the paper that answer their questions.
New problem: Reviewers did not read the rebuttal.

2 months ago

IPN Colloquium 15 12 2025 Andreas Zeller YouTube video by IPN (ICT Research Platform Nederland)

The talk is now online:

* Video: www.youtube.com/watch?v=tBO_...
* Slides: andreas-zeller.info/assets/Shoul...

Enjoy! -- Andreas

2 months ago

IPN Colloquium 15 12 2025 Andreas Zeller YouTube video by IPN (ICT Research Platform Nederland)

In an IPN vision talk last Monday, I sketched how future AI "super-coders" would learn from their own experiments with software to far surpass current LLM-based AI coders.

The talk is now online. Enjoy!

* Recording: www.youtube.com/watch?v=tBO_...
* Slides: andreas-zeller.info/assets/Shoul...

2 months ago

IPN Colloquium 6: Should AI Coders Experiment More? – ICT Research Platform Netherlands

Today at 16:00 CET, I'll give a vision talk "Should AI Coders Experiment More?", paving the way to AI “super coders” that may become way more competent than the most experienced programmers - and also way more competent than any LLM-based coders. Details here: ict-research.nl/2025/11/ipn-...

3 months ago

Reviewer-Author Collusion Rings and How to Fight Them In 2012, I attended a physical meeting of the program committee responsible for selecting the best scientific papers for the ESEC/FSE 2013 conference in Saint Petersburg, Russia. This meeting was part...

Time to get serious again. New blog post "Reviewer-Author Collusion Rings and How to Fight Them": andreas-zeller.info/2025/12/07/R...

3 months ago

Oops - Of course, Helmut Kohl was chancellor until *1998*, not 1988. Apologies!

3 months ago

A description of the items shown in the LaTeX Korrektor background:

* Diomidis Spinellis, Author of “Advice for writing LaTeX documents”
* LaTeX 2ε Cheat Sheet
* Helmut Kohl, German Chancellor 1982–1988
* A fictitious event poster "Lack Leder LaTeX, Hamburg"
* A fictitious LaTeX propaganda poster

Bonus material for The LaTeX Korrektor! Some of you asked: "What are these photos and posters in the background?" Here they come, enlarged and with some details. Enjoy! #LaTeX #LaTeXKorrektor

In case you missed it, watch all six episodes of the LaTeX Korrektor here: www.youtube.com/watch?v=EhsM...

3 months ago

The LaTeX Korrektor 6/6 - Ten Commandments YouTube video by Andreas Zeller

Series finale! The LaTeX Korrektor 6/6 - Ten Commandments www.youtube.com/shorts/HAodi... #LaTeX #LaTeXKorrektor

Read the LaTeX advice by Diomidis Spinellis (@coolsweng.bsky.social): github.com/dspinellis/l...

All six episodes of the LaTeX Korrektor: www.youtube.com/watch?v=EhsM...

3 months ago

The LaTeX Korrektor 5/6 - Citations YouTube video by Andreas Zeller

Why, oh why does your bibliography have all titles in lowercase? WHY? The LaTeX Korrektor 5/6 - Citations: www.youtube.com/shorts/0nk72... #LaTeX #LaTeXKorrektor

Missed previous episodes? This playlist has them all: www.youtube.com/watch?v=EhsM...

3 months ago

Ah, so it was you who rejected my paper!? 🤔
