But it goes to show that you can’t *only* rely on source code…looks can be deceiving!
Here’s the paper if you want to give it a read:
arxiv.org/pdf/2111.00169
Java, interestingly, rejected the researchers’ suggested modifications, suggesting it’s a code editor issue, not a language one.
Of course, for a reverse engineer (like myself), it would be trivial to detect this behavior with any decompiler; the logic is pretty transparent.
Thankfully, GCC detects Bidi misuse by default now (-Wbidi-chars).
GitHub will also display a warning banner if it detects Bidi characters, and Visual Studio Code shows the invisible chars prominently.
Once the attack vector was proven, the researchers immediately started searching for evidence of abuse.
Out of ~1 billion commits on GitHub in 2021…7,444 Bidi control characters were found.
98.8% were false positives.
Amazingly, no backdoors, just some obfuscators.
The abuse starts with Bidi control characters.
They’re invisible overrides that switch the direction characters are displayed in.
If you hide them in comments or string literals…the compiler doesn’t complain.
You can squeeze in some very sneaky stuff.
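A rough sketch of how you could scan for these yourself. The character list is the set of Unicode Bidi embedding, override, and isolate controls (U+202A to U+202E, U+2066 to U+2069); the function name and the sample string are made up for illustration, and this is not the detection logic GCC or GitHub actually use.

```python
# Unicode Bidi control characters (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202c": "PDF",  # pop directional formatting
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def find_bidi_controls(source):
    """Return (line, column, name) for every Bidi control character found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ch]))
    return hits


# Hypothetical sample: a string literal with hidden controls inside it.
sample = 'access = "user\u202e \u2066// check if admin\u2069 \u2066"\nprint(access)\n'

for hit in find_bidi_controls(sample):
    print(hit)
```

Any hit inside a comment or string literal deserves a second look, though most real-world hits are legitimate right-to-left text.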
Read an amazingly clever paper today I have to share:
Imagine source code that *looks* correct…but compiles to different logic!
Unicode has to support left-to-right, and right-to-left languages.
Visual order and logical order can be completely different!
There’s an interesting little quirk that goes along with that…
Auto-GCAS, for example, is purposefully limited to 5G because, in testing, more aggressive maneuvers decreased pilot “trust”, even if they were arguably technically superior.
Here's kind of a fun public paper about the subject of ML-assisted missile evasion in general:
arxiv.org/pdf/2511.05828
That test aircraft is structurally limited to just 6G, but it’s interesting to think how these models would be applied to other manned/unmanned aircraft.
It’s an unusual ethical software problem.
Do you purposefully force a human unconscious…in an attempt to save their life?
Last year, Lockheed demonstrated “Have Remy”.
Yes, Remy is a direct Ratatouille joke.
Apparently, they ran “billions of simulated engagements” on a GPU cluster to refine behavior, then flew it on the real X-62 VISTA.
Auto-GCAS is somewhat the precursor.
Implemented on the F-16 since 2014, it will automatically pull a ~5G recovery if you’re about to hit the ground.
Famously saved a student’s life (see video), after they blacked out into a full afterburner vertical dive!
Ignoring other countermeasures, research consistently shows the optimal evasion is a two-part “bang-bang” structure.
A long, sustained G barrel roll to deplete missile energy, followed by a last-second maximum effort reversal.
Optimal timing is…milliseconds.
I’ve always wondered: could a software algorithm (ML or otherwise) evade an incoming missile better than a human pilot?
Perhaps even at the expense of a blackout.
*You* might be able to only handle 9G…but what if the airframe can take 12?
…it (sorta) exists.
Here's my full breakdown / reverse engineering of some of the last contest's winners (go to ioccc . org to submit an entry yourself!)
www.youtube.com/watch?v=by53...
There are only about 10 days left to make a submission to one of my favorite programming contests:
The International Obfuscated C Code Contest!
Highly encourage you to take a peek and enter; it really brings out some of the best programmers (and compiler wizards).
hmm, interesting, but it does seem like a bit of funny marketing ha.
L0 seems to be a renamed, more efficient L1
The logic was essentially: hey, system fonts are pretty good now…why not just default to what’s native?
Apple gets Apple fonts. Windows gets Windows fonts.
There’s a great blog post from Mark Otto, GitHub’s director of design, about the switch:
markdotto.com/blog/github-...
Hence, the early web was very…Times New Roman-y.
GitHub was arguably one of the first major players to go *against* the custom font / FOUT hell of the mid 2010s.
In mid 2017, they essentially re-adopted the 90s method of using direct system fonts!
FOUTs were essentially unheard of in the 90s.
The entire world basically defaulted to web-safe fonts.
In the rare instance someone got fancy with something non-standard, the browser would just fall back to a default.
This didn’t really change until 2010!
First, ignore JavaScript for a second.
Even with plain HTML+CSS, it’s quite common to get FOUTs these days.
FOUT = Flash of Unstyled Text.
AKA temporarily load a system-native font, then when the custom font finally rolls in, "snap" to the new font.
Websites today load wildly differently than in the 90s.
Arguably, worse.
The HTML spec was designed to be read sequentially, so text used to stream in, then display instantaneously. Basically, read -> paint.
A lot of today’s modern weirdness comes from…fonts.
From AMD, more on the performance side:
“Improving the Utilization of Micro-operation Caches in x86 Processors”
The other takes more of a security angle, plus some interesting timing attacks:
“UC-Check: Characterizing Micro-operation Caches in x86 Processors and Implications in Security and Performance”
The smaller pieces are thus able to fit entirely in the uOP cache, avoiding thrashing the decoder constantly.
There are quite a few papers on the subject, but these two give a really nice overview:
99% of programmers shouldn’t care, but those who squeeze the last bit of performance out of x86 pay attention.
Loop Fission is an interesting technique, where you split up a complex loop into multiple smaller sequential ones.
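Here's the shape of that transformation as a tiny sketch. It's in Python only to show the structure; the function names are made up, and you won't see any micro-op cache effect from an interpreted language. The payoff applies to compiled hot loops where each smaller loop body fits in the cache.

```python
def fused(values):
    """One loop doing two unrelated jobs per iteration."""
    total = 0
    squares = []
    for v in values:
        total += v              # job 1: running sum
        squares.append(v * v)   # job 2: build a list of squares
    return total, squares


def fissioned(values):
    """Same work, split into two smaller sequential loops."""
    total = 0
    for v in values:            # loop 1: only the running sum
        total += v

    squares = []
    for v in values:            # loop 2: only the squares
        squares.append(v * v)
    return total, squares


data = [1, 2, 3, 4]
print(fused(data) == fissioned(data))
```

The split is only legal when the two jobs don't depend on each other within an iteration, which is the case here.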
x86 “looks” CISC, but all of the engine is RISC underneath.
You don’t *want* to wake up the decoder if you don’t have to. It wastes about 6 cycles + extra power.
Usually, the compiler aligns everything for you...as long as your loop is small enough.
There is one problem though.
You can’t see it.
Well, not directly at least. You’ll never find uOPs in the binary.
But! You can see the “shape” of it with performance tools…and there are subtle tells in the binary as well (hint, some nops).
Most programmers are taught that L1 is the “top level” cache on x86.
It’s not quite true anymore!
Intel calls it the Decoded Stream Buffer (DSB), AMD the OpCache.
Only enough room for ~4,000 micro-ops, but there are interesting ways to take advantage of it.
hahaha
(side note: most rand() implementations moved on to other LCGs, or Mersenne Twisters and such…but it’s arguable that 16807 is still quite ubiquitous!)
Original Paper if you’d like to read:
dl.acm.org/doi/10.1145/...
It’s kind of funny that so few listened. FreeBSD was still using 16807 in rand() all the way until 2021!
So if you ever see that constant in disassembled code…now you know :)
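For reference, a minimal sketch of the generator that constant usually signals, assuming it's the classic "minimal standard" Lehmer LCG with modulus 2^31 - 1. The function names are mine, and this is not a copy of FreeBSD's actual rand() source.

```python
MULTIPLIER = 16807       # 7**5, the constant to look for in disassembly
MODULUS = 2**31 - 1      # 2147483647, a Mersenne prime


def minstd_next(state):
    """Advance the generator one step: state * 16807 mod (2^31 - 1)."""
    return (MULTIPLIER * state) % MODULUS


def minstd_stream(seed, count):
    """Return the next `count` outputs starting from `seed` (must be 1..MODULUS-1)."""
    values = []
    state = seed
    for _ in range(count):
        state = minstd_next(state)
        values.append(state)
    return values


print(minstd_stream(1, 3))
```

In a binary you'll typically spot 16807 (0x41A7) sitting next to 2147483647 (0x7FFFFFFF), often with some shift/add trickery to avoid a 64-bit multiply.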