I'd love to hear about your solutions
13.02.2025 19:34
@cxx.dev.bsky.social
C++ developer specializing in source and binary program analysis and transformation.
I'm delighted to announce that in the new year I'll be joining @hex-rays.bsky.social as a C++ developer! IDA Pro and the Hex-Rays decompiler are indispensable tools for reverse engineers -- I can't wait to work on these products and join another top-notch engineering team.
20.12.2024 19:39
How do you see the balance of value brought to the table between new models just being better vs. whatever smarts are encoded in the harnessing of those models?
07.12.2024 06:31
16/16 GRR and microx were my first major contributions at Trail of Bits, and continued the DBT research from my M.Sc. at the University of Toronto. They were super fun projects to create and work on, and I'm extremely proud of both of them.
04.12.2024 17:31
15/16 What was humbling was that UDB itself was a record/replay x86-64 dynamic binary translator (DBT). So while I was toiling away trying to get my DBT working for DECREE, I was relying on their much more general system to debug mine!
04.12.2024 17:30
14/16 As you can imagine, debugging a dynamic binary translator can be tricky; when things go wrong, your debugger isn't as helpful because there's no debug information for just-in-time translated code. UndoDB's time-travelling debugger, UDB, was a productivity multiplier (undo.io/products/udb).
04.12.2024 17:29
13/16 At the time, the Unicorn engine didn't provide fine-grained information about instruction dependencies, and it was very crashy. Our attempts to use it had us concretizing any symbolic bytes in big swaths of the stack, artificially limiting the futures that the symbolic executor could explore.
04.12.2024 17:29
12/16 Fun segue: our pysymemu fork used microx (github.com/lifting-bits...), my fourth binary translator, to *natively* execute instructions that didn't have symbolic Python models. Microx allowed us to minimize how much symbolic state had to be concretized when executing instructions natively.
04.12.2024 17:28
11/16 GRR's snapshots could also be shared with a custom CGC-specific fork of pysymemu (github.com/feliam/pysym...). Fun fact: pysymemu evolved into the Manticore symbolic executor (github.com/trailofbits/...). This sharing allowed the fuzzer and symbolic execution components to "blindly" cooperate.
04.12.2024 17:28
10/16 Another cool thing was that GRR was deterministic and could produce and resume from program snapshots. The original motivation for this feature was to skip to the first read(2) system call, avoiding deterministic program setup costs.
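The snapshot idea can be illustrated with a minimal Python sketch. This is not GRR's code: `ToyProgram`, `snapshot_at_first_read`, and `run_from_snapshot` are hypothetical names, and a deep copy of a Python object stands in for a real memory and register snapshot. The point is only that deterministic setup runs once, and every later run resumes from the saved state.

```python
import copy


class ToyProgram:
    """Hypothetical stand-in for a deterministic target program."""

    def __init__(self):
        self.table = []
        self.setup_runs = 0

    def setup(self):
        # Deterministic, input-independent work done before the first read.
        self.setup_runs += 1
        self.table = [i * i for i in range(256)]

    def handle_input(self, data):
        # Everything after the first read depends on the input bytes.
        return sum(self.table[b] for b in data)


def snapshot_at_first_read():
    """Run setup once and capture the state just before input is consumed."""
    prog = ToyProgram()
    prog.setup()
    return copy.deepcopy(prog)


def run_from_snapshot(snapshot, data):
    """Resume from a private copy so each run starts from identical state."""
    prog = copy.deepcopy(snapshot)
    return prog.handle_input(data)


snap = snapshot_at_first_read()
results = [run_from_snapshot(snap, data) for data in (b"\x02\x03", b"\x00\x01")]
```

Because each run resumes from a copy, the setup cost is paid once no matter how many inputs are tried.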
04.12.2024 17:28
9/16 GRR was a fairly effective fuzzer, but the fuzzer logic wasn't nearly as smart as its contemporaries such as AFL. Where the GRR fuzzer did well was in operating on the whole input or on individual system calls, doing things like repeating or swapping inputs at a finer granularity.
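A rough Python sketch of mutation at system-call granularity follows. It assumes the recorded input is kept as one byte chunk per input-reading system call. That representation and the function names are assumptions made for illustration, not GRR's actual data structures.

```python
import random


def repeat_chunk(chunks, rng):
    """Duplicate the bytes fed to one randomly chosen system call."""
    i = rng.randrange(len(chunks))
    return chunks[: i + 1] + [chunks[i]] + chunks[i + 1 :]


def swap_chunks(chunks, rng):
    """Exchange the bytes fed to two different system calls."""
    out = list(chunks)
    if len(out) < 2:
        return out
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def mutate(chunks, rng):
    """Apply one randomly chosen per-call mutation, then flatten to raw bytes."""
    op = rng.choice([repeat_chunk, swap_chunks])
    return b"".join(op(chunks, rng))


rng = random.Random(0)
recorded = [b"LOGIN ", b"alice\n", b"QUIT\n"]  # one chunk per input call
mutated = mutate(recorded, rng)
```

A whole-input mutator would only see the concatenated bytes; keeping the chunk boundaries lets a mutation respect where one system call's input ended and the next began.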
04.12.2024 17:28
8/16 Faithfully emulating DECREE meant doing a lot of weird testing. One fun discovery was that write(2) will avoid returning an EFAULT as long as a minimum number of bytes have been read (github.com/lifting-bits...).
04.12.2024 17:27
7/16 DECREE, as released by DARPA, was implemented as a Linux kernel fork that loaded CGC binaries (really: slightly tweaked ELFs) and used a custom system call personality table that restricted loaded binaries to just their few system calls.
04.12.2024 17:27
6/16 To get Radamsa to work as a function meant compiling the Scheme to C using the Owl Lisp compiler, then patching that horrible output so that I could track its memory allocations and network calls, and invoke its main function as though it were any other normal function in a program.
04.12.2024 17:26
5/16 Another fun thing about GRR was that Radamsa (gitlab.com/akihe/radamsa) was embedded as a callable function. If you're familiar with Radamsa then this may come as a surprise -- Radamsa is written in a Scheme dialect, and normally sends inputs over the network.
04.12.2024 17:26
4/16 One cool thing is that GRR could handle self-modifying DECREE binaries, which made opening the code cache in IDA Pro or Binary Ninja fun, because you could browse the evolution of those code modifications.
04.12.2024 17:26
3/16 GRR translated x86 into x86-64, so that one or more DECREE binaries could run in 4 GiB (32-bit) memory spaces within its own much larger 64-bit address space. Translated code could be instrumented for code coverage, and cached to disk to amortize translation costs across GRR runs.
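To make the translate-and-cache loop concrete, here is a toy Python sketch. It is not GRR: the three-instruction guest "ISA", the in-memory dictionary cache (GRR cached to disk), and the function names are all invented for illustration. Each guest basic block is translated on first use into host code (here, generated Python source), cached by its start address, and recorded for coverage when executed.

```python
def translate_block(program, pc):
    """Translate guest instructions from pc through the next branch into a host function."""
    lines = ["def block(state):"]
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "add":
            lines.append(f"    state['acc'] += {arg}")
        elif op == "jnz":
            # Conditional branch: taken target, else fall through to the next pc.
            lines.append(f"    return {arg} if state['acc'] != 0 else {pc}")
            break
        elif op == "jmp":
            lines.append(f"    return {arg}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown guest instruction: {op}")
    namespace = {}
    exec("\n".join(lines), namespace)  # "JIT": compile the generated host code
    return namespace["block"]


def run(program, state):
    """Dispatch loop: translate on a cache miss, record coverage, execute."""
    cache = {}
    coverage = set()
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
        coverage.add(pc)  # block-level coverage instrumentation
        pc = cache[pc](state)
    return coverage, len(cache)


# A small countdown loop in the toy guest ISA.
program = [("add", 3), ("add", -1), ("jnz", 1), ("halt", None)]
state = {"acc": 0}
coverage, translated_blocks = run(program, state)
```

The loop body at guest address 1 executes more than once but is translated only once, which is the cost the cache amortizes.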
04.12.2024 17:26
2/16 DECREE programs are basically simplified 32-bit, x86 Linux programs -- they can use only six or so system calls. GRR used dynamic binary translation, a just-in-time translation technique that rewrote the target program's machine code while it was running!
04.12.2024 17:25
1/16 One of my first creations at Trail of Bits was GRR (github.com/lifting-bits...), an all-in-one emulator and fuzzer for programs running on the DECREE operating system used in DARPA's Cyber Grand Challenge.
04.12.2024 17:25
5/15 Another fun thing about GRR was that Radamsa (gitlab.com/akihe/radamsa) was embedded as a callable function. If you're familiar with Radamsa then this may come as a surprise -- Radamsa is written in a Scheme dialect, and normally sends inputs over the network.
04.12.2024 17:23
15/15 Thanks (and sorry!) to the team of people who helped/suffered along the way! Also thanks to DARPA for funding this research through Sergey Bratus' Assured Micro-Patching (AMP) program.
03.12.2024 19:30
14/15 In summary, Dr. Lojekyll was one of the most fun projects I created at Trail of Bits. It was also the most brutal to debug. I learned that debugging declarative languages is hard, and debugging in-progress/broken compilers for declarative languages is harder.
03.12.2024 19:30
13/15 I always saw automated database factorization and nesting as the ultimate solution to the intermediate relation explosion problem, but we never had the time to address it, and Dr. Lojekyll's codebase was not flexible enough to make experimental extensions easy.
03.12.2024 19:29
12/15 Using micro-databases on both sides of a client/server architecture ended up being fun: the server could do the heavyweight computations, then keep a thin client up-to-date with its differentials. Micro-databases could also be used inside stateful functors, allowing for database nesting.
03.12.2024 19:29
11/15 Micro-databases were originally motivated to help a human solve Dr. Lojekyll's intermediate relation explosion problem. In Dr. Lojekyll, the need to do top-down execution meant a subset of intermediate relations (along with the named ones) had to be saved. That caused a lot of redundancy.
03.12.2024 19:29
10/15 Another learning was micro-databases -- my codegen produced C++/Python classes, after all. I could separate out a small part of my Datalog program, compile it to a C++/Python class, and instantiate and destroy those on-demand.
03.12.2024 19:29
9/15 There were other learnings for me on this project, like how typical compiler optimizations like common subexpression elimination can't just be applied to same-shaped dataflow system operators, because operations over values and sets of values have different semantics!
03.12.2024 19:29
8/15 This required codegen to produce traditional bottom-up, fixpoint-style procedural code, as well as top-down "double checking" code. The same IR was used to represent both cases, allowing me to retarget codegen to languages like C++ and Python. See slide 41: www.petergoodman.me/docs/dr-loje...
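For readers unfamiliar with the two evaluation styles, here is a hand-written Python sketch for a single Datalog rule pair (transitive closure). It is not output from Dr. Lojekyll's codegen, and the function names are made up. It shows a bottom-up loop that derives facts until a fixpoint is reached, plus a top-down check that tries to re-prove one specific fact from the base facts.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    Iterates until no new path facts are derived (the fixpoint)."""
    path = set(edges)
    while True:
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path
        path |= new_facts


def can_prove_path(edges, src, dst):
    """Top-down check: is path(src, dst) derivable from the edge facts alone?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False


edges = {(1, 2), (2, 3), (3, 4)}
paths = transitive_closure(edges)
still_valid = all(can_prove_path(edges, x, y) for (x, y) in paths)
```

The bottom-up pass materializes every derivable fact, while the top-down check answers one question at a time, which is why it suits double-checking an individual tuple.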
03.12.2024 19:28