Weekend fun: snooping inter-process messaging (ksbasend) in Oracle with bpftrace.
t.ly/_yDy7
@christophlutz.bsky.social
My drinking club has a skydiving problem
This, so much!
16.10.2025 14:17
When you plan to geek out over some Oracle internals, but end up ftrace'ing bpf the entire weekend to chase a funny bug that only occurs on Exadata with capacity on demand ...
05.10.2025 17:30
There's a problem on the oracle-l listserver at present about an insert taking far too much time (and CPU). It's a known issue and there are 47 statistics in v$sysstat (19.11) with names like 'ASSM%' to help diagnose it.
How many do you think are described in the database reference manual?
None.
Interesting little detail: every loop iteration calls the "pause" instruction, providing a hint to the cpu that the code is in a spin-wait loop. This allows the cpu to improve spin-loop efficiency and power consumption.
Observed on 19.26, running on Exadata X10, version 25.1.7.
Yet another adaptive lgwr optimization: on Exadata X10+, pipelined log writes may defer redo writes until a suitably sized write batch has accumulated in the log buffer.
The deferral can involve spinning in a tight loop up to 25 times (maximum hard-coded in kcrfw_defer_write).
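A rough sketch of what such a bounded deferral loop might look like, written in Python for readability. Only the cap of 25 iterations and the name kcrfw_defer_write come from the observation above; the batch-size threshold, the `pending_bytes` callback and the function name `defer_write` are hypothetical. Python has no equivalent of the "pause" instruction, so that spot is only marked with a comment.

```python
from typing import Callable, Tuple

MAX_DEFER_SPINS = 25  # cap observed in kcrfw_defer_write


def defer_write(pending_bytes: Callable[[], int], min_batch_bytes: int) -> Tuple[bool, int]:
    """Spin until enough redo has accumulated or the spin cap is reached.

    Returns (batch_ready, spins_used). Hypothetical illustration only.
    """
    spins = 0
    while spins < MAX_DEFER_SPINS:
        if pending_bytes() >= min_batch_bytes:
            return True, spins
        # The real code would issue the x86 "pause" instruction here.
        spins += 1
    # Cap reached: report whether the batch happens to be ready now.
    return pending_bytes() >= min_batch_bytes, spins
```

The function only reports whether the batch became large enough; what LGWR does after giving up is not covered by the observation and is left out on purpose.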
Nested loops, baby
18.09.2025 08:40
SOUG Day 2025 Zürich - registration now open at soug.ch
Over the next few days we will not only publish the agenda; we are also planning a special for you on the day after the SOUG Day! Take a look and register at soug.ch.
10.09.2025 11:26
So glad that all new features are documented so well... NOT
Manually enabling and disabling adaptive lgwr evaluation trace for pipelined / overlapped redo writes:
POUG journey started… not even at the airport and Lufthansa's delay notification leaves no hope of making the connecting flight in MUC
03.09.2025 06:50
It's still there, but it seems unused, meaning there are no direct calls to it from other functions (not sure about indirect calls, but I doubt it).
The reason for checking was the updexe code path (in version 23.6), where errors are now signalled by kseseclv when interesting things happen.
So 23ai replaces ksesecl0(func, loc, err) with kseseclv(err, func, loc, ...) ... Why is it always just a few days before POUG that these kinds of low-level discoveries surface?
31.08.2025 12:18
Wouldn't a bottle opener be more appropriate for POUG?
31.08.2025 07:18
A new tool in the 0x.tools family:
xtop - Top for Wall-Clock Time. It uses eBPF/xcapture v3 and gives you "x-ray vision" into Linux system activity.
It will be available next Tuesday, 19 Aug, at 1pm EDT, when I'll also run a live demo webinar!
tanelpoder.com/posts/xtop-t...
"Slide n of 142"... this is getting out of control...
12.08.2025 19:57
Very nice! We're planning to do that as well in some environments. What were the numbers before the change?
12.08.2025 19:34
How many nodes does it have now?
07.08.2025 19:11
Hell yeah... but then Oracle gives me more, and I even get paid for it
06.08.2025 17:09
Today's discovery (19.26): Oracle derives different log parallelism defaults depending on platform. The max number of public redo strands is:
Exadata:
CPU_COUNT <= 256: 16
CPU_COUNT > 256: CPU_COUNT/16
Non-Exadata:
CPU_COUNT <= 32: 2
CPU_COUNT > 32: CPU_COUNT/16
Max limit: 256
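The rules above can be written down as a small function. This is only a sketch of the observed defaults, not Oracle's code: the function name is made up, integer division for CPU_COUNT/16 is an assumption, and so is applying the 256 cap to both platforms.

```python
def max_public_redo_strands(cpu_count: int, exadata: bool) -> int:
    """Observed 19.26 defaults for the max number of public redo strands.

    Assumptions: CPU_COUNT/16 truncates, and the 256 cap applies
    on Exadata and non-Exadata alike.
    """
    # (CPU_COUNT threshold, fixed default at or below the threshold)
    threshold, small_default = (256, 16) if exadata else (32, 2)
    if cpu_count <= threshold:
        return small_default
    return min(cpu_count // 16, 256)


for cpus in (16, 64, 256, 512):
    print(cpus, max_public_redo_strands(cpus, True), max_public_redo_strands(cpus, False))
```

Under these assumptions the two platforms only differ up to 256 CPUs; above that both follow CPU_COUNT/16.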
Enough adrenaline for the day?
06.08.2025 12:51
Things are confusing: in many places timing information comes from sltrgftime64, which on Linux is a wrapper around clock_gettime with ns resolution, but the results get normalized to microseconds.
I guess not all platforms had high-resolution timers and that's part of the reason why things are a bit messy.
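To make the normalization concrete, here is a tiny sketch of reading a nanosecond clock and scaling it down. It does not reproduce sltrgftime64: the clock id (CLOCK_MONOTONIC), the truncating division and both function names are assumptions for illustration.

```python
import time


def ns_to_us(ns: int) -> int:
    """Normalize nanoseconds to whole microseconds (truncation is an assumption)."""
    return ns // 1_000


def now_us() -> int:
    # CLOCK_MONOTONIC is a guess at the clock id; clock_gettime_ns is Unix-only.
    return ns_to_us(time.clock_gettime_ns(time.CLOCK_MONOTONIC))
```

Anything shorter than a microsecond disappears in the division, which is one more reason to double-check the unit of any counter derived from such a value.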
Yeah, I can imagine… but that click moment when it finally makes sense? Pure magic, no?
05.08.2025 20:58
I spent hours looking for milliseconds
05.08.2025 20:40
Oh my, looks like the "ms" in "kso_sched_delay_avg_ms" actually means "microseconds" ...
05.08.2025 19:46
6/6
As always, bpftrace is very useful for observing and studying undocumented behavior:
Trace write info array updates (LGWR/LGnn): t.ly/VV--a
Trace write info array scans (FG): t.ly/67R-J
5/6
The "redo synch time overhead" is used to calculate the "redo synch long waits" statistic, a key metric the Adaptive Log File Sync (ALFS) mechanism uses to decide whether to perform a mode switch from post/wait to polling (more on that another day ...)
4/6
Concurrent updates to the write info array are protected by the "log write info" latch, which LGWR and LG workers acquire in no-wait mode. If they fail to acquire the latch, they skip the write info array update and proceed without recording the redo write completion time.
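The no-wait pattern described here can be sketched with an ordinary lock. This is an illustration only: `threading.Lock` stands in for the "log write info" latch, and the class, its fields and the method name are hypothetical, not Oracle structures.

```python
import threading


class WriteInfoArray:
    """Toy model of a write info array guarded by a no-wait latch."""

    def __init__(self, slots: int) -> None:
        self._latch = threading.Lock()  # stand-in for the "log write info" latch
        self.completion_ns = [0] * slots
        self.skipped = 0  # updates dropped because the latch was busy

    def record_completion(self, slot: int, ts_ns: int) -> bool:
        # No-wait get: never block the writer on the latch.
        if not self._latch.acquire(blocking=False):
            self.skipped += 1
            return False  # proceed without recording the completion time
        try:
            self.completion_ns[slot] = ts_ns
            return True
        finally:
            self._latch.release()
```

The trade-off is that the writer never stalls on the latch, at the price of occasionally missing a completion timestamp.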