So if you are currently involved with ISA-level decisions about inclusion of any pext/pdep-like instructions:
Please consider including SAG/inverse-SAG with bit-reversal of the goats.
No matter which of the two implementation methods you are using: All you need to do is not mask the goat bits.
25.07.2025 23:30
It looks like the patent expires at the end of 2028. The earliest I could see an RVI extension ratified at this point is 2027, so it's definitely worth evaluating.
Also, the new diagrams are cool.
24.07.2025 18:03
I've only watched the last hour so far, but I quite liked your take on null-terminated strings.
C really has to be understood with its history in mind.
20.07.2025 20:12
YouTube video by Ventana Micro
Ventana's Second Gen RISC V Processor for Data Center and Other High Performance | Greg Favor
www.youtube.com/watch?v=OPgj...
11.07.2025 21:00
Their V2 slides say that they have a macro-op cache equivalent in size to a regular 32 KiB icache.
It can store variable-length entries of up to 48 macro-ops, which can be fused from non-sequential instruction runs by collapsing taken branches.
11.07.2025 20:59
RWT Forums - Real World Tech
TIL about Trace Cache: www.realworldtech.com/forum/?threa... (thread on Apple's trace cache)
Ventana's Veyron V2/V3 also seem to use something like a trace cache.
11.07.2025 20:59
YouTube video by Rami Sheikh
CBP2025 - Opening Remarks - Rami Sheikh
Ohh, the talk recordings are on YouTube: www.youtube.com/watch?v=1lwz...
28.06.2025 09:22
The 6th Championship Branch Prediction (CBP2025) happened a week ago: ericrotenberg.wordpress.ncsu.edu/cbp2025-work...
28.06.2025 06:36
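>>> import numpy as np
>>> # 16-entry table: slot (c&31)>>1 holds the vowel that maps there, 0 elsewhere
>>> lut=np.array([ord('a'),0,ord('e'),0,ord('i'),0,0,ord('o'),0,0,ord('u'),0,0,0,0,0], dtype=np.uint8)
>>> inp=np.frombuffer(b"test128aeiou72761xjs",dtype=np.uint8)
>>> lut[(inp&31)>>1]==inp  # True exactly where inp is a lowercase vowel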
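The snippet above appears to classify lowercase vowels with a 16-entry lookup table indexed by bits 1..4 of each byte, which is the shape a 16-byte shuffle could handle. Here is a minimal pure-Python sketch of the same idea that checks all 256 byte values; `LUT` and `is_vowel` are names made up for this illustration.

```python
# Build the same 16-entry table as the NumPy snippet: the slot (c & 31) >> 1
# holds the vowel that maps to it, every other slot holds 0.
LUT = [0] * 16
for v in b"aeiou":
    LUT[(v & 31) >> 1] = v


def is_vowel(c: int) -> bool:
    """True if byte c is one of a, e, i, o, u (lowercase only)."""
    return LUT[(c & 31) >> 1] == c


# Exhaustive check: collect every byte value the table accepts.
matches = bytes(c for c in range(256) if is_vowel(c))
print(matches)  # b'aeiou'
```

Comparing the looked-up byte against the input is what rejects the other characters that alias onto the same slot, for example 'A' or 't'.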
16.06.2025 14:19
4x 16-bit: 120 µm², 63% utilized, 5 GHz met (49 slack)
2x 32-bit: 120 µm², 65% utilized, 5 GHz met (52 slack)
1x 64-bit: 153 µm², 64% utilized, 5 GHz met (14 slack)
So subsetting on SEW really doesn't make much sense compared to a .vx subset.
12.06.2025 11:46
I got OpenROAD working and tested the bfly part of your implementation (so without decode) in a SIMD setup.
asap7, targeting 5GHz, 75% placement density and 50% utilization:
12.06.2025 11:46
RVV benchmark SiFive X280
SiFive X280 RVV benchmarks: camel-cdr.github.io/rvv-bench-re...
Civil was so nice to run my RVV benchmark on the SiFive X280 cores on the Tenstorrent Blackhole.
06.06.2025 22:57
I just had this problem on RISC-V where I didn't clobber the vector registers and some autovectorized surrounding code broke on a newer kernel version.
06.06.2025 18:31
TIL you can't do forward compatible syscalls with inline assembly because the kernel can decide to clobber architectural state that was added after you wrote the code.
If you use svc with inline assembly, you have to explicitly clobber SVE registers.
Good luck doing this back in 2015 when you wrote
06.06.2025 18:31
I suppose the instruction encoding space has to be considered as well.
06.06.2025 12:48
Ah, I understand my mistake now.
My mental model had the element order between the stages as fixed, which is why I didn't see the equivalence of the graphs.
06.06.2025 12:44
Guess I'll step up: github.com/camel-cdr/bf...
And yes, I wasn't the first person to write an optimizing brainfuck interpreter in the C preprocessor; that honor goes to kotha.
06.06.2025 11:26
Also, I think there should be a compress_right_flip and compress_right instruction, because most cases would want the mask and it's basically free to add in hardware.
06.06.2025 11:08
I (not a hardware person) would expect a regular ibfly to have 2+4+8 parallel wires, while this approach has 10+10+10. (both +4 for every layer, if you include the control signals)
For 8-bit this is quite tame, but for 64-bit this may be a significant factor.
06.06.2025 11:04
Ah, I think I understand this better now.
The unshuffle approach is quite neat, but isn't it a lot worse in terms of wire crossings/area/delay for a single-cycle implementation?
06.06.2025 11:03
Bit permutations
An essay about bit permutations in software
I thought a butterfly network can't do SAG in one pass: programming.sirrida.de/bit_perm.htm...
> […] sheep-and-goats operation generally cannot be performed on a butterfly network; […] a variant thereof can which gathers the […] bits on one end in order, but also the remaining ones mirrored […]
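To make the quoted variant concrete, here is a bit-by-bit reference model. It is a sketch under assumptions: the names `sag_flip` and `sag_flip_inverse` are made up, the selected bits ("sheep") are assumed to be packed at the low end, and the remaining bits ("goats") are assumed to be packed at the high end in mirrored order. It says nothing about how a butterfly network would be wired.

```python
def sag_flip(x: int, mask: int, width: int = 8) -> int:
    """Sheep (mask bit 1) go to the low end in order,
    goats (mask bit 0) go to the high end in mirrored order."""
    out = 0
    lo, hi = 0, width - 1
    for i in range(width):
        bit = (x >> i) & 1
        if (mask >> i) & 1:
            out |= bit << lo   # next free slot from the bottom
            lo += 1
        else:
            out |= bit << hi   # next free slot from the top (mirrors the goats)
            hi -= 1
    return out


def sag_flip_inverse(y: int, mask: int, width: int = 8) -> int:
    """Undo sag_flip for the same mask."""
    out = 0
    lo, hi = 0, width - 1
    for i in range(width):
        if (mask >> i) & 1:
            out |= ((y >> lo) & 1) << i
            lo += 1
        else:
            out |= ((y >> hi) & 1) << i
            hi -= 1
    return out


print(bin(sag_flip(0b10110100, 0b11001010)))  # 0b1111000
```

In the example, the four sheep land in the low nibble as `1000` and the four goats land in the high nibble as `0111`, i.e. in reverse of their original order.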
05.06.2025 22:06
I think something like vrgather128ei4.vx would be useful.
It would do a 16-byte shuffle in all 128-bit lanes controlled by the nibbles in the GPR.
This could use the same shuffle hardware as vrgather, but scales better with LMUL/large vlen and saves a vector register allocation.
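A rough Python model of what such an instruction might do, purely as a sketch: `vrgather128ei4.vx` is only a proposal here, and the nibble-to-byte assignment (nibble i of the GPR selects the source byte for destination byte i of each lane) is an assumption.

```python
def vrgather128ei4_vx(vs: bytes, rs: int) -> bytes:
    """Model of the hypothetical instruction: the same 16-byte shuffle is
    applied to every 128-bit lane of vs, controlled by the 16 nibbles of rs."""
    if len(vs) % 16 != 0:
        raise ValueError("vector length must be a multiple of 16 bytes")
    out = bytearray(len(vs))
    for base in range(0, len(vs), 16):       # one iteration per 128-bit lane
        for i in range(16):
            idx = (rs >> (4 * i)) & 0xF      # nibble i of the 64-bit GPR
            out[base + i] = vs[base + idx]
    return bytes(out)


# Reverse the bytes inside each 128-bit lane of a 256-bit vector.
v = bytes(range(32))
r = vrgather128ei4_vx(v, 0x0123456789ABCDEF)
print(r.hex())
```

Because the control lives in a single 64-bit register, the shuffle pattern is identical in every lane, which is what lets it scale with LMUL and VLEN without an index vector.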
04.06.2025 05:31
So like xperm4? That would require new, presumably expensive, bit shuffle hardware.
With vbmatflip+vrgather+vbmatflip you could shuffle bits in bytes and using vrgather+vbmatflip+vrgather+vbmatflip+vrgather gives you bits in 16b/32b/64b/... permute (the limit depends on VLEN).
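The first combination can be modeled on a single 64-bit element. This is a sketch under assumptions: `bmatflip` is taken to be an 8x8 bit-matrix transpose, the byte shuffle is modeled on just 8 bytes, and the helper names `byte_gather` and `permute_bits_in_bytes` are made up.

```python
def bmatflip(x: int) -> int:
    """Transpose a 64-bit value viewed as an 8x8 bit matrix (byte r = row r)."""
    out = 0
    for r in range(8):
        for c in range(8):
            if (x >> (8 * r + c)) & 1:
                out |= 1 << (8 * c + r)
    return out


def byte_gather(x: int, idx: list) -> int:
    """Byte shuffle on 8 bytes: output byte i is input byte idx[i]."""
    out = 0
    for i, j in enumerate(idx):
        out |= ((x >> (8 * j)) & 0xFF) << (8 * i)
    return out


def permute_bits_in_bytes(x: int, idx: list) -> int:
    """flip + byte shuffle + flip: in every byte, output bit i is input bit idx[i]."""
    return bmatflip(byte_gather(bmatflip(x), idx))


# Reverse the bit order inside every byte.
y = permute_bits_in_bytes(0x0123456789ABCDEF, [7, 6, 5, 4, 3, 2, 1, 0])
print(hex(y))
```

After the first transpose each byte holds one bit column, so moving bytes moves bit positions, and the second transpose puts the rows back.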
04.06.2025 05:28
On that topic: I think a 16-bit bit permute would be extremely cheap to implement.
You just chain the bfly and ibfly networks and need 2x 4x8-bit = 64 control bits, which fit in a GPR.
Something like vb16perm.vx vrd, vs, rs.
I don't see any chance of getting that through RVI though, too implementation-specific.
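The control-bit count can be checked with a small software model. This is only a sketch: the stage order (strides 8, 4, 2, 1 for bfly, then 1, 2, 4, 8 for ibfly) and the layout of the 64 control bits (8 bits per stage, lowest byte first) are assumptions, and `butterfly_stage` and `bit_permute16` are made-up names.

```python
def butterfly_stage(x: int, stride: int, ctrl: int, width: int = 16) -> int:
    """One stage: swap bit pairs (i, i + stride) whose control bit is set."""
    k = 0
    for i in range(width):
        if i & stride:
            continue                      # i is the upper partner of a pair
        if (ctrl >> k) & 1:
            lo = (x >> i) & 1
            hi = (x >> (i + stride)) & 1
            if lo != hi:                  # swapping differing bits = flip both
                x ^= (1 << i) | (1 << (i + stride))
        k += 1                            # 8 pairs per stage for width 16
    return x


def bit_permute16(x: int, ctrl64: int) -> int:
    """bfly (4 stages) followed by ibfly (4 stages), 8 control bits each."""
    strides = [8, 4, 2, 1, 1, 2, 4, 8]
    for n, stride in enumerate(strides):
        x = butterfly_stage(x, stride, (ctrl64 >> (8 * n)) & 0xFF)
    return x


# Swapping every pair in the four bfly stages moves bit p to p ^ 15,
# which is a full 16-bit reversal.
print(hex(bit_permute16(0x0001, 0xFFFFFFFF)))  # 0x8000
```

Each of the 8 stages has 16 / 2 = 8 switches, which is where the 2x 4x8 = 64 control bits come from. Deriving the control bits for an arbitrary permutation is a separate routing problem that this model does not cover.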
03.06.2025 16:40
Looking forward to it.
03.06.2025 16:36
SAG would be nice, if it's cheap with the pext/pdep hardware, but I'm not sure I follow the "not mask the unselected bits".
Wouldn't you run the bfly and ibfly in parallel and combine the results, which would require a new shifter?
Or implement compress-flip, which needs a partial bit reverse.
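Functionally, the "combine the results" option might look like the sketch below: two extracts, with the second one shifted up by the popcount of the mask. This is only a software model under assumptions (selected bits go to the low end, helper names `pext` and `sag` are illustrative) and does not describe how the networks would actually be shared.

```python
def pext(x: int, mask: int, width: int = 8) -> int:
    """Gather the bits of x selected by mask into the low end, in order."""
    out, k = 0, 0
    for i in range(width):
        if (mask >> i) & 1:
            out |= ((x >> i) & 1) << k
            k += 1
    return out


def sag(x: int, mask: int, width: int = 8) -> int:
    """Plain sheep-and-goats: selected bits low, unselected bits above them,
    both groups keeping their original order."""
    full = (1 << width) - 1
    mask &= full
    sheep = pext(x, mask, width)
    goats = pext(x, ~mask & full, width)
    shift = bin(mask).count("1")          # variable shift: the extra shifter
    return sheep | (goats << shift)


print(bin(sag(0b10110100, 0b11001010)))  # 0b11101000
```

The variable shift by `popcount(mask)` is the part with no counterpart in a plain pext or pdep, which matches the concern about needing a new shifter.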
03.06.2025 16:36
Sidenote: My pseudocode for the LEB128 decoder using RVV pext/pdep instructions isn't completely correct.
I'll revisit it properly, with a Spike/QEMU implementation, once I finish my project.
03.06.2025 16:22
Lastly, do you know any other bitmanip instructions that were considered and may be useful for RVV?
This also goes to everybody else reading; I'd love to hear interesting SIMD instruction suggestions.
03.06.2025 16:18
This would mean implementations of that subset would only have to decode one mask to control signals; it might even be possible to do the prefix sum on the viota.m hardware.
Is implementing bmatflip by chaining the bfly and ibfly networks used in the pdep/pext implementation reasonable?
03.06.2025 16:17