Police have deployed an armoured vehicle in Hong Kong's commercial heart, amidst an ongoing heavy security presence on the 36th anniversary of the Tiananmen Square crackdown. In full: buff.ly/f4hVB50
04.06.2025 10:30
@lukel97.bsky.social
LLVM at Igalia
Picture of a presenter showing a slide that details outcomes of RISE-funded RISC-V software ecosystem projects.
I'm delighted to see two of @igalia.com's projects for RISE highlighted at the RISC-V Summit Europe.
Find out more about our work on both LLVM optimisation and testing/CI on the RISE blog (with more to come in the future!):
riseproject.dev/2025/05/08/p...
riseproject.dev/2024/10/15/w...
@camel-cdr.bsky.social rvv-bench is used here!
18.04.2025 10:33
We're looking forward to EuroLLVM next week in Berlin. Be sure to check out talks from my colleague @lukel97.bsky.social and me on:
* Work to further improve RISC-V vector codegen (extending the VL Optimizer), and
* Work done with the support of RISE to improve RISC-V LLVM testing.
What if I told you 3DNow! square root reciprocals are defined for negative numbers?... Also, the amazing FEX 2503 is out. Read about some of my work, and the work of other FEX maintainers, in the release notes: fex-emu.com/FEX-2503/ #fex #igalia #gaming #linux #arm64
06.03.2025 15:50
Some notes on ccache+LLVM. Summary: if you do a lot of builds across different checkouts/worktrees/builddirs, be sure to set the base_dir option and -DLLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO=ON muxup.com/2025q1/ccach...
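A rough model of why base_dir helps, and not ccache's actual implementation: ccache's documentation describes base_dir as rewriting absolute paths below that directory into paths relative to the current working directory, so two worktrees with the same layout end up with identical compiler command lines and can share cache entries. The directory names, the `-I` handling, and the helper functions below are hypothetical simplifications.

```python
import posixpath


def rewrite_under_base_dir(args, base_dir, cwd):
    """Hypothetical, simplified model of ccache's base_dir option:
    absolute paths under base_dir are rewritten relative to cwd."""
    rewritten = []
    for arg in args:
        prefix, path = "", arg
        if arg.startswith("-I"):  # include flags carry the path after "-I"
            prefix, path = "-I", arg[2:]
        if posixpath.isabs(path) and posixpath.commonpath([path, base_dir]) == base_dir:
            path = posixpath.relpath(path, cwd)
        rewritten.append(prefix + path)
    return rewritten


BASE_DIR = "/home/me/src"  # hypothetical directory holding every worktree


def compile_args(worktree):
    # Hypothetical compiler invocation for one LLVM source file in a worktree.
    root = f"{BASE_DIR}/{worktree}"
    return [f"-I{root}/llvm/include", "-c", f"{root}/llvm/lib/Support/APInt.cpp"]


a = rewrite_under_base_dir(compile_args("wt1"), BASE_DIR, f"{BASE_DIR}/wt1/build")
b = rewrite_under_base_dir(compile_args("wt2"), BASE_DIR, f"{BASE_DIR}/wt2/build")
print(a)
print(a == b)  # identical after rewriting, so both builds can hit the same cache entry
```

Without the rewrite, the absolute `/home/me/src/wt1/...` and `/home/me/src/wt2/...` paths differ, and the compiler invocations would hash differently.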
27.02.2025 18:39
Hello you fine Internet folks,
Today's article is on SiFive's P550 microarchitecture. The P550 core is one of the fastest RISC-V cores currently available and is claimed to be comparable to Arm's Cortex-A75.
Hope y'all enjoy!
old.chipsandcheese.com/2025/01/26/i...
open.substack.com/pub/chipsand...
New blog post covering the mysterious 10ms startup regression of Node.js on macOS, the journey of investigating the issue with various performance tools, and figuring out the fix (which also helped make the binary smaller).
joyeecheung.github.io/blog/2025/01...
A Simple ELF 4zm.org/2024/12/25/a...
27.12.2024 11:18
After two months of chasing, I finally found out what's behind this mysterious startup-time regression on macOS in Node.js v20.x: it's missing -fvisibility=hidden (I guess that's what happens when the build configs become dusty enough) github.com/nodejs/node/...
16.12.2024 21:55
Recently I came across this treatise by Stephen Dolan
github.com/ocaml/ocaml/...
256 loads, since it's an LMUL 8 load with VLEN=256! I'm not sure how it compares to the scalar equivalent, but my guess is that the vlse8.v is loading one element at a time under the hood.
11.12.2024 11:17
A screenshot of a terminal:

```
luke@bananapif3:~/slowest-instr$ cat main.S
.section .rodata
str: .asciz "Cycles: %d\n"
foo: .zero 256 * STRIDE

.section .text
.global main
main:
    addi sp, sp, -8
    sd ra, 0(sp)

    rdcycle s1
    rdcycle s2
    sub s3, s2, s1 # rdcycle overhead

    la a0, foo
    li a1, STRIDE
    vsetvli t0, zero, e8, m8, tu, mu

    rdcycle s1
    vlse8.v v8, (a0), a1
    rdcycle s2
    sub s1, s2, s1
    sub s1, s1, s3

    la a0, str
    mv a1, s1
    call printf

    ld ra, 0(sp)
    addi sp, sp, 8
    ret
luke@bananapif3:~/slowest-instr$ clang main.S -DSTRIDE=65536 -march=rv64gv
luke@bananapif3:~/slowest-instr$ perf stat -e cycles:u ./a.out
Cycles: 66640979

 Performance counter stats for './a.out':

        78,064,581      cycles:u

       0.049648957 seconds time elapsed

       0.000000000 seconds user
       0.049907000 seconds sys
```
Trying to find the slowest possible RISC-V instruction. This single vlse8.v with a stride of 65536 bytes takes 66 million cycles on a Banana Pi F3. That's 0.04 seconds @1.6GHz
#risc-v
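The numbers in this thread can be reproduced with the usual VLMAX formula, VLMAX = LMUL × VLEN / SEW. This sketch assumes VLEN=256 (as stated in the reply about 256 loads) and the 1.6 GHz clock quoted in the post. It only redoes the arithmetic; it says nothing about how the hardware actually issues the strided accesses.

```python
def vlmax(vlen_bits, sew_bits, lmul):
    """Maximum vector length in elements for a given VLEN, SEW and integer LMUL."""
    return lmul * vlen_bits // sew_bits


# Parameters of the experiment: e8, m8, VLEN=256, stride of 65536 bytes.
elements = vlmax(vlen_bits=256, sew_bits=8, lmul=8)
stride = 65536
span_bytes = elements * stride  # bytes spanned by the load, matching `.zero 256 * STRIDE`

measured_cycles = 66_640_979  # the "Cycles:" value printed (rdcycle overhead already subtracted)
clock_hz = 1.6e9              # clock frequency quoted in the post

seconds = measured_cycles / clock_hz
cycles_per_element = measured_cycles / elements

print(f"{elements} elements spanning {span_bytes // 2**20} MiB")
print(f"{seconds:.3f} s at 1.6 GHz, ~{cycles_per_element:,.0f} cycles per element")
```

Roughly 260 thousand cycles per element is far beyond any plausible per-element load cost, which is consistent with the guess in the reply that something much slower than a simple element-by-element load is going on.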
The maximum possible vl is 2^16 I think, so that would fit in XLEN=32?
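The 2^16 bound follows from the spec limits: VLEN is at most 2^16 bits, LMUL at most 8, and SEW at least 8, so VLMAX tops out at 8 × 65536 / 8 = 65536 elements. A quick enumeration confirms it, assuming only integer LMUL values (fractional LMUL can only shrink VLMAX).

```python
MAX_VLEN_BITS = 2**16        # the V spec caps VLEN at 2^16 bits
SEW_BITS = (8, 16, 32, 64)   # standard element widths
LMULS = (1, 2, 4, 8)         # fractional LMUL only makes VLMAX smaller, so it is skipped


def vlmax(vlen_bits, sew_bits, lmul):
    """VLMAX = LMUL * VLEN / SEW, in elements."""
    return lmul * vlen_bits // sew_bits


max_vl = max(vlmax(MAX_VLEN_BITS, sew, lmul) for sew in SEW_BITS for lmul in LMULS)
print(max_vl, max_vl.bit_length())  # largest vl and the number of bits needed to hold it
```

Since 2^16 needs only 17 bits, vl always fits comfortably in a 32-bit register.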
06.12.2024 16:28
With that said, I forgot how confusing the V extension hierarchy can be. After thinking about EEW=64 on XLEN=32, I think I need to go lie down for a bit 😵‍💫
06.12.2024 16:21
Otherwise EEW=64 is supported as usual, since there's also this bit at the bottom:
> The V extension requires the scalar processor implements the F and D extensions
Is it this bit here?
> The V extension supports all vector load and store instructions (Section Vector Loads and Stores), except the V extension does not support EEW=64 for index values when XLEN=32.
I'm interpreting that as index values, i.e. only the indices passed to vluxei64.v and friends.
Are you talking about zve32x? That doesn't include any FP support, but zve32f and zve64f should mandate F, and zve64d should mandate D, I think.
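The hierarchy discussed in this thread can be summarised as a small table. This sketch encodes, as I read the ratified V spec, each extension's maximum element width (ELEN), the floating-point element widths it supports, and the scalar FP extension it requires. It is a reading aid rather than an authoritative copy of the spec, and it deliberately leaves out other details such as the RV32 restriction on 64-bit index EEW quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorExt:
    elen: int         # largest supported element width, in bits
    fp_eews: tuple    # floating-point element widths supported
    scalar_fp: tuple  # scalar FP extensions that must also be implemented


# Embedded (Zve*) subsets and the full V extension, as summarised from the V spec.
RVV_EXTENSIONS = {
    "Zve32x": VectorExt(elen=32, fp_eews=(), scalar_fp=()),
    "Zve32f": VectorExt(elen=32, fp_eews=(32,), scalar_fp=("F",)),
    "Zve64x": VectorExt(elen=64, fp_eews=(), scalar_fp=()),
    "Zve64f": VectorExt(elen=64, fp_eews=(32,), scalar_fp=("F",)),
    "Zve64d": VectorExt(elen=64, fp_eews=(32, 64), scalar_fp=("D",)),
    "V": VectorExt(elen=64, fp_eews=(32, 64), scalar_fp=("F", "D")),
}


def supports_eew64(name):
    """True if 64-bit elements are available at all (ELEN >= 64)."""
    return RVV_EXTENSIONS[name].elen >= 64


for name, ext in RVV_EXTENSIONS.items():
    print(f"{name:7} ELEN={ext.elen:2} FP={ext.fp_eews or '-'} requires={ext.scalar_fp or '-'}")
```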
06.12.2024 04:48
'RVV mask tricks'

```
# broadcast nth bit
vmand.mm v8, in, mNth
vcpop.m t0, v8
sub t0, x0, t0
vmv.v.x v8, t0

# prefix xor
viota.m v8, in
vand.vi v8, v8, 1
vmsne.vi v8, v8, 0
vmor.mm v0, v8, in # can often be omitted

# move nth bit to first
vmand.mm v8, in, mNth
vcpop.m t0, v8
vmv.v.x v8, t0
vmsof.m v0, v8

# move mask to GPR
vmv.x.s t0, v0
# move GPR to mask
vmv.s.x v0, t0
# assuming vl<=64, set SEW=64 before

# these two should really be dedicated instructions
# shift mask up by 1
vslide1up.vx v8, in, x0
vsrl.vi v8, v8, 7
vmadd.vx v0, 2, v8

# shift mask down by 1
vslide1down.vx v8, in, x0
vsrl.vi v0, in, 1
vmacc.vx v0, 128, v8
```
Here are some slightly tricky RVV mask patterns.
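A small Python model can help check the trickier patterns. It treats a mask as a list of 0/1 bits for the prefix XOR, and as bytes (SEW=8) for the one-bit shifts, where mask bit j lives in bit j % 8 of byte j // 8. The helper names are hypothetical and this models the intended results, not the exact RVV instruction semantics (tail/mask policies, vl, and so on). The up-shift assumes vd already holds the input for vmadd.vx; the down-shift model takes a logical right shift by one of each byte as its base, which is what a shift toward lower bit positions needs.

```python
def viota(mask):
    """Model of viota.m: element i holds the number of set mask bits before i."""
    out, count = [], 0
    for bit in mask:
        out.append(count)
        count += bit
    return out


def prefix_xor(mask, or_with_input=True):
    """viota.m, vand.vi 1, vmsne.vi 0: bit i = XOR of the mask bits before i (exclusive).
    The optional vmor.mm then also sets every bit that was set in the input."""
    exclusive = [count & 1 for count in viota(mask)]
    if not or_with_input:
        return exclusive
    return [e | m for e, m in zip(exclusive, mask)]


def shift_mask_up_by_1(mask_bytes):
    """SEW=8 model: each byte is doubled (vmadd.vx by 2) and receives the top bit
    of the byte below it (vslide1up.vx + vsrl.vi 7) as its bit 0."""
    carry = [0] + [b >> 7 for b in mask_bytes[:-1]]
    return [(2 * b + c) & 0xFF for b, c in zip(mask_bytes, carry)]


def shift_mask_down_by_1(mask_bytes):
    """SEW=8 model: each byte is halved (logical shift right by 1) and receives bit 0
    of the byte above it (vslide1down.vx + vmacc.vx by 128) as its bit 7."""
    above = mask_bytes[1:] + [0]
    return [((b >> 1) + 128 * a) & 0xFF for b, a in zip(mask_bytes, above)]


# Cross-check the byte-wise shifts against plain integer shifts of the whole mask.
data = [0x81, 0x01, 0x80]
as_int = int.from_bytes(bytes(data), "little")
width = 8 * len(data)
up = int.from_bytes(bytes(shift_mask_up_by_1(data)), "little")
down = int.from_bytes(bytes(shift_mask_down_by_1(data)), "little")
assert up == (as_int << 1) & ((1 << width) - 1)
assert down == as_int >> 1
print(prefix_xor([1, 0, 0, 1, 0]))
```

Note that the OR with the input is not the same as an inclusive prefix XOR: it keeps both the opening and the closing set bit, which is what you want when marking a region such as a quoted string including its delimiters.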
03.12.2024 21:37
Even better is being able to measure the numbers yourself without the need for vendor tables. RISC-V support for llvm-exegesis is landing soon IIUC, with RVV not too far behind either.
03.12.2024 03:02
The RVV equivalent of Agner Fog's instruction tables is camel-cdr.github.io/rvv-bench-re...; it's an incredibly useful resource. We use it all the time for LLVM!
03.12.2024 00:52