Claire Xen 🏳️‍⚧️ 🧙🏻‍♀️ 💖💛💙

@clairexen.bsky.social

Neurodiverse Trans Geek Girl 🧙‍♀️ Queer Kinky Poly Mess 🏳️‍🌈 🏳️‍⚧️ CTO @YosysHQ 😺 RISC-V, SMT 👩‍💻 Opinions are my Ceti eel's 😛 ACAB BLM ✊ I am Antifa 🏴🚩 Vienna, Austria 📌 she/her 🧚‍♀️

810 Followers  |  102 Following  |  130 Posts  |  Joined: 21.10.2023  |  1.8114

Latest posts by clairexen.bsky.social on Bluesky

So if you are currently involved with ISA-level decisions about inclusion of any pext/pdep-like instructions:

Please consider including SAG/inverse-SAG with bit-reversal of the goats.

No matter which of the two implementation methods you are using: All you need to do is not mask the goat bits.

25.07.2025 23:30 | 👍 3    🔁 3    💬 0    📌 0
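
A minimal behavioral sketch of the relationship described in this post, written in Python rather than the Verilog of the linked repository. The bit-ordering convention is an assumption made for illustration: selected bits ("sheep", mask bit 1) are packed toward the LSB in order, and unselected bits ("goats", mask bit 0) are packed toward the MSB in reversed order. The names `sag_reflect` and `pext` are hypothetical. Under that convention, masking the data input first turns the SAG into a PEXT, which is one way to read "not masking the goat bits".

```python
def sag_reflect(x: int, m: int, n: int = 8) -> int:
    """Sheep-and-goats with reflected goats (assumed convention, see lead-in).

    Sheep (mask bit 1) fill the result from bit 0 upward, in order.
    Goats (mask bit 0) fill the result from bit n-1 downward, so their
    relative order ends up reversed.
    """
    out = 0
    lo, hi = 0, n - 1
    for i in range(n):
        bit = (x >> i) & 1
        if (m >> i) & 1:
            out |= bit << lo   # next free sheep slot
            lo += 1
        else:
            out |= bit << hi   # next free goat slot, counting down
            hi -= 1
    return out


def pext(x: int, m: int, n: int = 8) -> int:
    """Parallel bit extract: masked bits gathered at the low end, rest zero."""
    out, lo = 0, 0
    for i in range(n):
        if (m >> i) & 1:
            out |= ((x >> i) & 1) << lo
            lo += 1
    return out


def pext_via_sag(x: int, m: int, n: int = 8) -> int:
    """PEXT expressed as SAG on a pre-masked data input."""
    return sag_reflect(x & m, m, n)
```

Under this assumed convention, `sag_reflect(0b10110010, 0b11001100)` gives `0b01111000`, while `pext_via_sag` on the same inputs leaves only the sheep, `0b1000`.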

At some point that patent will expire, and until then there's my implementation.

(And I think my approach will still stay relevant after, because it makes it very simple to build multi-cycle SAG cores.)

25.07.2025 23:30 | 👍 1    🔁 0    💬 1    📌 0

I think the really important takeaway is that they, my method and theirs, are both functionally the same, i.e. they both implement an SAG with bit-reversal of the goats.

This means, as far as the ISA is concerned, it's a safe decision to include an SAG instruction.

25.07.2025 23:30 | 👍 1    🔁 0    💬 1    📌 0

Wordle 1,498 4/6*

🟨⬜⬜⬜⬜
⬜🟩🟨⬜🟨
🟩🟩⬜⬜🟩
🟩🟩🟩🟩🟩

... back to normal ^__^

25.07.2025 23:18 | 👍 0    🔁 0    💬 0    📌 0

Nah! No luck. 🙈 I was unsure which of the 4 options I could come up with to pick, and as it turned out, it was none of them... 😂

But I'm still at 99% and that's all that really matters to me. (But tbh, it does matter to me way more than it probably should..)

25.07.2025 08:57 | 👍 2    🔁 0    💬 1    📌 0

Wordle 1,497 X/6*

⬜⬜🟨🟩⬜
⬜🟩⬜🟩🟩
⬜🟩⬜🟩🟩
⬜🟩⬜🟩🟩
⬜🟩⬜🟩🟩

... Wahhh! I have ~23 hours to decide on a last guess... /o\

24.07.2025 23:20 | 👍 1    🔁 0    💬 2    📌 0

Here is the code:
github.com/clairexen/ed...

I can't see any obvious reason why that identity should not extend beyond 8-bit units. But I have not actually tested that hypothesis yet.

24.07.2025 12:38 | 👍 1    🔁 1    💬 1    📌 0

I have to make a correction regarding ☝️. I've now implemented the Hilewitz-Lee method as well in my edu-sag repository. And it implements the bit-reflecting-SAG as-is. All you have to do is to remove the '&ci' from the data input, thus it's always more area to implement PEXT than bit-reflecting-SAG.

24.07.2025 12:38 | 👍 1    🔁 1    💬 1    📌 0

Wordle 1,483 4/6*

🟩⬜🟩⬜⬜
🟩⬜🟩🟨🟩
🟩🟩🟩⬜🟩
🟩🟩🟩🟩🟩

11.07.2025 09:22 | 👍 1    🔁 0    💬 2    📌 0

Wordle 1,482 6/6*

⬜⬜⬜⬜⬜
⬜⬜⬜⬜⬜
⬜⬜🟩🟩🟩
⬜🟩🟩🟩🟩
⬜🟩🟩🟩🟩
🟩🟩🟩🟩🟩

.. that was a bit unusual.
but I still got it in the end ^__^

09.07.2025 22:51 | 👍 1    🔁 0    💬 2    📌 0

Wordle 1,474 4/6*

⬜🟨⬜🟨⬜
🟨🟨🟨⬜⬜
🟩🟩⬜🟨⬜
🟩🟩🟩🟩🟩

02.07.2025 05:03 | 👍 1    🔁 0    💬 2    📌 0

Wordle 1,473 3/6*

⬜⬜⬜🟨⬜
🟩🟩⬜🟨🟨
🟩🟩🟩🟩🟩

01.07.2025 08:04 | 👍 1    🔁 0    💬 2    📌 0

Wordle 1,462 5/6*

⬜⬜🟩⬜⬜
⬜⬜🟩⬜🟩
⬜⬜🟩🟨🟩
⬜🟨🟩🟩🟩
🟩🟩🟩🟩🟩

20.06.2025 17:19 | 👍 1    🔁 0    💬 2    📌 0
edu-sag/param.v at main · clairexen/edu-sag: Educational 8-Bit Sheep-And-Goats (SAG) Verilog Reference IP

I wrote a reference implementation for a SAG without bit reflection: github.com/clairexen/ed..., and I wrote a parametric SAG core for any bit width: github.com/clairexen/ed...

20.06.2025 16:04 | 👍 1    🔁 2    💬 0    📌 0

Wordle 1,460 3/6*

⬜⬜⬜⬜🟨
🟩⬜🟨🟨⬜
🟩🟩🟩🟩🟩

... that one was fun

18.06.2025 17:45 | 👍 1    🔁 0    💬 1    📌 0

To have trans friends is to hold candles in the wind.

I can’t say if the storm is fate or man-made.

I only know I’m losing light.

09.06.2025 20:32 | 👍 5    🔁 0    💬 0    📌 0

Wordle 1,449 4/6*

🟨⬜🟨⬜⬜
⬜🟩🟨⬜🟨
🟩🟩🟨🟨⬜
🟩🟩🟩🟩🟩

07.06.2025 09:00 | 👍 1    🔁 0    💬 3    📌 0

For R-type instructions there's not much pressure there...

06.06.2025 18:18 | 👍 0    🔁 0    💬 0    📌 0

The wire crossings are there either way. It's literally the same network graph, just organized differently in the HDL, so that it's easier to see how similar the three stages of circuit are.

06.06.2025 12:08 | 👍 1    🔁 0    💬 1    📌 0

..cost of the mask operation. At least IMO.

Thought-Experiment:

Would you rather have

1) SAG plus MaskedSAG, or

2) SAG plus all 4 PACK Instructions?

I'd certainly pick 2) and for most architectures it's even cheaper in terms of hardware cost than 1)... 🤔

06.06.2025 12:04 | 👍 1    🔁 0    💬 1    📌 0

But isn't it also "basically free" in software? It's literally just one instruction. ;)

On a superscalar OOO machine I think it'd be unlikely that one is bottle-necked by the additional mask op.

For an instruction like this one, what really matters is the avoided cost of ~500 cycles, not the..

06.06.2025 12:04 | 👍 1    🔁 0    💬 1    📌 0
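
The "one extra instruction" argument can be sketched for the deposit direction as well. The following is a behavioral Python model, not the author's Verilog, and it assumes that the extra mask op is a single bitwise AND and that inverse SAG undoes a SAG whose sheep (mask bit 1) sit at the low end in order and whose goats (mask bit 0) sit at the high end in reversed order. The names `inv_sag_reflect`, `pdep` and `pdep_via_inv_sag` are hypothetical.

```python
def inv_sag_reflect(y: int, m: int, n: int = 8) -> int:
    """Inverse sheep-and-goats with reflected goats (assumed convention).

    Low bits of y are scattered, in order, to positions where m is 1.
    High bits of y are scattered, from bit n-1 downward, to positions
    where m is 0.
    """
    out = 0
    lo, hi = 0, n - 1
    for i in range(n):
        if (m >> i) & 1:
            out |= ((y >> lo) & 1) << i
            lo += 1
        else:
            out |= ((y >> hi) & 1) << i
            hi -= 1
    return out


def pdep(y: int, m: int, n: int = 8) -> int:
    """Parallel bit deposit: low bits of y scattered to the set bits of m."""
    out, lo = 0, 0
    for i in range(n):
        if (m >> i) & 1:
            out |= ((y >> lo) & 1) << i
            lo += 1
    return out


def pdep_via_inv_sag(y: int, m: int, n: int = 8) -> int:
    """PDEP as inverse SAG followed by one AND that clears the goat positions."""
    return inv_sag_reflect(y, m, n) & m
```

In this model the only difference between the masked and unmasked operation is that single `& m`, which is the cost being weighed against the roughly 500 cycles mentioned above.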
GitHub - clairexen/edu-sag: Educational 8-Bit Sheep-And-Goats (SAG) Verilog Reference IP

Yes, I mean the definition of SAG with the mirroring. The non-mirrored version is way more expensive (and IMO actually less desirable anyway).

I just wrote this:
github.com/clairexen/ed...

I've been meaning to write this code for quite some time...

06.06.2025 09:27 | 👍 1    🔁 1    💬 2    📌 0

And those symmetries then can be exploited for things from re-use of completed layouts for functional units, to creating different multi-cycle versions of SAG HDL cores.

06.06.2025 09:27 | 👍 1    🔁 0    💬 1    📌 0

I think it's better to use just the first stage of the (inverse) butterfly network and unshuffle, like I did it here, than to use an actual (inv) butterfly network, because this way it's much easier to understand all the symmetries within such an SAG HW implementation.

06.06.2025 09:27 | 👍 1    🔁 0    💬 1    📌 0

If you don't do the (infamously patented) pruning of the prefix sum network, then SAG is the natural behavior of that kind of bfly/ibfly based pext/pdep Implementation.

Afair the Hilewitz06 paper only adds that you can get rid of some of the half adders if you don't care about goat bits order.. 🤔

04.06.2025 21:48 | 👍 1    🔁 0    💬 2    📌 0
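
One way to see why an unpruned prefix sum gives the goats a destination as well is a behavioral model in which a single running popcount of the mask decides where every source bit lands. This is only a sketch in Python: it is neither the Hilewitz-Lee circuit nor the edu-sag Verilog, and the convention (sheep at the low end in order, goats at the high end in reversed order) and the names `destinations` and `route` are assumptions for illustration.

```python
def destinations(m: int, n: int = 8) -> list[int]:
    """Destination index for each source bit, derived from one prefix count.

    ones_below is the popcount of m over bits 0..i-1. A sheep at bit i goes
    to index ones_below. A goat at bit i has (i - ones_below) goats below
    it and goes to index n-1 minus that count.
    """
    dest = []
    ones_below = 0
    for i in range(n):
        if (m >> i) & 1:
            dest.append(ones_below)
            ones_below += 1
        else:
            zeros_below = i - ones_below
            dest.append(n - 1 - zeros_below)
    return dest


def route(x: int, dest: list[int]) -> int:
    """Move bit i of x to bit dest[i] of the result."""
    out = 0
    for i, d in enumerate(dest):
        out |= ((x >> i) & 1) << d
    return out
```

For the mask `0b11001100` this yields the destinations `[7, 6, 0, 1, 5, 4, 2, 3]`, a full permutation of the eight bit positions: the goat indices fall out of the same count that places the sheep.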
