8/14/2023 0 Comments Determine active bits in a byteThis bitwise-SWAR algorithm could parallelize to be done in multiple vector elements at once, instead of in a single integer register, for a speedup on CPUs with SIMD but no usable popcount instruction. If you can use builtins or intrinsics for this, do so to give the compiler a chance to do target-specific optimizations. This is what GCC uses for _builtin_popcountll on x86 systems when the hardware popcnt instruction isn't enabled. So it doesn't take any extra steps, just wider constants. Multiply is done by left-shifting and adding, so a multiply of x * 0x01010101 results in x + (x>56. Once we have few enough "elements", a multiply with a magic constant can sum all the elements into the top element. But there is a more efficient way on machines with fast hardware multiply: Continuing on with the same pattern for 2 more steps can widen to 2x 16-bit then 1x 32-bit counts. So far this is just fairly normal SIMD using SWAR techniques with a few clever optimizations. 4+4 = 8 which still fits in 4 bits, so carry between nibble elements is impossible in i + (i > 4). It masks after adding instead of before, because the maximum value in any 4-bit accumulator is 4, if all 4 bits of the corresponding input bits were set. The final shift-and-add step of (i + (i > 4)) & 0x0F0F0F0F widens to 4x 8-bit accumulators. before shifting is a good thing when compiling for ISAs that need to construct 32-bit constants in registers separately. optimization isn't possible this time so it does just mask before / after shifting. The next step takes the odd/even eight of those 16x 2-bit accumulators and adds again, producing 8x 4-bit sums. This effectively does 16 separate additions in 2-bit accumulators ( SWAR = SIMD Within A Register). The first step is an optimized version of masking to isolate the odd / even bits, shifting to line them up, and adding. How this SWAR bithack works: i = i - ((i > 1) & 0x55555555)
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |