AVX512 vbool16f/vbool16d are incorrect #5
Comments
@johguenther feel free to add any discussion here if the above is incomplete or incorrect for solving the AVX512 mask problem. I also just realized we could always use the 8-wide fallback for
Although I already went in the proposed direction, I think we should step back for a sec and think about what the main goal of
The main difference will be how to handle the mixture of types of different size ( With option 1 there will be different With option 2 there will be one dominant
For AVX512VL this issue of "vector of bools" vs. "bitmask" also comes up, since it backports a lot of the added mask intrinsics from AVX512 to the 128/256 bit vectors. These are able to use I started looking at doing some further template specialization on this route in a branch https://github.com/Twinklebear/tsimd/tree/avx512-vl , but my thinking (and it sounds like some discussion was had along this route too?) is that instead of going the nasty Regarding @johguenther's comment, my understanding was that tsimd aims for option 2, to give us something more in the style of ISPC (though with explicit control flow masking).
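To make the "vector of bools" vs. "bitmask" distinction concrete, here is a minimal portable sketch in plain C++ with no intrinsics. The names `IntMask`, `to_bitmask`, `from_bitmask`, and `lane` are hypothetical and not part of tsimd. It assumes the usual pre-AVX512 convention that a true lane is all ones and a false lane is zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only, not tsimd API.
// "Vector of bools": one 32-bit lane per element,
// 0xFFFFFFFF means true and 0 means false.
template <std::size_t N>
using IntMask = std::array<std::uint32_t, N>;

// Pack the top bit of every lane into a compact bitmask,
// similar in spirit to a movemask-style instruction. Lane i maps to bit i.
template <std::size_t N>
std::uint16_t to_bitmask(const IntMask<N> &m)
{
  static_assert(N <= 16, "result must fit in 16 bits");
  std::uint16_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    bits |= static_cast<std::uint16_t>((m[i] >> 31) << i);
  return bits;
}

// Expand a bitmask back into one full-width integer per lane.
template <std::size_t N>
IntMask<N> from_bitmask(std::uint16_t bits)
{
  static_assert(N <= 16, "source holds at most 16 lanes");
  IntMask<N> m{};
  for (std::size_t i = 0; i < N; ++i)
    m[i] = ((bits >> i) & 1u) ? 0xFFFFFFFFu : 0u;
  return m;
}

// Read lane i of a bitmask as a plain bool.
inline bool lane(std::uint16_t bits, std::size_t i)
{
  assert(i < 16);
  return (bits >> i) & 1u;
}
```

Code that indexes lanes as integers only works with the first representation, while code that tests single bits only works with the second, which is why a shared interface has to hide the difference.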
We discussed this last year and the outcome was that we actually want both: a thin, low-level intrinsic abstraction and, building upon that, a high-level ISPC-style abstraction (where a fixed number of "tasks" are computed in parallel with SIMD instructions, for all basic types).
This should be fixed now, please re-open if any issues come up. There are some corner cases that are still being ironed out (e.g. there are bugs in gcc?), but the fundamental changes for getting AVX512 masking right are now complete.
Currently all 16-wide masks (i.e. pack<bool32_t> and pack<bool64_t>), when compiling with AVX512, assume that all elements in the mask are 32/64-bit integers. However, AVX512 changes these to be bit masks instead of int masks, changing the underlying representation.

Ideas (thus far) to make this work with minimal duplication:
- arr data member (in the union) based on a trait that would map to either std::array<T, N> or std::array<bool32_t, 2> for the special case of AVX512 vbool.
- operator[]() returning a reference or returning by value
- operator[]() always return by value
- insert()/extract() methods to get/set values in individual lanes
- begin()/end() to return conforming iterators if they can't be just pointers (as they are now)

Also note that technically vboold16 doesn't exist yet...but this will be directly relevant to that implementation when the time comes.
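
A few of the ideas above can be sketched together: a trait that picks the storage type, operator[]() that always returns by value, and insert()/extract() for single lanes. This is only an illustration under stated assumptions, not the tsimd implementation. The names `mask_storage` and `Mask` and the `UseBitmask` flag are hypothetical, the bitmask case uses a plain std::uint16_t rather than the std::array<bool32_t, 2> mentioned in the list, and no real AVX512 intrinsics are involved. Iterators are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not the tsimd implementation.
// Trait: choose the storage for an N-wide mask. UseBitmask stands in for
// "this is the 16-wide vbool and we are compiling for AVX512".
template <std::size_t N, bool UseBitmask>
struct mask_storage
{
  using type = std::array<std::uint32_t, N>;  // one integer per lane

  static bool extract(const type &s, std::size_t i) { return s[i] != 0; }
  static void insert(type &s, std::size_t i, bool v)
  {
    s[i] = v ? 0xFFFFFFFFu : 0u;
  }
};

template <std::size_t N>
struct mask_storage<N, true>
{
  static_assert(N <= 16, "bitmask sketch holds at most 16 lanes");
  using type = std::uint16_t;  // one bit per lane

  static bool extract(const type &s, std::size_t i) { return (s >> i) & 1u; }
  static void insert(type &s, std::size_t i, bool v)
  {
    const type bit = static_cast<type>(1u << i);
    s = v ? static_cast<type>(s | bit) : static_cast<type>(s & ~bit);
  }
};

template <std::size_t N, bool UseBitmask>
class Mask
{
  using traits = mask_storage<N, UseBitmask>;
  typename traits::type data{};  // zero-initialized: all lanes false

 public:
  // Returned by value in both cases: a single bit has no address,
  // so a bool& cannot be handed out for the bitmask representation.
  bool operator[](std::size_t i) const { return extract(i); }

  bool extract(std::size_t i) const
  {
    assert(i < N);
    return traits::extract(data, i);
  }

  void insert(std::size_t i, bool v)
  {
    assert(i < N);
    traits::insert(data, i, v);
  }
};
```

With this shape, calling code such as `m.insert(3, true); if (m[3]) ...` is identical for `Mask<16, false>` and `Mask<16, true>`, and only the trait specialization knows which representation is in use. The cost is that writes must go through insert() instead of assignment through operator[]().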