-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xxhash: support SVE by intrinsic code #752
Conversation
@Cyan4973 Could you help to share any comments on this pull request? There's only SVE intrinsic code in this pull request. |
Hi @hzhuang1 , I'm essentially concerned about how it could be continuously tested. |
It seems
Perhaps |
@@ -3197,12 +3209,16 @@ enum XXH_VECTOR_TYPE /* fake enum */ { | |||
# define XXH_ACC_ALIGN 16 | |||
# elif XXH_VECTOR == XXH_AVX512 /* avx512 */ | |||
# define XXH_ACC_ALIGN 64 | |||
# elif XXH_VECTOR == XXH_SVE /* sve */ | |||
# define XXH_ACC_ALIGN 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question :
This variant seems to directly target 512-bit width.
My understanding of SVE
is that it's more flexible with regards to vector width, compared to Intel's AVX
.
That being said, the qemu
documentation implies that there are multiple hardware implementations of SVE
to expect, from 128 to 512 bits.
Will this vector extension work with all of them, or only specifically on the 512-bit one ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. SVE ranges from 128-bit to 2048-bit. Should I create a macro to calculate it for SVE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've configured CI to run code from SVE128 to SVE2048. All of them could work with this XXH_ACC_ALIGN.
@@ -3178,6 +3184,12 @@ enum XXH_VECTOR_TYPE /* fake enum */ { | |||
# endif | |||
#endif | |||
|
|||
/* __ARM_FEATURE_SVE is only supported by GCC & Clang. */ | |||
#if (XXH_VECTOR == XXH_SVE) && !defined(__ARM_FEATURE_SVE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note : maybe minor, but surely worth mentioning.
This transparent fallback method
will make compilation succeed when SVE
is selected, but actually not present on compilation target.
But it also means that a developer may think that the library is compiled with SVE
enabled, because it's instructed to, but actually it's not and uses SCALAR
instead, and the developer would be none the wiser.
I'm wondering if it would be preferable to let compilation fail instead.
Intermediate position, maybe a warning (though unfortunately, I believe #warning
is not part of C90
syntax, and #pragma warning
depends on compiler support. But then, so does __ARM_FEATURE_SVE
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'll update it.
SVE could be tested in QEMU. I'm using QEMU to test code. And I only test performance on real platform. But I don't know how to configure QEMU for SVE in your CI script. |
Yes, you're right. I'm using the config |
Thanks for the info!
You can add your aarch64 SVE test to the following lines xxHash/.github/workflows/ci.yml Lines 381 to 385 in 267ba94
For example:
Note that Lines 189 to 213 in 267ba94
|
Implement intrinsic code for ARM SVE. In this patch, it seems that there is no improvement. Further optimization will be contained in the later patch set. SCALAR implementation (default) === benchmarking 4 hash functions === benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27) xxh3 , 2718, 3093, 3151, 3187, 3217, 3232, 3238, 3223, 3190, 3240, 3241, 3222, 3238, 3049, 2253, 2113, 2102, 2117, 2132 XXH32 , 1499, 1520, 1538, 1543, 1543, 1545, 1547, 1535, 1486, 1487, 1484, 1468, 1484, 1435, 1231, 1183, 1184, 1185, 1187 XXH64 , 2632, 2874, 3017, 3092, 3132, 3143, 3155, 3122, 2990, 2994, 2998, 3000, 3006, 2842, 2121, 1838, 1833, 1811, 1839 XXH128 , 2435, 2872, 3024, 3121, 3187, 3219, 3234, 3222, 3221, 3225, 3225, 3227, 3227, 3034, 1962, 1789, 1866, 1871, 1874 Intrinsic SVE512 implementation === benchmarking 4 hash functions === benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27) xxh3 , 1903, 2313, 2469, 2568, 2631, 2663, 2683, 2662, 2673, 2678, 2679, 2678, 2676, 2584, 2236, 2173, 2173, 2163, 2183 XXH32 , 1326, 1436, 1495, 1523, 1535, 1543, 1547, 1536, 1505, 1506, 1504, 1507, 1508, 1446, 1246, 1193, 1194, 1194, 1167 XXH64 , 2510, 2802, 2977, 3071, 3121, 3136, 3156, 3126, 3041, 3047, 3050, 3050, 3048, 2890, 2043, 1944, 1944, 1945, 1955 XXH128 , 1867, 2295, 2462, 2542, 2627, 2663, 2679, 2656, 2661, 2675, 2673, 2647, 2671, 2574, 2218, 2144, 2169, 2175, 2181 Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> Signed-off-by: Devin Hussey <easyaspi314@users.noreply.github.com>
Since the vector size of ARM64 SVE ranges from 128-bit to 2048-bit, configure CI to run with each vector size. Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hzhuang1 , this looks good to me
Maybe I was too fast here : I don't see Github Action test results in the test summary. |
Signed-off-by: Haojian Zhuang haojian.zhuang@linaro.org
Signed-off-by: Devin Hussey easyaspi314@users.noreply.github.com