Upstream of AOCL 2.2.1 changes. #448

dzambare · 2020-09-22T04:26:37Z

The following features and optimizations were added in this release:

Added support for BLIS on Windows, providing functional parity with the Linux BLIS 2.2.1 feature set.
Added support for the BLIS extension API ?gemmt.
Added debug and trace logs.
Optimized the SGEMV kernel.
Improved complex GEMM performance: added support for N SUP kernels for complex float and complex double, and removed prefetching in the M SUP kernels for complex float and complex double.
Implemented dot-product kernels in SGEMM SUP for transpose cases.
Added DGEMM packing kernels for the native DGEMM implementation.
Implemented selective packing in SGEMM SUP.

Please review and let us know if you have any questions.

Details: - Made extra explicit the fact that: (a) multithreading in BLIS is disabled by default; and (b) even with multithreading enabled, the user must specify multithreading at runtime in order to observe parallelism. Thanks to M. Zhou for suggesting these clarifications in flame#292. - Also made explicit that only the environment variable and global runtime API methods are available when using the BLAS API. If the user wishes to use the local runtime API (specify multithreading on a per-call basis), one of the native BLIS APIs must be used.

Details: - Replaced the existing --enable-export-all / --disable-export-all configure option with --export-shared=[public|all], with the 'public' instance of the latter corresponding to --disable-export-all and the 'all' instance corresponding to --enable-export-all. Nothing else semantically about the option, or its default, has changed.

Details: - Adjusted the zen sub-configuration's cache blocksizes for float, scomplex, and dcomplex based on the existing values for double. (The previous values were taken directly from the haswell subconfig, which targets Intel Haswell/Broadwell/Skylake systems.)

Details: - Added a new markdown document, docs/Performance.md, which reports performance of a representative set of level-3 operations across a variety of hardware architectures, comparing BLIS to OpenBLAS and a vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs, in pdf and png formats, reside in docs/graphs. - Updated README.md to link to new Performance.md document. - Minor updates to CREDITS, docs/Multithreading.md. - Minor updates to matlab scripts in test/3/matlab.

Details: - Fixed a few broken section links in the Contents section.

Details: - Fixed some incorrect labels associated with the pdf/png graphs, apparently the result of copy-pasting.

Details: - Updated ReleaseNotes.md in preparation for next version.

Details: - Defined GFLOPS as billions of floating-point operations per second, and reworded the subsequent sentence about normalization.
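As a worked example of the definition above, the sketch below converts a timed gemm into GFLOPS. It assumes the conventional flop count of 2mnk for a real-domain gemm producing an m x n result with inner dimension k, and 4x that (8mnk) for the complex domain; the function name is hypothetical.

```c
/* Hypothetical helper: GFLOPS = flops / (seconds * 1e9).
   Assumes 2*m*n*k flops for a real gemm (one multiply and one add per
   inner-product term) and 4x that for complex (each complex multiply-add
   costs 8 real flops instead of 2). */
static double gemm_gflops( double m, double n, double k,
                           double seconds, int is_complex )
{
    double flops = 2.0 * m * n * k;
    if ( is_complex ) flops *= 4.0;
    return flops / ( seconds * 1.0e9 );
}
```

For example, a 1000 x 1000 x 1000 real gemm that takes one second runs at 2 GFLOPS.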

Details: - Added targets to test/3/Makefile that link against a BLAS library built by Eigen. It appears, however, that Eigen's BLAS library does not support multithreading. (It may be that multithreading is only available when using the native C++ APIs.) - Updated runme.sh with a few Eigen-related tweaks. - Minor tweaks to docs/Performance.md.

Details: - Modified bli_blas.h so that: - By default, if the BLAS layer is enabled at configure-time, BLAS prototypes are also enabled within blis.h; - But if the user #defines BLIS_DISABLE_BLAS_DEFS prior to including blis.h, BLAS prototypes are skipped over entirely so that, for example, the application or some other header pulled in by the application may prototype the BLAS functions without causing any duplication. - Updated docs/BuildSystem.md to document the feature above, and related text.

clang -dumpversion reports 4.2.1 for all clang versions, because clang was originally made compatible with gcc 4.2.1. Apple clang versions and upstream clang versions are distinct, and the real clang version cannot be deduced programmatically from the Apple clang version, so we rely on Wikipedia to map Apple clang versions to clang versions. This also fixes assembly detection with clang: clang 3.8 can't build knl because it doesn't recognize zmm0.

Details: - Use compile-time implementations of Eigen in test_gemm.c via new EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS library is not necessary.) However, as of Eigen 3.3.7, Eigen only parallelizes the gemm operation and not hemm, herk, trmm, trsm, or any other level-3 operation. - Fixed a bug in trmm and trsm drivers whereby the wrong function (bli_does_trans()) was being called to determine whether the object for matrix A should be created for a left- or right-side case. This was corrected by changing the function to bli_is_left(), as is done in the hemm driver. - Added support for running Eigen test drivers from runme.sh.

Details: - Adjusted test/3/Makefile so that the test drivers are linked against Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do this since Eigen's headers don't define implementations of the standard BLAS APIs. - Simplified #included headers in hemm, herk, trmm, and trsm source driver files, since nothing specific to Eigen is needed at compile-time for those operations.

Export macros can't support both shared and static libraries at the same time. When BLIS is built as both a shared and a static library, the headers assume that the shared library is used at link time and dllimport the symbols with the __imp_ prefix. To use the headers with the static library, a user can pass -DBLIS_EXPORT= so that the symbols are imported without the __imp_ prefix.

Details: - Fixed the Makefile in test/3 so that it no longer incorrectly labels the matlab output variables from Eigen-linked hemm, herk, trmm, and trsm driver output as "vendor". (The gemm drivers were already correctly outputting matlab variables containing the "eigen" label.)

Details: - Updated matlab scripts in test/3/matlab to optionally plot/display Eigen performance curves. Whether Eigen is plotted is determined by a new boolean function parameter, with_eigen. - Updated runme.m scratchpad to reflect the latest invocations of the plot_panel_4x5() function (with Eigen plotting enabled).

Details: - Updated the Haswell, SkylakeX, and Epyc performance graphs in docs/graphs to report on Eigen implementations, where applicable. Specifically, Eigen implements all level-3 operations sequentially; however, of those operations, it provides a multithreaded implementation only for gemm. Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are omitted. Thanks to Sameer Agarwal for his help configuring and using Eigen. - Updated docs/Performance.md to note the new implementation tested. - CREDITS file update.

Details: - Added/updated a few more details, mostly regarding Eigen.

Details: - Updated the level-3 performance graphs in docs/graphs with new Eigen results, this time using a development version cloned from their git mirror on March 27, 2019 (version 3.3.90). Performance is improved over 3.3.7, though still noticeably short of BLIS/MKL in most cases. - Very minor updates to docs/Performance.md and matlab scripts in test/3/matlab.

Details: - Renamed kernels/armv8a/3/bli_gemm_armv8a_opt_4x4.c to kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c. This follows the naming convention used by other kernel sets, most notably haswell.

Change void*-typed function pointers to void_fp. - Updated all instances of void* variables that store function pointers to variables of a new type, void_fp. Originally, I wanted to define the type of void_fp as "void (*void_fp)( void )"--that is, a pointer to a function with no return value and no arguments. However, once I did this, I realized that gcc complains with incompatible pointer type (-Wincompatible-pointer-types) warnings every time such a pointer is assigned to its final, type-accurate function pointer type. That is, gcc will silently typecast a void* to another defined function pointer type (e.g. dscalv_ker_ft) during an assignment from the former to the latter, but the same statement will trigger a warning when typecasting from a void_fp type. I suspect an explicit typecast is needed in order to avoid the warning, which I'm not willing to insert at this time. - Added a typedef to bli_type_defs.h defining void_fp as void*, along with a commented-out version of the aborted definition described above. (Note that POSIX requires that void* and function pointers be interchangeable; it is the C standard that does not provide this guarantee.) - Comment updates to various _oapi.c files.
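A minimal sketch of the void_fp approach, assuming a made-up kernel signature: scal_ker_ft and the helper names are illustrative, not BLIS's actual dscalv_ker_ft or context API. It stores a function pointer in a void*-typed slot and casts it back to the type-accurate pointer type before calling; the round trip relies on the POSIX guarantee noted above rather than on ISO C.

```c
#include <stddef.h>

/* As in the commit: void_fp is simply void*. */
typedef void* void_fp;

/* Type-accurate kernel pointer type (hypothetical signature). */
typedef void (*scal_ker_ft)( size_t n, double alpha, double* x );

/* A trivial reference kernel: x := alpha * x. */
static void scal_ref( size_t n, double alpha, double* x )
{
    for ( size_t i = 0; i < n; ++i ) x[ i ] *= alpha;
}

/* Generic slot, analogous to a kernel entry stored as void_fp. */
static void_fp kernel_slot;

/* Explicit casts make the function-pointer <-> void* conversion visible;
   ISO C does not define it, POSIX does. */
static void register_kernel( scal_ker_ft f ) { kernel_slot = ( void_fp )f; }

static scal_ker_ft lookup_kernel( void ) { return ( scal_ker_ft )kernel_slot; }
```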

Details: - Added more details and clarifying language to implications of 1m and the recycling of microkernels between microarchitectures.

Details: - Fixed a minor bug in flatten-headers.py whereby the script, upon encountering a #include directive for the root header file, would erroneously recurse and inline the contents of that root header. The script has been modified to avoid recursion into any headers that share the same name as the root-level header that was passed into the script. (Note: this bug didn't actually manifest in BLIS, so it's merely a precaution for usage of flatten-headers.py in other contexts.)

Details: - Changed the default installation prefix from $HOME/lib to /usr/local. - Modified the way configure internally handles the prefix, libdir, includedir, and sharedir (and also added an --exec-prefix option). The defaults for these variables are set as follows: prefix: /usr/local; exec_prefix: ${prefix}; libdir: ${exec_prefix}/lib; includedir: ${prefix}/include; sharedir: ${prefix}/share. The key change, aside from the addition of exec_prefix and its use to define the default for libdir, is that the variables are substituted into config.mk with quoting that delays evaluation, meaning the substituted values may contain unevaluated references to other variables (namely, ${prefix} and ${exec_prefix}). This more closely follows GNU conventions, including those used by GNU autoconf, and also allows make to override any one of the variables *after* configure has already been run (e.g. during 'make install'). - Updates to build/config.mk.in pursuant to above changes. - Updates to output of 'configure --help' pursuant to above changes. - Updated docs/BuildSystem.md to reflect the new default installation prefix, as well as mention EXECPREFIX and SHAREDIR. - Changed the definitions of the UNINSTALL_OLD_* variables in the top-level Makefile to use $(wildcard ...) instead of 'find'. This was motivated by the new way of handling prefix and friends, which leads to the 'find' command being run on /usr/local (by default), which can take a long time while almost never yielding any benefit (since the user will very rarely use the uninstall-old targets). - Removed periods from the end of descriptive output statements (i.e., non-verbose output) since those statements often end with file or directory paths, which get confusing to read when punctuated by a period. - Trivial change to 'make showconfig' output. - Removed my name from 'configure --help'. (Many have contributed to it over the years.)
- In the configure script, changed the default state of the threading_model variable from 'no' to 'off' to match that of debug_type, which similarly has more than two valid states. ('no' is still accepted if given via the --enable-debug= option, though it will be standardized to 'off' prior to config.mk being written out.) - Minor variable name change in flatten-headers.py that was intended for 32812ff. - CREDITS file update.

Details: - Somehow the variable name change (root_file_name -> root_inputname) in flatten-headers.py mentioned in the commit log entry for 89a70cc didn't make it into the actual commit. This commit applies that change.

Details: - Added preprocessor branches to test/3/test_gemm.c to explicitly support row-stored matrices. Column-stored matrices are also still supported (and remain the default for now). (This is mainly residual work left over from initial integration of Eigen into the test drivers, so if we ever want to test Eigen with row-stored matrices, the code will be ready to use, even if it is not yet integrated into the Makefile in test/3.)

…mats and non Transpose/Conjugate Matrices. A failure was seen in the libflame function FLASH_UDdate_UT_inc due to typecasting a double complex pointer as a double pointer. Change-Id: If6e2f4663575450a13a9a07dddd5622628f5c6b0

This ensures an early return when full gemm processing is not needed. Depending on which dimension is found to be zero, the following actions are taken: if C has a zero dimension, no further processing is required; if alpha is zero or if A or B has a zero dimension, we perform a scalm operation (C := beta*C) instead of gemm. Change-Id: Icc031944fc4e80138adf991974547f2d57ab570b AMD-Internal: [CPUPL-904]
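The early-return rules above can be sketched as follows for C := beta*C + alpha*A*B with column-major storage. The function name and signature are hypothetical, and the final branch is a naive triple loop standing in for the real gemm path.

```c
#include <stddef.h>

/* A is m x k, B is k x n, C is m x n; all column-major. */
static void gemm_early_return_sketch( size_t m, size_t n, size_t k,
                                      double alpha, const double* a, size_t lda,
                                      const double* b, size_t ldb,
                                      double beta, double* c, size_t ldc )
{
    /* C has a zero dimension: nothing to compute. */
    if ( m == 0 || n == 0 ) return;

    /* alpha == 0 or k == 0: A*B contributes nothing, so only C := beta*C
       (a scalm). beta == 0 overwrites so stale NaNs in C are not kept. */
    if ( alpha == 0.0 || k == 0 )
    {
        for ( size_t j = 0; j < n; ++j )
            for ( size_t i = 0; i < m; ++i )
                c[ i + j*ldc ] = ( beta == 0.0 ) ? 0.0 : beta * c[ i + j*ldc ];
        return;
    }

    /* Otherwise, the full gemm (naive reference loop). */
    for ( size_t j = 0; j < n; ++j )
        for ( size_t i = 0; i < m; ++i )
        {
            double ab = 0.0;
            for ( size_t p = 0; p < k; ++p ) ab += a[ i + p*lda ] * b[ p + j*ldb ];
            c[ i + j*ldc ] = ( beta == 0.0 ? 0.0 : beta * c[ i + j*ldc ] ) + alpha * ab;
        }
}
```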

Change-Id: Icad0ff1c1858c1762792ba8f2c5c3e846909cbb5

Details: - Optimized the saxpyf kernel with fuse_factor=5 and iter_unroll=2. - Modified the sgemv framework files to remove the dependency on the cntx variable. - Updated the zen2 cntx_init file to choose the optimized kernels. - Modified the BLAS interface call for SGEMV to reduce framework overhead. - Currently these changes apply only to the zen2 configuration. Change-Id: Iabc36ae640e82e65f8764f3c6dee513ad64b22fd Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-707]
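For reference, axpyf fuses b axpyv operations: y := y + alpha * A * x, where A is m x b and b is the fusing factor. The scalar sketch below assumes column-major A, fixes b = 5, and unrolls the row loop by 2 to mirror fuse_factor=5 and iter_unroll=2. It illustrates the operation only; it is not the AVX2 zen2 kernel, and the name is hypothetical.

```c
#include <stddef.h>

#define FUSE 5  /* fusing factor b */

/* y := y + alpha * A * x, A is m x FUSE (column-major, leading dim lda). */
static void saxpyf_sketch( size_t m, float alpha, const float* a, size_t lda,
                           const float* x, float* y )
{
    /* Pre-scale x once, so each column update is a plain multiply-add. */
    float ax[ FUSE ];
    for ( size_t j = 0; j < FUSE; ++j ) ax[ j ] = alpha * x[ j ];

    size_t i = 0;
    /* Main loop, unrolled by 2 rows: each y element is loaded and stored
       once for all FUSE columns instead of once per column. */
    for ( ; i + 2 <= m; i += 2 )
    {
        float y0 = y[ i ], y1 = y[ i + 1 ];
        for ( size_t j = 0; j < FUSE; ++j )
        {
            y0 += a[ i     + j*lda ] * ax[ j ];
            y1 += a[ i + 1 + j*lda ] * ax[ j ];
        }
        y[ i ] = y0; y[ i + 1 ] = y1;
    }
    /* Remainder row when m is odd. */
    for ( ; i < m; ++i )
    {
        float yi = y[ i ];
        for ( size_t j = 0; j < FUSE; ++j ) yi += a[ i + j*lda ] * ax[ j ];
        y[ i ] = yi;
    }
}
```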

Added traces from the BLAS/CBLAS APIs down to the kernels for dgemm and sgemm. By default, the traces are disabled; users need to enable them in their local workspace (see the aocl_dtl/aocldtlcf.h file). AMD Internal : CPUPL-806 Change-Id: I83b310509fb1a599c114387192bcf882ef0480f9

Change-Id: I0f902e32085058ec618d08470793f5e5e49719b3

…rements Multiple trace levels allow the user to limit tracing to a given depth of nested calls, which also reduces file size requirements. Auto trace output was also optimized to reduce file size by removing thread IDs from individual lines. AMD Internal: [CPUPL-806] Change-Id: I28e08a5bdf1b147469d8ce290ff7cde7f74481bd

Added a BLIS-specific extension to AOCL DTL, which adds support for printing the input matrix sizes from the BLIS library. AMD Internal: [CPUPL-806] Change-Id: I80ed779d65f9b1c48466137fc2f05629fa2fb561

This library was ported to Windows 10 using CMake scripts and Visual Studio 2019 with the clang compiler. AMD internal:[CPUPL-657] Change-Id: Ie701f52ebc0e0585201ba703b6284ac94fc0feb9

Details: - Fixed an innocuous bug that manifested when running the testsuite on extremely small matrices with randomization via the "powers of 2 in narrow precision range" option enabled. When the randomization function emits a perfect 0.0 to fill a 1x1 matrix, the testsuite will then compute 0.0/0.0 during the normalization process, which leads to NaN residuals. The solution entails smarter implementations of randv, randnv, randm, and randnm, each of which will compute the 1-norm of the vector or matrix in question. If the object has a 1-norm of 0.0, the object is re-randomized until the 1-norm is not 0.0. Thanks to Kiran Varaganti for reporting this issue (flame#413). - Updated the implementation of randm_unb_var1() so that it loops over a call to the randv_unb_var1() implementation directly rather than calling it indirectly via randv(). This was done to avoid the overhead of multiple calls to norm1v() when randomizing the rows/columns of a matrix. - Updated comments. Change-Id: I0e3d65ff97b26afde614da746e17ed33646839d1
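A minimal sketch of the re-randomization fix, assuming a hypothetical generator rand_pow2() that, like the narrow powers-of-2 option, can return exactly 0.0. The vector is regenerated until its 1-norm is nonzero, so the later normalization never evaluates 0.0/0.0. The names are illustrative, not the BLIS testsuite functions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generator: a few powers of two, plus an exact zero. */
static double rand_pow2( void )
{
    static const double vals[] = { 0.0, 0.25, 0.5, 1.0, -0.5 };
    return vals[ rand() % 5 ];
}

/* 1-norm: sum of absolute values. */
static double norm1v_sketch( size_t n, const double* x )
{
    double s = 0.0;
    for ( size_t i = 0; i < n; ++i ) s += ( x[ i ] < 0.0 ) ? -x[ i ] : x[ i ];
    return s;
}

/* Re-randomize until the 1-norm is nonzero. */
static void randv_sketch( size_t n, double* x )
{
    if ( n == 0 ) return;  /* an empty vector can never have a nonzero norm */
    do {
        for ( size_t i = 0; i < n; ++i ) x[ i ] = rand_pow2();
    } while ( norm1v_sketch( n, x ) == 0.0 );
}
```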

Details: - Added a new API that computes a matrix-matrix product with general matrices but updates only the upper or lower triangular part of the result matrix: cblas_?gemmt() and ?gemmt_(). - These routines are similar to the ?gemm routines, but they only access and update a triangular part of the square result matrix. - Added DGEMMT functionality by reusing GEMM kernels. - Created a new folder for GEMMT under l3, and added GEMMT-specific framework code. - Modified the cntl_create routine to choose a different macro kernel for GEMMT. - Added routines to copy the lower/upper triangular part of a block to the buffer. - Defined BLIS, BLAS and CBLAS interface APIs for GEMMT. - Added test_gemmt.c to the test folder and updated the Makefile. - Added a macro 'CBLAS' in test_gemm.c to call CBLAS APIs. Change-Id: Ie00c1a15b9c654b65c687a9ca781cbc6f9641791
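A reference sketch of the gemmt semantics described above, assuming column-major storage and no transposition of A or B (the real ?gemmt APIs also take transa/transb). Only the triangle of the n x n matrix C selected by uplo is read or written; elements outside it are left untouched. The function name is hypothetical.

```c
#include <stddef.h>

/* C := beta*C + alpha*A*B on the uplo ('L' or 'U') triangle of C only.
   A is n x k, B is k x n, C is n x n; all column-major. */
static void dgemmt_sketch( char uplo, size_t n, size_t k, double alpha,
                           const double* a, size_t lda,
                           const double* b, size_t ldb,
                           double beta, double* c, size_t ldc )
{
    for ( size_t j = 0; j < n; ++j )
    {
        /* Rows of column j that lie in the selected triangle. */
        size_t i_beg = ( uplo == 'L' ) ? j : 0;
        size_t i_end = ( uplo == 'L' ) ? n : j + 1;
        for ( size_t i = i_beg; i < i_end; ++i )
        {
            double ab = 0.0;
            for ( size_t p = 0; p < k; ++p ) ab += a[ i + p*lda ] * b[ p + j*ldb ];
            c[ i + j*ldc ] = ( beta == 0.0 ? 0.0 : beta * c[ i + j*ldc ] ) + alpha * ab;
        }
    }
}
```

With A = [1; 2] and B = [3 4], the full product is [3 4; 6 8]; uplo = 'L' writes 3, 6, and 8 and leaves C(0,1) unchanged.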

…lso supports complex data types. Details: - Added framework code for GEMMT SUP. - Implemented SUP for GEMMT using similar techniques as native path. - Moved update routines to frame/util folder. - Ported update routines for complex datatypes. Change-Id: I17adfd0586d07f5a23dca6a07b2d48f4c9fcf71c Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>, Dipal M Zambare <DipalMadhukar.Zambare@amd.com>, Mangala V <managala.v@amd.com>

…vironment. 1) Added a dcomplex-based zdotc_ version as a function with an additional parameter. 2) The functions for the other datatypes (single, double, complex) are retained as macros. 3) This modification handles ZDOTC_ invocations from Fortran-based applications for 'double complex' datatypes. 4) The modifications are placed under the macro 'AOCL_F2C'. 5) BLIS and BLAS test suites verified ALL PASS with GCC and Flang, with and without the 'AOCL_F2C' macro, on an Ubuntu machine. 6) Added BLIS_EXPORT_BLAS to make the APIs visible when linking the DLL. Change-Id: I4ada39a73f416e3794708f5b55e947342c261117 Signed-off-by: Meghana <Meghana.Vankadari@amd.com>, Nagendra <Nagendra.PrasadM@amd.com> AMD-Internal: [SWLCSG-177]
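To illustrate the calling-convention issue, the sketch below computes conj(x)^T y for double complex vectors two ways: returning the value directly, and f2c-style, where the result is written through an extra first argument and all scalars are passed by reference. It assumes the AOCL_F2C variant follows the f2c convention; the type and function names are hypothetical, and positive increments are assumed.

```c
/* Mirrors the layout of a double complex value. */
typedef struct { double real, imag; } dcomplex_t;

/* rho = sum_i conj(x[i]) * y[i], returned by value. */
static dcomplex_t zdotc_direct( int n, const dcomplex_t* x, int incx,
                                const dcomplex_t* y, int incy )
{
    dcomplex_t rho = { 0.0, 0.0 };
    for ( int i = 0; i < n; ++i )
    {
        dcomplex_t xi = x[ i * incx ], yi = y[ i * incy ];
        /* (xr - i*xim) * (yr + i*yim) */
        rho.real += xi.real * yi.real + xi.imag * yi.imag;
        rho.imag += xi.real * yi.imag - xi.imag * yi.real;
    }
    return rho;
}

/* f2c-style: hidden result pointer first, scalars by reference. */
static void zdotc_f2c_sketch( dcomplex_t* ret, const int* n,
                              const dcomplex_t* x, const int* incx,
                              const dcomplex_t* y, const int* incy )
{
    *ret = zdotc_direct( *n, x, *incx, y, *incy );
}
```

For x = (1 + 2i) and y = (3 + 4i), both forms give conj(x) * y = 11 - 2i.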

Details: - Since the GEMM kernel prefers row storage, if the input C matrix is in column-major order, the entire operation is transposed. In that case, uplo(C) needs to be toggled before kernel-variant selection. - Disabled "bli_gemmsup_ref_var1n2m_opt_cases" inside gemmtsup. - Updated version number to 2.2.1 Change-Id: I0a85df1141fc4a98d98ea4e0c3d42db8602fa69b
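The reason uplo must be toggled can be checked with index arithmetic: a column-major buffer read as row-major is the transpose, so element (i, j) of C with i >= j (lower triangle) becomes element (j, i) of C^T, which is in the upper triangle. The helper names below are hypothetical.

```c
#include <stddef.h>

typedef enum { UPLO_LOWER, UPLO_UPPER } uplo_t;

/* Transposing the operation swaps which triangle is referenced. */
static uplo_t toggle_uplo( uplo_t u )
{
    return ( u == UPLO_LOWER ) ? UPLO_UPPER : UPLO_LOWER;
}

/* Offset of element (i, j) in a column-major matrix. */
static size_t colmajor_off( size_t i, size_t j, size_t ld ) { return i + j*ld; }

/* Offset of element (i, j) in a row-major matrix. */
static size_t rowmajor_off( size_t i, size_t j, size_t ld ) { return i*ld + j; }
```

For example, lower-triangle element (2, 0) of a column-major C with ld = 3 sits at offset 2, the same offset as upper-triangle element (0, 2) of the row-major view.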

Details: - The BLIS test application throws an "Undefined reference to bli_abort" error when built against the dynamic library. This happens because bli_abort is hidden and cannot be linked from outside the library. Annotated its prototype with BLIS_EXPORT_BLAS to make it public. Change-Id: I0d7aec046e8871ba6491024694ed06f883b005ac AMD Internal: [CPUPL-1030]

…nels Change-Id: Ib309aba0cb08161877fd1a720ed65222d3b303f3

Details: - Since C is triangular, in order to maintain load balance among threads, we need to use weighted range partitioning. Change-Id: I03d8ff71ac7af843acd787f1389b5907b56453ee
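A sketch of weighted range partitioning over the columns of an n x n lower-triangular C, assuming column j holds n - j stored elements. Each thread boundary is placed near that thread's equal share of the triangle's area rather than at equal column widths. This is an illustrative helper, not the BLIS thread-range implementation.

```c
#include <stddef.h>

/* Thread t owns columns [bounds[t], bounds[t+1]); bounds has nt + 1
   entries. Column j of the lower triangle holds n - j elements. */
static void weighted_ranges_lower( size_t n, size_t nt, size_t* bounds )
{
    double total = 0.5 * ( double )n * ( double )( n + 1 );
    double acc = 0.0;   /* area of columns [0, j) */
    size_t j = 0;

    bounds[ 0 ] = 0;
    for ( size_t t = 1; t < nt; ++t )
    {
        double target = total * ( double )t / ( double )nt;
        /* Take column j while its midpoint stays below the target, so the
           boundary lands at the column edge nearest the target. */
        while ( j < n && acc + 0.5 * ( double )( n - j ) < target )
        {
            acc += ( double )( n - j );
            ++j;
        }
        bounds[ t ] = j;
    }
    bounds[ nt ] = n;
}
```

For n = 100 and two threads, the split falls after column 29 rather than after column 50, since the leftmost columns are the tallest.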

Details: - Unlike default path, storage scheme of C is not always row-major in SUP. - Whenever C is col-major, the temporary buffer 'ct' is also chosen to be col-major. - Since update routines only support row-major order, a transpose is induced for c and ct buffers before passing them to update routine. Change-Id: I3fea10860f39632df7540c9399786e7aa1cfba37

Details: - If there are any zero rows or columns along the edges of an MCxNC block of C, shrink the dimensions to avoid "no-op" iterations. - For the lower-triangle kernel variant, added a flag to determine whether a block that is strictly below the triangle has been reached. Once such a block is reached, the flag is set; all the blocks below it are also strictly below the diagonal, and the flag is used to make that decision. - For the upper-triangle kernel variant, whenever a block that is strictly below the triangle is reached, break the for loop and proceed to the next iteration of the JR loop, because all the blocks below it are also strictly below the diagonal and are filled with zeroes, which requires no computation. Change-Id: I606b0f900509aab6ed7ff30cefee9d7207b7b010
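The block tests behind both optimizations reduce to comparing index ranges. In the hypothetical sketch below, an mr x nr block at (i0, j0) lies strictly above the diagonal when its last row index is less than its first column index, and strictly below when its first row index exceeds its last column index. The loop mirrors the upper-triangle variant's early break out of the IR loop for one column panel.

```c
#include <stddef.h>

typedef enum {
    BLK_STRICTLY_ABOVE,  /* every element has i < j */
    BLK_STRICTLY_BELOW,  /* every element has i > j */
    BLK_ON_DIAGONAL      /* block intersects the diagonal */
} blk_pos_t;

static blk_pos_t classify_block( size_t i0, size_t j0, size_t mr, size_t nr )
{
    if ( i0 + mr <= j0 ) return BLK_STRICTLY_ABOVE;  /* i0+mr-1 < j0 */
    if ( i0 >= j0 + nr ) return BLK_STRICTLY_BELOW;  /* i0 > j0+nr-1 */
    return BLK_ON_DIAGONAL;
}

/* Upper triangle: count the blocks of the nr-wide panel at column j0 that
   need work. Once a block is strictly below the diagonal, every block
   under it is too, so the loop stops early. */
static size_t upper_blocks_needing_work( size_t m, size_t j0, size_t mr, size_t nr )
{
    size_t count = 0;
    for ( size_t i0 = 0; i0 < m; i0 += mr )
    {
        if ( classify_block( i0, j0, mr, nr ) == BLK_STRICTLY_BELOW ) break;
        ++count;
    }
    return count;
}
```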

The testsuite covers all combinations of upper, lower, transpose, and API formats. AMD Internal: [CPUPL-1021] Change-Id: I2a1d79eba1dcaf4217fd9c2c346bd6173b80a782

Details: - Problem: If C is row-major, the first four elements of the last column of output matrix C were not updated; if C is column-major, the first four elements of the last row were not updated. - Solution: After computation, update the elements at the correct offset in bli_dgemmsup_rv_haswell_asm_5x8(). Change-Id: I588c60f2f3cd5f51e475cfc140e3bf0e9d5a4dae

…ixed" This reverts commit 725bf5a. Reason for revert: <INSERT REASONING HERE> Change-Id: I7dd6b84731f091c8b39080ed9321a708fa5f11d8

Ported the GEMMT changes to Windows. AMD Internal : [CPUPL-1061] Change-Id: I587d1789cd29ea18b04f8ab43e5742b4d902067a

Details: - Removed a few flags that slipped into the recent merge of #448 which *may* be causing breakage. This commit moves amd_config.mk back to the state it is in, more or less, in the 'master' branch.

Details: - Found the likely cause of at least some CI troubles: the commit that merged #448 (33f75df) did not include a reversion of the sup thresholds for scomplex and dcomplex as I intended. - Preprocessed out an extra macro in bli_gemm_small.c that I missed previously.

Merged contributions from AMD's AOCL BLIS (#448). Details: - Added support for level-3 operation gemmt, which performs a gemm on only the lower or upper triangle of a square matrix C. For now, only the conventional/large code path will be supported (in vanilla BLIS). This was accomplished by leveraging the existing variant logic for herk. However, some of the infrastructure to support a gemmtsup is included in this commit, including: - A bli_gemmtsup() front-end, similar to bli_gemmsup(). - A bli_gemmtsup_ref() reference handler function. - A bli_gemmtsup_int() variant chooser function (with variant calls commented out). - Added support for inducing complex domain gemmt via the 1m method. - Added gemmt APIs to the BLAS and CBLAS compatibility layers. - Added gemmt test module to testsuite. - Added standalone gemmt test driver to 'test' directory. - Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md. - Added a C++ template header (blis.hh) containing a BLAS-inspired wrapper to a set of polymorphic CBLAS-like function wrappers defined in another header (cblas.hh). These two headers are installed if running the 'install' target with INSTALL_HH set to 'yes'. (Also added a set of unit tests that exercise blis.hh, although they are disabled for now because they aren't compatible with out-of-tree builds.) These files now live in the 'vendor' top-level directory. - Various updates to 'zen' and 'zen2' subconfigurations, particularly within the context initialization functions. - Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and various minor updates to dotv and scalv kernels. Also added various sup kernels contributed by AMD to kernels/zen/3. However, these kernels are (for now) not yet used, in part because they caused AppVeyor clang failures, and also because I have not found time to review and vet them. - Output the python found during configure into the definition of PYTHON in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0) to bli_gemm_front.c. - Implemented explicit beta = 0 handling for the sgemm ukernel in bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent bug surfaced because the gemmt module verifies its computation using gemm with its beta parameter set to zero, which, on a cortexa15 system, caused the gemm kernel code to unconditionally multiply the uninitialized C data by beta. The C matrix likely contained non-numeric values such as NaN, which then would have resulted in a false failure. - Fixed a bug whereby the implementation for bli_herk_determine_kc(), in bli_l3_blocksize.c, was inadvertently being defined in terms of helper functions meant for trmm. This bug was probably harmless since the trmm code should have also done the right thing for herk. - Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in kernels/zen/3/bli_gemm_small.c since those macros are not used in vanilla BLIS. - Added cpp guard to definition of bli_mem_clear() in bli_mem.h to accommodate C++'s stricter type checking. - Added cpp guard to test/*.c drivers that facilitate compilation on Windows systems. - Various whitespace changes.
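The armv7a fix follows a general microkernel rule: when beta == 0, C must be overwritten, never read, because it may hold uninitialized values such as NaN, and NaN * 0.0 is NaN. A minimal sketch of the C-update step, with hypothetical names, an mr x nr column-major product buffer ab, and general row/column strides for C:

```c
#include <stddef.h>

/* C := beta*C + alpha*AB over an mr x nr block. */
static void ukr_update_c( size_t mr, size_t nr, float alpha, const float* ab,
                          float beta, float* c, size_t rs_c, size_t cs_c )
{
    for ( size_t j = 0; j < nr; ++j )
        for ( size_t i = 0; i < mr; ++i )
        {
            float* cij  = &c[ i*rs_c + j*cs_c ];
            float  abij = alpha * ab[ i + j*mr ];
            if ( beta == 0.0f ) *cij = abij;            /* overwrite: C is never read */
            else                *cij = beta * *cij + abij;
        }
}
```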

fgvanzee and others added 30 commits August 23, 2019 14:18

Fixed broken section links in docs/Performance.md.

25db903

Minor fixes to docs/Performance.md.

6385a3e

Very minor tweaks to Performance.md.

cd81a6a

ReleaseNotes.md update in advance of next version.

14bc42f

CHANGELOG update (0.5.2)

38e2180

More minor tweaks to docs/Performance.md.

366c4b1

Minor text updates (Eigen) to docs/Performance.md.

cb45eb9

Use pthreads on MinGW and Cygwin (flame#307)

231a4b7

Renamed armv8a gemm kernel filename.

f061c75

Minor update to docs/HardwareSupport.md document.

959d8d9

Applied forgotten variable rename from 89a70cc.

f205eea

make unix friendly archives on appveyor (flame#310)

e253bfc

managalv and others added 25 commits June 2, 2020 22:27

Enabled AOCC specific flags for all versions of AOCC compiler

f4d2bb2

Merge "Checking for zero dimension is moved to bli_gemm_xx call." into amd-staging-rome-2.2

8a367c9

Replace back major version number variable in Makefile

3620e47

Added support for logging gemm input values.

80b3127

BLIS library porting on to Windows:

ccf0772

set the gemmt slot to the default gemmt sup handler for reference kernels

89245a7

Using weighted thread range partitioning for GEMMT

3d35af3

Merge "Added some optimizations for gemmt default path" into amd-staging-rome-2.2.1

25f5a4e

Added testsuite for gemmt APIs.

12b1215

Revert "CPUPL-1059: Failures seen in DGEMM SUP for specific size is fixed"

ac90bac

BLIS library porting on to Windows:

434b018

jeffhammond changed the title ~~Upstream of AOCL 2.2.1 chagnes.~~ Upstream of AOCL 2.2.1 changes. Sep 23, 2020

fgvanzee merged commit 33f75df into flame:amd Nov 1, 2020
