Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed brokenness when sba is disabled. #777

Merged
merged 1 commit into from
Oct 2, 2023
Merged

Conversation

fgvanzee
Copy link
Member

Details:

  • Previously, disabling the sba via --disable-sba-pools resulted in a segfault due to a sanity-check-triggering abort(). The problem was that the sba, as currently used in the l3 thread decorators, did not yet (fully) support pools being disabled. The solution entailed creating wrapper function, bli_sba_array_elem(), which either calls bli_apool_array_elem() (when sba pools are enabled at configure time) or returns a NULL sba_pool pointer (when sba pools are disabled), and calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note that the NULL pointer returned by bli_sba_array_elem() when the sba pools are disabled does no harm since in that situation the pointer goes unreferenced when acquiring and releasing small blocks. Thanks to John Mather for reporting this bug.
  • Guarded the bodies of bli_sba_init() and bli_sba_finalize() with #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary to fix the aforementioned bug, but it seems like good practice.
  • Moved the code in bli_l3_thrinfo_create() that checked that the array* pointer is non-NULL before calling bli_sba_array_elem() (previously bli_apool_array_elem()) into the definition of bli_sba_array_elem().
  • Renamed various instances of pool variables and function parameters to sba_pool to emphasize what kind of pool it represents.
  • Whitespace changes.

cc: @jmather-sesi

Details:
- Previously, disabling the sba via --disable-sba-pools resulted in a
  segfault due to a sanity-check-triggering abort(). The problem was
  that the sba, as currently used in the l3 thread decorators, did not
  yet (fully) support pools being disabled. The solution entailed
  creating wrapper function, bli_sba_array_elem(), which either calls
  bli_apool_array_elem() (when sba pools are enabled at configure time)
  or returns a NULL sba_pool pointer (when sba pools are disabled), and
  calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note
  that the NULL pointer returned by bli_sba_array_elem() when the sba
  pools are disabled does no harm since in that situation the pointer
  goes unreferenced when acquiring and releasing small blocks. Thanks to
  John Mather for reporting this bug.
- Guarded the bodies of bli_sba_init() and bli_sba_finalize() with
  #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary
  to fix the aforementioned bug, but it seems like good practice.
- Moved the code in bli_l3_thrinfo_create() that checked that the array*
  pointer is non-NULL before calling bli_sba_array_elem() (previously
  bli_apool_array_elem()) into the definition of bli_sba_array_elem().
- Renamed various instances of 'pool' variables and function parameters
  to 'sba_pool' to emphasize what kind of pool it represents.
- Whitespace changes.
@fgvanzee fgvanzee merged commit c2099ed into master Oct 2, 2023
3 checks passed
@fgvanzee fgvanzee deleted the disabled_sba_fix branch October 2, 2023 19:56
fgvanzee added a commit that referenced this pull request May 29, 2024
Details:
- Added a new 'sifive_x280' subconfiguration for SiFive's x280 RISC-V
  instruction set architecture. The subconfig registers kernels from a
  correspondingly new kernel set, also named 'sifive_x280'.
- Added the aforementioned kernel set, which includes intrinsics- and
  assembly-based implementations of most level-1v kernels along with
  level-1f kernels axpy2v dotaxpyv, packm kernels, and level-3 gemm,
  gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files).
- Registered the 'sifive_x280' subconfig as belonging to a singleton
  family by the same name.
- Added an entry to '.travis.yml' to test the new subconfig via qemu.
- Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280'
  subconfig and to reflect updated tarball names.
- Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz
  for their engagement on this commit.
- (cherry picked from commit 05388dd)

Fixed HPX barrier synchronization (#783)

Details:
- Fixed hpx barrier synchronization. HPX was hanging on larger cores
  because blis was using non-hpx synchronization primitives. But when
  using hpx-runtime only hpx-synchronization primitives should be used.
  Hence, a C style wrapper hpx_barrier_t is introduced to perform hpx
  barrier operations.
- Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with
  hpx::barrier on n_threads greater than actual hardware thread count
  causes synchronization issues making hpx hanging. This can be avoided
  by using hpx::futures, which are relatively very lightweight, robust
  and scalable.
- (cherry picked from 7a87e57)

Fixed bug in sup threshold registration. (#782)

Details:
- Fixed a bug that resulted in BLIS non-deterministically calling the
  gemmsup handler, irrespective of the thresholds that are registered
  via bli_cntx_set_blkszs().
- Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup
  thresholds (BLIS_[MNK]T blocksizes) wre being set to zero so that no
  operation ever matched the criteria for gemmsup (unless specific sup
  thresholds are registered). HOWEVER, these thresholds are set via
  bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was
  only coping the thresholds into the gks' cntx_t if the values were
  strictly positive. Thus, the zero values passed into
  bli_cntx_set_blkszs() were being ignored and those threshold slots
  within the gks were left uninitialized. The upshot of this is that the
  reference gemmsup handler was being called for gemm problems
  essentially at random (and as it turns out, very rarely the reference
  gemmsup implementation would encounter a divide-by-zero error).
- The problem was fixed by changing bli_blksz_copy_if_pos() so that it
  copies values that are non-negative (values >= 0 instead of > 0). The
  function was also renamed to bli_blksz_copy_if_nonneg()
- Also needed to standardize use of -1 as the sole value to embed into
  blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register
  a value for that slot (and instead let whatever existing values
  remain). This required updates to the bli_cntx_init_*() functions for
  bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some
  of these codes were using 0 instead of -1.
- Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and
  proposing a fix for this issue.
- (cherry picked from 8fff1e3)

Update zen3 subconfig to support NVHPC compilers. (#779)

Details:
- Parse $(CC_VENDOR) values of "nvc" in 'zen3' make_defs.mk file.
- Minor refactor to accommodate above edit.
- CREDITS file update.
- (cherry picked from 1e264a4)

Fixed brokenness when sba is disabled. (#777)

Details:
- Previously, disabling the sba via --disable-sba-pools resulted in a
  segfault due to a sanity-check-triggering abort(). The problem was
  that the sba, as currently used in the l3 thread decorators, did not
  yet (fully) support pools being disabled. The solution entailed
  creating wrapper function, bli_sba_array_elem(), which either calls
  bli_apool_array_elem() (when sba pools are enabled at configure time)
  or returns a NULL sba_pool pointer (when sba pools are disabled), and
  calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note
  that the NULL pointer returned by bli_sba_array_elem() when the sba
  pools are disabled does no harm since in that situation the pointer
  goes unreferenced when acquiring and releasing small blocks. Thanks to
  John Mather for reporting this bug.
- Guarded the bodies of bli_sba_init() and bli_sba_finalize() with
  #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary
  to fix the aforementioned bug, but it seems like good practice.
- Moved the code in bli_l3_thrinfo_create() that checked that the array*
  pointer is non-NULL before calling bli_sba_array_elem() (previously
  bli_apool_array_elem()) into the definition of bli_sba_array_elem().
- Renamed various instances of 'pool' variables and function parameters
  to 'sba_pool' to emphasize what kind of pool it represents.
- Whitespace changes.
- (cherry picked from c2099ed)

Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778)

Details:
- Expanded existing BLAS compatibility APIs to provide interfaces to
  [cz]symv_(), [cz]syr_(). This was easy since those operations were
  already implemented natively in BLIS; the APIs were previously
  omitted only because they were not formally part of the BLAS.
- Implemented [cz]rot_() by feeding code from LAPACK 3.11 through
  f2c.
- Thanks to James Foster for pointing out that LAPACK contains these
  additional symbols, which prompted these additions, as well as for
  testing the [cz]rot_() functions from Julia's test infrastructure.
- CREDITS file update.
- (cherry picked from 37ca4fd)

Fixes to HPC runtime code path. (#773)

Details:
- Fixed hpx::for_each invocation and replace with hpx::for_loop. The HPX
  runtime was initialized using hpx::start, but the hpx::for_each
  function was being called on a non-hpx runtime (i.e standard BLIS
  runtime - single main thread). To run hpx::for_each on HPX runtime
  correctly, the code now uses hpx::run_as_hpx_thread(func, args...).
- Replaced hpx::for_each with hpx::for_loop, which eliminates use of
  hpx::util::counting_iterator.
- Employ hpx::execution::chunk_size(1) to make sure that a thread
  resides on a particular core.
- Replaced hpx::apply() with updated version hpx::post().
- Initialize tdata->id = 0 in libblis.c to 0, as it is the main thread
  and is needed for writing results to output file.
- By default, if not specified, the HPX runtime uses all N threads/cores
  available in the system. But, if we want to only specify n_threads out
  N threads, we use hpx::execution::experimental::num_cores(n_threads).
- (cherry picked from a4a6329)

Fixed broken link in Multithreading.md. (#774)

Details:
- Replaced 404'd link in docs/Multithreading.md with an archive from
   The Wayback Machine.
- CREDITS file update.
- (cherry picked from c6546c1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant