8320069: RISC-V: Add Zcb instructions #17122

Closed
wants to merge 5 commits into from
Closed

Conversation

robehn
Contributor

@robehn robehn commented Dec 15, 2023

Hi, these are the instructions for Zcb.

Due to our lack of infrastructure, having multiple extension-dependent instructions does not fit well.
Some of these compressed instructions also lack a 1-to-1 mapping; e.g. we now have a compressed not, but the corresponding uncompressed instruction is still xor.
I think we need to do some rework here.
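
To make the missing 1-to-1 mapping concrete: in the Zcb specification, c.not rd' expands to xori rd', rd', -1, so there is no dedicated uncompressed "not". The following is a minimal sketch of that equivalence on 64-bit values; the helper names `xori` and `c_not` are illustrative only and are not HotSpot assembler functions.

```cpp
#include <cstdint>

// Model of the 64-bit xori: XOR the register with a sign-extended immediate.
// (Illustrative helper, not the HotSpot Assembler API.)
constexpr uint64_t xori(uint64_t rs1, int64_t simm12) {
  return rs1 ^ static_cast<uint64_t>(simm12);
}

// Model of c.not: bitwise complement of the register.
constexpr uint64_t c_not(uint64_t rd) {
  return ~rd;
}

// c.not rd' behaves like xori rd', rd', -1.
static_assert(c_not(0x00ff00ff00ff00ffULL) == xori(0x00ff00ff00ff00ffULL, -1),
              "c.not must match xori with immediate -1");
static_assert(c_not(0) == xori(0, -1), "c.not must match xori with immediate -1");
```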

I also don't like the macro expansion, as it is hopeless in debuggers and IDEs (vim+rtags for me).
(The macro stuff was originally done when templates were blacklisted in HotSpot.)

And I don't want an option for this: as Zcb is coming in hwprobe, if you have compressed on, you get these instructions whenever they are supported (which may depend on e.g. Zbb).
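
The "may depend on e.g. Zbb" remark can be sketched as a small availability check. Per the Zcb specification, c.sext.b, c.zext.h and c.sext.h additionally require Zbb, c.zext.w requires Zba, and c.mul requires M (or Zmmul); the rest only need Zcb on top of the base compressed extension. The `CpuFeatures` struct and `zcb_available` function below are hypothetical names for illustration and do not reflect how HotSpot actually stores or queries CPU features.

```cpp
// Hypothetical feature flags; the real HotSpot detection code is not shown in this thread.
struct CpuFeatures {
  bool compressed;  // base compressed instructions enabled
  bool zcb;
  bool zba;
  bool zbb;
  bool mul;         // M or Zmmul
};

enum class ZcbInsn {
  c_lbu, c_lhu, c_lh, c_sb, c_sh,
  c_zext_b, c_sext_b, c_zext_h, c_sext_h, c_zext_w,
  c_not, c_mul
};

// These three expand to Zbb instructions (sext.b, zext.h, sext.h).
constexpr bool needs_zbb(ZcbInsn i) {
  return i == ZcbInsn::c_sext_b || i == ZcbInsn::c_zext_h || i == ZcbInsn::c_sext_h;
}

// An instruction is usable only if Zcb and every extension it expands into are present.
constexpr bool zcb_available(ZcbInsn i, CpuFeatures f) {
  return f.compressed && f.zcb &&
         (!needs_zbb(i) || f.zbb) &&
         (i != ZcbInsn::c_zext_w || f.zba) &&
         (i != ZcbInsn::c_mul || f.mul);
}
```

With such a check, no separate user-visible option is needed: the decision follows from the detected extensions alone.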

I have made some modifications since it passed tier1, so I'm running tests over the weekend.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8320069: RISC-V: Add Zcb instructions (Enhancement - P5)

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17122/head:pull/17122
$ git checkout pull/17122

Update a local copy of the PR:
$ git checkout pull/17122
$ git pull https://git.openjdk.org/jdk.git pull/17122/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17122

View PR using the GUI difftool:
$ git pr show -t 17122

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17122.diff

@robehn robehn changed the title zcb instruction set 8320069: RISC-V: Add Zcb instructions Dec 15, 2023
@bridgekeeper

bridgekeeper bot commented Dec 15, 2023

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Dec 15, 2023

@robehn The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot (hotspot-dev@openjdk.org) and rfr (Pull request is ready for review) labels Dec 15, 2023
@mlbridge

mlbridge bot commented Dec 15, 2023

Webrevs

@VladimirKempik

We already have "macroses" for loads and stores in macroAssembler_riscv.hpp; what's the reason to make the compression decision in assembler_riscv.hpp instead (not saying it's wrong)?

@robehn
Contributor Author

robehn commented Dec 19, 2023

No, you are correct, I also think this is not optimal. I don't know the background, but it seems like this is the easiest way to add compressed instructions transparently. But to fully utilize the C instructions we should favor x8-x15; we often don't get C because e.g. BCP is in x22. I think that to better utilize C we can't have it be so transparent.

So here I just try to follow the current code; see how lw is changed to c_lw.
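
The register restriction mentioned above comes from the c.lw encoding: both the destination and the base register use 3-bit fields that can only name x8 to x15, and the offset is a zero-extended 5-bit value scaled by 4 (0 to 124). A minimal sketch of such an encodability check follows; `can_use_c_lw` is a hypothetical helper for illustration and is not the check HotSpot's assembler actually performs.

```cpp
#include <cstdint>

// Registers encodable in the 3-bit fields of most compressed formats: x8..x15.
constexpr bool is_compressed_reg(unsigned reg) {
  return reg >= 8 && reg <= 15;
}

// Hypothetical check: can "lw rd, offset(rs1)" be emitted as c.lw?
// c.lw needs both registers in x8..x15 and a 4-byte-aligned offset in [0, 124].
constexpr bool can_use_c_lw(unsigned rd, unsigned rs1, int64_t offset) {
  return is_compressed_reg(rd) && is_compressed_reg(rs1) &&
         offset >= 0 && offset <= 124 && (offset % 4) == 0;
}

// x10 (a0) loaded through x8 works; a base in x22 (where BCP lives) does not.
static_assert(can_use_c_lw(10, 8, 16), "both registers in x8..x15");
static_assert(!can_use_c_lw(10, 22, 16), "x22 is outside x8..x15");
```

This is why a fully transparent scheme leaves compression on the table: the choice of base register is made long before the assembler sees the instruction.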

@VladimirKempik

Not exactly related to this PR, but I also saw strange behaviour from MacroAssembler's lwu:
it was generating lw + and (a kind of lwu emulation) instead of lwu.

An example:

  0.44%  ?  0x0000003fa46a86c8:   slli    t3,t3,0x20
  0.48%  ?  0x0000003fa46a86ca:   addi    t3,t3,-1
  ....
  3.11%  ?  0x0000003fa46a86dc:   lw      a0,0(t1)
  5.34%  ?  0x0000003fa46a86e0:   and     a0,a0,t3

Using Assembler::lwu directly resulted in a correctly generated lwu
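
For readers unfamiliar with why lw + and amounts to an lwu emulation: lw sign-extends the loaded 32-bit word to 64 bits, while lwu zero-extends it, so masking the lw result with 0xFFFFFFFF gives the same value. The sketch below models this on plain integers. It assumes t3 held 1 before the slli, which the listing does not show; the function names are illustrative only.

```cpp
#include <cstdint>

// Model of lw on RV64: the 32-bit word is sign-extended to 64 bits.
constexpr uint64_t lw(uint32_t word) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(word)));
}

// Model of lwu on RV64: the 32-bit word is zero-extended to 64 bits.
constexpr uint64_t lwu(uint32_t word) {
  return word;
}

// slli t3,t3,0x20 ; addi t3,t3,-1
// ASSUMPTION: t3 == 1 beforehand, giving the mask 0x00000000FFFFFFFF.
constexpr uint64_t mask = (uint64_t{1} << 32) - 1;

// lw followed by and-with-mask yields the same value as a single lwu.
static_assert((lw(0x80000000u) & mask) == lwu(0x80000000u), "emulation matches lwu");
static_assert(lw(0x80000000u) != lwu(0x80000000u), "plain lw differs for negative words");
```

The emulation is correct but costs extra instructions and a scratch register compared with a single lwu.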

@robehn
Contributor Author

robehn commented Dec 19, 2023

Yes, I have seen similar things.

  0x00002aaabc9464fc:   addiw   ra,ra,-1365 # 0x00000000000aaaab
  0x00002aaabc946500:   slli    ra,ra,0xd
  0x00002aaabc946502:   addi    ra,ra,-929
  0x00002aaabc946506:   slli    ra,ra,0xd
  0x00002aaabc946508:   addi    ra,ra,456
  0x00002aaabc94650c:   jalr    ra

Since 456 (binary 111001000) would fit in the signed 12-bit immediate of jalr, the final addi could be folded into the jalr, so I think this is sub-optimal.

I can go over these and fix them; I'll create a JIRA issue.
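
The point about the immediate can be checked with a little arithmetic: jalr adds a sign-extended 12-bit immediate (range -2048 to 2047) to the base register and clears the lowest bit, so a trailing "addi ra,ra,456; jalr ra" reaches the same target as "jalr 456(ra)". A minimal sketch, with hypothetical helper names and the partial address reconstructed from the constants in the listing above:

```cpp
#include <cstdint>

// Does the value fit in a signed 12-bit immediate?
constexpr bool fits_simm12(int64_t v) {
  return v >= -2048 && v <= 2047;
}

// Model of the jalr target computation: (rs1 + simm12) with bit 0 cleared.
constexpr int64_t jalr_target(int64_t rs1, int64_t simm12) {
  return (rs1 + simm12) & ~int64_t{1};
}

// Value in ra after the second slli in the listing (before the final addi).
constexpr int64_t base = ((int64_t{0xaaaab} << 13) - 929) << 13;

static_assert(fits_simm12(456), "456 fits in jalr's immediate");
// addi ra,ra,456 ; jalr ra   versus   jalr 456(ra)
static_assert(jalr_target(base + 456, 0) == jalr_target(base, 456),
              "folding the addi into jalr reaches the same target");
```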

@RealFYang
Member

RealFYang commented Dec 19, 2023

Interesting. This does not seem to match the code of MacroAssembler's lwu. I wonder how that could happen.

@VladimirKempik

If you take this PR https://github.com/openjdk/jdk/pull/17046/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR3717 and change the explicit Assembler::lwu() to lwu(), you are likely to see this issue.

@zifeihan
Member

zifeihan commented Dec 20, 2023

Hi, I have tried using MacroAssembler::lwu instead, and I see no difference in the stub code emitted.
I have added a comment [1]. Let's discuss it on that PR.
[1] https://github.com/openjdk/jdk/pull/17046/files#r1432182715

@robehn
Contributor Author

robehn commented Jan 2, 2024

Passes t1+t2 fastdebug (with some expected timeouts).

Member

@RealFYang RealFYang left a comment

Seems fine. I only have some minor comments.

src/hotspot/cpu/riscv/macroAssembler_riscv.hpp (outdated, resolved)
src/hotspot/cpu/riscv/assembler_riscv.hpp (outdated, resolved)
src/hotspot/cpu/riscv/assembler_riscv.hpp (outdated, resolved)
@robehn
Contributor Author

robehn commented Jan 5, 2024

Thank you!

Member

@RealFYang RealFYang left a comment

Updated change looks good. Thanks.

@openjdk

openjdk bot commented Jan 7, 2024

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8320069: RISC-V: Add Zcb instructions

Reviewed-by: fyang, vkempik

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 15 new commits pushed to the master branch:

  • faa9c69: 8322846: Running with -Djdk.tracePinnedThreads set can hang
  • ace010b: 8319757: java/nio/channels/DatagramChannel/InterruptibleOrNot.java failed: wrong exception thrown
  • be4614e: 8323016: Improve reporting for bad options
  • 35a1b77: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile
  • 46965a0: 8322981: Fix 2 locations in JDI that throw IOException without using the "Caused by" exception
  • 700c25f: 8322954: Shenandoah: Convert evac-update closures asserts to rich asserts
  • 631a9f6: 8323073: ProblemList gc/g1/TestSkipRebuildRemsetPhase.java on linux-aarch64
  • ed9f324: 8322985: [BACKOUT] 8318562: Computational test more than 2x slower when AVX instructions are used
  • ade21a9: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate
  • f0cfd36: 8322532: JShell : Unnamed variable issue
  • ... and 5 more: https://git.openjdk.org/jdk/compare/2a9c3589d941d9a57e536ea0b3d7919c6ddb82dc...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Jan 7, 2024
@robehn
Contributor Author

robehn commented Jan 8, 2024

Thank you @VladimirKempik @RealFYang

@robehn
Contributor Author

robehn commented Jan 9, 2024

/integrate

@openjdk

openjdk bot commented Jan 9, 2024

Going to push as commit 30f93a2.
Since your change was applied there have been 45 commits pushed to the master branch:

  • 4cf131a: 8319716: RISC-V: Add SHA-2
  • 7286f52: 8322829: Refactor nioBlocker to avoid blocking while holding Thread's interrupt lock
  • 07fce8e: 8320864: Serial: Extract out Full GC related fields from ContiguousSpace
  • 176606d: 8310995: missing @since tags in 36 jdk.dynalink classes
  • 8ae309e: 8318971: Better Error Handling for Jar Tool When Processing Non-existent Files
  • 841ab48: 8322657: CDS filemap fastdebug assert while loading Graal CE Polyglot in isolated classloader
  • 61ebe3b: 8323032: OptimizedModuleHandlingTest failed in dynamic CDS archive mode
  • ca9635d: 8322759: Eliminate -Wparentheses warnings in compiler code
  • 8a4dc79: 8274300: Address dsymutil warning by excluding platform specific files
  • d78e8da: 8322545: Declare newInsets as static in ThemeReader.cpp
  • ... and 35 more: https://git.openjdk.org/jdk/compare/2a9c3589d941d9a57e536ea0b3d7919c6ddb82dc...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Jan 9, 2024
@openjdk openjdk bot closed this Jan 9, 2024
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Jan 9, 2024
@openjdk

openjdk bot commented Jan 9, 2024

@robehn Pushed as commit 30f93a2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@robehn robehn deleted the zcb branch April 30, 2024 08:54