8320069: RISC-V: Add Zcb instructions #17122

robehn · 2023-12-15T13:50:14Z

Hi, these are the instructions for Zcb.

Due to our lack of infrastructure, having multiple extension-dependent instructions does not fit well.
Some of these compressed instructions also lack a 1-to-1 mapping, e.g. we now have a compressed not, but the corresponding uncompressed instruction is still xor.
I think we need to do some rework here.
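
To illustrate the missing 1-to-1 mapping: c.not is defined in terms of xori with an immediate of -1, so a transparent compressor has to recognise that special case inside a generic xori call. A minimal sketch follows, assuming registers are modelled as plain numbers; the helper names are hypothetical and are not HotSpot's Assembler API.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; these are not HotSpot's Assembler API.
// Registers are modelled as plain numbers 0..31 (x0..x31).

// The 3-bit register fields used by most compressed encodings reach x8..x15 only.
constexpr bool in_x8_to_x15(unsigned reg) {
  return reg >= 8 && reg <= 15;
}

// c.not rd' is defined as xori rd', rd', -1, so a generic xori call qualifies
// only if source and destination match, sit in x8..x15, and the immediate is -1.
constexpr bool xori_can_be_c_not(unsigned rd, unsigned rs1, int32_t imm, bool zcb_enabled) {
  return zcb_enabled && rd == rs1 && in_x8_to_x15(rd) && imm == -1;
}
```

With this model, xori_can_be_c_not(10, 10, -1, true) holds, while any other immediate or a register outside x8..x15 falls back to the 32-bit encoding.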

I also don't like the macro expansion, as it is hopeless in debuggers and 'IDE's (vim+rtags for me).
(The macro approach was originally adopted when templates were blacklisted in HotSpot.)

And I don't want an option for this: as Zcb is coming in hwprobe, if you have compressed instructions enabled you get them when they are supported (which may depend on e.g. Zbb).
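
The dependency on other extensions can be sketched as a set of predicates. In the Zcb specification, c.sext.b, c.zext.h and c.sext.h additionally need Zbb, c.zext.w needs Zba, and c.mul needs M (or Zmmul). The struct and function names below are hypothetical and do not correspond to HotSpot flags or to the hwprobe interface.

```cpp
// Hypothetical feature record; field names are illustrative, not HotSpot flags.
struct CpuFeatures {
  bool rvc;  // compressed instructions enabled
  bool zcb;  // Zcb reported by the platform
  bool zbb;  // basic bit manipulation
  bool zba;  // address generation
  bool mul;  // M (or Zmmul)
};

// Zcb loads and stores only need compressed support plus Zcb itself.
constexpr bool can_use_c_lbu(const CpuFeatures& f)    { return f.rvc && f.zcb; }
// Sign extension of a byte is a Zbb operation, so its compressed form needs Zbb too.
constexpr bool can_use_c_sext_b(const CpuFeatures& f) { return f.rvc && f.zcb && f.zbb; }
// c.zext.w expands to add.uw, which belongs to Zba.
constexpr bool can_use_c_zext_w(const CpuFeatures& f) { return f.rvc && f.zcb && f.zba; }
// c.mul needs a multiplier.
constexpr bool can_use_c_mul(const CpuFeatures& f)    { return f.rvc && f.zcb && f.mul; }
```

No separate user-facing option appears in this model: each predicate follows from what the platform reports.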

I have made some modifications since it passed tier1, so I'm running tests over the weekend.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8320069: RISC-V: Add Zcb instructions (Enhancement - P5)

Reviewers

Fei Yang (@RealFYang - Reviewer)
Vladimir Kempik (@VladimirKempik - Committer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17122/head:pull/17122
$ git checkout pull/17122

Update a local copy of the PR:
$ git checkout pull/17122
$ git pull https://git.openjdk.org/jdk.git pull/17122/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17122

View PR using the GUI difftool:
$ git pr show -t 17122

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17122.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2023-12-15T13:51:32Z

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-12-15T13:52:53Z

@robehn The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-12-15T14:02:48Z

Webrevs

VladimirKempik · 2023-12-18T15:10:13Z

We already have macros for loads and stores in macroAssembler_riscv.hpp. What's the reason for making the compression decision in assembler_riscv.hpp instead (not saying it's wrong)?

jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp

Line 880 in 38d9472

INSN(lb);

robehn · 2023-12-19T07:58:52Z

No, you are correct; I also think this is not optimal. I don't know the background, but this seems to be the easiest way to add compressed instructions transparently. But to fully utilize the C instructions we should favor x8-x15; we often don't get a compressed encoding because, e.g., BCP is in x22. I think that to better utilize C we can't keep it this transparent.

So here I just try to follow the current code; see how lw is changed to c_lw.
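
The register constraint can be made concrete with a small eligibility check. This is only a sketch with hypothetical function names, not the code in assembler_riscv.hpp, and it ignores the sp-relative form c.lwsp. It relies on the encoding limits from the ISA: c.lw takes x8..x15 registers and a word-aligned offset of 0..124, while the Zcb c.lbu takes the same registers and an offset of 0..3.

```cpp
#include <cstdint>

// Hypothetical eligibility checks; registers are plain numbers 0..31 (x0..x31).
constexpr bool is_rvc_reg(unsigned reg) {
  return reg >= 8 && reg <= 15;
}

// c.lw: both registers in x8..x15, offset zero-extended, scaled by 4, at most 124.
constexpr bool lw_can_be_c_lw(unsigned rd, unsigned rs1, int32_t offset) {
  return is_rvc_reg(rd) && is_rvc_reg(rs1) &&
         offset >= 0 && offset <= 124 && (offset % 4) == 0;
}

// c.lbu (Zcb): same register limits, but only a 2-bit byte offset.
constexpr bool lbu_can_be_c_lbu(unsigned rd, unsigned rs1, int32_t offset) {
  return is_rvc_reg(rd) && is_rvc_reg(rs1) && offset >= 0 && offset <= 3;
}
```

A load whose base register is x22 fails is_rvc_reg, so it always gets the 32-bit encoding no matter how small the offset is.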

VladimirKempik · 2023-12-19T09:25:19Z

Not exactly related to this PR, but I also saw strange behaviour from MacroAssembler's lwu:
it was generating lw + and (a kind of lwu emulation) instead of lwu.

An example:

  0.44%  ?  0x0000003fa46a86c8:   slli    t3,t3,0x20
  0.48%  ?  0x0000003fa46a86ca:   addi    t3,t3,-1
  ....
  3.11%  ?  0x0000003fa46a86dc:   lw      a0,0(t1)
  5.34%  ?  0x0000003fa46a86e0:   and     a0,a0,t3

Using Assembler::lwu directly resulted in a correctly generated lwu.
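
Why the lw + and pair behaves like lwu can be checked with a small model. The listing does not show the initial value of t3; the sketch below assumes it held 1, so that slli by 0x20 followed by addi -1 yields the mask 0xFFFFFFFF. All function names are hypothetical.

```cpp
#include <cstdint>

// lw: load 32 bits and sign-extend to 64 bits.
inline int64_t model_lw(uint32_t word) {
  return static_cast<int64_t>(static_cast<int32_t>(word));
}

// lwu: load 32 bits and zero-extend to 64 bits.
inline uint64_t model_lwu(uint32_t word) {
  return static_cast<uint64_t>(word);
}

// Mask construction as in the listing, ASSUMING t3 started out as 1
// (the listing does not show how t3 was initialised).
inline uint64_t model_mask() {
  uint64_t t3 = 1;
  t3 <<= 32;  // slli t3,t3,0x20
  t3 -= 1;    // addi t3,t3,-1
  return t3;
}

// lw followed by and: clears the sign-extended upper half again.
inline uint64_t model_lw_and(uint32_t word) {
  return static_cast<uint64_t>(model_lw(word)) & model_mask();
}
```

Under that assumption the two sequences produce the same value for every 32-bit input; the emulation just costs extra instructions and a scratch register.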

robehn · 2023-12-19T09:58:15Z

Yes, I have seen similar things.

  0x00002aaabc9464fc:   addiw   ra,ra,-1365 # 0x00000000000aaaab
  0x00002aaabc946500:   slli    ra,ra,0xd
  0x00002aaabc946502:   addi    ra,ra,-929
  0x00002aaabc946506:   slli    ra,ra,0xd
  0x00002aaabc946508:   addi    ra,ra,456
  0x00002aaabc94650c:   jalr    ra

As "111001000" (456) would fit in the signed 12-bit immediate of jalr, I think this is sub-optimal.
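
For illustration: jalr adds a sign-extended 12-bit immediate to its base register and clears the lowest bit before jumping, so a trailing addi with a small constant could in principle be folded into the jump. A minimal sketch of that arithmetic, with hypothetical function names:

```cpp
#include <cstdint>

// A signed 12-bit immediate covers -2048..2047.
constexpr bool fits_simm12(int64_t value) {
  return value >= -2048 && value <= 2047;
}

// Target address computed by jalr: (base + sign-extended imm12) with bit 0 cleared.
constexpr uint64_t jalr_target(uint64_t base, int32_t imm12) {
  return (base + static_cast<uint64_t>(static_cast<int64_t>(imm12))) & ~uint64_t{1};
}
```

Since fits_simm12(456) holds, jalr_target(base, 456) gives the same address as adding 456 to the base first and then jumping with a zero immediate.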

I can go over them and fix them; I'll create a JIRA issue.

RealFYang · 2023-12-19T14:10:13Z

Interesting. This does not seem to match the code of MacroAssembler's lwu. I wonder how that could happen.

VladimirKempik · 2023-12-19T14:25:27Z

If you take this PR https://github.com/openjdk/jdk/pull/17046/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR3717 and change the explicit Assembler::lwu() to lwu(), you are likely to see this issue.

zifeihan · 2023-12-20T02:34:44Z

Hi, I have tried using MacroAssembler::lwu instead, and I see no difference in the emitted stub code.
I have added a comment [1]. Let's discuss it on that PR.
[1] https://github.com/openjdk/jdk/pull/17046/files#r1432182715

robehn · 2024-01-02T06:47:11Z

Passes t1+t2 fastdebug (with some expected timeouts).

RealFYang

Seems fine. I only have some minor comments.

src/hotspot/cpu/riscv/macroAssembler_riscv.hpp

src/hotspot/cpu/riscv/assembler_riscv.hpp

robehn · 2024-01-05T09:52:48Z

Thank you!

RealFYang

Updated change looks good. Thanks.

openjdk · 2024-01-07T08:34:26Z

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8320069: RISC-V: Add Zcb instructions

Reviewed-by: fyang, vkempik

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 15 new commits pushed to the master branch:

faa9c69: 8322846: Running with -Djdk.tracePinnedThreads set can hang
ace010b: 8319757: java/nio/channels/DatagramChannel/InterruptibleOrNot.java failed: wrong exception thrown
be4614e: 8323016: Improve reporting for bad options
35a1b77: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile
46965a0: 8322981: Fix 2 locations in JDI that throw IOException without using the "Caused by" exception
700c25f: 8322954: Shenandoah: Convert evac-update closures asserts to rich asserts
631a9f6: 8323073: ProblemList gc/g1/TestSkipRebuildRemsetPhase.java on linux-aarch64
ed9f324: 8322985: [BACKOUT] 8318562: Computational test more than 2x slower when AVX instructions are used
ade21a9: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate
f0cfd36: 8322532: JShell : Unnamed variable issue
... and 5 more: https://git.openjdk.org/jdk/compare/2a9c3589d941d9a57e536ea0b3d7919c6ddb82dc...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

robehn · 2024-01-08T07:25:08Z

Thank you @VladimirKempik @RealFYang

robehn · 2024-01-09T07:33:31Z

/integrate

openjdk · 2024-01-09T07:34:51Z

Going to push as commit 30f93a2.
Since your change was applied there have been 45 commits pushed to the master branch:

4cf131a: 8319716: RISC-V: Add SHA-2
7286f52: 8322829: Refactor nioBlocker to avoid blocking while holding Thread's interrupt lock
07fce8e: 8320864: Serial: Extract out Full GC related fields from ContiguousSpace
176606d: 8310995: missing @since tags in 36 jdk.dynalink classes
8ae309e: 8318971: Better Error Handling for Jar Tool When Processing Non-existent Files
841ab48: 8322657: CDS filemap fastdebug assert while loading Graal CE Polyglot in isolated classloader
61ebe3b: 8323032: OptimizedModuleHandlingTest failed in dynamic CDS archive mode
ca9635d: 8322759: Eliminate -Wparentheses warnings in compiler code
8a4dc79: 8274300: Address dsymutil warning by excluding platform specific files
d78e8da: 8322545: Declare newInsets as static in ThemeReader.cpp
... and 35 more: https://git.openjdk.org/jdk/compare/2a9c3589d941d9a57e536ea0b3d7919c6ddb82dc...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-01-09T07:34:57Z

@robehn Pushed as commit 30f93a2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

zcb instruction set

38d9472

robehn changed the title ~~zcb instruction set~~ 8320069: RISC-V: Add Zcb instructions Dec 15, 2023

openjdk bot added the hotspot (hotspot-dev@openjdk.org) and rfr (Pull request is ready for review) labels Dec 15, 2023

robehn added 2 commits December 20, 2023 07:47

Merge branch 'master' into zcb

f0206e5

Merge branch 'master' into zcb

4fa46f3

luhenry mentioned this pull request Dec 20, 2023

8317721: RISC-V: Implement CRC32 intrinsic #17046

Closed

RealFYang reviewed Jan 3, 2024

src/hotspot/cpu/riscv/macroAssembler_riscv.hpp

src/hotspot/cpu/riscv/assembler_riscv.hpp

src/hotspot/cpu/riscv/assembler_riscv.hpp

robehn added 2 commits January 5, 2024 10:03

Merge branch 'master' into zcb

c30caa0

Review fixes

f677ca3

VladimirKempik approved these changes Jan 5, 2024

RealFYang approved these changes Jan 7, 2024

openjdk bot added the ready (Pull request is ready to be integrated) label Jan 7, 2024

openjdk bot added the integrated (Pull request has been integrated) label Jan 9, 2024

openjdk bot closed this Jan 9, 2024

openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Jan 9, 2024

RealFYang mentioned this pull request Feb 27, 2024

8320646: RISC-V: C2 VectorCastHF2F #17698

Closed

robehn deleted the zcb branch April 30, 2024 08:54
