Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dev][Archery][Integration] Reduce needless test matrix #44062

Closed
kou opened this issue Sep 11, 2024 · 11 comments
Closed

[Dev][Archery][Integration] Reduce needless test matrix #44062

kou opened this issue Sep 11, 2024 · 11 comments

Comments

@kou
Copy link
Member

kou commented Sep 11, 2024

Describe the enhancement requested

Archery generates product of all testers:

for producer, consumer in itertools.product(
filter(lambda t: t.PRODUCER, self.testers),
filter(lambda t: t.CONSUMER, self.testers)):

If we enable C++, Java and Rust, we use the following patterns:

Producer Consumer
C++ C++
C++ Java
C++ Rust
Java C++
Java Java
Java Rust
Rust C++
Rust Java
Rust Rust

In apache/arrow, the following patterns are redundant because they should be done in apache/arrow-rs:

Producer Consumer
Rust Rust

In apache/arror-rs, the following patterns are redundant because they should be done in apache/arrow:

Producer Consumer
C++ C++
C++ Java
Java C++
Java Java

Can we remove redundant patterns?

Component(s)

Developer Tools, Integration

@kou
Copy link
Member Author

kou commented Sep 11, 2024

@alamb @tustvold @paleolimbot @zeroshade @pitrou What do you think about this?

@pitrou
Copy link
Member

pitrou commented Sep 11, 2024

No opinion from me. I suppose it will make the test a bit shorter, but not by much since it's still O(n**2) where n is the number of implementations.

@alamb
Copy link
Contributor

alamb commented Sep 11, 2024

Seems like a good idea to me. Thanks @kou

@zeroshade
Copy link
Member

Also seems like a good idea to me too, thanks @kou

@paleolimbot
Copy link
Member

I would also love this for the integration run in nanoarrow (i.e., I would rather only run the nanoarrow-related scenarios!).

@kou
Copy link
Member Author

kou commented Sep 12, 2024

Thanks for sharing your opinions.
I'll implement this.

kou added a commit to kou/arrow that referenced this issue Sep 13, 2024
If we enable C++, Java and Rust, we use the following patterns:

| Producer | Consumer |
|----------|----------|
| C++      | C++      |
| C++      | Java     |
| C++      | Rust     |
| Java     | C++      |
| Java     | Java     |
| Java     | Rust     |
| Rust     | C++      |
| Rust     | Java     |
| Rust     | Rust     |

In apache/arrow, the following patterns are redundant because they
should be done in apache/arrow-rs:

| Producer | Consumer |
|----------|----------|
| Rust     | Rust     |

In apache/arror-rs, the following patterns are redundant because they
should be done in apache/arrow:

| Producer | Consumer |
|----------|----------|
| C++      | C++      |
| C++      | Java     |
| Java     | C++      |
| Java     | Java     |

Add `--target-language` option. We can specify target languages by
this. (We can specify `--target-language` multiple times.) Here are
expected usages:

In apache/arrow:
* `--target-language=cpp`
* `--target-language=csharp`
* `--target-language=go`
* `--target-language=java`
* `--target-language=js`

In apache/arrow-rs
* `--target-language=rust`

Here is an example in apache/arrow-rs:

T: Languages specified by `--target-language`
   * rust
O: Languages not specified by `--target-language`
   * cpp
   * csharp
   * go
   * java
   * js
   * nanoarrow

Used matrix:

| Producer | Consumer |
|----------|----------|
| Rust     | Rust     |
| Rust     | C++      |
| Rust     | C#       |
| Rust     | Go       |
| Rust     | Java     |
| Rust     | JS       |
| Rust     | nanoarrow|
| C++      | Rust     |
| C#       | Rust     |
| Go       | Rust     |
| Java     | Rust     |
| JS       | Rust     |
| nanoarrow| Rust     |
@alamb
Copy link
Contributor

alamb commented Sep 13, 2024

In apache/arror-rs, the following patterns are redundant because they should be done in apache/arrow:

Can you give me some hint about how to reduce the test matrix in arrow-rs to remove the redundancies?

Here is how the job is run

https://github.com/apache/arrow-rs/blob/889a90436ffd69e94975049b047c413c53b2dfe3/.github/workflows/integration.yml#L56-L67

Perhaps the idea is to wait for #44099 to merge and then add something equivalent to

: ${ARROW_INTEGRATION_TARGET_LANGUAGES:=cpp,csharp,go,java,js}

@kou
Copy link
Member Author

kou commented Sep 13, 2024

Yes. The following change is needed with the current #44099 implementation:

diff --git a/.github/workflows/integration.yml b/.github/workflows/integration.yml
index 41edc1bb19..dcaf03c44a 100644
--- a/.github/workflows/integration.yml
+++ b/.github/workflows/integration.yml
@@ -64,6 +64,7 @@ jobs:
       ARROW_INTEGRATION_GO: ON
       ARROW_INTEGRATION_JAVA: ON
       ARROW_INTEGRATION_JS: ON
+      ARCHERY_INTEGRATION_TARGET_LANGUAGES: "rust"
       ARCHERY_INTEGRATION_WITH_NANOARROW: "1"
       # https://github.com/apache/arrow/pull/38403/files#r1371281630
       ARCHERY_INTEGRATION_WITH_RUST: "1"

But this may be changed.

I'll open a PR in apache/arrow-rs after I complete this. Please wait for a while.

@alamb
Copy link
Contributor

alamb commented Sep 15, 2024

Thank you @kou

kou added a commit that referenced this issue Sep 16, 2024
…4099)

### Rationale for this change

If we enable C++, Java and Rust, we use the following patterns:

| Producer | Consumer |
|----------|----------|
| C++      | C++      |
| C++      | Java     |
| C++      | Rust     |
| Java     | C++      |
| Java     | Java     |
| Java     | Rust     |
| Rust     | C++      |
| Rust     | Java     |
| Rust     | Rust     |

In apache/arrow, the following patterns are redundant because they should be done in apache/arrow-rs:

| Producer | Consumer |
|----------|----------|
| Rust     | Rust     |

In apache/arror-rs, the following patterns are redundant because they should be done in apache/arrow:

| Producer | Consumer |
|----------|----------|
| C++      | C++      |
| C++      | Java     |
| Java     | C++      |
| Java     | Java     |

### What changes are included in this PR?

Add `--target-languages` option. We can specify target languages by this. Here are expected usages:

In apache/arrow:
* `--target-languages=cpp,csharp,go,java,js`

In apache/arrow-rs
* `--target-languages=rust`

Here is an example in apache/arrow-rs:

Used matrix:

| Producer | Consumer |
|----------|----------|
| Rust     | Rust     |
| Rust     | C++      |
| Rust     | C#       |
| Rust     | Go       |
| Rust     | Java     |
| Rust     | JS       |
| Rust     | nanoarrow|
| C++      | Rust     |
| C#       | Rust     |
| Go       | Rust     |
| Java     | Rust     |
| JS       | Rust     |
| nanoarrow| Rust     |

If no `--target-languages` is specified, all enabled languages are the target languages. (The same as the current behavior.)

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #44062

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou kou added this to the 18.0.0 milestone Sep 16, 2024
@kou
Copy link
Member Author

kou commented Sep 16, 2024

Issue resolved by pull request 44099
#44099

@kou
Copy link
Member Author

kou commented Sep 17, 2024

apache/arrow-rs#6406

paleolimbot added a commit to apache/arrow-nanoarrow that referenced this issue Sep 17, 2024
…tegration test matrix (#611)

Thanks to apache/arrow#44062 , we can run
fewer jobs in our integration test matrix (and benefit from better
GitHub actions grouping!).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants