Disable large payload stages #14023

Merged
merged 3 commits into from
Feb 6, 2023

@andre4i andre4i commented Feb 6, 2023

Description

The *:large CI stages generate a lot of traffic and take a long time to run. This can delay the pipeline, because other ODSP/FRS stages wait on the stress-(frs|odsp)-lock. These stages have been running for almost a week; the results have been captured and the configs adjusted to the point where they can be integrated into the regular stress test runs.

Re-enabling chunking and compression in the stress tests will be done by #14000 after testing.

There were multiple problems with the original change, and the stages were left on mostly to expose them. The most notable was that the chunk size was too small, which caused the test to send 1000 times more ops (in some cases).

The other issue is that these stages use a lock, so while they run, other odsp/frs stages cannot. If this stage takes too long, it is going to slow down the other CI runs.
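
To make the chunk-size problem concrete, here is a minimal sketch of the arithmetic. It assumes each chunk is sent as one op; the function name `opsForPayload` and all of the byte sizes are hypothetical values chosen for illustration, not the values used in the stress test configs.

```typescript
// Hypothetical helper: number of ops needed to send one payload,
// assuming the payload is split into fixed-size chunks and each
// chunk travels as its own op.
function opsForPayload(payloadBytes: number, chunkSizeBytes: number): number {
    if (!Number.isInteger(chunkSizeBytes) || chunkSizeBytes <= 0) {
        throw new RangeError("chunkSizeBytes must be a positive integer");
    }
    // Even an empty payload still costs one op.
    return Math.max(1, Math.ceil(payloadBytes / chunkSizeBytes));
}

// Illustrative numbers only (not taken from the real configs).
const payloadBytes = 10 * 1024 * 1024; // 10 MiB payload
const opsWithSmallChunks = opsForPayload(payloadBytes, 1024); // 1 KiB chunks
const opsWithLargeChunks = opsForPayload(payloadBytes, 1024 * 1024); // 1 MiB chunks

// Shrinking the chunk size by a factor of 1024 multiplies the op count by 1024.
const opMultiplier = opsWithSmallChunks / opsWithLargeChunks;
console.log(opsWithSmallChunks, opsWithLargeChunks, opMultiplier);
```

Under these assumptions the op count scales inversely with the chunk size, which is consistent with a too-small chunk size producing roughly three orders of magnitude more ops.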

@andre4i andre4i requested review from msfluid-bot and a team as code owners February 6, 2023 15:55
@github-actions github-actions bot added the area: build (Build related issues) and area: tests (Tests to add, test infrastructure improvements, etc) labels Feb 6, 2023
@andre4i andre4i requested a review from alexvy86 February 6, 2023 15:55
@github-actions github-actions bot added the base: main (PRs targeted against main branch) label Feb 6, 2023
@agarwal-navin

It seems the large payload tests are removed, not disabled. Have we identified why they are running for more than a week, or do we have a plan to fix them? Will these be re-enabled later?

@andre4i

andre4i commented Feb 6, 2023

@agarwal-navin good questions. Yes, they are removed. I used the term 'disabled' because the code that makes them possible (the code allowing for large custom payloads) is still going to be in the repo. They will be re-enabled, sort of, by #14000, which would allow the regular test to send a large payload every N ops (right now 500 ops). I'm still testing it privately with https://dev.azure.com/fluidframework/internal/_build/results?buildId=127797&view=results. Timeline is EOD today if things go as planned.

There were multiple problems with the original change, and the stages were left on mostly to expose them. The most notable was that the chunk size was too small, which caused the test to send 1000 times more ops (in some cases).

The other issue is that these stages use a lock, so while they run, other odsp/frs stages cannot. If this stage takes too long, it is going to slow down the other CI runs.
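
As a rough illustration of the "large payload every N ops" idea described above, the sketch below picks a payload size per op. This is not the implementation in #14000; the function name `payloadSizeForOp`, its parameters, and the byte sizes are assumptions made for the example. Only the interval of 500 ops comes from the comment.

```typescript
// Hypothetical helper: choose the payload size for the op at a given
// 1-based index, sending a large payload once every `largeEvery` ops.
function payloadSizeForOp(
    opIndex: number,
    largeEvery: number,
    smallBytes: number,
    largeBytes: number,
): number {
    // A non-positive interval means large payloads are turned off.
    if (largeEvery <= 0) {
        return smallBytes;
    }
    return opIndex > 0 && opIndex % largeEvery === 0 ? largeBytes : smallBytes;
}

// Count how many large payloads a run of 2000 ops would send with N = 500.
// The 2000-op run length and the byte sizes are illustrative only.
const totalOps = 2000;
const largeInterval = 500;
let largePayloadCount = 0;
for (let opIndex = 1; opIndex <= totalOps; opIndex++) {
    if (payloadSizeForOp(opIndex, largeInterval, 256, 1024 * 1024) === 1024 * 1024) {
        largePayloadCount++;
    }
}
console.log(largePayloadCount);
```

Folding the large payloads into the regular run this way keeps their share of the traffic bounded by the interval, so no separate long-running stage has to hold the lock.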

@agarwal-navin

That makes sense. Thanks! Can you please add this to the description? Having this context will be helpful.

@andre4i andre4i merged commit 45f38c6 into microsoft:main Feb 6, 2023
daesun-park pushed a commit to daesun-park/FluidFramework that referenced this pull request Feb 8, 2023