Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turn on parallelism by default, ensure Commands always run serially at the end #3265

Merged
merged 15 commits into from
Aug 8, 2024

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Jul 17, 2024

  • Remove the hardcoded setting of threadCount = Some(1)

  • Split Tasks and Commands into two separate batches, to only run Commands serially after all Tasks have run

  • Introduce a MILL_TEST_DEST_FOLDER environment variable in our standard TestModule. This is something we needed to do to avoid collisions between parallel test modules in Mill's own build (e.g. everyone stomping over target/worksources) and will be something that Mill users will need as well

Commands are typically the less well-behaved tasks, whereas Targets and others are meant to be pure functional and thread safe without side effects. So running commands serially at the end is a decent tradeoff the do the right thing by default, and if someone wants parallelism they can always use a task (e.g. .testCached instead of .test)

Commands called from other tasks, e.g. those tested by example/misc/5-module-run-task, are run in parallel together with the main group. We assume that these commands are well behaved, as their results will potentially be cached (as part of downstream tasks), and so I run them as part of the primary task graph.

(An alternative could have been to run these sequentially as well, but given the nature of commands-run-as-part-of-tasks I think that would be unnecessarily conservative)

I disable the thread-number log prefixes in the integration tests, since all they would do there is add clutter.

Pull request: #3265

@lihaoyi lihaoyi changed the title Turn on parallelism by default, ensure Commands always run serially Turn on parallelism by default, ensure Commands always run serially at the end Jul 17, 2024
@lefou
Copy link
Member

lefou commented Jul 17, 2024

Since it's not forbidden, it's possible to depend on commands from other targets, even if this is not recommended. When we now delay command execution to happen serially at the end, it could be problematic. We should have at least one test with a target depending on a command and assert correct behavior.

@lihaoyi
Copy link
Member Author

lihaoyi commented Jul 17, 2024

We have one such test in example/misc/5-module-run-task, and it's failing as expected. Will need to adjust the code to make it passable

@lihaoyi lihaoyi requested review from lefou and lolgab July 18, 2024 03:00
@lihaoyi
Copy link
Member Author

lihaoyi commented Jul 18, 2024

All tests are green, should be ready for a review. This can go into a binary compatible 0.12.x

@lihaoyi lihaoyi added this to the 0.12.0 milestone Jul 18, 2024
// given but run the commands in linear order
evaluateTerminals(
tasks,
if (sys.env.contains("MILL_TEST_SUITE")) _ => ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment with some explanation here?

I guess, time is ripe to have a table with all used environment variables in our readme.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. This is an internal environment variable, so users should not need to know about it

Copy link
Member

@lefou lefou Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had other developers (myself included) in mind. I thought, our readme is targeting devlopers.

Besides, nothing we need to do in this PR, just some loud thinking.

val (_, tasksTransitive0) = Plan.plan(Agg.from(tasks0.map(_.task)))

val tasksTransitive = tasksTransitive0.toSet
val (tasks, commands) = terminals0.partition {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we either add some comment or rename commands to make it more obvious, it's only those commands that don't have downstream dependencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it leafCommands. Ideally I would want terminalCommands, but Terminal already has another meaning in this code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, IIUC, we don't evaluate all command serially, only those, that don't have downstream dependencies. That's a bit odd, as I was hoping, we could the user some guarantee, that commands are always executed serially.

Now, that I think about it, we probably only want commands, that are interactive to run serially. But we currently can't determite, if a tasks wants to be interactive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not just interactive commands, there's probably lots of commands to do side effects, tests that are not well behaved (e.g. writing to shared paths on disk), and commands that print stuff (and probably don't want the [#threadid] prefix). Those are all solvable, but I think "only commands at the end of the evaluation graph run serially" is a reasonable compromise that gets us most of the way there

Copy link
Member

@lefou lefou Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood.

I don't know how often targets depend on commands, but I could imagine, it would be fine to just run those downstream dependencies serially too. So we could make this guarantee for all commands without implementing multiple parallel/serial blocks.

@lefou
Copy link
Member

lefou commented Jul 18, 2024

As a consequence of this PR, users will no longer be able to run command-based tests in parallel. I guess, we should promote our testCached more and/or review the whole usability of our test target setup in the next breaking window.

@lihaoyi
Copy link
Member Author

lihaoyi commented Jul 18, 2024

Yeah we can add testCached to the docs, so people know it's there

@lihaoyi lihaoyi merged commit 044b2a4 into com-lihaoyi:main Aug 8, 2024
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants