
bpo-34060: Report system load when running test suite for Windows #8357

Merged: 4 commits merged into python:master from windows_load2 on Apr 9, 2019

Conversation

@ammaraskar (Member) commented Jul 20, 2018

This is (mostly) a pure Python implementation of the other PR. It leverages the typeperf command, which monitors performance counters and outputs them at a given interval. So every 5 seconds, typeperf can write the processor queue length to stdout.
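
For context, a rough sketch of the kind of invocation this amounts to (the counter name and interval match the constants shown later in the review; the exact Popen arguments in the PR differ):

import subprocess

# Sketch only: spawn typeperf to print the processor queue length every 5 seconds.
COUNTER_NAME = r'\System\Processor Queue Length'
SAMPLING_INTERVAL = 5  # seconds

command = ['typeperf', COUNTER_NAME, '-si', str(SAMPLING_INTERVAL)]
proc = subprocess.Popen(command, stdout=subprocess.PIPE)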

subprocess.stdout.readline is a blocking call. Using a thread seemed like the obvious solution, but we can't use multiprocessing or a thread because, as Victor speculated in the previous bug report, doing so conflicts with test_multiprocessing and test_threading. Hence, I opted for the asynchronous/overlapped IO API, which was designed for exactly this. Most of the diff actually just pertains to using this rather low-level API.

This is almost a pure Python implementation, but there was one edge case where it would fail. Namely, when the Python interpreter running the test suite crashes, it leaves an orphaned typeperf process running that refuses to die. This means trouble when the test suite is run with -j N and this situation happens:

python -m test -j2
├── python <test runner>
│   └── typeperf.exe
└── python *CRASHED*
    └── typeperf.exe

The coordinating Python test process will wait forever for the crashed Python process, and consequently for typeperf, to terminate, which just doesn't happen by default on Windows. After reading up on the APIs, the right way to fix this is to use a Job Object to ask the OS to kill the child when the parent dies. Hence, there is a change in _winapi to make this happen. Unlike in the last PR, this API is actually reusable and fit to be exposed to the public. It could even make implementing things like bpo-5115 a lot easier.
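
Roughly, the intended shape of the usage is something like this (a sketch; the Python-level signatures of the new _winapi calls are assumptions here, and the kill-on-close limit setup is elided):

import subprocess
import _winapi

# Sketch only: create a job configured so its processes are killed when the
# last handle to the job is closed, then put typeperf into that job.
job_group = _winapi.CreateJobObject(None, "")   # signature is illustrative
# ... set JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE (0x2000) on the job via the new
# limit-information API (not shown) ...
p = subprocess.Popen(['typeperf', r'\System\Processor Queue Length', '-si', '5'],
                     stdout=subprocess.PIPE)
_winapi.AssignProcessToJobObject(job_group, p._handle)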

https://bugs.python.org/issue34060

@zooba (Member) left a comment

I like this! And I'll be happy to have support for job objects in there too :)

Looking at the test runs, the numbers seem to be consistent with other platforms, but since I don't have as good a feel for what to expect here, I'd like someone else who's been involved in this or the previous PR to sign off as well.

Py_UNICODE *job_name)
/*[clinic end generated code: output=ddb90fccf4f363b9 input=f1350a07abdab61c]*/
{
HANDLE job_handle = CreateJobObjectW(NULL, job_name);
Member:

We should surround the non-Python parts of this function with _Py_BEGIN_ALLOW_THREADS and _Py_END_ALLOW_THREADS to release the GIL. Similarly for the Assign function.

    self.getloadavg = lambda: os.getloadavg()[0]
elif sys.platform == 'win32':
    self.load_tracker = WindowsLoadTracker()
    self.getloadavg = lambda: self.load_tracker.getloadavg()
Member:

This can just be self.getloadavg = self.load_tracker.getloadavg

Member:

And since there are no later references to self.load_tracker, that doesn't need to be kept on the object (the references via the method will keep it from being deallocated)

@ammaraskar (Member Author):

Done, thanks :)

@ammaraskar (Member Author)

> I'd like someone else who's been involved in this or the previous PR to sign off as well.

Yeah I'd definitely like to get @vstinner's take on it since he is likely familiar with normal load values.


if (job_handle == NULL) {
    PyErr_SetFromWindowsErr(GetLastError());
    return NULL;
Contributor:

HANDLE_return_converter requires INVALID_HANDLE_VALUE for an error.

@ammaraskar (Member Author):

Good catch, thanks.

@vstinner (Member)

> subprocess.stdout.readline is a blocking call. Using a thread seemed like an obvious solution, but we can't achieve this with multiprocessing or a thread, because like Victor speculated in the previous bug report on this, it conflicts with test_multiprocessing and test_threading.

Did I say that? A thread is fine here. regrtest already uses threads to run tests in subprocesses when using -jN. faulthandler also uses a C thread to implement a hard timeout (dumping the Python traceback on timeout). regrtest is full of threads :-) Overlapped IO may be more complicated than a thread, no?


# Spawn off the load monitor
command = ['typeperf', COUNTER_NAME, '-si', str(SAMPLING_INTERVAL)]
self.p = subprocess.Popen(command, stdout=command_stdout)
Contributor:

Python still supports Windows 7, which only allows a process to be assigned to one job at a time. If the OS version is prior to NT 6.2 (i.e. sys.getwindowsversion() < (6, 2)), use the creation flag
CREATE_BREAKAWAY_FROM_JOB (0x01000000) to try to break the child process out of the current job.

@ammaraskar (Member Author):

Aah, thank you. I've been testing on Windows 10 and didn't read the documentation carefully enough. Is there any particular reason to scope it down to < NT 6.2? Will there be a problem with just using CREATE_BREAKAWAY_FROM_JOB as the creationflags all the time?

Contributor:

There's no reason to break the child out of the job hierarchy in Windows 8+, especially since jobs are used more frequently (e.g. to implement silos, such as for Windows containers).

Also, trying to break away from the parent job complicates creating the process. I forgot to mention that Popen will fail with a PermissionError if the child isn't allowed to break away. This can be handled by retrying without CREATE_BREAKAWAY_FROM_JOB. In this case you could skip calling AssignProcessToJobObject. For example (untested):

CREATE_BREAKAWAY_FROM_JOB = 0x01000000

cflags = 0 if sys.getwindowsversion() >= (6, 2) else CREATE_BREAKAWAY_FROM_JOB
assign_to_job = True

try:
    self.p = subprocess.Popen(command, stdout=command_stdout,
                creationflags=cflags)
except PermissionError:
    if cflags == 0:
        raise
    self.p = subprocess.Popen(command, stdout=command_stdout)
    assign_to_job = False

if assign_to_job:
    _winapi.AssignProcessToJobObject(job_group, self.p._handle)

self.p = subprocess.Popen(command, stdout=command_stdout)

# Attach it to our job group
_winapi.AssignProcessToJobObject(job_group, self.p._handle)
Contributor:

Call this in a try/except that handles failure either by ignoring it or with a warning. AssignProcessToJobObject will fail in Windows 7 if the parent process (i.e. the current Python process) is in a job that doesn't allow child processes to break away.

@ammaraskar (Member Author):

Thanks, I'll issue a warning, since this handles a rather rare case of the interpreter actually crashing in -jN mode.
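
Something along these lines, for illustration (the helper name and warning text are hypothetical, not code from this PR):

import sys
import subprocess
import _winapi

def start_load_monitor(job_group, command, command_stdout):
    # Hypothetical helper: spawn typeperf and try to attach it to the job,
    # warning rather than failing if the assignment is not allowed.
    p = subprocess.Popen(command, stdout=command_stdout)
    try:
        _winapi.AssignProcessToJobObject(job_group, p._handle)
    except OSError as exc:
        # e.g. Windows 7 when the parent is in a job that forbids breakaway
        print('Warning: could not assign typeperf to the job object:', exc,
              file=sys.stderr)
    return p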

// we only support a subset of all the limits possible, can be expanded
// in the future if needed.
limit.BasicLimitInformation.LimitFlags =
(DWORD) getulong(limit_info, "LimitFlags");
Contributor:

Maybe this should reflect the WinAPI structure, i.e. the BasicLimitInformation field. Also, it would be nice to have PyWin32 interoperability, which does the latter and also uses dicts instead of simple namespaces.

@ammaraskar (Member Author):

Hmm, so I based this on CreateProcess, which also uses attributes in a passed-in object for this information. What do you mean by reflecting the structure? Do you mean supporting all the fields of BasicLimitInformation?

@ammaraskar (Member Author):

For reference, here is how subprocess uses it with CreateProcess:

cpython/Lib/subprocess.py, lines 129 to 137 at 06ca3f0:

class STARTUPINFO:
    def __init__(self, *, dwFlags=0, hStdInput=None, hStdOutput=None,
                 hStdError=None, wShowWindow=0, lpAttributeList=None):
        self.dwFlags = dwFlags
        self.hStdInput = hStdInput
        self.hStdOutput = hStdOutput
        self.hStdError = hStdError
        self.wShowWindow = wShowWindow
        self.lpAttributeList = lpAttributeList or {"handle_list": []}

I just used a simple namespace because I was being lazy.

Contributor:

Sometimes a dict is used, e.g. STARTUPINFO uses a dict for the new lpAttributeList support in 3.7. win32job uses dicts, but it's fine to stick with namespaces instead. I still would rather LimitFlags be set in a BasicLimitInformation attribute instead of at the top level. _winapi is low-level and undocumented, so it's best if it mirrors the actual API, if that's not overly awkward. I didn't mean to support all fields, however. _winapi gets extended only as required.
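
Concretely, the shape being suggested would be something like this (a sketch; the flag value is the standard WinAPI constant):

import types

JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE = 0x2000  # standard WinAPI value

# Mirror the JOBOBJECT_EXTENDED_LIMIT_INFORMATION layout: LimitFlags sits
# under a BasicLimitInformation attribute rather than at the top level.
limit_info = types.SimpleNamespace(
    BasicLimitInformation=types.SimpleNamespace(
        LimitFlags=JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE,
    ),
)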

@ammaraskar (Member Author) commented Jul 21, 2018

This is the failure that shows up when using a thread:

0:00:21 load avg: 0.07 [ 16/417] test__xxsubinterpreters
test test__xxsubinterpreters failed -- Traceback (most recent call last):
  File "C:\Users\ammar\workspace\cpython\lib\test\test__xxsubinterpreters.py", line 473, in test_main
    self.assertTrue(interpreters.is_running(main))
RuntimeError: interpreter has more than one thread

@ammaraskar (Member Author) commented Jul 21, 2018

So I think I jumped the gun with the job grouping stuff. There was a much easier solution to dealing with interpreter crashes in -jN mode: only run the load-tracking subprocess in the main interpreter coordinating the children. It's the only one that needs the information, since it prints the progress reports.

This is now actually just pure Python and consequently a lot simpler.

self.getloadavg = None
if hasattr(os, 'getloadavg'):
    self.getloadavg = lambda: os.getloadavg()[0]
elif sys.platform == 'win32' and (self.ns.slaveargs is None):
Member:

I would prefer to import the class there.

@ammaraskar (Member Author):

Done.

@@ -533,6 +534,13 @@ def main(self, tests=None, **kwargs):
    def _main(self, tests, kwargs):
        self.ns = self.parse_args(kwargs)

        self.getloadavg = None
        if hasattr(os, 'getloadavg'):
            self.getloadavg = lambda: os.getloadavg()[0]
Member:

I dislike lambda. Would you mind defining a method here using "def"?

@ammaraskar (Member Author):

I prefer lambda for this sort of short stuff, but done.

@@ -54,3 +58,94 @@ def printlist(x, width=70, indent=4, file=None):
    print(textwrap.fill(' '.join(str(elt) for elt in sorted(x)), width,
                        initial_indent=blanks, subsequent_indent=blanks),
          file=file)

BUFSIZE = 8192
Member:

Please put the new code in a new file, like libregrtest/winloadavg.py.

@ammaraskar (Member Author):

Done

COUNTER_NAME = r'\System\Processor Queue Length'

"""
This class asynchronously interacts with the `typeperf` command to read
Member:

Please put the docstring inside the class.


def __del__(self):
    self.p.kill()
    self.p.wait()
Member:

Please don't use __del__() but add a close() method, for example.
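
For illustration, a minimal sketch of such a close() method (the None guard is an assumption about how it would be used):

def close(self):
    # Sketch only: stop the typeperf child explicitly instead of relying
    # on __del__.
    if self.p is None:
        return
    self.p.kill()
    self.p.wait()
    self.p = None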

@ammaraskar (Member Author):

Hmm, are you sure about this? It would require us to keep a reference to load_tracker, and it needs something like:

if self.load_tracker:
    self.load_tracker.stop()

because we don't have a load tracker to stop on non-Windows systems.

return self.load

# Process the backlog of load values
for line in typeperf_output.splitlines():
Member:

What if the last line is incomplete and doesn't end with a newline character? Maybe you should put it back into a buffer and concatenate it to the output next time. Maybe use .splitlines(True) to check whether there is a newline character?
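
For illustration, the kind of buffering being suggested (a hypothetical helper, not code from this PR):

def split_complete_lines(leftover, new_output):
    # Concatenate the leftover from the previous read, split keeping the
    # line endings, and hold back a trailing line that has no newline yet.
    data = leftover + new_output
    lines = data.splitlines(keepends=True)
    if lines and not lines[-1].endswith('\n'):
        leftover = lines.pop()   # incomplete last line: keep it for next time
    else:
        leftover = ''
    return [line.strip() for line in lines], leftover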

@ammaraskar (Member Author) commented Jul 25, 2018

I don't think it's worth adding the extra complexity to handle this. Worst case, we miss a point or two of data and the load number is slightly off.


BUFSIZE = 8192
LOAD_FACTOR_1 = 0.9200444146293232478931553241
SAMPLING_INTERVAL = 5
Member:

Please add a comment to document the unit (seconds, no?).
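
For illustration, comments along these lines would cover it (the exp() relation is the same one used to derive the analogous Unix load-average constants):

import math

SAMPLING_INTERVAL = 5  # seconds between typeperf samples

# Exponential smoothing factor for a 1-minute load average sampled every
# SAMPLING_INTERVAL seconds: exp(-SAMPLING_INTERVAL / 60) ~ 0.9200444146293232
LOAD_FACTOR_1 = math.exp(-SAMPLING_INTERVAL / 60)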

    # Close our copy of the write end of the pipe
    os.close(command_stdout)

def read_output(self):
Member:

It's possible that this function is only called every 5 minutes. I expect a lot of output in this case.

Why not run this function inside a thread?

@ammaraskar (Member Author):

This failure shows up when running in a thread: #8357 (comment)

Even at 5 minutes, that's 60 points of data. Even if the typeperf command puts out 100 bytes per line, that's only about 6 KB, not enough to saturate the 8192-byte buffer. And even if it were, those data points would eventually get picked up by the next call.

    self.getloadavg = lambda: os.getloadavg()[0]
elif sys.platform == 'win32' and (self.ns.slaveargs is None):
    load_tracker = WindowsLoadTracker()
    self.getloadavg = load_tracker.getloadavg
Member:

It seems like you spawn a subprocess even in test worker processes, whereas you wrote that it's useless there. This code should be moved after the slaveargs handling, no? Maybe move the slaveargs handling code earlier?

@ammaraskar (Member Author):

Not sure I follow; we already know whether the runner is a slave based on the parsed arguments. The arguments have been parsed by this point, so this should be fine. I also tested it, and in Task Manager only one typeperf instance shows up under the main Python process, unlike before.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@ammaraskar (Member Author)

I have made the requested changes; please review again

@bedevere-bot

Thanks for making the requested changes!

@zooba, @vstinner: please review the changes made to this pull request.

@csabella (Contributor)

It looks like there was a lot of interest and activity on this PR a few months ago and @zooba had approved it. @ammaraskar, could you resolve the merge conflict? I think that might be all that is needed, along with @vstinner's approval for merging.

Thanks!

@vstinner vstinner dismissed their stale review January 23, 2019 17:11

It seems like my comments have been addressed.

@vstinner (Member)

I'm sorry, but I don't have the bandwidth right now to review this change (test it manually).

@ammaraskar: You have to update your PR; there is now a conflict.

@zware: If you are confident that the change is good, please go ahead and merge it (once CI tests pass and the conflict is resolved).

@zware (Member)

zware commented Feb 1, 2019

I haven't researched to any extent whether this is the best way to do this, but it looks fine to me.

While Windows exposes the system processor queue length, the raw value
used for load calculations on Unix systems, it does not provide an API
to access the averaged value. Hence, to calculate the load we must track
and average it ourselves. We can't use multiprocessing or a thread to
read it in the background while the tests run, since using those would
conflict with test_multiprocessing and test_xxsubprocess.

Thus, we use Windows' asynchronous IO API to run the tracker in the
background, sampling at the correct rate. When we wish to access the
load, we check whether there's new data on the stream; if there is,
we update our load values.
@ammaraskar force-pushed the windows_load2 branch 3 times, most recently from fb0a14d to 31a71df on February 9, 2019 22:44
@ammaraskar (Member Author)

Thanks for the reminder Cheryl, appreciate it! :)

@zware Just fixed the merge conflict, please take a look.
@zooba If you're available for a re-review, that would be appreciated as well.

@csabella csabella merged commit e16467a into python:master Apr 9, 2019
@csabella (Contributor)

csabella commented Apr 9, 2019

Based on @zooba's approval and the other consensus on this, I've merged the PR. Thanks @ammaraskar for the PR and to @zware, @eryksun, @vstinner, and @zooba for the reviews! 🙂

@vstinner (Member)

Thanks @ammaraskar!

vstinner added a commit that referenced this pull request Apr 26, 2019
* Clean up code which checked presence of os.{stat,lstat,chmod} (GH-11643)

(cherry picked from commit 8377cd4)

* bpo-36725: regrtest: add TestResult type (GH-12960)

* Add TestResult and MultiprocessResult types to ensure that results
  always have the same fields.
* runtest() now handles KeyboardInterrupt
* accumulate_result() and format_test_result() now take a TestResult
* cleanup_test_droppings() is now called by runtest() and marks the
  test as ENV_CHANGED if the test leaks the support.TESTFN file.
* runtest() now includes code "around" the test in the test timing
* Add print_warning() in test.libregrtest.utils to standardize how
  libregrtest logs warnings to ease parsing the test output.
* support.unload() is now called with abstest rather than test_name
* Rename 'test' variable/parameter to 'test_name'
* dash_R(): remove unused the_module parameter
* Remove unused imports

(cherry picked from commit 4d29983)

* bpo-36725: Refactor regrtest multiprocessing code (GH-12961)

Rewrite run_tests_multiprocess() function as a new MultiprocessRunner
class with multiple methods to better report errors and stop
immediately when needed.

Changes:

* Worker processes are now killed immediately if tests are
  interrupted or if a test crashes (CHILD_ERROR).
* Rewrite how errors in a worker thread are reported to
  the main thread. No longer ignore BaseException or parsing errors
  silently.
* Remove 'finished' variable: use worker.is_alive() instead
* Always compute omitted tests. Add Regrtest.get_executed() method.

(cherry picked from commit 3cde440)

* bpo-36719: regrtest always detect uncollectable objects (GH-12951)

regrtest now always detects uncollectable objects. Previously, the
check was only enabled by --findleaks. The check now also works with
-jN/--multiprocess N.

--findleaks becomes a deprecated alias to --fail-env-changed.

(cherry picked from commit 75120d2)

* bpo-34060: Report system load when running test suite for Windows (GH-8357)

While Windows exposes the system processor queue length, the raw value
used for load calculations on Unix systems, it does not provide an API
to access the averaged value. Hence to calculate the load we must track
and average it ourselves. We can't use multiprocessing or a thread to
read it in the background while the tests run since using those would
conflict with test_multiprocessing and test_xxsubprocess.

Thus, we use Windows' asynchronous IO API to run the tracker in the
background, sampling at the correct rate. When we wish to access the
load, we check whether there's new data on the stream; if there is,
we update our load values.


(cherry picked from commit e16467a)

* bpo-36719: Fix regrtest re-run (GH-12964)

Properly handle a test which fails but then passes.

Add test_rerun_success() unit test.

(cherry picked from commit 837acc1)

* bpo-36719: regrtest closes explicitly WindowsLoadTracker (GH-12965)

Regrtest.finalize() now explicitly closes the WindowsLoadTracker
instance.

(cherry picked from commit 00db7c7)