Time-Travel diagnostics in Node #164

mrkmarron · 2018-02-15T21:03:30Z

At the Node Diagnostics WG there was discussion around what is needed to make time-travel debugging more generally available. Key work items identified to make this happen include:

Work to refactor certain coding idioms into more record/replay friendly forms. One example includes the AliasedBuffer class.
Discussion around user experience and surfacing of this functionality. We have experimented with several aspects:
- Adding a new domain, TimeTravel, to the debugger API.
- Using this to add time-travel support to the built-in inspect debugger as well as VSCode.
- Adding a new module, trace_mgr.js, to enable simple generation of traces at common error points (unhandled exceptions, abnormal exits, console.error writes, etc.).
- Getting additional feedback on the value and impact of these use cases would be very useful in prioritizing and adjusting the implementation.
Structuring code to make the implementation of the needed record/replay/snapshot code as VM neutral and easy as possible. More below.

Providing TimeTravel uniformly as Node functionality, rather than as a vendor-specific feature, is a challenge.

One option is for each vendor to implement it primarily in the VM, as ChakraCore does today. This approach requires minimal changes in Node, but it is likely to be problematic because it duplicates almost identical functionality in every VM that provides the support. In the case of V8, since Node currently calls directly into its raw APIs, adding record/replay support also raises potentially large maintainability and performance concerns.

The second approach, as discussed during the WG, involves using N-API to move away from direct calls to V8 APIs. This has the immediate benefit of providing a single location for a large body of common record/replay code: what is currently implemented in Chakra's JsRT host embedding API (for example here, which drives the logger in ChakraCore) could be replaced with a neutral shared implementation based around the N-API specification. As N-API is at a higher level than JsRT, this will also have the benefit of decreasing the overhead of running in record mode. However, there are several items that need investigation/work for this approach:

The record/replay code must track identity for opaque references passed to/from various calls. ChakraCore has a non-moving collector, so we can trivially use pointer values as ids. Other VMs may use moving collectors, which require more sophisticated approaches, e.g., adding an explicit tag field to a napi_value or updating tag info whenever the GC moves it.
A N-API style model needs to be adopted in core so that there are no longer any direct calls through V8 API's.
A smaller API needs to be defined which the VM can implement to support any features which cannot be included in the general Node layer, e.g., record Data.Time calls in the VM, take/restore a snapshot, etc.
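
As a rough illustration of the identity-tracking item above, here is a minimal sketch of a side table that keeps a stable id attached to an opaque reference even if a moving collector relocates the object. Everything in it is hypothetical: `RefIdentityTable`, `OpaqueRef`, `StableId`, and the `OnMoved`/`OnFreed` hooks are made-up names, not part of Node, N-API, or ChakraCore, and it assumes the VM is able to notify the table when an object moves or is collected.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical sketch: none of these names exist in Node, N-API, or ChakraCore.
using OpaqueRef = const void*;   // stand-in for the current address behind an engine handle
using StableId = std::uint64_t;  // id that would be written to the record/replay log

class RefIdentityTable {
 public:
  // Returns the stable id for a reference, assigning a fresh one on first sight.
  StableId IdFor(OpaqueRef ref) {
    auto it = ids_.find(ref);
    if (it != ids_.end()) return it->second;
    StableId id = next_id_++;
    ids_.emplace(ref, id);
    return id;
  }

  // Assumed GC hook: called when the collector relocates an object,
  // so that the id follows the object to its new address.
  void OnMoved(OpaqueRef from, OpaqueRef to) {
    auto it = ids_.find(from);
    if (it == ids_.end()) return;  // never logged, nothing to track
    StableId id = it->second;
    ids_.erase(it);
    ids_[to] = id;
  }

  // Assumed GC hook: called when the object is collected.
  void OnFreed(OpaqueRef ref) { ids_.erase(ref); }

 private:
  std::unordered_map<OpaqueRef, StableId> ids_;
  StableId next_id_ = 1;
};
```

With a non-moving collector, `OnMoved` is never called and the table degenerates to the "just take pointer ids" case. The alternative mentioned above, an explicit tag field stored with the value itself, would avoid the relocation hook at the cost of extra per-value space.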

mcollina · 2018-02-16T23:23:12Z

Good writeup. I fear the N-API approach is not feasible: adopting it would have a noticeable performance overhead, and I'm not confident the overall group would accept the tradeoff.

mhdawson · 2018-02-20T02:02:46Z

Is there a way to limit the scope of

"A N-API style model needs to be adopted in core so that there are no longer any direct calls through V8 API's."

to something smaller than "everything"? I agree that we are a long way from removing all direct references to V8, but if a smaller subset is possible (even if that only gives us a subset of the history), then there might be a way to make incremental progress. Otherwise we are probably stuck with the option of it being implemented in the JavaScript engines for at least a number of years.

mrkmarron · 2018-02-23T06:37:45Z

One other option is to move incrementally towards a full N-API implementation, using conditional compilation to provide a "diagnostic" build with some overhead alongside a zero-cost production build while code is being converted to N-API and the performance issues that arise are worked out.

A limited-scope TTD implementation could be possible, but I think it could be difficult for people to understand when/why TTD doesn't work in some cases.
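
To make the conditional-compilation idea more concrete, here is a minimal sketch of a call site that records in a "diagnostic" build and compiles to a no-op in a production build. The names are all assumptions for illustration: `NODE_TTD_DIAGNOSTIC_BUILD`, `TTD_LOG_CALL`, `ttd_sketch::EventLog`, and `AddChecked` do not exist in Node, and a real record log would capture far more than a function name.

```cpp
#include <string>
#include <vector>

// Hypothetical names throughout: NODE_TTD_DIAGNOSTIC_BUILD, TTD_LOG_CALL and
// ttd_sketch::EventLog are illustrative only and do not exist in Node.
namespace ttd_sketch {

// In-memory stand-in for a record log.
inline std::vector<std::string>& EventLog() {
  static std::vector<std::string> log;
  return log;
}

}  // namespace ttd_sketch

#ifdef NODE_TTD_DIAGNOSTIC_BUILD
// Diagnostic build: every wrapped call site appends an entry to the log.
#define TTD_LOG_CALL(name) ::ttd_sketch::EventLog().push_back(name)
#else
// Production build: the macro expands to a no-op expression.
#define TTD_LOG_CALL(name) ((void)0)
#endif

// Example of a converted call site (hypothetical function).
inline int AddChecked(int a, int b) {
  TTD_LOG_CALL("AddChecked");
  return a + b;
}
```

Compiling with `-DNODE_TTD_DIAGNOSTIC_BUILD` would turn recording on; without the flag the same source builds with the logging call removed, which is the property that would let conversion proceed incrementally without affecting production builds.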

github-actions · 2020-07-18T00:37:05Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

mmarchini · 2020-07-21T01:06:15Z

Was going to let this be closed, but @bnb mentioned something about it on Twitter so maybe there's hope this will be a Node.js/V8 feature some day ^^

bnb · 2020-07-21T01:11:41Z

Extending that, if folks have front-end use cases (obviously not the right forum here but want to be clear on my... path to success) and would like to really see this land in Chromium (leading to it being available in Edge, Chrome, Electron, and other Chromium-based projects) please reach out to me.

I'd like to chat with you about yourself, why you want it, where you want it, and how it could be valuable to you.

github-actions · 2022-08-04T00:26:49Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

mhdawson mentioned this issue Feb 16, 2018

Diagnostics Summit - Recap and Actions #162

Closed

11 tasks

mmarchini mentioned this issue Apr 25, 2018

Strategic Initiatives/Champions like we have on TSC #185

Closed

Qard mentioned this issue Aug 5, 2018

[Mentorship Diary] Princiya and Stephen nodejs/mentorship#85

Closed

github-actions bot added the stale label Jul 18, 2020

mmarchini added never stale and removed stale labels Jul 21, 2020

github-actions bot added the stale label Aug 4, 2022

github-actions bot closed this as completed Aug 19, 2022
