Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use State to track the order of MongoDB transactions #1742

Merged

Conversation

kpattaswamy
Copy link
Member

@kpattaswamy kpattaswamy commented Oct 17, 2023

Summary: This PR adds functionality to track the order of transactions as they are parsed. It adds the streamID of each request to the state vector at parsing time which will then be used to iterate over at stitching time. Adding this will mainly help the stitching process when trying to stitch a request with N moreToCome responses.

Motivation behind this change:
The stitching implementation relies on the new interface using absl::flat_hash_map to store the streamID and a deque of request/response frames. We then use a response led matching algorithm where we loop through the response map and stitch the first response frame in a deque with its corresponding request frame. A response pairs with a request when both frames share the same streamID, the response frame's streamID is its responseTo and the request frame's streamID is its requestID.

MongoDB's OP_MSG wire protocol has the concept of more_to_come which means that the server could send N responses back to a singular request by the client. Each frame in the series of the N responses are linked by the responseTo of the frame matching the requestID of the previous response frame, similar to a singly linked list. The head response frame's responseTo will be the requestID of the request frame. Note: the requestID of each frame in the N more_to_come frames is random and unique.

At the time of stitching, if we do not use state to track the order of transactions we would iterate over the response map in a "random" order and could iterate over the more_to_come response frames out of order. We could lose context on how the more_to_come frames are linked due to not knowing the head response frame and if we were to iterate over the end of the more_to_come message before looping through all prior more_to_come frames in the message they would be dropped since we do not know which request those frames are responding to. To solve this issue, tracking the order of transactions' streamIDs to iterate over would ensure that could use the response led stitching approach and find the complete more_to_come message for a given request.

New test case:
The new test case checks to make sure the state's stream_order vector is correctly populated with the order of streamIDs as we parse new request frames (transactions). The test case parses 3 frames and expects that the state's stream_order after parsing the first frame to contain std::pair<917, false> since the first frame's requestID is 917. It expects the stream_order to contain std::pair<917, false>, std::pair<444, false> after parsing the second request frame since that frame's requestID is 444 and so on.

Related issues: #640

Type of change: /kind feature

Test Plan: Modified the existing tests and added another test to make sure the vector is populated correctly.

@kpattaswamy kpattaswamy marked this pull request as ready for review October 17, 2023 18:41
@kpattaswamy kpattaswamy requested a review from a team October 17, 2023 18:41
Copy link
Member

@ddelnano ddelnano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still in the process of reviewing the stitcher change. I want to understand how this is leveraged fully before giving the final review, but here are some additional comments I had.

Copy link
Member

@ddelnano ddelnano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more minor comment, but otherwise lgtm!

Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
@etep
Copy link

etep commented Oct 30, 2023

Could you expand the PR description to explain what the underlying issue is/was and how the state helps address the issue? Also, could you describe the test case in more detail (i.e. in the PR description). Something like: "the test case parses 3 frames. The first frame is a request with XYZ characteristics. After parsing that frame, the test expects to find the state vector in this specific form because X."

Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
@kpattaswamy kpattaswamy requested a review from a team October 30, 2023 20:15
@JamesMBartlett JamesMBartlett merged commit b868d8c into pixie-io:main Nov 2, 2023
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants