-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug in HostSetImpl::chooseLocality()
#4061
Conversation
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be done at construction time, rather than at pick time; this causes us to spin through the entire queue on each pick.
+1 |
Signed-off-by: James Buckland <jbuckland@google.com>
@@ -208,6 +208,9 @@ void HostSetImpl::updateHosts(HostVectorConstSharedPtr hosts, | |||
locality_scheduler_->add(effective_weight, locality_entries_.back()); | |||
} | |||
} | |||
if (locality_scheduler_->empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit confused here; in the above loop, we don't add entries to locality_schduler_
if they have zero weight. How do we get into the situation where locality_scheduler_
is populated with only entries with zero weight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a scenario where locality_scheduler_
is created, but we don't populate it at all. I don't know the mechanism but I suspect there's a path where effectiveLocalityWeight
returns 0 for all hosts, and no entries get written. This might not get caught by the preconditions in that outer if clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to do a lighter weight check for locality_scheduler_.empty()
that doesn't require us to cycle through all hosts, just looking at an empty queue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the queue contains localities but they are expired? Checking to see if the queue is empty doesn't guarantee that .pick() won't return a nullptr, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
locality_scheduler_
should be brand new in this function, as should any entries, so there should be no change for something to expire. This does raise an interesting bug though; we should probably perform a locality_scheduler_.reset()
above the if
statement, to ensure it is empty unless we explicitly initialize it. updateHosts
is a "here's the entire new state of the world" operation, we should rebuild everything.
…construction. Signed-off-by: James Buckland <jbuckland@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this looks good! Just one nit.
* Implements empty() on the internal queue. Does not attempt to discard expired elements. | ||
* @return bool whether or not the internal queue is empty. | ||
*/ | ||
bool empty() { return queue_.empty(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this should be bool empty() const
.
Signed-off-by: James Buckland <jbuckland@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
Description: There is a bug in
HostSetImpl::chooseLocality()
where the pick()d locality could be nullptr. (and Envoy will crash on nullptr->access() as a result).This is because we won't build a schedule if there are no weighted localities, but we will build a schedule (albiet an empty one) if there are localities with weight zero. This PR introduces an
EdfScheduler<C>::empty()
fn which essentially walks thru the same logic asEdfScheduler<C>::pick()
, except it does not pop theC
at the end. This has the added benefit(?) of pruning expired weakptrs along the way, although it does not promise to prune all of them.Risk Level: Low
Testing: Wrote new now-passing test under
//test/common/upstream:upstream_impl_test
.