Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in HostSetImpl::chooseLocality() #4061

Merged
merged 6 commits into from
Aug 8, 2018

Conversation

ambuc
Copy link
Contributor

@ambuc ambuc commented Aug 6, 2018

Description: There is a bug in HostSetImpl::chooseLocality() where the pick()d locality could be nullptr. (and Envoy will crash on nullptr->access() as a result).

This is because we won't build a schedule if there are no weighted localities, but we will build a schedule (albiet an empty one) if there are localities with weight zero. This PR introduces an EdfScheduler<C>::empty() fn which essentially walks thru the same logic as EdfScheduler<C>::pick(), except it does not pop the C at the end. This has the added benefit(?) of pruning expired weakptrs along the way, although it does not promise to prune all of them.

Risk Level: Low
Testing: Wrote new now-passing test under //test/common/upstream:upstream_impl_test.

Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
@htuch htuch self-assigned this Aug 6, 2018
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be done at construction time, rather than at pick time; this causes us to spin through the entire queue on each pick.

@mattklein123
Copy link
Member

I think this should be done at construction time, rather than at pick time; this causes us to spin through the entire queue on each pick.

+1

Signed-off-by: James Buckland <jbuckland@google.com>
@@ -208,6 +208,9 @@ void HostSetImpl::updateHosts(HostVectorConstSharedPtr hosts,
locality_scheduler_->add(effective_weight, locality_entries_.back());
}
}
if (locality_scheduler_->empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused here; in the above loop, we don't add entries to locality_schduler_ if they have zero weight. How do we get into the situation where locality_scheduler_ is populated with only entries with zero weight?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a scenario where locality_scheduler_ is created, but we don't populate it at all. I don't know the mechanism but I suspect there's a path where effectiveLocalityWeight returns 0 for all hosts, and no entries get written. This might not get caught by the preconditions in that outer if clause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to do a lighter weight check for locality_scheduler_.empty() that doesn't require us to cycle through all hosts, just looking at an empty queue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the queue contains localities but they are expired? Checking to see if the queue is empty doesn't guarantee that .pick() won't return a nullptr, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

locality_scheduler_ should be brand new in this function, as should any entries, so there should be no change for something to expire. This does raise an interesting bug though; we should probably perform a locality_scheduler_.reset() above the if statement, to ensure it is empty unless we explicitly initialize it. updateHosts is a "here's the entire new state of the world" operation, we should rebuild everything.

…construction.

Signed-off-by: James Buckland <jbuckland@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this looks good! Just one nit.

* Implements empty() on the internal queue. Does not attempt to discard expired elements.
* @return bool whether or not the internal queue is empty.
*/
bool empty() { return queue_.empty(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this should be bool empty() const.

Signed-off-by: James Buckland <jbuckland@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

@htuch htuch merged commit 70e9878 into envoyproxy:master Aug 8, 2018
@ambuc ambuc deleted the chooseLocality branch August 9, 2018 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants