
🐛 Avoid duplicated Partitions being created during PartitionSet reconciliation #2889

Merged: 2 commits merged into kcp-dev:main on Mar 27, 2023

Conversation

@fgiloux (Contributor) commented Mar 14, 2023

Summary

In some rare cases (~1%), duplicate Partitions are created during the reconciliation of a PartitionSet. This PR adds a check for duplicates and removes them when found.

Example from e2e tests:

Name: test-partitionset-partition-test-region-1-c2d6x
UID: 10830e37-23f1-47b4-a7d9-853898ffeed3
kcp.io/cluster: 1ep5astrch6m00uw
OwnerReference:
    Kind: PartitionSet
    Name: test-partitionset
    UID: 073b3acf-ff23-43a1-bf92-63bb6aecfbbc
Spec:
    Selector:
        MatchLabels:
            partition-test-region: partition-test-region-1
        MatchExpressions:
            Key: excluded
            Operator: DoesNotExist
            Values: []

Name: test-partitionset-partition-test-region-2-hwdrt
UID: 3351ac9d-c3ed-48d9-a4a2-71d8d1e48253
kcp.io/cluster: 1ep5astrch6m00uw
OwnerReference:
    Kind: PartitionSet
    Name: test-partitionset
    UID: 073b3acf-ff23-43a1-bf92-63bb6aecfbbc
Spec:
    Selector:
        MatchLabels:
            partition-test-region: partition-test-region-2
        MatchExpressions:
            Key: excluded
            Operator: DoesNotExist
            Values: []

Name: test-partitionset-partition-test-region-2-t9qlv
UID: 33b6759c-f69e-4070-ae03-97d449b832be
kcp.io/cluster: 1ep5astrch6m00uw
OwnerReference:
    Kind: PartitionSet
    Name: test-partitionset
    UID: 073b3acf-ff23-43a1-bf92-63bb6aecfbbc
Spec:
    Selector:
        MatchLabels:
            partition-test-region: partition-test-region-2
        MatchExpressions:
            Key: excluded
            Operator: DoesNotExist
            Values: []
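In the output above, the two Partitions for partition-test-region-2 share the same selector and the same owner. A rough Go sketch of the duplicate check the summary describes (not the exact code in this PR; selectorKey is a hypothetical helper that canonicalizes a Partition's selector into a string, and topologyv1alpha1 refers to kcp's topology v1alpha1 API package as in the diffs below):

```go
// Rough sketch only: identify Partitions owned by a PartitionSet whose
// selector duplicates one that has already been seen, so the controller can
// delete the extras. selectorKey is a hypothetical canonicalization helper.
func duplicatePartitions(owned []*topologyv1alpha1.Partition, selectorKey func(*topologyv1alpha1.Partition) string) []*topologyv1alpha1.Partition {
	seen := map[string]struct{}{}
	var duplicates []*topologyv1alpha1.Partition
	for _, p := range owned {
		key := selectorKey(p)
		if _, ok := seen[key]; ok {
			// Another Partition already covers this selector; mark this one for deletion.
			duplicates = append(duplicates, p)
			continue
		}
		seen[key] = struct{}{}
	}
	return duplicates
}
```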

Related issue(s)

Fixes #2865

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@ncdc (Member) commented Mar 14, 2023

@fgiloux do you know why duplicates are being created?

@@ -146,7 +146,25 @@ func (c *controller) reconcile(ctx context.Context, partitionSet *topologyv1alph
	partitionKey = partitionKey + "+" + key + "=" + oldMatchLabels[key]
}
if _, ok := matchLabelsMap[partitionKey]; ok {
	existingMatches[partitionKey] = struct{}{}
Review comment (Contributor):
Let's please use sets.String
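A minimal sketch of what the reviewer is suggesting, using sets.String from k8s.io/apimachinery/pkg/util/sets instead of an ad-hoc map[string]struct{}; the function and package names here are hypothetical, and the variable names are borrowed from the diff above:

```go
package reconcilehelpers // hypothetical package, for illustration only

import "k8s.io/apimachinery/pkg/util/sets"

// existingSelectorKeys illustrates the sets.String suggestion; it is not the
// real controller code.
func existingSelectorKeys(matchLabelsMap map[string]struct{}, partitionKeys []string) sets.String {
	existingMatches := sets.NewString()
	for _, partitionKey := range partitionKeys {
		if _, ok := matchLabelsMap[partitionKey]; ok {
			// The desired selector already has a matching Partition; remember it.
			existingMatches.Insert(partitionKey)
		}
	}
	return existingMatches
}
```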

@stevekuznetsov (Contributor):
Seems like the previous logic handled the delete-finally/tombstone case and the create-new case, but not the update case. I would suggest doing a real update or patch - if one reconciliation loop does both the delete & update you can get into lots of goofy race conditions.

@fgiloux (Contributor, Author) commented Mar 14, 2023

@ncdc, no, I don't know. They are not created in the same reconciliation cycle. The count in the PartitionSet status is correctly set to 2, which is the length of the map from which Partitions get created.
When the first Partition gets created it triggers a new cycle. The only explanation I see is that this new cycle starts before the previous one has finished, that is, before the second Partition has been created.

@fgiloux (Contributor, Author) commented Mar 14, 2023

> Seems like the previous logic handled the delete-finally/tombstone case and the create-new case, but not the update case.

If I understand you correctly: Partitions don't get updated. If they match the current state of the PartitionSet they are kept; otherwise they are removed.

@ncdc (Member) commented Mar 14, 2023

I'm not familiar with this code, but this sounds potentially similar to some things I've seen in the past: if you are operating on data from the shared informer cache, it's possible your counts can be wrong, leading to duplicate creations. Here's an example from Cluster API:

  1. Check the cache to see if any control plane machines exist
  2. If count is 0, create the first control plane machine (this one is special and runs kubeadm init instead of kubeadm join)
  3. Otherwise, create an additional control plane machine (kubeadm join)

We ran into a problem where we saw multiple "first" control plane machines being created because the reconcile function was called multiple times when the informer for machines was still showing 0 machines.

We fixed this by doing a live client list to get the machine count, instead of relying on the cache.

Not sure if this is relevant for your code, but maybe?
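A generic client-go illustration of the pattern described above, using Pods rather than kcp Partitions (the function, package name, and wiring are assumptions for the sketch, not kcp code): the lister reads the shared informer cache, which may still be stale, while the clientset issues a live List against the API server.

```go
package cachevslive // hypothetical package, for illustration only

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// countPods contrasts a cached lister List with a live API List. In a
// controller, acting on the cached count right after creating objects can
// make it look like nothing exists yet, which is how duplicates appear.
func countPods(ctx context.Context, cfg *rest.Config, namespace string) error {
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	factory := informers.NewSharedInformerFactory(clientset, 0)
	podLister := factory.Core().V1().Pods().Lister()
	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())

	// Cached view: cheap, but it can lag behind objects created moments ago.
	cached, err := podLister.Pods(namespace).List(labels.Everything())
	if err != nil {
		return err
	}

	// Live view: an actual request to the API server, so it reflects objects
	// created by a previous reconcile even if the cache has not caught up.
	live, err := clientset.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}

	fmt.Printf("cached: %d, live: %d\n", len(cached), len(live.Items))
	return nil
}
```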

@ncdc (Member) commented Mar 14, 2023

/hold

for now - would like to continue discussing why we think multiples are being created and if we can avoid that entirely

@openshift-ci bot added the do-not-merge/hold label on Mar 14, 2023
@fgiloux (Contributor, Author) commented Mar 14, 2023

> Not sure if this is relevant for your code, but maybe?

Yes, that could be it. For PartitionSet, however, this would require multiple Get calls rather than a single List. Each Partition is (or should be) unique.

@stevekuznetsov (Contributor):

@fgiloux if the code that you wrote in this PR runs, would the result be a DELETE followed by a CREATE in one reconcile loop? I was considering that to be the missing UPDATE case.

@fgiloux force-pushed the flake-2865 branch 2 times, most recently from 702e3d1 to e5a6863, on March 14, 2023 20:23
@fgiloux (Contributor, Author) commented Mar 14, 2023

@ncdc I ran the test 100 times and it did not fail.
If the approach meets your expectations I can clean up what I have commented out and remove the indexing by PartitionSet, which is no longer used.

@ncdc (Member) commented Mar 14, 2023

Sounds like a good plan. Is there ever a situation where it's ok to accidentally create duplicates, or do we really want to avoid them at all costs? I'm guessing we want to avoid?

@fgiloux (Contributor, Author) commented Mar 15, 2023

> Is there ever a situation where it's ok to accidentally create duplicates, or do we really want to avoid them at all costs? I'm guessing we want to avoid?

Duplicates are certainly not desirable. Whether it is acceptable to accidentally create them in some situations, I don't know. For a human consumer it may not be an issue as long as it is corrected quickly enough. I think it will depend on the mechanisms built on top of this API and their tolerance for duplicates. Time will tell.

result := []*topologyv1alpha1.Partition{}
for i := range partitions.Items {
	for j := range partitions.Items[i].OwnerReferences {
		if partitions.Items[i].OwnerReferences[j].UID == partitionSet.UID {
Review comment (Member):
I would also check API group and kind?
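A sketch of the reviewer's suggestion (not the final kcp code), assuming the topology v1alpha1 package exposes its group/version as SchemeGroupVersion, as is conventional for Kubernetes API packages:

```go
// Sketch only: treat a Partition as owned by this PartitionSet when the
// owner reference matches on UID, kind, and API group/version, not UID alone.
ref := partitions.Items[i].OwnerReferences[j]
if ref.UID == partitionSet.UID &&
	ref.Kind == "PartitionSet" &&
	ref.APIVersion == topologyv1alpha1.SchemeGroupVersion.String() { // SchemeGroupVersion assumed
	result = append(result, &partitions.Items[i])
}
```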

listShards func(selector labels.Selector) ([]*corev1alpha1.Shard, error)
listPartitionSets func() ([]*topologyv1alpha1.PartitionSet, error)
getPartitionSet func(clusterName logicalcluster.Name, name string) (*topologyv1alpha1.PartitionSet, error)
// getPartitionsByPartitionSet func(partitionSet *topologyv1alpha1.PartitionSet) ([]*topologyv1alpha1.Partition, error)
Review comment (Contributor):
You have a commented-out line here

@@ -71,12 +70,22 @@ func NewController(
	getPartitionSet: func(clusterName logicalcluster.Name, name string) (*topologyv1alpha1.PartitionSet, error) {
		return partitionSetClusterInformer.Lister().Cluster(clusterName).Get(name)
	},
	getPartitionsByPartitionSet: func(partitionSet *topologyv1alpha1.PartitionSet) ([]*topologyv1alpha1.Partition, error) {
		key, err := kcpcache.MetaClusterNamespaceKeyFunc(partitionSet)
// getPartitionsByPartitionSet is calling the API directly as otherwise duplicates are created when the cache is not updated quickly enough.
Review comment (Contributor):
Suggested change
// getPartitionsByPartitionSet is calling the API directly as otherwise duplicates are created when the cache is not updated quickly enough.
// getPartitionsByPartitionSet calls the API directly instead of using an indexer as otherwise duplicates are created when the cache is not updated quickly enough.

@@ -193,7 +193,7 @@ func TestPartitionSet(t *testing.T) {
	if len(partitions.Items) == 1 {
		return true, ""
	}
-	return false, fmt.Sprintf("expected 1 partition, but got %d", len(partitions.Items))
+	return false, fmt.Sprintf("expected 1 partition, but got %d, details: %s", len(partitions.Items), spew.Sdump(partitions.Items))
Review comment (Contributor):
I imagine this prints hundreds of lines into the log - any way to make it more minimal?

Reply (Contributor, Author):
Indeed. I first tried to print only the last state outside of the loop, but did not get what I expected. On the other hand, dumping inside the loop also provides the history.

Reply (Contributor):
Sounds good - we can let it mature over time
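If a more compact dump is wanted later, one option (a sketch only, not what this PR does) is to report just the Partition names instead of the full spew output, for example:

```go
// Hypothetical, more compact alternative to spew.Sdump in the test helper:
// list only the names of the unexpected Partitions. The "strings" package
// would need to be imported in the test file.
names := make([]string, 0, len(partitions.Items))
for i := range partitions.Items {
	names = append(names, partitions.Items[i].Name)
}
return false, fmt.Sprintf("expected 1 partition, but got %d: %s", len(partitions.Items), strings.Join(names, ", "))
```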

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@stevekuznetsov (Contributor):
/lgtm
/approve

@openshift-ci bot added the lgtm label on Mar 16, 2023
@openshift-ci bot commented Mar 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label on Mar 16, 2023
@fgiloux (Contributor, Author) commented Mar 23, 2023

@ncdc anything left before this PR can get merged?

@ncdc (Member) commented Mar 27, 2023

Was on PTO; sorry I didn't clear the hold sooner!

/unhold

@openshift-ci bot removed the do-not-merge/hold label on Mar 27, 2023
@ncdc (Member) commented Mar 27, 2023

CI DNS issue
/test e2e-multiple-runs

@openshift-merge-robot merged commit 3eca582 into kcp-dev:main on Mar 27, 2023
Labels: approved, area/apiexports, lgtm
Linked issues: Flake: TestPartitionSet (#2865)
4 participants