
🐛 Avoid duplicated Partitions being created during PartitionSet reconciliation #2889

Merged: 2 commits merged into kcp-dev:main on Mar 27, 2023

Conversation

@fgiloux (Contributor) commented Mar 14, 2023

Summary

In some rare cases (~1%), duplicate Partitions are created during the reconciliation of a PartitionSet. This PR adds a check for duplicates and removes them when found.

Example from e2e tests:

Name: test-partitionset-partition-test-region-1-c2d6x
UID: 10830e37-23f1-47b4-a7d9-853898ffeed3
kcp.io/cluster: 1ep5astrch6m00uw
OwnerReference:
    Kind: PartitionSet
    Name: test-partitionset
    UID: 073b3acf-ff23-43a1-bf92-63bb6aecfbbc
Spec:
    Selector:
        MatchLabels:
            partition-test-region: partition-test-region-1
        MatchExpressions:
            Key: excluded
            Operator: DoesNotExist
            Values: []

Name: test-partitionset-partition-test-region-2-hwdrt
UID: 3351ac9d-c3ed-48d9-a4a2-71d8d1e48253
kcp.io/cluster: 1ep5astrch6m00uw
OwnerReference:
    Kind: PartitionSet
    Name: test-partitionset
    UID: 073b3acf-ff23-43a1-bf92-63bb6aecfbbc
Spec:
    Selector:
        MatchLabels:
            partition-test-region: partition-test-region-2
        MatchExpressions:
            Key: excluded
            Operator: DoesNotExist
            Values: []

Name: test-partitionset-partition-test-region-2-t9qlv
UID: 33b6759c-f69e-4070-ae03-97d449b832be
kcp.io/cluster: 1ep5astrch6m00uw
OwnerReference:
    Kind: PartitionSet
    Name: test-partitionset
    UID: 073b3acf-ff23-43a1-bf92-63bb6aecfbbc
Spec:
    Selector:
        MatchLabels:
            partition-test-region: partition-test-region-2
        MatchExpressions:
            Key: excluded
            Operator: DoesNotExist
            Values: []
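In the output above, the two Partitions for partition-test-region-2 share the same selector and the same owner. A rough Go sketch of the duplicate check the summary describes (not the exact code in this PR; selectorKey is a hypothetical helper that canonicalizes a Partition's selector into a string, and topologyv1alpha1 refers to kcp's topology v1alpha1 API package as in the diffs below):

```go
// Rough sketch only: identify Partitions owned by a PartitionSet whose
// selector duplicates one that has already been seen, so the controller can
// delete the extras. selectorKey is a hypothetical canonicalization helper.
func duplicatePartitions(owned []*topologyv1alpha1.Partition, selectorKey func(*topologyv1alpha1.Partition) string) []*topologyv1alpha1.Partition {
	seen := map[string]struct{}{}
	var duplicates []*topologyv1alpha1.Partition
	for _, p := range owned {
		key := selectorKey(p)
		if _, ok := seen[key]; ok {
			// Another Partition already covers this selector; mark this one for deletion.
			duplicates = append(duplicates, p)
			continue
		}
		seen[key] = struct{}{}
	}
	return duplicates
}
```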

Related issue(s)

Fixes #2865

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@ncdc (Member) commented Mar 14, 2023

@fgiloux do you know why duplicates are being created?

@@ -146,7 +146,25 @@ func (c *controller) reconcile(ctx context.Context, partitionSet *topologyv1alph
	partitionKey = partitionKey + "+" + key + "=" + oldMatchLabels[key]
}
if _, ok := matchLabelsMap[partitionKey]; ok {
	existingMatches[partitionKey] = struct{}{}
Review comment (Contributor):
Let's please use sets.String
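A minimal sketch of what the reviewer is suggesting, using sets.String from k8s.io/apimachinery/pkg/util/sets instead of an ad-hoc map[string]struct{}; the function and package names here are hypothetical, and the variable names are borrowed from the diff above:

```go
package reconcilehelpers // hypothetical package, for illustration only

import "k8s.io/apimachinery/pkg/util/sets"

// existingSelectorKeys illustrates the sets.String suggestion; it is not the
// real controller code.
func existingSelectorKeys(matchLabelsMap map[string]struct{}, partitionKeys []string) sets.String {
	existingMatches := sets.NewString()
	for _, partitionKey := range partitionKeys {
		if _, ok := matchLabelsMap[partitionKey]; ok {
			// The desired selector already has a matching Partition; remember it.
			existingMatches.Insert(partitionKey)
		}
	}
	return existingMatches
}
```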

@stevekuznetsov (Contributor):
Seems like the previous logic handled the delete-finally/tombstone case and the create-new case, but not the update case. I would suggest doing a real update or patch - if one reconciliation loop does both the delete & update you can get into lots of goofy race conditions.

@fgiloux (Contributor, Author) commented Mar 14, 2023

@ncdc, no, I don't know. They are not created in the same reconciliation cycle. The count in the PartitionSet status is correctly set to 2, which is the length of the map from which Partitions get created.
When the first Partition gets created it triggers a new cycle. The only explanation I see is that this new cycle starts before the previous one has finished, that is, before the second Partition has been created.

@fgiloux (Contributor, Author) commented Mar 14, 2023

> Seems like the previous logic handled the delete-finally/tombstone case and the create-new case, but not the update case.

If I understand you correctly: Partitions don't get updated. If they match the current state of the PartitionSet they are kept; otherwise they are removed.

@ncdc (Member) commented Mar 14, 2023

I'm not familiar with this code, but this sounds potentially similar to some things I've seen in the past: if you are operating on data from the shared informer cache, it's possible your counts can be wrong, leading to duplicate creations. Here's an example from Cluster API:

  1. Check the cache to see if any control plane machines exist
  2. If count is 0, create the first control plane machine (this one is special and runs kubeadm init instead of kubeadm join)
  3. Otherwise, create an additional control plane machine (kubeadm join)

We ran into a problem where we saw multiple "first" control plane machines being created because the reconcile function was called multiple times when the informer for machines was still showing 0 machines.

We fixed this by doing a live client list to get the machine count, instead of relying on the cache.

Not sure if this is relevant for your code, but maybe?
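A generic client-go illustration of the pattern described above, using Pods rather than kcp Partitions (the function, package name, and wiring are assumptions for the sketch, not kcp code): the lister reads the shared informer cache, which may still be stale, while the clientset issues a live List against the API server.

```go
package cachevslive // hypothetical package, for illustration only

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// countPods contrasts a cached lister List with a live API List. In a
// controller, acting on the cached count right after creating objects can
// make it look like nothing exists yet, which is how duplicates appear.
func countPods(ctx context.Context, cfg *rest.Config, namespace string) error {
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	factory := informers.NewSharedInformerFactory(clientset, 0)
	podLister := factory.Core().V1().Pods().Lister()
	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())

	// Cached view: cheap, but it can lag behind objects created moments ago.
	cached, err := podLister.Pods(namespace).List(labels.Everything())
	if err != nil {
		return err
	}

	// Live view: an actual request to the API server, so it reflects objects
	// created by a previous reconcile even if the cache has not caught up.
	live, err := clientset.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}

	fmt.Printf("cached: %d, live: %d\n", len(cached), len(live.Items))
	return nil
}
```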

@ncdc (Member) commented Mar 14, 2023

/hold

for now - would like to continue discussing why we think multiples are being created and if we can avoid that entirely

@openshift-ci bot added the do-not-merge/hold label on Mar 14, 2023
@fgiloux (Contributor, Author) commented Mar 14, 2023

> Not sure if this is relevant for your code, but maybe?

Yes, that could be it. For PartitionSet, however, this would require multiple Get calls rather than a single List. Each Partition is (or should be) unique.

@stevekuznetsov (Contributor):

@fgiloux if the code that you wrote in this PR runs, would the result be a DELETE followed by a CREATE in one reconcile loop? I was considering that to be the missing UPDATE case.

@fgiloux force-pushed the flake-2865 branch 2 times, most recently from 702e3d1 to e5a6863, on March 14, 2023 20:23
@fgiloux (Contributor, Author) commented Mar 14, 2023

@ncdc I ran the test 100 times and it did not fail.
If the approach meets your expectations I can clean up what I have commented out and remove the indexing by PartitionSet, which is no longer used.

@ncdc (Member) commented Mar 14, 2023

Sounds like a good plan. Is there ever a situation where it's ok to accidentally create duplicates, or do we really want to avoid them at all costs? I'm guessing we want to avoid?

@fgiloux (Contributor, Author) commented Mar 15, 2023

> Is there ever a situation where it's ok to accidentally create duplicates, or do we really want to avoid them at all costs? I'm guessing we want to avoid?

Duplicates are certainly not desirable. Whether it is acceptable to accidentally create them in some situations, I don't know. For a human consumer it may not be an issue as long as it is corrected quickly enough. I think it will depend on the mechanisms built on top of this API and their tolerance for duplicates. Time will tell.

result := []*topologyv1alpha1.Partition{}
for i := range partitions.Items {
	for j := range partitions.Items[i].OwnerReferences {
		if partitions.Items[i].OwnerReferences[j].UID == partitionSet.UID {
Review comment (Member):
I would also check API group and kind?
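A sketch of the reviewer's suggestion (not the final kcp code), assuming the topology v1alpha1 package exposes its group/version as SchemeGroupVersion, as is conventional for Kubernetes API packages:

```go
// Sketch only: treat a Partition as owned by this PartitionSet when the
// owner reference matches on UID, kind, and API group/version, not UID alone.
ref := partitions.Items[i].OwnerReferences[j]
if ref.UID == partitionSet.UID &&
	ref.Kind == "PartitionSet" &&
	ref.APIVersion == topologyv1alpha1.SchemeGroupVersion.String() { // SchemeGroupVersion assumed
	result = append(result, &partitions.Items[i])
}
```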

listShards func(selector labels.Selector) ([]*corev1alpha1.Shard, error)
listPartitionSets func() ([]*topologyv1alpha1.PartitionSet, error)
getPartitionSet func(clusterName logicalcluster.Name, name string) (*topologyv1alpha1.PartitionSet, error)
// getPartitionsByPartitionSet func(partitionSet *topologyv1alpha1.PartitionSet) ([]*topologyv1alpha1.Partition, error)
Review comment (Contributor):
You have a commented-out line here

@@ -71,12 +70,22 @@ func NewController(
	getPartitionSet: func(clusterName logicalcluster.Name, name string) (*topologyv1alpha1.PartitionSet, error) {
		return partitionSetClusterInformer.Lister().Cluster(clusterName).Get(name)
	},
	getPartitionsByPartitionSet: func(partitionSet *topologyv1alpha1.PartitionSet) ([]*topologyv1alpha1.Partition, error) {
		key, err := kcpcache.MetaClusterNamespaceKeyFunc(partitionSet)
// getPartitionsByPartitionSet is calling the API directly as otherwise duplicates are created when the cache is not updated quickly enough.
Review comment (Contributor):
Suggested change
// getPartitionsByPartitionSet is calling the API directly as otherwise duplicates are created when the cache is not updated quickly enough.
// getPartitionsByPartitionSet calls the API directly instead of using an indexer as otherwise duplicates are created when the cache is not updated quickly enough.

@@ -193,7 +193,7 @@ func TestPartitionSet(t *testing.T) {
	if len(partitions.Items) == 1 {
		return true, ""
	}
-	return false, fmt.Sprintf("expected 1 partition, but got %d", len(partitions.Items))
+	return false, fmt.Sprintf("expected 1 partition, but got %d, details: %s", len(partitions.Items), spew.Sdump(partitions.Items))
Review comment (Contributor):
I imagine this prints hundreds of lines into the log - any way to make it more minimal?

Reply (Contributor, Author):
Indeed. I first tried to print only the last state outside of the loop, but did not get what I expected. On the other hand, dumping inside the loop also provides the history.

Reply (Contributor):
Sounds good - we can let it mature over time
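If a more compact dump is wanted later, one option (a sketch only, not what this PR does) is to report just the Partition names instead of the full spew output, for example:

```go
// Hypothetical, more compact alternative to spew.Sdump in the test helper:
// list only the names of the unexpected Partitions. The "strings" package
// would need to be imported in the test file.
names := make([]string, 0, len(partitions.Items))
for i := range partitions.Items {
	names = append(names, partitions.Items[i].Name)
}
return false, fmt.Sprintf("expected 1 partition, but got %d: %s", len(partitions.Items), strings.Join(names, ", "))
```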

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@stevekuznetsov (Contributor):
/lgtm
/approve

@openshift-ci bot added the lgtm label on Mar 16, 2023
@openshift-ci bot commented Mar 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label on Mar 16, 2023
@fgiloux (Contributor, Author) commented Mar 23, 2023

@ncdc anything left before this PR can get merged?

@ncdc (Member) commented Mar 27, 2023

Was on PTO; sorry I didn't clear the hold sooner!

/unhold

@openshift-ci bot removed the do-not-merge/hold label on Mar 27, 2023
@ncdc (Member) commented Mar 27, 2023

CI DNS issue
/test e2e-multiple-runs

@openshift-merge-robot merged commit 3eca582 into kcp-dev:main on Mar 27, 2023
Labels: approved, area/apiexports, lgtm
Linked issues: Flake: TestPartitionSet (#2865)
4 participants