Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beats event processing and default fields #10801

Merged
merged 13 commits into from
Mar 8, 2019

Conversation

urso
Copy link

@urso urso commented Feb 18, 2019

This changes moves the generation of the event processing into it's
distinct package, such that the actual publisher pipeline will not
define any processors anymore. A new instance of a publisher pipeline
must not add fields on it's own.

With this change we convert the event processing pipline into the 'Supporter'
pattern, which is already used for Index Management.
As different beats ask for slightly different behavior in the event
processing (e.g. normalize, default builtins and so on), the
processing.Supporter can be used for customizations.

Also fixes new fields accidentily being added to the monitoring outputs, as it separates the pipeline and processors.

Simplifies tests, but also adds a few test cases for dynamic fields and other settings.

@urso urso added in progress Pull request is currently in progress. libbeat labels Feb 18, 2019
@urso urso requested review from a team as code owners February 18, 2019 18:41
libbeat/publisher/processing/processing.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/processing.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
@ph ph self-assigned this Feb 18, 2019
@urso urso force-pushed the beats-default-fields branch 2 times, most recently from e8f1fef to 7a0259a Compare February 21, 2019 11:40
@urso urso changed the title [WIP] Beats event processing and default fields Beats event processing and default fields Feb 22, 2019
@urso urso added review and removed in progress Pull request is currently in progress. labels Feb 22, 2019
// will merge the global and local configurations into a common event
// processor.
// If `drop` is set, then the processor generated must always drop all events.
type Supporter func(cfg beat.ProcessingConfig, drop bool) (beat.Processor, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this name is too abstract for what the function does, like calling a type Handler. What do you think about naming these processing.ConfigApplier and processing.ConfigApplierFactory or processing.ConfigResolver and processing.ConfigResolverFactory. So the concrete functions would be named processing.NewBeatConfigApplier/processing.NewBeatConfigResolver and processing.NewObserverConfigApplier/processing.NewObserverConfigResolver. WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names mostly mimic index management which says idxmgmt.Supporter and so on. I used the same name for having a consistent naming scheme on similar functionality/patterns. Names ilm.Supporter, idxmgmt.Supporter, and processing.Supporter. Common patterns should have somewhat common names.
Maybe a better named would have been Provider or Feature, as I see these as overwritable libbeat features that originally have been hard coded.

While the constructors get an input config, I'd not name them ConfigApplier or ConfigResolver. If possible use a single noun over a multili-noun name. E.g. Configurer or Featurer. (I try not to make my names sound like twitter messages)

All these packages somewhat follow the strategy, abstract factory, and factory method patterns. The later mostly by chance.

The pattern found in ilm, idxmgmt, and now processing goes like this:

  • beat instance use SupporterFactory in order to provide it's own subsystems with some actual features implementation (Strategy):
    • The SupportFactory is feed with the beats global configuration in order to unpack configs and prepare its own state.
  • Supporter is the actual feature/strategy presented to a sub-system.
  • ilm, idxmgmt, publisher pipeline Subsystems act themselves as builders/constructors/factories for other subsystems that can pass additional parameters.
    • examples:
      • Elasticsearch output uses idxmgmt subsystem on connect
      • ES output uses idxmgmt subsystem for creating an index selector
      • setup uses idxmgmt subsystem with custom ES output
      • publisher pipeline uses processing in order to setup the final event processing pipeline.
    • As these subsystems act as Factories for other components the ilm/idxmgmt/processing.Supporter follows the Factory Method pattern. e.g. BuildSelector, Manager . This is no commonalty, but only by chance, as the other subsystems expect Factories.
  • The final instances generated by the Supporter(s) are the Strategies that run within a subsystems context: e.g. idxmgmt.Manager, beat.Processor.

Actually we don't really need the definitions of Supporter/SupporterFactory in these packages. Alternative we can remove them and move the interface definitions to the instance package. But as they are still passed around into other packages I defined these interface types for convenience.

I'm fine with finding better, names in general. How about Feature and FeatureFactory? No matter which names we choose, we will have to update the other packages in a followup PR as well.

Note: The processing.Supporter should be an interface, no function type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed explanation. In this case I am ok with sticking with Supporter. I haven't been able to come up with anything better. :(
+1 on being consistent over packages.

urso pushed a commit to urso/beats that referenced this pull request Feb 25, 2019
The root cause has been fixed by elastic#10801 by accident already.  This
selectively backports the checks to 6.7, as elastic#10801 is to much of a
change.
Copy link
Contributor

@simitt simitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the long run it would make sense to also export the single processors, so they are reusable. At the moment, if one needs a small adaption, the whole Processor implementation would need to be re-implemented.

libbeat/cmd/instance/beat.go Outdated Show resolved Hide resolved
libbeat/cmd/instance/settings.go Outdated Show resolved Hide resolved
libbeat/publisher/pipeline/pipeline.go Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/processors.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
libbeat/publisher/pipeline/pipeline.go Outdated Show resolved Hide resolved
libbeat/publisher/processing/default.go Outdated Show resolved Hide resolved
return err
}

processingFactory := settings.Processing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the ProcessorFactory defined on the root cmd level, but the ProcessingConfig on the pipeline level? Would it make sense to move the processorFactory also to the processing config level?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beats do not directly publish methods to the queue in the publisher pipeline, but do so via a Client.
The pipeline including global processors and fields settings, queue, and outputs is created on startup. There are no go-routines collecting data yet.
Go-routines are supposed to use Connect, so to connect to a publisher pipeline. This returns a beat.Client (Client instances are not guaranteed to be thread safe). Each client/go-routine is allowed to configure local processors/fields/tags, which gets merged with the global settings. The factory is the global entity, loading, checking, and preparing global configurations on startup. The ProcessingConfig specifies the per go-routine local processing, which might be eventually established due to the Beat modules initialization or much later via autodiscovery.

@ph
Copy link
Contributor

ph commented Feb 27, 2019

OKm I went through the code here, LGTM, I also agree with @simitt's comment that if we can align the naming that would be great even if the packages have different responsabilities.

alwaysCopy bool
}

const ecsVersion = "1.0.0-beta2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above comment is more, how we will keep the value in sync between release, I think we will forget. if we rely on the version from the official package the version will be updated every time we update the vendor libraries.

@ph
Copy link
Contributor

ph commented Feb 27, 2019

I think in the long run it would make sense to also export the single processors, so they are reusable. At the moment, if one needs a small adaption, the whole Processor implementation would need to be re-implemented.

I agree with the above.

urso pushed a commit that referenced this pull request Mar 3, 2019
…0935)

* Backport: Fix panic if user sets custom fields starting with host

The root cause has been fixed by #10801 by accident already.  This
selectively backports the checks to 6.7, as #10801 is to much of a
change.
@urso urso added the blocker label Mar 4, 2019
@urso
Copy link
Author

urso commented Mar 7, 2019

I refrained from exporting more functionality for now. The PR is already big enough. There is quite some legacy code regarding processors all over the place, that needs cleanup. I actually extracted the processors.AddFields/AddTags processors from this original code base a few weeks ago. If there is need we can export some more. The only noteworthy processors we might export in the future are: generalizeProcessor, and debugPrintProcessor.
Maybe the group one, but this one overlaps with some other legacy functionality in the processors package :(

Most of the logic is actually within the builder itself.

ClientFields(beat.Info, beat.ProcessingConfig) common.MapStr
}

type builtinModifier func(beat.Info) common.MapStr

// NewBeatSupport creates a new SupporterFactory based on NewDefaultSupport.
// NewBeatSupport automatically adds the `ecs.version`, `host.name` and `agent.X` fields
// MakeDefaultBeatSupport creates a new SupporterFactory based on NewDefaultSupport.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment not updated to SupportFactory

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, gorename fun :)

@@ -163,7 +164,7 @@ func TestProcessorsConfigs(t *testing.T) {
"with client processor": {
local: beat.ProcessingConfig{
Processor: func() beat.ProcessorList {
p := newProgram("test", logp.L())
p := newGroup("test", logp.L())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: also rename the variable p.

Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a small typo.

Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a small typo.

@urso urso removed the request for review from a team March 8, 2019 14:20
urso added 13 commits March 8, 2019 15:23
This changes moves the generation of the event processing into it's
distinct package, such that the actual publisher pipeline will not
define any processors anymore. A new instance of a publisher pipeline
must not add fields on it's own.

This change converts the event processing pipline into the 'Supporter'
pattern, which is already used for Index Management.
As different beats ask for slightly different behavior in the event
processing (e.g. normalize, default builtins and so on), the
`processing.Supporter` can be used for customizations.
@urso
Copy link
Author

urso commented Mar 8, 2019

Travis was green, but codecov upload failed.
Unrelated metricbeat test failed with connection problem.

@urso urso merged commit 83dfb2f into elastic:master Mar 8, 2019
@urso urso added the v7.0.0 label Mar 8, 2019
urso pushed a commit to urso/beats that referenced this pull request Mar 8, 2019
urso pushed a commit to urso/beats that referenced this pull request Mar 8, 2019
urso pushed a commit to urso/beats that referenced this pull request Mar 8, 2019
This changes moves the generation of the event processing into it's
distinct package, such that the actual publisher pipeline will not
define any processors anymore. A new instance of a publisher pipeline
must not add fields on it's own.

This change converts the event processing pipline into the 'Supporter'
pattern, which is already used for Index Management.
As different beats ask for slightly different behavior in the event
processing (e.g. normalize, default builtins and so on), the
`processing.Support` can be used for customizations.

(cherry picked from commit 83dfb2f)
urso pushed a commit that referenced this pull request Mar 8, 2019
…11155)

Cherry-pick of PR #10801 to 7.0 branch. Original message: 

This changes moves the generation of the event processing into it's
distinct package, such that the actual publisher pipeline will not
define any processors anymore. A new instance of a publisher pipeline
must not add fields on it's own.

With this change we convert the event processing pipline into the 'Supporter'
pattern, which is already used for Index Management.
As different beats ask for slightly different behavior in the event
processing (e.g. normalize, default builtins and so on), the
`processing.Supporter` can be used for customizations.

Also fixes new fields accidentily being added to the monitoring outputs, as it separates the pipeline and processors.

Simplifies tests, but also adds a few test cases for dynamic fields and other settings.
@urso urso deleted the beats-default-fields branch May 9, 2019 18:48
DStape pushed a commit to DStape/beats that referenced this pull request Aug 20, 2019
This changes moves the generation of the event processing into it's
distinct package, such that the actual publisher pipeline will not
define any processors anymore. A new instance of a publisher pipeline
must not add fields on it's own.

This change converts the event processing pipline into the 'Supporter'
pattern, which is already used for Index Management.
As different beats ask for slightly different behavior in the event
processing (e.g. normalize, default builtins and so on), the
`processing.Support` can be used for customizations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants