Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for MSSQL Transactions counters #372

Merged
merged 7 commits into from
Aug 4, 2019
Merged

Add support for MSSQL Transactions counters #372

merged 7 commits into from
Aug 4, 2019

Conversation

btshft
Copy link
Contributor

@btshft btshft commented Aug 4, 2019

Pull request changes:

Also I'm not pretty happy with some counter names:

transactions_nonsnapshot_version_active_total
transactions_snapshot_active_total
transactions_active_total
transactions_update_snapshot_active_total

If follow the original microsoft naming pattern, they should be called

transactions_nonsnapshot_version_transactions
transactions_snapshot_transactions
transactions_transactions
transactions_update_snapshot_transactions

It's again a bit redundant, so I replaced the transactions suffix with active_total which is more appropriate in terms of Prometheus naming convention. I decided to add active because microsoft docs says about only active transactions but not transactions at all. This detail may not be obvious if we use just total prefix. But as I mentioned I'm not really sure if this decision is trully correct. Therefore if there are any suggestions on how names can be enchanced, feel free to share them :)

Copy link
Collaborator

@carlpett carlpett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, thanks! Some naming comments

c.TransactionsVersionCleanupRateKBs,
prometheus.GaugeValue,
float64(v.VersionCleanuprateKBPers),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the name of the data, this needs to be normalized to bytes, I suppose?


ch <- prometheus.MustNewConstMetric(
c.TransactionsVersionGenerationRateKBs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same kilobytes->bytes here.


ch <- prometheus.MustNewConstMetric(
c.TransactionsVersionStoreSizeKB,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More kilobytes

SQLErrorsTotal *prometheus.Desc

// Win32_PerfRawData_{instance}_SQLServerTransactions
TransactionsFreeSpaceInTempDbKB *prometheus.Desc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure these matches the name of the actual metric we expose, if it is not the same as the WMI name or we improved the naming.

@carlpett
Copy link
Collaborator

carlpett commented Aug 4, 2019

It's again a bit redundant, so I replaced the transactions suffix with active_total which is more appropriate in terms of Prometheus naming convention. I decided to add active because microsoft docs says about only active transactions but not transactions at all. This detail may not be obvious if we use just total prefix. But as I mentioned I'm not really sure if this decision is trully correct.

In general, I find it more important for us to have good metric names that indicate what they mean, vs staying close to Microsoft naming. So having active_total for example is a good thing :)

@btshft
Copy link
Contributor Author

btshft commented Aug 4, 2019

@carlpett Thanks for sensible comments! They're fixed now

@carlpett
Copy link
Collaborator

carlpett commented Aug 4, 2019

Nice 🎉
Just noticed one metric on the second read-through that I missed - transactions_update_conflict_ratio. What are some typical values you can read from this one? I would suspect this is actually not a ratio that we get from WMI, but rather the total count. Could you see if that is the case?
Apart from that, looks good to me!

@btshft
Copy link
Contributor Author

btshft commented Aug 4, 2019

What are some typical values you can read from this one? I would suspect this is actually not a ratio that we get from WMI, but rather the total count. Could you see if that is the case?

If we can rely on Microsoft docs it says about percentage ratio not just count.

The percentage of those transactions using the snapshot isolation level that have encountered update conflicts within the last second. [...]

Also I found mention of counter in SQL Server book that defines it as follows

This counter monitors the ratio of update snapshot transactions that have update conflicts. It's the ratio of the number of conflicts compared to the total number of update snapshot transactions.

But true this or not I can only check tomorrow on staging db. On my laptop value of counter is just pointless 0 :(

@btshft
Copy link
Contributor Author

btshft commented Aug 4, 2019

Actually I found a way to look counter values on staging db and Update conflict ratio equals to 1476 and also there is Update conflict ratio base which is equal to 7175995. So you're right it's not looks like some sort of ratio and more seems to be total_count or something else.

I suppose the real ratio can be calculated as:
(update_conflict_ratio / update_conflict_ratio_base) * 100 so we need to expose both update_conflict_ratio and update_conflict_ratio_base. But I'm confused about how metrics should be named.

@carlpett
Copy link
Collaborator

carlpett commented Aug 4, 2019

Ah, as I suspected then.

If we can rely on Microsoft docs it says about percentage ratio not just count.

Thing is, we can't, a lot of the time. Since we use the PerfRaw values, but the docs are about the "cooked" values, ie after they have some calculation performed on them. In this case, dividing by the base value (which tends to be total number of seconds elapsed since counting started, or similar)

Usually what we do is then just expose the nominator as a counter, so you'd rename it as update_conflicts_total. Prometheus can then help us calculate rates over different time periods as needed.

@btshft
Copy link
Contributor Author

btshft commented Aug 4, 2019

Got the idea, thanks! Honestly, these unobvious things make me frustrated. The documentation talks about one thing when counter values are completely different and have nothing to do with docs 😬. Will fix naming soon

@carlpett
Copy link
Collaborator

carlpett commented Aug 4, 2019

Yes, I know. We should write something up about how to understand what they really mean. But time is a bit limited, so we still haven't managed to do so. We're sorry that it leads to frustrations for you and other contributors!

@carlpett
Copy link
Collaborator

carlpett commented Aug 4, 2019

This looks good now, thanks a lot for your contribution!

@carlpett carlpett merged commit d01c669 into prometheus-community:master Aug 4, 2019
@btshft btshft deleted the features/sql-transactions-collector branch August 4, 2019 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants