The database option „Auto Update Statistics“ ensures that statistics are automatically updated when a threshold is reached. Under certain conditions, these updated statistics are not considered Why is that?
Inhaltsverzeichnis
Statistics
The query optimizer uses statistics to create query plans that improve query performance. In most cases, the Query Optimizer automatically generates the required statistics; in other cases, additional statistics must be created to achieve optimal results. Statistics can become outdated when the data distribution in the table is changed by data modification operations.
When the option „Auto Update Statistics“ is enabled for the database, the Query Optimizer checks when statistics might be outdated and updates those statistics as soon as they are used by a query. The Query Optimizer determines when statistics might be out of date by counting the number of data changes since the last statistics update and comparing them to a threshold. The threshold for outdated statistics is calculated dynamically. I wrote about the formula for the threshold in the article „AUTO_UPDATE_STATISTICS and thresholds„. To calculate the threshold, I have developed a user-defined function that calculates the value when a statistics object is updated, depending on various factors. The function can be downloaded from my GIST repository.
Note
The examples in this blog post uses trace flags that are – partially – not documented by Microsoft:
- 3604: Enables the output of messages to the client instead of to the error log
- 9204: Shows the statistics that are „interesting“ for the query optimizer that are loaded
- 9292: Shows the statistics that the query optimizer considers „interesting“ in the compile phase
- 9481: Compiles with the „old“ CE. Otherwise the output of TF9204 and TF9292 will not work!
Test environment
The demo shows that the above statement regarding the updating of statistics cannot be applied across the board. For the demonstration I use my demo database [ERP_Demo], which I use in my workshops and sessions at conferences. It can be downloaded here. The demo database contains all functions as a framework that I use for this demo.
/*
The table dbo.customers contains 1.6 mio rows.
Create the PRIMARY KEY on the c_custkey attribute
and a nonclustered index on the c_nationkey attribute
*/
ALTER TABLE dbo.customers ADD CONSTRAINT pk_orders
PRIMARY KEY CLUSTERED (c_custkey)
WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO
CREATE NONCLUSTERED INDEX nix_customers_c_nationkey
ON dbo.customers (c_nationkey)
WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO
/*
Let's have a look to the statistics threshold
Note: This function is part of the framework of the demo database
https://www.db-berater.de/downloads/ERP_DEMO_2012.BAK
*/
SELECT gsui.statistics_name,
gsui.rows,
gsui.modification_counter,
gsui.required_update_rows,
gsui.update_counter_percentage
FROM sys.stats AS s
CROSS APPLY dbo.get_statistics_update_info(s.object_id, s.stats_id, DEFAULT) AS gsui
WHERE s.object_id = OBJECT_ID(N'dbo.customers', N'U');
GO
Both statistics objects require ~40,100 updates before the automatic update is initiated by the Query Optimizer. The statistics object [pk_customers] applies to the primary key, which is unique. The two queries are executed with the trace flags activated to see which statistics are „interesting“ for the Query Optimizer and which is needed for optimization.
/*
We run the queries to have an Execution Plan in the Plan Cache
The traceflags will be enabled to see, when the Query Optimizer load/access the statistics
*/
DBCC TRACEON (3604, 9204, 9292, 9481);
GO
DECLARE @stmt NVARCHAR(1000) = N'SELECT * FROM dbo.customers WHERE c_custkey = @c_custkey;';
DECLARE @parm NVARCHAR(100) = N'@c_custkey BIGINT';
EXEC sp_executesql @stmt, @parm, 50000;
GO
DECLARE @stmt NVARCHAR(1000) = N'SELECT * FROM dbo.customers WHERE c_nationkey = @c_nationkey;';
DECLARE @parm NVARCHAR(100) = N'@c_nationkey INT';
EXEC sp_executesql @stmt, @parm, '46';
GO
For the first query on the PRIMARY KEY, the header of the statistics object is loaded, which provides information about the attribute [c_custkey]. The query optimizer finds the statistics object relevant and loads the statistics into memory.
Stats header loaded: DbName: ERP_Demo, ObjName: dbo.customers, IndexId: 1, ColumnName: c_custkey, EmptyTable: FALSE
Stats loaded: DbName: ERP_Demo, ObjName: dbo.customers, IndexId: 1, ColumnName: c_custkey, EmptyTable: FALSE
The same applies to the second query, which checks and loads the metadata of the second statistics object (index on [c_nationkey]).
Stats header loaded: DbName: ERP_Demo, ObjName: dbo.customers, IndexId: 2, ColumnName: c_nationkey, EmptyTable: FALSE
Stats loaded: DbName: ERP_Demo, ObjName: dbo.customers, IndexId: 2, ColumnName: c_nationkey, EmptyTable: FALSE
Let’s run some updates
In order for the Query Optimizer to update the statistics, the threshold must be reached. We must update 20,050 rows to let the optimizer know that the statistics are outdated.
Note
You might be wondering why it only takes 20,050 rows to reach the threshold of 40,100 changes. I will explain the reason in detail in one of my next blog posts.
;WITH m
AS
(
SELECT MAX(c_custkey) AS max_c_custkey
FROM dbo.customers
)
UPDATE s
SET s.c_custkey = s.c_custkey + m.max_c_custkey
--, s.c_nationkey = 0
FROM (
SELECT TOP (20050)
c_custkey,
c_nationkey
FROM dbo.customers
ORDER BY
c_custkey
) AS s
CROSS JOIN m;
GO
/*
Let's have a look to the statistics threshold
Note: This function is part of the framework of the demo database
https://www.db-berater.de/downloads/ERP_DEMO_2012.BAK
*/
SELECT gsui.statistics_name,
gsui.rows,
gsui.modification_counter,
gsui.required_update_rows,
gsui.update_counter_percentage
FROM sys.stats AS s
CROSS APPLY dbo.get_statistics_update_info(s.object_id, s.stats_id, DEFAULT) AS gsui
WHERE s.object_id = OBJECT_ID(N'dbo.customers', N'U');
GO
Check, whether SQL Server is „interested“ in the stats
The two queries are then re-run to check whether the Query Optimizer needs to refresh the statistics in memory.
/* We do not have information about the statistics object of the primary key index! */
Stats header loaded: DbName: ERP_Demo, ObjName: dbo.customers, IndexId: 2, ColumnName: c_nationkey, EmptyTable: FALSE
Stats loaded: DbName: ERP_Demo, ObjName: dbo.customers, IndexId: 2, ColumnName: c_nationkey, EmptyTable: FALSE
The output is interesting in that the Query Optimizer takes not into account the statistics object for the primary key, but updates the outdated statistics for [c_nationkey] in memory.
Explanation for this behavior
Unique Index on [c_custkey]
The query optimizer does not need to reload the statistics object for the primary key because the restriction on unique values clearly defines that only one record can be found with an EQUAL predicate. If the first query is executed with an operator that covers a range rather than a single value, the assumption of returning only one record is obsolete and the statistics must be reloaded.
Non Unique Index on [c_nationkey]
The attribute [c_nationkey] has a nonclustered index, but it does not guarantee uniqueness for each value. For this reason, the Query Optimizer must always update the statistics object to generate a new execution plan (RECOMPILE).
DBCC TRACEON (3604, 9204, 9292, 9481);
GO
DECLARE @stmt NVARCHAR(1000) = N'SELECT * FROM dbo.customers WHERE c_custkey <= @c_custkey;';
DECLARE @parm NVARCHAR(100) = N'@c_custkey BIGINT';
EXEC sp_executesql @stmt, @parm, 50000;
GO
Stats header loaded: DbName: ERP_Demo, ObjName: dbo.customers, IndexId: 1, ColumnName: c_custkey, EmptyTable: FALSE
Stats loaded: DbName: ERP_Demo, ObjName: dbo.customers, IndexId: 1, ColumnName: c_custkey, EmptyTable: FALSE
Conclusion
Statistics are always a point to check when dealing with performance problems. Statistics are not always updated once the calculated threshold is exceeded. The updating of statistics is based on how stable/unstable a stored plan is. If – due to stability – it is determined that only one row can be delivered, the affected statistics objects will not be updated until either an existing plan is deleted from the cache or a query is forced to use a new plan using RECOMPILE.
Thank you very much for reading.