This blog post was inspired by a post by Eric Darling on LinkedIn, where he shared his video on the topic of [NOT] EXISTS. A pleasant side effect of using EXISTS over JOIN concerns automatically created statistics in Microsoft SQL Server.

Task description

Find all customers who placed an order on January 1, 2023. Who doesn’t know such requirements. The requirement seems trivial and – at least I – have always solved it with an INNER JOIN instead of EXISTS.

/*
	All customers who ordered at 01/01/2023
*/
SELECT	DISTINCT
		c.*
FROM	dbo.customers AS c
		INNER JOIN dbo.orders AS o
		ON (c.c_custkey = o.o_custkey)
WHERE	o_orderdate >= '2023-01-01'
		AND o_orderdate <= '2023-01-02';
GO

I have already described the unpleasant side effect of DISTINCT in my article „Automatically Created Statistics – know the caveats„. If output attributes previously had neither an index nor a statistics object, Microsoft SQL Server automatically creates a new statistics object.

This depiction shows a list of all autmatically created statistics objects for the dbo.customers table.
Automatic created statistics for dbo.customers

Solution

The task only requires the output of the attributes of dbo.customers. The JOIN (a customer can have multiple orders) makes the use of DISTINCT unavoidable. However, if you rewrite the SQL statement, you no longer need to use DISTINCT.

SELECT	c.*
FROM	dbo.customers AS c
WHERE	c.c_custkey IN
		(
			SELECT	o_custkey
			FROM	dbo.orders
			WHERE	o_orderdate >= '2023-01-01'
					AND o_orderdate <= '2023-01-02'
		);
GO
Reduced statistics objects

Because DISTINCT is no longer required, Microsoft SQL Server no longer creates automatic statistics objects for ALL attributes of dbo.customers. Instead, a statistics object is only required for the [c_custkey] attribute. At first glance, it may not seem very significant that you have „saved“ a few statistics objects. This pleasant side effect of EXISTS over JOIN can reduce the maintenance tasks and your DBA will thank you for it.

Thank you very much for reading.