Hash join between master/detail tables produces too-low cardinality estimate

by Justin Grant   Last Updated January 11, 2018 04:06 AM

Our database has a master/detail pair of tables: VisitTarget has one row for each sales transaction, and VisitSale has one row for each product in each transaction.

The tables look like this: (I'm simplifying to only the relevant columns for this question)

-- "master" table
CREATE TABLE VisitTarget
(
  VisitTargetId int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
  SaleDate date NOT NULL,
  StoreId int NOT NULL
  -- other columns omitted for clarity  
);
-- covering index for date-scoped queries
CREATE NONCLUSTERED INDEX IX_VisitTarget_SaleDate 
    ON VisitTarget (SaleDate) INCLUDE (StoreId /*, ...more columns */);

-- "detail" table
CREATE TABLE VisitSale
(
  VisitSaleId int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
  VisitTargetId int NOT NULL,
  SaleDate date NOT NULL, -- denormalized; copied from VisitTarget
  StoreId int NOT NULL, -- denormalized; copied from VisitTarget
  ItemId int NOT NULL,
  SaleQty int NOT NULL,
  SalePrice decimal(9,2) NOT NULL
  -- other columns omitted for clarity  
);
-- covering index for date-scoped queries
CREATE NONCLUSTERED INDEX IX_VisitSale_SaleDate 
  ON VisitSale (SaleDate)
  INCLUDE (VisitTargetId, StoreId, ItemId, SaleQty, TotalSalePrice decimal(9,2) /*, ...more columns */
);
ALTER TABLE VisitSale 
  WITH CHECK ADD CONSTRAINT FK_VisitSale_VisitTargetId 
  FOREIGN KEY (VisitTargetId)
  REFERENCES VisitTarget (VisitTargetId);
ALTER TABLE VisitSale
  CHECK CONSTRAINT FK_VisitSale_VisitTargetId;

For performance reasons, we've partially denormalized by copying the most common filtering columns (e.g. SaleDate) from the master table into the each detail table rows, and then we added covering indexes on both tables to better support date-filtered queries. This works great to reduce I/O on when running date-filtered queries, but I think this approach is causing cardinality estimation problems when joining the master and detail tables together.

When we join these two tables, queries look like this:

SELECT vt.StoreId, vt.SomeOtherColumn, Sales = sum(vs.SalePrice*vs.SaleQty)
FROM VisitTarget vt 
    JOIN VisitSale vs on vt.VisitTargetId = vs.VisitTargetId
WHERE
    vs.SaleDate BETWEEN '20170101' and '20171231'
    and vt.SaleDate BETWEEN '20170101' and '20171231'
    -- more filtering goes here, e.g. by store, by product, etc. 

The plan for these kinds of queries looks like this:

enter image description here

As you can see, the cardinality estimation for the join is over 4x too low, which causes performance issues (e.g. spilling to tempdb), especially when this query is a subquery that's used in a more complex query.

But when I checked the row-count estimates for each branch of the join, they were close to (and a little bit higher than) the actual row counts. Hash bucket distribution also looks good. These observations suggest to me that statistics are OK for each table.

At first I thought that the problem was SQL Server expecting that SaleDate columns in each table are independent, whereas really they are identical. So I tried adding an equality comparison for the Sale dates to the join condition or the WHERE clause, e.g.

ON vt.VisitTargetId = vs.VisitTargetId and vt.SaleDate = vs.SaleDate

or

WHERE vt.SaleDate = vs.SaleDate

This didn't work. It even made cardinality estimates worse! So either SQL Server isn't using that equality hint or something else is the root cause of the problem.

Got any ideas for how to troubleshoot and hopefully fix this cardinality estimation issue?

If it matters, we're running SQL Server 2014 Enterprise SP2 CU8 build 12.0.5557.0 on Windows Server. There are no trace flags enabled. We see the same behavior on multiple different SQL Servers, so it seems unlikely to be a server-specific problem.



Related Questions


update statistics on sys.objects

Updated April 14, 2017 17:06 PM

Understanding Sort Estimates with TOP(variable)

Updated May 22, 2017 05:06 AM

Different plans on Readonly Secondary Replica

Updated July 05, 2017 10:06 AM