0
这样做有个问题,如果customer对应的orders比较多的时候,IO开销就不一样了。因为执行计划会每次读取所有的orders,然后再去重。
相反,如果用EXISTS,执行计划就会去判断是否买过产品,买过即停止,不会继续扫描。
对比IO开销,前者比后者开销更多。在复杂query和数据量比较多的情况下,这种差距更加明显。
此外,如果query一直这么简单,那么SQL SERVER会自动优化,会在执行计划中加入TOP操作,这样就不会把所有记录都查出来再去重。
所以,尽所有可能,把EXISTS都替换成JOIN,增加IO开销吧!
Posted on
Monday, February 27, 2017
by
醉·醉·鱼
and labeled under
sql
有这么一个需求,查某个customer在某个时间段内是否买个产品。一般情况下,都会用JOIN,然后再去重即可。比如SELECT DISTINCT c.id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.order_dt BETWEEN '2017-02-01' AND '2017-03-01'
AND c.id=43849
这样做有个问题,如果customer对应的orders比较多的时候,IO开销就不一样了。因为执行计划会每次读取所有的orders,然后再去重。
相反,如果用EXISTS,执行计划就会去判断是否买过产品,买过即停止,不会继续扫描。
SELECT c.id
FROM customers c
WHERE EXISTS(SELECT 1 FROM orders o
WHERE o.customer_id = c.id AND o.order_dt BETWEEN '2017-02-01' AND '2017-03-01'
)
AND c.id=43849
对比IO开销,前者比后者开销更多。在复杂query和数据量比较多的情况下,这种差距更加明显。
此外,如果query一直这么简单,那么SQL SERVER会自动优化,会在执行计划中加入TOP操作,这样就不会把所有记录都查出来再去重。
所以,尽所有可能,把EXISTS都替换成JOIN,增加IO开销吧!
use master;
IF NOT EXISTS(SELECT 1 FROM SYS.DATABASES WHERE NAME = 'test_db')
CREATE DATABASE test_db;
GO
use test_db;
GO
IF OBJECT_ID(N'customers') IS NOT NULL
DROP TABLE customers;
CREATE TABLE customers (
id int,
CONSTRAINT pk_customers PRIMARY KEY CLUSTERED (ID)
)
INSERT INTO customers(id)
SELECT n
FROM dbo.getnums(500000)
IF OBJECT_ID(N'orders') IS NOT NULL
DROP TABLE orders;
CREATE TABLE orders (
id int identity(1, 1),
customer_id int,
order_dt datetime,
CONSTRAINT pk_orders PRIMARY KEY CLUSTERED (ID),
CONSTRAINT fk_orders__customer_id FOREIGN KEY (customer_id) REFERENCES customers(id)
)
INSERT INTO orders(customer_id, order_dt)
SELECT n/100+1, DATEADD(DAY, n/700, '2000-01-01')
FROM dbo.getnums(5000000)
INSERT INTO orders(customer_id, order_dt)
SELECT 43849, DATEADD(DAY, n/70, '2000-01-01')
FROM dbo.getnums(500)
SELECT DISTINCT c.id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.order_dt BETWEEN '2017-02-01' AND '2017-03-01'
AND c.id=43849
OPTION(RECOMPILE, MAXDOP 1)
SELECT c.id
FROM customers c
WHERE EXISTS(SELECT 1 FROM orders o
WHERE o.customer_id = c.id AND o.order_dt BETWEEN '2017-02-01' AND '2017-03-01'
)
AND c.id=43849
OPTION(MAXDOP 1)
Post a Comment