0
Posted on Monday, February 27, 2017 by 醉·醉·鱼 and labeled under
有这么一个需求,查某个customer在某个时间段内是否买个产品。一般情况下,都会用JOIN,然后再去重即可。比如

SELECT DISTINCT c.id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.order_dt BETWEEN '2017-02-01' AND '2017-03-01'
AND c.id=43849

这样做有个问题,如果customer对应的orders比较多的时候,IO开销就不一样了。因为执行计划会每次读取所有的orders,然后再去重。

相反,如果用EXISTS,执行计划就会去判断是否买过产品,买过即停止,不会继续扫描。

SELECT c.id
FROM customers c
WHERE EXISTS(SELECT 1 FROM orders o 
    WHERE o.customer_id = c.id AND o.order_dt BETWEEN '2017-02-01' AND '2017-03-01'
    )
AND c.id=43849

对比IO开销,前者比后者开销更多。在复杂query和数据量比较多的情况下,这种差距更加明显。

此外,如果query一直这么简单,那么SQL SERVER会自动优化,会在执行计划中加入TOP操作,这样就不会把所有记录都查出来再去重。

所以,尽所有可能,把EXISTS都替换成JOIN,增加IO开销吧!






use master;

IF NOT EXISTS(SELECT 1 FROM SYS.DATABASES WHERE NAME = 'test_db')
    CREATE DATABASE test_db;
GO

use test_db;
GO


IF OBJECT_ID(N'customers') IS NOT NULL
    DROP TABLE customers;

CREATE TABLE customers (
    id int,
    CONSTRAINT pk_customers PRIMARY KEY CLUSTERED (ID)
)

INSERT INTO customers(id)
SELECT n
FROM dbo.getnums(500000)


IF OBJECT_ID(N'orders') IS NOT NULL
    DROP TABLE orders;

CREATE TABLE orders (
    id int identity(1, 1),
    customer_id int,
    order_dt datetime,
    CONSTRAINT pk_orders PRIMARY KEY CLUSTERED (ID),
    CONSTRAINT fk_orders__customer_id FOREIGN KEY (customer_id) REFERENCES customers(id)
)


INSERT INTO orders(customer_id, order_dt)
SELECT n/100+1, DATEADD(DAY, n/700, '2000-01-01')
FROM dbo.getnums(5000000)

INSERT INTO orders(customer_id, order_dt)
SELECT 43849, DATEADD(DAY, n/70, '2000-01-01')
FROM dbo.getnums(500)


SELECT DISTINCT c.id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.order_dt BETWEEN '2017-02-01' AND '2017-03-01'
AND c.id=43849
OPTION(RECOMPILE, MAXDOP 1)

SELECT c.id
FROM customers c
WHERE EXISTS(SELECT 1 FROM orders o 
    WHERE o.customer_id = c.id AND o.order_dt BETWEEN '2017-02-01' AND '2017-03-01'
    )
AND c.id=43849
OPTION(MAXDOP 1)

0
Responses to ... How to write bad SQL - 别用EXISTS

Post a Comment