飞利浦·斯塔克|庖丁解牛|图解 MySQL 8.0 优化器查询转换篇( 三 )


-- 保证不为NULL可以去掉HAVING 有HAVING表达式 (l1 l2 l3) IN (SELECT v1 v2 v3 ... HAVING having) =EXISTS (SELECT ... HAVING having and (l1 = v1 or is null v1) and (l2 = v2 or is null v2) and (l3 = v3 or is null v3) and is_not_null_test(v1) and is_not_null_test(v2) and is_not_null_test(v3)) 2 转换的标量子查询转换成Derived Table(transform_scalar_subqueries_to_join_with_derived)
该特性是官方在8.0.16中为了更好的支持Secondary Engine(Heapwave)的分析下推 , 增强了子查询的转换能力 。 可以先直观的看下转换和不转换的执行计划的不同:
root:testset optimizer_switch = 'subquery_to_derived=off';Query OK 0 rows affected (0.00 sec)root:testEXPLAIN SELECT b MAX(a) AS ma FROM t4 GROUP BY b HAVING ma(SELECT MAX(t2.a) FROM t2 WHERE t2.b=t4.b);+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+| 1 | PRIMARY | t4 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using temporary || 2 | DEPENDENT SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 33.33 | Using where |+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+2 rows in set 3 warnings (0.00 sec)root:testset optimizer_switch = 'subquery_to_derived=on';Query OK 0 rows affected (0.00 sec)root:testEXPLAIN SELECT b MAX(a) AS ma FROM t4 GROUP BY b HAVING ma(SELECT MAX(t2.a) FROM t2 WHERE t2.b=t4.b);+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+| 1 | PRIMARY | t4 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using temporary || 1 | PRIMARY |derived2| NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where; Using join buffer (hash join) || 2 | DERIVED | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using temporary |+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+3 rows in set 3 warnings (0.01 sec) transform_scalar_subqueries_to_join_with_derived具体转换的过程如下: 首先从JOIN条件、WHERE条件、HAVING条件和SELECT list中收集可以转换的标量子查询(Item::collect_scalar_subqueries) 。遍历这些子查询 , 判断是否可以增加一个额外的转换(transform_grouped_to_derived):把隐性的GROUP BY标量子查询变成Derived Table 。SELECT SUM(c1) (SELECT SUM(c1) FROM t3) scalar FROM t1;转换为=SELECT derived0.summ derived1.scalarFROM (SELECT SUM(a) AS summ FROM t1) AS derived0 LEFT JOIN (SELECT SUM(b) AS scalar FROM t3) AS derived1 ON TRUE执行计划如下:explain SELECT SUM(a) (SELECT SUM(c1) FROM t3) scalar FROM t1;+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+| 1 | PRIMARY |derived3| NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL || 1 | PRIMARY |derived4| NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (hash join) || 4 | DERIVED | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL || 3 | DERIVED | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL |+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+ 收集唯一的聚合函数Item列表(collect_aggregates) , 这些Item将会被新的Derived Table的列代替 。还需要添加所有引用到这些Item的fields , 包括直接在SELECT列表的 , Window函数参数、ORDER by、Partition by包含的 , 还有该查询块中ORDER BY的列 , 因为他们都会引动到Derived Table里 。创建Derived Table需要的Query_expression/Query_block(create_query_expr_and_block) 。添加Derived Table到查询块和top_join_list中 。保留旧的子查询单元块 , 如果包含可以转化的Derived的移到Derived Table下面的Query_block , 如果不包含 , 保留到原来的子查询块中 。将之前的聚合函数Item列表插入到Derived Table的查询块中 。收集除GROUP AGG表达式中的列 , 由于这些fields已经移动到Derived Table中 , 删除不合理的fields引用 。收集所有唯一的列和View的引用后 , 将他们加到新的Derived Table列表中 。对新的新的Derived Table进行flatten_subqueries/setup_tables 重新resolve_placeholder_tables , 不处理进行转换后的子查询 。处理Derived Table中 , 新加入的HAVING条件中的聚合函数Item , 并通过Item_aggregate_refs引用到new_derived-base_ref_items而不是之前的父查询块base_ref_items 。永久代替父查询块中的聚合函数列表 , 变成Derived Table的列 , 并删除他们 。之前保存和加入到Derived Table的唯一的列和View的引用 , 也要替换新的fields代替他们的引用 。