Optimizing join on derived table - EXPLAIN different on local and server

问题: I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of...

问题:

I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of MySQL (v5.5), the query just hangs. It seems to get caught on "Copying to tmp table":

SELECT
  SQL_CALC_FOUND_ROWS
  DISTINCT p.parcel_number,
  p.street_number,
  p.street_name,
  p.site_address_city_state,
  p.number_of_units,
  p.number_of_stories,
  p.bedrooms,
  p.bathrooms,
  p.lot_area_sqft,
  p.cost_per_sq_ft,
  p.year_built,
  p.sales_date,
  p.sales_price,
  p.id
  FROM (
    SELECT APN, property_case_detail_id FROM property_inspection AS pi
      GROUP BY APN, property_case_detail_id
      HAVING 
      COUNT(IF(status='Resolved Date', 1, NULL)) = 0
    ) as open_cases
  JOIN property AS p
  ON p.parcel_number = open_cases.APN
  LIMIT 0, 1000;

mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id    | User        | Host      | db           | Command | Time | State                | Info                                                                                                 |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query   |   21 | Copying to tmp table | SELECT
          SQL_CALC_FOUND_ROWS
          DISTINCT p.parcel_number,
          p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query   |    0 | NULL                 | show processlist                                                                                     |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Explains are different on my local machine and on the server, and I'm assuming the only reason my query runs at all on my local machine, is because of the key that is automatically created on the derived table:

Explain (local):

+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref                          | rows    | filtered | Extra                           |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
|  1 | PRIMARY     | p          | NULL       | ALL  | NULL          | NULL        | NULL    | NULL                         |   40319 |   100.00 | Using temporary                 |
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 8       | lead_housing.p.parcel_number |      40 |   100.00 | NULL                            |
|  2 | DERIVED     | pi         | NULL       | ALL  | NULL          | NULL        | NULL    | NULL                         | 1623978 |   100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+

Explain (server):

+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra                                    |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
|  1 | PRIMARY     | p          | ALL  | NULL          | NULL | NULL    | NULL |   41369 | Using temporary                          |
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |  122948 | Using where; Distinct; Using join buffer |
|  2 | DERIVED     | pi         | ALL  | NULL          | NULL | NULL    | NULL | 1718586 | Using temporary; Using filesort          |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+

Schemas:

mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field                   | Type         | Null | Key | Default           | Extra                       |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id                      | int(11)      | NO   | PRI | NULL              | auto_increment              |
| lblCaseNo               | int(11)      | NO   | MUL | NULL              |                             |
| APN                     | bigint(10)   | NO   | MUL | NULL              |                             |
| date                    | varchar(50)  | NO   |     | NULL              |                             |
| status                  | varchar(500) | NO   |     | NULL              |                             |
| property_case_detail_id | int(11)      | YES  | MUL | NULL              |                             |
| case_type_id            | int(11)      | YES  | MUL | NULL              |                             |
| date_modified           | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status           | tinyint(1)   | YES  |     | 1                 |                             |
| created_date            | datetime     | NO   |     | NULL              |                             |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)

mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field                      | Type         | Null | Key | Default           | Extra                       |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id                         | int(11)      | NO   | PRI | NULL              | auto_increment              |
| parcel_number              | bigint(10)   | NO   |     | 0                  |                             |
| date_modified              | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date               | datetime     | NO   |     | NULL              |                             |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+

Variables that might be relevant:

tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592

Any ideas on how to optimize this, and any idea why the explains are so different?


回答1:

MySQL 5.5 and 5.7 are quite different and the later has better optimizer so there is no surprise that explain plans are different.

You'd better provide SHOW CREATE TABLE property; and SHOW CREATE TABLE property_inspection; outputs as it will show indexes that are on your tables.

Your sub-query is the issue. - Server tries to process 1.6M rows with no index and grouping everything. - Having is quite expensive operation so you'd better avoid it, expecially in sub-queries. - Grouping in this case is bad idea. You do not need the aggregation/counting. You need to check if the 'Resolved Date' status is just exists

Based on the information provided I'd recommend: - Alter table property_inspection to reduce length of status column. - Add index on the column. Use covering index (APN, property_case_detail_id, status) if possible (in this columns order). - Change query to something like this:

SELECT
    SQL_CALC_FOUND_ROWS
    DISTINCT p.parcel_number,
    ...
    p.id
FROM
    property_inspection AS `pi1`
    INNER JOIN property AS p ON (
        p.parcel_number = `pi1`.APN
    )
    LEFT JOIN (
        SELECT
              `pi2`.property_case_detail_id
            , `pi2`. APN
        FROM
            property_inspection AS `pi2`
        WHERE
            `status` = 'Resolved Date'
    ) AS exclude ON (
        exclude.APN = `pi1`.APN
        AND exclude.property_case_detail_id = `pi1`.property_case_detail_id
    )
WHERE
    exclude.APN IS NULL
LIMIT
    0, 1000;
  • 发表于 2019-03-03 06:44
  • 阅读 ( 198 )
  • 分类:sof

条评论

请先 登录 后评论
不写代码的码农
小编

篇文章

作家榜 »

  1. 小编 文章
返回顶部
部分文章转自于网络,若有侵权请联系我们删除