When should I use a composite index?

Question

7 views · February 12, 2023

Anonymous · February 13, 2023


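When should I use a composite index in a database?

What are the performance implications of using a composite index?

Why should I use a composite index?

For example, I have a `homes` table: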

CREATE TABLE IF NOT EXISTS `homes` (
  `home_id` int(10) unsigned NOT NULL auto_increment,
  `sqft` smallint(5) unsigned NOT NULL,
  `year_built` smallint(5) unsigned NOT NULL,
  `geolat` decimal(10,6) default NULL,
  `geolng` decimal(10,6) default NULL,
  PRIMARY KEY (`home_id`),
  KEY `geolat` (`geolat`),
  KEY `geolng` (`geolng`)
) ENGINE=InnoDB;

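Does it make sense to use a composite index on both `geolat` and `geolng`, that is, to replace:

  KEY `geolat` (`geolat`),
  KEY `geolng` (`geolng`),

with:

  KEY `geolat_geolng` (`geolat`, `geolng`)

If so:

Why?

What are the performance implications of using a composite index?

UPDATE:

Since many people have pointed out that it depends entirely on the queries I run, here is the most common query:

SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
  AND geolng BETWEEN ??? AND ???

UPDATE 2:

With the following database schema: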

CREATE TABLE IF NOT EXISTS `homes` (
  `home_id` int(10) unsigned NOT NULL auto_increment,
  `primary_photo_group_id` int(10) unsigned NOT NULL default '0',
  `customer_id` bigint(20) unsigned NOT NULL,
  `account_type_id` int(11) NOT NULL,
  `address` varchar(128) collate utf8_unicode_ci NOT NULL,
  `city` varchar(64) collate utf8_unicode_ci NOT NULL,
  `state` varchar(2) collate utf8_unicode_ci NOT NULL,
  `zip` mediumint(8) unsigned NOT NULL,
  `price` mediumint(8) unsigned NOT NULL,
  `sqft` smallint(5) unsigned NOT NULL,
  `year_built` smallint(5) unsigned NOT NULL,
  `num_of_beds` tinyint(3) unsigned NOT NULL,
  `num_of_baths` decimal(3,1) unsigned NOT NULL,
  `num_of_floors` tinyint(3) unsigned NOT NULL,
  `description` text collate utf8_unicode_ci,
  `geolat` decimal(10,6) default NULL,
  `geolng` decimal(10,6) default NULL,
  `display_status` tinyint(1) NOT NULL,
  `date_listed` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `contact_email` varchar(100) collate utf8_unicode_ci NOT NULL,
  `contact_phone_number` varchar(15) collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`home_id`),
  KEY `customer_id` (`customer_id`),
  KEY `city` (`city`),
  KEY `num_of_beds` (`num_of_beds`),
  KEY `num_of_baths` (`num_of_baths`),
  KEY `geolat` (`geolat`),
  KEY `geolng` (`geolng`),
  KEY `account_type_id` (`account_type_id`),
  KEY `display_status` (`display_status`),
  KEY `sqft` (`sqft`),
  KEY `price` (`price`),
  KEY `primary_photo_group_id` (`primary_photo_group_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=8;

With the following SQL:

EXPLAIN SELECT homes.home_id,
       address,
       city,
       state,
       zip,
       price,
       sqft,
       year_built,
       account_type_id,
       num_of_beds,
       num_of_baths,
       geolat,
       geolng,
       photo_id,
       photo_url_dir
FROM homes
LEFT OUTER JOIN home_photos ON homes.home_id = home_photos.home_id
    AND homes.primary_photo_group_id = home_photos.home_photo_group_id
    AND home_photos.home_photo_type_id = 2
WHERE homes.display_status = true
  AND homes.geolat BETWEEN -100 AND 100
  AND homes.geolng BETWEEN -100 AND 100

EXPLAIN returns:

id  select_type  table        type  possible_keys                                   key                  key_len  ref                           rows  Extra
--  -----------  -----------  ----  ----------------------------------------------  -------------------  -------  ----------------------------  ----  -----------
1   SIMPLE       homes        ref   geolat,geolng,display_status                    display_status       1        const                         2     Using where
1   SIMPLE       home_photos  ref   home_id,home_photo_type_id,home_photo_group_id  home_photo_group_id  4        homes.primary_photo_group_id  4

I don't really understand how to read the EXPLAIN output. Does this look good or bad? Right now I am not using a composite index for geolat and geolng. Should I be?


3 Answers

Anonymous · Answer 1 · 2023-05-03T04:39:51+00:00

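You should use a composite index when your queries use multiple fields for joining, filtering, and selecting. A composite index looks like this:

index( column_A, column_B, column_C )

Such an index benefits queries that use those fields for joining, filtering, and selecting. It also benefits queries that use a leftmost subset of its columns. So the index above will also satisfy queries that need:

index( column_A, column_B, column_C )
index( column_A, column_B )
index( column_A )

But it will not help (at least not directly; it may help partially if no better index exists) queries that need:

index( column_A, column_C )

Notice that column_B is missing.

In your original example, a composite index on the two dimensions will mostly benefit queries that filter on both dimensions, or on the leftmost dimension alone, but not queries that filter on the rightmost dimension alone. If you always query both dimensions, a composite index is the way to go, and it (most probably) doesn't matter which column comes first.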
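The leftmost-prefix behaviour described above can be sketched in MySQL as follows. The table `t`, the `payload` column, and the index name `idx_abc` are hypothetical and exist only for illustration. The comments describe what the index structure permits; the plan the optimizer actually picks depends on table size and statistics, so run each EXPLAIN against your own data.

```sql
-- Hypothetical table for illustration only (not part of the question).
CREATE TABLE t (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
  column_A INT NOT NULL,
  column_B INT NOT NULL,
  column_C INT NOT NULL,
  payload  VARCHAR(64) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY idx_abc (column_A, column_B, column_C)
) ENGINE=InnoDB;

-- All three key parts can be used to locate rows.
EXPLAIN SELECT * FROM t WHERE column_A = 1 AND column_B = 2 AND column_C = 3;

-- Leftmost prefix (column_A, column_B).
EXPLAIN SELECT * FROM t WHERE column_A = 1 AND column_B = 2;

-- Leftmost prefix (column_A).
EXPLAIN SELECT * FROM t WHERE column_A = 1;

-- column_B is skipped: only the column_A part can narrow the lookup;
-- the column_C condition is checked afterwards on the matching entries.
EXPLAIN SELECT * FROM t WHERE column_A = 1 AND column_C = 3;

-- No leftmost column: idx_abc cannot be used for an ordinary index lookup.
EXPLAIN SELECT * FROM t WHERE column_C = 3;
```

Comparing the `key` and `key_len` columns across these five plans is one way to see how much of the composite index each query uses.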

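Mark, I have updated my original post (UPDATE 2). It contains my actual query, my actual database schema, and what the EXPLAIN command returns. So, given this information, should I use a composite index? I'm still not clear. Thanks in advance.

Mark, does the composite index in your answer satisfy index(column_C)?

-1 because a composite index does not help with WHERE geolat BETWEEN ??? AND ??? AND geolng BETWEEN ??? AND ???. It stops after the first field. The answer from "Question Overflow" explains why.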
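One way to check this claim on your own data is to compare the plans with and without the second range condition. This is a sketch only: it assumes the `homes` table from the question, reuses the index name `geolat_geolng` proposed there, and the coordinate bounds are made-up placeholders.

```sql
-- Assumes the `homes` table from the question.
ALTER TABLE homes ADD KEY geolat_geolng (geolat, geolng);

-- Range on the first key part only.
EXPLAIN SELECT home_id
FROM homes FORCE INDEX (geolat_geolng)
WHERE geolat BETWEEN 40.0 AND 41.0;

-- Range on both key parts (placeholder bounds).
EXPLAIN SELECT home_id
FROM homes FORCE INDEX (geolat_geolng)
WHERE geolat BETWEEN 40.0 AND 41.0
  AND geolng BETWEEN -75.0 AND -74.0;
```

If the `rows` estimate is about the same in both plans, the geolng condition is not narrowing the index range and is only being applied as a filter to entries found by the geolat range.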

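What I really want to know is: what is the benefit of a composite index over separate indexes on each column?

MySQL can use only one index per table in a query (with exceptions, such as index merge). This means each table in the query has to serve all of its WHERE conditions, joins, GROUP BY, and ORDER BY from a single index. So separate indexes on each column may not always be effective, whereas a composite index can cover several of those conditions at once.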
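As a rough illustration of that point, here is a sketch against the `homes` table from the question. The index name `status_geolat` and the coordinate bounds are hypothetical. The idea is to put the equality column first and a range column last, so that one index can serve both conditions.

```sql
-- With only the single-column indexes from the question
--   KEY display_status (display_status), KEY geolat (geolat), KEY geolng (geolng)
-- the optimizer normally picks one of them for `homes` (or attempts an index merge).

-- Hypothetical composite index: equality column first, range column last.
ALTER TABLE homes ADD KEY status_geolat (display_status, geolat);

EXPLAIN SELECT home_id, geolat, geolng
FROM homes
WHERE display_status = 1
  AND geolat BETWEEN 40.0 AND 41.0   -- placeholder bounds
  AND geolng BETWEEN -75.0 AND -74.0;
```

Whether the optimizer actually prefers `status_geolat` depends on the data distribution, so compare the `key` and `rows` columns of the plan before and after adding the index.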