MySQL 分组统计与联合查询

attachment/2018/01/15/9901516025745.jpg

分组统计相关

count() 统计记录数

统计对整张表的所有行的数目，含有 NULL 值的行也会被计入。

统计 student 表中所有行数，用法如下：

select count(1) from student;

count(1) 与 count(*) 是一样的，这里的 1 没有特殊意义，也可以写成 count(2)，count(3)，不过某些数据库会对 count(1) 进行优化，从而导致 count(1) 比 count(*) 更快。

由于不知道 MySQL 具体优化没优化，所以建议尽量使用 count(1) 而不使用 count(*) 进行查询。

统计指定字段的所有行的数目，该字段值为 NULL 的行不会被计入，用法如下：

select count(id) from student;

优化建议：如果某张表 count 用的比较多时，可以在最短的字段建立一个辅助索引会极大地提升 count 性能。

min() 统计指定字段的最小值

统计 student 表中 age 字段的最小值

select min(age) from student;

max() 统计指定字段的最大值

统计 student 表中 age 字段的最大值

select max(age) from student;

avg() 统计指定字段的平局值

统计student表中age字段的平均值

select avg(age) from student;

sum() 统计指定字段的和

统计 student 表中 age 字段的和

select sum(age) from student;

group by 对指定字段进行分组

统计student表中每个年龄的总数

select age,count(age) from student group by age;

having 对分组后的数据进行条件筛选

与 where 相似，但是只针对分组后的数据，where 无法筛选聚合函数。

统计 student 表中年龄大于 20 的每个年龄的总数

select age,count(age) from student group by age having age > 20;

联合查询

笛卡尔积

笛卡尔（Descartes）乘积又叫直积。假设集合A={a,b}，集合B={0,1,2}，则两个集合的笛卡尔积为 {(a,0),(a,1),(a,2),(b,0),(b,1), (b,2)}，可以扩展到多个集合的情况。

类似的例子有，如果 A 表示某学校学生的集合，B表示该学校所有课程的集合，则A与B的笛卡尔积表示所有可能的选课情况。

MySQL 中有如下 4 种写法可以获得笛卡尔积，运行效果是等同的：

select * from student, class;

select * from student inner join class;

select * from student cross join class;

select * from student join class;

inner join 内连接

也叫等值连接，查询符合 A 表与 B 表匹配关系的数据。

例如，查询出 name 为 leo 并且含有所在班级信息的数据：

select * from student inner join class on student.cid = class.id where student.name = 'leo';
+----+------+-----+----+-----------+
| id | name | cid | id | classname |
+----+------+-----+----+-----------+
|  6 | leo  |   2 |  2 | 二班      |
+----+------+-----+----+-----------+
1 row in set (0.00 sec)

join 与 inner join 是一样的效果，所以在使用 inner join 时可将 `inner 省略。

显式与隐式内连接

显式内连接（explicit）的写法如下：

select * from student join class on student.cid = class.id;

隐式内连接（implicit）的写法如下：

select * from student,class where student.cid = class.id;

其实以上两种写法在运行效果上也是等同的。

left join 左连接

从左表产生一套完整的记录，与匹配的记录（右表），如果没有匹配，则结果的右侧将显示为 null。

例如，根据 student 表的 id 字段进行匹配：

select * from student left join class on student.id = class.id
+----+----------------+-----+------+-----------+
| id | name           | cid | id   | classname |
+----+----------------+-----+------+-----------+
|  1 | Alicia Keys    |   3 |    1 | 一班       |
|  2 | Bob            |   4 |    2 | 二班       |
|  3 | Alice          |   2 |    3 | 三班       |
|  4 | Beyonce        |   1 |    4 | 四班       |
|  5 | Marilyn Manson |   4 |    5 | 五班       |
|  6 | leo            |   2 | NULL | NULL      |
+----+----------------+-----+------+-----------+
6 rows in set (0.00 sec)

left join 与 left outer join 是一样的效果，所以在使用 left outer join 时可将 outer省略。

right join 右连接

从右表产生一套完整的记录，与匹配的记录（左表），如果没有匹配，则结果的左侧将包含 null。

例如，根据 class 表的 classname 字段进行匹配：

select * from student right join class on student.id = class.classname
+------+------+------+----+-----------+
| id   | name | cid  | id | classname |
+------+------+------+----+-----------+
| NULL | NULL | NULL |  1 | 一班         |
| NULL | NULL | NULL |  2 | 二班          |
| NULL | NULL | NULL |  3 | 三班          |
| NULL | NULL | NULL |  4 | 四班         |
| NULL | NULL | NULL |  5 | 五班          |
+------+------+------+----+-----------+
5 rows in set (0.00 sec)

full join 全连接

产生的所有记录（双方匹配记录）在表 A 和表 B。如果一张表没有匹配，则另一张表将包含 null。

由于 MySQL 本身并没有提供 full join 关键词，所以只能使用 union 关键词来模拟 full join，例如：

select * from student left join class on student.id = class.id
union
select * from student right join class on student.id = class.classname
+------+----------------+------+------+-----------+
| id   | name           | cid  | id   | classname |
+------+----------------+------+------+-----------+
| NULL | NULL           | NULL |    1 | 一班         |
| NULL | NULL           | NULL |    2 | 二班          |
| NULL | NULL           | NULL |    3 | 三班          |
| NULL | NULL           | NULL |    4 | 四班         |
| NULL | NULL           | NULL |    5 | 五班          |
|    1 | Alicia Keys    |    3 |    1 | 一班         |
|    2 | Bob            |    4 |    2 | 二班          |
|    3 | Alice          |    2 |    3 | 三班          |
|    4 | Beyonce        |    1 |    4 | 四班         |
|    5 | Marilyn Manson |    4 |    5 | 五班          |
|    6 | leo            |    2 | NULL | NULL      |
+------+----------------+------+------+-----------+
11 rows in set (0.00 sec)

自连接

与表自身进行检索操作，换句话说，就是与另一张一模一样的表进行 join 关联操作。

假设在不知道 Leo 同学班级 id 的情况下查询出 student 表中 Leo 的同班同学，有如下两种方式：

子查询

select name from student where cid in (select cid from student where name = 'Leo') and name != 'Leo';

自连接

select a.name from student as a join student as b on a.cid = b.cid where b.name = 'Leo' and a.name != 'Leo';

可以看出子查询的方式需要查询 2 次 student 表，而自连接的方式是笛卡尔积，只需要查询一次 student 表，效率方面肯定是高于子查询方式的，所以在能够使用自连接方式查询的时候尽量不要使用子查询。