貉之丘: 这外包实在是太垃圾了

2025-08-19

这外包实在是太垃圾了

图片由Google Gemini生成

老板之前找某个外包团队做的一个项目，今天把阿里云RDS搞爆了，找我来看。

看了一下，是下午两点半左右爆掉的。空间撑爆了被「锁定」了。看了下监控，把空间撑爆的是temp_file，临时文件。
那个时间段有不少慢查询，找了SQL来看。一看到我就开始骂娘了：

select DISTINCT * from
        (select
        t.business_id id,
        any_value(t.type) type,
        t.`user` `user`,
        any_value((case
        when t.type='1' then '文章'
        when t.type='2' then '公告'
        when t.type='3' then '问答'
        end))  typeName,
        any_value((case when (select a.clone_num from tb_article a where t.business_id = a.id) is null then 0 else (select a.clone_num from tb_article a where t.business_id = a.id) end)) cloneNum,
        any_value((select a.title from tb_article a where a.id = t.business_id)) title,
        any_value((select a.content from tb_article a where a.id = t.business_id)) content,
        any_value((select a.publish_time from tb_article a where a.id = t.business_id)) publishTime,
        any_value((select a.essence from tb_article a where a.id = t.business_id)) essence,
        any_value((select a.`status` from tb_article a where a.id = t.business_id)) `status`,
        any_value((select a.top from tb_article a where a.id = t.business_id)) top,
        any_value((select count(c.content) from tb_comment c where c.business_id = t.business_id)) comments,
        any_value((select count(l.id) from tb_like l where l.business_id = t.business_id)) likes,
        any_value((select count(f.id) from tb_focus f where f.business_id = t.business_id)) focus,
        any_value((select a.pageview_num from tb_article a where a.id = t.business_id)) viewCount,
        any_value((select tba.auther from tb_article tba where tba.id = t.business_id)) userId,
        any_value((select m.auth_user_id nickName from tb_member m
        where m.id = (select a.auther from tb_article a where a.id = t.business_id))) nickName,
        any_value(t.browsed_date) browsedDate
        from tb_browsed_records t group by t.`user`, t.business_id ) temp
         WHERE  temp.user = 494 
        order by temp.browsedDate desc,temp.top desc limit 0,1

已经不想浪费我的时间了，找了AI来帮忙吐槽。以下是Google Gemini的输出：

这个SQL看起来像是初学者写的，槽点确实不少。下面我们来一起“欣赏”一下。

槽点一：子查询泛滥成灾，性能杀手！ 🔪

这个查询里，几乎每一个字段都用了一个独立的子查询来获取数据，像这样：
any_value((select a.title from tb_article a where a.id = t.business_id)) title,
这简直是性能灾难！数据库每查询一个字段，都要重新去 tb_article 表里查一次，而且还不是一次查所有需要的字段，是查一次 title，再查一次 content，再查一次 publish_time……这就好像你想买菜，不是一次把所有菜买齐，而是买完一个土豆跑回家，再回去买一根黄瓜，再跑回家……

正确的做法是使用 JOIN！将 tb_browsed_records 表和 tb_article 表连接起来，所有字段一次性查出来，效率高得多。
SELECT
  t.business_id AS id,
  t.type,
  t.`user`,
  a.title,
  a.content,
  a.publish_time,
  -- ...其他字段
FROM
  tb_browsed_records AS t
JOIN
  tb_article AS a ON t.business_id = a.id
-- ...其他连接
槽点二：DISTINCT + GROUP BY，多此一举

你看这个查询，外面有个 DISTINCT *，里面又有 GROUP BY t.user, t.business_id。

DISTINCT 的作用是去重，GROUP BY 的作用是分组并聚合。GROUP BY 本身就会对分组字段进行去重。所以，在外层加 DISTINCT * 是完全没有必要的，反而可能会增加额外的计算开销。这就像你把一堆苹果已经按品种分好类了，然后又再来一遍“去重”操作，完全是重复劳动。

槽点三：any_value 的滥用

这个查询里大量使用了 any_value。any_value 的作用是在 GROUP BY 聚合时，从一个组中任意取一个值。对于 t.type, t.user, t.business_id 这种已经在 GROUP BY 里的字段，直接写就行了，不需要加 any_value。
any_value((select a.title from tb_article a where a.id = t.business_id)) title
像这种对子查询结果使用 any_value 更是让人费解。它不仅没有解决任何问题，反而让代码看起来更复杂。

槽点四：重复的子查询，浪费资源
any_value((select a.auther from tb_article a where a.id = t.business_id))
和
any_value((select m.auth_user_id nickName from tb_member m where m.id = (select a.auther from tb_article a where a.id = t.business_id)))
这两个子查询都重复查询了 a.auther。完全可以在第一次查询到 a.auther 后，直接用这个值去查 tb_member 表，而不是再查一次。这就像你去问一个人的名字，问完后又回到原点，再问一次这个人是谁，然后再去找他朋友……

Google Gemini还帮我重写了SQL。写得挺好，执行速度至少快两个数量级。原来那个用了太多的子查询，就算逻辑上不该慢这么多，实际上就得慢这么多，而且还时间空间双杀。

这种写法，我也不认为是甲方逐步追加需求导致的。这就是不合格的程序员导致的，没有任何借口！

最后感慨一句：外包跟AI一样，你得要能驾驭。

貉之丘

善意提醒

2025-08-19

这外包实在是太垃圾了

槽点一：子查询泛滥成灾，性能杀手！ 🔪

槽点二：`DISTINCT` + `GROUP BY`，多此一举

槽点三：`any_value` 的滥用

槽点四：重复的子查询，浪费资源

没有评论:

发表评论

善意提醒

2025-08-19

这外包实在是太垃圾了

槽点一：子查询泛滥成灾，性能杀手！ 🔪

槽点二：DISTINCT + GROUP BY，多此一举

槽点三：any_value 的滥用

槽点四：重复的子查询，浪费资源

没有评论:

发表评论

槽点二：`DISTINCT` + `GROUP BY`，多此一举

槽点三：`any_value` 的滥用