问题描述
我将 SQLAlchemy 与 MySQL 数据库一起使用,我想计算表中的行数(大约 300k).SQLAlchemy count 函数需要大约 50 倍的时间直接在 MySQL 中编写相同的查询来运行.我做错了什么吗?
I'm using SQLAlchemy with a MySQL database and I'd like to count the rows in a table (roughly 300k). The SQLAlchemy count function takes about 50 times as long to run as writing the same query directly in MySQL. Am I doing something wrong?
# this takes over 3 seconds to return
session.query(Segment).count()
然而:
SELECT COUNT(*) FROM segments;
+----------+
| COUNT(*) |
+----------+
| 281992 |
+----------+
1 row in set (0.07 sec)
速度差异随着表的大小而增加(在 100k 行下几乎不明显).
The difference in speed increases with the size of the table (it is barely noticeable under 100k rows).
更新
使用 session.query(Segment.id).count()
而不是 session.query(Segment).count()
似乎可以解决问题并得到它加快速度.我仍然不明白为什么初始查询会变慢.
Using session.query(Segment.id).count()
instead of session.query(Segment).count()
seems to do the trick and get it up to speed. I'm still puzzled why the initial query is slower though.
推荐答案
不幸的是,MySQL 对子查询的支持非常糟糕,这对我们产生了非常负面的影响.SQLAlchemy 文档 指出可以使用 query(func.count(Segment.id))
实现优化"查询:
Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docs point out that the "optimized" query can be achieved using query(func.count(Segment.id))
:
返回此查询将返回的行数.
Return a count of rows this Query would return.
这将为此查询生成 SQL,如下所示:
This generates the SQL for this Query as follows:
SELECT count(1) AS count_1 FROM (
SELECT <rest of query follows...> ) AS anon_1
为了对要计数的特定列进行细粒度控制,跳过使用子查询或以其他方式控制 FROM 子句,或使用其他聚合函数,与 func 表达式结合使用查询(),即:
For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:
from sqlalchemy import func
# count User records, without
# using a subquery.
session.query(func.count(User.id))
# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).
group_by(User.name)
from sqlalchemy import distinct
# count distinct "name" values
session.query(func.count(distinct(User.name)))
这篇关于为什么 SQLAlchemy count() 比原始查询慢得多?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!