问题描述
当我们的 Azure SQL 数据库性能偶尔显着下降时,我们会在生产中解决这个问题.我们知道我们在其中一张表上有锁,但这些锁不是死锁,它们是长锁,大约一个小时后性能恢复正常.我们正在尝试找到所有可能的情况来了解如何获得这些长锁(每个查询都非常快,所有性能分析器都可以向我们展示导致长锁的原因).这个问题的原因如下图:
We fight the issue in production when once in a while our Azure SQL database performance significantly degrades. We know we have locks on one of the tables, but these locks are not deadlocks, they are long locks and in an hour or so the performance returns to normal. We are trying to find all the possible scenarios on how we get these long locks (every query is super fast and all performance analyzers could show us what causes long locks). The reason for this question is the picture below:
Out 连接池设置仅允许池化 200 个连接.大多数时候,我们与数据库有大约 10-20 个打开/池连接.然后突然一些活动连接开始增长并且池被完全占用.虽然池中的连接数保持在 200 以下,但我们看到使用 sp_who2
的活动连接数达到 1.5k-2k 连接(有时是 4k-5k).
Out connection pool settings allow only 200 connections to be pooled. And most of the times we have about 10-20 open/pooled connections with the database. Then suddenly a number of active connections start to grow and the pool is completely taken. While a number of pooled connections stay below 200, we see a number of active connections using sp_who2
reach 1.5k-2k connections (sometimes 4k-5k).
我使用 Azure 门户监控工具构建了相同的图表.它具有不同的聚合期,但显示相同的问题:
I've built the same chart using Azure Portal monitoring tools. It has different aggregation period but shows the same issue:
我们使用的连接字符串:
the connection string we use:
数据源=[服务器].database.windows.net;初始目录=[数据库];持久安全信息=真;用户id=[用户];密码=[密码];MultipleActiveResultSets=True;连接Timeout=30;Max Pool Size=200;Pooling=True;App=[AppName]
Data Source=[server].database.windows.net;initial catalog=[database];persist security info=True;user id=[user];password=[password];MultipleActiveResultSets=True;Connection Timeout=30;Max Pool Size=200;Pooling=True;App=[AppName]
考虑到 200 个连接的连接池限制,这怎么可能?
ps:没有周期性任务、长时间运行的查询或其他工具在做任何事情,我们使用 sp_who2
检查了与数据库的所有活动连接.
ps: there is no periodic task, long running query or other tool doing anything, we checked with sp_who2
all the active connections to the database.
推荐答案
[这更像是一个长评论而不是一个答案]
[this is more of a long comment than an answer]
我确实有几台主机连接到同一个数据库,但每台主机都有相同的 200 个连接限制
I do have several hosts connected to the same database but each host has the same limitation of 200 connections
连接池是 per (Connection String,AppDomain).每个服务器可能有多个 AppDomain.每个 AppDomain 的每个连接字符串都有一个连接池.所以在这里,如果你有不同的用户/密码组合,它们会生成不同的连接池.因此,为什么可以有超过 200 个连接,这并不神秘.
The connection pool is per (Connection String,AppDomain). Each Server might have multiple AppDomains. And each AppDomain will have one connection pool per connection string. So here if you have different user/password combos, they will generate different connection pools. So no real mystery why it is possible to have more than 200 connections.
那么为什么你会得到很多连接?可能的原因:
So why are you getting lots of connections? Possible causes:
连接泄漏.
如果您未能处理 DbContext 或 SqlConnection,则该连接将在托管堆上逗留直到最终确定,并且不可重用.当连接池达到其限制时,新的连接请求将等待 30 秒以使连接可用,然后失败.
If you are failing to Dispose a DbContext or a SqlConnection that connection will linger on the managed heap until finalized, and not be available for reuse. When a connection pool reaches its limit, new connection request will wait for 30sec for a connection to become available, and fail after that.
在这种情况下,您不会在服务器上看到任何等待或阻塞.会话都将处于空闲状态,而不是等待.并且不会有大量的请求在
You will not see any waits or blocking on the server in this scenario. The sessions will all be idle, not waiting. And there would not be a large number of requests in
select *
from sys.dm_exec_requests
请注意,会话等待统计信息现在在 Azure SQL DB 上运行,因此更容易查看实时阻塞和等待情况.
Note that Session Wait Stats are now live on Azure SQL DB, so it's much easier to see realtime blocking and waits.
select *
from sys.dm_exec_session_wait_stats
阻塞.
如果传入请求开始被某些事务阻止,并且新请求不断启动,您的会话数可能会增加,因为新请求会获得新会话、启动请求并被阻止.在这里,您会在
If incoming requests begin to be blocked by some transaction, and new requests keep starting, your number of sessions can grow, as new requests get new sessions, start requests and become blocked. Here you would see lots of blocked requests in
select *
from sys.dm_exec_requests
慢查询.
如果请求由于资源可用性(CPU、磁盘、日志)而需要很长时间才能完成,您可以看到这一点.但这不太可能,因为在此期间您的 DTU 使用率很低.
If requests were just talking a long time to finish due to resourse availability (CPU, Disk, Log), you could see this. But that's unlikely as your DTU usage is low during this time.
因此,下一步是查看这些连接在服务器上是否处于活动状态,表明存在阻塞,或者在服务器上处于空闲状态,表明存在连接池问题.
So the next step for you is to see if these connections are active on the server suggesting blocking, or idle on the server suggesting a connection pool problem.
这篇关于活动 Azure Sql 连接超出连接池限制的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!