在搜索之间加速

Speeding up Between Search(在搜索之间加速)
本文介绍了在搜索之间加速的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 MaxMind 免费数据库进行 IP 查找.我将数据转换成下表:

I am using MaxMind free databases to do IP lookups. I convert the data to the following table:

CREATE TABLE [dbo].[GeoBlocks](
    [StartIPNum] [varchar](50) NULL,
    [EndIPNumb] [varchar](50) NULL,
    [LocationNum] [varchar](50) NULL,
    [PostalCode] [varchar](50) NULL,
    [Latitude] [varchar](50) NULL,
    [Longitude] [varchar](50) NULL)

这个查找表中大约有 350 万条记录.

There are about 3.5M records in this lookup table.

我的目标是通过查找 IP 在 StartIPNum 和 EndIPNum 之间的记录来确定 IP(十进制形式)的 LocationNum

My goal is to determine the LocationNum for an IP(decimal form) by finding the record where the IP is between StartIPNum and EndIPNum

我的存储过程如下所示:参数:@DecimalIP bigint

My stored procedure looks like this: Parameter: @DecimalIP bigint

select GeoBlocks.StartIPNum ,@DecimalIP as DecimalIp 
     ,GeoBlocks.Postalcode ,GeoBlocks.Latitude as Latitude
     ,GeoBlocks.Longitude as Longitude
     from GeoBlocks
    where @DecimalIP between GeoBlocks.StartIPNum  and GeoBlocks.EndIPNumb

我在 StartIPNum 和 EndIPNum 上创建了唯一索引.

I have created unique indexes on StartIPNum and EndIPNum.

但是,当我运行它时,SQL Server 会对查询的 Where 部分进行表扫描.此查询需要 650-750 毫秒.(我服务器上的大多数查询需要 0-2 毫秒)

However, when I run this, SQL server does a table scan for the Where portion of the query. This query takes 650-750ms. (Most queries on my server take 0-2ms)

如何加快查询速度?

添加示例数据:

StartIPNum  EndIPNumb   LocationNum PostalCode  Latitude    Longitude
1350218632  1350218639  2782113                  48.2000    16.3667
1350218640  1350218655  2782113                  48.2000    16.3667
1350218656  1350218687  2782113                  48.2000    16.3667
1350218688  1350218751  2782113                  48.2000    16.3667
1350218752  1350218783  2782113                  48.2000    16.3667

推荐答案

更新:

总结分散在各种评论中的信息:

To summarize information scattered among various comments:

  1. IP 地址列是 VarChar(50) 字符串,包含没有左填充的十进制值.这些列上的索引将按字母顺序而不是数字顺序对它们进行排序,即10"<2".(使用左填充,排序在数字上也是正确的:10">02".)

  1. The IP address columns are VarChar(50) strings containing decimal values without left padding. An index on those columns will sort them alphabetically, not numerically, i.e. "10" < "2". (With left padding the sort will be correct numerically as well: "10" > "02".)

WHERE 子句( where @DecimalIP 在 GeoBlocks.StartIPNum 和 GeoBlocks.EndIPNumb 之间)使用混合数据类型.@DecimalIP 是一个 BIGINT 而两列是 VarChar(50).SQL 通过实现数据类型优先级方案来处理混合数据类型之间的操作.(Ref.)这会导致每行中的 IP 地址被转换从字符串到 BIGINT 值,因此比较以数字方式完成,并且以相当大的成本返回预期"结果.在这种情况下,索引(几乎)毫无用处.

The WHERE clause (where @DecimalIP between GeoBlocks.StartIPNum and GeoBlocks.EndIPNumb) uses mixed datatypes. @DecimalIP is a BIGINT while the two columns are VarChar(50). SQL handles operations among mixed datatypes by implementing a data type precedence scheme. (Ref.) This causes the IP addresses in each row to be converted from strings to BIGINT values, hence the comparison is done numerically and the "expected" results are returned at a considerable cost. The indexes are (all but) useless in this case.

将列更改为 BIGINT 将允许使用索引来提高性能并确保比较按数字而不是按字母顺序进行.包含 StartIPNumEndIPNumb 列的单个索引将大大提高性能.请注意,如果不允许重叠地址范围,则索引在 StartIPNum 上将有效地唯一,并且可以用 StartIPNum 上的索引替换为 EndIPNumb作为包含列的性能.

Changing the columns to BIGINT will allow the use of an index to improve performance and ensure that comparisons are done numerically rather than alphabetically. An single index containing both the StartIPNum and EndIPNumb columns will greatly improve performance. Note that if overlapping address ranges are not allowed then the index will effectively be unique on StartIPNum and could be replaced with an index on StartIPNum with EndIPNumb as an included column for performance.

原答案:

如果您使用点号表示的 IPV4 地址,例如192.168.0.42",您可以使用此 UDF 将字符串转换为 BIGINT 值:

If you are using IPV4 addresses in dotted notation, e.g. "192.168.0.42", you can convert the strings into BIGINT values with this UDF:

create function [dbo].[IntegerIPV4Address]( @IPV4Address VarChar(16) )
  returns BigInt
  with SchemaBinding
  begin
  declare @Dot1 as Int = CharIndex( '.', @IPV4Address );
  declare @Dot2 as Int = CharIndex( '.', @IPV4Address, @Dot1 + 1 );
  declare @Dot3 as Int = CharIndex( '.', @IPV4Address, @Dot2 + 1 );
  return Cast( Substring( @IPV4Address, 0, @Dot1 ) as BigInt ) * 0x1000000 +
    Cast( Substring( @IPV4Address, @Dot1 + 1, @Dot2 - @Dot1 - 1 ) as BigInt ) * 0x10000 +  
    Cast( Substring( @IPV4Address, @Dot2 + 1, @Dot3 - @Dot2 - 1 ) as BigInt ) * 0x100 +
    Cast( Substring( @IPV4Address, @Dot3 + 1, Len( @IPV4Address ) * 1 ) as BigInt );
  end

您可以根据函数结果存储整数值或在计算列上创建索引.请注意,您需要更改查询以引用 WHERE 子句中的整数列.

You can either store the integer values or create an index on a computed column based on the functions result. Note that you need to change your query to reference the integer column in the WHERE clause.

如果您将值存储为整数,以下函数会将它们转换回规范化字符串,其中地址的每个部分都是三位数.这些值可用于比较,因为它们将按字母顺序和数字顺序排序.

If you store the values as integers the following function will convert them back to normalized strings where each part of the address is three digits. These values can be used in comparisons since they will sort the same way both alphabetically and numerically.

create function [dbo].[NormalizedIPV4Address]( @IntegerIPV4Address as BigInt )
  returns VarChar(16)
  with SchemaBinding -- Deterministic function.
  begin
  declare @BinaryAddress as VarBinary(4) = Cast( @IntegerIPV4Address as VarBinary(4) );
  return Right( '00' + Cast( Cast( Substring( @BinaryAddress, 1, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 2, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 3, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 4, 1 ) as Int ) as VarChar(3) ), 3 )
  end

您可以对表中的字符串值进行往返,将它们全部转换为规范化"形式,以便使用这两个函数对它们进行正确排序.不是一个理想的解决方案,因为它需要对所有未来的插入和更新进行规范化,但目前可能会有所帮助.

You could round-trip the string values in your table to get them all into "normalized" form so that they sort correctly by using both functions. Not an ideal solution since it requires that all future inserts and updates be normalized, but it may help for the moment.

这篇关于在搜索之间加速的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Execute complex raw SQL query in EF6(在EF6中执行复杂的原始SQL查询)
SSIS: Model design issue causing duplications - can two fact tables be connected?(SSIS:模型设计问题导致重复-两个事实表可以连接吗?)
SQL Server Graph Database - shortest path using multiple edge types(SQL Server图形数据库-使用多种边类型的最短路径)
Invalid column name when using EF Core filtered includes(使用EF核心过滤包括时无效的列名)
How should make faster SQL Server filtering procedure with many parameters(如何让多参数的SQL Server过滤程序更快)
How can I generate an entity–relationship (ER) diagram of a database using Microsoft SQL Server Management Studio?(如何使用Microsoft SQL Server Management Studio生成数据库的实体关系(ER)图?)