从 Sqlite 到 SQL Server 读取大量数据在预执行时失败

Reading Huge volume of data from Sqlite to SQL Server fails at pre-execute(从 Sqlite 到 SQL Server 读取大量数据在预执行时失败)
本文介绍了从 Sqlite 到 SQL Server 读取大量数据在预执行时失败的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个巨大的 (26GB) sqlite 数据库,我想用 SSIS 将它导入到 SQL Server.

我已经正确设置了所有内容.部分数据流工作正常并正在导入数据.

数据流很简单.它们仅由源和目标组成.

但是当涉及到一个有 8000 万行的表时,数据流会失败并显示以下无用消息:

<块引用>

代码:0xC0047062
来源:数据流任务源 9 - nibrs_bias_motivation [55]
说明:System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] 未知错误 (7)

在 System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode)
在 System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior 行为,String 方法,Boolean needReader,Object[] methodArguments,SQL_API odbcApiMethod)
在 System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior 行为,String 方法,Boolean needReader)
在 System.Data.Odbc.OdbcCommand.ExecuteReader(CommandBehavior 行为)
在 System.Data.Odbc.OdbcCommand.ExecuteDbDataReader(CommandBehavior 行为)
在 System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior 行为)
在 Microsoft.SqlServer.Dts.Pipeline.DataReaderSourceAdapter.PreExecute()
在 Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 包装器)

而在此任务失败之前,内存使用率上升到 99%,然后任务失败.这让我觉得这是一个内存问题.但我不知道我该如何解决这个问题.

我尝试在所有数据流任务上将 DelayValidation 设置为 true.没有改变.我玩过缓冲区大小.没什么.

我能做什么?

解决方案

分步指南

由于从大数据集读取时抛出错误,尝试分块读取数据,可以按照以下步骤实现:

  1. 声明 2 个 Int32 类型的变量(@[User::RowCount]@[User::IncrementValue])
  2. 添加一个 Execute SQL Task 来执行一个 select Count(*) 命令并将结果集存储到变量 @[User::RowCount]

  1. 添加具有以下首选项的 For 循环:

  1. 在for循环容器内添加一个数据流任务
  2. 在数据流任务中添加ODBC SourceOLEDB Destination
  3. 在 ODBC Source 中选择 SQL Command 选项并编写一个 SELECT * FROM TABLE 查询 *(仅检索元数据`
  4. 映射源和目标之间的列
  5. 返回Control flow并点击Data flow task并点击F4查看属性窗口
  6. 在属性窗口中,转到表达式并将以下表达式分配给 [ODBC Source].[SQLCommand] 属性:(有关更多信息,请参阅

    参考文献

    • ODBC 源 - SQL Server
    • 如何在 ODBC SQLCommand 表达式中传递 SSIS 变量?
    • 如何使用 SSIS ODBC 源以及 OLE DB 和 ODBC 之间的区别?
    • SQLite 限制

    I have a huge (26GB) sqlite database that I want to import to SQL Server with SSIS.

    I have everything setup correctly. Some of the data flows are working correctly and importing the data.

    Data flows are simple. They just consist of source and destination.

    But when it comes to a table that has 80 million rows, data flow fails with this unhelpful message:

    Code: 0xC0047062
    Source: Data Flow Task Source 9 - nibrs_bias_motivation [55]
    Description: System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] unknown error (7)

    at System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode)
    at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader, Object[] methodArguments, SQL_API odbcApiMethod)
    at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader)
    at System.Data.Odbc.OdbcCommand.ExecuteReader(CommandBehavior behavior)
    at System.Data.Odbc.OdbcCommand.ExecuteDbDataReader(CommandBehavior behavior)
    at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
    at Microsoft.SqlServer.Dts.Pipeline.DataReaderSourceAdapter.PreExecute()
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)

    And before this task fails, memory usage goes up to 99%, then the task fails. This made me think its a memory issue. But I don't know how can I solve this.

    I tried setting DelayValidation to true on all data flow tasks. Nothing changed. I played with the buffer sizes. Nothing.

    What can I do?

    解决方案

    Step by Step guide

    Since the error is thrown when reading from a large dataset, try reading data by chunks, to achieve that you can follow these steps:

    1. Declare 2 Variables of type Int32 (@[User::RowCount] and @[User::IncrementValue])
    2. Add an Execute SQL Task that execute a select Count(*) command and store the Result Set into the variable @[User::RowCount]

    1. Add a For Loop with the following preferences:

    1. Inside the for loop container add a Data flow task
    2. Inside the dataflow task add an ODBC Source and OLEDB Destination
    3. In the ODBC Source select SQL Command option and write a SELECT * FROM TABLE query *(to retrieve metadata only`
    4. Map the columns between source and destination
    5. Go back to the Control flow and click on the Data flow task and hit F4 to view the properties window
    6. In the properties window go to expression and Assign the following expression to [ODBC Source].[SQLCommand] property: (for more info refer to How to pass SSIS variables in ODBC SQLCommand expression?)

      "SELECT * FROM MYTABLE ORDER BY ID_COLUMN
      LIMIT 500000
      OFFSET " + (DT_WSTR,50)@[User::IncrementValue]"
      

    Where MYTABLE is the source table name, and IDCOLUMN is your primary key or identity column.

    Control Flow Screenshot

    References

    • ODBC Source - SQL Server
    • How to pass SSIS variables in ODBC SQLCommand expression?
    • HOW TO USE SSIS ODBC SOURCE AND DIFFERENCE BETWEEN OLE DB AND ODBC?
    • SQLite Limit

    这篇关于从 Sqlite 到 SQL Server 读取大量数据在预执行时失败的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Execute complex raw SQL query in EF6(在EF6中执行复杂的原始SQL查询)
SSIS: Model design issue causing duplications - can two fact tables be connected?(SSIS:模型设计问题导致重复-两个事实表可以连接吗?)
SQL Server Graph Database - shortest path using multiple edge types(SQL Server图形数据库-使用多种边类型的最短路径)
Invalid column name when using EF Core filtered includes(使用EF核心过滤包括时无效的列名)
How should make faster SQL Server filtering procedure with many parameters(如何让多参数的SQL Server过滤程序更快)
FastAPI + Tortoise ORM + FastAPI Users (Python) - Relationship - Many To Many(FastAPI+Tortoise ORM+FastAPI用户(Python)-关系-多对多)