问题描述
我需要使用带有 SSIS 的 CSV 文件创建一个数据库.CSV 文件包括四列:
I need to create a database using a CSV file with SSIS. The CSV file includes four columns:
我需要使用那个表的信息来填充我下面用SQL创建的三个表.
I need to use the information of that table to populate the three tables I created in SQL below.
我意识到我需要使用员工表的一列 EmployeeNumber 和组表
GroupID 来填充 EmployeeGroup 表.为此,我认为我需要一个Join Merge表,但我在SSIS中创建了数据流任务,结果相同,没有显示数据.
I have realized that what I need is to use one column of the Employee Table, EmployeeNumber
, and Group Table, GroupID
, to populate the EmployeeGroup table. For that, I thought that a Join Merge table is what I needed, but I created the Data Flow Task in SSIS, and the results are the same, no data displayed.
中间的表是用来关联其他表的.
The middle table is the one used to relate the other tables.
我在 SSIS 中创建了包,并且填充了 Employee 和 Group 表,但没有填充 EmployeeGroup 表.EmployeeGroup 将只显示没有数据的 EmployeeNumber 和 Group ID 列.
I created the package in SSIS and the Employee and Group Tables are populated, but the EmployeeGroup table is not. EmployeeGroup will only show the EmployeeNumber and Group ID columns with no data.
我是 SSIS 的新手,我真的不知道还能做什么.我将非常感谢您的帮助.
I am new using SSIS, and I really do not know what else to do. I will really appreciate your help.
推荐答案
概述
- 使用 SSIS 的解决方案
- 使用 3 个数据流任务
- 使用 2 个数据流任务
- 使用 Microsoft.Ace.OLEDB
- 使用 Microsoft 文本驱动程序
第一个解决方案 - SSIS
使用 3 个数据流任务
这可以只使用 2 个数据流任务来完成,但是根据问题中提到的 OP 我是使用 SSIS 的新手,我真的不知道还能做什么
,我将提供最简单的解决方案,即 3 DataFlow Task 以避免使用更多组件,例如 MultiCast
.
1st Solution - SSIS
Using 3 Data Flow Tasks
This can be done using only 2 Data Flow Task, but according to what the OP mentioned in the question I am new using SSIS, and I really do not know what else to do
, i will provide easiest solution which is 3 DataFlow Task to avoid using more components like MultiCast
.
因为你想建立一个关系数据库并从csv中提取关系,你必须读取csv 3次-将其视为3个单独的文件-.
Because you want to build a relational database and extract relations from the csv, you have to read the csv 3 times -consider it as 3 seperated files -.
首先要导入Employees和Groups数据,然后导入它们之间的关系表.
First you have to import Employees and Groups Data, Then you have to import the relation table between them.
每个导入步骤都可以在单独的数据流任务中完成
Each Import step can be done in a seperate Data Flow Task
- 添加平面文件连接管理器(Csv 文件)
- 添加OLEDB 连接管理器(SQL 目标)
- 添加 3 个 DataFlow 任务,如下图所示
- Add a Flat File connection Manager (Csv File)
- Add An OLEDB connection Manager (SQL Destination)
- Add 3 DataFlow Task like the image below
- 添加一个平面文件源、一个脚本组件、OLEDB目标,如下图所示
- Add a Flat File Source , a Script Component , OLEDB destination like shown in the image below
- 在 Script Component 中选择 Group Name 列作为 Input
- In the Script Component choose Group Name column as Input
- 选择输出缓冲区并将
SynchronousInputID Property
更改为None
并添加类型为<的输出列OutGroupname
代码>DT_STR
- Select the Output Buffer and change
SynchronousInputID Property
toNone
And add an output columnOutGroupname
with typeDT_STR
在 Script 部分写入以下代码:
Imports System.Collections.Generic
Private m_List As New List(Of String)
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
If Not Row.GroupName_IsNull AndAlso
Not String.IsNullOrEmpty(Row.GroupName.Trim) Then
If Not m_List.Contains(Row.GroupName.Trim) Then
m_List.Add(Row.GroupName.Trim)
CreateOutputRows(Row.GroupName.Trim)
End If
End If
End Sub
Public Sub CreateOutputRows(ByVal strValue As String)
Output0Buffer.AddRow()
Output0Buffer.OutGroupName = strValue
End Sub
在 OLEDB Destination 映射 OutGroupName
到 GroupName
Column
On the OLEDB Destination map OutGroupName
to GroupName
Column
- 重复对
Groupname
Column 完成的相同步骤:唯一的区别是您必须选择EmployeeID
、Employee Name
、LoginName
列作为 Script Component 中的输入,并在比较中使用ID
列代替Groupname
列
- Repeat the same steps done with
Groupname
Column : with a single difference that is you have to choose theEmployeeID
,Employee Name
,LoginName
columns as Input in the Script Component and Use theID
Column instead ofGroupname
column in the comparaison
- 你必须添加一个平面文件源、查找转换、OLEDB目标
在LookUp Transformation组件中选择
Groups
表作为查找表
In The LookUp Transformation Component select
Groups
Table as a Lookup table
映射 GroupName
列并获取 Group ID
作为输出
Map GroupName
Columns and Get Group ID
as output
在错误输出配置
在Oledb Destination映射列如下
注意:GroupID
必须是Identity(在sql server中设置)
Note: GroupID
must be an Identity (set it in sql server)
您必须执行与 3 个数据流任务解决方案相同的步骤,但不是将 2 个数据流任务添加到 Group
和 Employee
,只需添加一个数据流任务,在 Flat File Source
之后添加一个 MultiCast
组件来复制 Flow.然后,对于第一个流,使用 Employee
数据流任务中使用的相同 Script Component
和 OLEDB Destination
,对于第二个流,使用 Script Component
和 OLEDB Destination
code>Script Component 和 OLEDB Destination
与 Group
相关.
You have to do the same steps as the 3 Data Flow Tasks solution, but instead of adding 2 Data Flow Tasks to Group
and Employee
, just add one Data Flow Task, and after the Flat File Source
add a MultiCast
component to duplicate the Flow. Then for the first flow use the same Script Component
and OLEDB Destination
used in the Employee
Data Flow Task, and for the second flow use the Script Component
and OLEDB Destination
related to Group
.
通过 T-SQL 命令将平面文件导入 SQL 的方法有很多
There are many method to import Flat file to SQL via T-SQL commands
假设安装的 Microsoft ACE OLEDB 版本是 Microsoft.ACE.OLEDB.12.0
并且 csv 文件位置是 C:abc.csv
Assuming that the installed version of Microsoft ACE OLEDB is Microsoft.ACE.OLEDB.12.0
and that the csv file location is C:abc.csv
首先将数据导入员工和组表
INSERT INTO [GROUP]
([Group Name])
SELECT
[Group Name]
FROM
OPENROWSET
(
'Microsoft.ACE.OLEDB.12.0','Text;Database=C:;IMEX=1;','SELECT * FROM abc.csv'
) t
INSERT INTO [Employee]
([Employee Number],[Employee Name],[LoginName])
SELECT
[Employee Number],[Employee Name],[LoginName]
FROM
OPENROWSET
(
'Microsoft.ACE.OLEDB.12.0','Text;Database=C:;IMEX=1;','SELECT * FROM abc.csv'
) t
导入 Employee_Group 数据
INSERT INTO [EmployeeGroup]
([Employee Number],[GroupID])
SELECT
t1.[Employee Number],t2.[GroupID]
FROM
OPENROWSET
(
'Microsoft.ACE.OLEDB.12.0','Text;Database=C:;IMEX=1;','SELECT * FROM abc.csv'
) t1 INNER JOIN GROUP t2 ON t1.[Group Name] = T2.[Group Name]
带有 Microsoft 文本驱动程序的 OPENROWSET
首先将数据导入员工和组表
INSERT INTO [GROUP]
([Group Name])
SELECT
[Group Name]
FROM
OPENROWSET
(
'MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)};
DefaultDir=C:;',
'SELECT * FROM abc.csv'
) t
INSERT INTO [Employee]
([Employee Number],[Employee Name],[LoginName])
SELECT
[Employee Number],[Employee Name],[LoginName]
FROM
OPENROWSET
(
'MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)};
DefaultDir=C:;',
'SELECT * FROM abc.csv'
) t
导入 Employee_Group 数据
INSERT INTO [EmployeeGroup]
([Employee Number],[GroupID])
SELECT
t1.[Employee Number],t2.[GroupID]
FROM
OPENROWSET
(
'MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)};
DefaultDir=C:;',
'SELECT * FROM abc.csv'
) t1 INNER JOIN GROUP t2 ON t1.[Group Name] = T2.[Group Name]
注意:您可以将数据导入临时表,然后查询该表,以避免多次连接到 csv 文件
导入 csv 文件到 SQL Server 的方法有很多,您可以查看以下链接以获取更多信息.
There are many method to import csv files to SQL server, you can check the following links for additional informations.
- 使用 PowerShell 将 CSV 文件导入 SQL Server 的四种简单方法
- 如何从 .csv 导入数据在 SQL Server 中使用 PowerShell?
- OPENROWSET (Transact-SQL)一个>
- T-SQL – 使用 OpenRowSet 读取 CSV 文件
- 使用 Openrowset 导入错误
这篇关于从 CSV 文件填充 SQL 数据库的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!