Problem description
Below is the code I'd like some help with. I have to run it over 1,300,000 rows, and it takes up to 40 minutes to insert ~300,000 rows.
I figure bulk insert is the way to speed it up? Or is it slow because I'm iterating over the rows via the for data in reader: portion?
# Opens the prepped csv file
with open(os.path.join(newpath, outfile), 'r') as f:
    # hooks csv reader to file
    reader = csv.reader(f)
    # pulls out the columns (which match the SQL table)
    columns = next(reader)
    # trims any extra spaces
    columns = [x.strip(' ') for x in columns]
    # starts SQL statement
    query = 'bulk insert into SpikeData123({0}) values ({1})'
    # puts column names in SQL query 'query'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print('Query is: %s' % query)
    # starts cursor from cnxn (which works)
    cursor = cnxn.cursor()
    # uploads everything by row
    for data in reader:
        cursor.execute(query, data)
        cursor.commit()
I am picking my column headers dynamically on purpose (as I would like to create the most Pythonic code possible).
SpikeData123 is the table name.
Recommended answer
Update - July 2021: bcpyaz is a wrapper for Microsoft's bcp utility.
Update - April 2019: As noted in the comment from @SimonLang, BULK INSERT under SQL Server 2017 and later apparently does support text qualifiers in CSV files (ref: here).
BULK INSERT will almost certainly be much faster than reading the source file row-by-row and doing a regular INSERT for each row. However, both BULK INSERT and BCP have a significant limitation regarding CSV files in that they cannot handle text qualifiers (ref: here). That is, if your CSV file does not have qualified text strings in it ...
1,Gord Thompson,2015-04-15
2,Bob Loblaw,2015-04-07
... then you can BULK INSERT it, but if it contains text qualifiers (because some text values contain commas) ...
1,"Thompson, Gord",2015-04-15
2,"Loblaw, Bob",2015-04-07
... then BULK INSERT cannot handle it. Still, it might be faster overall to pre-process such a CSV file into a pipe-delimited file ...
1|Thompson, Gord|2015-04-15
2|Loblaw, Bob|2015-04-07
... or a tab-delimited file (where → represents the tab character) ...
1→Thompson, Gord→2015-04-15
2→Loblaw, Bob→2015-04-07
... and then BULK INSERT that file. For the latter (tab-delimited) file, the BULK INSERT code would look something like this:
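import pypyodbc

conn_str = "DSN=myDb_SQLEXPRESS;"
cnxn = pypyodbc.connect(conn_str)
crsr = cnxn.cursor()
# Raw string so the backslashes in the file path and in \t / \n reach
# SQL Server unchanged (in a normal string, \b would become a backspace).
sql = r"""
BULK INSERT myDb.dbo.SpikeData123
FROM 'C:\__tmp\biTest.txt' WITH (
    FIELDTERMINATOR='\t',
    ROWTERMINATOR='\n'
);
"""
crsr.execute(sql)
cnxn.commit()
crsr.close()
cnxn.close()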
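The pre-processing step described above (turning a CSV file with text qualifiers into a tab-delimited file) could be sketched roughly as follows. This is only an illustration, not part of the original answer: the function name csv_to_delimited and its parameters are made up for this example, and it assumes UTF-8 input. It writes CRLF line endings on the assumption that ROWTERMINATOR='\n' in BULK INSERT expects a carriage return plus line feed; adjust that if your server behaves differently.

```python
import csv


def csv_to_delimited(src_path, dst_path, delimiter="\t"):
    """Rewrite a quoted CSV file as a delimited file without text qualifiers.

    Hypothetical helper for illustration only. Returns the number of rows
    written (the header row, if present, is counted and copied as-is).
    """
    rows_written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # csv.reader handles qualified values such as "Thompson, Gord"
        for row in csv.reader(src):
            for field in row:
                # An unqualified output file cannot represent these values
                if delimiter in field or "\r" in field or "\n" in field:
                    raise ValueError(
                        "field contains the delimiter or a line break: %r" % field
                    )
            # Assumption: CRLF row endings to match ROWTERMINATOR='\n'
            dst.write(delimiter.join(row) + "\r\n")
            rows_written += 1
    return rows_written
```

The resulting file would then be the one named in the FROM clause of the BULK INSERT statement. Passing delimiter="|" would produce the pipe-delimited variant instead.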
Note: As mentioned in a comment, executing a BULK INSERT statement is only applicable if the SQL Server instance can directly read the source file. For cases where the source file is on a remote client, see this answer.
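When the server cannot read the file, one option that stays on the client side is to send rows in batches rather than one execute and commit per row. The sketch below is an assumption on my part and not necessarily what the linked answer proposes: insert_in_batches and batch_size are hypothetical names, and the only driver call it relies on is the standard DB-API cursor.executemany. With pyodbc (not pypyodbc), setting cursor.fast_executemany = True before calling it may speed things up further, though results depend on the ODBC driver and the column types.

```python
from itertools import islice


def insert_in_batches(cursor, query, rows, batch_size=10000):
    """Send rows to the server in batches via cursor.executemany.

    Hypothetical helper for illustration. `rows` can be any iterable of
    parameter sequences, for example a csv.reader positioned after the
    header row. Returns the total number of rows sent.
    """
    iterator = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows without loading the whole file
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        cursor.executemany(query, batch)
        total += len(batch)
    return total
```

In the question's code this would replace the for data in reader: loop, using a plain parameterized INSERT statement as the query, with a single commit once the function returns.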