Multithreaded file copy is far slower than a single thread on a multi-core CPU

Problem description

I am trying to write a multithreaded program in Python to speed up copying a set of .csv files (fewer than 1000). The multithreaded code runs even slower than the sequential approach. I timed the code with profile.py. I am sure I must be doing something wrong, but I'm not sure what.

Environment:

Quad-core CPU.
2 hard drives: one holds the source files, the other is the destination.
1000 csv files, ranging in size from a few KB to 10 MB.

Approach:

I put all the file paths in a Queue and create 4-8 worker threads that pull file paths from the queue and copy the designated files. In no case is the multithreaded code faster:

Sequential copy takes 150-160 seconds
Threaded copy takes more than 230 seconds
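
To compare these two numbers on elapsed time alone, a small wall-clock timer is enough. The sketch below is Python 3, and `wall_clock` is a hypothetical helper, not part of the original code. The `profile` module hooks only the thread that calls `profile.run()`, so the sequential run (every copy in the main thread) and the threaded run (main thread mostly waiting in `join()`) are not instrumented equally; `time.perf_counter()` simply measures elapsed real time.

```python
import time


def wall_clock(func, *args, **kwargs):
    """Call func(*args, **kwargs) once and return (result, elapsed_seconds).

    time.perf_counter() is a monotonic high-resolution clock, so the
    difference between two readings is elapsed real time, including any
    time the caller spends waiting for worker threads to finish.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With the question's functions this would be `wall_clock(sequentialCopy, fileList)` and `wall_clock(threadWorkerCopy, fileList)` (on Python 2, which the question uses, `time.time()` can stand in for `time.perf_counter()`). Because the OS keeps recently read files in memory, alternating the order of the runs helps avoid making whichever variant runs second look faster.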

I assume this is an I/O-bound task, so multithreading should speed the operation up.
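
The premise is partly right: CPython releases the GIL while a thread is blocked in a system call such as a file read or write, so several Python threads can wait on I/O at the same time. The Python 3 sketch below uses `time.sleep()`, which also releases the GIL, as a stand-in for a blocking I/O call; `run_blocking_calls` is a hypothetical helper. Overlapping the waits only pays off if the device can actually serve the requests in parallel, which a single hard drive read by several threads at once may not do, since its heads have to move between files.

```python
import threading
import time


def run_blocking_calls(n_threads, seconds):
    """Start n_threads threads that each block for `seconds`; return wall time.

    time.sleep() releases the GIL while it waits, as blocking file reads and
    writes do, so the waits overlap instead of running one after another.
    """
    threads = [threading.Thread(target=time.sleep, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

For example, `run_blocking_calls(4, 0.2)` should report roughly 0.2 seconds rather than 0.8.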

The code:

    import Queue
    import threading
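    import os
    import shutil
    import glob    # file wildcards list, glob.glob('*.py')
    import profile

    fileQueue = Queue.Queue() # global
    srcPath  = r'C:\temp'   # raw strings: in 'C:\temp' the '\t' is a tab character
    destPath = r'D:\temp'
    countLock = threading.Lock()  # protects tcnt across worker threads
    tcnt = 0
    ttotal = 0

    def CopyWorker():
        global tcnt
        while True:
            fileName = fileQueue.get()
            try:
                shutil.copy(fileName, destPath)
                with countLock:
                    tcnt += 1
                    print 'copied: ', tcnt, ' of ', ttotal
            except EnvironmentError as e:
                # keep the worker alive so the remaining files still get copied
                print 'failed: ', fileName, e
            finally:
                # Mark the item done only after the copy has finished;
                # calling task_done() first lets fileQueue.join() return
                # while copies are still running.
                fileQueue.task_done()

    def threadWorkerCopy(fileNameList):
        global ttotal
        print 'threadWorkerCopy: ', len(fileNameList)
        ttotal = len(fileNameList)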
        for i in range(4):
            t = threading.Thread(target=CopyWorker)
            t.daemon = True
            t.start()
        for fileName in fileNameList:
            fileQueue.put(fileName)
        fileQueue.join()

    def sequentialCopy(fileNameList):
        #around 160.446 seconds, 152 seconds
        print 'sequentialCopy: ', len(fileNameList)
        cnt = 0
        ctotal = len(fileNameList)
        for fileName in fileNameList:
            shutil.copy(fileName, destPath)
            cnt += 1
            print 'copied: ', cnt, ' of ', ctotal

    def main():
        print 'this is main method'
        fileList = glob.glob(os.path.join(srcPath, '*.csv'))
        #sequentialCopy(fileList)
        threadWorkerCopy(fileList)

    if __name__ == '__main__':
        profile.run('main()')
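
For comparison, the same pool-of-workers pattern can be written in Python 3 with `concurrent.futures.ThreadPoolExecutor`, which keeps its own internal work queue, so there is no hand-written Queue, `task_done()` bookkeeping, or daemon flag to get wrong. This is a sketch, not the asker's code; `copy_one`, `copy_all_threaded` and the `workers` parameter are hypothetical names. It still calls `shutil.copy` from several threads, so on its own it should not be expected to change the timings above.

```python
import functools
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_one(src_path, dest_dir):
    """Copy a single file into dest_dir and return the new file's path."""
    return shutil.copy(src_path, dest_dir)


def copy_all_threaded(file_paths, dest_dir, workers=4):
    """Copy every path in file_paths into dest_dir using a thread pool.

    Returns the destination paths in the same order as file_paths.
    Leaving the with-block waits for all submitted copies to finish.
    """
    # shutil.copy only copies *into* dest_dir if it already exists as a
    # directory, so create it first (an assumption of this sketch).
    os.makedirs(dest_dir, exist_ok=True)
    copy_into_dest = functools.partial(copy_one, dest_dir=dest_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; list() consumes them and
        # re-raises the exception of the first failed copy in that order.
        return list(pool.map(copy_into_dest, file_paths))
```

Usage would look like `copy_all_threaded(glob.glob(os.path.join(r'C:\temp', '*.csv')), r'D:\temp', workers=4)`, which makes it easy to time several worker counts, including 1, with the same code path.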
