Problem description
I have an extremely long list of strings and a CSV file that contains a column of strings and a column of numbers. I need to loop through the list of strings and, for each one, loop through the rows of the CSV file, checking whether the string in the first column of the CSV occurs in my string; if it does, I add the number in the other column to a running total. A minimal example would be:
import csv
sList = ['a cat', 'great wall', 'mediocre wall']
vals = []
with open('file.csv', 'r') as f:
    r = csv.reader(f)
    for w in sList:
        val = 0
        for row in r:
            if row[0] in w:
                val += 1
        vals.append(val)
An example of a CSV file I might use this with:
a, 1
great, 2
Of course, csv.reader(f) creates an iterator that I can loop through only once. I've seen recommendations elsewhere to use itertools, but all of them were for problems that involve looping through the CSV file a small number of times, usually just twice. If I used that approach to loop through the CSV many times, I'm unsure what it would mean for memory consumption, and in general I'm wondering what the smartest way to approach this problem is.
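To illustrate the single-pass behaviour described above, here is a small sketch in which `io.StringIO` stands in for the real file (an assumption made only so the snippet is self-contained). The second pass over the reader yields nothing, which is why only the first string in `sList` is ever compared against the CSV rows.

```python
import csv
import io

# io.StringIO stands in for an opened CSV file (an assumption for this demo).
f = io.StringIO("a, 1\ngreat, 2\n")
r = csv.reader(f)

first_pass = [row[0] for row in r]   # consumes every row of the reader
second_pass = [row[0] for row in r]  # the reader is exhausted, nothing is left

print(first_pass)
print(second_pass)
```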
Recommended answer
You need to "reset" the file iterator before each pass over the reader:
import csv
sList = ['a cat', 'great wall', 'mediocre wall']
vals = []
with open('data.csv', 'r') as f:
    r = csv.reader(f)
    for w in sList:
        val = 0
        f.seek(0)  # <-- move the file position back to the beginning of the input file
        for row in r:
            print(row)
            if row[0] in w:
                val += 1
        vals.append(val)
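An alternative worth considering, if the CSV fits in memory, is to read its rows into a list once and reuse that list for every string instead of rewinding the file each time. The sketch below assumes this is the case. `score_strings` is a hypothetical helper name, `io.StringIO` stands in for the real file so the example is self-contained, and the code sums the number from the second column as the question describes, rather than counting matches.

```python
import csv
import io

def score_strings(strings, csv_file):
    """For each string, sum the numbers of the CSV rows whose first column occurs in it."""
    # Read the CSV once; unlike the reader, a list can be iterated any number of times.
    # skipinitialspace=True drops the space after the comma in rows like "a, 1".
    reader = csv.reader(csv_file, skipinitialspace=True)
    rows = [(row[0], float(row[1])) for row in reader if row]

    totals = []
    for s in strings:
        totals.append(sum(num for key, num in rows if key in s))
    return totals

# io.StringIO stands in for open('file.csv', newline='') (an assumption for this demo).
sample = io.StringIO("a, 1\ngreat, 2\n")
s_list = ['a cat', 'great wall', 'mediocre wall']
result = score_strings(s_list, sample)
print(result)
```

Memory use here grows with the size of the CSV only, not with the number of strings, and the file is read from disk a single time.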