Compare two text files to find differences and output them to a new text file

Question

I am working on a simple tool for comparing data in text files. The goal is for the user to select a file and search it for certain parameters, which are then written to a new text document. Those parameters are then compared with a text document that holds the default parameters, and the differences are written to another new text document.

I've created a simple flowchart to summarize this: [flowchart image not available]

This is my current code. I am using difflib to compare the two files.

import difflib
import re
from Tkinter import *
import tkSimpleDialog
import tkMessageBox
from tkFileDialog import askopenfilename

root = Tk()
w = Label(root, text ="Configuration Inspector")
w.pack()
tkMessageBox.showinfo("Welcome", "This is version 1.00 of Configuration Inspector")
filename = askopenfilename() # Logs File
filename2 = askopenfilename() # Default Configuration
compareFile = askopenfilename() # Comparison File
outputfilename = askopenfilename() # Out Serial Number Configuration from Logs

with open(filename, "rb") as f_input:
    start_token = tkSimpleDialog.askstring("Serial Number", "What is the serial number?")
    end_token = tkSimpleDialog.askstring("End Keyword", "What is the end keyword")
    reText = re.search("%s(.*?)%s" % (re.escape(start_token + ",SHOWALL"), re.escape(end_token)), f_input.read(), re.S)
    if reText:
        output = reText.group(1)
        fo = open(outputfilename, "wb")
        fo.write(output)
        fo.close()

        diff = difflib.ndiff(outputfilename, compareFile)
        print '\n'.join(list(diff))

    else:
        tkMessageBox.showinfo("Output", "Sorry that input was not found in the file")
        print "not found"

So far, the program correctly searches the file you select and writes the parameters it finds to a new output text file.

The issue arises when trying to compare the two files, the default data and the output file.

The program does output differences when comparing. However, because the default data file and the output file do not contain the same lines, it only prints the lines that have no counterpart rather than the parameters whose values do not match. In other words, say I have these two files:

Default data text file:

Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6

Output data text file:

Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7

Since Data3 and Data4 do not match, the difference.txt file (the comparison output) should show that. For example:

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6

However, it does not match or compare the lines; it just checks whether there is a line at that position. So currently my comparison output looks like this:

Data5 = 5
Data6 = 6

Any ideas on how I can make the comparison show every difference between the two files' parameters?

If you need any more details, please let me know in the comments and I will edit the original post to add them.

Recommended answer

I don't know what you're trying to do with difflib.ndiff(). That function takes two lists of strings, but you are passing it filenames.
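For reference, here is a minimal sketch of calling difflib.ndiff() on two lists of lines instead of filenames. It is written in Python 3 syntax and uses the sample data from the question; the variable names are made up for illustration.

```python
import difflib

# Sample data from the question, as lists of lines (not filenames).
default_lines = ["Data1 = 1", "Data2 = 2", "Data3 = 3",
                 "Data4 = 4", "Data5 = 5", "Data6 = 6"]
output_lines = ["Data1 = 1", "Data2 = 2", "Data3 = 8", "Data4 = 7"]

# ndiff yields lines prefixed with "  " (common to both), "- " (only in
# the first sequence), "+ " (only in the second) or "? " (intraline hints).
delta = list(difflib.ndiff(default_lines, output_lines))
print("\n".join(delta))

# Keep only the lines that differ, dropping the "? " hint lines.
changed = [d for d in delta if d.startswith(("- ", "+ "))]
```

Note that this still reports differences line by line rather than matching parameters by name, which is why the demo in this answer builds a dict instead.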

Anyway, here's a short demo that performs the comparison you want. It uses a dict to speed up the comparison. Obviously, I don't have your data files, so this program creates the lists of strings using the string .splitlines() method.

It goes through the default data list line by line.
If that data key is not present in the output dict, then the default line is printed.
If a data key with that value is present in the output dict, then that line is skipped.
If the key is found but the value in the output dict differs from the default value, then a line with the key and the output value is printed.

#Build default data list
defdata = '''
Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6
'''.splitlines()[1:]

#Build output data list
outdata = '''
Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7
'''.splitlines()[1:]

outdict = dict(line.split(' = ') for line in outdata)

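# Report each default line that is missing from, or changed in, the output data
for line in defdata:
    key, val = line.split(' = ')
    if key in outdict:
        outval = outdict[key]
        if outval != val:
            # Key is in both, but the value differs: show the output value
            print('%s = %s' % (key, outval))
    else:
        # Key is missing from the output data: show the default line
        print(line)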

Output

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6


Here's how to read a text file into a list of lines.

with open(filename) as f:
    data = f.read().splitlines()

There's also a .readlines() method, but it's not so useful here because it preserves the newline character at the end of each line, and we don't want that.
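To make the difference concrete, here is a small sketch in Python 3 syntax. It writes a throwaway temporary file (an assumption made only so the example is self-contained) and reads it back both ways.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Data1 = 1\nData2 = 2\n")
    path = tmp.name

with open(path) as f:
    with_newlines = f.readlines()             # each item keeps its '\n'

with open(path) as f:
    without_newlines = f.read().splitlines()  # line endings are removed

os.remove(path)
print(with_newlines)
print(without_newlines)
```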

Note that if there are any blank lines in the text file, the resulting list will have an empty string '' in that position. Also, that code won't remove any leading or trailing blanks or other whitespace on each line. If you need to do that, there are thousands of examples here on Stack Overflow that show you how.
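As one possible way to handle both points, the sketch below (Python 3 syntax) strips the whitespace around each line and drops blank lines. The helper name clean_lines is hypothetical and not part of the code in this answer.

```python
def clean_lines(text):
    """Return the non-blank lines of text with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Sample text with padding, a tab, an empty line and a whitespace-only line.
sample = "  Data1 = 1  \n\nData2 = 2\t\n   \nData3 = 8\n"
print(clean_lines(sample))
```

Whether blank lines should be dropped or kept depends on the data; dropping them is only an assumption that suits key and value files like the ones in the question.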

This new version of the comparison uses a slightly different approach. It loops over a sorted list of all the keys found in either the default list or the output list.
If a key is only found in one of the lists, the corresponding line is added to the diff list.
If a key is found in both lists but the output line differs from the default line, then the corresponding line from the output list is added to the diff list. If both lines are identical, nothing is added to the diff list.

#Build default data list
defdata = '''
Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6
'''.splitlines()[1:]

#Build output data list
outdata = '''
Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7
Data8 = 8
'''.splitlines()[1:]

def make_dict(data):
    return dict((line.split(None, 1)[0], line) for line in data)

defdict = make_dict(defdata)
outdict = make_dict(outdata)

#Create a sorted list containing all the keys
allkeys = sorted(set(defdict) | set(outdict))
#print allkeys

difflines = []
for key in allkeys:
    indef = key in defdict
    inout = key in outdict
    if indef and not inout:
        difflines.append(defdict[key])
    elif inout and not indef:
        difflines.append(outdict[key])
    else:
        #key must be in both dicts
        defval = defdict[key]
        outval = outdict[key]
        if outval != defval:
            difflines.append(outval)

for line in difflines:
    print(line)

Output

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6
Data8 = 8
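The question asks for the differences to end up in a new text file, while the demos above only print them. The following is a rough sketch, in Python 3 syntax, of how the same key-based comparison might be applied to real files. The function names read_params and write_differences, the file names, and the temporary directory are all assumptions made for illustration.

```python
import os
import shutil
import tempfile

def read_params(path):
    """Map each parameter name to its full line, skipping blank lines."""
    params = {}
    with open(path) as f:
        for line in f.read().splitlines():
            line = line.strip()
            if line:
                params[line.split(None, 1)[0]] = line
    return params

def write_differences(default_path, output_path, diff_path):
    """Write the lines that differ between the two files to diff_path."""
    defdict = read_params(default_path)
    outdict = read_params(output_path)
    difflines = []
    for key in sorted(set(defdict) | set(outdict)):
        if key not in outdict:
            # Only in the default file: keep the default line.
            difflines.append(defdict[key])
        elif outdict[key] != defdict.get(key):
            # Only in the output file, or present in both but different.
            difflines.append(outdict[key])
    with open(diff_path, "w") as f:
        f.writelines(line + "\n" for line in difflines)
    return difflines

# Demonstration with throwaway files holding the sample data.
workdir = tempfile.mkdtemp()
default_path = os.path.join(workdir, "default.txt")
output_path = os.path.join(workdir, "output.txt")
diff_path = os.path.join(workdir, "difference.txt")
with open(default_path, "w") as f:
    f.write("Data1 = 1\nData2 = 2\nData3 = 3\nData4 = 4\nData5 = 5\nData6 = 6\n")
with open(output_path, "w") as f:
    f.write("Data1 = 1\nData2 = 2\nData3 = 8\nData4 = 7\nData8 = 8\n")

result = write_differences(default_path, output_path, diff_path)
with open(diff_path) as f:
    written = f.read()
shutil.rmtree(workdir)
print(written)
```

Sorting the keys as plain strings is enough for this sample; names such as Data10 would sort before Data2, so a different sort key may be needed for larger files.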
