Yes, I think hashing the files is the best approach if you have to compare several files and store the hashes for later comparison. Because hashes can collide, a byte-by-byte comparison may still be needed, depending on the use case.
Generally, a byte-by-byte comparison is sufficient and efficient, and the filecmp module already does that, plus other things.
See http://docs.python.org/library/filecmp.html, e.g.
>>> import filecmp
>>> filecmp.cmp('file1.txt', 'file1.txt')
True
>>> filecmp.cmp('file1.txt', 'file2.txt')
False
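For the case of hashing several files and storing the hashes for later comparison, a minimal sketch could look like the following. The helper names (file_digest, build_manifest, changed_files) and the chunk size are assumptions made for illustration, and SHA-256 is used here instead of the MD5 that appears elsewhere in this answer; both are available in hashlib.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each path to its digest so the result can be stored for later."""
    return {str(path): file_digest(path) for path in paths}


def changed_files(old_manifest, new_manifest):
    """Return the paths present in both manifests whose digests differ."""
    shared = old_manifest.keys() & new_manifest.keys()
    return sorted(p for p in shared if old_manifest[p] != new_manifest[p])
```

A stored manifest can be compared against a fresh one at any time; if a collision matters for the use case, files with equal digests can still be checked byte by byte.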
Speed consideration: usually, if only two files have to be compared, hashing them and comparing the hashes is slower than a simple byte-by-byte comparison, provided the latter is done efficiently. Hashing has to read both inputs in full, while a direct comparison can stop at the first difference. For example, the code below tries to time hash vs. byte-by-byte comparison.
Disclaimer: this is not the best way to time or compare two algorithms, and it needs improvement, but it gives a rough idea. If you think it should be improved, tell me and I will change it.
import random
import string
import hashlib
import time

def getRandText(N):
    # Random printable ASCII text, encoded to bytes because hashlib needs bytes
    return "".join(random.choice(string.printable) for i in range(N)).encode("ascii")

N = 1000000
randText1 = getRandText(N)
randText2 = getRandText(N)

def cmpHash(text1, text2):
    # Hash both inputs in full, then compare the digests
    hash1 = hashlib.md5()
    hash1.update(text1)
    hash1 = hash1.hexdigest()

    hash2 = hashlib.md5()
    hash2.update(text2)
    hash2 = hash2.hexdigest()

    return hash1 == hash2

def cmpByteByByte(text1, text2):
    # Direct comparison of the two byte strings
    return text1 == text2

for cmpFunc in [cmpHash, cmpByteByByte]:
    st = time.time()
    for i in range(10):
        cmpFunc(randText1, randText2)
    print(cmpFunc.__name__, time.time() - st)
and the output is
cmpHash 0.234999895096
cmpByteByByte 0.0
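The timing above compares in-memory strings. For files on disk, a byte-by-byte comparison "done efficiently" might look like the sketch below: it checks the sizes first, then reads fixed-size chunks and stops at the first mismatch. The function name files_equal and the chunk size are assumptions for illustration; filecmp.cmp(f1, f2, shallow=False) from the standard library performs a comparable content check.

```python
import os


def files_equal(path_a, path_b, chunk_size=65536):
    """Return True if both files contain exactly the same bytes."""
    # Files of different sizes can never be equal, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop at the first differing chunk
            if not chunk_a:
                return True  # both files ended without a mismatch
```

Unlike hashing, this never reads more than it has to: two large files that differ in their first chunk are rejected after a single read of each.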
In Python, there are many ways to compare files. In this article, we look at how to compare two different files line by line. Python has several modules for this, and here we discuss approaches that use them.

This article uses two sample files for the implementation.

Files in use:
- file.txt
- file1.txt

Method 1: Using unified_diff()

Python has a module that is used specifically for comparing the differences between files. To get the differences with the difflib library, we call its unified_diff() function.

Syntax:
unified_diff(file1, file2, fromfile, tofile, lineterm)

Parameters:
- file1: list of strings, such as file_1_text
- file2: list of strings, such as file_2_text
- fromfile: name of the first file, with extension
- tofile: name of the second file, with extension
- lineterm: set this argument to "" so that the output is uniformly free of newlines

Approach
- Import the module
- Open the files
- Compare using unified_diff() with the appropriate arguments

Example:

Python3

# Import difflib
import difflib

# Read both files as lists of lines without trailing newlines
with open('file1.txt') as file_1:
    file_1_text = file_1.read().splitlines()

with open('file2.txt') as file_2:
    file_2_text = file_2.read().splitlines()

# Find and print the diff
for line in difflib.unified_diff(
        file_1_text, file_2_text,
        fromfile='file1.txt', tofile='file2.txt', lineterm=''):
    print(line)
Output:

--- file1.txt
+++ file2.txt
@@ -1,5 +1,5 @@
 Learning
 Python
 is
-too
-simple.
+so
+easy.
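To try unified_diff() without creating any files, the sketch below inlines the sample contents as two lists. The contents are inferred from the diff output shown above, which is an assumption about what the sample files hold. In the hunk header, -1,5 means the hunk covers five lines starting at line 1 of the first file, and +1,5 means the same for the second file.

```python
import difflib

# Sample contents inferred from the diff output above (an assumption).
file_1_text = ["Learning", "Python", "is", "too", "simple."]
file_2_text = ["Learning", "Python", "is", "so", "easy."]

# lineterm="" matches input lines that carry no trailing newline.
diff_lines = list(difflib.unified_diff(
    file_1_text, file_2_text,
    fromfile="file1.txt", tofile="file2.txt", lineterm=""))

for line in diff_lines:
    print(line)
```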
Method 2: Using Differ

The difflib library also provides a class named Differ for comparing the differences between files. This class compares sequences of lines of text and produces human-readable differences, or deltas.

Code | Meaning |
'- ' | line unique to sequence 1 |
'+ ' | line unique to sequence 2 |
'  ' | line common to both sequences |
'? ' | line not present in either input sequence |
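The two-character codes listed above can also be used programmatically. The sketch below runs Differ on two in-memory lists and groups the result by code; the sample contents and the variable names are assumptions for illustration only.

```python
from difflib import Differ

# Sample contents assumed for illustration.
lines_1 = ["Learning", "Python", "is", "too", "simple."]
lines_2 = ["Learning", "Python", "is", "so", "easy."]

result = list(Differ().compare(lines_1, lines_2))

# Every result line starts with a two-character code; strip it when grouping.
only_in_1 = [line[2:] for line in result if line.startswith("- ")]
only_in_2 = [line[2:] for line in result if line.startswith("+ ")]
common = [line[2:] for line in result if line.startswith("  ")]

print("only in sequence 1:", only_in_1)
print("only in sequence 2:", only_in_2)
print("common:", common)
```

Lines starting with '? ' are hints that Differ adds for near-identical lines; they belong to neither input, so a filter like this one simply skips them.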
Approach
- Import the module
- Open the files
- Read the contents line by line
- Call the compare function on an object of the Differ class

Example:

Python3

import difflib

# Open both files, read them line by line and compare them
with open('file1.txt') as file_1, open('file2.txt') as file_2:
    differ = difflib.Differ()
    for line in differ.compare(file_1.readlines(), file_2.readlines()):
        # Lines keep their trailing newline, so suppress print's own
        print(line, end='')
Output:

Comparing files
 @ file1.txt
 # file2.txt

Common Lines in Both Files
Learning
Python
is

Difference Lines in Both Files
@- Line-4 too
#+ Line-4 so
@- Line-5 simple.
#+ Line-5 easy.
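The listing above does not use Differ's two-character codes. Its @ and # markers and Line-N labels look like the result of a manual loop that walks both files in step. A minimal sketch that prints a report in this format follows; the function name compare_line_by_line, the default file names and the sample contents are assumptions, not the article's own code.

```python
def compare_line_by_line(lines_1, lines_2, name_1="file1.txt", name_2="file2.txt"):
    """Return report lines listing common and differing lines by position."""
    common, different = [], []
    # zip stops at the shorter input; extra trailing lines are ignored here.
    for number, (line_1, line_2) in enumerate(zip(lines_1, lines_2), start=1):
        if line_1 == line_2:
            common.append(line_1)
        else:
            different.append(f"@- Line-{number} {line_1}")
            different.append(f"#+ Line-{number} {line_2}")
    report = ["Comparing files", f" @ {name_1}", f" # {name_2}"]
    report += ["Common Lines in Both Files"] + common
    report += ["Difference Lines in Both Files"] + different
    return report


# Sample contents assumed from the listing above.
lines_1 = ["Learning", "Python", "is", "too", "simple."]
lines_2 = ["Learning", "Python", "is", "so", "easy."]
report = compare_line_by_line(lines_1, lines_2)
print("\n".join(report))
```

This compares lines strictly by position, so one inserted line makes every later line count as different; difflib handles that case better.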