Commit 344419ca, authored by Daren Eiri, committed by Frederic Bastien

Update comments referring to md5 hash

Some comments referred to an md5 hash where the code now uses a sha256 hash. Function names and strings containing 'md5' are now misnomers and will need to be reviewed and renamed at a later date.
Parent 1f7d47f0
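For orientation, here is a minimal sketch of the kind of helper the updated comments describe: a `hash_from_code`-style function built on `hashlib.sha256`. Only the function name and the use of sha256 come from this commit; the body, including the UTF-8 encoding step, is an assumption for illustration and may differ from the real implementation.

```python
import hashlib


def hash_from_code(msg):
    """Return the hex sha256 digest of `msg` (illustrative sketch only)."""
    # Assumption: text input is encoded as UTF-8 before hashing.
    if isinstance(msg, str):
        msg = msg.encode("utf-8")
    return hashlib.sha256(msg).hexdigest()


digest = hash_from_code("floatX=float32")
# A sha256 hex digest is 64 characters long; an md5 hex digest would be 32.
print(len(digest))
```

The digest length is one quick way to tell which algorithm produced a stored value, even when the surrounding names still say 'md5'.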
...@@ -185,9 +185,12 @@ def _config_print(thing, buf, print_doc=True): ...@@ -185,9 +185,12 @@ def _config_print(thing, buf, print_doc=True):
def get_config_md5(): def get_config_md5():
""" """
Return a string md5 of the current config options. It should be such that Return a string sha256 of the current config options. hash_from_code uses
we can safely assume that two different config setups will lead to two sha256, and not md5. Updated in PR#5916. Function names will be properly
different strings. updated in future release.
The string should be such that we can safely assume that two different
config setups will lead to two different strings.
We only take into account config options for which `in_c_key` is True. We only take into account config options for which `in_c_key` is True.
""" """
......
...@@ -1236,10 +1236,13 @@ class CLinker(link.Linker): ...@@ -1236,10 +1236,13 @@ class CLinker(link.Linker):
(opK, input_signatureK, output_signatureK), (opK, input_signatureK, output_signatureK),
}}} }}}
Note that config md5 uses sha256, and not md5. Function names will be
updated in a future release to reflect the use of hashlib.sha256.
The signature is a tuple, some elements of which are sub-tuples. The signature is a tuple, some elements of which are sub-tuples.
The outer tuple has a brief header, containing the compilation options The outer tuple has a brief header, containing the compilation options
passed to the compiler, the libraries to link against, an md5 hash passed to the compiler, the libraries to link against, a sha256 hash
of theano.config (for all config options where "in_c_key" is True). of theano.config (for all config options where "in_c_key" is True).
It is followed by elements for every node in the topological ordering It is followed by elements for every node in the topological ordering
of `self.fgraph`. of `self.fgraph`.
...@@ -1376,6 +1379,10 @@ class CLinker(link.Linker): ...@@ -1376,6 +1379,10 @@ class CLinker(link.Linker):
if c_compiler: if c_compiler:
sig.append('c_compiler_str=' + c_compiler.version_str()) sig.append('c_compiler_str=' + c_compiler.version_str())
# NOTE: config md5 is not using md5 hash, but sha256 instead. Function
# names and string instances of md5 will be updated at a later release.
# See PR#5916 for details.
# IMPORTANT: The 'md5' prefix is used to isolate the compilation # IMPORTANT: The 'md5' prefix is used to isolate the compilation
# parameters from the rest of the key. If you want to add more key # parameters from the rest of the key. If you want to add more key
# elements, they should be before this md5 hash if and only if they # elements, they should be before this md5 hash if and only if they
......
...@@ -377,7 +377,7 @@ def is_same_entry(entry_1, entry_2): ...@@ -377,7 +377,7 @@ def is_same_entry(entry_1, entry_2):
def get_module_hash(src_code, key): def get_module_hash(src_code, key):
""" """
Return an MD5 hash that uniquely identifies a module. Return a SHA256 hash that uniquely identifies a module.
This hash takes into account: This hash takes into account:
1. The C source code of the module (`src_code`). 1. The C source code of the module (`src_code`).
...@@ -416,8 +416,10 @@ def get_module_hash(src_code, key): ...@@ -416,8 +416,10 @@ def get_module_hash(src_code, key):
to_hash += list(key_element) to_hash += list(key_element)
elif isinstance(key_element, string_types): elif isinstance(key_element, string_types):
if key_element.startswith('md5:'): if key_element.startswith('md5:'):
# This is the md5 hash of the config options. We can stop # This is actually a sha256 hash of the config options.
# here. # Ref PR#5916. String and function names will be updated in
# future release.
# We can stop here.
break break
elif (key_element.startswith('NPY_ABI_VERSION=0x') or elif (key_element.startswith('NPY_ABI_VERSION=0x') or
key_element.startswith('c_compiler_str=')): key_element.startswith('c_compiler_str=')):
...@@ -435,17 +437,19 @@ def get_safe_part(key): ...@@ -435,17 +437,19 @@ def get_safe_part(key):
This tuple should only contain objects whose __eq__ and __hash__ methods This tuple should only contain objects whose __eq__ and __hash__ methods
can be trusted (currently: the version part of the key, as well as the can be trusted (currently: the version part of the key, as well as the
md5 hash of the config options). SHA256 hash of the config options).
It is used to reduce the amount of key comparisons one has to go through It is used to reduce the amount of key comparisons one has to go through
in order to find broken keys (i.e. keys with bad implementations of __eq__ in order to find broken keys (i.e. keys with bad implementations of __eq__
or __hash__). or __hash__).
""" """
version = key[0] version = key[0]
# This function should only be called on versioned keys. # This function should only be called on versioned keys.
assert version assert version
# Find the md5 hash part. # Find the hash part, which is using sha256, not md5.
# Instances of md5 will be replaced in future release.
c_link_key = key[1] c_link_key = key[1]
# In case in the future, we don't have an md5 part and we have # In case in the future, we don't have an md5 part and we have
# such stuff in the cache. In that case, we can set None, and the # such stuff in the cache. In that case, we can set None, and the
......
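The `get_module_hash` hunk above stops scanning the key once it reaches the element that starts with `'md5:'`. The following is a simplified sketch of that prefix-based stop, not the real function: `elements_before_config_hash` and the example key are hypothetical, and the special handling of `NPY_ABI_VERSION=0x` and `c_compiler_str=` elements is left out.

```python
def elements_before_config_hash(key_elements):
    """Collect string key elements up to, but not including, the 'md5:' entry."""
    kept = []
    for element in key_elements:
        if element.startswith("md5:"):
            # Despite the prefix, the payload here stands in for a sha256 digest.
            break
        kept.append(element)
    return kept


# Hypothetical key: two compilation parameters, the config hash, then a
# trailing element that the scan never reaches.
example_key = ["-O3", "c_compiler_str=GCC 7.1", "md5:" + "0" * 64, "ignored"]
result = elements_before_config_hash(example_key)
print(result)
```

Because the prefix acts as a delimiter rather than as a statement about the algorithm, renaming it later would presumably have to be coordinated with every place that searches for it.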
...@@ -567,7 +567,7 @@ else: ...@@ -567,7 +567,7 @@ else:
def hash_from_file(file_path): def hash_from_file(file_path):
""" """
Return the MD5 hash of a file. Return the SHA256 hash of a file.
""" """
with open(file_path, 'rb') as f: with open(file_path, 'rb') as f:
......
...@@ -9,7 +9,7 @@ def hash_from_sparse(data): ...@@ -9,7 +9,7 @@ def hash_from_sparse(data):
# We also need to add the dtype to make the distinction between # We also need to add the dtype to make the distinction between
# uint32 and int32 of zeros with the same shape. # uint32 and int32 of zeros with the same shape.
# Python hash is not strong, so I always use md5. To avoid having a too # Python hash is not strong, so use sha256 instead. To avoid having a too
# long hash, I call it again on the concatenation of all parts. # long hash, I call it again on the concatenation of all parts.
return hash_from_code(hash_from_code(data.data) + return hash_from_code(hash_from_code(data.data) +
hash_from_code(data.indices) + hash_from_code(data.indices) +
......
...@@ -19,8 +19,9 @@ def hash_from_ndarray(data): ...@@ -19,8 +19,9 @@ def hash_from_ndarray(data):
# We also need to add the dtype to make the distinction between # We also need to add the dtype to make the distinction between
# uint32 and int32 of zeros with the same shape and strides. # uint32 and int32 of zeros with the same shape and strides.
# python hash are not strong, so I always use md5 in order not to have a # python hash are not strong, so use sha256 (md5 is not
# too long hash, I call it again on the concatenation of all parts. # FIPS compatible). To not have too long of hash, I call it again on
# the concatenation of all parts.
if not data.flags["C_CONTIGUOUS"]: if not data.flags["C_CONTIGUOUS"]:
# hash_from_code needs a C-contiguous array. # hash_from_code needs a C-contiguous array.
data = np.ascontiguousarray(data) data = np.ascontiguousarray(data)
......
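The comments in `hash_from_sparse` and `hash_from_ndarray` describe two ideas: include the dtype so that equal bytes with different types hash differently, and re-hash the concatenated part hashes so the result stays short. Below is a standard-library-only sketch of both ideas. The names `sha256_hex` and `hash_from_buffer` are hypothetical, and plain `bytes` stand in for the array data; the real functions operate on NumPy and sparse arrays.

```python
import hashlib


def sha256_hex(data):
    """Return the hex sha256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def hash_from_buffer(raw, dtype_name, shape):
    """Hash raw bytes together with dtype and shape, then hash the result again."""
    parts = (sha256_hex(raw)
             + sha256_hex(dtype_name.encode("utf-8"))
             + sha256_hex(repr(tuple(shape)).encode("utf-8")))
    # Hashing the concatenation keeps the final digest at a fixed length.
    return sha256_hex(parts.encode("utf-8"))


zeros = bytes(16)  # the raw bytes of four 32-bit zeros
h_uint32 = hash_from_buffer(zeros, "uint32", (4,))
h_int32 = hash_from_buffer(zeros, "int32", (4,))
print(h_uint32 != h_int32)  # True: same bytes, different dtype
print(len(h_uint32))        # 64
```

Without the dtype component the two buffers above would be indistinguishable, which is the case the original comment warns about.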
...@@ -112,7 +112,7 @@ class Record(object): ...@@ -112,7 +112,7 @@ class Record(object):
class RecordMode(Mode): class RecordMode(Mode):
""" """
Records all computations done with a function in a file at output_path. Records all computations done with a function in a file at output_path.
Writes into the file the index of each apply node and md5 digests of the Writes into the file the index of each apply node and sha256 digests of the
numpy ndarrays it receives as inputs and produces as output. numpy ndarrays it receives as inputs and produces as output.
Example: Example:
......