Commit 218094ba authored by Frédéric Bastien

Merge pull request #3167 from abergeron/fix_amdlibm

Rework amdlibm exclusion on GPU.
...@@ -57,39 +57,63 @@ There are less methods to define for an Op than for a Type: ...@@ -57,39 +57,63 @@ There are less methods to define for an Op than for a Type:
*Default:* The default behavior is to do nothing. *Default:* The default behavior is to do nothing.
.. method:: c_headers() .. method:: c_headers([c_compiler])
Returns a list of headers to include in the file. 'Python.h' is Returns a list of headers to include in the file. 'Python.h' is
included by default so you don't need to specify it. Also all included by default so you don't need to specify it. Also all
of the header required by the Types involved (inputs and of the headers required by the Types involved (inputs and
outputs) will also be included. outputs) will also be included.
.. method:: c_header_dirs() The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_header_dirs([c_compiler])
Returns a list of directories to search for headers (arguments Returns a list of directories to search for headers (arguments
to -I). to -I).
.. method:: c_libraries() The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_libraries([c_compiler])
Returns a list of library names that your op needs to link to. Returns a list of library names that your op needs to link to.
All ops are automatically linked with 'python' and the All ops are automatically linked with 'python' and the
libraries their types require. (arguments to -l) libraries their types require. (arguments to -l)
.. method:: c_lib_dirs() The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_lib_dirs([c_compiler])
Returns a list of directory to search for libraries (arguments Returns a list of directory to search for libraries (arguments
to -L). to -L).
.. method:: c_compile_args() The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_compile_args([c_compiler])
Allows to specify additional arbitrary arguments to the C
compiler. This is not usually required.
Allows to specify additional arbitrary arguments to g++. This The `c_compiler` [#2v]_ parameter is the C compiler that will
is not usually required. be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_no_compile_args() .. method:: c_no_compile_args([c_compiler])
Returns a list of g++ arguments that are forbidden when Returns a list of C compiler arguments that are forbidden when
compiling this Op. compiling this Op.
The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_init_code() .. method:: c_init_code()
Allows you to specify code that will be executed once when the Allows you to specify code that will be executed once when the
...@@ -245,6 +269,18 @@ In a nutshell, ``input_names`` and ``output_names`` parameterize the ...@@ -245,6 +269,18 @@ In a nutshell, ``input_names`` and ``output_names`` parameterize the
names of the inputs your operation needs to use and the outputs it names of the inputs your operation needs to use and the outputs it
needs to put variables into. But this will be clear with the examples. needs to put variables into. But this will be clear with the examples.
.. rubric:: Footnotes
.. [#2v] There are actually two versions of this method one with a
`c_compiler` parameter and one without. The calling code will
try the version with c_compiler and try the version without
if it does not work. Defining both versions is pointless
since the one without `c_compiler` will never get called.
Note that these methods are not specific to a single apply
node so they may get called more than once on the same object
with different values for c_compiler.
Defining the methods Defining the methods
==================== ====================
......
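Reviewer note: the footnote above describes a "try with `c_compiler`, then retry without" dispatch. The following is a minimal, self-contained sketch of that pattern, not the actual CLinker code. `call_with_optional_compiler`, `collect_headers`, `FakeCompiler` and the three example classes are hypothetical names introduced for illustration, and `MethodNotDefined` is a local stand-in for the exception the linker catches.

```python
class MethodNotDefined(Exception):
    """Local stand-in for the linker's 'method not implemented' exception."""


def call_with_optional_compiler(obj, method_name, c_compiler):
    """Try obj.<method_name>(c_compiler); fall back to obj.<method_name>()."""
    method = getattr(obj, method_name)
    try:
        return method(c_compiler)
    except TypeError:
        # Old-style signature that takes no c_compiler argument.
        # Caveat: a TypeError raised *inside* a new-style method body
        # is also caught here and triggers the fallback call.
        return method()


class FakeCompiler(object):
    """Hypothetical compiler class, passed around as a class object."""


class OldStyleType(object):
    def c_headers(self):
        return ['<math.h>']


class NewStyleType(object):
    def c_headers(self, c_compiler):
        return ['<%s.h>' % c_compiler.__name__]


class NoHeadersOp(object):
    def c_headers(self, c_compiler):
        raise MethodNotDefined()


def collect_headers(objs, c_compiler):
    """Gather headers from every object, skipping those without any."""
    ret = []
    for x in objs:
        try:
            ret += call_with_optional_compiler(x, 'c_headers', c_compiler)
        except MethodNotDefined:
            pass
    return ret


headers = collect_headers(
    [OldStyleType(), NewStyleType(), NoHeadersOp()], FakeCompiler)
```

Because the fallback is keyed on `TypeError`, an object that defines only the version with `c_compiler` never has the other version called, which matches the footnote's remark that defining both is pointless.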
...@@ -78,18 +78,32 @@ the most important ones: ...@@ -78,18 +78,32 @@ the most important ones:
When we are done using the data, clean up whatever we allocated and When we are done using the data, clean up whatever we allocated and
decrease the appropriate reference counts. decrease the appropriate reference counts.
.. method:: c_headers() .. method:: c_headers([c_compiler])
c_libraries() c_libraries([c_compiler])
c_header_dirs() c_header_dirs([c_compiler])
c_lib_dirs() c_lib_dirs([c_compiler])
Allows you to specify headers, libraries and associated directories. Allows you to specify headers, libraries and associated directories.
.. method:: c_compile_args() These methods have two versions, one with a `c_compiler`
c_no_compile_args() argument and one without. The version with c_compiler is tried
first and if it doesn't work, the one without is.
The `c_compiler` argument is the C compiler that will be used
to compile the C code for the node that uses this type.
.. method:: c_compile_args([c_compiler])
c_no_compile_args([c_compiler])
Allows to specify special compiler arguments to add/exclude. Allows to specify special compiler arguments to add/exclude.
These methods have two versions, one with a `c_compiler`
argument and one without. The version with c_compiler is tried
first and if it doesn't work, the one without is.
The `c_compiler` argument is the C compiler that will be used
to compile the C code for the node that uses this type.
.. method:: c_init_code() .. method:: c_init_code()
Allows you to specify code that will be executed once when the Allows you to specify code that will be executed once when the
......
...@@ -930,15 +930,19 @@ class CLinker(link.Linker): ...@@ -930,15 +930,19 @@ class CLinker(link.Linker):
"-Wno-unused-variable", # idem as the precedent "-Wno-unused-variable", # idem as the precedent
"-Wno-write-strings", # generated by our code generator... "-Wno-write-strings", # generated by our code generator...
] ]
c_compiler = self.c_compiler()
for x in [y.type for y in self.variables] + [ for x in [y.type for y in self.variables] + [
y.op for y in self.node_order]: y.op for y in self.node_order]:
try: try:
try:
ret += x.c_compile_args(c_compiler)
except TypeError:
ret += x.c_compile_args() ret += x.c_compile_args()
except utils.MethodNotDefined: except utils.MethodNotDefined:
pass pass
c_compiler = self.c_compiler()
ret = utils.uniq(ret) # to remove duplicate ret = utils.uniq(ret) # to remove duplicate
# The args set by the compiler include the user flags. We do not want # The args set by the compiler include the user flags. We do not want
# to reorder them # to reorder them
...@@ -946,7 +950,11 @@ class CLinker(link.Linker): ...@@ -946,7 +950,11 @@ class CLinker(link.Linker):
for x in [y.type for y in self.variables] + [ for x in [y.type for y in self.variables] + [
y.op for y in self.node_order]: y.op for y in self.node_order]:
try: try:
for i in x.c_no_compile_args(): try:
no_comp = x.c_no_compile_args(c_compiler)
except TypeError:
no_comp = x.c_no_compile_args()
for i in no_comp:
try: try:
ret.remove(i) ret.remove(i)
except ValueError: except ValueError:
...@@ -966,9 +974,13 @@ class CLinker(link.Linker): ...@@ -966,9 +974,13 @@ class CLinker(link.Linker):
""" """
ret = [] ret = []
c_compiler = self.c_compiler()
for x in [y.type for y in self.variables] + [ for x in [y.type for y in self.variables] + [
y.op for y in self.node_order]: y.op for y in self.node_order]:
try: try:
try:
ret += x.c_headers(c_compiler)
except TypeError:
ret += x.c_headers() ret += x.c_headers()
except utils.MethodNotDefined: except utils.MethodNotDefined:
pass pass
...@@ -1023,9 +1035,13 @@ class CLinker(link.Linker): ...@@ -1023,9 +1035,13 @@ class CLinker(link.Linker):
""" """
ret = [] ret = []
c_compiler = self.c_compiler()
for x in [y.type for y in self.variables] + [ for x in [y.type for y in self.variables] + [
y.op for y in self.node_order]: y.op for y in self.node_order]:
try: try:
try:
ret += x.c_header_dirs(c_compiler)
except TypeError:
ret += x.c_header_dirs() ret += x.c_header_dirs()
except utils.MethodNotDefined: except utils.MethodNotDefined:
pass pass
...@@ -1042,9 +1058,13 @@ class CLinker(link.Linker): ...@@ -1042,9 +1058,13 @@ class CLinker(link.Linker):
""" """
ret = [] ret = []
c_compiler = self.c_compiler()
for x in [y.type for y in self.variables] + [ for x in [y.type for y in self.variables] + [
y.op for y in self.node_order]: y.op for y in self.node_order]:
try: try:
try:
ret += x.c_libraries(c_compiler)
except TypeError:
ret += x.c_libraries() ret += x.c_libraries()
except utils.MethodNotDefined: except utils.MethodNotDefined:
pass pass
...@@ -1061,9 +1081,13 @@ class CLinker(link.Linker): ...@@ -1061,9 +1081,13 @@ class CLinker(link.Linker):
""" """
ret = [] ret = []
c_compiler = self.c_compiler()
for x in [y.type for y in self.variables] + [ for x in [y.type for y in self.variables] + [
y.op for y in self.node_order]: y.op for y in self.node_order]:
try: try:
try:
ret += x.c_lib_dirs(c_compiler)
except TypeError:
ret += x.c_lib_dirs() ret += x.c_lib_dirs()
except utils.MethodNotDefined: except utils.MethodNotDefined:
pass pass
...@@ -1431,20 +1455,6 @@ class CLinker(link.Linker): ...@@ -1431,20 +1455,6 @@ class CLinker(link.Linker):
c_compiler = self.c_compiler() c_compiler = self.c_compiler()
libs = self.libraries() libs = self.libraries()
preargs = self.compile_args() preargs = self.compile_args()
compiler_name = c_compiler.__name__
if compiler_name == 'NVCC_compiler' and config.lib.amdlibm:
# This lib does not work correctly with nvcc in device code.
# and newer version of g++ as 4.5.1.
# example of errors: "/usr/lib/gcc/x86_64-redhat-linux/4.5.1/
# include/mmintrin.h(49): error: identifier
# "__builtin_ia32_emms" is undefined"
if '<amdlibm.h>' in mod.includes:
mod.includes.remove('<amdlibm.h>')
if '-DREPLACE_WITH_AMDLIBM' in preargs:
preargs.remove('-DREPLACE_WITH_AMDLIBM')
if 'amdlibm' in libs:
libs.remove('amdlibm')
# We want to compute the code without the lock # We want to compute the code without the lock
src_code = mod.code() src_code = mod.code()
get_lock() get_lock()
......
...@@ -253,7 +253,10 @@ static struct PyModuleDef moduledef = {{ ...@@ -253,7 +253,10 @@ static struct PyModuleDef moduledef = {{
self.print_init(sio) self.print_init(sio)
rval = sio.getvalue() rval = sio.getvalue()
self.code_hash = hash_from_code(rval) # Make sure the hash of the code hasn't changed
h = hash_from_code(rval)
assert self.code_hash is None or self.code_hash == h
self.code_hash = h
rval = re.sub(self.hash_placeholder, self.code_hash, rval) rval = re.sub(self.hash_placeholder, self.code_hash, rval)
# Finalize the Module, so no support code or function # Finalize the Module, so no support code or function
# can be added # can be added
...@@ -1767,6 +1770,8 @@ class GCC_compiler(Compiler): ...@@ -1767,6 +1770,8 @@ class GCC_compiler(Compiler):
# The equivalent flags of --march=native used by g++. # The equivalent flags of --march=native used by g++.
march_flags = None march_flags = None
supports_amdlibm = True
@staticmethod @staticmethod
def version_str(): def version_str():
return theano.config.cxx + " " + gcc_version_str return theano.config.cxx + " " + gcc_version_str
......
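Reviewer note: the new assertion in the module code generator checks that regenerating the source yields the same hash as before. A minimal sketch of that guard follows. `ModuleSketch` is a hypothetical class, and using `hashlib.md5` in place of `hash_from_code` is an assumption made only to keep the example self-contained.

```python
import hashlib


class ModuleSketch(object):
    """Hypothetical stand-in for the module object that generates C code."""

    def __init__(self, body):
        self.body = body
        self.code_hash = None

    def code(self):
        rval = self.body
        # Assumption: md5 stands in for hash_from_code().
        h = hashlib.md5(rval.encode('utf-8')).hexdigest()
        # Make sure the hash of the code hasn't changed between calls.
        assert self.code_hash is None or self.code_hash == h
        self.code_hash = h
        return rval


mod = ModuleSketch('int f(void) { return 0; }')
first = mod.code()
second = mod.code()  # same source, so the assertion holds

mod.body = 'int f(void) { return 1; }'
try:
    mod.code()
    changed_detected = False
except AssertionError:
    changed_detected = True
```

With this guard, any edit to the module's contents after the first call to `code()` fails loudly instead of silently producing source that no longer matches the recorded hash.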
...@@ -131,6 +131,8 @@ def add_standard_rpath(rpath): ...@@ -131,6 +131,8 @@ def add_standard_rpath(rpath):
class NVCC_compiler(Compiler): class NVCC_compiler(Compiler):
supports_amdlibm = False
@staticmethod @staticmethod
def try_compile_tmp(src_code, tmp_prefix='', flags=(), def try_compile_tmp(src_code, tmp_prefix='', flags=(),
try_run=False, output=False): try_run=False, output=False):
......
...@@ -201,24 +201,24 @@ class Scalar(Type): ...@@ -201,24 +201,24 @@ class Scalar(Type):
def values_eq_approx(self, a, b, tolerance=1e-4): def values_eq_approx(self, a, b, tolerance=1e-4):
return abs(a - b) <= ((abs(a) + abs(b)) * tolerance) return abs(a - b) <= ((abs(a) + abs(b)) * tolerance)
def c_headers(self): def c_headers(self, c_compiler):
l = ['<math.h>'] l = ['<math.h>']
# These includes are needed by Scalar and TensorType, # These includes are needed by Scalar and TensorType,
# we declare them here and they will be re-used by TensorType # we declare them here and they will be re-used by TensorType
l.append('<numpy/arrayobject.h>') l.append('<numpy/arrayobject.h>')
l.append('<numpy/arrayscalars.h>') l.append('<numpy/arrayscalars.h>')
if config.lib.amdlibm: if config.lib.amdlibm and c_compiler.supports_amdlibm:
l += ['<amdlibm.h>'] l += ['<amdlibm.h>']
return l return l
def c_libraries(self): def c_libraries(self, c_compiler):
l = [] l = []
if config.lib.amdlibm: if config.lib.amdlibm and c_compiler.supports_amdlibm:
l += ['amdlibm'] l += ['amdlibm']
return l return l
def c_compile_args(self): def c_compile_args(self, c_compiler):
if config.lib.amdlibm: if config.lib.amdlibm and c_compiler.supports_amdlibm:
return ['-DREPLACE_WITH_AMDLIBM'] return ['-DREPLACE_WITH_AMDLIBM']
else: else:
return [] return []
......
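Reviewer note: the Scalar changes replace the linker-side special case for `NVCC_compiler` with a capability flag read from the compiler class. A minimal sketch of that idea follows. `GCCLike`, `NVCCLike` and the `amdlibm_enabled` argument are hypothetical stand-ins for `GCC_compiler`, `NVCC_compiler` and `config.lib.amdlibm`.

```python
class GCCLike(object):
    """Hypothetical host compiler that can link against amdlibm."""
    supports_amdlibm = True


class NVCCLike(object):
    """Hypothetical GPU compiler where amdlibm must be left out."""
    supports_amdlibm = False


def scalar_c_libraries(c_compiler, amdlibm_enabled):
    """Return the extra libraries to link for the given compiler."""
    l = []
    if amdlibm_enabled and c_compiler.supports_amdlibm:
        l += ['amdlibm']
    return l


def scalar_c_compile_args(c_compiler, amdlibm_enabled):
    """Return the extra compile flags for the given compiler."""
    if amdlibm_enabled and c_compiler.supports_amdlibm:
        return ['-DREPLACE_WITH_AMDLIBM']
    return []


cpu_libs = scalar_c_libraries(GCCLike, amdlibm_enabled=True)
gpu_libs = scalar_c_libraries(NVCCLike, amdlibm_enabled=True)
gpu_args = scalar_c_compile_args(NVCCLike, amdlibm_enabled=True)
```

The decision now lives next to the code that asks for amdlibm, so the linker no longer has to strip the header, flag and library from lists that were already built.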
...@@ -596,18 +596,18 @@ class TensorType(Type): ...@@ -596,18 +596,18 @@ class TensorType(Type):
} }
""" % locals() """ % locals()
def c_headers(self): def c_headers(self, c_compiler):
""" """
Override `CLinkerObject.c_headers`. Override `CLinkerObject.c_headers`.
""" """
return scal.get_scalar_type(self.dtype).c_headers() return scal.get_scalar_type(self.dtype).c_headers(c_compiler)
def c_libraries(self): def c_libraries(self, c_compiler):
return scal.get_scalar_type(self.dtype).c_libraries() return scal.get_scalar_type(self.dtype).c_libraries(c_compiler)
def c_compile_args(self): def c_compile_args(self, c_compiler):
return scal.get_scalar_type(self.dtype).c_compile_args() return scal.get_scalar_type(self.dtype).c_compile_args(c_compiler)
def c_support_code(self): def c_support_code(self):
""" """
......