Commit f359b692 authored by Frédéric Bastien

Make Theano GPU compilation go further on Windows.

Parent 69f1c9a3
@@ -292,12 +292,17 @@ Windows V1(bigger install, but simpler instruction + try instruction for gpu)
 [blas]
 ldflags =
+Spaces and non-ASCII characters are not always supported. If that is your case,
+set the environment variable 'THEANO_FLAGS' to the value 'blas.ldflags='.
 This is enough to run Theano! It will use NumPy for dot products
 which, however, is pretty fast (see below).
+To test that Theano correctly read the .theanorc or .theanorc.txt file,
+run in Python:
+.. code-block:: python
+import theano
+print theano.config.blas.ldflags
+That should print the same content as what is in your config file.
 - (Optional) If you want a faster and/or multithreaded BLAS library, you can
 compile GotoBLAS2. I did not try to compile ATLAS because I read that
 it is slower than Goto and very difficult to compile (especially for
@@ -342,22 +347,49 @@ Windows V1(bigger install, but simpler instruction + try instruction for gpu)
 Those are indications for the 32-bit version of Python; the one that comes with pythonxy is 32-bit.
-Install cuda driver(32 bits on 32 bits Windows, idem for 64 bits).
-Install cuda toolkit 32 bits(even if you computer is 64 bits, must match the python installation version)
-Install cuda sdk 32 bits
-Test some pre-compiled example of the sdk
-Download Visual Studio 2008 Express(free, VS2010 not supported by nvcc 3.1, VS2005, not available for download, but supported by nvcc, the non free version should work too)
-Follow the instruction in the GettingStartedWindows.pdf file from cuda web site to compile cuda code with VS2008. If that don't work, you won't be able to compile gpu code with theano.
+Spaces and non-ASCII characters in paths are not always supported. Python supports
+them, so your configuration file path can contain them, but
+nvcc (at least version 3.1) does not support them well. If your USERPROFILE
+directory contains such characters, you must add to your configuration file:
+.. code-block:: cfg
+[global]
+base_compiledir=PATH_TO_A_DIRECTORY_WITHOUT_THOSE_CHARACTERS
+You also need to add these lines to the configuration file:
+.. code-block:: cfg
+[cuda]
+nvccflags=-LC:\Python26\libs
+Then:
+1) Install the CUDA driver (32-bit on 32-bit Windows, 64-bit on 64-bit Windows).
+2) Install the 32-bit CUDA toolkit (even if your computer is 64-bit, it
+must match the Python installation).
+3) Install the 32-bit CUDA SDK.
+4) Test some of the SDK's pre-compiled examples.
+5) Download Visual Studio 2008 Express (free). VS2010 is not supported by nvcc 3.1;
+VS2005 is supported by nvcc but no longer available for download; the non-free versions should work too.
+6) Follow the instructions in the GettingStartedWindows.pdf file from the CUDA web
+site to compile CUDA code with VS2008. If that does not work, you will not be
+able to compile GPU code with Theano.
-Put into you PATH environment variable the directory where cl.exe is. In my case it is: C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin
+7) Add the directory containing cl.exe to your PATH environment variable.
+In my case it is: C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin
-Make sure the theano folder is in your PYTHONPATH environment variable.
+8) Make sure the Theano folder is in your PYTHONPATH environment variable.
-Then in theano do: import theano.sandbox.cuda
+9) Then, in Python, run: import theano.sandbox.cuda
 That will print an error if there is a problem compiling the first CUDA file.
-Then run the theano cuda test file. In Windows command line (cmd.exe), run the program nosetests inside the theano repository. nosetests is installed by pythonxy.
+10) Then run the Theano CUDA tests: in the Windows command line (cmd.exe),
+run the program nosetests inside the Theano repository.
+nosetests is installed by pythonxy.
 Windows V2(smaller install, but longer instruction)
 ---------------------------------------------------
......
@@ -49,7 +49,7 @@ def set_cuda_disabled():
             'working properly')
 #cuda_ndarray compile and import
-cuda_path = os.path.split(__file__)[0]
+cuda_path = os.path.abspath(os.path.split(__file__)[0])
 date = os.stat(os.path.join(cuda_path,'cuda_ndarray.cu'))[stat.ST_MTIME]
 date = max(date,os.stat(os.path.join(cuda_path,'cuda_ndarray.cuh'))[stat.ST_MTIME])
 date = max(date,os.stat(os.path.join(cuda_path,'conv_full_kernel.cu'))[stat.ST_MTIME])
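The commit message does not say why `cuda_path` is now made absolute. One plausible reason: this same commit makes `nvcc_module_compile_str` call `os.chdir(location)` before running nvcc. A directory taken relative to `__file__` would then resolve against the wrong working directory.

The sketch below uses two hypothetical helpers:
- `module_dir` captures the directory once, as an absolute path.
- `newest_mtime` reproduces the `date = max(...)` chain using the same `os.stat(...)[stat.ST_MTIME]` pattern.

```python
import os
import stat


def module_dir(file_path):
    """Absolute directory of `file_path`, resolved against the current cwd.

    A relative result from os.path.split(file_path)[0] would point somewhere
    else after a later os.chdir(); an absolute one keeps working.
    """
    return os.path.abspath(os.path.split(file_path)[0])


def newest_mtime(directory, filenames):
    """Latest modification time (whole seconds) among `filenames`."""
    return max(os.stat(os.path.join(directory, name))[stat.ST_MTIME]
               for name in filenames)
```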
......
@@ -13,6 +13,9 @@ AddConfigVar('nvcc.compiler_bindir',
         "if defined, nvcc compiler driver will seek g++ and gcc in this directory",
         StrParam(""))
+AddConfigVar('cuda.nvccflags',
+        "Extra compiler flags for nvcc",
+        StrParam(""))
 def error(*args):
     #sys.stderr.write('ERROR:'+ ' '.join(str(a) for a in args)+'\n')
@@ -62,7 +65,18 @@ def nvcc_module_compile_str(module_name, src_code, location=None, include_dirs=[
     :param preargs: a list of extra compiler arguments
     :returns: dynamically-imported python module of the compiled code.
+    :note 1: On Windows 7 with nvcc 3.1 we need to compile in the real directory
+             Otherwise nvcc never finish.
     """
+    if sys.platform=="win32":
+        #remove some compilation args that cl.exe don't understand
+        #cl.exe is the compiler used by nvcc on Windows
+        for a in ["-Wno-write-strings","-Wno-unused-label",
+                  "-Wno-unused-variable", "-fno-math-errno"]:
+            if a in preargs:
+                preargs.remove(a)
     if preargs is None:
         preargs= []
     else: preargs = list(preargs)
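In this hunk, the Windows branch runs `a in preargs` and `preargs.remove(a)` before the `if preargs is None` check. As a result:
- A call with `preargs=None` on win32 would raise `TypeError`.
- A list passed in by the caller is modified in place.

This is a minimal sketch of the same filtering, done after normalization. `filter_preargs` and `CL_UNSUPPORTED` are hypothetical names; the flag list is copied from the diff.

```python
import sys

# GCC-style flags that cl.exe (nvcc's host compiler on Windows) does not
# understand, as listed in the diff.
CL_UNSUPPORTED = ["-Wno-write-strings", "-Wno-unused-label",
                  "-Wno-unused-variable", "-fno-math-errno"]


def filter_preargs(preargs, platform=sys.platform):
    """Return a new argument list that is safe to hand to nvcc.

    None is normalized to [] first, and the caller's list is never mutated.
    """
    args = [] if preargs is None else list(preargs)
    if platform == "win32":
        args = [a for a in args if a not in CL_UNSUPPORTED]
    return args
```

The comprehension drops every occurrence of an unsupported flag, whereas `list.remove` only drops the first one.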
@@ -70,8 +84,17 @@ def nvcc_module_compile_str(module_name, src_code, location=None, include_dirs=[
         preargs.append('-fPIC')
     no_opt = False
     cuda_root = config.cuda.root
-    include_dirs = std_include_dirs() + include_dirs + [os.path.split(__file__)[0]]
-    libs = std_libs() + ['cudart'] + libs
+    #The include dirs gived by the user should have precedence over
+    #the standards ones.
+    include_dirs = include_dirs + std_include_dirs()
+    if os.path.abspath(os.path.split(__file__)[0]) not in include_dirs:
+        include_dirs.append(os.path.abspath(os.path.split(__file__)[0]))
+    libs = std_libs() + libs
+    if 'cudart' not in libs:
+        libs.append('cudart')
     lib_dirs = std_lib_dirs() + lib_dirs
     if cuda_root:
         lib_dirs.append(os.path.join(cuda_root, 'lib'))
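The reordering in this hunk relies on the usual left-to-right search order of `-I` directories. When two directories contain a header with the same name, the one listed first on the command line wins. This is a minimal sketch of the resulting ordering rules. `order_include_dirs` and `with_cudart` are hypothetical helpers that stand in for the inline code.

```python
def order_include_dirs(user_dirs, std_dirs, module_dir):
    """User dirs first, then standard ones; module dir appended if missing."""
    dirs = list(user_dirs) + list(std_dirs)
    if module_dir not in dirs:
        dirs.append(module_dir)
    return dirs


def with_cudart(std_libs, user_libs):
    """Standard libs, then user libs, with 'cudart' added once if absent."""
    libs = list(std_libs) + list(user_libs)
    if "cudart" not in libs:
        libs.append("cudart")
    return libs
```

Like the diff, both helpers build new lists with `+`, so the later `append` never touches a list the caller passed in, including a shared mutable default such as `include_dirs=[]`.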
@@ -133,11 +156,13 @@ def nvcc_module_compile_str(module_name, src_code, location=None, include_dirs=[
     if sys.platform != 'darwin':
         # the 64bit CUDA libs are in the same files as are named by the function above
         cmd.extend(['-Xlinker',','.join(['-rpath',os.path.join(config.cuda.root,'lib64')])])
+    nvccflags = [flag for flag in config.cuda.nvccflags.split(' ') if flag]
+    cmd.extend(nvccflags)
     cmd.extend('-I%s'%idir for idir in include_dirs)
     cmd.extend(['-o',lib_filename])
-    cmd.append(cppfilename)
+    cmd.append(os.path.split(cppfilename)[-1])
     if module_name != 'cuda_ndarray':
-        cmd.append(os.path.join(os.path.split(cppfilename)[0],'..','cuda_ndarray','cuda_ndarray.so'))
+        cmd.append(os.path.join(os.path.split(cppfilename)[0],'..','cuda_ndarray','cuda_ndarray.'+get_lib_extension()))
     cmd.extend(['-L%s'%ldir for ldir in lib_dirs])
     cmd.extend(['-l%s'%l for l in libs])
     if sys.platform == 'darwin':
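The new `cuda.nvccflags` value is split on single spaces with empty items dropped. A Windows flag such as `-LC:\Python26\libs` therefore survives intact. By contrast, `shlex.split` in its default POSIX mode would treat those backslashes as escape characters.

The hard-coded `cuda_ndarray.so` is replaced because Python extension modules on Windows use the `.pyd` suffix. `lib_extension` below is a hypothetical stand-in for Theano's `get_lib_extension()`; it assumes that function returns `'pyd'` on win32 and `'so'` elsewhere.

```python
import sys


def split_nvcc_flags(value):
    """Split a cuda.nvccflags-style string on spaces, dropping empty items.

    Same rule as the diff: no quoting, so one flag cannot contain a space,
    and backslashes in Windows paths are kept as-is.
    """
    return [flag for flag in value.split(" ") if flag]


def lib_extension(platform=sys.platform):
    """Assumed behaviour of get_lib_extension(): 'pyd' on Windows, else 'so'."""
    return "pyd" if platform == "win32" else "so"
```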
@@ -156,12 +181,16 @@ def nvcc_module_compile_str(module_name, src_code, location=None, include_dirs=[
         done = True
     #cmd.append("--ptxas-options=-v") #uncomment this to see register and shared-mem requirements
+    #print >> sys.stderr, 'COMPILING W CMD', cmd
     debug('Running cmd', ' '.join(cmd))
+    orig_dir = os.getcwd()
+    try:
+        os.chdir(location)
-    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
-    nvcc_stdout, nvcc_stderr = p.communicate()[:2]
+        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+        nvcc_stdout, nvcc_stderr = p.communicate()[:2]
+    finally:
+        os.chdir(orig_dir)
     if nvcc_stdout:
         # this doesn't happen to my knowledge
         print >> sys.stderr, "DEBUG: nvcc STDOUT", nvcc_stdout
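The `try`/`finally` above makes nvcc run inside `location`, which the docstring note says nvcc 3.1 needs on Windows 7, and restores the previous directory afterwards. This sketch shows two equivalent patterns; `working_directory` and `run_in` are hypothetical helpers:
- A context manager that wraps the same save/chdir/restore logic.
- `subprocess.Popen`'s real `cwd` argument, which changes only the child's directory and leaves the Python process (and its other threads) untouched.

```python
import contextlib
import os
import subprocess
import sys


@contextlib.contextmanager
def working_directory(path):
    """Temporarily chdir to `path`, restoring the old cwd even on error."""
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)


def run_in(path, cmd):
    """Run `cmd` with `path` as the child's working directory.

    Returns (returncode, stdout_bytes, stderr_bytes); our own cwd is unchanged.
    """
    p = subprocess.Popen(cmd, cwd=path,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return p.returncode, out, err


if __name__ == "__main__":
    code, out, _ = run_in(os.path.expanduser("~"),
                          [sys.executable, "-c", "import os; print(os.getcwd())"])
    print(code, out.decode().strip())
```

With `cwd=location`, the relative source name that this commit passes to nvcc (`os.path.split(cppfilename)[-1]`) still resolves inside `location`.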
......