@@ -479,12 +480,12 @@ Windows V1.5 (optional follow-up to V1 instructions)
/postinstall/pi.sh
It will ask for your MinGW installation directory (e.g.
``c:\pythonxy\mingw``).
``c:/pythonxy/mingw``).
e) Download `ActivePerl <http://www.activestate.com/activeperl>`_ and
install it.
e) Download `ActivePerl <http://www.activestate.com/activeperl/downloads>`_ and
install it (other Perl interpreters should also work).
f) Unpack GotoBLAS2 (e.g. using `7-zip <http://www.7-zip.org/>`_ or in
f) Unpack GotoBLAS2, either using `7-zip <http://www.7-zip.org/>`_ or in
MSYS with:
.. code-block:: bash
...
...
@@ -500,47 +501,61 @@ Windows V1.5 (optional follow-up to V1 instructions)
quickbuild.win32 1>log.txt 2>err.txt
Compilation should take a few minutes. Afterwards, you will probably
find many error messages in err.txt, but also a libgoto2.dll
file in the exports folder. [NOTE: INSTRUCTIONS TO BE CONTINUED]
find many error messages in err.txt, but there should be an ``exports``
folder containing in particular ``libgoto2.dll``.
i) Copy libgoto2.dll from the exports folder to ``pythonxy\mingw\bin``
i) Copy ``libgoto2.dll`` from the ``exports`` folder to ``pythonxy\mingw\bin``
and ``pythonxy\mingw\lib``.
j) Modify your ``.theanorc`` (or ``.theanorc.txt``) with ``ldflags = -lgoto2`` in the ``[blas]`` section.
This setting can also be changed in Python for testing purposes:
This setting can also be changed in Python for testing purposes (in which
case it will remain only for the duration of your Python session):
.. code-block:: python
theano.config.blas.ldflags = "-lgoto2"
- (Optional). To test the BLAS performance, you can run the script ``check_blas.py``.
For comparison I also downloaded and compiled the unoptimized standard
BLAS. The results were the following (Intel Core2 Duo 1.86 GHz):
Standard BLAS: 166 sec (unoptimized, 1 thread)
NumPy: 48 sec (1 thread)
Goto2: 16 sec (2 threads)
Conclusions:
a) The unoptimized standard BLAS is very slow. Don't use it.
b) The Windows binaries of NumPy were compiled with ATLAS and are surprisingly fast.
c) GotoBLAS is even faster, in particular if you have several cores.
- (Optional) GPU on Windows. Not sure it works! Can you report success/errors on the `theano-users <http://groups.google.com/group/theano-users>`_ mailing list?
These instructions are for the 32-bit version of Python; the one that comes with Python(x,y) is 32-bit.
Spaces or non-ASCII characters are not always supported in paths. Python supports
them, so your configuration file path can contain them.
nvcc (at least version 3.1) does not support them well. If your USERPROFILE
directory contains such characters, you must add in your configuration file:
k) To test the BLAS performance, you can run the script
``theano/misc/check_blas.py``.
Note that you may control the number of threads used by GotoBLAS2 with
the ``GOTO_NUM_THREADS`` environment variable (default behavior is to use
all available cores).
Here are some performance results on an Intel Core2 Duo 1.86 GHz,
compared to using NumPy's BLAS or the un-optimized standard BLAS
(compiled manually from its source code):
* GotoBLAS2 (2 threads): 16s
* NumPy (1 thread): 48s
* Standard BLAS (un-optimized, 1 thread): 166s
Conclusions:
* The unoptimized standard BLAS is very slow and should not be used.
* The Windows binaries of NumPy were compiled with ATLAS and are surprisingly fast.
* GotoBLAS2 is even faster, in particular if you can use multiple cores.
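
As a minimal sketch of the thread setting mentioned in step k, the variable
can also be set from Python rather than in the shell. This assumes that
GotoBLAS2 reads ``GOTO_NUM_THREADS`` when the library is first loaded, so it
is set before Theano or NumPy is imported; the value ``1`` is only an
illustration:

.. code-block:: python

    import os

    # Assumption: GotoBLAS2 reads GOTO_NUM_THREADS when it is first loaded,
    # so the variable is set before importing theano or numpy.
    os.environ["GOTO_NUM_THREADS"] = "1"  # restrict GotoBLAS2 to one thread

    # Import Theano only after the variable has been set:
    # import theano

Restricting GotoBLAS2 to a single thread in this way may make it easier to
compare its timing against the single-threaded figures listed above.
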
Windows: Using the GPU
----------------------
Please note that these are tentative instructions (we have not yet been able to
get the GPU to work under Windows with Theano).
Please report your own successes / failures on the