/postinstall/pi.sh
It will ask for your MinGW installation directory (e.g.
``c:/pythonxy/mingw``).
e) Download `ActivePerl <http://www.activestate.com/activeperl/downloads>`_ and
install it (other Perl interpreters should also work).
f) Unpack GotoBLAS2, either using `7-zip <http://www.7-zip.org/>`_ or in
MSYS with:
.. code-block:: bash
...
...
quickbuild.win32 1>log.txt 2>err.txt
Compilation should take a few minutes. Afterwards, you will probably
find many error messages in ``err.txt``, but there should be an ``exports``
folder containing, in particular, ``libgoto2.dll``.
i) Copy ``libgoto2.dll`` from the ``exports`` folder to ``pythonxy\mingw\bin``
and ``pythonxy\mingw\lib``.
j) Modify your ``.theanorc`` (or ``.theanorc.txt``) to set ``ldflags = -lgoto2``
in its ``[blas]`` section.
This setting can also be changed in Python for testing purposes (in which
case it only lasts for the duration of your Python session):
.. code-block:: python

    import theano
    theano.config.blas.ldflags = "-lgoto2"
k) To test the BLAS performance, you can run the script
``theano/misc/check_blas.py``.
Note that you may control the number of threads used by GotoBLAS2 with
the ``GOTO_NUM_THREADS`` environment variable (default behavior is to use
all available cores).
Here are some performance results on an Intel Core2 Duo 1.86 GHz,
compared to using NumPy's BLAS or the unoptimized standard BLAS
(compiled manually from its source code):
* GotoBLAS2 (2 threads): 16s
* NumPy (1 thread): 48s
* Standard BLAS (unoptimized, 1 thread): 166s
Conclusions:
* The unoptimized standard BLAS is very slow (about 10 times slower than GotoBLAS2 here) and should not be used.
* The Windows binaries of NumPy were compiled with ATLAS and are surprisingly fast.
* GotoBLAS2 is even faster (16s with 2 threads vs. 48s for NumPy with 1 thread), in particular if you can use multiple cores.
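
If you would rather script step i), the Python sketch below copies
``libgoto2.dll`` into both MinGW folders. It is only an illustration:
``copy_goto_dll`` is a hypothetical helper (not part of Theano), and it
assumes you pass the folder where GotoBLAS2 was unpacked and built, plus
your Python(x,y) installation folder.

.. code-block:: python

    import shutil
    from pathlib import Path


    def copy_goto_dll(gotoblas_dir, pythonxy_dir):
        """Copy libgoto2.dll from GotoBLAS2's exports folder into MinGW.

        Hypothetical helper. Both arguments are assumptions: the folder
        where GotoBLAS2 was built, and the Python(x,y) installation folder.
        Returns the list of copied file paths.
        """
        dll = Path(gotoblas_dir) / "exports" / "libgoto2.dll"
        if not dll.is_file():
            raise FileNotFoundError(f"{dll} not found; did the build succeed?")
        copied = []
        for sub in ("bin", "lib"):
            target_dir = Path(pythonxy_dir) / "mingw" / sub
            # A missing folder usually means a wrong path, so fail loudly
            # instead of letting copy2 create a file named "bin" or "lib".
            if not target_dir.is_dir():
                raise NotADirectoryError(f"{target_dir} does not exist")
            # copy2 preserves timestamps and returns the destination path.
            copied.append(Path(shutil.copy2(dll, target_dir)))
        return copied

For example, ``copy_goto_dll("c:/GotoBLAS2", "c:/pythonxy")``, where both
paths are placeholders for your own locations.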
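
Step j) can likewise be automated. This sketch uses Python's standard
``configparser`` module to set ``ldflags`` in the ``[blas]`` section, which
is where Theano reads ``theano.config.blas.ldflags`` from.
``set_blas_ldflags`` is a hypothetical helper, and the config file path is
an assumption you must supply (usually ``.theanorc`` or ``.theanorc.txt``
in your home directory).

.. code-block:: python

    import configparser
    from pathlib import Path


    def set_blas_ldflags(rc_path, ldflags="-lgoto2"):
        """Set ``ldflags`` in the ``[blas]`` section of a Theano config file.

        Hypothetical helper. ``rc_path`` is an assumption: point it at your
        own .theanorc (or .theanorc.txt). The file is created if missing.
        """
        rc_path = Path(rc_path)
        # interpolation=None so values containing "%" are stored verbatim
        config = configparser.ConfigParser(interpolation=None)
        # read() silently skips missing files, so a new file also works
        config.read(rc_path)
        if not config.has_section("blas"):
            config.add_section("blas")
        config.set("blas", "ldflags", ldflags)
        with rc_path.open("w") as f:
            config.write(f)
        return config

Note that ``configparser`` does not preserve comments and lowercases option
names when it rewrites the file, so you may prefer to edit a heavily
commented ``.theanorc`` by hand.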
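
To compare thread counts as suggested in step k), you can launch
``check_blas.py`` from Python with ``GOTO_NUM_THREADS`` set only for the
child process. This is a sketch under assumptions: ``goto_env`` and
``run_check_blas`` are hypothetical helpers, and the default script path
assumes you run it from the root of a Theano source checkout.

.. code-block:: python

    import os
    import subprocess
    import sys


    def goto_env(num_threads, base_env=None):
        """Return a copy of the environment with GOTO_NUM_THREADS set.

        The base environment (os.environ by default) is not modified.
        """
        env = dict(os.environ if base_env is None else base_env)
        env["GOTO_NUM_THREADS"] = str(num_threads)
        return env


    def run_check_blas(script="theano/misc/check_blas.py", num_threads=1):
        """Run check_blas.py in a child process limited to num_threads.

        Hypothetical helper; the script path is an assumption. Only the
        child process sees the modified GOTO_NUM_THREADS value.
        """
        return subprocess.run(
            [sys.executable, script],
            env=goto_env(num_threads),
            check=True,  # raise CalledProcessError if the script fails
        )

For instance, calling ``run_check_blas(num_threads=1)`` and then
``run_check_blas(num_threads=2)`` lets you compare single- and
multi-threaded timings on the same machine.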
Windows: Using the GPU
----------------------
These instructions are for the 32-bit version of Python (the one that comes with Python(x,y) is 32-bit).
Please note that these are tentative instructions (we have not yet been able to
get the GPU to work under Windows with Theano).
Please report your own successes / failures on the