testgroup / pytensor · Commits

Commit ec0419a6
Authored May 30, 2016 by Pascal Lamblin
Merge pull request #4500 from slefrancois/gpu_out_sandbox
Update doc with instructions for using new gpu backend
Parents: 0044349f, 974bd517
Showing 14 changed files with 66 additions and 102 deletions (+66 −102)
.gitignore                                +2   −0
doc/extending/extending_theano.txt        +2   −2
doc/install.txt                           +10  −8
doc/install_ubuntu.txt                    +11  −11
doc/install_windows.txt                   +7   −5
doc/library/config.txt                    +19  −19
doc/optimizations.txt                     +1   −1
doc/tutorial/aliasing.txt                 +0   −47
doc/tutorial/using_gpu.txt                +0   −0
doc/tutorial/using_gpu_solution_1.py      +0   −0
doc/tutorial/using_multi_gpu.txt          +3   −3
theano/configdefaults.py                  +3   −4
theano/misc/check_blas.py                 +5   −0
theano/sandbox/gpuarray/__init__.py       +3   −2
.gitignore

@@ -37,3 +37,4 @@ Theano.suo
 .ipynb_checkpoints
 .pydevproject
 .ropeproject
+core
\ No newline at end of file
doc/extending/extending_theano.txt

@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
 Testing GPU Ops
 ^^^^^^^^^^^^^^^

-Ops to be executed on the GPU should inherit from the
-``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
+When using the old GPU backend, Ops to be executed on the GPU should inherit
+from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
 Theano to distinguish them. Currently, we use this to test if the
 NVIDIA driver works correctly with our sum reduction code on the GPU.
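The inheritance rule above can be sketched in plain Python. The class names below are stand-ins for the Theano classes, not imports from Theano:

```python
# Hypothetical stand-ins for theano.Op and theano.sandbox.cuda.GpuOp,
# illustrating how a dedicated base class lets Theano distinguish
# GPU Ops from CPU Ops with a simple isinstance check.
class Op:
    """Stand-in for theano.Op."""

class GpuOp(Op):
    """Stand-in for theano.sandbox.cuda.GpuOp (old GPU backend)."""

class GpuSum(GpuOp):
    """A GPU-capable Op."""

class CpuSum(Op):
    """A CPU-only Op."""

def runs_on_gpu(op):
    # The check Theano can rely on once GPU Ops share a base class.
    return isinstance(op, GpuOp)

print(runs_on_gpu(GpuSum()), runs_on_gpu(CpuSum()))  # True False
```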
doc/install.txt

@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add
 If you want GPU-related tests to run on a specific GPU device, and not
 the default one, you should use :attr:`~config.init_gpu_device`.
-For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``.
+For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.
 See :ref:`libdoc_config` for more information on how to change these
 configuration options.

@@ -508,25 +508,25 @@ Any one of them is enough.
 :ref:`Ubuntu instructions <install_ubuntu_gpu>`.

 Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.

 Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
 computer, and set the default floating point computations to float32.
-For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``.
+For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
 You can also set these options in the .theanorc file's ``[global]`` section:

 .. code-block:: cfg

     [global]
-        device = gpu
+        device = cuda
         floatX = float32

 Note that:

-* If your computer has multiple GPUs and you use 'device=gpu', the driver
-  selects the one to use (usually gpu0).
-* You can use the program nvida-smi to change this policy.
-* You can choose one specific GPU by specifying 'device=gpuX', with X the
+* If your computer has multiple GPUs and you use 'device=cuda', the driver
+  selects the one to use (usually cuda0).
+* You can use the program ``nvidia-smi`` to change this policy.
+* You can choose one specific GPU by specifying 'device=cudaX', with X the
   the corresponding GPU index (0, 1, 2, ...)
 * By default, when ``device`` indicates preference for GPU computations,
   Theano will fall back to the CPU if there is a problem with the GPU.

@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
   toggle your GPU on, which can be done with
   `gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.

+Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
+
 Once your setup is complete, head to :ref:`using_gpu` to find how to verify
 everything is working properly.
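As a cross-check of the ``[global]`` section layout introduced above, the fragment parses with Python's standard ``configparser``. This is only an illustration of the file's shape; Theano reads ``.theanorc`` with its own configuration machinery:

```python
import configparser

# The .theanorc [global] section from the instructions above, parsed with
# the stdlib configparser purely to show its layout.
theanorc = """
[global]
device = cuda
floatX = float32
"""

cfg = configparser.ConfigParser()
cfg.read_string(theanorc)
print(cfg["global"]["device"], cfg["global"]["floatX"])  # cuda float32
```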
doc/install_ubuntu.txt

@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:
     sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
     sudo pip install Theano

 On 14.04, this will install Python 2 by default. If you want to use Python 3:

 .. code-block:: bash

@@ -104,30 +104,30 @@ For Ubuntu 11.04:
 The development version of Theano supports Python 3.3 and
 probably supports Python 3.2, but we do not test on it.

 Bleeding Edge Installs
 ----------------------

 If you would like, instead, to install the bleeding edge Theano (from github)
 such that you can edit and contribute to Theano, replace the `pip install Theano`
 command with:

 .. code-block:: bash

     git clone git://github.com/Theano/Theano.git
     cd Theano
     python setup.py develop --user
     cd ..

 VirtualEnv
 ----------

 If you would like to install Theano in a VirtualEnv, you will want to pass the
 `--system-site-packages` flag when creating the VirtualEnv so that it will pick up
 the system-provided `Numpy` and `SciPy`.

 .. code-block:: bash

     virtualenv --system-site-packages -p python2.7 theano-env
     source theano-env/bin/activate
     pip install Theano

@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
 Change to the Theano directory and run:

 .. code-block:: bash

     git pull

@@ -303,7 +303,7 @@ Test GPU configuration
 .. code-block:: bash

-    THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
+    THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py

 .. note::
doc/install_windows.txt

@@ -423,16 +423,16 @@ Create a test file containing:
     print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
         np_end-np_start, t_end-t_start))
     print("Result difference: %f" % (np.abs(AB-tAB).max(), ))

-.. testoutput::
-   :hide:
-   :options: +ELLIPSIS
-
-   NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
-   Result difference: ...
+.. code-block:: none
+
+    NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
+    Result difference: 0.000000

@@ -445,6 +445,8 @@ routine for matrix multiplication)
 Configure Theano for GPU use
 ############################

+Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.
+
 Theano can be configured with a ``.theanorc`` text file (or
 ``.theanorc.txt``, whichever is easier for you to create under
 Windows). It should be placed in the directory pointed to by the

@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
 .. code-block:: cfg

     [global]
-        device = gpu
+        device = cuda
         floatX = float32

     [nvcc]

@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
 Compiling a faster BLAS
 ~~~~~~~~~~~~~~~~~~~~~~~

 If you installed Python through WinPython or EPD, Theano will automatically
 link with the MKL library, so you should not need to compile your own BLAS.

 .. note::
doc/library/config.txt

@@ -51,11 +51,11 @@ Environment Variables
 .. code-block:: bash

-    THEANO_FLAGS='floatX=float32,device=gpu0,lib.cnmem=1' python <myscript>.py
+    THEANO_FLAGS='floatX=float32,device=cuda0,lib.cnmem=1' python <myscript>.py

 If a value is defined several times in ``THEANO_FLAGS``,
 the right-most definition is used. So, for instance, if
-``THEANO_FLAGS='device=cpu,device=gpu0'``, then gpu0 will be used.
+``THEANO_FLAGS='device=cpu,device=cuda0'``, then cuda0 will be used.

 .. envvar:: THEANORC
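The right-most-wins rule documented above can be sketched with a small parser. This is a hypothetical helper for illustration, not Theano's actual flag handling (which also supports section-qualified names such as ``lib.cnmem``):

```python
def parse_theano_flags(flags):
    # Split a THEANO_FLAGS-style string on commas; later assignments to
    # the same key overwrite earlier ones, so the right-most wins.
    config = {}
    for item in flags.split(","):
        if not item.strip():
            continue
        key, _, value = item.partition("=")
        config[key.strip()] = value.strip()
    return config

# Right-most definition wins, as documented above.
print(parse_theano_flags("device=cpu,device=cuda0"))  # {'device': 'cuda0'}
```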
@@ -70,7 +70,7 @@ Environment Variables
     [global]
     floatX = float32
-    device = gpu0
+    device = cuda0

     [lib]
     cnmem = 1

@@ -102,22 +102,21 @@ import theano and print the config variable, as in:
 .. attribute:: device

-    String value: either ``'cpu'``, ``'gpu'``, ``'gpu0'``, ``'gpu1'``,
-    ``'gpu2'``, or ``'gpu3'``
+    String value: either ``'cpu'``, ``'cuda'``, ``'cuda0'``, ``'cuda1'``,
+    ``'opencl0:0'``, ``'opencl0:1'``, ``'gpu'``, ``'gpu0'`` ...

-    Default device for computations. If ``gpu*``, change the default to try
-    to move computation to it and to put shared variable of float32 on
-    it.
-    Choose the default compute device for theano graphs. Setting this to a
-    ``gpu*`` string will make theano to try by default to move computation to it.
-    Also it will make theano put by default shared variable of float32 on it.
-    ``'gpu'`` lets the driver select the GPU to use, while ``'gpu?'`` makes Theano try
-    to use a specific device. If we are not able to use the GPU, either we fall back
-    on the CPU, or an error is raised, depending on the :attr:`force_device` flag.
+    Default device for computations. If ``'cuda*'``, change the default to try
+    to move computation to the GPU using CUDA libraries. If ``'opencl*'``,
+    the openCL libraries will be used. To let the driver select the device,
+    use ``'cuda'`` or ``'opencl'``. If ``'gpu*'``, the old gpu backend will
+    be used, although users are encouraged to migrate to the new GpuArray
+    backend. If we are not able to use the GPU,
+    either we fall back on the CPU, or an error is raised, depending
+    on the :attr:`force_device` flag.

     This flag's value cannot be modified during the program execution.

-    Do not use upper case letters, only lower case even if NVIDIA use
+    Do not use upper case letters, only lower case even if NVIDIA uses
     capital letters.

 .. attribute:: force_device

@@ -138,11 +137,12 @@ import theano and print the config variable, as in:
 .. attribute:: init_gpu_device

-    String value: either ``''``, ``'gpu'``, ``'gpu0'``, ``'gpu1'``, ``'gpu2'``,
-    or ``'gpu3'``
+    String value: either ``''``, ``'cuda'``, ``'cuda0'``, ``'cuda1'``,
+    ``'opencl0:0'``, ``'opencl0:1'``, ``'gpu'``, ``'gpu0'`` ...

     Initialize the gpu device to use.
-    When its value is gpu*, the theano flag :attr:`device` must be ``"cpu"``.
+    When its value is ``'cuda*'``, ``'opencl*'`` or ``'gpu*'``, the theano
+    flag :attr:`device` must be ``'cpu'``.
     Unlike :attr:`device`, setting this flag to a specific GPU will not
     try to use this device by default, in particular it will **not** move
     computations, nor shared variables, to the specified GPU.
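The accepted device spellings listed above can be sketched as a small validator. The pattern below is hypothetical (Theano's real ``DeviceParam`` does its own checking), covering ``'cpu'``, the new back end (``'cuda'``, ``'cudaN'``, ``'openclA:B'``) and the old back end (``'gpu'``, ``'gpuN'``), lower case only as the documentation requires:

```python
import re

# Hypothetical pattern mirroring the device strings documented above.
_DEVICE_RE = re.compile(r"^(cpu|cuda(\d+)?|opencl\d+:\d+|gpu(\d+)?)$")

def is_valid_device(name):
    # Lower case only: 'GPU0' is rejected, as the docs warn.
    return _DEVICE_RE.match(name) is not None

print(is_valid_device("cuda0"), is_valid_device("opencl0:1"),
      is_valid_device("GPU0"))  # True True False
```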
doc/optimizations.txt

@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE
 ========================================================= ========= ============ =============
 :term:`merge`                                             x         x
 :term:`constant folding<constant folding>`                x         x
+:term:`GPU transfer`                                      x         x
 :term:`shape promotion<shape promotion>`                  x
 :term:`fill cut<fill cut>`                                x
 :term:`inc_subtensor srlz.<inc_subtensor serialization>`  x

@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE
 :term:`inplace_elemwise`                                  x
 :term:`inplace_random`                                    x
 :term:`elemwise fusion`                                   x
-:term:`GPU transfer`                                      x
 :term:`local_log_softmax`                                 x         x
 :term:`local_remove_all_assert`
 ========================================================= ========= ============ =============
doc/tutorial/aliasing.txt

@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
 hints that give more flexibility to the compilation and optimization of the
 graph.

-For GPU graphs, this borrowing can have a major speed impact. See the following code:
-
-.. code-block:: python
-
-    from theano import function, config, shared, sandbox, tensor, Out
-    import numpy
-    import time
-
-    vlen = 10 * 30 * 768  # 10 x # cores x # threads per core
-    iters = 1000
-
-    rng = numpy.random.RandomState(22)
-    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
-    f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
-    f2 = function([],
-                  Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
-                      borrow=True))
-
-    t0 = time.time()
-    for i in range(iters):
-        r = f1()
-    t1 = time.time()
-    no_borrow = t1 - t0
-
-    t0 = time.time()
-    for i in range(iters):
-        r = f2()
-    t1 = time.time()
-
-    print("Looping %s times took %s seconds without borrow "
-          "and %s seconds with borrow" % (iters, no_borrow, (t1 - t0)))
-
-    if numpy.any([isinstance(x.op, tensor.Elemwise) and
-                  ('Gpu' not in type(x.op).__name__)
-                  for x in f1.maker.fgraph.toposort()]):
-        print('Used the cpu')
-    else:
-        print('Used the gpu')
-
-Which produces this output:
-
-.. code-block:: none
-
-    $ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
-    Using gpu device 0: GeForce GTX 275
-    Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
-    Used the gpu
-
 *Take home message:*

 When an input *x* to a function is not needed after the function

@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory
 footprint), and you only need to read from it once, right away when
 it's returned, then consider marking it with an ``Out(y, borrow=True)``.
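The removed GPU example demonstrated ``borrow=True`` avoiding a copy of the result buffer. A rough CPU-side analogue of the same idea, using NumPy views versus copies (illustrative only; it does not reproduce the GPU timing behaviour):

```python
import numpy

# A view shares the underlying buffer (like borrow=True); a copy does not.
x = numpy.arange(6.0)
view = x[:]       # basic slicing: no data is copied
copy = x.copy()   # a fresh, independent buffer

x[0] = 42.0       # mutate the original
print(view[0], copy[0])  # 42.0 0.0
```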
doc/tutorial/using_gpu.txt

(Diff collapsed; too large to display inline.)

doc/tutorial/using_gpu_solution_1.py

(Diff collapsed; too large to display inline.)
doc/tutorial/using_multi_gpu.txt

@@ -81,7 +81,7 @@ single name and a single device.
 It is often the case that multi-gpu operation requires or assumes
 that all the GPUs involved are equivalent. This is not the case
 for this implementation. Since the user has the task of
-distrubuting the jobs across the different device a model can be
+distributing the jobs across the different device a model can be
 built on the assumption that one of the GPU is slower or has
 smaller memory.

@@ -140,5 +140,5 @@ is a example.
     cv = gv.transfer('cpu')

 Of course you can mix transfers and operations in any order you
-choose.
-However you should try to minimize transfer operations
-because they will introduce overhead any may reduce performance.
+choose. However you should try to minimize transfer operations
+because they will introduce overhead that may reduce performance.
theano/configdefaults.py

@@ -104,10 +104,9 @@ class DeviceParam(ConfigParam):
 AddConfigVar(
     'device',
-    ("Default device for computations. If gpu*, change the default to try "
-     "to move computation to it and to put shared variable of float32 "
-     "on it. Do not use upper case letters, only lower case even if "
-     "NVIDIA use capital letters."),
+    ("Default device for computations. If cuda* or opencl*, change the "
+     "default to try to move computation to the GPU. Do not use upper case "
+     "letters, only lower case even if NVIDIA uses capital letters."),
     DeviceParam('cpu', allow_override=False),
     in_c_key=False)
theano/misc/check_blas.py  (mode changed 100755 → 100644)

@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000,
     t0 = 0
     t1 = -1

     f()  # Ignore first function call to get representative time.
     if execute:
         sync = (hasattr(theano, "sandbox") and
                 hasattr(theano.sandbox, "cuda") and
                 theano.sandbox.cuda.cuda_available)
+        sync2 = (hasattr(theano, "gpuarray") and
+                 theano.gpuarray.pygpu_activated)
         t0 = time.time()
         for i in range(iters):
             f()
         if sync:
             theano.sandbox.cuda.synchronize()
+        if sync2:
+            c.get_value(borrow=True,
+                        return_internal_type=True).sync()
         t1 = time.time()
     return t1 - t0, impl
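The timing pattern in check_blas.py above (one warm-up call, a timed loop, then an optional synchronization before reading the clock) can be sketched independently of Theano. The ``sync`` hook is hypothetical; on a GPU it would stand in for the backend's synchronize call, since kernel launches are asynchronous and timing without a final sync under-reports:

```python
import time

def time_iterations(f, iters, sync=None):
    # Warm-up call so one-time compilation/caching does not skew the result.
    f()
    t0 = time.time()
    for _ in range(iters):
        f()
    if sync is not None:
        # On a GPU backend this would block until queued work finishes.
        sync()
    return time.time() - t0

elapsed = time_iterations(lambda: sum(range(1000)), iters=10)
print(elapsed >= 0.0)  # True
```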
theano/sandbox/gpuarray/__init__.py

@@ -4,6 +4,7 @@ which refered to theano.sandbox.gpuarray."""
 import warnings

 from theano.gpuarray import *

-message = "theano.sandbox.gpuarray has been moved to theano.gpuarray." + \
-    " Please update your code and pickles."
+message = ("theano.sandbox.gpuarray has been moved to theano.gpuarray. "
+           "Please update your code and pickles. If the warning persists, "
+           "clear theano's cache ('$theano/bin/theano-cache clear').")

 warnings.warn(message)
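The shim above follows a common module-move pattern: the old import path re-exports everything from the new location and warns on import. A minimal sketch with made-up module names, using the standard ``warnings`` machinery:

```python
import warnings

def emit_move_warning():
    # Mirrors the shim above: build the message once, then warn on import.
    # Module names here are illustrative, not real packages.
    message = ("oldpkg.sandbox.mod has been moved to oldpkg.mod. "
               "Please update your code and pickles.")
    warnings.warn(message, DeprecationWarning)

# Capture the warning to show it fires exactly once per call.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    emit_move_warning()
print(len(caught), "moved" in str(caught[0].message))  # 1 True
```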