testgroup / pytensor · Commits · 1203f136
Removed HTML version
Authored Aug 21, 2012 by steven-pigeon · parent 8513b540
Showing 1 changed file with 0 additions and 520 deletions:
doc/tutorial/python-memory-management.html (deleted, 100644 → 0)
<h1>
Python Memory Management
</h1>
One of the major challenges in writing (somewhat) large-scale Python
programs is to keep memory usage at a minimum. However, managing memory in
Python is easy
—
if you just don't care. Python allocates memory
transparently, manages objects using a reference count system, and frees
memory when an object's reference count falls to zero. In theory, it's
swell. In practice, you need to know a few things about Python memory
management to get a memory-efficient program running. One of the things you
should know, or at least get a good feel about, is the sizes of basic Python
objects. Another thing is how Python manages its memory internally.
<br><br>
<!--python-->
So let us begin with the sizes of basic objects. Python does not have many
primitive data types: there are
<tt>
int
</tt>
s,
<tt>
long
</tt>
s (an
unlimited precision version of
<tt>
int
</tt>
), floats (which are doubles),
tuples, strings, lists, dictionaries, and classes.
<br><br>
<!--more-->
<h2>
Basic Objects
</h2>
What is the size of
<tt>
int
</tt>
? A programmer with a C or C++ background
will probably guess that the size of a machine-specific
<tt>
int
</tt>
is
something like 32 bits, maybe 64; and that therefore it occupies at most 8
bytes. But is that so in Python?
<br><br>
Let us first write a function that shows the sizes of objects (recursively
if necessary):
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
import sys

def show_sizeof(x, level=0):
    print "\t" * level, x.__class__, sys.getsizeof(x), x
    if hasattr(x, '__iter__'):
        if hasattr(x, 'items'):
            for xx in x.items():
                show_sizeof(xx, level + 1)
        else:
            for xx in x:
                show_sizeof(xx, level + 1)
</pre>
We can now use the function to inspect the sizes of the different basic
data types:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
show_sizeof(None)
show_sizeof(3)
show_sizeof(2**63)
show_sizeof(102947298469128649161972364837164)
show_sizeof(918659326943756134897561304875610348756384756193485761304875613948576297485698417)
</pre>
If you have a 32-bit Python 2.7.x, you'll see:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'NoneType'&gt; 8 None
&lt;type 'int'&gt; 12 3
&lt;type 'long'&gt; 22 9223372036854775808
&lt;type 'long'&gt; 28 102947298469128649161972364837164
&lt;type 'long'&gt; 48 918659326943756134897561304875610348756384756193485761304875613948576297485698417
</pre>
and if you have a 64-bit Python 2.7.x, you'll see:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'NoneType'&gt; 16 None
&lt;type 'int'&gt; 24 3
&lt;type 'long'&gt; 36 9223372036854775808
&lt;type 'long'&gt; 40 102947298469128649161972364837164
&lt;type 'long'&gt; 60 918659326943756134897561304875610348756384756193485761304875613948576297485698417
</pre>
Let us focus on the 64-bit version (mainly because that is what we need
most often in our case).
<tt>None</tt> takes 16 bytes. An <tt>int</tt> takes 24 bytes,
<em>three times</em> as much memory as a C <tt>int64_t</tt>, despite being
some kind of "machine-friendly" integer. Long integers (unbounded
precision), used to represent integers larger than 2<sup>63</sup>-1, have a
minimum size of 36 bytes; the size then grows linearly in the logarithm of
the integer represented.
<br><br>
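To see this growth concretely, here is a small sketch in modern (Python 3) syntax; the exact byte counts are CPython-specific and differ from the Python 2.7 numbers above, but the trend is the same:

```python
import sys

# Hypothetical demonstration (CPython; byte counts vary by version):
# the size of an unbounded integer grows roughly linearly in the
# number of bits of its value, i.e. in the logarithm of the value.
sizes = [(k, sys.getsizeof(2 ** k)) for k in (8, 64, 256, 1024, 4096)]
for k, s in sizes:
    print("2**%d -> %d bytes" % (k, s))
```

The measured sizes are non-decreasing in the magnitude of the integer, matching the "grows with the logarithm" claim.
<br><br>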
Python's floats are implementation-specific, but they seem to be C doubles.
However, they do not take up only 8 bytes:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
show_sizeof(3.14159265358979323846264338327950288)
</pre>
Outputs
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'float'&gt; 16 3.14159265359
</pre>
on a 32-bit platform and
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'float'&gt; 24 3.14159265359
</pre>
on a 64-bit platform. That is, again, three times the size a C programmer
would expect. Now, what about strings?
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
show_sizeof("")
show_sizeof("My hovercraft is full of eels")
</pre>
outputs, on a 32-bit platform:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'str'&gt; 21
&lt;type 'str'&gt; 50 My hovercraft is full of eels
</pre>
and, on a 64-bit platform:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'str'&gt; 37
&lt;type 'str'&gt; 66 My hovercraft is full of eels
</pre>
An
<em>empty</em>
string costs 37 bytes in a 64-bit environment! The memory
used by a string then grows linearly with the length of the (useful)
string.
<br><br>
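The linear growth is easy to check directly; a quick sketch in Python 3 syntax (in CPython 3, an ASCII string costs a fixed overhead plus one byte per character, so the per-character cost is exactly linear):

```python
import sys

# The overhead of a string object is constant; each extra ASCII
# character adds a fixed amount on top of it (one byte per character
# in CPython 3's compact string representation).
overhead = sys.getsizeof("")
growth = [sys.getsizeof("x" * n) - overhead for n in (1, 10, 100, 1000)]
print("empty-string overhead:", overhead)
print("extra bytes for 1, 10, 100, 1000 chars:", growth)
```

<br><br>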
<p
align=
"center"
>
*
<br>
*
 
*
</p>
Other commonly used structures, namely tuples, lists, and dictionaries,
are worth examining. Lists (which are implemented as
<a
href=
"http://en.wikipedia.org/wiki/Dynamic_array"
target=
_blank
>
array lists
</a>
, not as
<a
href=
"http://en.wikipedia.org/wiki/Linked_list"
target=
_blank
>
linked lists
</a>
, with
<a
href=
"http://en.wikipedia.org/wiki/Dynamic_array#Performance"
target=
_blank
>
everything that entails
</a>
) are arrays of references to Python objects, which allows them to be
heterogeneous. Let us look at our sizes:
<br><br>
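As a side note on the array-list implementation: CPython over-allocates the underlying array so that repeated appends are cheap. A sketch of that behavior, in Python 3 syntax (the exact growth pattern is an implementation detail):

```python
import sys

# Append one element at a time and record the list's size: it stays
# flat between capacity jumps, because CPython over-allocates room
# for future appends and resizes only occasionally (amortized O(1)).
lst, sizes = [], []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))
print(sizes)
```

That over-allocated capacity is part of what <tt>show_sizeof</tt> reports for the lists below.
<br><br>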
<pre
style=
"background:#eef;padding:10px"
>
show_sizeof([])
show_sizeof([4,"toaster",230.1,])
</pre>
outputs
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'list'&gt; 32 []
&lt;type 'list'&gt; 44 [4, 'toaster', 230.1]
</pre>
on a 32-bit platform and
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'list'&gt; 72 []
&lt;type 'list'&gt; 96 [4, 'toaster', 230.1]
</pre>
on a 64-bit platform. An empty list eats up 72 bytes. The size of an
empty, 64-bit C++
<tt>std::list&lt;int&gt;()</tt>
is only 16 bytes, four to five times
less. What about tuples? And dictionaries?
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
show_sizeof({})
show_sizeof({'a':213,'b':2131})
</pre>
outputs, on a 32-bit box
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'dict'&gt; 136 {}
&lt;type 'dict'&gt; 136 {'a': 213, 'b': 2131}
        &lt;type 'tuple'&gt; 32 ('a', 213)
                &lt;type 'str'&gt; 22 a
                &lt;type 'int'&gt; 12 213
        &lt;type 'tuple'&gt; 32 ('b', 2131)
                &lt;type 'str'&gt; 22 b
                &lt;type 'int'&gt; 12 2131
</pre>
and
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
&lt;type 'dict'&gt; 280 {}
&lt;type 'dict'&gt; 280 {'a': 213, 'b': 2131}
        &lt;type 'tuple'&gt; 72 ('a', 213)
                &lt;type 'str'&gt; 38 a
                &lt;type 'int'&gt; 24 213
        &lt;type 'tuple'&gt; 72 ('b', 2131)
                &lt;type 'str'&gt; 38 b
                &lt;type 'int'&gt; 24 2131
</pre>
for a 64-bit box.
<br><br>
This last example is particularly interesting because it "doesn't add up."
If we look at the individual tuples, they take 72 bytes (while their
components take 38+24=62 bytes, leaving 10 bytes for the tuple itself), but
the dictionary takes 280 bytes (rather than a strict minimum of 144 = 72
×
2
bytes). A dictionary is supposed to be an efficient data structure for
search, and the two likely implementations both use more space than is
strictly necessary. If it is some kind of tree, then we pay the cost of
internal nodes that contain a key and two pointers to child nodes; if it
is a hash table, then we must have some room with free entries to ensure
good performance.
<br><br>
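That hash-table slack is easy to observe: as keys are inserted one by one, the reported size stays flat until the table must be resized to keep enough free slots. A sketch in Python 3 syntax (exact sizes and resize thresholds are CPython implementation details):

```python
import sys

# Insert keys one at a time and record the dict's size: it jumps in
# steps, because the hash table keeps a fraction of its slots free
# for good lookup performance and resizes only when it gets too full.
d, sizes = {}, []
for i in range(20):
    d[i] = i
    sizes.append(sys.getsizeof(d))
print(sizes)
```

<br><br>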
The (somewhat) equivalent
<tt>std::map&lt;std::string, int&gt;</tt>
C++ structure
takes 48 bytes when created (that is, empty). An empty C++ string takes 8
bytes (the allocated size then grows linearly with the length of the
string). An integer takes 32 bits.
<br><br>
<p
align=
"center"
>
*
<br>
*
 
*
</p>
Why does all this matter? It seems that whether an empty string takes 8
bytes or 37 doesn't change anything much. That's true. That's true
<em>
until
</em>
you need to scale. Then, you need to be really careful about
how many objects you create, to limit the quantity of memory your program
uses. It is a problem in real-life applications.
<br><br>
However, to devise a really good strategy about memory management, we must
not consider only the sizes of objects, but also how many of them there
are and the order in which they are created. This turns out to be very
important for Python programs. One key element to understand is how Python
allocates its memory internally, which we discuss next.
<br><br>
<h2>
Internal Memory Management
</h2>
To speed up memory allocation (and reuse), Python uses a number of lists
for small objects. Each list contains objects of similar size: there is a
list for objects from 1 to 8 bytes in size, one for objects from 9 to 16
bytes, and so on. When a small object needs to be created, we either reuse
a free block in the appropriate list or allocate a new one.
<br><br>
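The reuse is easy to glimpse with <tt>id()</tt>. The following is a CPython-specific sketch, not behavior guaranteed by the language: after a small object dies, an allocation of the same size class often lands at exactly the freed address.

```python
# CPython-specific sketch, NOT guaranteed by the language: when a small
# object dies, its block is put on a free list, and a later allocation
# of the same size class will often reuse the very same block.
a = object()
freed_address = id(a)
del a                    # the block goes back to a free list
b = object()             # same size class: may land on the freed block
reused = (id(b) == freed_address)
print("freed block reused:", reused)
```

Whether the addresses match depends on allocator internals; the point is only that freed blocks stay available to Python, not to the OS.
<br><br>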
There are some internal details about how Python manages those lists in
blocks, pools, and "arenas": a number of blocks form a pool, pools are
gathered into arenas, and so on, but they are not very relevant to the
point we want to make (if you really want to know, read Evan Jones'
<a
href=
"http://www.evanjones.ca/memoryallocator/"
target=
_blank
>
ideas on how to improve Python's memory allocation
</a>
). The important point is that
those lists
<em>never shrink</em>
.
<br><br>
Indeed: if an item (of size
<i>x</i>
) is deallocated (freed by lack of
reference), its location is not returned to Python's global memory pool
(and still less to the system), but merely marked as free and added to the
free list of items of size
<i>x</i>
. The dead object's location is reused when another object of compatible
size is needed; if no dead object is available, a new block is allocated.
<br><br>
If the memory for small objects is never freed, then the inescapable
conclusion is that, like goldfish, these small-object lists only keep
growing, never shrinking, and that the memory footprint of your
application is dominated by the largest number of small objects allocated
at any given point.
<br><br>
<p
align=
"center"
>
*
<br>
*
 
*
</p>
Therefore, one should work hard to allocate only the number of small
objects necessary for one task, favoring (otherwise
<i>unpythonèsque</i>
)
loops where only a small number of elements are created/processed rather
than (more
<i>pythonèsque</i>
) patterns where lists are created using
list comprehension syntax and then processed.
<br><br>
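A generator-based loop illustrates the difference: it never materializes the full list, so only one small element object is alive at a time. A sketch in Python 3 syntax:

```python
import sys

# A list comprehension materializes all 100,000 elements at once...
squares_list = [i * i for i in range(100000)]

# ...while a generator expression is a single small object that yields
# elements on demand, so at most one element is alive at a time.
squares_gen = (i * i for i in range(100000))

print("list object:", sys.getsizeof(squares_list), "bytes")
print("generator object:", sys.getsizeof(squares_gen), "bytes")
```

Note that <tt>getsizeof</tt> reports only the container itself, not the elements; the real saving is that the generator's elements never populate the small-object lists all at once.
<br><br>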
While the second pattern is more
<i>à la Python</i>
, it is rather the worst
case: you end up creating lots of small objects that come to populate the
small-object lists, and even once the list is dead, the dead objects (now
all in the free lists) still occupy a lot of memory.
<br><br>
<p
align=
"center"
>
*
<br>
*
 
*
</p>
The fact that the free lists grow does not seem like much of a problem,
because the memory they contain is still accessible to the Python program.
But from the OS's perspective, your program's size is the total (maximum)
memory allocated to Python. Since Python returns memory to the OS (from
the heap that allocates objects other than small objects) only on Windows,
if you run on Linux you will only ever see the total memory used by your
program increase.
<br><br>
This is a problem if you have to run multiple instances of the same
program on the same machine. Even if you have a server-class machine, you
may still encounter problems, because suddenly 32GB of RAM isn't quite
enough.
<br><br>
<p
align=
"center"
>
*
<br>
*
 
*
</p>
Let us prove my point using
<a
href=
"http://pypi.python.org/pypi/memory_profiler"
target=
_blank
>
memory_profiler
</a>
, a Python add-on module (which depends on
the
<tt>
python-psutil
</tt>
package) by
<a
href=
"https://github.com/fabianp"
target=
_blank
>
Fabian Pedregosa
</a>
(the module's
<a
href=
"https://github.com/fabianp/memory_profiler"
target=
_blank
>
github
page
</a>
). This add-on provides the decorator
<tt>@profile</tt>
, which allows one to monitor the memory usage of one specific function.
It is extremely simple to use. Let us consider this small program (it
makes my point entirely):
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
import copy, memory_profiler

@profile
def function():
    x = range(1000000)    # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()
</pre>
invoking
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
python -m memory_profiler memory-profile-me.py
</pre>
prints, on a 64-bit computer:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
Filename: memory-profile-me.py
Line # Mem usage Increment Line Contents
================================================
3 @profile
4 9.11 MB 0.00 MB def function():
5 40.05 MB 30.94 MB x=range(1000000) # allocate a big list
6 89.73 MB 49.68 MB y=copy.deepcopy(x)
7 82.10 MB -7.63 MB del x
8 82.10 MB 0.00 MB return y
</pre>
This small program creates a list of 1,000,000 ints (at 24 bytes each,
for ~24 million bytes) plus a list of references (at 8 bytes each, for ~8
million bytes), for a total of about 30MB. It then deep-copies the object
(which allocates ~50MB, I am not sure why; a simple copy would allocate
only 8MB of references). Freeing
<tt>x</tt>
with
<tt>del</tt>
frees the reference list and
kills the associated objects, but lo!, the amount of memory only goes down
by the size of the reference list, because the list itself is not in a
small-object list but on the heap, while the dead small objects remain in
the free lists, not returned to the interpreter's global heap.
<br><br>
In this example, we end up with
<em>twice</em>
the memory allocated, at
82MB, while only one list, requiring about 30MB, is returned. You can see
why it is easy to have memory usage increase, more or less surprisingly,
if we are not careful.
<br><br>
<p
align=
"center"
>
*
<br>
*
 
*
</p>
On a related note: is
<tt>
pickle
</tt>
wasteful?
<br><br>
<a
href=
"http://docs.python.org/library/pickle.html"
target=
_blank
>
Pickle
</a>
is the standard way of (de)serializing Python
objects to file. What is its memory footprint? Does it create extra copies
of the data or is it rather smart about it?
<br><br>
Consider this short example:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
import memory_profiler, random, pickle

def random_string():
    return "".join([ chr(64+random.randint(0,25)) for _ in xrange(20) ])

@profile
def create_file():
    x = [ (random.random(),
           random_string(),
           random.randint(0, 2**64))
          for _ in xrange(1000000) ]
    pickle.dump(x, open('machin.pkl', 'w'))

@profile
def load_file():
    y = pickle.load(open('machin.pkl', 'r'))
    return y

if __name__ == "__main__":
    create_file()
    #load_file()
</pre>
We profile one invocation to create the pickled data and one to re-read
it (commenting out whichever function is not being called). Using
<tt>memory_profiler</tt>
, the creation uses a lot of memory:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
Filename: test-pickle.py
Line # Mem usage Increment Line Contents
================================================
8 @profile
9 9.18 MB 0.00 MB def create_file():
10 9.33 MB 0.15 MB x=[ (random.random(),
11 random_string(),
12 random.randint(0,2**64))
13 246.11 MB 236.77 MB for _ in xrange(1000000) ]
14
15 481.64 MB 235.54 MB pickle.dump(x,open('machin.pkl','w'))
</pre>
and re-reading a bit less:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
Filename: test-pickle.py
Line # Mem usage Increment Line Contents
================================================
18 @profile
19 9.18 MB 0.00 MB def load_file():
20 311.02 MB 301.83 MB y=pickle.load(open('machin.pkl','r'))
21 311.02 MB 0.00 MB return y
</pre>
So somehow,
<i>pickling</i>
is very bad for memory consumption. The initial
list takes up more or less 230MB, but pickling it allocates an extra
230-something MB of memory.
<br><br>
Unpickling, on the other hand, seems fairly efficient. It does use more
memory than the original list (300MB instead of 230-something), but it
does not double the quantity of allocated memory.
<br><br>
Overall, then, (un)pickling should be avoided for memory-sensitive
applications. What are the alternatives? Pickling preserves all the
structure of a data structure, so you can recover it exactly from the
pickled file at a later time. However, that might not always be needed. If
the file is to contain a list as in the example above, then maybe a simple
flat, text-based file format is in order. Let us see what that
gives.
<br><br>
A naïve implementation would give:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
import memory_profiler, random

def random_string():
    return "".join([ chr(64+random.randint(0,25)) for _ in xrange(20) ])

@profile
def create_file():
    x = [ (random.random(),
           random_string(),
           random.randint(0, 2**64))
          for _ in xrange(1000000) ]
    f = open('machin.flat', 'w')
    for xx in x:
        print >>f, xx

@profile
def load_file():
    y = []
    f = open('machin.flat', 'r')
    for line in f:
        y.append(eval(line))
    return y

if __name__ == "__main__":
    create_file()
    #load_file()
</pre>
Creating the file:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
Filename: test-flat.py
Line # Mem usage Increment Line Contents
================================================
8 @profile
9 9.19 MB 0.00 MB def create_file():
10 9.34 MB 0.15 MB x=[ (random.random(),
11 random_string(),
12 random.randint(0,2**64))
13 246.09 MB 236.75 MB for _ in xrange(1000000) ]
14
15 246.09 MB 0.00 MB f=open('machin.flat','w')
16 308.27 MB 62.18 MB for xx in x:
17 print >>f, xx
</pre>
and reading the file back:
<br><br>
<pre
style=
"background:#eef;padding:10px"
>
Filename: test-flat.py
Line # Mem usage Increment Line Contents
================================================
20 @profile
21 9.19 MB 0.00 MB def load_file():
22 9.34 MB 0.15 MB y=[]
23 9.34 MB 0.00 MB f=open('machin.flat','r')
24 300.99 MB 291.66 MB for line in f:
25 300.99 MB 0.00 MB y.append(eval(line))
26 301.00 MB 0.00 MB return y
</pre>
Memory consumption on writing is now much better. It still creates a lot
of temporary small objects (about 60MB's worth), but it does not double
memory usage. Reading is comparable (using only marginally less memory).
<br><br>
<p
align=
"center"
>
*
<br>
*
 
*
</p>
Python's design goals are radically different from, say, C's design goals.
While the latter is designed to give you good control over what you are
doing at the expense of more complex and explicit programming, the former
is designed to let you code rapidly while hiding most (if not all) of the
underlying implementation details. While this sounds nice, in a production
environment ignoring the implementation inefficiencies of a language can
bite you hard, and sometimes when it is too late. I think that having a
good feel for how inefficient Python is with memory management (by
design!) plays an important role in whether or not your code meets
production requirements, scales well, or, on the contrary, turns into a
burning hell of memory.