DataFrame memory usage

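As of pandas version 0.15.0, the memory usage of a dataframe (including the index) is shown when the dataframe is inspected with the info method. A configuration option, display.memory_usage (see Options and Settings), specifies whether the dataframe's memory usage is displayed when calling df.info().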

For example, the memory usage of the dataframe below is shown when calling df.info():

In [1]: dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
   ...:           'complex128', 'object', 'bool']
   ...: 

In [2]: n = 5000

In [3]: data = dict([ (t, np.random.randint(100, size=n).astype(t))
   ...:                 for t in dtypes])
   ...: 

In [4]: df = pd.DataFrame(data)

In [5]: df['categorical'] = df['object'].astype('category')

In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 8 columns):
bool               5000 non-null bool
complex128         5000 non-null complex128
datetime64[ns]     5000 non-null datetime64[ns]
float64            5000 non-null float64
int64              5000 non-null int64
object             5000 non-null object
timedelta64[ns]    5000 non-null timedelta64[ns]
categorical        5000 non-null category
dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int64(1), object(1), timedelta64[ns](1)
memory usage: 284.1+ KB

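The + symbol indicates that the true memory usage could be higher, because pandas does not count the memory used by the values in columns with dtype=object.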
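As a rough sketch of when the + qualifier appears, the snippet below captures the info output of two small frames, one with an object column and one without. The frame contents and the helper name info_text are illustrative assumptions, not part of the example above.

```python
import io

import pandas as pd


def info_text(frame):
    """Hypothetical helper: return the text that frame.info() would print."""
    buf = io.StringIO()
    frame.info(buf=buf)
    return buf.getvalue()


# Explicit dtype=object so the column stores references to Python objects.
with_objects = pd.DataFrame({"label": pd.Series(["a", "bb", "ccc"], dtype=object)})
numbers_only = pd.DataFrame({"value": [1, 2, 3]})

# The last line of the report is the "memory usage: ..." summary.
objects_summary = info_text(with_objects).strip().splitlines()[-1]
numbers_summary = info_text(numbers_only).strip().splitlines()[-1]

print(objects_summary)  # carries a "+" after the number
print(numbers_summary)  # no "+", the numeric column is fully accounted for
```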

New in version 0.17.1.

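Passing memory_usage='deep' produces a more accurate memory usage report that accounts for the full memory used by the objects the dataframe contains. This argument is optional because the deeper introspection can be expensive.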

In [7]: df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 8 columns):
bool               5000 non-null bool
complex128         5000 non-null complex128
datetime64[ns]     5000 non-null datetime64[ns]
float64            5000 non-null float64
int64              5000 non-null int64
object             5000 non-null object
timedelta64[ns]    5000 non-null timedelta64[ns]
categorical        5000 non-null category
dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int64(1), object(1), timedelta64[ns](1)
memory usage: 401.2 KB
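The gap between the two reports comes from the object column. A minimal sketch, assuming a small made-up frame, compares the shallow and deep figures per column using the deep parameter of the memory_usage method:

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "number": pd.Series(range(1000), dtype="int64"),
        # dtype=object: each cell is a reference to a separate Python str.
        "text": pd.Series([f"item-{i}" for i in range(1000)], dtype=object),
    }
)

# Shallow: counts only the array of references for the object column.
shallow = frame.memory_usage(index=False)
# Deep: additionally sizes every Python object the column refers to.
deep = frame.memory_usage(index=False, deep=True)

print(shallow)
print(deep)
```

The int64 column reports the same size either way, while the object column grows under deep=True because the strings themselves are now counted.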

By default the display option is set to True, but it can be explicitly overridden by passing the memory_usage argument when calling df.info().
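The interaction between the option and the argument can be sketched as follows. The frame and the helper name info_text are assumptions made for illustration; pd.option_context is used so that the option is restored afterwards.

```python
import io

import pandas as pd

frame = pd.DataFrame({"value": [1.0, 2.0, 3.0]})


def info_text(frame, **kwargs):
    """Hypothetical helper: return the text that frame.info(**kwargs) prints."""
    buf = io.StringIO()
    frame.info(buf=buf, **kwargs)
    return buf.getvalue()


# Temporarily switch the display option off; it is restored on exit.
with pd.option_context("display.memory_usage", False):
    hidden = info_text(frame)                     # follows the option
    forced = info_text(frame, memory_usage=True)  # argument overrides the option

default = info_text(frame)  # option is back to its default here

print("memory usage" in hidden, "memory usage" in forced, "memory usage" in default)
```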

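The memory usage of each column can be found by calling the memory_usage method. This returns a Series whose index holds the column names and whose values are the memory usage of each column in bytes. For the dataframe above, the memory usage of each column and the total memory usage of the dataframe can be found with the memory_usage method: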

In [8]: df.memory_usage()
Out[8]: 
Index                 72
bool                5000
complex128         80000
datetime64[ns]     40000
float64            40000
int64              40000
object             40000
timedelta64[ns]    40000
categorical         5800
dtype: int64

# total memory usage of dataframe
In [9]: df.memory_usage().sum()
Out[9]: 290872

By default the memory usage of the dataframe's index is included in the returned Series. It can be suppressed by passing the index=False argument:

In [10]: df.memory_usage(index=False)
Out[10]: 
bool                5000
complex128         80000
datetime64[ns]     40000
float64            40000
int64              40000
object             40000
timedelta64[ns]    40000
categorical         5800
dtype: int64
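As a small check of how the two calls relate, the sketch below uses a made-up two-column frame and confirms that the only difference between them is the entry labelled Index:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": np.arange(100, dtype="int64"), "b": np.zeros(100)})

with_index = frame.memory_usage()                # first entry is labelled "Index"
without_index = frame.memory_usage(index=False)  # columns only

index_bytes = with_index["Index"]
difference = with_index.sum() - without_index.sum()

print(index_bytes)
print(difference)  # same value as index_bytes
```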

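The memory usage displayed by the info method relies on the memory_usage method to determine the memory usage of a dataframe, and formats the result in human-readable units (base-2 representation; i.e., 1KB = 1024 bytes).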
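To make the base-2 conversion concrete, here is a minimal sketch of such formatting. The function format_bytes is a hypothetical helper written for this illustration, not the routine pandas uses internally; it is applied to the 290872-byte total from the example above.

```python
def format_bytes(num_bytes, size_qualifier=""):
    """Hypothetical helper: render a byte count using base-2 units."""
    value = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if value < 1024.0:
            return f"{value:.1f}{size_qualifier} {unit}"
        value /= 1024.0  # move up one unit: 1 KB = 1024 bytes, and so on
    return f"{value:.1f}{size_qualifier} PB"


print(format_bytes(290872))       # 284.1 KB
print(format_bytes(290872, "+"))  # 284.1+ KB
```

Dividing 290872 by 1024 gives roughly 284.05, which rounds to the 284.1 KB figure reported by df.info().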

See also Categorical Memory Usage.
