Original: http://pandas.pydata.org/pandas-docs/stable/overview.html
Proofreading: (open)
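pandas consists of the following:
- A set of labeled array data structures, the primary ones being Series and DataFrame
- Index objects that enable both simple axis indexing and multi-level / hierarchical axis indexing
- An integrated group-by engine for aggregating and transforming data sets
- Date range generation (date_range) and custom date offsets, which make it possible to implement customized frequencies
- Input/output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects in the fast and efficient PyTables / HDF5 format
- Memory-efficient "sparse" versions of the standard data structures, for storing data that is mostly missing or mostly constant (some fixed value)
- Moving window statistics (rolling mean, rolling standard deviation, etc.)
- Static and moving window linear regression and panel regression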
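A few of the features listed above can be sketched in a handful of lines. The example below is only an illustration: the column names, dates, and values are invented, and it assumes a reasonably recent pandas release in which `rolling` is available as a method.

```python
import pandas as pd

# Date range generation: six consecutive calendar days (dates are made up).
days = pd.date_range("2012-01-01", periods=6, freq="D")

# A labeled 2-D structure; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "group": ["a", "b", "a", "b", "a", "b"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    },
    index=days,
)

# Group-by engine: sum "value" within each group.
group_totals = df.groupby("group")["value"].sum()

# Moving window statistics: rolling mean over 3 observations.
# The first two entries are NaN because the window is not yet full.
rolling_mean = df["value"].rolling(window=3).mean()

# Multi-level (hierarchical) indexing: two index levels on one axis.
levels = pd.MultiIndex.from_product([["a", "b"], [1, 2]])
s = pd.Series([0, 1, 2, 3], index=levels)
inner = s.loc["a"]           # select by the outer level only
single = s.loc[("b", 2)]     # select one element by both levels
```

Here `group_totals` is a Series labeled by group name, and `inner` is the sub-Series whose outer level equals `"a"`.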
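Dimensions | Name | Description |
---|---|---|
1 | Series | 1D labeled, homogeneously typed array |
2 | DataFrame | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns |
3 | Panel | General 3D labeled array, also size-mutable |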
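As a rough illustration of the first two rows of the table, the sketch below builds one Series and one DataFrame and inspects their dimensionality and dtypes. The labels and values are invented. Panel is left out because it has been removed from later pandas releases.

```python
import pandas as pd

# 1-D: a Series holds values of a single dtype, labeled by an index.
s = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

# 2-D: a DataFrame holds columns that may each have a different dtype.
df = pd.DataFrame(
    {"count": [1, 2, 3], "label": ["p", "q", "r"]},
    index=["x", "y", "z"],
)

print(s.ndim, s.dtype)   # dimensionality and the single dtype of the Series
print(df.ndim)           # dimensionality of the DataFrame
print(df.dtypes)         # one dtype per column
```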
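The best way to think about the pandas data structures is as flexible containers for lower-dimensional data. For example, DataFrame is a container for Series, and Panel is a container for DataFrame objects. We would like to be able to insert and remove objects from these containers in a dictionary-like fashion.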
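The dictionary-like behavior described above can be sketched as follows. The column names are arbitrary; the point is only that columns are looked up, inserted, and removed by key.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Each column comes back as a Series, as if looked up by key in a dict.
col_a = df["a"]

# Insert a new column by assigning to a new key.
df["c"] = df["a"] + df["b"]

# Remove columns the way you would remove dict entries.
del df["b"]
popped = df.pop("c")  # returns the removed column as a Series

remaining = list(df.columns)
```

After these steps only column `a` remains in `df`, while `popped` still holds the values of the removed column `c`.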
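We would also like sensible default behaviors for the common API functions, ones that take into account the typical orientation of time series and cross-sectional data sets. When ndarrays are used to store 2- and 3-dimensional data, the burden falls on the user to consider the orientation of the data set when writing functions; the axes are treated as more or less equivalent (except when C- or Fortran-contiguity matters for performance). In pandas, the axes are intended to lend more semantic meaning to the data; that is, for a particular data set there is likely to be a "right" way to orient the data. The goal, then, is to reduce the mental effort required to code up data transformations in downstream functions.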
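To make the contrast concrete, here is a minimal sketch comparing a bare ndarray with a DataFrame built from the same values. The column names `price` and `volume` and the dates are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Rows are observations (dates); columns are variables.
values = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# With a bare ndarray the caller must remember which axis is which.
ndarray_means = values.mean(axis=0)

# With a DataFrame the default reduction runs down the rows, and the
# result is labeled by column name rather than by position.
df = pd.DataFrame(
    values,
    index=pd.date_range("2012-01-01", periods=3, freq="D"),
    columns=["price", "volume"],
)
column_means = df.mean()
```

Both computations give the same numbers, but `column_means["price"]` states what is being read, whereas `ndarray_means[0]` relies on the reader knowing the layout.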
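For example, with tabular data (DataFrame) it is more semantically helpful to think of the index (the rows) and the columns rather than axis 0 and axis 1. Iterating through the columns of the DataFrame thus results in more readable code: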
for col in df.columns:
    series = df[col]
    # do something with series
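All pandas data structures are value-mutable (the values they contain can be altered) but not always size-mutable. The length of a Series cannot be changed, but columns, for example, can be inserted into a DataFrame. However, the vast majority of methods produce new objects and leave the input data untouched. In general, we like to favor immutability where sensible.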
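The distinction can be sketched with a small DataFrame. The labels and values are invented, and `sort_values` is used here only as one example of a method that returns a new object.

```python
import pandas as pd

df = pd.DataFrame({"b": [3, 1, 2]}, index=["x", "y", "z"])

# Value-mutable: overwrite a single cell in place.
df.loc["x", "b"] = 30

# Size-mutable along the columns: insert a new column at position 0.
df.insert(0, "a", [7, 8, 9])

# Most methods return a new object and leave the input untouched:
# sorted_df is reordered by "b", while df keeps its original row order.
sorted_df = df.sort_values("b")
```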
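The first stop for pandas issues and ideas is the GitHub issue tracker. If you have a general question, pandas community experts can answer it on Stack Overflow.
Longer discussions take place on the developer mailing list, and commercial support inquiries for Lambda Foundry should be sent to: support@lambdafoundry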
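pandas development began at AQR Capital Management in April 2008. It was open-sourced at the end of 2009. AQR continued to provide resources for development through the end of 2011, and continues to contribute bug reports today.
Since January 2012, Lambda Foundry has been providing development resources, as well as commercial support, training, and consulting for pandas.
pandas is only made possible by a group of people around the world like you who have contributed new code, bug reports, fixes, comments, and ideas. A complete list can be found on GitHub.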
=======
License
=======
pandas is distributed under a 3-clause ("Simplified" or "New") BSD
license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have
BSD-compatible licenses, are included. Their licenses follow the pandas
license.
pandas license
==============
Copyright (c) 2011-2012, Lambda Foundry, Inc. and PyData Development Team
All rights reserved.
Copyright (c) 2008-2011 AQR Capital Management, LLC
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.
* Neither the name of the copyright holder nor the names of any
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
About the Copyright Holders
===========================
AQR Capital Management began pandas development in 2008. Development was
led by Wes McKinney. AQR released the source under this license in 2009.
Wes is now an employee of Lambda Foundry, and remains the pandas project
lead.
The PyData Development Team is the collection of developers of the PyData
project. This includes all of the PyData sub-projects, including pandas. The
core team that coordinates development on GitHub can be found here:
http://github.com/pydata.
Full credits for pandas contributors can be found in the documentation.
Our Copyright Policy
====================
PyData uses a shared copyright model. Each contributor maintains copyright
over their contributions to PyData. However, it is important to note that
these contributions are typically only changes to the repositories. Thus,
the PyData source code, in its entirety, is not the copyright of any single
person or institution. Instead, it is the collective copyright of the
entire PyData Development Team. If individual contributors want to maintain
a record of what changes/contributions they have specific copyright on,
they should indicate their copyright in the commit message of the change
when they commit the change to one of the PyData repositories.
With this in mind, the following banner should be used in any source code
file to indicate the copyright and license terms:
#-----------------------------------------------------------------------------
# Copyright (c) 2012, PyData Development Team
# All rights reserved.
#
# Distributed under the terms of the BSD Simplified License.
#
# The full license is in the LICENSE file, distributed with this software.
#-----------------------------------------------------------------------------
Other licenses can be found in the LICENSES directory.