Time Deltas

Original: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html

Translators: 飞龙 UsyiyiCN

Proofreading: (position open)

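Note

Starting in v0.15.0, pandas introduces a new scalar type, Timedelta. It is a subclass of datetime.timedelta and behaves similarly, but it is also compatible with np.timedelta64 types and adds a host of custom representation, parsing, and attributes.

Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.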
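The following is a minimal sketch of the two claims above: a Timedelta is accepted wherever a datetime.timedelta is expected, and it can be negative. The `pd` import alias is a common convention assumed here, not something the text above prescribes.

```python
import datetime

import pandas as pd

# Build a Timedelta from keyword arguments, as datetime.timedelta allows.
td = pd.Timedelta(days=1, hours=2)

# Timedelta subclasses datetime.timedelta, so stdlib type checks accept it.
is_stdlib_timedelta = isinstance(td, datetime.timedelta)

# Total length in seconds: 1 day + 2 hours = 86400 + 7200.
total_seconds = td.total_seconds()

# Negation yields a negative duration of the same magnitude.
negative = -td
```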

Parsing

You can construct a Timedelta scalar through various arguments:

# strings
In [1]: Timedelta('1 days')
Out[1]: Timedelta('1 days 00:00:00')

In [2]: Timedelta('1 days 00:00:00')
Out[2]: Timedelta('1 days 00:00:00')

In [3]: Timedelta('1 days 2 hours')
Out[3]: Timedelta('1 days 02:00:00')

In [4]: Timedelta('-1 days 2 min 3us')
Out[4]: Timedelta('-2 days +23:57:59.999997')

# like datetime.timedelta
# note: these MUST be specified as keyword arguments
In [5]: Timedelta(days=1, seconds=1)
Out[5]: Timedelta('1 days 00:00:01')

# integers with a unit
In [6]: Timedelta(1, unit='d')
Out[6]: Timedelta('1 days 00:00:00')

# from a timedelta/np.timedelta64
In [7]: Timedelta(timedelta(days=1, seconds=1))
Out[7]: Timedelta('1 days 00:00:01')

In [8]: Timedelta(np.timedelta64(1, 'ms'))
Out[8]: Timedelta('0 days 00:00:00.001000')

# negative Timedeltas have this string repr
# to be more consistent with datetime.timedelta conventions
In [9]: Timedelta('-1us')
Out[9]: Timedelta('-1 days +23:59:59.999999')

# a NaT
In [10]: Timedelta('nan')
Out[10]: NaT

In [11]: Timedelta('nat')
Out[11]: NaT

DateOffsets (Day, Hour, Minute, Second, Milli, Micro, Nano) can also be used in construction.

In [12]: Timedelta(Second(2))
Out[12]: Timedelta('0 days 00:00:02')

Further, operations among the scalars yield another scalar Timedelta:

In [13]: Timedelta(Day(2)) + Timedelta(Second(2)) + Timedelta('00:00:00.000123')
Out[13]: Timedelta('2 days 00:00:02.000123')
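As a cross-check of the construction routes listed in this section, the sketch below builds the same 26-hour duration five ways and compares the results. The variable names are illustrative only, and the namespaced `pd.` / `np.` spelling is an assumption; the session above uses bare names.

```python
import datetime

import numpy as np
import pandas as pd

from_string = pd.Timedelta('1 days 2 hours')
from_keywords = pd.Timedelta(days=1, hours=2)
from_integer = pd.Timedelta(26, unit='h')
from_stdlib = pd.Timedelta(datetime.timedelta(days=1, hours=2))
from_numpy = pd.Timedelta(np.timedelta64(26, 'h'))

# All five routes describe the same duration.
all_equal = (
    from_string == from_keywords == from_integer == from_stdlib == from_numpy
)

# Arithmetic between scalars stays in the Timedelta type.
total = from_string + pd.Timedelta(seconds=2)

# The strings 'nat' and 'nan' parse to the missing-value marker NaT.
missing = pd.Timedelta('nat')
```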

to_timedelta

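Warning

Prior to 0.15.0, pd.to_timedelta would return a Series for list-like/Series input and a np.timedelta64 for scalar input. It now returns a TimedeltaIndex for list-like input, a Series for Series input, and a Timedelta for scalar input.

The arguments to pd.to_timedelta are now (arg, unit='ns', box=True); previously they were (arg, box=True, unit='ns'). The new order is more logical.

Using the top-level pd.to_timedelta, you can convert a scalar, array, list, or Series from a recognized timedelta format/value into a Timedelta type. It constructs a Series if the input is a Series, a scalar if the input is scalar-like, and otherwise outputs a TimedeltaIndex.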

You can parse a single string to a Timedelta:

In [14]: to_timedelta('1 days 06:05:01.00003')
Out[14]: Timedelta('1 days 06:05:01.000030')

In [15]: to_timedelta('15.5us')
Out[15]: Timedelta('0 days 00:00:00.000015')

or a list/array of strings:

In [16]: to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
Out[16]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015', NaT], dtype='timedelta64[ns]', freq=None)

The unit keyword argument specifies the unit of the Timedelta:

In [17]: to_timedelta(np.arange(5), unit='s')
Out[17]: TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04'], dtype='timedelta64[ns]', freq=None)

In [18]: to_timedelta(np.arange(5), unit='d')
Out[18]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
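The mapping from input type to return type can be sketched as follows. This is an illustration under the assumption that pandas is imported as `pd`; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Scalar-like input -> Timedelta scalar.
scalar = pd.to_timedelta('1 days 06:05:01')

# List-like input -> TimedeltaIndex; 'nan' becomes NaT.
index = pd.to_timedelta(['1 days', '2 days', 'nan'])

# Series input -> Series.
series = pd.to_timedelta(pd.Series(['1 days', '2 days']))

# Numeric input is interpreted in the given unit (seconds here).
seconds = pd.to_timedelta(np.arange(3), unit='s')
```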

Timedelta limitations

Pandas represents Timedeltas in nanosecond resolution using 64-bit integers. As such, the 64-bit integer limits determine the Timedelta limits.

In [19]: pd.Timedelta.min
Out[19]: Timedelta('-106752 days +00:12:43.145224')

In [20]: pd.Timedelta.max
Out[20]: Timedelta('106751 days 23:47:16.854775')
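The day counts in these limits follow from the integer range. The arithmetic below is a sketch that assumes nanosecond resolution, as stated above; `NS_PER_DAY` is a helper name introduced only for this example.

```python
import numpy as np
import pandas as pd

# Nanoseconds in one day.
NS_PER_DAY = 24 * 60 * 60 * 10**9

# Largest value a signed 64-bit integer can hold.
max_ns = int(np.iinfo(np.int64).max)

# Whole days that fit into that many nanoseconds.
whole_days = max_ns // NS_PER_DAY

# The same day counts as reported by pandas itself.
max_days = pd.Timedelta.max.days
min_days = pd.Timedelta.min.days
```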

Operations

You can operate on Series/DataFrames and construct timedelta64[ns] Series through subtraction operations on datetime64[ns] Series or Timestamps.

In [21]: s = Series(date_range('2012-1-1', periods=3, freq='D'))

In [22]: td = Series([ Timedelta(days=i) for i in range(3) ])

In [23]: df = DataFrame(dict(A = s, B = td))

In [24]: df
Out[24]: 
           A      B
0 2012-01-01 0 days
1 2012-01-02 1 days
2 2012-01-03 2 days

In [25]: df['C'] = df['A'] + df['B']

In [26]: df
Out[26]: 
           A      B          C
0 2012-01-01 0 days 2012-01-01
1 2012-01-02 1 days 2012-01-03
2 2012-01-03 2 days 2012-01-05

In [27]: df.dtypes
Out[27]: 
A     datetime64[ns]
B    timedelta64[ns]
C     datetime64[ns]
dtype: object

In [28]: s - s.max()
Out[28]: 
0   -2 days
1   -1 days
2    0 days
dtype: timedelta64[ns]

In [29]: s - datetime(2011, 1, 1, 3, 5)
Out[29]: 
0   364 days 20:55:00
1   365 days 20:55:00
2   366 days 20:55:00
dtype: timedelta64[ns]

In [30]: s + timedelta(minutes=5)
Out[30]: 
0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
dtype: datetime64[ns]

In [31]: s + Minute(5)
Out[31]: 
0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
dtype: datetime64[ns]

In [32]: s + Minute(5) + Milli(5)
Out[32]: 
0   2012-01-01 00:05:00.005
1   2012-01-02 00:05:00.005
2   2012-01-03 00:05:00.005
dtype: datetime64[ns]
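The dtype rules in the session above can be summarised in a short sketch: datetime minus datetime gives a timedelta, and datetime plus timedelta gives a datetime. It checks `dtype.kind` rather than the exact dtype string, on the assumption that the default resolution may differ between pandas versions.

```python
import pandas as pd

s = pd.Series(pd.date_range('2012-01-01', periods=3, freq='D'))

# datetime - datetime -> timedelta ('m' is the NumPy kind code for timedelta).
diff = s - s.max()

# datetime + timedelta -> datetime ('M' is the NumPy kind code for datetime).
shifted = s + pd.Timedelta(minutes=5)

diff_kind = diff.dtype.kind
shifted_kind = shifted.dtype.kind
```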

Operations with scalars from a timedelta64[ns] Series:

In [33]: y = s - s[0]

In [34]: y
Out[34]: 
0   0 days
1   1 days
2   2 days
dtype: timedelta64[ns]

Series of timedeltas with NaT values are supported:

In [35]: y = s - s.shift()

In [36]: y
Out[36]: 
0      NaT
1   1 days
2   1 days
dtype: timedelta64[ns]

Elements can be set to NaT using np.nan, analogously to datetimes:

In [37]: y[1] = np.nan

In [38]: y
Out[38]: 
0      NaT
1      NaT
2   1 days
dtype: timedelta64[ns]
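A compact sketch of the NaT behaviour shown above. It uses `.iloc` for the assignment to make the positional intent explicit; that is a choice made for this example, whereas the session above uses `y[1]`.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range('2012-01-01', periods=3, freq='D'))

# The first element has no predecessor, so the difference is NaT.
y = s - s.shift()
first_is_missing = pd.isna(y.iloc[0])

# Assigning np.nan stores NaT and keeps the timedelta dtype.
y.iloc[1] = np.nan
missing_mask = y.isna().tolist()
dtype_kind = y.dtype.kind
```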

Operands can also appear in reversed order (a singular object operated with a Series):

In [39]: s.max() - s
Out[39]: 
0   2 days
1   1 days
2   0 days
dtype: timedelta64[ns]

In [40]: datetime(2011, 1, 1, 3, 5) - s
Out[40]: 
0   -365 days +03:05:00
1   -366 days +03:05:00
2   -367 days +03:05:00
dtype: timedelta64[ns]

In [41]: timedelta(minutes=5) + s
Out[41]: 
0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
dtype: datetime64[ns]

min, max and the corresponding idxmin, idxmax operations are supported on frames:

In [42]: A = s - Timestamp('20120101') - Timedelta('00:05:05')

In [43]: B = s - Series(date_range('2012-1-2', periods=3, freq='D'))

In [44]: df = DataFrame(dict(A=A, B=B))

In [45]: df
Out[45]: 
                  A       B
0 -1 days +23:54:55 -1 days
1   0 days 23:54:55 -1 days
2   1 days 23:54:55 -1 days

In [46]: df.min()
Out[46]: 
A   -1 days +23:54:55
B   -1 days +00:00:00
dtype: timedelta64[ns]

In [47]: df.min(axis=1)
Out[47]: 
0   -1 days
1   -1 days
2   -1 days
dtype: timedelta64[ns]

In [48]: df.idxmin()
Out[48]: 
A    0
B    0
dtype: int64

In [49]: df.idxmax()
Out[49]: 
A    2
B    0
dtype: int64

min, max, idxmin, idxmax operations are supported on Series as well. A scalar result will be a Timedelta.

In [50]: df.min().max()
Out[50]: Timedelta('-1 days +23:54:55')

In [51]: df.min(axis=1).min()
Out[51]: Timedelta('-1 days +00:00:00')

In [52]: df.min().idxmax()
Out[52]: 'A'

In [53]: df.min(axis=1).idxmin()
Out[53]: 0

You can use fillna on timedeltas. Integers will be interpreted as seconds. You can pass a timedelta to get a particular value.

In [54]: y.fillna(0)
Out[54]: 
0   0 days
1   0 days
2   1 days
dtype: timedelta64[ns]

In [55]: y.fillna(10)
Out[55]: 
0   0 days 00:00:10
1   0 days 00:00:10
2   1 days 00:00:00
dtype: timedelta64[ns]

In [56]: y.fillna(Timedelta('-1 days, 00:00:05'))
Out[56]: 
0   -1 days +00:00:05
1   -1 days +00:00:05
2     1 days 00:00:00
dtype: timedelta64[ns]
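Passing an explicit Timedelta as the fill value avoids any ambiguity about units. The sketch below assumes only that form; how integer fill values are handled may differ in pandas versions other than the one documented here.

```python
import pandas as pd

# A timedelta Series with two missing values.
y = pd.Series(pd.to_timedelta(['nat', 'nat', '1 days']))

# Fill the gaps with an explicit ten-second duration.
filled = y.fillna(pd.Timedelta(seconds=10))
```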

You can also negate, multiply, and use abs on Timedeltas:

In [57]: td1 = Timedelta('-1 days 2 hours 3 seconds')

In [58]: td1
Out[58]: Timedelta('-2 days +21:59:57')

In [59]: -1 * td1
Out[59]: Timedelta('1 days 02:00:03')

In [60]: - td1
Out[60]: Timedelta('1 days 02:00:03')

In [61]: abs(td1)
Out[61]: Timedelta('1 days 02:00:03')
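The repr '-2 days +21:59:57' follows the datetime.timedelta convention: a negative day count plus a non-negative remainder. The sketch below works through that arithmetic, building the value from keyword arguments instead of a string.

```python
import pandas as pd

# Same duration as td1 above: minus (1 day, 2 hours, 3 seconds).
td1 = -pd.Timedelta(days=1, hours=2, seconds=3)

# Stored as -2 whole days plus a positive remainder of 21:59:57.
days = td1.days
remainder_seconds = td1.seconds

# The two parts recombine to the signed total.
total_seconds = days * 86400 + remainder_seconds

magnitude = abs(td1)
```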

Reductions

Numeric reduction operations for timedelta64[ns] will return Timedelta objects. As usual, NaT values are skipped during evaluation.

In [62]: y2 = Series(to_timedelta(['-1 days +00:00:05', 'nat', '-1 days +00:00:05', '1 days']))

In [63]: y2
Out[63]: 
0   -1 days +00:00:05
1                 NaT
2   -1 days +00:00:05
3     1 days 00:00:00
dtype: timedelta64[ns]

In [64]: y2.mean()
Out[64]: Timedelta('-1 days +16:00:03.333333')

In [65]: y2.median()
Out[65]: Timedelta('-1 days +00:00:05')

In [66]: y2.quantile(.1)
Out[66]: Timedelta('-1 days +00:00:05')

In [67]: y2.sum()
Out[67]: Timedelta('-1 days +00:00:10')
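To see that NaT is skipped, the sketch below rebuilds the same four values from explicit second counts and checks the reductions by hand. The construction via `pd.Timedelta(seconds=...)` is an alternative chosen for this example, not the form used in the session.

```python
import pandas as pd

# '-1 days +00:00:05' is -86395 seconds; '1 days' is 86400 seconds.
y2 = pd.Series(
    [
        pd.Timedelta(seconds=-86395),
        pd.NaT,
        pd.Timedelta(seconds=-86395),
        pd.Timedelta(days=1),
    ],
    dtype='timedelta64[ns]',
)

# Only the three non-missing values take part in each reduction.
n_valid = y2.count()
total = y2.sum()        # -86395 - 86395 + 86400 = -86390 seconds
middle = y2.median()    # middle of the three sorted values
```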

Frequency Conversion

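New in version 0.13.

Timedelta Series, TimedeltaIndex, and Timedelta scalars can be converted to other "frequencies" by dividing by another timedelta, or by astyping to a specific timedelta type. These operations yield Series and propagate NaT -> nan. Note that division by the numpy scalar is true division, while astyping is equivalent to floor division.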

In [68]: td = Series(date_range('20130101', periods=4)) - \
   ....:      Series(date_range('20121201', periods=4))
   ....: 

In [69]: td[2] += timedelta(minutes=5, seconds=3)

In [70]: td[3] = np.nan

In [71]: td
Out[71]: 
0   31 days 00:00:00
1   31 days 00:00:00
2   31 days 00:05:03
3                NaT
dtype: timedelta64[ns]

# to days
In [72]: td / np.timedelta64(1, 'D')
Out[72]: 
0    31.000000
1    31.000000
2    31.003507
3          NaN
dtype: float64

In [73]: td.astype('timedelta64[D]')
Out[73]: 
0    31.0
1    31.0
2    31.0
3     NaN
dtype: float64

# to seconds
In [74]: td / np.timedelta64(1, 's')
Out[74]: 
0    2678400.0
1    2678400.0
2    2678703.0
3          NaN
dtype: float64

In [75]: td.astype('timedelta64[s]')
Out[75]: 
0    2678400.0
1    2678400.0
2    2678703.0
3          NaN
dtype: float64

# to months (these are constant months)
In [76]: td / np.timedelta64(1, 'M')
Out[76]: 
0    1.018501
1    1.018501
2    1.018617
3         NaN
dtype: float64
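The difference between true division and floor division can be sketched with the `/` and `//` operators. This example uses `//` in place of `astype`, on the assumption that `astype('timedelta64[D]')` may not behave the same way in pandas versions other than the one documented here.

```python
import numpy as np
import pandas as pd

td = pd.Series(
    [
        pd.Timedelta(days=31),
        pd.Timedelta(days=31, minutes=5, seconds=3),
        pd.NaT,
    ],
    dtype='timedelta64[ns]',
)

# True division keeps the fractional part and turns NaT into NaN.
in_days = td / pd.Timedelta(days=1)

# Floor division drops the fractional part (missing values removed first).
whole_days = td.dropna() // pd.Timedelta(days=1)
```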

Dividing or multiplying a timedelta64[ns] Series by an integer or integer Series yields another timedelta64[ns] dtype Series.

In [77]: td * -1
Out[77]: 
0   -31 days +00:00:00
1   -31 days +00:00:00
2   -32 days +23:54:57
3                  NaT
dtype: timedelta64[ns]

In [78]: td * Series([1, 2, 3, 4])
Out[78]: 
0   31 days 00:00:00
1   62 days 00:00:00
2   93 days 00:15:09
3                NaT
dtype: timedelta64[ns]
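A short sketch of scaling by integers, covering division as well as the multiplication shown above. The input values are chosen for illustration only.

```python
import pandas as pd

td = pd.Series([pd.Timedelta(days=31), pd.Timedelta(days=62)])

# Scaling by a scalar integer keeps the timedelta dtype.
doubled = td * 2
halved = td / 2

# Element-wise scaling by an integer Series also keeps it.
scaled = td * pd.Series([1, 2])
```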

Attributes

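You can access various components of the Timedelta or TimedeltaIndex directly using the attributes days, seconds, microseconds, nanoseconds. These are identical to the values returned by datetime.timedelta in that, for example, the .seconds attribute represents the number of seconds >= 0 and < 1 day. These are signed according to whether the Timedelta is signed.

These operations can also be accessed directly via the .dt property of the Series.

Note

Note that the attributes are NOT the displayed values of the Timedelta. Use .components to retrieve the displayed values.

For a Series: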

In [79]: td.dt.days
Out[79]: 
0    31.0
1    31.0
2    31.0
3     NaN
dtype: float64

In [80]: td.dt.seconds
Out[80]: 
0      0.0
1      0.0
2    303.0
3      NaN
dtype: float64

You can access the value of the fields for a scalar Timedelta directly.

In [81]: tds = Timedelta('31 days 5 min 3 sec')

In [82]: tds.days
Out[82]: 31

In [83]: tds.seconds
Out[83]: 303

In [84]: (-tds).seconds
Out[84]: 86097

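You can use the .components property to access a reduced form of the timedelta. This returns a DataFrame indexed similarly to the Series. These are the displayed values of the Timedelta.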

In [85]: td.dt.components
Out[85]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0  31.0    0.0      0.0      0.0           0.0           0.0          0.0
1  31.0    0.0      0.0      0.0           0.0           0.0          0.0
2  31.0    0.0      5.0      3.0           0.0           0.0          0.0
3   NaN    NaN      NaN      NaN           NaN           NaN          NaN

In [86]: td.dt.components.seconds
Out[86]: 
0    0.0
1    0.0
2    3.0
3    NaN
Name: seconds, dtype: float64
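The distinction between the attributes and the displayed components is easiest to see on a negative scalar. The sketch below uses the same duration as `tds` above, built from keyword arguments.

```python
import pandas as pd

tds = pd.Timedelta(days=31, minutes=5, seconds=3)
neg = -tds

# .seconds is the whole remainder within the day, not the seconds field.
seconds_attr = tds.seconds            # 5 * 60 + 3

# For the negated value: -32 days plus 86400 - 303 seconds.
neg_days = neg.days
neg_seconds_attr = neg.seconds

# .components splits that remainder into the fields shown in the repr.
c = neg.components
displayed = (c.days, c.hours, c.minutes, c.seconds)
```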

TimedeltaIndex

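New in version 0.15.0.

To generate an index with time deltas, you can use either the TimedeltaIndex or the timedelta_range constructor.

Using TimedeltaIndex you can pass string-like, Timedelta, timedelta, or np.timedelta64 objects. Passing np.nan/pd.NaT/nat will represent missing values.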

In [87]: TimedeltaIndex(['1 days', '1 days, 00:00:05',
   ....:                 np.timedelta64(2,'D'), timedelta(days=2,seconds=2)])
   ....: 
Out[87]: 
TimedeltaIndex(['1 days 00:00:00', '1 days 00:00:05', '2 days 00:00:00',
                '2 days 00:00:02'],
               dtype='timedelta64[ns]', freq=None)

Similarly to date_range, you can construct regular ranges of a TimedeltaIndex:

In [88]: timedelta_range(start='1 days', periods=5, freq='D')
Out[88]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')

In [89]: timedelta_range(start='1 days', end='2 days', freq='30T')
Out[89]: 
TimedeltaIndex(['1 days 00:00:00', '1 days 00:30:00', '1 days 01:00:00',
                '1 days 01:30:00', '1 days 02:00:00', '1 days 02:30:00',
                '1 days 03:00:00', '1 days 03:30:00', '1 days 04:00:00',
                '1 days 04:30:00', '1 days 05:00:00', '1 days 05:30:00',
                '1 days 06:00:00', '1 days 06:30:00', '1 days 07:00:00',
                '1 days 07:30:00', '1 days 08:00:00', '1 days 08:30:00',
                '1 days 09:00:00', '1 days 09:30:00', '1 days 10:00:00',
                '1 days 10:30:00', '1 days 11:00:00', '1 days 11:30:00',
                '1 days 12:00:00', '1 days 12:30:00', '1 days 13:00:00',
                '1 days 13:30:00', '1 days 14:00:00', '1 days 14:30:00',
                '1 days 15:00:00', '1 days 15:30:00', '1 days 16:00:00',
                '1 days 16:30:00', '1 days 17:00:00', '1 days 17:30:00',
                '1 days 18:00:00', '1 days 18:30:00', '1 days 19:00:00',
                '1 days 19:30:00', '1 days 20:00:00', '1 days 20:30:00',
                '1 days 21:00:00', '1 days 21:30:00', '1 days 22:00:00',
                '1 days 22:30:00', '1 days 23:00:00', '1 days 23:30:00',
                '2 days 00:00:00'],
               dtype='timedelta64[ns]', freq='30T')
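The sketch below counts the entries in the two ranges above. It spells the half-hour frequency as '30min' instead of '30T'; that substitution is an assumption made because newer pandas releases prefer the 'min' alias.

```python
import pandas as pd

# Five daily steps starting at one day.
daily = pd.timedelta_range(start='1 days', periods=5, freq='D')

# Half-hour steps from one day to two days, both endpoints included.
half_hours = pd.timedelta_range(start='1 days', end='2 days', freq='30min')

# 24 hours * 2 steps per hour, plus the closing endpoint.
n_steps = len(half_hours)
```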

Using the TimedeltaIndex

Similarly to the other datetime-like indices, DatetimeIndex and PeriodIndex, you can use TimedeltaIndex as the index of pandas objects.

In [90]: s = Series(np.arange(100),
   ....:            index=timedelta_range('1 days', periods=100, freq='h'))
   ....: 

In [91]: s
Out[91]: 
1 days 00:00:00     0
1 days 01:00:00     1
1 days 02:00:00     2
1 days 03:00:00     3
1 days 04:00:00     4
1 days 05:00:00     5
1 days 06:00:00     6
                   ..
4 days 21:00:00    93
4 days 22:00:00    94
4 days 23:00:00    95
5 days 00:00:00    96
5 days 01:00:00    97
5 days 02:00:00    98
5 days 03:00:00    99
Freq: H, dtype: int64

Selections work similarly, with coercion on string-likes and slices:

In [92]: s['1 day':'2 day']
Out[92]: 
1 days 00:00:00     0
1 days 01:00:00     1
1 days 02:00:00     2
1 days 03:00:00     3
1 days 04:00:00     4
1 days 05:00:00     5
1 days 06:00:00     6
                   ..
2 days 17:00:00    41
2 days 18:00:00    42
2 days 19:00:00    43
2 days 20:00:00    44
2 days 21:00:00    45
2 days 22:00:00    46
2 days 23:00:00    47
Freq: H, dtype: int64

In [93]: s['1 day 01:00:00']
Out[93]: 1

In [94]: s[Timedelta('1 day 1h')]
Out[94]: 1

Furthermore, you can use partial string selection, and the range will be inferred:

In [95]: s['1 day':'1 day 5 hours']
Out[95]: 
1 days 00:00:00    0
1 days 01:00:00    1
1 days 02:00:00    2
1 days 03:00:00    3
1 days 04:00:00    4
1 days 05:00:00    5
Freq: H, dtype: int64
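The slice lengths in the session above can be verified with a short sketch. It uses `.loc` for label-based selection, which is a choice made for this example; the session above indexes with plain brackets.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    np.arange(100),
    index=pd.timedelta_range('1 days', periods=100, freq='h'),
)

# '2 day' is a partial string, so the slice runs to the end of day 2.
two_days = s.loc['1 day':'2 day']

# An endpoint given to the hour stops at that hour (inclusive).
six_hours = s.loc['1 day':'1 day 5 hours']

# Exact lookup with a Timedelta label.
exact = s.loc[pd.Timedelta('1 day 1h')]
```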

Operations

Finally, the combination of TimedeltaIndex with DatetimeIndex allows certain combination operations that are NaT-preserving:

In [96]: tdi = TimedeltaIndex(['1 days', pd.NaT, '2 days'])

In [97]: tdi.tolist()
Out[97]: [Timedelta('1 days 00:00:00'), NaT, Timedelta('2 days 00:00:00')]

In [98]: dti = date_range('20130101', periods=3)

In [99]: dti.tolist()
Out[99]: 
[Timestamp('2013-01-01 00:00:00', freq='D'),
 Timestamp('2013-01-02 00:00:00', freq='D'),
 Timestamp('2013-01-03 00:00:00', freq='D')]

In [100]: (dti + tdi).tolist()
Out[100]: [Timestamp('2013-01-02 00:00:00'), NaT, Timestamp('2013-01-05 00:00:00')]

In [101]: (dti - tdi).tolist()
Out[101]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2013-01-01 00:00:00')]
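A sketch of the NaT-preserving behaviour: element-wise addition and subtraction leave the missing position missing and shift the others. Names follow the session above, with the `pd.` prefix added.

```python
import pandas as pd

tdi = pd.TimedeltaIndex(['1 days', pd.NaT, '2 days'])
dti = pd.date_range('2013-01-01', periods=3)

# Element-wise: dates shifted forward or backward by the matching delta.
added = dti + tdi
subtracted = dti - tdi
```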

Conversions

Similarly to the frequency conversion on a Series above, you can convert these indices to yield another Index.

In [102]: tdi / np.timedelta64(1,'s')
Out[102]: Float64Index([86400.0, nan, 172800.0], dtype='float64')

In [103]: tdi.astype('timedelta64[s]')
Out[103]: Float64Index([86400.0, nan, 172800.0], dtype='float64')

Scalar type ops work as well. These can potentially return a different type of index.

# adding a timedelta and a date -> datelike
In [104]: tdi + Timestamp('20130101')
Out[104]: DatetimeIndex(['2013-01-02', 'NaT', '2013-01-03'], dtype='datetime64[ns]', freq=None)

# subtraction of a date and a timedelta -> datelike
# note that trying to subtract a date from a Timedelta will raise an exception
In [105]: (Timestamp('20130101') - tdi).tolist()
Out[105]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2012-12-30 00:00:00')]

# timedelta + timedelta -> timedelta
In [106]: tdi + Timedelta('10 days')
Out[106]: TimedeltaIndex(['11 days', NaT, '12 days'], dtype='timedelta64[ns]', freq=None)

# division can result in a Timedelta if the divisor is an integer
In [107]: tdi / 2
Out[107]: TimedeltaIndex(['0 days 12:00:00', NaT, '1 days 00:00:00'], dtype='timedelta64[ns]', freq=None)

# or a Float64Index if the divisor is a Timedelta
In [108]: tdi / tdi[0]
Out[108]: Float64Index([1.0, nan, 2.0], dtype='float64')
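The result types of these operations can be checked with the sketch below. It tests the float results by dtype, not by index class, on the assumption that the class shown as Float64Index above may be named differently in other pandas versions.

```python
import numpy as np
import pandas as pd

tdi = pd.TimedeltaIndex(['1 days', pd.NaT, '2 days'])

# timedelta / timedelta -> floats, NaT becomes NaN.
in_seconds = tdi / np.timedelta64(1, 's')
ratio = tdi / tdi[0]

# timedelta / integer -> still a TimedeltaIndex.
halved = tdi / 2

# timedelta + date -> DatetimeIndex.
shifted = tdi + pd.Timestamp('2013-01-01')
```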

Resampling

Similar to timeseries resampling, we can resample with a TimedeltaIndex.

In [109]: s.resample('D').mean()
Out[109]: 
1 days    11.5
2 days    35.5
3 days    59.5
4 days    83.5
5 days    97.5
Freq: D, dtype: float64
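The daily means above can be reproduced from scratch. The sketch rebuilds the hourly Series used earlier in this section; the last bin covers only the four remaining hours, which explains its mean of 97.5.

```python
import numpy as np
import pandas as pd

# 100 hourly observations starting at an offset of one day.
s = pd.Series(
    np.arange(100),
    index=pd.timedelta_range('1 days', periods=100, freq='h'),
)

# Group into daily bins and average each bin.
daily = s.resample('D').mean()
```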