Source: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html
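Note
Starting from v0.15.0, pandas introduces a new scalar type, Timedelta, which is a subclass of datetime.timedelta and behaves in a similar manner, but also allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.
Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.
You can construct a Timedelta scalar through various arguments: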
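# setup assumed by every example on this page (not shown in the original session):
#   import numpy as np; import pandas as pd
#   from datetime import datetime, timedelta
#   from pandas import Series, DataFrame, Timedelta, Timestamp, TimedeltaIndex
#   from pandas import date_range, timedelta_range, to_timedelta
#   from pandas.tseries.offsets import Day, Second, Minute, Milli
# strings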
In [1]: Timedelta('1 days')
Out[1]: Timedelta('1 days 00:00:00')
In [2]: Timedelta('1 days 00:00:00')
Out[2]: Timedelta('1 days 00:00:00')
In [3]: Timedelta('1 days 2 hours')
Out[3]: Timedelta('1 days 02:00:00')
In [4]: Timedelta('-1 days 2 min 3us')
Out[4]: Timedelta('-2 days +23:57:59.999997')
# like datetime.timedelta
# note: these MUST be specified as keyword arguments
In [5]: Timedelta(days=1, seconds=1)
Out[5]: Timedelta('1 days 00:00:01')
# integers with a unit
In [6]: Timedelta(1, unit='d')
Out[6]: Timedelta('1 days 00:00:00')
# from a timedelta/np.timedelta64
In [7]: Timedelta(timedelta(days=1, seconds=1))
Out[7]: Timedelta('1 days 00:00:01')
In [8]: Timedelta(np.timedelta64(1, 'ms'))
Out[8]: Timedelta('0 days 00:00:00.001000')
# negative Timedeltas have this string repr
# to be more consistent with datetime.timedelta conventions
In [9]: Timedelta('-1us')
Out[9]: Timedelta('-1 days +23:59:59.999999')
# a NaT
In [10]: Timedelta('nan')
Out[10]: NaT
In [11]: Timedelta('nat')
Out[11]: NaT
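The constructor forms above all describe durations in the same way, so it can help to check that they agree. The following is a minimal sketch, assuming pandas and NumPy are importable as pd and np; the variable names are illustrative and not part of the original session.

```python
import datetime

import numpy as np
import pandas as pd

# Five spellings of the same duration: one day plus one second.
from_string = pd.Timedelta('1 days 00:00:01')
from_kwargs = pd.Timedelta(days=1, seconds=1)
from_stdlib = pd.Timedelta(datetime.timedelta(days=1, seconds=1))
from_numpy = pd.Timedelta(np.timedelta64(86401, 's'))
from_int = pd.Timedelta(86401, unit='s')

candidates = [from_kwargs, from_stdlib, from_numpy, from_int]
all_equal = all(c == from_string for c in candidates)

# Timedelta subclasses datetime.timedelta, so stdlib-style checks accept it.
is_stdlib_timedelta = isinstance(from_string, datetime.timedelta)

# A negative value is stored as negative days plus a non-negative remainder,
# which is why its repr reads '-1 days +23:59:59.999999'.
neg = pd.Timedelta('-1us')
neg_fields = (neg.days, neg.seconds, neg.microseconds)

# The string 'nat' parses to the missing-value marker NaT.
missing = pd.Timedelta('nat')

print(all_equal, is_stdlib_timedelta, neg_fields, pd.isnull(missing))
```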
DateOffsets (Day, Hour, Minute, Second, Milli, Micro, Nano) can also be used in construction.
In [12]: Timedelta(Second(2))
Out[12]: Timedelta('0 days 00:00:02')
Further, operations among the scalars yield another scalar Timedelta.
In [13]: Timedelta(Day(2)) + Timedelta(Second(2)) + Timedelta('00:00:00.000123')
Out[13]: Timedelta('2 days 00:00:02.000123')
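As a small illustration of building Timedelta scalars from fixed-length offsets and adding them, here is a sketch that uses Hour, Minute and Second from pandas.tseries.offsets. These three are chosen only for illustration; the variable names are assumptions.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute, Second

# Each fixed-length offset converts to a Timedelta of the same duration.
two_seconds = pd.Timedelta(Second(2))

# Adding Timedelta scalars gives another Timedelta scalar.
total = pd.Timedelta(Hour(2)) + pd.Timedelta(Minute(30)) + two_seconds

print(total)
```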
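Warning
Prior to 0.15.0, pd.to_timedelta would return a Series for list-like/Series input, and a np.timedelta64 for scalar input. It now returns a TimedeltaIndex for list-like input, a Series for Series input, and a Timedelta for scalar input.
The arguments to pd.to_timedelta are now (arg, unit='ns', box=True); previously they were (arg, box=True, unit='ns'). The new order is more logical.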
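Using the top-level pd.to_timedelta, you can convert a scalar, array, list, or Series from a recognized timedelta format/value into a Timedelta type. It constructs a Series if the input is a Series, a scalar if the input is scalar-like, and otherwise outputs a TimedeltaIndex.
You can parse a single string to a Timedelta: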
In [14]: to_timedelta('1 days 06:05:01.00003')
Out[14]: Timedelta('1 days 06:05:01.000030')
In [15]: to_timedelta('15.5us')
Out[15]: Timedelta('0 days 00:00:00.000015')
or a list/array of strings:
In [16]: to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
Out[16]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015', NaT], dtype='timedelta64[ns]', freq=None)
The unit keyword argument specifies the unit of the Timedelta:
In [17]: to_timedelta(np.arange(5), unit='s')
Out[17]: TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04'], dtype='timedelta64[ns]', freq=None)
In [18]: to_timedelta(np.arange(5), unit='d')
Out[18]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
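The rule that the return type follows the input type can be checked directly. The sketch below is illustrative only and assumes a pandas version in which pd.to_timedelta behaves as described above; the variable names are not from the original.

```python
import numpy as np
import pandas as pd

# Scalar string in, Timedelta scalar out.
scalar = pd.to_timedelta('1 days 06:05:01.00003')

# List in, TimedeltaIndex out; the string 'nan' becomes NaT.
index = pd.to_timedelta(['1 days', '15us', 'nan'])

# Series in, Series out.
series = pd.to_timedelta(pd.Series(['1 days', '2 days']))

# Integers need a unit to be meaningful; here each integer counts seconds.
seconds = pd.to_timedelta(np.arange(5), unit='s')

print(type(scalar).__name__, type(index).__name__, type(series).__name__)
```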
Pandas represents Timedeltas in nanosecond resolution using 64-bit integers. As such, the 64-bit integer limits determine the Timedelta limits.
In [19]: pd.Timedelta.min
Out[19]: Timedelta('-106752 days +00:12:43.145224')
In [20]: pd.Timedelta.max
Out[20]: Timedelta('106751 days 23:47:16.854775')
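The day counts in these limits follow from simple arithmetic on a signed 64-bit nanosecond counter. A minimal sketch of that calculation, with NS_PER_DAY and the other names introduced here purely for illustration:

```python
import pandas as pd

# Nanoseconds in one day.
NS_PER_DAY = 24 * 60 * 60 * 10**9

# Largest value of a signed 64-bit integer, expressed in whole days.
whole_days = (2**63 - 1) // NS_PER_DAY

upper = pd.Timedelta.max
lower = pd.Timedelta.min

# Rough size of the representable range on each side of zero, in years.
approx_years = upper.days / 365.25

print(whole_days, upper.days, lower.days, round(approx_years, 1))
```

In other words, a nanosecond-resolution Timedelta covers a little over 292 years in each direction.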
You can operate on Series/DataFrames and construct timedelta64[ns] Series through subtraction operations on datetime64[ns] Series, or Timestamps:
In [21]: s = Series(date_range('2012-1-1', periods=3, freq='D'))
In [22]: td = Series([ Timedelta(days=i) for i in range(3) ])
In [23]: df = DataFrame(dict(A = s, B = td))
In [24]: df
Out[24]:
A B
0 2012-01-01 0 days
1 2012-01-02 1 days
2 2012-01-03 2 days
In [25]: df['C'] = df['A'] + df['B']
In [26]: df
Out[26]:
A B C
0 2012-01-01 0 days 2012-01-01
1 2012-01-02 1 days 2012-01-03
2 2012-01-03 2 days 2012-01-05
In [27]: df.dtypes
Out[27]:
A datetime64[ns]
B timedelta64[ns]
C datetime64[ns]
dtype: object
In [28]: s - s.max()
Out[28]:
0 -2 days
1 -1 days
2 0 days
dtype: timedelta64[ns]
In [29]: s - datetime(2011, 1, 1, 3, 5)
Out[29]:
0 364 days 20:55:00
1 365 days 20:55:00
2 366 days 20:55:00
dtype: timedelta64[ns]
In [30]: s + timedelta(minutes=5)
Out[30]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
In [31]: s + Minute(5)
Out[31]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
In [32]: s + Minute(5) + Milli(5)
Out[32]:
0 2012-01-01 00:05:00.005
1 2012-01-02 00:05:00.005
2 2012-01-03 00:05:00.005
dtype: datetime64[ns]
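The dtype rules at work in the session above can be summarised as: datetime minus datetime gives a timedelta, and datetime plus timedelta gives a datetime. A short sketch that checks this, using the NumPy dtype kind codes 'M' (datetime) and 'm' (timedelta); variable names are illustrative.

```python
import pandas as pd

s = pd.Series(pd.date_range('2012-01-01', periods=3, freq='D'))
td = pd.Series([pd.Timedelta(days=i) for i in range(3)])

# datetime + timedelta -> datetime
shifted = s + td

# datetime - datetime -> timedelta
diff = s - s.max()

print(shifted.dtype, diff.dtype)
```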
Operations with scalars from a timedelta64[ns] Series:
In [33]: y = s - s[0]
In [34]: y
Out[34]:
0 0 days
1 1 days
2 2 days
dtype: timedelta64[ns]
Series of timedeltas with NaT values are supported:
In [35]: y = s - s.shift()
In [36]: y
Out[36]:
0 NaT
1 1 days
2 1 days
dtype: timedelta64[ns]
Elements can be set to NaT using np.nan, analogously to datetimes:
In [37]: y[1] = np.nan
In [38]: y
Out[38]:
0 NaT
1 NaT
2 1 days
dtype: timedelta64[ns]
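A compact way to confirm that missing values stay inside the timedelta dtype is to inspect the null mask and the dtype after assignment. This is a sketch under the assumption that positional assignment with .iloc behaves like the label assignment shown above.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range('2012-01-01', periods=3, freq='D'))

# The first difference has no predecessor, so it is NaT.
y = s - s.shift()
first_is_missing = pd.isnull(y.iloc[0])

# Assigning np.nan stores NaT; the Series keeps its timedelta dtype.
y.iloc[1] = np.nan
mask = y.isnull().tolist()

print(mask, y.dtype)
```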
Operands can also appear in reversed order (a singular object operated with a Series):
In [39]: s.max() - s
Out[39]:
0 2 days
1 1 days
2 0 days
dtype: timedelta64[ns]
In [40]: datetime(2011, 1, 1, 3, 5) - s
Out[40]:
0 -365 days +03:05:00
1 -366 days +03:05:00
2 -367 days +03:05:00
dtype: timedelta64[ns]
In [41]: timedelta(minutes=5) + s
Out[41]:
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
min, max and the corresponding idxmin, idxmax operations are supported on frames:
In [42]: A = s - Timestamp('20120101') - Timedelta('00:05:05')
In [43]: B = s - Series(date_range('2012-1-2', periods=3, freq='D'))
In [44]: df = DataFrame(dict(A=A, B=B))
In [45]: df
Out[45]:
A B
0 -1 days +23:54:55 -1 days
1 0 days 23:54:55 -1 days
2 1 days 23:54:55 -1 days
In [46]: df.min()
Out[46]:
A -1 days +23:54:55
B -1 days +00:00:00
dtype: timedelta64[ns]
In [47]: df.min(axis=1)
Out[47]:
0 -1 days
1 -1 days
2 -1 days
dtype: timedelta64[ns]
In [48]: df.idxmin()
Out[48]:
A 0
B 0
dtype: int64
In [49]: df.idxmax()
Out[49]:
A 2
B 0
dtype: int64
min, max, idxmin, idxmax operations are supported on Series as well. A scalar result will be a Timedelta.
In [50]: df.min().max()
Out[50]: Timedelta('-1 days +23:54:55')
In [51]: df.min(axis=1).min()
Out[51]: Timedelta('-1 days +00:00:00')
In [52]: df.min().idxmax()
Out[52]: 'A'
In [53]: df.min(axis=1).idxmin()
Out[53]: 0
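The reductions above can be reproduced in a self-contained form. The sketch below rebuilds the same frame with explicit pd. prefixes and checks a few of the results; it is an illustration, not an additional API.

```python
import pandas as pd

s = pd.Series(pd.date_range('2012-01-01', periods=3, freq='D'))
A = s - pd.Timestamp('20120101') - pd.Timedelta('00:05:05')
B = s - pd.Series(pd.date_range('2012-01-02', periods=3, freq='D'))
df = pd.DataFrame({'A': A, 'B': B})

# Column-wise minimum is a Series of Timedelta values.
col_min = df.min()

# Reducing that Series again yields a Timedelta scalar.
largest_min = col_min.max()

# idxmax reports the label at which the maximum occurs.
label_of_largest_min = col_min.idxmax()
row_of_max_A = df['A'].idxmax()

print(largest_min, label_of_largest_min, row_of_max_A)
```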
You can call fillna on timedeltas. Integers will be interpreted as seconds. You can pass a timedelta to get a particular value.
In [54]: y.fillna(0)
Out[54]:
0 0 days
1 0 days
2 1 days
dtype: timedelta64[ns]
In [55]: y.fillna(10)
Out[55]:
0 0 days 00:00:10
1 0 days 00:00:10
2 1 days 00:00:00
dtype: timedelta64[ns]
In [56]: y.fillna(Timedelta('-1 days, 00:00:05'))
Out[56]:
0 -1 days +00:00:05
1 -1 days +00:00:05
2 1 days 00:00:00
dtype: timedelta64[ns]
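Passing an explicit Timedelta as the fill value states the intended unit outright, which avoids relying on how bare integers are interpreted (that behaviour may differ between pandas versions). A minimal sketch with illustrative names:

```python
import pandas as pd

y = pd.Series(pd.to_timedelta(['nat', 'nat', '1 days']))

# Fill the missing entries with an explicit ten-second duration.
filled = y.fillna(pd.Timedelta(seconds=10))

print(filled)
```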
You can also negate, multiply and use abs on Timedeltas:
In [57]: td1 = Timedelta('-1 days 2 hours 3 seconds')
In [58]: td1
Out[58]: Timedelta('-2 days +21:59:57')
In [59]: -1 * td1
Out[59]: Timedelta('1 days 02:00:03')
In [60]: - td1
Out[60]: Timedelta('1 days 02:00:03')
In [61]: abs(td1)
Out[61]: Timedelta('1 days 02:00:03')
Numeric reduction operations on timedelta64[ns] return Timedelta objects. As usual, NaT values are skipped during evaluation.
In [62]: y2 = Series(to_timedelta(['-1 days +00:00:05', 'nat', '-1 days +00:00:05', '1 days']))
In [63]: y2
Out[63]:
0 -1 days +00:00:05
1 NaT
2 -1 days +00:00:05
3 1 days 00:00:00
dtype: timedelta64[ns]
In [64]: y2.mean()
Out[64]: Timedelta('-1 days +16:00:03.333333')
In [65]: y2.median()
Out[65]: Timedelta('-1 days +00:00:05')
In [66]: y2.quantile(.1)
Out[66]: Timedelta('-1 days +00:00:05')
In [67]: y2.sum()
Out[67]: Timedelta('-1 days +00:00:10')
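To make the NaT-skipping rule concrete, here is a sketch with round numbers, chosen so the results are easy to verify by hand. The data are illustrative and differ from the session above.

```python
import pandas as pd

y2 = pd.Series(pd.to_timedelta(['1 days', 'nat', '3 days']))

# NaT is ignored: only the two valid entries take part in each reduction.
total = y2.sum()
mean = y2.mean()
median = y2.median()
valid = y2.count()

print(total, mean, median, valid)
```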
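New in version 0.13.
Timedelta Series, TimedeltaIndex, and Timedelta scalars can be converted to other "frequencies" by dividing by another timedelta, or by astyping to a specific timedelta type. These operations yield Series and propagate NaT -> nan. Note that division by the numpy scalar is true division, while astyping is equivalent to floor division.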
In [68]: td = Series(date_range('20130101', periods=4)) - \
....: Series(date_range('20121201', periods=4))
....:
In [69]: td[2] += timedelta(minutes=5, seconds=3)
In [70]: td[3] = np.nan
In [71]: td
Out[71]:
0 31 days 00:00:00
1 31 days 00:00:00
2 31 days 00:05:03
3 NaT
dtype: timedelta64[ns]
# to days
In [72]: td / np.timedelta64(1, 'D')
Out[72]:
0 31.000000
1 31.000000
2 31.003507
3 NaN
dtype: float64
In [73]: td.astype('timedelta64[D]')
Out[73]:
0 31.0
1 31.0
2 31.0
3 NaN
dtype: float64
# to seconds
In [74]: td / np.timedelta64(1, 's')
Out[74]:
0 2678400.0
1 2678400.0
2 2678703.0
3 NaN
dtype: float64
In [75]: td.astype('timedelta64[s]')
Out[75]:
0 2678400.0
1 2678400.0
2 2678703.0
3 NaN
dtype: float64
# to months (these are constant months)
In [76]: td / np.timedelta64(1, 'M')
Out[76]:
0 1.018501
1 1.018501
2 1.018617
3 NaN
dtype: float64
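The difference between true division and floor-style conversion can also be shown without astype, whose behaviour for timedelta units may vary between pandas versions. The sketch below divides by a one-day np.timedelta64 and applies np.floor explicitly; this is a substitute illustration under that assumption, not the astype call itself.

```python
import numpy as np
import pandas as pd

td = pd.Series(pd.to_timedelta(['31 days', '31 days 00:05:03', 'nat']))

# True division: fractional days, with NaT propagated as NaN.
days_true = td / np.timedelta64(1, 'D')

# Floor-style conversion: whole days only.
days_floor = np.floor(days_true)

# Total length of each timedelta in seconds, as floats.
seconds = td.dt.total_seconds()

print(days_true.tolist(), days_floor.tolist(), seconds.tolist())
```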
Dividing or multiplying a timedelta64[ns] Series by an integer or integer Series yields another timedelta64[ns] dtype Series.
In [77]: td * -1
Out[77]:
0 -31 days +00:00:00
1 -31 days +00:00:00
2 -32 days +23:54:57
3 NaT
dtype: timedelta64[ns]
In [78]: td * Series([1, 2, 3, 4])
Out[78]:
0 31 days 00:00:00
1 62 days 00:00:00
2 93 days 00:15:09
3 NaT
dtype: timedelta64[ns]
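A short sketch covering both multiplication and division by integers, checking that the result stays a timedelta and that NaT propagates. The names are illustrative.

```python
import pandas as pd

td = pd.Series(pd.to_timedelta(['31 days', '31 days 00:05:03', 'nat']))

negated = td * -1                   # scalar integer
scaled = td * pd.Series([1, 2, 3])  # element-wise integer Series
halved = td / 2                     # integer division target

print(negated.dtype, scaled.dtype, halved.dtype)
```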
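You can access various components of the Timedelta or TimedeltaIndex directly using the attributes days, seconds, microseconds, nanoseconds. These are identical to the values returned by datetime.timedelta: for example, the .seconds attribute represents the number of seconds >= 0 and < 1 day. These are signed according to whether the Timedelta is signed.
These operations can also be accessed directly via the .dt property of the Series.
Note
Note that the attributes are NOT the displayed values of the Timedelta. Use .components to retrieve the displayed values.
For a Series: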
In [79]: td.dt.days
Out[79]:
0 31.0
1 31.0
2 31.0
3 NaN
dtype: float64
In [80]: td.dt.seconds
Out[80]:
0 0.0
1 0.0
2 303.0
3 NaN
dtype: float64
You can access the value of the fields for a scalar Timedelta directly.
In [81]: tds = Timedelta('31 days 5 min 3 sec')
In [82]: tds.days
Out[82]: 31
In [83]: tds.seconds
Out[83]: 303
In [84]: (-tds).seconds
Out[84]: 86097
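You can use the .components property to access a reduced form of the timedelta. This returns a DataFrame indexed similarly to the Series. These are the displayed values of the Timedelta.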
In [85]: td.dt.components
Out[85]:
days hours minutes seconds milliseconds microseconds nanoseconds
0 31.0 0.0 0.0 0.0 0.0 0.0 0.0
1 31.0 0.0 0.0 0.0 0.0 0.0 0.0
2 31.0 0.0 5.0 3.0 0.0 0.0 0.0
3 NaN NaN NaN NaN NaN NaN NaN
In [86]: td.dt.components.seconds
Out[86]:
0 0.0
1 0.0
2 3.0
3 NaN
Name: seconds, dtype: float64
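The distinction between the raw attributes and the displayed components is easiest to see on a single scalar, especially a negative one. A minimal sketch with illustrative names:

```python
import pandas as pd

tds = pd.Timedelta('31 days 5 min 3 sec')
neg = -tds

# .seconds is the seconds remainder within the day (0 <= seconds < 86400),
# so 5 min 3 sec shows up as 303.
raw = (tds.days, tds.seconds)

# .components splits that remainder into the displayed fields.
shown = (tds.components.hours, tds.components.minutes, tds.components.seconds)

# For a negative value, days is negative and the remainder stays non-negative.
raw_neg = (neg.days, neg.seconds)
shown_neg = (neg.components.hours, neg.components.minutes, neg.components.seconds)

# total_seconds() gives the full signed length as one number.
total = tds.total_seconds()

print(raw, shown, raw_neg, shown_neg, total)
```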
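New in version 0.15.0.
To generate an index with time deltas, you can use either the TimedeltaIndex or the timedelta_range constructor.
Using TimedeltaIndex, you can pass string-like, Timedelta, timedelta, or np.timedelta64 objects. Passing np.nan/pd.NaT/nat will represent missing values.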
In [87]: TimedeltaIndex(['1 days', '1 days, 00:00:05',
....: np.timedelta64(2,'D'), timedelta(days=2,seconds=2)])
....:
Out[87]:
TimedeltaIndex(['1 days 00:00:00', '1 days 00:00:05', '2 days 00:00:00',
'2 days 00:00:02'],
dtype='timedelta64[ns]', freq=None)
Similarly to date_range, you can construct regular ranges of a TimedeltaIndex:
In [88]: timedelta_range(start='1 days', periods=5, freq='D')
Out[88]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')
In [89]: timedelta_range(start='1 days', end='2 days', freq='30T')
Out[89]:
TimedeltaIndex(['1 days 00:00:00', '1 days 00:30:00', '1 days 01:00:00',
'1 days 01:30:00', '1 days 02:00:00', '1 days 02:30:00',
'1 days 03:00:00', '1 days 03:30:00', '1 days 04:00:00',
'1 days 04:30:00', '1 days 05:00:00', '1 days 05:30:00',
'1 days 06:00:00', '1 days 06:30:00', '1 days 07:00:00',
'1 days 07:30:00', '1 days 08:00:00', '1 days 08:30:00',
'1 days 09:00:00', '1 days 09:30:00', '1 days 10:00:00',
'1 days 10:30:00', '1 days 11:00:00', '1 days 11:30:00',
'1 days 12:00:00', '1 days 12:30:00', '1 days 13:00:00',
'1 days 13:30:00', '1 days 14:00:00', '1 days 14:30:00',
'1 days 15:00:00', '1 days 15:30:00', '1 days 16:00:00',
'1 days 16:30:00', '1 days 17:00:00', '1 days 17:30:00',
'1 days 18:00:00', '1 days 18:30:00', '1 days 19:00:00',
'1 days 19:30:00', '1 days 20:00:00', '1 days 20:30:00',
'1 days 21:00:00', '1 days 21:30:00', '1 days 22:00:00',
'1 days 22:30:00', '1 days 23:00:00', '1 days 23:30:00',
'2 days 00:00:00'],
dtype='timedelta64[ns]', freq='30T')
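The length of a generated range follows from its endpoints and step. The sketch below spells the frequencies as '12h' and '30min', which is an assumption made here for portability across pandas versions (the session above uses 'D' and '30T').

```python
import pandas as pd

# Five steps of twelve hours, starting at one day.
half_days = pd.timedelta_range(start='1 days', periods=5, freq='12h')

# Both endpoints are included: 24 hours / 30 minutes = 48 steps, so 49 points.
half_hourly = pd.timedelta_range(start='1 days', end='2 days', freq='30min')

print(len(half_days), len(half_hourly))
```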
Similarly to the other datetime-like indices, DatetimeIndex and PeriodIndex, you can use TimedeltaIndex as the index of pandas objects.
In [90]: s = Series(np.arange(100),
....: index=timedelta_range('1 days', periods=100, freq='h'))
....:
In [91]: s
Out[91]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
1 days 05:00:00 5
1 days 06:00:00 6
..
4 days 21:00:00 93
4 days 22:00:00 94
4 days 23:00:00 95
5 days 00:00:00 96
5 days 01:00:00 97
5 days 02:00:00 98
5 days 03:00:00 99
Freq: H, dtype: int64
Selections work similarly, with coercion on string-likes and slices:
In [92]: s['1 day':'2 day']
Out[92]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
1 days 05:00:00 5
1 days 06:00:00 6
..
2 days 17:00:00 41
2 days 18:00:00 42
2 days 19:00:00 43
2 days 20:00:00 44
2 days 21:00:00 45
2 days 22:00:00 46
2 days 23:00:00 47
Freq: H, dtype: int64
In [93]: s['1 day 01:00:00']
Out[93]: 1
In [94]: s[Timedelta('1 day 1h')]
Out[94]: 1
Furthermore, you can use partial string selection, and the range will be inferred:
In [95]: s['1 day':'1 day 5 hours']
Out[95]:
1 days 00:00:00 0
1 days 01:00:00 1
1 days 02:00:00 2
1 days 03:00:00 3
1 days 04:00:00 4
1 days 05:00:00 5
Freq: H, dtype: int64
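When the inferred range of a partial string is not what you want, slicing with explicit Timedelta endpoints makes the bounds unambiguous. A minimal sketch, using .loc for label-based access; the names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100),
              index=pd.timedelta_range('1 days', periods=100, freq='h'))

# A string label and the equivalent Timedelta select the same element.
by_string = s.loc['1 day 01:00:00']
by_scalar = s.loc[pd.Timedelta('1 day 1h')]

# Label slices include both endpoints: 00:00 through 05:00 is six rows.
window = s.loc[pd.Timedelta('1 day'):pd.Timedelta('1 day 5h')]

print(by_string, by_scalar, len(window))
```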
Finally, combining a TimedeltaIndex with a DatetimeIndex allows certain combination operations that preserve NaT:
In [96]: tdi = TimedeltaIndex(['1 days', pd.NaT, '2 days'])
In [97]: tdi.tolist()
Out[97]: [Timedelta('1 days 00:00:00'), NaT, Timedelta('2 days 00:00:00')]
In [98]: dti = date_range('20130101', periods=3)
In [99]: dti.tolist()
Out[99]:
[Timestamp('2013-01-01 00:00:00', freq='D'),
Timestamp('2013-01-02 00:00:00', freq='D'),
Timestamp('2013-01-03 00:00:00', freq='D')]
In [100]: (dti + tdi).tolist()
Out[100]: [Timestamp('2013-01-02 00:00:00'), NaT, Timestamp('2013-01-05 00:00:00')]
In [101]: (dti - tdi).tolist()
Out[101]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2013-01-01 00:00:00')]
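The element-wise nature of these operations, including the preserved NaT, can be checked position by position. A short self-contained sketch:

```python
import pandas as pd

tdi = pd.TimedeltaIndex(['1 days', pd.NaT, '2 days'])
dti = pd.date_range('20130101', periods=3)

# Each date is shifted by the timedelta at the same position.
added = dti + tdi
subtracted = dti - tdi

print(added.tolist(), subtracted.tolist())
```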
Similarly to the frequency conversion on a Series above, you can convert these indices to yield another Index.
In [102]: tdi / np.timedelta64(1,'s')
Out[102]: Float64Index([86400.0, nan, 172800.0], dtype='float64')
In [103]: tdi.astype('timedelta64[s]')
Out[103]: Float64Index([86400.0, nan, 172800.0], dtype='float64')
Scalar-type operations work as well. These can potentially return a different type of index.
# adding a timedelta and a date -> datelike
In [104]: tdi + Timestamp('20130101')
Out[104]: DatetimeIndex(['2013-01-02', 'NaT', '2013-01-03'], dtype='datetime64[ns]', freq=None)
# subtraction of a date and a timedelta -> datelike
# note that trying to subtract a date from a Timedelta will raise an exception
In [105]: (Timestamp('20130101') - tdi).tolist()
Out[105]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2012-12-30 00:00:00')]
# timedelta + timedelta -> timedelta
In [106]: tdi + Timedelta('10 days')
Out[106]: TimedeltaIndex(['11 days', NaT, '12 days'], dtype='timedelta64[ns]', freq=None)
# division can result in a Timedelta if the divisor is an integer
In [107]: tdi / 2
Out[107]: TimedeltaIndex(['0 days 12:00:00', NaT, '1 days 00:00:00'], dtype='timedelta64[ns]', freq=None)
# or a Float64Index if the divisor is a Timedelta
In [108]: tdi / tdi[0]
Out[108]: Float64Index([1.0, nan, 2.0], dtype='float64')
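Which index type comes back depends on the other operand. The sketch below checks the result type of each operation shown above; it avoids naming the float index class, since that name may differ between pandas versions.

```python
import numpy as np
import pandas as pd

tdi = pd.TimedeltaIndex(['1 days', pd.NaT, '2 days'])

as_dates = tdi + pd.Timestamp('20130101')  # timedelta + date -> dates
shifted = tdi + pd.Timedelta('10 days')    # timedelta + timedelta -> timedeltas
halved = tdi / 2                           # timedelta / integer -> timedeltas
ratio = tdi / tdi[0]                       # timedelta / timedelta -> floats

print(type(as_dates).__name__, type(shifted).__name__,
      type(halved).__name__, ratio.tolist())
```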
Similar to timeseries resampling, we can resample with a TimedeltaIndex.
In [109]: s.resample('D').mean()
Out[109]:
1 days 11.5
2 days 35.5
3 days 59.5
4 days 83.5
5 days 97.5
Freq: D, dtype: float64
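The resampled means can be verified by hand: the first bin holds the values 0 to 23, whose mean is 11.5, and the last bin holds only 96 to 99. The sketch below uses '24h' as the bin width, an assumption made here so the frequency string is a fixed duration in every pandas version; the session above uses 'D'.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100),
              index=pd.timedelta_range('1 days', periods=100, freq='h'))

# Group the hourly values into 24-hour bins and average each bin.
daily_mean = s.resample('24h').mean()

print(daily_mean)
```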