PER-* and INST-* period types¶
HEC DSS uses a different convention than most date time libraries in use. The one that is difficult to deal with is the the end of interval labeling
For example, the monthly data in DSS is stored at the end of the month. Lets take a look
The monthly value for January 1990 is stored as the datetime of 31JAN1990 2400 which first off all cannot be parsed as such by python libraries. If you take the time as stored (long) and convert it to datetime in python it will be displayed as 01FEB1990 0000.
The last time stamp that is considered to belong to the month of January is 31JAN1990 2359.9999… to nano-second precision. However the timestamp stored in DSS files is actually the stroke of midnight of the end of January and python datetime libraries mark that as a February date.
[1]:
import pandas as pd
[2]:
pd.to_datetime('31JAN1990 2400')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
648 try:
--> 649 ret = self._build_naive(res, default)
650 except ValueError as e:
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in _build_naive(self, res, default)
1234
-> 1235 naive = default.replace(**repl)
1236
ValueError: hour must be in 0..23
The above exception was the direct cause of the following exception:
ParserError Traceback (most recent call last)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslibs\parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
1367 else:
-> 1368 return DEFAULTPARSER.parse(timestr, **kwargs)
1369
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
650 except ValueError as e:
--> 651 six.raise_from(ParserError(str(e) + ": %s", timestr), e)
652
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\six.py in raise_from(value, from_value)
ParserError: hour must be in 0..23: 31JAN1990 2400
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
TypeError: invalid string coercion to datetime
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
648 try:
--> 649 ret = self._build_naive(res, default)
650 except ValueError as e:
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in _build_naive(self, res, default)
1234
-> 1235 naive = default.replace(**repl)
1236
ValueError: hour must be in 0..23
The above exception was the direct cause of the following exception:
ParserError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9596/3106564415.py in <module>
----> 1 pd.to_datetime('31JAN1990 2400')
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
912 result = convert_listlike(arg, format)
913 else:
--> 914 result = convert_listlike(np.array([arg]), format)[0]
915
916 # error: Incompatible return value type (got "Union[Timestamp, NaTType,
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
406 errors=errors,
407 require_iso8601=require_iso8601,
--> 408 allow_object=True,
409 )
410
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object, allow_mixed)
2191 return values.view("i8"), tz_parsed
2192 except (ValueError, TypeError):
-> 2193 raise err
2194
2195 if tz_parsed is not None:
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object, allow_mixed)
2180 yearfirst=yearfirst,
2181 require_iso8601=require_iso8601,
-> 2182 allow_mixed=allow_mixed,
2183 )
2184 result = result.reshape(data.shape, order=order)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib._array_to_datetime_object()
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib._array_to_datetime_object()
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslibs\parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
1366 return parser(parserinfo).parse(timestr, **kwargs)
1367 else:
-> 1368 return DEFAULTPARSER.parse(timestr, **kwargs)
1369
1370
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
649 ret = self._build_naive(res, default)
650 except ValueError as e:
--> 651 six.raise_from(ParserError(str(e) + ": %s", timestr), e)
652
653 if not ignoretz:
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\six.py in raise_from(value, from_value)
ParserError: hour must be in 0..23: 31JAN1990 2400
[5]:
last_day = pd.to_datetime('31JAN1990')
last_day
[5]:
Timestamp('1990-01-31 00:00:00')
See this discussion here https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series on how to get end of month
[12]:
from pandas.tseries.offsets import MonthEnd
last_day + MonthEnd(0)
[12]:
Timestamp('1990-01-31 00:00:00')
None of those will give 31JAN1990 2400 as the answer. If you force the issue, e.g. the last nano-second of the month you will get this
[15]:
last_day + pd.offsets.MonthBegin() - pd.offsets.Nano()
[15]:
Timestamp('1990-01-31 23:59:59.999999999')
[16]:
pd.to_datetime('01FEB1990 0000')
[16]:
Timestamp('1990-02-01 00:00:00')
[18]:
pr = pd.period_range(start='01JAN1990',periods=2,freq='M')
pr
[18]:
PeriodIndex(['1990-01', '1990-02'], dtype='period[M]')
So what is the answer to this dilemma?¶
The best that can be done IMHO is to use the pandas PeriodIndex which signifies a range of time. E.g. the monthly data is read in with a period of ‘M’ and shifted by 1’M’ interval to the left (-1) for data stored in DSS.
This is not ideal because it uses the period_type ‘PER-AVG’ as the reason to shift the data and convert it to periods.
Also math operations are not easily done with data indexed by PeriodIndex vs DatetimeIndex. These should be handled by users as it depends upon the usecase.
One example would be to resample the datetimeindex’ed data to monthly and then add it to period indexed data.
Another issue is that when resampling the user would want use different options so that the
These issues are pandas issues and can be dealt with in the python or pandas space.
[ ]: