PER-* and INST-* period types

HEC DSS uses a different convention than most date time libraries in use. The one that is difficult to deal with is the the end of interval labeling

For example, the monthly data in DSS is stored at the end of the month. Lets take a look

The monthly value for January 1990 is stored as the datetime of 31JAN1990 2400 which first off all cannot be parsed as such by python libraries. If you take the time as stored (long) and convert it to datetime in python it will be displayed as 01FEB1990 0000.

The last time stamp that is considered to belong to the month of January is 31JAN1990 2359.9999… to nano-second precision. However the timestamp stored in DSS files is actually the stroke of midnight of the end of January and python datetime libraries mark that as a February date.

[1]:
import pandas as pd
[2]:
pd.to_datetime('31JAN1990 2400')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    648         try:
--> 649             ret = self._build_naive(res, default)
    650         except ValueError as e:

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in _build_naive(self, res, default)
   1234
-> 1235         naive = default.replace(**repl)
   1236

ValueError: hour must be in 0..23

The above exception was the direct cause of the following exception:

ParserError                               Traceback (most recent call last)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslibs\parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
   1367     else:
-> 1368         return DEFAULTPARSER.parse(timestr, **kwargs)
   1369

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    650         except ValueError as e:
--> 651             six.raise_from(ParserError(str(e) + ": %s", timestr), e)
    652

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\six.py in raise_from(value, from_value)

ParserError: hour must be in 0..23: 31JAN1990 2400

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

TypeError: invalid string coercion to datetime

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    648         try:
--> 649             ret = self._build_naive(res, default)
    650         except ValueError as e:

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in _build_naive(self, res, default)
   1234
-> 1235         naive = default.replace(**repl)
   1236

ValueError: hour must be in 0..23

The above exception was the direct cause of the following exception:

ParserError                               Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9596/3106564415.py in <module>
----> 1 pd.to_datetime('31JAN1990 2400')

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    912             result = convert_listlike(arg, format)
    913     else:
--> 914         result = convert_listlike(np.array([arg]), format)[0]
    915
    916     #  error: Incompatible return value type (got "Union[Timestamp, NaTType,

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    406         errors=errors,
    407         require_iso8601=require_iso8601,
--> 408         allow_object=True,
    409     )
    410

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object, allow_mixed)
   2191             return values.view("i8"), tz_parsed
   2192         except (ValueError, TypeError):
-> 2193             raise err
   2194
   2195     if tz_parsed is not None:

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object, allow_mixed)
   2180             yearfirst=yearfirst,
   2181             require_iso8601=require_iso8601,
-> 2182             allow_mixed=allow_mixed,
   2183         )
   2184         result = result.reshape(data.shape, order=order)

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib._array_to_datetime_object()

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib._array_to_datetime_object()

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\pandas\_libs\tslibs\parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
   1366         return parser(parserinfo).parse(timestr, **kwargs)
   1367     else:
-> 1368         return DEFAULTPARSER.parse(timestr, **kwargs)
   1369
   1370

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    649             ret = self._build_naive(res, default)
    650         except ValueError as e:
--> 651             six.raise_from(ParserError(str(e) + ": %s", timestr), e)
    652
    653         if not ignoretz:

c:\Users\psandhu\Miniconda3\envs\dev_pyhecdss\lib\site-packages\six.py in raise_from(value, from_value)

ParserError: hour must be in 0..23: 31JAN1990 2400
[5]:
last_day = pd.to_datetime('31JAN1990')
last_day
[5]:
Timestamp('1990-01-31 00:00:00')

See this discussion here https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series on how to get end of month

[12]:
from pandas.tseries.offsets import MonthEnd
last_day + MonthEnd(0)
[12]:
Timestamp('1990-01-31 00:00:00')

None of those will give 31JAN1990 2400 as the answer. If you force the issue, e.g. the last nano-second of the month you will get this

[15]:
last_day + pd.offsets.MonthBegin() - pd.offsets.Nano()
[15]:
Timestamp('1990-01-31 23:59:59.999999999')
[16]:
pd.to_datetime('01FEB1990 0000')
[16]:
Timestamp('1990-02-01 00:00:00')
[18]:
pr = pd.period_range(start='01JAN1990',periods=2,freq='M')
pr
[18]:
PeriodIndex(['1990-01', '1990-02'], dtype='period[M]')

So what is the answer to this dilemma?

The best that can be done IMHO is to use the pandas PeriodIndex which signifies a range of time. E.g. the monthly data is read in with a period of ‘M’ and shifted by 1’M’ interval to the left (-1) for data stored in DSS.

This is not ideal because it uses the period_type ‘PER-AVG’ as the reason to shift the data and convert it to periods.

Also math operations are not easily done with data indexed by PeriodIndex vs DatetimeIndex. These should be handled by users as it depends upon the usecase.

One example would be to resample the datetimeindex’ed data to monthly and then add it to period indexed data.

Another issue is that when resampling the user would want use different options so that the

These issues are pandas issues and can be dealt with in the python or pandas space.

[ ]: