PostgreSQL age() function: different/unexpected results when landing in dfferent month

问题: Today, I've encountered an unexplainable result in PostgreSQL 9.6 while running this query: SELECT age('2018-06-30','2018-05-19') AS one, age('2018-07-01','2018-05...

问题:

Today, I've encountered an unexplainable result in PostgreSQL 9.6 while running this query:

SELECT age('2018-06-30','2018-05-19') AS one,
       age('2018-07-01','2018-05-20') AS two; 

Expected results for both columns: 1 mon 11 days. However, only for the interval from 2018-05-19 to 2018-06-30, I get what I expect, while for 2018-05-20 till 2018-07-01 I'll get one day more: 1 mon 12 days

I don't get why this is the case and in my understanding, between 2018-05-20 2018-07-01 is just an interval of 1 mon 11 days and the Postgres result here is wrong.

I cannot find any in-depth information on how exactly the PostgreSQL-age(timestamp,timestamp) function works. However, I assumed that function does something like: Go from the start date in month steps forward till you reach the end month. From there, go to the day of the end date. Sum up months and days.

So, in my understanding, this is what should go on under the hood in my case (sorry, for being so verbose here, but I feel it's necessary):

Start at 2018-05-19. Go one month forward. Land at 2018-06-19. Walk N days forward till you've reached 2018-06-30:

1 day: 20
2 days: 21
3 days: 22
4 days: 23
5 days: 24
6 days: 25
7 days: 26
8 days: 27
9 days: 28
10 days: 29
11 days: 30

= 1 month 11 days.

For the time between 2018-05-20 and 2018-07-01 it should be almost the same:

Start at 2018-05-20. Go one month forward. Land at 2018-06-20. Walk N days forward till you've reached 2018-07-01:

1 day: 21
2 days: 22
3 days: 23
4 days: 24
5 days: 25
6 days: 26
7 days: 27
8 days: 28
9 days: 29
10 days: 30
11 days: 1

= 1 month 11 days.

Is this my mistake or one of PostgreSQL? Are there alternative functions/algorithms which work the way I described/expect?


回答1:

age is calculated by the timestamptz_age function in src/backend/utils/adt/timestamp.c. The comment says:

/* timestamptz_age()
 * Calculate time difference while retaining year/month fields.
 * Note that this does not result in an accurate absolute time span
 *  since year and month are out of context once the arithmetic
 *  is done.
 */

The code first converts the arguments to struct pg_tm variables tm1 and tm2 (struct pg_tm is similar to the C library's struct tm, but has additional time zone fields) and then calculates the difference tm per field.

In the case of age('2018-07-01','2018-05-20'), the relevant fields of that difference would look like this:

tm_mday = -19
tm_mon  =   2
tm_year =   0

Now negative fields are adjusted. for tm_mday, the code looks like this:

while (tm->tm_mday < 0)
{
    if (dt1 < dt2)
    {
        tm->tm_mday += day_tab[isleap(tm1->tm_year)][tm1->tm_mon - 1];
        tm->tm_mon--;
    }
    else
    {
        tm->tm_mday += day_tab[isleap(tm2->tm_year)][tm2->tm_mon - 1];
        tm->tm_mon--;
    }
}

Since dt1 > dt2, the else branch is taken, and the code adds the number of days in May (31) and reduces the month by 1, ending up with

tm_mday = 12
tm_mon  =  1
tm_year =  0

That is the result you get.

Now at first glance it seems that tm2->tm_mon isn't the right month to choose, and it would have been better to take the previous month of the left argument:

day_tab[isleap(tm1->tm_year)][(tm1->tm_mon + 10) % 12]

But I cannot say if that choice would be better in all cases, and in any event the comment indemnifies the function, so I'd hesitate to call it a bug.

You might want to take it up with the hackers mailing list.


回答2:

The above unexpected behaviour is not because of age() . But because of interval data type which will allows calculations. Below link contains the necessary explanation.

Odd month arithmetic

In your first one since two times are successive you don't see unexpected. But it second it is not. This tends to above odd month arithmetic behaviour


回答3:

For those interested: I think I've found a workaround for the problem, using a function which gives me the desired result. It works according to my own tests, even for leap years, but unfortunately, I cannot guarantee that it will always work. It also seems a little bit hacky.

CREATE OR REPLACE FUNCTION age_forward ("endDate" date,"startDate" date) 
     RETURNS interval AS $$

     /*

     Basic approach: actually do a culculation like this:
     SELECT age('2018-07-01','2018-06-01') + ((30 - 20) + 1||' days')::interval;

     So, basically:
     (1) truncate start and end to month level, so always FIRST of month
     (2) add one month to the start month
     (3) calculate the days
     (4) add the days as string and build the interval

     The crucial part is 3: calculate the days

     We do it like this:
     - get the number of days for the month in question. The month in question is the month BEFORE the end month. For our example it is JUNE
     - subtract the start date day number from the number of days (here 20)
     - add the end date day number (here 1)

     */

     SELECT CASE 

        /* First step: Check if the startDate day number is lower or equal the endDate day number.
           If this is the case: Do vanilla age(). Works perfectly here
        */

        WHEN (date_part('day', "startDate" )::integer) <= date_part('day', "endDate" )::integer

        THEN age("endDate","startDate")

        /* Special case to treat here: startDate day number is greater than endDate day number. Do the algorithm described above */

        ELSE  age
               (

                  date_trunc('month', "endDate"::date), /* Go just till month level, always using '1' as day */

                  date_trunc('month', "startDate"::date)
                  + '1 mons'::interval
                  /* Add one month so that interval to look for will become actually shorter for now. */
                ) 
              + 
                (
                   (

                     /* Calculate the last day of the month previous to the end month. See https://stackoverflow.com/questions/28186014/how-to-get-the-last-day-of-month-in-postgres  */
                     (date_part('day',(date_trunc('month', (date_trunc('month', "endDate"::date) - '1 mons'::interval)  ) + interval '1 month' - interval '1 day')::date))::integer

                     - 

                     /* endDate day number subtracted */
                     date_part('day', "startDate" )::integer
                   )

                   /* endDate day number added */
                   + date_part('day', "endDate" )::integer||' days'

                )::interval
        END

$$ LANGUAGE sql;
  • 发表于 2018-07-08 10:03
  • 阅读 ( 324 )
  • 分类:sof

条评论

请先 登录 后评论
不写代码的码农
小编

篇文章

作家榜 »

  1. 小编 文章
返回顶部
部分文章转自于网络,若有侵权请联系我们删除