Properly using PERCENTILE_CONT - Oracle SQL

问题: I am trying to calculate the following: The average of the dataset The median of the dataset The top 20% of the dataset The bottom 20% of the dataset My...

问题:

I am trying to calculate the following:

  • The average of the dataset
  • The median of the dataset
  • The top 20% of the dataset
  • The bottom 20% of the dataset

My dataset looks like this:

| Part | Step | Step_Start | Part_Finish |   TheTime   |
|:----:|:----:|:----------:|:-----------:|:-----------:|
|   1  |  200 |  15-Aug-18 |  19-Jun-19  | 307.4926273 |
|   2  |  200 |  7-Jun-19  |  19-Jun-19  |  11.4434375 |
|   3  |  200 |  17-Sep-18 |   4-Feb-19  | 139.4360417 |
|   4  |  200 |  30-Jan-19 |   4-Feb-19  | 4.356666667 |
|   5  |  200 |  1-Oct-18  |  18-Feb-19  | 139.4528009 |
|   6  |  200 |  13-Feb-19 |  18-Feb-19  |   4.50375   |
|   7  |  200 |  17-Oct-18 |  28-Mar-19  | 161.7007176 |
|   8  |  200 |  12-Nov-18 |  28-Mar-19  |  135.630625 |
|   9  |  200 |  25-Oct-18 |  26-Feb-19  | 123.6026968 |
|  10  |  200 |  22-Feb-19 |  26-Feb-19  | 3.628090278 |
|  11  |  200 |  30-Oct-18 |   3-Jan-19  | 64.51466435 |
|  12  |  200 |  12-Dec-18 |   3-Jan-19  | 21.48703704 |
|  13  |  200 |  15-Nov-18 |  14-Jan-19  | 59.41373843 |
|  14  |  200 |  7-Jan-19  |  14-Jan-19  | 6.621828704 |
|  15  |  200 |  15-Nov-18 |  12-Jan-19  | 57.62283565 |
|  16  |  200 |  8-Jan-19  |  12-Jan-19  | 3.264398148 |
|  17  |  200 |  15-Nov-18 |   7-Mar-19  | 111.5082523 |
|  18  |  200 |  4-Mar-19  |   7-Mar-19  | 2.153587963 |
|  19  |  200 |  16-Nov-18 |  23-May-19  | 187.6931481 |
|  20  |  200 |  16-Nov-18 |   3-Jan-19  | 47.47916667 |
|  21  |  200 |  17-Dec-18 |   3-Jan-19  | 16.62722222 |
|  22  |  200 |  20-Nov-18 |  14-Feb-19  |  85.6115625 |
|  23  |  200 |  9-Feb-19  |  14-Feb-19  | 4.520787037 |
|  24  |  200 |  19-Nov-18 |  14-Jan-19  | 55.53342593 |
|  25  |  200 |  9-Jan-19  |  14-Jan-19  | 4.721400463 |
|  26  |  200 |  26-Nov-18 |   9-Jan-19  | 43.50748843 |
|  27  |  200 |  4-Jan-19  |   9-Jan-19  | 4.417164352 |
|  28  |  200 |  26-Nov-18 |  21-Jan-19  | 55.59988426 |
|  29  |  200 |  13-Jan-19 |  21-Jan-19  |    7.535    |
|  30  |  200 |  16-Jan-19 |  21-Jan-19  | 4.618796296 |
|  31  |  200 |  26-Nov-18 |  11-Jan-19  | 45.42148148 |
|  32  |  200 |  4-Jan-19  |  11-Jan-19  | 6.316921296 |
|  33  |  200 |  4-Dec-18  |  24-Jan-19  |  50.3669213 |
|  34  |  200 |  18-Jan-19 |  24-Jan-19  | 5.589467593 |
|  35  |  200 |  4-Dec-18  |  31-Jan-19  | 57.26877315 |
|  36  |  200 |  22-Jan-19 |  31-Jan-19  | 8.240034722 |
|  37  |  200 |  5-Dec-18  |  28-Jun-19  | 204.5283912 |
|  38  |  200 |  26-Jun-19 |  28-Jun-19  | 1.508252315 |
|  39  |  200 |  9-Feb-19  |  19-Feb-19  | 9.532893519 |
|  40  |  200 |  7-Dec-18  |  14-Feb-19  | 68.51900463 |
|  41  |  200 |  5-Feb-19  |  14-Feb-19  | 8.641076389 |
|  42  |  200 |  11-Dec-18 |  25-Jan-19  | 44.50501157 |
|  43  |  200 |  22-Jan-19 |  25-Jan-19  | 2.511435185 |
|  44  |  200 |  13-Dec-18 |  17-Jan-19  | 34.43806713 |
|  45  |  200 |  14-Jan-19 |  17-Jan-19  | 2.210972222 |
|  46  |  200 |  13-Dec-18 |  24-Jan-19  | 41.38921296 |
|  47  |  200 |  17-Jan-19 |  24-Jan-19  | 6.444664352 |
|  48  |  200 |  10-Jan-19 |   7-Feb-19  | 27.43130787 |
|  49  |  200 |  1-Feb-19  |   7-Feb-19  | 5.349189815 |
|  50  |  200 |  18-Dec-18 |   4-Feb-19  | 47.50416667 |
|  51  |  200 |  29-Jan-19 |   4-Feb-19  | 5.481979167 |
|  52  |  200 |  3-Jan-19  |  30-Jan-19  | 26.46112269 |
|  53  |  200 |  23-Jan-19 |  30-Jan-19  | 6.712175926 |
|  54  |  200 |  4-Jan-19  |   5-Feb-19  | 31.49590278 |
|  55  |  200 |  30-Jan-19 |   5-Feb-19  | 5.385798611 |
|  56  |  200 |  23-Jan-19 |  20-Mar-19  |  55.296875  |
|  57  |  200 |  21-Feb-19 |  20-Mar-19  | 26.06854167 |
|  58  |  200 |  22-Jan-19 |  14-Mar-19  | 50.57989583 |
|  59  |  200 |  8-Mar-19  |  14-Mar-19  | 5.147303241 |
|  60  |  200 |  22-Jan-19 |  21-Feb-19  | 29.46405093 |
|  61  |  200 |  14-Feb-19 |  21-Feb-19  | 6.701724537 |
|  62  |  200 |  24-Jan-19 |  23-Apr-19  | 88.50689815 |
|  63  |  200 |  17-Apr-19 |  23-Apr-19  | 5.725405093 |
|  64  |  200 |  28-Jan-19 |  21-Feb-19  | 23.50082176 |
|  65  |  200 |  13-Feb-19 |  21-Feb-19  | 7.115717593 |
|  66  |  200 |  31-Jan-19 |  28-Feb-19  | 27.55881944 |
|  67  |  200 |  25-Feb-19 |  28-Feb-19  | 2.633738426 |
|  68  |  200 |  31-Jan-19 |  27-Feb-19  | 26.46105324 |
|  69  |  200 |  23-Feb-19 |  27-Feb-19  | 3.531423611 |
|  70  |  200 |  1-Feb-19  |  28-Feb-19  | 26.45835648 |
|  71  |  200 |  27-Feb-19 |  28-Feb-19  | 0.471296296 |
|  72  |  200 |  6-Feb-19  |  27-Feb-19  | 20.54436343 |
|  73  |  200 |  23-Feb-19 |  27-Feb-19  | 3.598854167 |
|  74  |  200 |  6-Feb-19  |   5-Mar-19  | 26.54347222 |
|  75  |  200 |  28-Feb-19 |   5-Mar-19  | 4.303773148 |
|  76  |  200 |  12-Feb-19 |   6-Mar-19  | 21.56993056 |
|  77  |  200 |  1-Mar-19  |   6-Mar-19  | 4.597615741 |
|  78  |  200 |  12-Feb-19 |  14-Mar-19  | 29.50417824 |
|  79  |  200 |  7-Mar-19  |  14-Mar-19  | 6.083541667 |
|  80  |  200 |  28-Feb-19 |  28-Mar-19  |  27.5291088 |
|  81  |  200 |  25-Mar-19 |  28-Mar-19  | 2.637824074 |
|  82  |  200 |  29-Jan-19 |  28-Feb-19  | 29.34280093 |
|  83  |  200 |  21-Feb-19 |  28-Feb-19  | 6.233831019 |
|  84  |  200 |  19-Feb-19 |  30-Apr-19  | 69.51832176 |
|  85  |  200 |  7-Feb-19  |   5-Mar-19  | 25.74865741 |
|  86  |  200 |  27-Feb-19 |   5-Mar-19  | 5.380034722 |
|  87  |  200 |  21-Feb-19 |  21-Mar-19  | 27.56310185 |
|  88  |  200 |  19-Mar-19 |  21-Mar-19  | 1.161828704 |
|  89  |  200 |  26-Feb-19 |  28-Mar-19  | 29.41315972 |
|  90  |  200 |  22-Mar-19 |  28-Mar-19  | 5.673703704 |
|  91  |  200 |  26-Feb-19 |  28-Mar-19  |  29.5131713 |
|  92  |  200 |  20-Mar-19 |  28-Mar-19  | 7.073414352 |
|  93  |  200 |  28-Feb-19 |  15-Apr-19  | 45.63513889 |
|  94  |  200 |  5-Apr-19  |  15-Apr-19  | 9.479456019 |
|  95  |  200 |  1-Mar-19  |  29-Mar-19  | 27.54568287 |
|  96  |  200 |  25-Mar-19 |  29-Mar-19  | 3.044340278 |
|  97  |  200 |  4-Mar-19  |  27-Mar-19  | 22.52392361 |
|  98  |  200 |  21-Mar-19 |  27-Mar-19  | 5.074421296 |
|  99  |  200 |  14-Feb-19 |  19-Mar-19  | 32.54349537 |
|  100 |  200 |  13-Mar-19 |  19-Mar-19  | 5.265266204 |

My current SQL query looks like this:

SELECT
    Step,
    ROUND(MEDIAN(Part_Finish - Step_Start), 2) AS "The_Median",
    ROUND(AVG(Part_Finish - Step_Start), 2) AS "The_Average",
    PERCENTILE_CONT(0.20) WITHIN GROUP (ORDER BY (Part_Finish - Step_Start) ASC) AS "Best_Time",
    PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY (Part_Finish - Step_Start) ASC) AS "Worst_Time"

FROM
    myTbl

GROUP BY
    Step

However, I am not sure if my results are correct, because I don't think I am using PERCENTILE_CONT() correctly. How can I use PERCENTILE_CONT() (or another method) to find the average or median (whichever is easier) "time to complete" based on the best 20% of the data, and the worst 20% of the data?

I would expect some results to look like this:

| Step | The_Average | The_Median | Best_Time | Worst_Time |
|:----:|:-----------:|:----------:|:---------:|:----------:|
|  200 |  < value >  |  < value > | < value > |  < value > |

where the < value > fields are the properly calculated average, median, and best and worst of the dataset. Best and worst being calculated by finding the average or median of the top 20% of the data (i.e., the smallest times) or the worst 20% of the data (i.e., the largest times)


回答1:

PERCENTILE_CONT is a window function, so if you just want a result set consisting of a single record with scalar values, you may try selecting distinct:

SELECT DISTINCT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) AS "The_median",
    ROUND(AVG(Part_Finish - Step_Start) OVER (ORDER BY Part_Finish - Step_Start), 2) AS "The_Average",
    PERCENTILE_CONT(0.20) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) AS "Best_Time",
    PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) AS "Worst_Time"
FROM myTbl;

The reason for the above approach is that selecting PERCENTILE_CONT, a window function, over your entire table would just return the entire table as the result set. But, as you are using it, the values would always be the same for each record. Therefore, we can just take the distinct value to get a single result.

If you instead expect a different report for each Step value, then you should be using PARTITION BY in the calls to PERCENTILE_CONT, e.g.

PERCENTILE_CONT(0.5) WITHIN GROUP (PARTITION BY Step
    ORDER BY Part_Finish - Step_Start) AS "The_median"
  • 发表于 2019-07-05 22:37
  • 阅读 ( 139 )
  • 分类:sof

条评论

请先 登录 后评论
不写代码的码农
小编

篇文章

作家榜 »

  1. 小编 文章
返回顶部
部分文章转自于网络,若有侵权请联系我们删除