Friday, February 26, 2016

Project Server SQL tips

This post is going to ignore Project Online as I am still a little underwhelmed by the reporting capabilities of the Excel+ODATA Power-shennanigans.  Mostly I spend my BI days in SQL Reporting Services for our on-premise customers, and even our larger Project online customers are moving back to local BI storage and SQL RS.

Anyway, rant over.  

So SSRS has been my friend for some years now and I continue to find new and interesting ways to make it do the things i want.  However the real fun is down in the TSQL area.  Here is a little selection of things I've found that may help you optimise or improve your report outputs.


Large datasets and the AssignmentByDay data

We all know that the MSP_EPMAssignmentbyDay table is a beast and is the main limiting factor when we look at time by period reporting.  Yes it would be great if MS did an automatic aggregation to weekly periods as part of the publish job but that is never gonna happen.  However it is possible to get some performance out of this monster table.

First things first.  get your indexes in order!  

Second things second.  NEVER USE THE VIEW.  

The MSP_EPMAssignmentbyDay_Userview is where madness lives.  The view loops through a join to the MSP_EPMAssignementBaselineByDay table and aggregates the Baseline0-10 values by day - even if you dont use more than one baseline.  Therefore you are executing a huge amount of SQL processing time just to get the values out.

Suggestion:  If you need both PLAN and BASELINE0 values, just use the tables in your join, ensuring you select the correct BaselineNumber when joining to the AssignmentUID

SELECT             foo
FROM               MSP_EPMAssignment A
LEFT JOIN       MSP_EPMAssignmentByDay AD 
    ON                 A.AssignmentUID = AD.AssignmentUID
LEFT JOIN       MSP_EPMAssignmentBaselineByDay BD
    ON                 A.AssignmentUID = BD.AssignmentUID   as BD.BaselineNumber = 0

Then you can extract your Baseline data and Planned data by day from the source tables without the overhead of the other baseline queries that are held in the view.


Aggregation

Lets face it most people don't need by day information except when looking at near-field operational level reports such as who's doing what next week.  These reports tend to have a limited date range and therefore the data set is reduced anyway, normally by adding a WHERE into the JOIN

INNER JOIN (
                     SELECT foo 
                     FROM MSP_EPMAssignmentByDay 
                     WHERE TimeByDay >= @DateFrom 
                           AND TimeByDay <= @DateTo
                  ) AD on ..................

So summing up to Week or Month is by far and away the best method.  Doing this early can sometimes reap rewards too (see optimisation).

So just to make sure we know how to turn days into months, here's a sample :

SELECT                 SUM(ad.AssignmentWork) as WorkByMonth
                               ,DATEADD(Month,DATEDIFF(Month,0,ad.TimeByDay),0) as MonthPeriodStart
FROM                    foo ad


This MonthPeriodStart will return  YYYY-MM-01 00:00:00 type values which work great for reporting outputs as you can then fiddle with the date format output to display JAN-00 or similar.

Note:  It is always useful to aggregate early using a pre-select and join later (see below)


Optimisation

Pre-selecting from the AssignmentsbyDay

Sticking the ByDay table in a big long nasty TSQL JOIN can really bear down on your performance.  Therefore it can be useful to grab this data early, stick it in memory and then reference it.  Normal approaches can be #temptables, @tablevariables or Common Table Expressions (CTE).  I have a fondness for the CTE just because it is so useful in so many ways, especially its recursive nature (more to follow).

So to use the CTE as the pre-select you could do something like this:

DECLARE @DateFrom Date = 'datevalue'
DECLARE @DateTo     Date = 'datevalue'

; WITH Assn AS 
               (     SELECT foo 
                     FROM MSP_EPMAssignmentByDay 
                     WHERE TimeByDay >= @DateFrom 
                           AND TimeByDay <= @DateTo
               )

SELECT        Assn.Foo
FROM           Assn -- cte name
INNER JOIN   on more tables as per normal



If we compare this approach with two standard SELECT models we see the difference in the estimated execution plan performance (on a small dataset)


DECLARE @DateFrom Date = '2015-01-01'
DECLARE @DateTo     Date = '2016-01-01'

--Option 1 CTE pre-select

; WITH Assn AS 
               (     SELECT * 
                     FROM MSP_EPMAssignmentByDay 
                     WHERE TimeByDay >= @DateFrom 
                       AND TimeByDay <= @DateTo
               )
SELECT Assn.*
FROM MSP_EpmAssignment A
inner join         Assn  
                   on         assn.assignmentuid = a.assignmentuid 

--Option 2 Subquery for assignment by day data                     

SELECT
FROM ( SELECT
FROM MSP_EPMAssignmentByDay ad 
WHERE TimeByDay >= @DateFrom 
                        AND TimeByDay <= @DateTo
) ad
inner join        MSP_EpmAssignment A 
           on        ad.assignmentuid = a.assignmentuid 

--Option 3 Standard inline query                     

SELECT
FROM MSP_EPMAssignmentByDay ad
inner join MSP_EpmAssignment A 
on ad.assignmentuid = a.assignmentuid 
WHERE TimeByDay >= @DateFrom 
        AND TimeByDay <= @DateTo




(Also - spot the missing index!)

So we see that each in-line SQL execution costs 36% overall whereas the CTE pre-select costs just 29% which is a small but not insignificant win!

interestingly moving the WHERE clause into a subquery didn't really get us any joy.

Just for completeness, the temp table approach cost than all the other query methods (with this small dataset) although I would expect some gains when larger datasets are involved.  I have never seen it match CTE performance though but mileage may vary of course!


What happened to my months - filling in the gaps

When using [for example] a Matrix in SSRS or an Excel Pivot, if you are reviewing data from the AssignmentsByDay data and there are some periods where there are no data then you will lose months in the output.  This could be especially problematic if you are doing Missing Timesheet reports or Resource Capacity reporting where the fact that work/actual work may not exist in a period is actually the point of the report so missing it out is not valid.

So how to fix.  This is something I have been playing with recently and I am 100% not saying this is a perfect method but it's working right now.

So what we need is a fixed set of periods between @datefrom and @dateto 

Firstly I use a recursive CTE to build these weekly periods:

DECLARE @ViewFrom DATE = '2015-10-26'
DECLARE @MinDate date =@ViewFrom
DECLARE @MaxDate date =DATEADD(WEEK,27,@ViewFrom)

;WITH ctedaterange
     AS (
        SELECT        [Dates]=@MinDate
                UNION ALL
            -- recursion
                SELECT        DateAdd(WEEK,1,[dates])
                FROM           ctedaterange
                WHERE        [dates] < @MaxDate
          ) 


SELECT * FROM ctedaterange

This will build us a set of 27 weekly periods between 2015-10-26 onward 

Next we want to use this CTE and create a set of rows by Resource and "Dates" so that each resource has a date period associated. We do this with a horrible JOIN as follows


SELECT foo FROM MSP_EPMRESOURCE, ctedaterange

This will build us a set of 27 weekly periods between 2015-10-26 onward for each resource!

Once you have this you have a number of options to begin to join that data created to your ByDay planned data.  I find joining on a combination of ResourceUID and time periods fields works well.  Alternatively you can create a UNION on this data to your by period query. Either way you end up with a full set of periods even when no data is planned in.  

I am not going to go into examples here as this goes way too deep for this blog post.  Sorry ;o)


Parent Child relationships in Task structures

A question I've come across quite often over the last year is how would you join a Task with AssignmentWork on it to another task further up the task tree (but not directly the parent summary task).  This is a specific need seem by those that are performing integration to external systems such as SAP PS, specifically aggregating to a SAP WBS code entered into a summary tasks in a plan somewhere up the task tree (as SAP PS is at a higher level than detailed operational planning).

This is easy when the summary task is just one level up (via the TaskParentUID field).  However what happens when the aggregation point is "somewhere above the Task Assignment" level.

To do this we need to JOIN the Task with the Assignment with an summary level field somewhere above it.  So in this example we are going to JOIN Task to a LEVEL 3 Task.

SELECT          L3.foo, tv.foo

FROM             MSP_EPMTask tv

LEFT JOIN
( SELECT TaskUID,TaskName, TaskOutlineNumber, ProjectUID,TaskWBS
FROM MSP_EPMTask T 
WHERE TaskOutlineLevel = '3'            -- this gets the L3 task parents
                        )       L3

ON           CONVERT(NVARCHAR(100), L3.TaskOutlineNumber)+'.' 
                        = LEFT(tv.TaskOutlineNumber,LEN(CONVERT(NVARCHAR(100),L3.TaskOutlineNumber)+'.'))

AND L3.projectuid = tv.ProjectUID

So what we are doing here is using the TaskOutlineNumber (1.3.1.10.) from the TaskAssignment level and the L3 Summary level (1.3.1.) and joining on the 1.3.1. value.

So this works for a fixed value for the summary but you can also do a similar join approach to join onto the first, second, lowest summary above the Task Assignment that matches certain criteria], for example:  Find the first summary task above the Task Assignment where a SAP WBS code is entered.

To do this you need to use a RANK/PARTITION in the Summary JOIN

What RANK allows you to do is associate a numeric ranking to your SQL results table.  The Partition element allows you to Rank within a subset of the rows returned.  In this scenario the Partition is the Task Assignment UID as we want to find all the parents of that task and then select the one we want based on RANK.

Now again i am not gonna give you the complete view of this as it really depends on your data/query/requirements but here's a sample.


SELECT data.*
FROM
(  
                 SELECT   RANK() 
                 OVER      (
                                    PARTITION BY tv.TASKUID 
                                    ORDER BY       L3.TaskIndex DESC
                                  ) r
                                  , l3.taskname as SummaryName
                                  , tv.TaskName
                FROM        (      SELECT * 
                                          FROM    msp_EPMTASK 
                                          WHERE TaskIsSummary = 1 
                                                  and SAPWBS IS NOT NULL 
                                    )  L3 
                LEFT JOIN (     SELECT * 
                                         FROM    msp_EPMTASK 
                                         WHERE TaskIsSummary = 0 
                                    )   tv
               ON        CONVERT(NVARCHAR(100), L3.TaskOutlineNumber)+'.' 
                                    = LEFT(tv.TaskOutlineNumber,   
                                       LEN(CONVERT(NVARCHAR(100),L3.TaskOutlineNumber)+'.'))
    AND       L3.projectuid = tv.ProjectUID
     
) data


Now what this would return is a record for each summary above the task WHERE SAPWBS IS NOT NULL.  What we want is the first one (i.e. the lowest level task) where SAP WBS IS NOT NULL so one record per bottom level task.

To do this we add in a filter on the RANK alias r

WHERE data.r = 1 

and tadaa we have a table that shows the bottom level task and the lowest summary task where WBS Code is set.  Now if you include the TaskUIDs etc in the subselects you can join out to wherever you need from here.