Thursday 26 January 2017

How to auto refresh IBM Dynamic Cube

Open a DOS command prompt, go to the Cognos installation directory (....\bin64) and run the following command. The user who runs this command must have the appropriate rights to connect to the Cognos portal.

Note:

  • The namespace name must match exactly as it appears in Cognos (it is case sensitive)
  • [server name] should be the name of the server the Dynamic Cube was published to. For example, if the cube was published in a multi-server environment and server 2 was used, then [server name] should be the name of server 2

Here is an example of how to refresh the member cache:

dcadmin.bat -p 9300 -s CognosServer2 -x c:\test.txt -l "NamespaceName,user_name,****!" refreshCubeMemberCache cube0


For more details, see below:

You open the DCAdmin command line tool from the c10_installation\bin folder (for 32-bit installations) or the bin64 folder (for 64-bit installations) as follows:
  • dcadmin.bat (Windows)
  • dcadmin.sh (Unix)
You can run a range of administration commands using the DCAdmin command line tool.


You can configure the pauseCubes and incrementallyLoadCubes commands as dynamic cube operations for the Query Service (that is, as a Query Service administration task) using IBM Cognos Administration. This allows you to run these commands on a schedule or by trigger.
Syntax
dcadmin[.bat|.sh] [-p port] [-s server] [-x output_file] [-l "namespace,userid,password"] [-arg argName argValue] command [cube0 cube1 ...]

Note that any parameter containing a comma or a space must be enclosed in double quotation marks. For example: "param1,param2"
When you run the DCAdmin command line tool, the output is shown on screen. You can also save the output to an XML file for parsing by specifying -x output_file.
When a command completes, the tool returns exit code 0 if the command was successful. If there was an error, it returns exit code 1.
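For example, a pauseCubes command following the same pattern as the member cache example above might look like this (the port, server, namespace, credentials and cube names are illustrative and should be replaced with your own values):

dcadmin.bat -p 9300 -s CognosServer2 -l "NamespaceName,user_name,password" pauseCubes cube0 cube1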

Thursday 19 January 2017


Dynamic Cubes - Near Real Time and Cross Database Queries





Introduction

Purpose of Document

IBM Cognos Dynamic Cubes deliver tremendous performance gains for data at rest. In today's changing business environments there is also a need to reduce data latency, that is, how stale data is allowed to become or how often it is refreshed. Examples of this requirement include financial processes, where an adjustment is made at the end of a period and the results need to be visualized soon after the fact to ensure correct figures, or a sales environment, where up-to-the-minute data alongside historic data is required to help spot trends and improve business outcomes.
This document assumes knowledge of IBM Cognos Dynamic Cubes and IBM Cognos Cube Designer and that the reader has an advanced level of knowledge of cube modelling. For more information on cube modelling see the IBM Cognos Dynamic Cubes Redbook and the Dynamic Cubes User Guide. There are two main concepts described in this document:
  • The ability to have high performance queries from cached and aggregate-aware historic data structures merged with near real-time structures of data where new data and deltas are applied.
  • The ability to do this spanning two supported Relational Database Management Systems (RDBMS).
Please note that the introduction of a second RDBMS is usually due to the requirement for cross-functional analysis - for example, the billing warehouse is in DB2 whereas the returns warehouse is in SQL Server. In this case, IBM Cognos Dynamic Cubes can combine base cubes with partially differing structures (grain and number of dimensions) within a virtual cube to show, for example, which customers' invoices had all of their line items returned. In this document we assume that the data structures are the same and that near real-time reporting will be achieved by spanning different database platforms.
It is more likely that this technique of near real-time analysis would be performed within the same RDBMS, such as DB2. The decision to use two different RDBMS engines in this document was to demonstrate a more complex environment.

Approach To Near Real-Time Dynamic Cubes

The approach stems from one briefly described as Time Partitioning in the IBM Cognos Dynamic Cubes Redbook (see the Resources section at the end of this document) but with the delta cube set as near real-time. It is relatively simple and works on the premise that the largest volume of data is historic and does not change.
New data (inserts of new rows, deltas or removal of rows) tends to apply only to the current period. Similarly, the largest amount of data, and therefore the data most in need of caching, aggregates and other optimisations, tends to be historic versus the proportionately smaller current period. The larger the amount of data, the longer it takes to re-populate caches, calculate in-database aggregates and load in-memory aggregates, so it is not always feasible to re-populate these optimisations intra-day. By splitting data based on periodicity, the need to frequently reload these optimisations is reduced.
The solution is to split data based on volatility with non-changing data in one (or more) cubes and real-time data in another. This means that optimisations such as summary tables need not be loaded and calculated so frequently and real-time data is small enough to be aggregated on the fly. Although this applies equally to splitting data by volatile and static datasets, for this document we will use the analogy of historic and current period. This does not refer to the dates in the dataset but rather the date the changed data is entered - a delta is performed within the current period but may apply to a past transaction record.
Depending on the data volumes involved we could have history, current month and today as separate dynamic cubes, with only the today dynamic cube not leveraging an aggregate cache. We could then combine the physical cubes to create a set of virtual cubes, which would combine all the data into one virtual cube for reporting.
For the sake of simplicity, in this document we will have only a historic cube that contains non-changing data and a real-time current period cube built upon the changing data of the current period with no aggregates or data/result set cache. Figure 1 shows how these historic and current period cubes can be merged into one virtual cube that users can report from.
Figure 1 - Real-time Dynamic Cube with historic and current period cubes merged into a virtual cube
Note that a virtual cube in IBM Cognos BI 10.2 can currently only combine two cubes. However, one may be a virtual cube thereby allowing for more historic data cubes by chaining them together.

Environment

The environment for this exercise is a simple, single virtual machine running all the required components. Figure 2 shows the various pieces of this environment, which include:
  • IBM Cognos Dynamic Query Analyzer
  • IBM Cognos BI 10.2 single server installation
  • Firefox 10.0.11 and Internet Explorer 8 web browsers
  • Microsoft IIS 7.5 Web Server
  • IBM DB2 10.1 running the IBM Cognos BI Content Store and containing 100 million rows of legacy data
  • Microsoft SQL Server 2008 housing the real time data set
Figure 2 - Diagram of environment used for this document
This virtual machine is also used for other workloads and is therefore larger than needed for the data volumes used here. For example, during this testing the system was actually using less than 8 GB of RAM.

Dataset Description

The dataset used for this document is a modified version of the standard IBM Cognos GO Sales sample, represented as a star schema. For the purposes of this document we are concentrating on the modelling and delivery techniques, rather than the actual data. Therefore there may be some duplication of data due to the need to generate reasonable data volumes. There are two data sets - one is held in an IBM DB2 database and the other in a Microsoft SQL Server database.

DB2 dataset

Tables 1-8 below detail the structure of each dimension from the DB2 datasource. The number of members for each level in a dimension will help to provide context as to how sparse the data may be.
Table 1 - Products Dimension
  • Product Lines: 5 members
  • Product Types: 21 members
  • Product: 115 members
Table 2 - Order Method Dimension
  • Order Methods: 7 members
Table 3 - Periods Dimension
  • Millennium: 2 members
  • Year: 112 members
  • Quarter: 448 members
  • Month: 1344 members
  • Day: the days in the respective month
Table 4 - Slice Dimension
  • Slice: 63 members
Table 5 - Retailer Dimension By Type
  • Retailer Type: 8 members
  • Retailer: 109 members
Table 6 - Retailer Dimension By Location
  • Retailer Region: 21 members
  • Retailer City: 233 members
  • Retailer Site: 391 members
Table 7 - Staff Dimension
  • City: 28 members
  • Staff: 104 members
Table 8 - MainFact fact table (columns and datatypes)
  • Quantity: bigint
  • Unit Cost: decimal (19,2)
  • Unit Price: decimal (19,2)
  • Unit Sale Price: decimal (19,2)
  • Sale Total: decimal (19,2)
  • Gross Profit: decimal (19,2)
  • Total Rows: bigint
Total number of rows in the MainFact fact table is 100,825,767.
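To make the shape of this star schema more concrete, the sketch below shows a simplified DDL outline of the MainFact table. The measure columns and datatypes are taken from Table 8; the dimension key columns are assumptions for illustration (only Order_Method_Key and Product_Key appear verbatim in the SQL shown later in this document), so the actual sample tables may differ.

-- Simplified, illustrative outline of the MainFact fact table.
-- Measure columns and datatypes follow Table 8; key column names are assumed.
CREATE TABLE MainFact (
    Order_Method_Key  INTEGER NOT NULL,     -- joins to Order_Method_Dimension
    Product_Key       INTEGER NOT NULL,     -- joins to Product_Dimension
    Period_Key        INTEGER NOT NULL,     -- assumed key to Large_Periods
    Slice_Key         INTEGER NOT NULL,     -- assumed key to Slice
    Retailer_Key      INTEGER NOT NULL,     -- assumed key to Retailer_Dimension
    Staff_Key         INTEGER NOT NULL,     -- assumed key to Staff_Dimension
    Quantity          BIGINT,
    Unit_Cost         DECIMAL(19,2),
    Unit_Price        DECIMAL(19,2),
    Unit_Sale_Price   DECIMAL(19,2),
    Sale_Total        DECIMAL(19,2),
    Gross_Profit      DECIMAL(19,2),
    Total_Rows        BIGINT
);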

SQLServer Dataset

Tables 9-16 below detail the structure of each dimension from the SQL Server datasource. The number of members for each level in a dimension will help to provide context as to how sparse the data may be.
Table 9 - Products Dimension
  • Product Line: 5 members
  • Product Type: 21 members
  • Product: 115 members
Table 10 - Order Method Dimension
  • Order Method: 7 members
Table 11 - Periods Dimension
  • Millennium: 2 members
  • Year: 112 members
  • Quarter: 448 members
  • Month: 1344 members
  • Day: the days in the respective month
Table 12 - Slice Dimension
  • Slice: one slice, named RT Slice
Table 13 - Retailer Dimension By Type
  • Retailer Type: 8 members
  • Retailer: 109 members
Table 14 - Retailer Dimension By Location
  • Retailer Region: 21 members
  • Retailer City: 233 members
  • Retailer Site: 391 members
Table 15 - Staff Dimension
  • City: 28 members
  • Staff: 104 members
Table 16 - MainFactRT fact table (columns and datatypes)
  • Quantity: bigint
  • Unit Cost: decimal (19,2)
  • Unit Price: decimal (19,2)
  • Unit Sale Price: decimal (19,2)
  • Sale Total: decimal (19,2)
  • Gross Profit: decimal (19,2)
  • Total Rows: bigint

Cube Design

The design of both cubes is the same, with the only exceptions being the calculations used for the Millennium and Quarter levels, where the resultant values are the same but RDBMS-specific functions were used. The DB2 historic data set also has some optimisations added as part of the tuning exercise. These optimisations include database aggregates (summary tables) and in-memory aggregates, as well as having the data and result set caches enabled.
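As an illustration of the kind of RDBMS-specific difference involved (the exact expressions used in the sample model are not reproduced in this document, and the date column name below is assumed), the Quarter level could be derived as follows, with both expressions returning the same values:

-- DB2 (historic cube): quarter number for a date column
QUARTER(Order_Date)

-- SQL Server (real-time cube): the equivalent expression
DATEPART(quarter, Order_Date)

The Millennium level can be derived in a similar way from the year portion of the date in each database.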
This document will first test without the historic cube optimisations so that relevant comparisons of timings can be made.
The star schema with a single snowflake dimension is shown below in Figure 3 with the hierarchies and levels added. The diagram shows the MainFact table in the middle of the snowflake with the tables Large_Periods, Staff_Dimension, Order_Method_Dimension, Slice, Retailer_Dimension and Product_Dimension joined to it. The Product_Dimension table also has Product_Type joined to it, which in turn is joined to Product_Line, completing the relationships required for Products_Dimension.
Figure 3 - Star schema diagram of historic data

Test Reports

The reports used for this test were designed with two purposes: to simulate simple ad-hoc navigation and to simulate a more complex analysis that might be saved as a report output in the future. Table 17 below lists the report names that will be referenced later in the document. The reports and data were created in an IBM Cognos Insight application for ease of result analysis.
Table 17 - Test Report Names
  1. Basic Crosstab Products by Order Method
  2. Drill Down on Products
  3. Drill Up
  4. Replace Columns
  5. Drill Down to Years
  6. Multi Level Set
  7. Multi Level Calculate and Top
  8. Real-time Test
Figure 4 is a simple report that shows an initial ad-hoc query for all data, the total number of rows for each product line by order method. Notice the highlighted circle which shows the total rows returned being 100,825,767.
Figure 4 – The Basic Crosstab Products by Order Method report
Figure 5 shows a simple crosstab report that simulates a drill down on the largest product line (Mortgages) to show the mortgage product types by order method.
Figure 5 – The Drill Down by Products report
Figure 6 demonstrates the user drilling back up to the original view of total number of rows for each product line by order method.
Figure 6 – The Drill Up report
Figure 7 shows a crosstab report that changes the report structure to simulate the user investigating over time - the total number of rows for each product line by time. In this case, the 1000s and 2000s columns refer to which millennium the date falls in. For instance, 1996 falls into the 1000s column.
Figure 7 – The Replace Columns report
In Figure 8, the user drills down to investigate the current millennium.
Figure 8 - The Drill Down to Years report
In Figure 9, the user lists the total rows by all product types for two years alongside two quarters and three months.
Figure 9 - The Multi Level Set report
Figure 10 shows the same report as Figure 9 but this time there is a calculation showing the percentage contribution of June 2012 to the overall year 2012 and then shows the top five product types based on their contribution.
Figure 10 – The Multi Level Calculate and Top report
Figure 11 shows the top five product types calculation that was used. This was accomplished by using the properties of the Set Definition and setting the Top or Bottom properties to a Type of Top and the Number of items to 5.
Figure 11 - Top five product type calculation used in Figure 10
Figure 12 is a simple report output listing total rows by slice where the real-time slice is differentiated.
Figure 12 - The Real-time Test report

The Approach To Timing

Timings are all taken using the total query time shown within Dynamic Query Analyzer (DQA) when the report is run directly within the DQA interface (Figure 13).
Figure 13 - The total reported time for the query processing shown in Dynamic Query Analyzer
The two underlying RDBMSs are not restarted between tests and it is possible that any caching of data or query plans at that layer may skew the results slightly. Therefore, report executions were performed three times and an average taken.
As the environment is virtual, it is also possible that there are variations in the results due to other workloads. Where appropriate, the Dynamic Cubes were restarted after each run, with the only exception being the merged virtual cube where restarting it would have negated the benefits of some of the caching at the historic cube layer.
Finally, after all tuning was performed, each run had two iterations - an initial run and a re-run so that the benefits of the data and result set caches could be seen.
All results from the last phase of tuning were copied into Final times (1st run) and Final times rerun for easy comparison across tuned historic data and near real-time and merged data. As previously mentioned, all the timings were recorded in an IBM Cognos Insight application for ease of result analysis.

The Historic Cube

Description

This star schema contains just over 100 million rows of static data and is modelled the same as any normal Dynamic Cube. The cube then had an optimisation cycle with workloads applied to improve performance.
Although 100 million rows is small it does give us a baseline - before tuning, the total average report query run time is 585.54 seconds (just under 10 minutes), long enough for us to see the impact of tuning and compare with the near real-time cube later.

Configuring the cube

A Dynamic Cube model was built on top of the star schema, deployed and the deployment was then used for the initial timings. The Aggregate Advisor was used to generate the summaries for both of the databases. In-memory aggregates and some manually created summaries that were not part of the advisor results were added from a previous exercise. The maximum amount of memory IBM Cognos BI uses for the aggregate cache was changed to 1024 MB from the default of 0 MB (Figure 14) in order to allow for sufficient memory to house the in-memory aggregates needed for the cube.
Figure 14 - Aggregate cache memory setting
For this volume of data the Aggregate Advisor recommended in-memory aggregates totalling 190 MB of RAM. For more information on this process and Dynamic Cubes overall, please see the chapter titled Cognos Dynamic Cubes administration in the IBM Cognos Dynamic Cubes User Guide or the IBM Cognos Dynamic Cubes Redbook.

Report timings

Figure 15 below shows the IBM Cognos Insight report displaying the report timings and demonstrates that the impact of any summary aggregate is substantial. Simply adding the in-database aggregates and using the built-in aggregate awareness of IBM Cognos Dynamic Cubes reduced the overall report run time from 585.54 seconds to 0.92 seconds. It also shows that the Drill Up report (Report 3) and the Multi Level Calculation and Top report (Report 7) never took more than .01 of a second.
Figure 15 – Tuning results in seconds
Changing the view to milliseconds (ms) shows fairly consistent performance for the Drill Up and the Multi Level Calculation and Top reports across all levels of tuning - between 6 and 7 milliseconds for the slowest query (Figure 16).
Figure 16 - Tuning results in milliseconds
The reason why performance is similar across all levels of tuning is different for each report. The Drill Up report (Report 3) is able to reuse the result set from the Basic Crosstab Products by Order Method report (Report 1), thereby not requiring a query to the database and showing that the data and result set caches should aid ad-hoc queries when drilling back to a prior view, or when outputting the results to a different format. For the Multi Level Calculate and Top report (Report 7), the query time is also consistent. This is because the data cache from the Multi Level Set report (Report 6) can be re-used, with an extra five milliseconds for the calculation and topcount functions to be performed on top of the data cache. This not only demonstrates the value of the hybrid query engine but also shows how analysis would benefit as users perform calculations and other operations.
As illustrated in Figure 17, the average final time on the re-run of the Drill Up report (Report 3) is nearly a millisecond more than that of just using in-database aggregates, and the in-memory result is also slightly slower. Both Reports 1 and 3 are fundamentally the same query; the variance is partly due to fluctuations in the environment and is, in any case, under a millisecond.
Figure 17 - Report execution results for Reports 1 and 3
Once in-memory aggregates are added, the calculation performance is fast enough that the need to create a result set cache versus re-calculating and retrieving data from a data cache is negated because the query length has dropped from 340 milliseconds down to 73.99 milliseconds. The Minimum query execution time before a result set is considered for caching (milliseconds) setting determines the performance characteristic that drives the creation of a result set cache. This setting can be found in the Tuning properties of the Query Service using IBM Cognos Administration and defaults to 50 milliseconds (Figure 18).
Figure 18 - Dynamic Cube advanced setting for minimum query execution time for caching
Using the result set cache incurs less cost but for the sake of a query of less than 50 ms (retrieve from data cache at 0.88 ms vs. an initial in memory aggregate calculation of 73.99 ms) the maintenance is not worth it.
The impact of using the result set cache across all reports becomes clearer if we ignore the column labelled 1 No Aggregates run, as this skews the times due to the need to query 100+ million rows directly. Figure 19 shows some of the results. The initial run of the Basic Crosstab Products by Order Method report (Report 1) shows an initial run time of 122,268.60 ms when run without any form of pre-created aggregates. Compare that to the second execution where in database aggregates were leveraged, dropping the report execution time down to 340.94 ms and the third execution using in memory aggregates which dropped the report execution time down to 73.99 ms. Similar results are seen with the Replace Columns report (Report 4) where the initial run time was 242,232.15 ms, second run was 205.37 ms and third run was 12.05 ms.
Figure 19 - Report timings showing initial run versus cache leveraged results
Interestingly, for the Drill Down to Years report (Report 5), the addition of in-memory aggregates increased the overall run time from 74.01 ms to 102.407 ms. This is due to overzealous in-database summary creation at a finer grain. This matched the report better than the in-memory aggregate and was selected first. In practice this is not a good scheme as it requires maintenance of an in-database summary that is of little value. For ad-hoc queries, a larger in-database aggregate may produce better results. For the purpose of this document this will be ignored, as the difference in average query time is less than 50 milliseconds and re-runs effectively use caches that negate this anomaly.
Ultimately, the query performance benefits from the use of aggregates, caching and the hybrid query engine are significant across all reports, reducing the overall run times from minutes to milliseconds.

The Near Real-Time Cube

Description

The near real-time cube is identical in structure to the historic data set. The base table is populated with just over 1.5 million rows to simulate a mid-day load, and a stored procedure is used to add data at a rate of 362 rows per second.
With the exception of the member cache that is required by Dynamic Cubes, all caching is disabled and aggregates are not used.
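The stored procedure itself is not reproduced in this document, but a minimal T-SQL sketch of the idea is shown below. It assumes a seed table named MainFactRT_Seed with the same column layout as MainFactRT and simply copies a batch of roughly 362 rows into the fact table once per second; the procedure actually used for the tests may well differ.

-- Illustrative sketch only: steadily insert about 362 rows per second
-- into the real-time fact table from an assumed seed table.
CREATE PROCEDURE dbo.Simulate_RealTime_Load
    @Batches INT = 3600                      -- default: run for roughly an hour
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @i INT = 0;
    WHILE @i < @Batches
    BEGIN
        INSERT INTO dbo.MainFactRT
        SELECT TOP (362) *                   -- assumes an identical column layout
        FROM dbo.MainFactRT_Seed;
        WAITFOR DELAY '00:00:01';            -- pause one second between batches
        SET @i += 1;
    END
END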

Configuring the real-time cube

Once the cube is published and running we need to disable caching. In IBM Cognos Administration select the Status tab and choose System. Click on the server to drill into, then click on the Dispatcher and select QueryService. From here you can administer any Dynamic Cube that has been published to that Dispatcher. Right-click on the QueryService, choose Set properties from the menu (Figure 20) and then select the Settings tab (Figure 21).
Figure 20 - Set properties for QueryService in IBM Cognos Administration
Figure 21 – Settings tab for the Query Service
Under the Settings tab, click on the Edit... link for the Dynamic cube configurations entry (Figure 22) and then click the Edit configuration icon for the real-time cube, which in this case is named MainFactRT (Figure 23).
Figure 22 - QueryService Settings tab
Figure 23 - Dynamic Cube configuration
Ensure the Disable result set cache check box is selected, the Data cache size limit (MB) is set to 0 and the Disable external aggregates check box is selected (Figure 24). Once set, click OK three times to return to the main screen and save the changes.
Figure 24 - Dynamic Cube properties
We need to wait for the changes to be written to the Content Store and then propagate back to the Dynamic Cube server - waiting for 30 seconds should be sufficient. We can then restart the cube and, once it has restarted, we can see from the report outputs that the system is now working in near real-time. In IBM Cognos Administration select the Status tab and choose System. Click on the server to drill into, then click on the Dispatcher and select QueryService. Right-click on the Dynamic Cube, in this case MainFactRT, and select Restart (Figure 25).
Figure 25 - QueryService menu showing restart for real-time cube
As the real-time database is loaded with data, you can see the number of rows for the corresponding products and order methods slowly increasing. We now have a system that mimics a real-world scenario of real-time data being loaded into a production database.

Report timings

As there are no in-database or in-memory optimisations, these are excluded from the results. The re-run does show an improvement over the first run; however, this is due to the Dynamic Cube resources being loaded and prepared, as well as improved RDBMS query and plan caching at the data layer. Figure 26 shows the report execution time differences between the first run and the average of the next six report executions. Query performance increased by 45% on average.
Figure 26 - Real-time report timings
This can be proven to a certain degree within Dynamic Query Analyzer. For one example report run, the data retrieval time for the query against the database was 361 ms against a total processing time of just under 379 ms. The second run was 60 ms database time against 93 ms total processing time. In both cases, approximately 30ms of processing took place outside of the data query. These results can be seen in Figure 27 below.
Figure 27 - Dynamic Query Analyzer results showing data retrieval times across runs
The XCubeRetrieval process represents a query against the underlying RDBMS. Analysing the queries issued, we find that they are identical across query runs. Example of a query used:
SELECT 
 "Order_Method_Dimension"."Order_Method_Key" AS "Order_Method_Dimension_Order_Method_Key" ,
 "Product_Type"."Product_Type_Code" AS "Product_Dimension_Product_Type_Code" ,
 SUM("MainFactRT"."Total Rows")AS "Total_Rows" 
FROM 
 "DQMSampleDataRT"."dbo"."Order_Method_Dimension" "Order_Method_Dimension"
  INNER JOIN "DQMSampleDataRT"."dbo"."MainFactRT" "MainFactRT"
 ON "Order_Method_Dimension"."Order_Method_Key" = "MainFactRT"."Order_Method_Key"
 INNER JOIN "DQMSampleDataRT"."dbo"."Product_Dimension" "Product_Dimension" 
 ON "Product_Dimension"."Product_Key" = "MainFactRT"."Product_Key" 
 INNER JOIN "DQMSampleDataRT"."dbo"."Product_Type" "Product_Type" 
 ON "Product_Dimension"."Product_Type_Code" = "Product_Type"."Product_Type_Code" 
 INNER JOIN "DQMSampleDataRT"."dbo"."Product_Line" "Product_Line" 
 ON "Product_Type"."Product_Line_Code" = "Product_Line"."Product_Line_Code" 
WHERE 
 "Product_Line"."Product_Line_Code" IN (1 )
GROUP BY 
 "Order_Method_Dimension"."Order_Method_Key",
 "Product_Type"."Product_Type_Code"
When issuing the same query within the RDBMS tools we can prove that the database is in fact completing the requests quicker for the same client. Figure 28 is taken directly from the RDBMS query tool statistics and does not include any IBM Cognos components. It shows the timings for this query are 855 ms for the first run followed by 149 ms to 158 ms for subsequent runs even though the data is changing. One can conclude, without further investigation, that the RDBMS is holding some of the data pages in cache, the new data is still in cache and the queries are executed in close succession time wise, hence the improvement in performance.
Figure 28 - Execution timings for query executed in RDBMS tool
As other queries are run this benefit disappears, so by the next iteration we see the same effect. Only on the third restart of the Dynamic Cube and subsequent run did the results appear consistent between the first and second runs.
Further tracing at the RDBMS level might explain the differences; however, as the results are, at most, a second or two apart and as this document focuses on Dynamic Cubes and not the underlying RDBMS, this anomaly will be ignored.
The expectation is that subsequent queries may slow over time on the real-time cube as data is being added and the base queries take longer. For results within our virtualized environment all queries running against 1.5 to 3 million rows of incrementing data took between 87 ms and 1.7 seconds which is acceptable performance for our infrastructure.

The Virtual Cube

The virtual cube plays a vital part in the system as this is where we combine the tuned static data with the changing data by merging the historic and near real-time cubes into one cube. For our tests, the historic cube has been left fully cached to simulate a real-world scenario, while the reports are run twice in succession and the process repeated three times for each run.

How to create and configure the virtual cube

Using Cube Designer go to the Project Explorer pane, right-click on Model, select New, then select Virtual Cube (Figure 29).
Figure 29 – Create a new virtual cube in Cube Designer
In the Select Base Cubes dialog, select the historic cube Large Sales Dynamic Cube and the real-time cube MainFactRT (Figure 30).
Figure 30 - Select Base Cubes dialog with Large Sales Dynamic Cube and MainFactRT cube selected
If all the data was in the same database then the dimensions would be the same and match automatically. In some cases, as in this one, they are named differently because they come from different sources. We need to define how the virtual dimension will merge the underlying cube dimensions and measures - reference dimensions should be merged on keys. This is done in the properties of the virtual cube that was created (Figure 31).
Figure 31 - Virtual cube dimension mapping
We need to remove all real-time dimensions that do not match the historic ones by deleting them from the editor pane. Next, ensure the like dimensions are mapped properly between the two physical cubes. Figure 32 shows the mapping dialog used to map the like dimensions between the Large Sales Dynamic Cube and the MainFactRT cube. Both cubes contain a product dimension; however, the Large Sales Dynamic Cube's dimension is called Products while the MainFactRT cube's product dimension is called Product_Dimension. To map the dimension, click the ellipsis corresponding to the cube whose dimension you wish to merge, then in the dialog select the dimension from that cube that you wish to merge.
Figure 32 - Virtual cube dimension mapping select source dialog
The next step is to ensure the hierarchies are mapped in the virtual dimensions as well. Double-click on each virtual dimension and ensure the proper hierarchies are mapped. If hierarchies have the same name then this is done automatically for you; otherwise, you will have to select the correct hierarchy by clicking the ellipsis under each dimension from the base cube. In Figure 33 below, you can see that the Order Methods dimension in the virtual cube has the Order Methods hierarchy from one cube and the Order_Method_Dimension hierarchy from the other cube mapped.
Figure 33 - Virtual cube dimension properties
Finally, check the levels in each hierarchy to ensure they align as well, by double-clicking on each virtual hierarchy. Figure 34 shows the levels of the Periods and Large_Periods hierarchies merged together into the Periods hierarchy in the virtual cube. In this case the level names all match; if in your cube they do not, you will have to select the appropriate level from each base cube that corresponds to the desired level in the virtual cube.
Figure 34 - Virtual cube periods hierarchy level mapping dialog
For this near real-time example the level names should all match but if we were merging different subject areas within the same database or from separate ones, they may not match and there may be extra dimensions and levels within hierarchies. We need to ensure that the measures are properly merged using the sum operator. In Figure 35 below, the virtual measure Quantity is selected and the property Merge Operator is changed to Sum. This will ensure that when Quantity is used, the result is a sum operation between the values pulled from the two underlying cubes.
Figure 35 - Virtual Cube measures mapping dialog
Calculated measures such as Total Cost can also be used in a host of areas. Instead of storing measure values in the RDBMS and incurring the overhead of retrieving the totals as well as the computation overhead of aggregates and summaries, a calculated measure may be used that multiplies Quantity and Unit Cost. For multi-subject areas such as sales and returns, a calculated measure would be used for order quantity minus returns.
Another good example for performance is in the case of averages, such as Average Sales Total. Instead of having the database calculate the averages, which can prevent summary usage (the base fact rows are needed to calculate a true average), it is good practice to include a total row count measure in the summaries and base tables. We can then define a calculated measure for the average by dividing the desired measure by the total rows, as sketched below. This can then efficiently use summaries and caches that may already exist. For more information on the use of summaries and caches please see the IBM Cognos Dynamic Cubes Redbook referenced in the Resources section at the end of this document.
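As a conceptual illustration (the table and column names are assumed and do not necessarily match the sample database), the relational equivalent of such an average only needs two additive measures, so a summary table or data cache at the appropriate grain can satisfy it:

-- Average sale total per product, computed from additive measures so that
-- an existing aggregate at the product grain can be reused.
SELECT Product_Key,
       SUM(Sale_Total) / SUM(Total_Rows) AS Average_Sale_Total
FROM MainFact
GROUP BY Product_Key;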
Finally, the result set cache and data cache need to be disabled on the virtual cube. For more information on these settings, refer to the steps in the section titled Configuring the real-time cube, up to the point of editing the cube definition. Figure 36 below shows the properties for the virtual cube. Selecting the Disable result set cache check box and setting the Data cache size limit (MB) to 0 will disable both the result set and data caches.
Figure 36 - Virtual cube settings to disable the result set cache and data cache
In a system where there is more than one historic cube - for example, cubes named History All Years, History Month and Real-Time - it may prove better to have a virtual cube over the two historic cubes with its data cache and result set cache enabled. This would allow the daily rebuild of the entire history aggregates to be moved to the end of the month, so that only the current month's optimisations need to be rebuilt each day. It also saves memory and processing, as the historic and monthly base cubes would have both of their caches disabled and the overhead of recalculating the virtual cube's data cache from those optimised cubes may be minimal. A second virtual cube would then be used to merge the historic virtual cube and the real-time cube in the same way as described in this document.

Report timings for virtual cube

As can be seen from the results shown below in Figure 37, the overall query times for a report are at most 1.31 seconds, demonstrating how the same reports can efficiently run against the virtual cube made up of historic and real time data. Across the result set of over 100 million rows with changing data, this is deemed to be acceptable on this infrastructure.
Figure 37 - Virtual cube report timings first run versus rerun
If we compare the first run to the second and third runs, we can see a subtle difference in consistency (Figure 38). The first query run for some reports appears to be a second or so slower than subsequent re-runs. In this test the cube was not restarted, so that the benefits of the historic caches could be fully realized; hence the second and third runs are more consistent as the Dynamic Cube server resources and RDBMS page caching start to take effect. Slight anomalies can be seen at the millisecond level; these may simply be the virtualized environment competing for resources.
Figure 38 - Virtual cube first, second and third run timings

Overall Report Timings

When evaluating results across all scenarios, the total average report run time for the 100+ million row dataset with no changing data or optimisation took nearly 10 minutes (586 seconds). Once the changing data had been added and the system optimised, the total average report run time fell to just under 4 seconds (3870.17 milliseconds).
If we simply look at the total average report run time for the optimised historic data, the result was 15.85 ms, so one could deduce that the addition of near real-time data and the virtual cube added 3854.32 ms to the processing of the queries. The actual tested total average run time of the near real-time cube on its own was greater, at 4096.02 ms. With the odd exception, this result can be seen across the individual reports (Figure 39).
Figure 39 - Historic, Real-time and Virtual cube timings for all reports
The results below show that the addition of near real-time data in a virtual cube has made little or no impact. Figure 40 shows that, with the exception of three reports (Multi Level Set, Multi Level Calculate and Top, and Basic Crosstab Products by Order Method), the addition of real-time data with the cube optimizations actually resulted in a decrease in report execution times ranging from 28 milliseconds to 151 milliseconds. Even those reports that increased in execution time only saw an increase of between 14 milliseconds and 147 milliseconds. These values will have no real impact on actual users.
Figure 40 - Time difference between Virtual Cube and Real-time cube
In this example most average query times actually reduced in comparison to the base real-time runs. This is attributed in part to the ability to process virtual cubes in parallel (which can also happen with Dynamic Cubes), but also to the effect of not restarting the cubes after each run, meaning that the underlying RDBMS is able to perform better on the same connections versus new connections. This effect can be seen below in Figure 41, where the first run shows that the total execution time for the reports against the virtual cube took 4337 milliseconds, versus 3437 milliseconds for the real-time cube. Figure 42 shows the second run, where we see the performance improvement: the total report execution time on the virtual cube was 3581 milliseconds versus 4110 milliseconds for the real-time cube.
Figure 41 - First run report timings of a virtual cube versus real-time cube
Figure 42 - Second run report timings of a virtual cube versus real-time cube
Overall the results show that a running system with connections created against the underlying database can run near real-time queries on volumes greater than 100 million rows. In fact, ad-hoc queries should be possible on even larger volumes providing the real-time dataset is small enough to return the results within acceptable times.

Near Real-Time Cube Configuration

Throughout this document the term near real-time has been used. This is partly because the data is not streaming in real time. We need to query data at the point an output is requested and latency in the query may mean that the data has already changed marginally. This is of course pedantic as in most cases being able to request results and have a view within a second or so is considered to be real-time. The main reason for stating that this is near real-time is because the dimensional framework - all the dimensional members and their attributes - are loaded at cube start-up time. In the example used in this document, this is not an issue as there are no new members created - all real-time data uses keys that already exist (such as Products). Even times and dates would normally be fully populated. Should there be a requirement for the member cache to be refreshed to accommodate new members, IBM Cognos BI provides the functionality to do so.
When refreshing the member cache, the new cache is loaded into memory from the database and then swapped with the running cache when ready. This has the performance advantage of user requests not being queued while the member refresh queries are running, however more RAM is required on the server hosting the IBM Cognos BI Dispatcher that is housing the cube in order to hold two copies of the member cache during the refresh.
The most effective method to refresh the member cache is to create an administration task that can then be scheduled or triggered. From IBM Cognos Administration, select the Configuration tab and in the left pane select Content Administration. Click the New Query service administration task dropdown and then select Dynamic cube... from the menu that appears (Figure 43).
Figure 43 - New Query Service administration task for Dynamic Cube refresh
Give the new task a name - Reload Dimensions is being used here - and if desired a description and screen tip. Click the Next button (Figure 44).
Figure 44 – New Query Service Administration Task wizard dialog to specify the task name
Select the Refresh member cache option from the Operation drop-down and ensure the Server Group and Dispatcher are properly set. Create this task only for the real-time cube, as this action will also force a refresh of the virtual cube member cache. Do not do this against the historic cube, as it would force a reload of all the in-memory aggregates and invalidate the cache, thereby impacting historic performance. This is done by ensuring that only the MainFactRT cube is selected from the list in the Cubes section (Figure 45).
Figure 45 - Dynamic Cube task options completed
Click the Next button and in the Select an action screen, set the Action to Save and schedule and click Finish (Figure 46).
Figure 46 – Save and schedule for the new administration task for the Query Service
The report shown in Figure 47 was run before running the administration task and shows the real-time slice of data (excluding the seeded rows): there are 1,600,409 rows of data in the historical cube, and the number of rows added so far into our real-time database (represented by RT Slice) is 47,421.
Figure 47 - Report displaying members before task execution
The administration task is executed and, for this model, the total refresh time is approximately 2 seconds. For a larger dimensional framework with millions of members this time could increase, so testing should be performed. After the administration task has run, we can see the new member RT New Member with 1 row, and the RT Slice now has 72,302 rows of data (Figure 48).
Figure 48 - Report displaying RT New Member added after task execution
It is worth emphasising that this administration task can be scheduled to run at intervals as frequent as every minute, or based on an external trigger. For a system where new member inserts can be detected, the trigger could be used to ensure the dimensional framework is only updated when required. For example, Event Studio could detect a change in the count of members and then execute an administration job to refresh the member cache. For small dimensional frameworks, the overhead of running the detection queries against the dimension tables using Event Studio may outweigh the benefit over simply refreshing the member cache on a scheduled basis.
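The member cache can also be refreshed from the DCAdmin command line tool (dcadmin.bat in the Cognos bin64 directory), which makes it easy to call from an external load script. For example (the port, server, namespace, credentials and cube name are illustrative):

dcadmin.bat -p 9300 -s CognosServer2 -l "NamespaceName,user_name,password" refreshCubeMemberCache MainFactRT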

Roll-Over Of Data

As the performance of the real-time cube is dependent on the volume of data and the RDBMS in which it is stored, a suitable mechanism for moving the committed, real-time rows to the historic data set needs to be put in place. As a roll-over also requires the historic aggregates to be updated and the in-memory aggregates to be reloaded, the frequency of this needs to be balanced with real-time data volumes.
Although more complex, this is where using multiple historic cubes may be more advantageous by reducing interim roll-over times. The process of rolling data over is a simple concept and may use anything from ETL tools through to simple SQL scripts to perform the following tasks (a simplified SQL sketch of steps 1 and 3 follows the list).
  1. Move data from the real-time fact table to the historic fact.
  2. If the dimensions are not the same, add any new members to the historic dimensions.
  3. Update the historic in-database aggregates.
  4. Issue a restart command for all the cubes.
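A minimal SQL sketch of steps 1 and 3 is shown below. It assumes that both fact tables are reachable from a single connection (in this document's environment they live in different RDBMSs, so in practice an ETL tool, federation or a linked server would perform the copy), and the summary table and column names are purely illustrative; treat it as an outline rather than a working script.

-- Step 1 (sketch): append the committed real-time rows to the historic fact
-- table, then clear them from the real-time fact table.
INSERT INTO MainFact
SELECT * FROM MainFactRT;          -- assumes an identical column layout

DELETE FROM MainFactRT;

-- Step 3 (sketch): rebuild an in-database aggregate at the product line grain.
-- The real summary tables produced by the Aggregate Advisor are not shown in
-- this document, so the table and column names here are assumed.
DELETE FROM MainFact_Agg_ProductLine;
INSERT INTO MainFact_Agg_ProductLine (Product_Line_Code, Sale_Total, Total_Rows)
SELECT pl.Product_Line_Code, SUM(f.Sale_Total), SUM(f.Total_Rows)
FROM MainFact f
JOIN Product_Dimension pd ON pd.Product_Key = f.Product_Key
JOIN Product_Type pt ON pt.Product_Type_Code = pd.Product_Type_Code
JOIN Product_Line pl ON pl.Product_Line_Code = pt.Product_Line_Code
GROUP BY pl.Product_Line_Code;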
Simply refreshing the member cache on the historic and real-time cubes would have a similar effect to restarting those cubes, in that all the optimizations and caches will be refreshed and may minimize downtime at the expense of the extra RAM needed to process the change.
In order to create a job to restart all the cubes, create three separate Query Service administration tasks and add them to a single job. The process for creating the administration tasks was described in the previous section, with only the operation selected from the Operation dropdown list being different. The first task should stop the virtual cube using the Stop immediately or the Stop after active tasks complete operation, the second task should restart the base cubes using the Restart operation (ensuring the base cubes are selected), and the third task should start the virtual cube using the Start operation.
Next create a new job (Figure 49) to combine the three tasks to be executed in a specific order and in the Name field type Restart Near Real-time (Figure 50). To have the tasks executed in sequence, add them to the job in the order you wish them to execute and ensure the In Sequence radio button is selected (Figure 51).
Figure 49 - Cognos Administration new job creation
Figure 50 - New Job creation name and description
Figure 51 - New Job with tasks in execution order
This job can now be used to restart the whole set of cubes manually by running the job interactively or based on a schedule. In many instances, the most useful scheduling option is through the use of a trigger as the job can be invoked by an external process such as an ETL process.
It should be noted that the need to first stop the virtual cube is due to IBM Cognos BI 10.2 requiring all dependent virtual cubes to be stopped before a cube can be restarted.

The Impact Of A Multi-Cube System

Throughout this document you have seen that the use of virtual cubes has minimal overhead on query times in general. However, there is a cost associated with the use of multiple cubes and, in this case, a virtual cube. This cost includes an increase in processing requirements as more threads are required to handle parallel execution. Dynamic Cubes benefit from more CPU computing power and this can play a significant part in improving the performance of a single, complex query.
It should be noted that a multi-cube system requires sufficient RAM. In IBM Cognos BI 10.2, each cube requires its own dimension member cache. If the member cache is refreshed while a cube is running, there also needs to be sufficient RAM to store a second copy of the member cache. This is to minimise the impact on queries, as the new member cache is swapped for the current member cache once loaded. In practice this means that for this near real-time method, up to six times the member cache may be required. For smaller systems this might have minimal impact; however, for large systems the RAM requirement increases substantially.
For more information on server sizing see the article Dynamic Cubes Hardware Sizing Recommendations referenced in the Resources section of this article.

Conclusion

In conclusion, the Dynamic Cube capability within IBM Cognos BI 10.2 is not only able to deliver significant performance gains on large datasets, facilitating ad-hoc queries, but is also able to add near real-time information in order to help with, for example, reporting on large volumes where iterative adjustment and reporting cycles need to be made.
Although the addition of the real-time data did, as expected, decrease the overall performance of the system, given the right balance of historic to real-time data, a suitable infrastructure containing an RDBMS that delivers the required query performance, and a suitable regime for data management, it is possible to cater for scenarios requiring near real-time data with acceptable response times.
In addition, the use of differing RDBMS platforms shows that, providing the dimension tables are replicated and the main factual data can be retrieved from a single table, it would be possible to do this using a filtered operational system (or mirrored transactional database) and a warehouse.
It is important to note that the impact on the operational system would need to be measured for potential problems, which mirroring technology may alleviate while still providing near real-time reporting. IBM Cognos BI versions 10.2 and 10.2.1 only support star and snowflake schemas for good reason, and even though it may be possible to use views on a transactional system to create a star structure for the real-time element, it is not advised as the performance of such views can impact query times.
Although not required, the use of two RDBMSs has shown that it is possible for a virtual cube to merge information from two disparate data warehouses. This aids multi-subject-area analytics in organisations that have multiple data warehouse strategies, or where one application such as the financials system and another such as the asset management system are each provided with their own out-of-the-box warehouses.