Simulation and Optimization
As processes are designed and built but before they are deployed for real use, it is desirable to know ahead of time how they might perform in reality. To help achieve this, IBPM provides a simulation capability built into the IBPM Process Designer which is called the Optimizer. The Optimizer can be used to examine how a process may perform by using simulated executions with hypothetical data. In addition, the Optimizer can be used with actual data collected over time from runs of the processes in production. Using this data, the Optimizer can visualize how the processes actually ran and provide visual indications of which areas may be good candidates for further improvement.
Drilling down further into the notion of simulation, we see that it is the fictitious execution of multiple processes. The IBM product pretends (internally) to execute the steps in the process and calculates timings and other numerical data. After the simulation has ended, the results are then shown to the user in graphs, charts and tables.
A key notion of a simulation is that it pretends to run faster than it would in reality. For example, if a step in a process would take an hour for a person to complete, simulating that step should not take an hour … rather, the product's simulator should internally track it as though it had taken an hour.
Defining simulation values
Think of each step in the process as having associated simulation attributes that the designer of the process explicitly provides. These attributes declare how long it may take to execute.
If we had a simple process that looks as follows:
We see that it is composed of three steps: Step 1, Step 2 and Step 3. If we define that each step in the process will take 10 minutes to execute, then when we run a simulation we find that the average duration of a process is 30 minutes.
We can define simulation attributes of a BPD step by selecting the simulation tab in the Properties view:
Saying that a step in the process takes a fixed amount of time may not be that useful, so we can instead define that a step takes a dynamic/variable amount of time based on one of three available distribution types:
Fixed distribution type
The fixed distribution time is the simplest of all. Its value is a constant (or fixed). For a wait or duration time, this means that the wait or duration will always be the constant value supplied.
Uniform distribution type
The Uniform distribution type says that its value will be between a lower and an upper bound, with the probability of any given value evenly distributed across the range. For example, if the lower bound is 5 minutes and the upper bound is 15 minutes, then any individual value within that range is equally likely. The probability of the value being 5 minutes is the same as that of being 10 minutes or 15 minutes.
Normal distribution type
This is the most complex of the distribution types. The distribution is in the form of a bell curve with the center of the bell curve set to the Average value. The slope of the curve is based upon the Standard Deviation value. The Range is an amount of time that is the threshold above and below the average and marks the possible start and end of the time interval. For example, an average of 10 with a range of 5 will produce a series of values between 10-5=5 and 10+5=15. The distribution of these values is governed by the standard deviation. If the lower bound should be negative (for example, average=10, range=20, which gives a lower bound of -10), this obviously has no meaning in a simulation, so any distribution values less than zero are also ignored.
The value chosen will be between the lower and upper bounds, but the probability of any specific value will not be constant across the range; it will be as described by the curve.
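To make the three distribution types concrete, here is a small Python sketch that samples one duration under each type. The function and parameter names are hypothetical and this is not the product's internal algorithm; note how normal values that fall outside the range, or below zero, are simply redrawn.

```python
import random

def sample_duration(kind, **p):
    """Sample a simulated duration (in minutes) for one of the three
    distribution types: fixed, uniform or normal (bell curve)."""
    if kind == "fixed":
        return p["value"]                           # always the constant value
    if kind == "uniform":
        return random.uniform(p["low"], p["high"])  # every value equally likely
    if kind == "normal":
        low = p["average"] - p["range"]
        high = p["average"] + p["range"]
        while True:
            v = random.gauss(p["average"], p["std_dev"])
            # values outside the range, or below zero, are ignored and redrawn
            if v >= 0 and low <= v <= high:
                return v
    raise ValueError(kind)

# Example: average 10, range 5 -> a value between 5 and 15
print(sample_duration("normal", average=10, std_dev=2, range=5))
```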
Gateways and simulation
When a gateway is added into a process:
the simulation settings for such an activity allow us to define the percentage of time one path or another is followed. A percentage is simply another expression for a probability: a value of 10% is the same as saying 1-in-10. It is important to realize that during simulation, application data for a process is completely ignored and, as such, gateway paths are followed based on statistical likelihood and not on actual evaluated conditions. It is up to the designer to choose the statistical distribution assigned to each of the possible paths.
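The way a gateway behaves under simulation can be sketched in a few lines of Python (the function and path names here are hypothetical, for illustration only): the configured percentages act as a probability table rather than as evaluated conditions.

```python
import random

def choose_path(weights):
    """Pick an outgoing path by its configured percentage.
    weights: dict of path name -> percentage (should sum to 100)."""
    r = random.uniform(0, 100)
    cumulative = 0.0
    for path, pct in weights.items():
        cumulative += pct
        if r <= cumulative:
            return path
    return path  # guard against floating point rounding at the boundary

# A 10% path is the same as a 1-in-10 chance
taken = [choose_path({"approve": 90, "reject": 10}) for _ in range(10000)]
print(taken.count("reject") / len(taken))   # roughly 0.10
```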
Arrival rate of simulation items
When a simulation is executed, imagine we wish to process 100 requests and then end the simulation. One of the first questions that comes to mind is "how fast" do we want these requests to appear to arrive? If we are running a simulation, it doesn't seem to make much sense that they should all arrive simultaneously as that would not happen in reality.
The Start Event BPMN symbol of a process can be used to describe the arrival rate of the simulated processes. This has an attribute called the "Firing Delay". If we defined this as "Fixed" with a value of 30 minutes, we would effectively be simulating 100 processes being started in total, with a new process arriving every 30 minutes. Of course, we could use other distribution values to represent alternate arrival distributions as opposed to a constant process arriving each unit of time.
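A fixed firing delay of 30 minutes for 100 instances can be sketched as follows (illustrative helper only; times are minutes measured from the simulation epoch):

```python
def arrival_times(count, firing_delay):
    """Simulated start times (minutes from the epoch) for `count`
    process instances, one arriving every `firing_delay` minutes."""
    return [i * firing_delay for i in range(count)]

starts = arrival_times(100, 30)   # 100 processes, fixed 30-minute delay
print(starts[:4])                 # [0, 30, 60, 90]
print(starts[-1])                 # the last instance arrives at minute 2970
```

A variable firing delay would simply replace the fixed multiplication with a draw from one of the distribution types described earlier.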
Simulating the cost of execution
Since Human Tasks are performed by staff members and each staff member has a salary, there has to be a cost associated with executing a human task. For example, if a person is paid $20/hour and it takes them 30 minutes to perform a task, then the cost of performing that particular task would be $10.
In IBPM, we associate an activity with a Participant Group which defines the staff members that are eligible to perform that task. One of the configurable attributes of a Participant Group is the idea of a cost per hour. This attribute defines the cost of using a member of that group for an hour long period.
Since we can now model the expected duration of an activity and have also defined the staff members for that activity, we can calculate the cost of executing that activity. This can be reflected in the simulation results.
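The cost arithmetic is straightforward. A tiny sketch (hypothetical helper name) using the $20/hour, 30-minute example from above:

```python
def task_cost(duration_minutes, cost_per_hour):
    """Cost of one human task: its duration multiplied by the
    Participant Group's hourly rate."""
    return duration_minutes / 60 * cost_per_hour

print(task_cost(30, 20.0))   # 30 minutes at $20/hour -> 10.0
```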
Available staff members for tasks
For simulation purposes, we can also define the capacity of a Participant Group.
This is the number of staff members that are contained within that group. This comes into play when there is more work to be done than there are people to perform it. Here is an example.
Imagine it takes 10 minutes to review a work request. Imagine that there are 5 members in the team that can review work requests. If 5 requests arrive, then all 5 members of the team may be busy reviewing a request. What happens if a 6th request arrives while the previous 5 are still being worked? The answer is simple: the 6th request will have to wait until one of the preceding requests completes, at which time the staff member is released from their previous task to work on the new one.
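The queuing behavior just described can be sketched with a small Python simulation (illustrative only; `simulate_queue` is not a product API):

```python
import heapq

def simulate_queue(arrivals, service_minutes, capacity):
    """Return the wait time of each request when only `capacity`
    staff members are available. A request that arrives while all
    members are busy waits for the earliest one to free up."""
    free_at = [0] * capacity            # time each member becomes free
    heapq.heapify(free_at)
    waits = []
    for t in arrivals:
        earliest = heapq.heappop(free_at)
        start = max(t, earliest)        # wait if nobody is free yet
        waits.append(start - t)
        heapq.heappush(free_at, start + service_minutes)
    return waits

# 6 requests arrive at once; 5 reviewers, 10 minutes per review.
print(simulate_queue([0, 0, 0, 0, 0, 0], 10, 5))  # [0, 0, 0, 0, 0, 10]
```

The sixth request waits the full 10 minutes, exactly as described above.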
The Capacity of the Participant Group defines the number of staff in the group. It can be set to an estimated number within the simulation settings of the group, or else it can be set to the exact names/number of the users that are actually defined to the group. Using an estimated capacity allows us to execute "what if" scenarios such as increasing or decreasing the number of staff members in the group.
A worker in a group is unlikely to be able to devote 100% of their day simply to working on IBPM tasks. They will have other responsibilities such as team meetings, education, management reports and more. The Availability setting of the Participant Group defines a percentage of their working hours that can be spent working on IBPM originated tasks.
We are human and can't work at 100% efficiency. We take bathroom breaks, chat with colleagues and go for coffee. To allow for this in the simulations, we can define an attribute called Efficiency which is again expressed as a percentage of available working time.
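The combined effect of capacity, Availability and Efficiency can be illustrated with a one-line calculation; the 80% and 90% figures below are made-up example values, not product defaults.

```python
def effective_hours(members, hours_per_day, availability, efficiency):
    """Staff hours per day actually usable for IBPM tasks, after the
    Availability (other duties) and Efficiency (breaks, distractions)
    percentages are applied."""
    return members * hours_per_day * availability * efficiency

# 5 people x 8 hours, 80% available, 90% efficient
print(effective_hours(5, 8, 0.80, 0.90))   # 28.8 usable hours out of 40
```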
Simulation Analysis Scenarios
A simulation scenario is basically a collection of settings that are used to govern the execution of the simulation. A scenario describes what we are going to simulate and for how long we want that simulation to run.
A simulation analysis scenario is created from the Processes category within the library:
The scenario is given a name:
Within the details of the scenario a variety of attributes can be supplied.
First there is the simulated start date and time. This is the date and time when the simulation is set to begin. No processes will be simulated as having started before this time. This will be the epoch from which start events will be calculated.
When a simulation is to be performed, there must be a point at which the simulation should end. There are two exclusive choices for this. The first is a run time. This is the amount of elapsed time, starting at the start time, over which the simulation is to execute. For example, if we say that the start date/time will be 2012-08-25 at 09:00 and the simulation will run for 8 hours, then the simulation will conclude at 2012-08-25 at 17:00. The number of instances of processes started will be governed by the firing rate of the process start event.
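The end-time arithmetic can be checked with a couple of lines of Python:

```python
from datetime import datetime, timedelta

# Simulated start: 2012-08-25 at 09:00, run time: 8 hours
start = datetime(2012, 8, 25, 9, 0)
end = start + timedelta(hours=8)
print(end)   # 2012-08-25 17:00:00 -- same day, 8 hours later
```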
An alternative to duration is the idea of stopping after some number of instances of a named process have completed. For example, we can say that the simulation should end after 100 instances of process ABC have come to an end.
Within a simulation scenario we must supply the names of the Process Applications, Processes and Simulation Profiles. These are not optional. These will be the entities that are simulated.
Imagine we wish to run simulations with different sets of data to be able to ask and answer "what if" kinds of questions. We can create a named grouping of simulation settings. This grouping is called a profile.
Associated with elements in the BPD diagram are simulation profiles:
A simulation profile is the set of all simulating settings for a BPD. There can be multiple profiles and hence multiple sets of simulation settings.
Once the environment for simulation has been created, we are now ready to execute those simulations and examine the results. Switching to the Optimize perspective shows the simulation details.
In this perspective, there are a number of views each with their own tab titles.
The Analysis Scenarios view allows us to select a scenario for execution and then execute it:
Clicking on the Calculate button causes the simulation metrics to be calculated.
The interpretation of the results of the simulation is a combination of examining the BPD diagram, the Heatmap Settings and the Live Reports view.
First, let us examine the Heatmap Settings.
In this area of the window the items marked in blue are changeable values. It is not known why they are not pull-downs.
The Visualization mode is of primary importance. It contains the following:
Selecting one of these report types changes the content in both the BPD diagram and the Live Reports view.
The first category is time based. Wait Time is the time that a task had to wait for execution because a staff member was not available to work upon it.
The Execution Time is the time taken to work on a task after a staff member became available.
The Total Time is the time taken including both waiting and execution for the task to complete.
Depending on the Visualization Mode selected, the BPD diagram will also change to be annotated with metrics about the simulation:
This provides an easy to read examination of the process as a whole to understand where time was spent.
IBPM uses "heatmap" diagrams to draw attention to important areas. Heatmap diagrams use saturation of color to depict the highs and lows. Activities will have richer colors around them if they need more attention than activities with softer colors.
Clicking in the BPD diagram will produce different reports in the Live Reports section.
Clicking in the white space of the process as a whole will produce a summary of the overall execution of the process.
From here we can get a great overall picture of the process including the costs, average durations and breakdown of the activities.
Clicking on a lane in the process will produce a report for all the activities contained in that lane including their details.
Finally, clicking on an activity step will produce a report on just that single activity.
Details are shown for various steps. These details include:
Another category of visualization is Path based.
This shows us visually the paths taken by the instances of the process. Heat-maps are used to indicate the relative frequency of paths taken. There are three path choices:
These choices filter the heatmap display to show all paths, only happy paths or only exception paths respectively.
Optimizing a Process
Simulation is great for thinking about and asking questions about future processes or future changes to processes, but this is only part of the story. An equally important question is to determine where within the process improvements can be made. If a concern is raised that it is taking too long for customers to get their ordered products, then unless you can tell where within that process time is being spent, you won't know which areas would benefit from improvement.
The Optimizer component of IBPM serves double duty beyond that of being a simulation engine. The Optimizer can retrieve historical data from the Performance Data Warehouse and use that information to allow the process analyst to examine the reality of previously executed processes. The data that can be examined includes the same information as was previously seen in the simulations.
There is some setup required in order to have Process Server capture this information.
First, Auto-tracking must be enabled. This is set in the properties of the Pool:
This is enabled by default.
The tracking definitions must be sent to the Performance Data Warehouse:
In the Optimized Analysis Scenarios:
|BPD element|Simulation setting|
|---|---|
|Start Event|Firing Delay|
|Activity|Execution Time|
|Condition|Outgoing Flow Percentages|
|End Event|Firing Delay|
The areas of simulation and optimization make great reading and theory, but an ounce of tutorial has been found to help many folks understand these ideas with greater clarity. In addition, working through a tutorial makes these features easier to remember in the future.
Experiment 1 – A basic start
In order to better understand the nature of simulation and optimization, here is a simple tutorial. First we build a trivial process that looks as follows:
We accept all the defaults. Now it is time to change some settings. For our purposes, we will change the arrival rate of new process instances to be one new process every 5 minutes.
We will also say that the time it takes "Step 1" to execute will be 10 minutes.
We now have a bare bones process model as well as a simulation model set up. In order to be able to run a simulation, we now switch to the Optimizer view. We select "Single Simulation" in mode and create a new simulation scenario. The simulation scenario describes what we are going to simulate and for how long.
We are now ready to run a simulation. Before actually running it, what do you think the outcome may show?
After running the simulation by clicking the calculate button:
We can look at the Live Reports tab for the process as a whole.
One of the more interesting tables is the Instance Analysis table:
This shows that 200 instances were completed (which was the number we asked to run) and that the minimum and maximum run-times for all instances were both 10 minutes. This means that each instance of the process took exactly 10 minutes to execute. This makes sense to us, as there is only one step in the process and that step is simulated to be exactly 10 minutes long.

So … is this what we expected to see? The answer would have been yes … had it not been for the fact that we configured a new process to arrive every 5 minutes. If there had only been two process instances, the first one would arrive at 09:00 and take 10 minutes to complete, finishing at 09:10. The second process would arrive at 09:05 … but … and here is the interesting thing … would we not have assumed that it could not complete until the first one had finished? If we had thought that, we would have been wrong. Clicking on an individual step in our process, in our case Step 1, we can see the simulated items:
And there we see our answer. Step 1 was defined to be executed by members of a participant group called "All Users" which contains an unlimited number of theoretical users. As such, if a user is needed to perform that step there will never be a case that none are available.
We can constrain the number of users available in our simulation scenario:
Re-running our simulation now shows us something new.
We now see that the time taken to complete the process is trending upwards and that there is a large gap between the minimum and maximum completion times.
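The upward trend is classic queuing behavior: instances arrive every 5 minutes but the single staff member needs 10 minutes per task, so the backlog grows without bound. A small sketch (illustrative helper name) reproduces the pattern:

```python
def completion_times(arrival_gap, service, count):
    """Process durations when a single staff member serves instances
    that arrive every `arrival_gap` minutes and take `service`
    minutes each: an instance starts only when the previous is done."""
    finish = 0
    durations = []
    for i in range(count):
        arrive = i * arrival_gap
        start = max(arrive, finish)          # wait until the worker is free
        finish = start + service
        durations.append(finish - arrive)    # waiting + execution time
    return durations

# One new instance every 5 minutes, 10 minutes of work, one person:
print(completion_times(5, 10, 5))   # [10, 15, 20, 25, 30] - trending upward
```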
Let us now change the starting rate of new process instances. Currently, it is set at a fixed rate of one new process every 5 minutes. Let us change this to be one new process on average every 10 minutes. The key words here are "on average". This means on occasion the next process instance will start in less than 10 minutes and on other occasions the next process instance will start in more than 10 minutes. Changing the process start distribution type to Uniform, we can control these settings:
In our example, we say that the average will be 10 minutes with a minimum of 8 minutes and a maximum of 12 minutes. Running a simulation and looking at the data, we get the following:
This is our most interesting graph yet. What it is showing is that the duration of process instances is no longer cleanly predictable but instead shows quite a broad range of results. If we run the simulation again, we will get different values because of the randomization of start times for the process instances.
If we look further in the table, we see the following:
What we see is that the waiting time for this step in the process is not zero. This means that on occasion this step was waiting for a staff member to become available for the process to continue. If we think back to what we previously did, we said that this step only has one person available to do it. If we increase the number of staff members available to do this work, we will see process completion times become even again.
We will now change some more settings. First we will add a new swim lane and two participant groups. Let us call them "My Team 1" and "My Team 2". We will associate each swim lane with a team. We will also add a second step.
For the second step, we will also define it as having a simulation value of 10 minutes exactly just as we did with Step 1.
In the simulation scenario, we will define each of the two teams as having only one member.
Again, we will re-run the tests. This time, we want to examine another aspect of the simulation tooling called heat-maps. In the Heat-map Settings, ensure that the Visualization mode is set to "Wait Time" and that the scaling range is less than 30 minutes.
Looking at the BPD diagram in Optimizer mode we see an interesting piece of information:
We see that Step 1 has been highlighted using heat map visualization as one that needs our attention.
In our story, we already know where our bottlenecks are because we artificially created the process to illustrate such things. However, imagine a much larger process with potentially a lot of activities within it. By simply glancing at heat map diagrams like this, we can immediately see which areas of our process could use attention.
Next we increase the capacity of My Team 1 to be 2 members and rerun.
Notice that the heat map indication has moved from Step 1 to Step 2 indicating that we have resolved the bottleneck for Step 1 but now we have a bottleneck in Step 2. We could continue this onwards but the goal here was to show how heat maps can be used to illustrate where attention should be provided.
Experiment 4 – Path analysis
Again following on from the last experiment, we are now going to look at decision gateways. We add a gateway into the process flow to end up looking as follows:
What this represents is a simple process that has a decision within it. In the simulation settings for the gateway we define that 25% of the time, the decision will direct us to Step 2.
Since these are exclusive decisions, this means that 75% of the time, the decision will take the other path. Having made this change, once again we execute the simulation.
In the Visualization mode, we select Path:
The BPD diagram now shows us the paths taken by the process, again visualized by heat maps and labels:
We can quickly see the weight of execution across the different paths.