This paper focuses on the analysis of execution traces for real-time systems. Kernel tracing can provide useful information without having to instrument the applications studied. However, the generated traces are often very large. The challenge is to retrieve only the relevant data, in order to quickly find complex or erratic real-time problems. We propose a new approach to help find these problems. First, we provide a way to define the execution model of real-time tasks, with the optional suggestions of a pattern discovery algorithm. Then, we show the resulting real-time jobs in a Comparison View, to highlight those that are problematic. Once some jobs that present irregularities are selected, different analyses are executed on the corresponding trace segments instead of the whole trace. This saves a huge amount of time and allows executing more complex analyses. Our main contribution is to combine the critical path analysis with the scheduling information to detect scheduling problems. The efficiency of the proposed method is demonstrated with two test cases, where problems that were difficult to identify were found in a few minutes.
Real-time systems are characterized by their timing constraints. They are composed of real-time tasks that will each generate a sequence of jobs with a priority and a deadline.
The moment at which a new job has to be executed is called the arrival time, and the moment at which a job actually starts executing is called the start time. If the jobs arrive at a fixed interval, the task is called periodic; otherwise, it is called sporadic. Periodic tasks are often driven by a timer, like the processing of video frames. On the other hand, sporadic tasks are often driven by interrupts, like the response to a user action. In both cases, there are deadlines, but they are relative to the start time for sporadic tasks and absolute for periodic ones.
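The distinction between the two deadline types can be sketched as follows. This is a minimal illustration of the definitions above, not code from the tool; the function names and parameters are ours.

```python
# Periodic task: arrivals are at fixed intervals, so absolute deadlines
# can be computed ahead of time from the first arrival and the period.
def periodic_deadlines(first_arrival, period, relative_deadline, n_jobs):
    return [first_arrival + i * period + relative_deadline
            for i in range(n_jobs)]

# Sporadic task: the deadline is only known once the job starts,
# relative to its start time.
def sporadic_deadline(start_time, relative_deadline):
    return start_time + relative_deadline

assert periodic_deadlines(0, 250, 100, 3) == [100, 350, 600]
assert sporadic_deadline(40, 60) == 100
```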
To avoid unwanted consequences, those deadlines must be met by the real-time jobs. However, when only a few deadlines are missed, it can be hard to identify the underlying cause, due to the numerous components involved in the systems and their interactions. Because the problem occurs intermittently, profiling tools may have difficulty pinpointing its source. The numerous jobs that executed according to specification will be taken into account in the resulting statistics and hide the rare problematic ones.
In that situation, tracing can be useful or even essential. It consists of collecting selected events during the execution of a program, along with the time at which they occurred. It is then possible to analyse the interesting parts of the resulting trace. However, the trace can be very large, and it can be difficult to identify those interesting parts. This motivates the need for specialized tools that help developers by guiding them to efficiently find the problems.
Our objective is to develop such a tool. To test our concepts, we used a Linux kernel with the PREEMPT_RT patch. This was shown to provide excellent real-time response with the Linux kernel. In addition, we used the LTTng tracer, characterised by a very low overhead [
Because they are frequent in real-time systems, we decided to focus on problems related to scheduling and priority. The scheduler is the component that selects the threads that will be executed next on the CPUs. Each thread has a scheduling policy and a static priority that are used by the scheduler to make decisions. To be able to analyse the various tasks, we need a method that can support the different scheduling policies available on Linux. In addition, we want to detect priority inversions, when a higher-priority task is needlessly waiting on a lower-priority task, usually because the latter is holding a lock while waiting on a third task of medium priority. We also want to support the different protocols used to avoid this inversion, like the Priority Inheritance Protocol (PIP) and the Priority Ceiling Protocol (PCP) [
We first present related work in the fields of real-time tracing and pattern discovery. We then describe our new approach to efficiently solve scheduling problems with a real-time task, in four steps. The first step is to let the users define the execution model of the task jobs, with the optional help of a pattern discovery algorithm. Based on that model, the second step is to locate all the corresponding jobs in the trace. The third step is to select interesting jobs from a Comparison View that highlights those that are problematic. The last step is to execute different analyses on the corresponding trace segments. These analyses include our main contribution, which is to combine the critical path analysis with the scheduling information to quickly detect scheduling problems.
Thereafter, the paper continues with the different options offered to the users and the implications in terms of execution time. We also show the different views that have been prototyped to display the results. Then, we present different examples, typical of industrial problems, that can be efficiently solved using our tool. We conclude on the possible next steps for future work.
In this section, we will review the studies that focus precisely on trace analysis for real-time systems. First, a pattern language is defined in [
In [
A method to extract useful metrics based on kernel traces is defined in [
A similar method is used in [
Multiple visual tools have been developed to display traces, like Tracealyser [
To find the real-time jobs in the trace, users must provide the corresponding definition, in terms of a list of tracing events that occur in order. When users do not know which events are involved in the execution of a real-time task, the first step is to suggest some possible definitions to them in a graphical interface. This is the pattern discovery step. The users then select a pattern that will later be used to find all the jobs.
Before explaining the algorithm, we start with a few definitions, illustrated in Figure
Graph showing events in the trace as a function of time, as well as the two occurrences of a given episode.
Some algorithms work with the timestamps to find periodic patterns. However, because we want to support sporadic tasks, we needed an algorithm based on the order of events and not on a specific period. Also, to simplify the problem and increase robustness, we chose to force the pattern to be on a specific thread rather than on the whole trace. Real-time tasks divided among several threads, and even processes, are relatively rare and follow much more complex patterns. On the other hand, such algorithms become much more complex when some events are considered to have occurred in parallel, which is often the case with multiple threads.
Based on the previous criteria, we decided to use the
In a few words, the
As explained, the algorithm takes a sequence and a support threshold and outputs frequent episodes. In our case, we have complex events with fields and timestamps. To convert them to an ordered sequence of elements, we simply preserve the order of the events and drop the timestamps. The latter can provide useful information but, as previously explained, we want to support sporadic tasks, and we prefer to use the timestamps only in the analysis phase, once all the jobs are found.
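The preprocessing described above can be sketched as follows. The events are modelled here as simple dicts, a simplification of real LTTng events; only the ordering is kept, while timestamps and payloads are dropped.

```python
# Reduce a list of kernel events to an ordered sequence of event types.
# Timestamps are used only to establish the order, then discarded.
def to_sequence(events):
    ordered = sorted(events, key=lambda e: e["timestamp"])
    return [e["type"] for e in ordered]

trace = [
    {"timestamp": 10, "type": "hrtimer_expire", "fields": {"timer": 255}},
    {"timestamp": 12, "type": "sched_switch", "fields": {"next_tid": 42}},
    {"timestamp": 15, "type": "sched_switch", "fields": {"next_tid": 7}},
]
assert to_sequence(trace) == ["hrtimer_expire", "sched_switch", "sched_switch"]
```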
Moreover, because we want simple elements to compare, we decided to use only the event types. In fact, we have additional information available, since each event can carry a payload in the event fields. While in some cases the event fields can denote a subtype (e.g., sys_read or sys_write instead of syscall), in other cases they can specify an instance (e.g., which timer was set or just expired) or simply some useful statistics (e.g., number of bytes transferred). It would have been interesting to also use the information from the various event fields, but the difficulty is to automatically identify the relevant ones. Moreover, the interesting fields can vary depending on the context. For example, the
Because the trace can be very large, we also add a maximum number of events over which patterns are searched. Only trace intervals containing at most this number of events are kept in memory. This results in a faster search and prevents memory problems. In practice, the interesting sequences are rather short, and there is a huge gap between the number of events needed to have a few repetitions and the default maximum number. This approach thus leads to a valid result, despite this limitation, as long as the maximum number of events is reasonably large.
To use the algorithm, we must define a support threshold. We offer two options to the users. First, they can directly define the minimum number of repetitions. That way, users do not have to know how many events are in the trace or how many events are included in the pattern. They only need an idea of how many jobs of the task are present in the trace segment, in order to specify a lower bound. The higher this bound, the faster the algorithm. Indeed, episodes are dropped more quickly because their support falls under the support threshold sooner, and there are fewer frequent elements to iterate over at each step. We can see in Table
Example of a list of occurrences of events with their considered status according to a support threshold of 2.
| Count | Element (event type) | Status |
| --- | --- | --- |
| 6 | sched_switch | Frequent |
| 2 | kmem_cache_alloc | Frequent |
| 2 | hrtimer_start | Frequent |
| 1 | mm_page_alloc | Under the threshold: no need to test |
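The filtering shown in the table can be sketched as follows: count the support of each basic element and keep only those that reach the threshold. This is an illustrative sketch of that first step, not the tool's implementation.

```python
from collections import Counter

# Count each event type in the sequence and keep only those whose
# support (number of occurrences) reaches the threshold.
def frequent_elements(sequence, support_threshold):
    counts = Counter(sequence)
    return {etype: n for etype, n in counts.items()
            if n >= support_threshold}

seq = (["sched_switch"] * 6 + ["kmem_cache_alloc"] * 2
       + ["hrtimer_start"] * 2 + ["mm_page_alloc"])
freq = frequent_elements(seq, support_threshold=2)
assert "mm_page_alloc" not in freq   # under the threshold: dropped
assert freq["sched_switch"] == 6
```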
Graph showing events in the trace as a function of time, as well as the two occurrences of a given episode that should be reported as the used support threshold has a value of 2.
Graph showing events in the trace as a function of time, as well as the only occurrence of a given episode that should not be reported as the used support threshold has a value of 2.
If users do not know the number of repetitions, they can also directly define the number of frequent basic elements (i.e., episodes of one event type) to be included by the algorithm. We also add a mode to force the pattern to start with a
Despite the threshold option, this algorithm can lead to exponential growth in computation time if too few branches are pruned. Indeed, if the support threshold is set sufficiently high, episodes are discarded early, which means there are fewer matches. When this is not the case, however, the computation time can become problematic. To avoid that, we add a computation time limit that can be set by the users. The results computed up to that point remain available.
The presentation of the resulting patterns was one of the challenges. It needed to let the user easily select one of them and load it in the pattern matching interface for the next step. In some cases, the same event is very frequent in the trace. This results in many discovered episodes containing almost only this event type, which is usually not relevant and is harder to present. To avoid this, we allow only one element of each type in an episode. The users can add more afterwards, in the pattern matching phase. Also, to avoid having too many results, we present only the largest patterns. Obviously, if a longer episode is supported, its subepisodes are also supported. The users just have to select the containing episode and delete the unwanted events with the pattern editing interface in the next step.
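The maximality filtering described above can be sketched as follows: an episode is dropped when it is a subsequence of another supported episode. This is an illustrative sketch, assuming episodes are tuples of event types.

```python
# True if `short` occurs as an ordered (not necessarily contiguous)
# subsequence of `long`.
def is_subsequence(short, long):
    it = iter(long)
    return all(e in it for e in short)

# Keep only the maximal episodes: those not contained in a longer one.
def maximal_episodes(episodes):
    return [e for e in episodes
            if not any(e != o and is_subsequence(e, o) for o in episodes)]

episodes = [("a",), ("a", "b"), ("a", "b", "c"), ("d",)]
assert maximal_episodes(episodes) == [("a", "b", "c"), ("d",)]
```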
The pattern matching is the phase where all occurrences of a pattern in a given trace will be found. They will normally correspond to the real-time jobs, but since this is not always the case, we will use the term
To define the execution model, the first possibility is to load the pattern discovered in the previous phase, if the user is not familiar with the events that define a task. Otherwise, there is an interface to define or edit the pattern. Two main options are offered to the users in this pattern matching dialog. The first is the same TID mode, selected when the executions must start and end on the same thread. It is the most frequent case, as real-time tasks are usually a simple task on a single thread. The other is the different TIDs mode, meaning that the executions can start and end on different threads. This case can be useful when a parent thread creates a child and the execution ends on the child thread, like a real-time timer. Even in the same TID mode, it is possible to support executions on multiple threads, but each execution stays on the same thread. This is useful when there is a thread pool. To define the start and end TID sets, users can supply them directly or click on the corresponding lines in the main view of Trace Compass showing the different threads.
The events are defined using the event name and the event fields. The model definition also supports some basic operations. First, the keyword $tid will mean the execution TID. For example, the event
For the case where the start and end TIDs are the same, there is a graphical interface to add, remove, and change the order of the events in the definition. Each valid TID will have its own state machine instance to detect executions. Those instances are stored within
In case there is one or more
For the case where the start and end TIDs can be different, only start and end event definitions are supported. In addition, there are two different lists for the TIDs, one for the start and one for the end. While iterating over the trace events, we first try to match the start definition and then the end definition. Depending on whether we are matching the start or the end, we discard the event read if its TID is not in the corresponding TID list. Once both events are matched, the execution is added to the list of valid executions and we restart the pattern.
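The matching loop for the different-TIDs mode can be sketched as follows. Events are modelled as dicts and the definitions reduced to single event types; this is a simplified sketch of the behaviour described above, not the tool's code.

```python
# Match executions that start with `start_def` on a TID from `start_tids`
# and end with `end_def` on a TID from `end_tids`; then restart.
def match_executions(events, start_def, end_def, start_tids, end_tids):
    executions, start_evt = [], None
    for evt in events:
        if start_evt is None:
            # Matching the start: discard events outside the start TID set.
            if evt["tid"] in start_tids and evt["type"] == start_def:
                start_evt = evt
        else:
            # Matching the end: discard events outside the end TID set.
            if evt["tid"] in end_tids and evt["type"] == end_def:
                executions.append((start_evt["timestamp"], evt["timestamp"]))
                start_evt = None        # restart the pattern
    return executions

events = [
    {"timestamp": 1, "tid": 10, "type": "timer_start"},
    {"timestamp": 5, "tid": 20, "type": "work_done"},
    {"timestamp": 7, "tid": 10, "type": "timer_start"},
    {"timestamp": 9, "tid": 21, "type": "work_done"},
]
assert match_executions(events, "timer_start", "work_done",
                        {10}, {20, 21}) == [(1, 5), (7, 9)]
```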
To know on which thread an event occurs, we need to keep some information. Because the events in LTTng are collected per CPU core, and not per thread, we keep the running TID for each CPU in a table. Each time a
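The per-CPU bookkeeping can be sketched as follows. It assumes the table is updated from the standard Linux sched_switch tracepoint, which carries the TID of the next thread; the dict layout is ours.

```python
# Table mapping each CPU to the TID of the thread currently running on it,
# maintained while reading the per-CPU event streams in order.
current_tid = {}

def on_sched_switch(cpu, next_tid):
    current_tid[cpu] = next_tid

on_sched_switch(0, 42)
on_sched_switch(1, 7)
# Any later event collected on CPU 0 is attributed to TID 42.
assert current_tid[0] == 42 and current_tid[1] == 7
```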
By default, the complete trace is processed, but it is also possible to process only a segment of it. To do so, the users have two choices. First, they can directly specify the time range, either by selecting it graphically or typing it. Only the events within the time range will be processed. Otherwise, they can select the maximum number of executions to detect. The first method is usually preferred when users can identify an interesting portion of the trace, and the second to avoid memory problems with very large traces. Furthermore, the two can be combined and used simultaneously.
Predefined models are offered to help the users to write the matching definitions. Those include an option to include all events for a TID in a single execution. This can be useful to obtain statistics and execute the analyses on a manually selected trace segment corresponding to the complete thread execution. Other predefined definitions include the
Another useful option is to have nested executions, to define events to match at different levels. For example, first level executions can be defined by
We will define an
Once we have all the trace segments corresponding to the execution of the real-time jobs, they are displayed in the
This first view, shown in Figure
By default, the executions are sorted by duration, starting with the longest. This metric is based on the elapsed time and includes the time when the thread is not running, either blocked (waiting for some resource) or preempted (it could run, but other tasks are running). This facilitates the search for problems, by starting the analysis with the executions that take the most time. Otherwise, it is also possible to sort the executions by total running time, total preempted time, or starting time. The running time can be useful if the task has low priority and it is normal for it to be preempted. Conversely, the preempted time can be preferred if the running time varies but the task has high priority and is expected not to be preempted. Finally, the starting time can be used to see the difference between consecutive executions. Those times are calculated with the
The view is also synchronized with the other views in Trace Compass. That way, it is easy to click on an execution and see what was happening at that time on the system. For example, the
This view, illustrated in Figure
Once the user finds a suspicious or problematic execution, the goal is to analyse it further. The first step is to use the critical path analysis in Trace Compass, which provides useful information about the significant dependencies of a thread. When the analysed thread is blocked, the view shows the resources or threads for which it is waiting. When a thread on the critical path is preempted, it may be complex to retrieve the priorities of the different threads running during each preemption. You can however check the
Without the critical path analysis, it would still be possible to show the other threads running when the execution of interest is preempted. However, in combination with the critical path, it is also possible to see the running threads when the various threads involved in the critical path are preempted. This means that if the analysed thread was waiting for a resource, and the thread owning this resource is preempted, it will be possible to analyse this scheduling. For example, if the execution thread was waiting for a message and the thread that would eventually send the message is preempted, then the running threads at that moment will be shown with their priorities. If the priority of a running thread is lower than the priority of the analysed thread, it will be displayed in a different colour to show that there is a priority inversion. There is also an option to select the CPUs of interest for the running threads. This can be useful if the system uses different groups of cores,
To know the scheduling priorities of the threads, we keep, for each TID, a list of priority changes in the form of ordered timestamps with the corresponding priority. We build that list while searching for the execution patterns, to avoid the cost of reading the trace twice. This is mainly done with the
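The per-TID priority history can be sketched as follows: an ordered list of (timestamp, priority) changes, queried with a binary search to find the priority in effect at any instant. The class and method names are illustrative, not from the tool.

```python
import bisect

class PriorityHistory:
    """Ordered (timestamp, priority) changes for one TID."""
    def __init__(self):
        self._times, self._prios = [], []

    def record(self, timestamp, priority):
        # Events are read in time order, so appends keep the list sorted.
        self._times.append(timestamp)
        self._prios.append(priority)

    def priority_at(self, timestamp):
        # Binary search for the last change at or before `timestamp`.
        i = bisect.bisect_right(self._times, timestamp) - 1
        return self._prios[i] if i >= 0 else None

h = PriorityHistory()
h.record(0, 20)     # e.g., initial priority seen at the first switch
h.record(100, 80)   # e.g., priority boosted by inheritance
assert h.priority_at(50) == 20
assert h.priority_at(150) == 80
```

With one such history per TID, checking for a priority inversion at a preemption point reduces to two lookups and a comparison.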
Another mode of this view is to show all the threads that interact with the execution thread. This can be useful to understand the system without looking at all the unrelated threads. Internally, it uses the dependency graphs calculated with the critical path in Trace Compass. There can be more than one graph when the threads are not all linked.
Two options are offered. It is first possible to show the threads that interact directly with the selected one. For example, a thread can be woken up because another thread releases the futex it was waiting for. The other option is to also show the indirect relations. For example, if thread A is interacting with thread B, which is interacting with thread C, then thread C will be shown as indirectly related to A.
To populate this information from the dependency graph, we first get the graph containing the selected thread. Then, in the case of direct interactions, we just add the threads linked to the selected thread within the execution time range. For the indirect interactions, we cannot take all the threads in the graph, because this covers more than just the interactions within the time range of interest. Instead, we populate sets of related threads. When a thread A interacts with a thread B in the time range of interest, we check if A or B is in an existing set. If not, we create a new set with the two threads. If only one of them is in a set, we add the other to the same set. If they are in different sets, we merge them.
To avoid iterating through all sets to search if a thread is present, we store the information in a
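The set-merging described above is essentially a union-find (disjoint-set) structure, sketched below; the hashmap from thread to set representative avoids scanning every set on lookup. This is an illustrative sketch of the technique, not the tool's Java implementation.

```python
# Hashmap mapping each thread to its parent in the disjoint-set forest.
parent = {}

def find(t):
    """Representative of the set containing thread t (path halving)."""
    parent.setdefault(t, t)
    while parent[t] != t:
        parent[t] = parent[parent[t]]
        t = parent[t]
    return t

def interact(a, b):
    """Threads a and b interacted in the time range: merge their sets."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

interact("A", "B")
interact("B", "C")
assert find("A") == find("C")   # C is indirectly related to A
```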
It can be useful in some situations to have more information than only the critical path and related threads. The goal here is to present different kernel facilities related to a specific job execution. Those will be presented in a timeline. The time range can match an execution of the
Three options are proposed. First, there is the high resolution timer (hrtimer). It can be in the state
Then, there is the futex. It can be in the states
Finally, there is the queue. It can be in 4 different states. When a sender tries to send a message, it will normally result in the state
To obtain the information at the start of the time range of the execution, there is an option to select the maximum number of preceding events to process in the trace, before the start of the desired range. This is a good compromise between the analysis time and the completeness of the information. The different state machines for each resource are kept in a hashmap, and each state change occurs in constant time. Thus, the time complexity grows linearly with the number of events in the trace.
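The bounded look-back can be sketched as follows: at most a fixed number of events before the range start are replayed into the per-resource state machines. The function name and event shape are ours, for illustration.

```python
# Keep only the most recent `max_preceding` events before `range_start`,
# to be replayed into the resource state machines.
def replay_window(events, range_start, max_preceding):
    before = [e for e in events if e["timestamp"] < range_start]
    return before[-max_preceding:]

events = [{"timestamp": t} for t in range(10)]
window = replay_window(events, range_start=8, max_preceding=3)
assert [e["timestamp"] for e in window] == [5, 6, 7]
```

The trade-off is explicit: a larger window gives more complete initial states but costs proportionally more processing time.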
The traces used to test the tool were generated with LTTng tracer version 2.6, with all kernel events enabled, on a Linux PREEMPT_RT kernel version 3.12. We used a 4-physical-core, 2.67 GHz machine with 6 GB of RAM. The first set of data consisted of 5 real-time threads preempting each other, with traces from 483 k to 20.6 M events. The second set consisted of traces with up to 16,042 different TIDs. Traces with various scheduling policies were also collected but showed no significant difference in performance.
First, we can see in Figure
Average time consumed by the pattern discovery algorithm for varying thresholds for a given trace.
The number of possibilities grows rapidly, as the factorial of the number of basic elements. That explains the general exponential trend, even though most branches are not explored, due to the lack of support for the corresponding episode. For example, with the simple case presented in Table
Figure
The execution time of the detection depends on various factors. The first test compares it with simply reading the trace. During the detection, we parse the events only if necessary, and we ensure that the name and the content are each parsed only once. We compare the execution detection of two models with reading the event names only, and with reading both the names and the content. The first model is defined by the start and the end of a
The first model tested returns a few hundred executions, and the second returns up to 300,000 executions with the biggest trace, which has more than 20 million events. Each of those executions, as presented in Figure
Time taken for our execution detection (nanosleep analysis and mq_send analysis) compared to trace reading in Trace Compass.
The second test compares the detection to other analyses in Trace Compass. The results are shown in Figure
Time taken for our execution detection (nanosleep analysis and mq_send analysis) compared to other analyses in Trace Compass.
Furthermore, field matching does not appear to significantly change the execution time, even though the content of the event must be read. This is explained by the fact that reading the content takes approximately 30% more time, but only approximately one percent of the events are concerned, which leads to about a one-third of a percent increase. This one percent is already a high percentage because, to be concerned by field matching, an event must be on the relevant TID, in the right state, and must have matching fields. In the worst case, however, all events would be read, leading to approximately a 30% increase.
The previous tests were made with the same TID mode, which is more complex than the different TIDs mode. We tested with a trace containing more than 8000 valid TIDs to match, and the same TID mode was significantly slower, by around 13%. This is because we need to create an instance per TID and use a hashmap to match each TID with its instance. The results are shown in Table
Execution time for the two modes (same TID mode and different TIDs mode) compared to trace reading (in s).
| | Read name | Read content | Same TID | Diff TID |
| --- | --- | --- | --- | --- |
| Average | 2.303 | 3.326 | 2.918 | 2.576 |
| STD | 0.025 | 0.025 | 0.023 | 0.026 |
For all these tests, we compare the number of executions detected with the number of events shown by the Trace Compass
The views bring another limitation. They can lead to memory problems if there are too many lines, and they can take a long time to refresh. However, there is no point in displaying thousands of executions on separate rows. Thus, the problem only occurs when the default limits are increased. All our views use the same structure, inherited from Trace Compass, and follow the same trend. We can see the results in Figure
Time consumed in drawing Comparison View in Trace Compass as a function of increasing number of executions.
Two cases are presented to show the usage of the proposed tool and how it can help developers to quickly find problems.
In the first example, extracted from an industrial use case, a task is initiated from a real-time timer every 250
With our tool, we define the job execution as the interval between the end of the code execution of two consecutive threads created by the timer. This way, if the problem occurs either in the timer thread or in the created threads, it will be detected, as we can observe in Figure
Definition of job execution as the interval between the end of the code execution of two consecutive threads created by the timer.
Perspective Time View that helps identify problematic executions using a global perspective.
Running the analysis extracts the executions and highlights the longest ones. We can see in Figure
Control Flow View showing the gap between executions.
Control Flow View showing the preemption of the timer thread.
Comparison View in Trace Compass showing the difference in job execution times and statuses allowing the user to identify the most time consuming jobs.
Alternatively, we can also display the critical path of one of the longest executions and the complementary information. This informs us that the execution was preempted because another thread had a higher priority. It was a configuration problem, because this thread was not supposed to have a higher priority in that situation. The main difficulty lay in the fact that a very large number of threads, designed by different programmers, were involved. Once the problem and its origin were pinpointed by the tool, the remedy was simple to devise.
This is a synthetic case, to show the usage of more advanced features. In that case, a high-priority task is waiting for a message, but the thread supposed to send the message is preempted by other tasks. There is no priority inheritance with message queues, because we only know afterwards which message went from which thread to which other thread. This can be considered a priority inversion, because the higher-priority thread is indirectly waiting for medium-priority threads which are preempting the low-priority thread.
The first step is to define our model. We use the pattern discovery tool with 12 basic events. This gives us many possible patterns, including the one we were looking for, based on the high resolution timer (
Critical Flow View showing that the task thread (TID 3988) was blocked by a thread (TID 3950) that was preempted.
Critical Path Complement View showing that the thread blocked was of a lower priority than other threads that preempted it.
Extended Time View showing message queues and allowing seeing if there are multiple threads waiting to receive or send a message.
Instead of using the critical path, another way of finding that the problem is caused by the thread with TID 3950 would have been to use the
We showed that the search for periodic executions can be useful to efficiently find problems in real-time systems. Running the analysis to find scheduling problems and computing the critical path analysis for the whole trace would have made it more complex to find the interesting results. In fact, when a task with higher priority is ready to run, the short time before it is scheduled can be considered a priority inversion if another task is running, and returning all that information would have produced a lot of noise. Furthermore, the part of the trace corresponding to the repetitive task of interest can be only a subset of the events occurring on the thread. It is then easier to define the model of the task and show the results only for the outlier executions.
The two test cases presented show that the Comparison View can be effective in finding scheduling problems when comparing various executions of a periodic task. None of the tools presented in Section
The other views developed have also been used to find problems like priority inversion and could probably be extended. For example, analyses of the cache memory or of the communication between machines would add interesting value.
The automatic detection of real-time tasks is a field that deserves further work. With many threads, it can be difficult to identify quickly which are the threads of interest. Often, the real-time tasks will have a periodic pattern and will use high resolution timers. It would be interesting to explore the detection of those patterns to allow focusing directly on the corresponding threads. Otherwise, it would also be possible to use the thread priorities to locate real-time threads. From there, there is more work to be done to let users define the job executions, without having an extensive knowledge of kernel tracing.
The present work could be refined to simplify its use for common cases not requiring the advanced functionalities. In addition, some concepts could be decoupled from specific hard-coded events, in order to generalize the procedure to use the same tool for different tracers and for custom structures.
Furthermore, the memory scalability can be problematic for very large traces, because the information concerning valid executions is kept in memory. Instead of only limiting the number of executions or events used, it could be interesting to write the data in a structure similar to the state history tree used by the
We demonstrated that a real-time specific kernel trace analysis tool can be used to quickly find complex real-time problems. It was shown that a general model defined by the user, combined with a comparison view, can be very effective to pinpoint the problematic job executions. We also presented a case where the model's ability to define a job execution starting and ending on different threads was useful. Moreover, we developed an approach to present various possible execution models to the user using pattern discovery. Finally, we presented some interesting avenues to extend the critical path analysis in order to detect scheduling problems. All these approaches have also been tested, and the performance measurements were presented, to show in which conditions they are the most efficient.
The authors declare that there is no conflict of interest regarding the publication of this paper.
The authors are grateful to Francis Giraldeau, Raphaël Beamonte, and Geneviève Bastien for the reviews and useful comments. This research is supported by OPAL-RT, CAE, the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Consortium for Research and Innovation in Aerospace in Québec (CRIAQ).