Perfetto GPU Flow Artifacts

Introduction

Performance engineers often need to profile and analyze their applications. One tool for this purpose is Perfetto, a powerful open-source performance tracing system. However, Perfetto's trace rendering can sometimes be confusing to read, especially the flows that indicate dependencies between events.

In this blog post, I would like to discuss the root cause of the problem and suggest a potential fix.

Perfetto Flow Artifacts

Sometimes we see flows in Perfetto traces that do not match our expectations. For example, one CUDA kernel launch call on the CPU can launch only one CUDA kernel on the GPU. In some traces, however, multiple flows run from a single CUDA kernel launch call on the CPU to multiple CUDA kernels on the GPU, which is not possible in reality. In other traces, a CUDA synchronization flow runs from the GPU back to a CPU function that completes even before the CUDA synchronization call is made on the CPU, which is also not possible in reality.

An Artifact of Multiple Events Submitted from One Event

Since Perfetto renders traces from raw trace data, I previously did not know whether the problem was caused by the trace rendering or by the trace data collection. So this time, I did a little investigation.

There are mainly three types of events in the trace data: complete events, flow start events, and flow end events. Complete events are marked with "ph": "X", flow start events with "ph": "s", and flow end events with "ph": "f". "ts" is the timestamp of the event, and "dur" is its duration. A flow start event and a flow end event with the same "id" form a flow, which spans from the timestamp of the flow start event to the timestamp of the flow end event. The "args" field of each event contains the metadata of the event, which is not used for Perfetto trace rendering.

{
    "ph": "X", "cat": "cpu_op", "name": "MseLossBackward0", "pid": 237, "tid": 257,
    "ts": 919730262716.414, "dur": 79.231,
    "args": {
        "External id": 514, "Sequence number": 59, "Fwd thread id": 1, "Record function id": 0, "Concrete Inputs": [""], "Input type": ["float"], "Input Strides": [[]], "Input Dims": [[]], "Ev Idx": 1
    }
},
{
    "ph": "f", "id": 1, "pid": 237, "tid": 257, "ts": 919730262716.414,
    "cat": "fwdbwd", "name": "fwdbwd", "bp": "e"
},
{
    "ph": "s", "id": 15, "pid": 237, "tid": 237, "ts": 919730261596.889,
    "cat": "fwdbwd", "name": "fwdbwd"
},

Perfetto uses "id" to associate the flow start and end events. However, no field specifies which complete event a flow start or end event belongs to. Therefore, Perfetto simply binds each flow start or end event to the complete event whose time span covers its timestamp. Normally, this works pretty well.
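
The pairing by "id" can be sketched in a few lines of Python. This is only an illustration of the idea, not Perfetto's implementation: the helper name `pair_flows` and the sample events are hypothetical, and the sketch assumes each "id" has exactly one start and one end event.

```python
from collections import defaultdict


def pair_flows(events):
    """Group flow start ("s") and flow end ("f") events by their "id"."""
    flows = defaultdict(dict)
    for event in events:
        if event.get("ph") in ("s", "f"):
            flows[event["id"]][event["ph"]] = event
    # Keep only the flows that have both a start and an end event.
    return {
        flow_id: (pair["s"], pair["f"])
        for flow_id, pair in flows.items()
        if "s" in pair and "f" in pair
    }


# Hypothetical events for illustration, not taken from a real trace.
events = [
    {"ph": "X", "name": "launch", "pid": 1, "tid": 1, "ts": 100, "dur": 10},
    {"ph": "s", "id": 7, "pid": 1, "tid": 1, "ts": 105},
    {"ph": "X", "name": "kernel", "pid": 2, "tid": 2, "ts": 130, "dur": 20},
    {"ph": "f", "id": 7, "pid": 2, "tid": 2, "ts": 130, "bp": "e"},
    {"ph": "s", "id": 8, "pid": 1, "tid": 1, "ts": 109},  # no matching end
]

for flow_id, (start, end) in pair_flows(events).items():
    # Print the flow id, its start and end timestamps, and its time span.
    print(flow_id, start["ts"], end["ts"], end["ts"] - start["ts"])
# prints: 7 105 130 25
```

Note that nothing in the paired events names the complete event at either end of the flow; that binding has to be inferred from the timestamps.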

However, what if two complete events execute back to back, with no time gap between the end of the first and the start of the second? For example, suppose we have two complete events with the following timestamps and durations, each with an associated flow start event at the end of the complete event.

{
    "ph": "X", "cat": "cpu_op", "name": "MseLossBackward0", "pid": 237, "tid": 257,
    "ts": 100, "dur": 10
},
{
    "ph": "X", "cat": "cpu_op", "name": "MseLossBackward1", "pid": 237, "tid": 257,
    "ts": 110, "dur": 10
},
{
    "ph": "s", "id": 1, "pid": 237, "tid": 257, "ts": 110,
    "cat": "fwdbwd", "name": "fwdbwd0"
},
{
    "ph": "s", "id": 2, "pid": 237, "tid": 257, "ts": 120,
    "cat": "fwdbwd", "name": "fwdbwd1"
},

Because we know, or assume, that the flow start event happens at the end of the complete event, we know that fwdbwd0 is associated with MseLossBackward0 and fwdbwd1 with MseLossBackward1. Perfetto, however, does not know about this assumption and cannot make it, because a flow start event can legitimately happen at any time during a complete event. The timestamp of fwdbwd0 (110) is both the end timestamp of MseLossBackward0 and the start timestamp of MseLossBackward1, so it is ambiguous which of the two complete events fwdbwd0 belongs to. Perfetto may therefore associate both fwdbwd0 and fwdbwd1 with MseLossBackward1, which results in the artifact of multiple flows from one complete event.
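
To make the ambiguity concrete, here is a minimal Python sketch that lists every complete event whose time span covers a flow start timestamp. It is not Perfetto's actual matching code: the helper `enclosing_slices` is hypothetical, and treating both ends of the span as inclusive is an assumption made only to model the tie at the boundary.

```python
def enclosing_slices(flow_event, events):
    """Return names of complete events on the same thread covering the flow timestamp.

    Assumption: the span [ts, ts + dur] is inclusive at both ends.
    """
    ts = flow_event["ts"]
    return [
        e["name"]
        for e in events
        if e.get("ph") == "X"
        and e["pid"] == flow_event["pid"]
        and e["tid"] == flow_event["tid"]
        and e["ts"] <= ts <= e["ts"] + e["dur"]
    ]


# The four events from the example above.
events = [
    {"ph": "X", "name": "MseLossBackward0", "pid": 237, "tid": 257, "ts": 100, "dur": 10},
    {"ph": "X", "name": "MseLossBackward1", "pid": 237, "tid": 257, "ts": 110, "dur": 10},
    {"ph": "s", "id": 1, "name": "fwdbwd0", "pid": 237, "tid": 257, "ts": 110},
    {"ph": "s", "id": 2, "name": "fwdbwd1", "pid": 237, "tid": 257, "ts": 120},
]

for event in events:
    if event["ph"] == "s":
        print(event["name"], enclosing_slices(event, events))
# prints:
# fwdbwd0 ['MseLossBackward0', 'MseLossBackward1']
# fwdbwd1 ['MseLossBackward1']
```

fwdbwd0 has two candidates, so a renderer has to break the tie somehow. If the tie goes to the later complete event, both flows end up attached to MseLossBackward1.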

A trick to fix this issue is to subtract a very small value from the timestamp of the flow start event, so that it falls unambiguously inside the complete event it is supposed to be associated with. For example, we can subtract 0.001 microseconds from the timestamp of the flow start event, which is imperceptible to the human eye but enough for Perfetto to associate the flow start event with the correct complete event.

{
    "ph": "X", "cat": "cpu_op", "name": "MseLossBackward0", "pid": 237, "tid": 257,
    "ts": 100, "dur": 10
},
{
    "ph": "X", "cat": "cpu_op", "name": "MseLossBackward1", "pid": 237, "tid": 257,
    "ts": 110, "dur": 10
},
{
    "ph": "s", "id": 1, "pid": 237, "tid": 257, "ts": 109.999,
    "cat": "fwdbwd", "name": "fwdbwd0"
},
{
    "ph": "s", "id": 2, "pid": 237, "tid": 257, "ts": 119.999,
    "cat": "fwdbwd", "name": "fwdbwd1"
},

In this case, Perfetto can unambiguously associate fwdbwd0 with MseLossBackward0 and fwdbwd1 with MseLossBackward1, and the artifact of multiple flows from one complete event is resolved.
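
The timestamp shift can be applied to an exported trace as a post-processing step. The following Python sketch is one possible way to do it, under a few assumptions: the helper names `shift_flow_starts` and `patch_trace_file` are hypothetical, the trace is either a bare JSON array of events or an object with a "traceEvents" array, and every flow start event is shifted rather than only the ones that sit exactly on a boundary.

```python
import json

EPSILON_US = 0.001  # Shift in microseconds; the value is a judgment call.


def shift_flow_starts(events, epsilon=EPSILON_US):
    """Return a copy of the events with every flow start ("s") moved earlier."""
    shifted = []
    for event in events:
        event = dict(event)  # Shallow copy so the input list is left untouched.
        if event.get("ph") == "s":
            event["ts"] = event["ts"] - epsilon
        shifted.append(event)
    return shifted


def patch_trace_file(src_path, dst_path, epsilon=EPSILON_US):
    """Load a trace, shift its flow start events, and write the result."""
    with open(src_path) as f:
        trace = json.load(f)
    if isinstance(trace, dict):
        # Assumed layout: {"traceEvents": [...], ...}
        trace["traceEvents"] = shift_flow_starts(trace["traceEvents"], epsilon)
    else:
        # Assumed layout: a bare JSON array of events.
        trace = shift_flow_starts(trace, epsilon)
    with open(dst_path, "w") as f:
        json.dump(trace, f)


# The flow start events from the example above, before the fix.
events = [
    {"ph": "X", "name": "MseLossBackward0", "pid": 237, "tid": 257, "ts": 100, "dur": 10},
    {"ph": "X", "name": "MseLossBackward1", "pid": 237, "tid": 257, "ts": 110, "dur": 10},
    {"ph": "s", "id": 1, "name": "fwdbwd0", "pid": 237, "tid": 257, "ts": 110},
    {"ph": "s", "id": 2, "name": "fwdbwd1", "pid": 237, "tid": 257, "ts": 120},
]
fixed = shift_flow_starts(events)
```

For a file on disk, a call such as `patch_trace_file("trace.json", "trace_fixed.json")` (both file names are placeholders) would write a patched copy that can then be loaded into Perfetto.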

Conclusions

There is nothing wrong with the Perfetto trace data or rendering. Perfetto simply does not have enough domain-specific knowledge. Remember, Perfetto is a general-purpose trace rendering engine that is used to render not only CPU and GPU events but also events from other domains.

Author

Lei Mao

Posted on

02-20-2026

Updated on

02-20-2026
