Per this article, bpftrace cannot map stack frames to functions if the software was built without frame pointers.
There are eBPF-based profiling scripts called bcc-tools: https://github.com/iovisor/bcc/ They have an -f switch that emits folded stacks, so their output can be passed to flamegraph.
There's a script, offcputime, that can give that. Even more interesting is offwaketime, which combines the stack of the task that slept (bottom half) with the stack of the task that woke it up (top half), separated by --. Note: the top half is printed in reverse order compared to the bottom half, i.e. execution starts at the top and goes down.
Example (profiling and creating a flamegraph):
offwaketime -kf 30 > out.txt
flamegraph --color=chain --countname=us < out.txt > out.svg
- There doesn't seem to be a way to launch the profiled program under one of these scripts, though they can be attached to a running process by pid (-p PID).
- The amount of time the code took is often greater than the duration of the profiling run. I think this is because the time is a sum over the various threads that may have executed the code. For example, if the same stack appears simultaneously in threads 1, 2 and 3, and they are all blocked, I think the results show just one stack with the sum from all 3 threads. So if you measured for 30 seconds and the threads were blocked that whole time, you may get one stack with 90 seconds of sleeping time. I'm not sure this is correct though; I did not check it.
bcc also has a tool, deadlock, for detecting potential deadlocks. I haven't used it.
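To make the arithmetic behind that guess concrete, here is a minimal sketch. It assumes that identical folded stacks from different threads get merged by adding up their time columns. The stack name and the numbers are made up, and this is not taken from the bcc sources, so it only illustrates the hypothesis and does not confirm it.

```shell
# Hypothetical folded samples, one line per thread: "<stack> <blocked time in us>".
# The stack name and the values are invented for illustration.
samples='main;worker;futex_wait 30000000
main;worker;futex_wait 30000000
main;worker;futex_wait 30000000'

# Sum the time column for each identical stack (assumed merging behaviour).
summed=$(printf '%s\n' "$samples" |
  awk '{ total[$1] += $2 } END { for (s in total) printf "%s %d\n", s, total[s] }')

echo "$summed"
```

Three threads blocked for 30 s each collapse into a single line with 90000000 us, i.e. 90 s of sleeping time from a 30 s measurement.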
- example in docs https://github.com/iovisor/bcc/blob/master/tools/offcputime_example.txt
- a blog post by Brendan Gregg on the subject http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html