| 1 | +# Branch Prediction Performance Testing |
| 2 | + |
| 3 | +This guide explains how to measure the performance impact of branch prediction hints (likely/unlikely) using `perf` and Renaissance benchmarks. |
| 4 | + |
| 5 | +## Prerequisites |
| 6 | + |
| 7 | +1. **Renaissance benchmarks JAR** - Downloaded automatically to this directory if not present |
| 8 | + - Version 0.16.1 is fetched from: https://github.com/renaissance-benchmarks/renaissance/releases |
| 9 | + - No manual download is required |
| 10 | + |
| 11 | +2. **perf tool** - Install if not present: |
| 12 | + ```bash |
| 13 | + # For AWS EC2 instances: |
| 14 | + sudo apt-get install linux-tools-$(uname -r) linux-tools-aws |
| 15 | + |
| 16 | + # For generic Ubuntu/Debian: |
| 17 | + sudo apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r) |
| 18 | + ``` |
| 19 | + |
| 20 | +3. **perf permissions** - Allow non-root perf access (the setting resets on reboot): |
| 21 | + ```bash |
| 22 | + sudo sysctl -w kernel.perf_event_paranoid=1 |
| 23 | + ``` |
| 24 | + |
| 25 | +## Quick Start |
| 26 | + |
| 27 | +### Option 1: Single Test Run |
| 28 | + |
| 29 | +Test the current build: |
| 30 | + |
| 31 | +```bash |
| 32 | +./test_branch_prediction_perf.sh [benchmark] [test_name] |
| 33 | +``` |
| 34 | + |
| 35 | +Examples: |
| 36 | +```bash |
| 37 | +# Test with default benchmark (akka-uct) |
| 38 | +./test_branch_prediction_perf.sh |
| 39 | + |
| 40 | +# Test with specific benchmark |
| 41 | +./test_branch_prediction_perf.sh finagle-chirper optimized |
| 42 | + |
| 43 | +# Test another benchmark |
| 44 | +./test_branch_prediction_perf.sh scala-kmeans baseline |
| 45 | +``` |
| 46 | + |
| 47 | +### Option 2: Automated Comparison (Recommended) |
| 48 | + |
| 49 | +Compare the `main` branch (baseline) against the `jb/likely` branch (optimized): |
| 50 | + |
| 51 | +```bash |
| 52 | +./compare_branch_prediction.sh [benchmark] |
| 53 | +``` |
| 54 | + |
| 55 | +This will: |
| 56 | +1. Build and test the optimized version (jb/likely) |
| 57 | +2. Switch to main and build baseline |
| 58 | +3. Test baseline version |
| 59 | +4. Switch back to jb/likely |
| 60 | +5. Display comparison table |
| 61 | + |
| 62 | +## Understanding the Output |
| 63 | + |
| 64 | +### perf stat Metrics |
| 65 | + |
| 66 | +The test collects these key metrics: |
| 67 | + |
| 68 | +- **branch-misses**: Number of mispredicted branches |
| 69 | + - Lower is better |
| 70 | + - Likely/unlikely hints should reduce this when they match the branch's actual behavior |
| 71 | + |
| 72 | +- **L1-icache-load-misses**: L1 instruction cache misses |
| 73 | + - Lower is better |
| 74 | + - Better code layout can improve this |
| 75 | + |
| 76 | +- **instructions**: Total instructions executed |
| 77 | + - Should be similar between runs |
| 78 | + |
| 79 | +- **cycles**: Total CPU cycles |
| 80 | + - Lower is better overall |
| 81 | + |
| 82 | +### Output Files |
| 83 | + |
| 84 | +All results are saved to `perf_results/`: |
| 85 | + |
| 86 | +- `{test_name}_stat.txt`: Summary statistics |
| 87 | +- `{test_name}_record.data`: Raw perf recording |
| 88 | +- `{test_name}_record_report.txt`: Detailed report filtered to libjavaProfiler.so |
| 89 | + |
| 90 | +## Detailed Analysis |
| 91 | + |
| 92 | +### View Interactive Report |
| 93 | + |
| 94 | +```bash |
| 95 | +perf report -i perf_results/optimized_record.data --dsos=libjavaProfiler.so |
| 96 | +``` |
| 97 | + |
| 98 | +This shows hotspots and call graphs for the profiler library. |
| 99 | + |
| 100 | +### Compare Specific Functions |
| 101 | + |
| 102 | +```bash |
| 103 | +# Baseline |
| 104 | +perf report -i perf_results/baseline_record.data \ |
| 105 | + --dsos=libjavaProfiler.so \ |
| 106 | + --stdio | grep -A 10 "functionName" |
| 107 | + |
| 108 | +# Optimized |
| 109 | +perf report -i perf_results/optimized_record.data \ |
| 110 | + --dsos=libjavaProfiler.so \ |
| 111 | + --stdio | grep -A 10 "functionName" |
| 112 | +``` |
| 113 | + |
| 114 | +### Manual perf Commands (as in PR description) |
| 115 | + |
| 116 | +If you want to run perf manually: |
| 117 | + |
| 118 | +```bash |
| 119 | +# Start Renaissance benchmark |
| 120 | +java -agentpath:ddprof-lib/build/lib/main/release/linux/x64/libjavaProfiler.so \ |
| 121 | + -jar ~/renaissance-gpl-0.16.1.jar \ |
| 122 | + akka-uct -r 9999 & |
| 123 | +PID=$! |
| 124 | + |
| 125 | +# Collect statistics for 60 seconds |
| 126 | +perf stat -e branch-misses,L1-icache-load-misses,instructions \ |
| 127 | + -p ${PID} -- sleep 60 |
| 128 | + |
| 129 | +# Record detailed data |
| 130 | +perf record -e L1-icache-load-misses -g -p ${PID} -- sleep 60 |
| 131 | + |
| 132 | +# View report |
| 133 | +perf report --dsos=libjavaProfiler.so |
| 134 | +``` |
| 135 | + |
| 136 | +## Recommended Benchmarks |
| 137 | + |
| 138 | +These Renaissance benchmarks work well for profiler testing: |
| 139 | + |
| 140 | +- **akka-uct** (default): Actor-based workload, good CPU usage |
| 141 | +- **finagle-chirper**: High concurrency, network-like patterns |
| 142 | +- **scala-kmeans**: CPU-intensive computation |
| 143 | +- **future-genetic**: Async/future-heavy workload |
| 144 | + |
| 145 | +## Interpreting Results |
| 146 | + |
| 147 | +### What to Look For |
| 148 | + |
| 149 | +1. **Branch Miss Reduction**: |
| 150 | + - A reduction in branch-misses indicates better prediction |
| 151 | + - Even 1-2% improvement is meaningful at scale |
| 152 | + |
| 153 | +2. **I-cache Impact**: |
| 154 | + - L1-icache-load-misses changes indicate code layout effects |
| 155 | + - May increase slightly if code size grows |
| 156 | + |
| 157 | +3. **Overall Performance**: |
| 158 | + - Check cycles and instructions |
| 159 | + - IPC (instructions per cycle) = instructions / cycles |
| 160 | + - Higher IPC is better |
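As a hedged illustration of the IPC formula above, the following sketch computes IPC from two counts copied out of `perf stat` output. The helper name `ipc` and the sample numbers are made up for this example and are not part of the test scripts.

```bash
# ipc INSTRUCTIONS CYCLES
# Prints instructions per cycle to two decimal places.
# Thousands separators are stripped so counts can be pasted from perf stat output.
ipc() {
    awk -v i="$(printf '%s' "$1" | tr -d ',')" \
        -v c="$(printf '%s' "$2" | tr -d ',')" \
        'BEGIN {
            if (c + 0 == 0) { print "n/a"; exit }   # avoid dividing by zero
            printf "%.2f\n", i / c
        }'
}

# Illustrative counts only, not measured results:
ipc 1,500,000 1,000,000   # prints 1.50
```

Comparing the value for the baseline and optimized runs gives a single number that accounts for both instructions and cycles.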
| 161 | + |
| 162 | +### Expected Impact |
| 163 | + |
| 164 | +As a rough guide (actual numbers vary by workload and CPU), branch prediction hints typically provide: |
| 165 | +- 1-5% reduction in branch misses for hot paths |
| 166 | +- Minimal impact on total instruction count |
| 167 | +- Slight improvement in overall cycles (if hints are correct) |
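To put a number on the change between two runs, a small helper like the one below can express it as a signed percentage relative to the baseline. `pct_change` is a hypothetical name and the counts shown are illustrative, not measured results.

```bash
# pct_change BASELINE OPTIMIZED
# Prints the relative change from baseline to optimized as a signed percentage.
# Negative output means the optimized run has a lower count.
pct_change() {
    awk -v b="$1" -v o="$2" 'BEGIN {
        if (b + 0 == 0) { print "n/a"; exit }   # no meaningful ratio without a baseline
        printf "%+.2f%%\n", (o - b) * 100 / b
    }'
}

# Illustrative branch-miss counts only:
pct_change 1000000 970000   # prints -3.00%
```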
| 168 | + |
| 169 | +## Troubleshooting |
| 170 | + |
| 171 | +### Permission Denied |
| 172 | + |
| 173 | +```bash |
| 174 | +sudo sysctl -w kernel.perf_event_paranoid=1 |
| 175 | +``` |
| 176 | + |
| 177 | +Or run the script with sudo (not recommended). |
| 178 | + |
| 179 | +### Renaissance JAR Not Found |
| 180 | + |
| 181 | +The script will automatically download it. If automatic download fails: |
| 182 | +- Check your internet connection |
| 183 | +- Download manually from: https://github.com/renaissance-benchmarks/renaissance/releases |
| 184 | +- Place it in the same directory as the test scripts |
| 185 | + |
| 186 | +### Profiler Library Not Built |
| 187 | + |
| 188 | +```bash |
| 189 | +./gradlew ddprof-lib:build |
| 190 | +``` |
| 191 | + |
| 192 | +### High Variance in Results |
| 193 | + |
| 194 | +- Increase test duration in the script (default: 60s) |
| 195 | +- Ensure system is idle (no other heavy processes) |
| 196 | +- Run multiple iterations and average results |
| 197 | +- Pin CPU affinity (for example with `taskset`) for more stable results |
| 198 | + |
| 199 | +## Advanced Usage |
| 200 | + |
| 201 | +### Multiple Runs for Statistical Confidence |
| 202 | + |
| 203 | +```bash |
| 204 | +for i in {1..5}; do |
| 205 | + ./test_branch_prediction_perf.sh akka-uct "run_$i" |
| 206 | +done |
| 207 | +``` |
| 208 | + |
| 209 | +Then average each metric across the resulting `perf_results/run_*_stat.txt` files. |
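One possible way to do the averaging is sketched below. It assumes each stat file holds plain `perf stat` text output, with the count in the first column and the event name in the second. The `avg_event` helper is hypothetical and not part of the test scripts. Event names may carry a suffix such as `:u` on some setups, in which case the name passed in must match exactly.

```bash
# avg_event EVENT FILE...
# Averages the count reported for EVENT across perf stat output files.
avg_event() {
    event=$1
    shift
    awk -v ev="$event" '
        $2 == ev {
            gsub(/,/, "", $1)   # drop thousands separators
            sum += $1
            n++
        }
        END {
            if (n == 0) { print "n/a"; exit }   # event not found in any file
            printf "%.0f\n", sum / n
        }' "$@"
}

# Usage (assumes the five runs above have completed):
#   avg_event branch-misses perf_results/run_*_stat.txt
```

Averaging hides run-to-run spread, so it is worth glancing at the individual values as well before drawing conclusions.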
| 210 | + |
| 211 | +### Different Perf Events |
| 212 | + |
| 213 | +Edit the script to add more events: |
| 214 | + |
| 215 | +```bash |
| 216 | +perf stat -e branch-misses,branches,branch-load-misses \ |
| 217 | + -e L1-dcache-load-misses,L1-dcache-loads \ |
| 218 | + -e LLC-load-misses,LLC-loads \ |
| 219 | + -e cpu-cycles,instructions \ |
| 220 | + ... |
| 221 | +``` |
| 222 | + |
| 223 | +See available events: `perf list` |
| 224 | + |
| 225 | +### Flame Graphs |
| 226 | + |
| 227 | +Generate flame graphs for visual analysis: |
| 228 | + |
| 229 | +```bash |
| 230 | +perf record -F 999 -g -p ${PID} -- sleep 60 |
| 231 | +perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg |
| 232 | +``` |
| 233 | + |
| 234 | +Requires: https://github.com/brendangregg/FlameGraph |