
Commit ca3b296

jbachorik and claude committed
Add branch prediction benchmarking infrastructure
Scripts to measure impact of likely/unlikely hints using perf and Renaissance benchmarks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent d8291c5 commit ca3b296

4 files changed

Lines changed: 691 additions & 0 deletions

ddprof-lib/benchmarks/README.md

Lines changed: 33 additions & 0 deletions

# DDProf Benchmarks

This directory contains various benchmarks and performance tests for the Java profiler.

## Available Benchmarks

### branch-prediction/

Performance testing for branch prediction hints (likely/unlikely) using perf and Renaissance benchmarks.

See [branch-prediction/PERF_TESTING.md](branch-prediction/PERF_TESTING.md) for detailed usage instructions.

Quick start:

```bash
cd branch-prediction
./test_branch_prediction_perf.sh
```

## Requirements

- Linux with perf support (bare metal, or a VM instance type that exposes hardware performance counters)
- Java 8 or later
- Built profiler library (run `./gradlew ddprof-lib:build` from the repository root)
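
You can confirm these requirements in one step with a small pre-flight check. The sketch below is hypothetical and is not part of the committed scripts. The function names are illustrative, it does not check the Java version, and the default library path is assumed from the x64 release path used in the manual `perf` example of the branch-prediction guide.

```bash
# Hypothetical pre-flight helpers (not part of the committed scripts).

# Succeed if a command is on PATH; otherwise report it and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing command: $1" >&2
  return 1
}

# Succeed if a regular file exists; otherwise report it and fail.
require_file() {
  if [ -f "$1" ]; then
    return 0
  fi
  echo "missing file: $1 (run ./gradlew ddprof-lib:build from the repository root)" >&2
  return 1
}

# Check all requirements at once; intended to be run from the repository root.
# The default library path is the x64 release path (an assumption).
check_requirements() {
  local lib=${1:-ddprof-lib/build/lib/main/release/linux/x64/libjavaProfiler.so}
  local status=0
  if [ "$(uname -s)" != "Linux" ]; then
    echo "perf-based testing requires Linux" >&2
    status=1
  fi
  require_cmd perf || status=1
  require_cmd java || status=1
  require_file "$lib" || status=1
  return "$status"
}
```

Source the snippet and run `check_requirements` from the repository root. If your build output lives elsewhere, pass that library path as the first argument.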

## Adding New Benchmarks

When adding new benchmarks to this directory:

1. Create a subdirectory with a descriptive name
2. Include a README or detailed documentation
3. Make scripts executable and self-contained
4. Auto-download dependencies when possible
5. Update this README with a brief description

Lines changed: 234 additions & 0 deletions

# Branch Prediction Performance Testing

This guide explains how to measure the performance impact of branch prediction hints (likely/unlikely) using `perf` and Renaissance benchmarks.

## Prerequisites

1. **Renaissance benchmarks JAR** - Downloaded automatically to this directory if not present
   - Version 0.16.1 is downloaded from: https://github.com/renaissance-benchmarks/renaissance/releases
   - Manual download is not required

2. **perf tool** - Install if not present:
   ```bash
   # For AWS EC2 instances (Ubuntu AMIs):
   sudo apt-get install linux-tools-$(uname -r) linux-tools-aws

   # For generic Ubuntu:
   sudo apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

   # For Debian:
   sudo apt-get install linux-perf
   ```

3. **perf permissions** - Allow non-root perf access:
   ```bash
   sudo sysctl -w kernel.perf_event_paranoid=1
   ```
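
The current level is exposed in `/proc/sys/kernel/perf_event_paranoid`. The sketch below checks it and calls `sysctl` only when needed. The function names are hypothetical, and the threshold of 1 simply mirrors the value recommended above.

```bash
# Hypothetical helper: succeed if kernel.perf_event_paranoid is at or below
# the given maximum (default 1, the value this guide recommends).
# The file path is a parameter so the check can be tried against a test file.
paranoid_ok() {
  local file=${1:-/proc/sys/kernel/perf_event_paranoid}
  local max=${2:-1}
  local level
  level=$(cat "$file" 2>/dev/null) || return 1
  [ "$level" -le "$max" ] 2>/dev/null
}

# Lower the level only if necessary (needs sudo; lasts until reboot).
ensure_perf_access() {
  paranoid_ok || sudo sysctl -w kernel.perf_event_paranoid=1
}
```

`sysctl -w` only lasts until the next reboot. To make the setting persistent, put `kernel.perf_event_paranoid = 1` in a file under `/etc/sysctl.d/`.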

## Quick Start

### Option 1: Single Test Run

Test the current build of the profiler library:

```bash
./test_branch_prediction_perf.sh [benchmark] [test_name]
```

`benchmark` defaults to `akka-uct`. `test_name` is used as the prefix of the result files written to `perf_results/`.

Examples:

```bash
# Test with the default benchmark (akka-uct)
./test_branch_prediction_perf.sh

# Test a specific benchmark
./test_branch_prediction_perf.sh finagle-chirper optimized

# Test another benchmark
./test_branch_prediction_perf.sh scala-kmeans baseline
```

### Option 2: Automated Comparison (Recommended)

Compare the `main` branch (baseline) against the `jb/likely` branch (optimized):

```bash
./compare_branch_prediction.sh [benchmark]
```

This will:

1. Build and test the optimized version (`jb/likely`)
2. Switch to `main` and build the baseline
3. Test the baseline version
4. Switch back to `jb/likely`
5. Display a comparison table
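
The comparison script's table code is not shown in this guide. As a rough, hypothetical equivalent, the sketch below reads two `{test_name}_stat.txt` files (for example `perf_results/baseline_stat.txt` and `perf_results/optimized_stat.txt`) and prints the percent change for each event. It assumes the files contain `perf stat`'s default human-readable output, with the count first, the event name second, and comma or no thousands separators. The function names are illustrative.

```bash
# Hypothetical helpers for comparing two perf stat text files.
# Assumes perf's default layout: "<count>  <event>" per line, with comma
# (or no) thousands separators; "<not supported>" lines are skipped.

# Print the count for one event without separators (empty if absent).
stat_value() {
  awk -v ev="$2" '
    $1 ~ /^[0-9][0-9,.]*$/ && NF >= 2 {
      name = $2
      sub(/:.*/, "", name)          # drop modifiers such as ":u"
      if (name == ev) {
        v = $1
        gsub(/,/, "", v)
        print v
        exit
      }
    }
  ' "$1"
}

# Print baseline, optimized and percent change for each requested event.
compare_stats() {
  local base=$1 opt=$2
  shift 2
  local events=("$@")
  if [ ${#events[@]} -eq 0 ]; then
    events=(branch-misses L1-icache-load-misses instructions cycles)
  fi
  printf '%-24s %16s %16s %9s\n' event baseline optimized change
  local ev b o
  for ev in "${events[@]}"; do
    b=$(stat_value "$base" "$ev")
    o=$(stat_value "$opt" "$ev")
    awk -v ev="$ev" -v b="$b" -v o="$o" 'BEGIN {
      if (b == "" || o == "" || b + 0 == 0) change = "n/a"
      else change = sprintf("%+.2f%%", (o - b) * 100 / b)
      printf "%-24s %16s %16s %9s\n", ev, (b == "" ? "n/a" : b), (o == "" ? "n/a" : o), change
    }'
  done
}
```

For miss and cycle counts, a negative change is an improvement. A change smaller than the spread between repeated runs of the same build should not be read as an effect.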

## Understanding the Output

### perf stat Metrics

The test collects these key metrics:

- **branch-misses**: Number of mispredicted branches
  - Lower is better
  - Correct likely/unlikely hints should reduce this
- **L1-icache-load-misses**: L1 instruction cache misses
  - Lower is better
  - Better code layout can improve this
- **instructions**: Total instructions executed
  - Expected to be similar between runs
  - Because counters are collected over a fixed time window (60s by default), a noticeable difference usually reflects different throughput; in that case compare misses per instruction rather than raw counts
- **cycles**: Total CPU cycles
  - Lower is better overall

### Output Files

All results are saved to `perf_results/`:

- `{test_name}_stat.txt`: Summary statistics
- `{test_name}_record.data`: Raw perf recording
- `{test_name}_record_report.txt`: Detailed report filtered to libjavaProfiler.so

## Detailed Analysis

### View Interactive Report

```bash
perf report -i perf_results/optimized_record.data --dsos=libjavaProfiler.so
```

This shows hotspots and call graphs for the profiler library.

### Compare Specific Functions

```bash
# Replace functionName with the symbol you want to inspect

# Baseline
perf report -i perf_results/baseline_record.data \
  --dsos=libjavaProfiler.so \
  --stdio | grep -A 10 "functionName"

# Optimized
perf report -i perf_results/optimized_record.data \
  --dsos=libjavaProfiler.so \
  --stdio | grep -A 10 "functionName"
```
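
Grepping for one function at a time gets tedious. A hypothetical alternative is to dump each recording as a flat per-symbol table and join the two tables by symbol. This sketch assumes that, with `--no-children`, `--sort symbol` and `-g none`, each data line of `perf report --stdio` has the form `<percent>%  [.] <symbol>`; check this against your perf version. The percentages are shares of each recording's samples, so they show where samples shift, not absolute changes. The function names are illustrative.

```bash
# Hypothetical helpers for comparing per-symbol sample shares.

# Flat per-symbol report for libjavaProfiler.so from one perf.data file.
symbol_shares() {
  perf report -i "$1" --dsos=libjavaProfiler.so --no-children \
    --sort symbol -g none --stdio 2>/dev/null
}

# Join two symbol_shares outputs by symbol.
# Columns: baseline %, optimized %, difference, symbol (missing counts as 0).
join_shares() {
  awk '
    /^ *[0-9.]+%/ {
      pct = $1
      sub(/%/, "", pct)
      sym = $0
      sub(/^ *[^ ]+ +[^ ]+ +/, "", sym)   # strip "<pct>%" and "[.]"
      if (FILENAME == ARGV[1]) base[sym] = pct; else opt[sym] = pct
      seen[sym] = 1
    }
    END {
      for (s in seen)
        printf "%8.2f %8.2f %+8.2f  %s\n", base[s], opt[s], opt[s] - base[s], s
    }
  ' "$1" "$2" | sort -k1,1nr
}
```

For example, save `symbol_shares perf_results/baseline_record.data` to `base.txt` and `symbol_shares perf_results/optimized_record.data` to `opt.txt`, then run `join_shares base.txt opt.txt | head -20`.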

### Manual perf Commands (as in PR description)

If you want to run perf manually (from the repository root):

```bash
# Start the Renaissance benchmark with the profiler attached
# (adjust the JAR path; the scripts download it into the branch-prediction directory)
java -agentpath:ddprof-lib/build/lib/main/release/linux/x64/libjavaProfiler.so \
  -jar ~/renaissance-gpl-0.16.1.jar \
  akka-uct -r 9999 &
PID=$!

# Collect statistics for 60 seconds
perf stat -e branch-misses,L1-icache-load-misses,instructions \
  -p "${PID}" -- sleep 60

# Record detailed data for 60 seconds (written to perf.data)
perf record -e L1-icache-load-misses -g -p "${PID}" -- sleep 60

# Stop the benchmark; with -r 9999 it would otherwise keep running
kill "${PID}"

# View the report, filtered to the profiler library
perf report --dsos=libjavaProfiler.so
```

## Recommended Benchmarks

These Renaissance benchmarks work well for profiler testing:

- **akka-uct** (default): Actor-based workload with sustained CPU load
- **finagle-chirper**: High concurrency, network-like patterns
- **scala-kmeans**: CPU-intensive computation
- **future-genetic**: Async/future-heavy workload

## Interpreting Results

### What to Look For

1. **Branch Miss Reduction**:
   - A reduction in branch-misses indicates better prediction
   - Even a 1-2% improvement is meaningful at scale, provided it is larger than the run-to-run variation

2. **I-cache Impact**:
   - Changes in L1-icache-load-misses indicate code layout effects
   - May increase slightly if code size grows

3. **Overall Performance**:
   - Check cycles and instructions
   - IPC (instructions per cycle) = instructions / cycles
   - Higher IPC is better
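
IPC can be computed directly from a stat file. Because the measurement window has a fixed length, a normalized rate such as branch misses per 1,000 instructions (MPKI) is also easier to compare across runs than a raw count. The sketch below assumes `perf stat`'s default text layout and that `cycles` (or `cpu-cycles`) was collected. The manual `perf stat` command above does not include `cycles`, so add it to the `-e` list if you want IPC. The function name is hypothetical.

```bash
# Hypothetical helper: print IPC and branch MPKI from one perf stat text file.
# Assumes perf's default layout ("<count>  <event>" per line) with comma (or
# no) thousands separators. Lines it cannot parse are ignored.
derived_metrics() {
  awk '
    $1 ~ /^[0-9][0-9,.]*$/ && NF >= 2 {
      v = $1
      gsub(/,/, "", v)
      name = $2
      sub(/:.*/, "", name)          # drop modifiers such as ":u"
      count[name] = v + 0           # force a numeric value
    }
    END {
      ins = count["instructions"] + 0
      cyc = ("cycles" in count) ? count["cycles"] : count["cpu-cycles"] + 0
      if (ins > 0 && cyc > 0)
        printf "IPC: %.3f\n", ins / cyc
      if (ins > 0 && ("branch-misses" in count))
        printf "branch MPKI: %.3f\n", count["branch-misses"] * 1000 / ins
    }
  ' "$1"
}
```

Run `derived_metrics` on the baseline and optimized `_stat.txt` files and compare the two outputs. A higher IPC and a lower MPKI both point in the right direction.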

### Expected Impact

As a rough guide, branch prediction hints typically provide:

- 1-5% reduction in branch misses for hot paths
- Minimal impact on total instruction count
- Slight improvement in overall cycles (if hints are correct; incorrect hints can make both misses and cycles worse)

## Troubleshooting

### Permission Denied

```bash
sudo sysctl -w kernel.perf_event_paranoid=1
```

Or run the script with sudo (not recommended).

### Renaissance JAR Not Found

The script downloads it automatically. If the automatic download fails:

- Check your internet connection
- Download it manually from: https://github.com/renaissance-benchmarks/renaissance/releases
- Place it in the same directory as the test scripts

### Profiler Library Not Built

```bash
# Run from the repository root
./gradlew ddprof-lib:build
```

### High Variance in Results

- Increase the test duration in the script (default: 60s)
- Ensure the system is idle (no other heavy processes)
- Run multiple iterations and average the results
- Pin the benchmark to fixed CPUs (for example with `taskset`) for more stable results

## Advanced Usage

### Multiple Runs for Statistical Confidence

```bash
for i in {1..5}; do
  ./test_branch_prediction_perf.sh akka-uct "run_$i"
done
```

Then average the results across the `perf_results/run_*_stat.txt` files.
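
Averaging is easier to judge when the spread is reported alongside the mean. The sketch below computes the mean and the sample standard deviation of one event across several stat files. It assumes `perf stat`'s default text layout and uses the first matching line in each file. The function name is hypothetical.

```bash
# Hypothetical helper: mean and sample standard deviation of one event across
# several perf stat text files, e.g. perf_results/run_*_stat.txt.
# Assumes perf's default layout with comma (or no) thousands separators.
average_runs() {
  local ev=$1
  shift
  awk -v ev="$ev" '
    FNR == 1 { found = 0 }                # take one value per file
    !found && $1 ~ /^[0-9][0-9,.]*$/ && NF >= 2 {
      name = $2
      sub(/:.*/, "", name)
      if (name == ev) {
        v = $1
        gsub(/,/, "", v)
        vals[++n] = v + 0
        found = 1
      }
    }
    END {
      if (n == 0) { print "no values for " ev; exit 1 }
      for (i = 1; i <= n; i++) sum += vals[i]
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (vals[i] - mean) ^ 2   # two-pass variance
      sd = (n > 1) ? sqrt(ss / (n - 1)) : 0
      printf "%s: n=%d mean=%.0f stddev=%.0f (%.2f%% of mean)\n", ev, n, mean, sd, (mean > 0 ? 100 * sd / mean : 0)
    }
  ' "$@"
}
```

For example, run `average_runs branch-misses perf_results/run_*_stat.txt`. If the standard deviation is of the same order as the baseline-vs-optimized difference, that difference is unlikely to be meaningful.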

### Different Perf Events

Edit the script to add more events:

```bash
perf stat -e branch-misses,branches,branch-load-misses \
  -e L1-dcache-load-misses,L1-dcache-loads \
  -e LLC-load-misses,LLC-loads \
  -e cpu-cycles,instructions \
  ...
```

To see which events your machine supports, run `perf list`. Events that perf reports as `<not supported>` are unavailable on that CPU or VM.

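### Flame Graphs

Generate flame graphs for visual analysis:

```bash
# ${PID} is the benchmark JVM, started as in the manual commands above; sample at 999 Hz
perf record -F 999 -g -p ${PID} -- sleep 60
# Fold the stacks and render an SVG
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
```

Requires `stackcollapse-perf.pl` and `flamegraph.pl` from https://github.com/brendangregg/FlameGraph on your `PATH`.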
