Profiling Guest Programs

Profiling is one of the most important tools for understanding and optimizing zkVM guest code. This guide shows you how to generate cycle-count profiles and visualize them using flamegraphs.

Background

Profiling tools such as pprof and perf collect performance information over the entire execution of a program. RISC Zero has experimental support for generating pprof files that record cycle counts.

How Profiling Works

Sampling CPU profilers record the current call stack at regular intervals to show where your program spends its time. RISC Zero’s profiler captures the call stack at every cycle of program execution.

Capturing every cycle, rather than sampling, is practical because zkVM executions are short and synchronous, and the measurement adds no overhead to the guest's cycle count.
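The difference between the two approaches can be sketched with a toy model. This is an illustration only, not RISC Zero's implementation: the trace format and the names `count_every_cycle` and `count_sampled` are hypothetical.

```rust
use std::collections::HashMap;

/// Aggregates a per-cycle trace into cycle counts per unique call stack.
/// `trace[i]` is the call stack (outermost frame first) observed at cycle `i`.
pub fn count_every_cycle(trace: &[Vec<&str>]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for stack in trace {
        // Join frames into a folded key such as "main;verify;lincomb".
        *counts.entry(stack.join(";")).or_insert(0) += 1;
    }
    counts
}

/// Same aggregation, but only inspects every `interval`-th cycle, the way a
/// sampling profiler would. Short-lived frames can be missed entirely.
pub fn count_sampled(trace: &[Vec<&str>], interval: usize) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    // `max(1)` avoids the panic that `step_by(0)` would cause.
    for stack in trace.iter().step_by(interval.max(1)) {
        *counts.entry(stack.join(";")).or_insert(0) += 1;
    }
    counts
}
```

With a four-cycle trace in which `main;a` only occurs on odd cycles, sampling every second cycle never sees `a`, while the per-cycle count attributes two cycles to it.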

Prerequisites

Install RISC Zero tools

Follow the installation guide if you haven’t already.

Install Go

The pprof tool is bundled with Go. Install Go for your platform.

Generating Profiles

Basic Usage

Set the RISC0_PPROF_OUT environment variable to specify where to write profiling data:

RISC0_PPROF_OUT=./profile.pb RISC0_DEV_MODE=1 cargo run

Always profile in dev mode to avoid unnecessary proving time. Use RISC0_DEV_MODE=1 to enable it.

Example: Profiling ECDSA Verification

# In the RISC Zero repository
cd examples/ecdsa/k256
RISC0_PPROF_OUT=ecdsa_verify.pb RISC0_DEV_MODE=1 cargo run

Environment Variables

Variable	Purpose	Example
`RISC0_PPROF_OUT`	Output file path for profiling data	`./profile.pb`
`RISC0_DEV_MODE`	Enable dev mode (skip proving)	`1`
`RISC0_INFO`	Show execution statistics	`1`
`RISC0_PPROF_ENABLE_INLINE_FUNCTIONS`	Track inlined functions (slower)	`yes`

Visualizing Profiles

Starting the pprof Web Interface

After generating a profile, visualize it with:

go tool pprof -http=127.0.0.1:8000 profile.pb

Then open http://localhost:8000 in your browser.

Flamegraph View

The flamegraph is one of the most useful visualizations. Access it at:

http://localhost:8000/ui/flamegraph

In a flamegraph:

The x-axis represents the proportion of total cycles
The y-axis represents the call stack depth
Wider sections indicate more cycles spent
Click on sections to zoom in
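As a rough sketch of how box widths relate to cycle counts, the function below turns folded stacks (a `outer;inner;leaf` string plus a cycle count) into the fraction of total cycles covered by each call path. The folded-stack input format and the name `box_widths` are assumptions made for illustration; pprof computes its own layout internally.

```rust
use std::collections::BTreeMap;

/// For every call path (stack prefix), returns the fraction of total cycles
/// spent in that path or anything it calls. This fraction corresponds to the
/// width of the path's box in a flamegraph.
pub fn box_widths(folded: &[(&str, u64)]) -> BTreeMap<String, f64> {
    let total: u64 = folded.iter().map(|(_, cycles)| *cycles).sum();
    let mut cycles_per_path: BTreeMap<String, u64> = BTreeMap::new();
    for (stack, cycles) in folded {
        let mut path = String::new();
        for frame in stack.split(';') {
            if !path.is_empty() {
                path.push(';');
            }
            path.push_str(frame);
            // Every prefix of the stack was active during these cycles.
            *cycles_per_path.entry(path.clone()).or_insert(0) += *cycles;
        }
    }
    cycles_per_path
        .into_iter()
        // `max(1)` guards against division by zero on an empty profile.
        .map(|(path, cycles)| (path, cycles as f64 / total.max(1) as f64))
        .collect()
}
```

For hypothetical input `[("main;verify;lincomb", 950), ("main;verify", 30), ("main", 20)]`, the path `main` spans the full width, `main;verify` spans 98% of it, and `main;verify;lincomb` spans 95%.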

Example Flamegraph

In a flamegraph of ECDSA signature verification, you will typically see that the lincomb (linear combination) operation accounts for over 95% of the total cycle count.

Profiling Example: Fibonacci

The profiling example compares three Fibonacci implementations:

Implementation 1: Basic Iterative

#[inline(never)]
pub fn fibonacci_1(n: u32) -> u64 {
    let (mut a, mut b) = (0, 1);
    if n <= 1 {
        return n as u64;
    }
    let mut i = 2;
    while i <= n {
        let c = a + b;
        a = b;
        b = c;
        i += 1;
    }
    b
}

Implementation 2: Loop Unrolling

#[inline(never)]
pub fn fibonacci_2(n: u32) -> u64 {
    let (mut a, mut b) = (0, 1);
    if n <= 1 {
        return n as u64;
    }
    let mut i = 2;
    while i <= n {
        if i + 5 <= n {
            // Compute 5 iterations at once
            let c = a + b;
            let d = b + c;
            let e = c + d;
            let f = d + e;
            let g = e + f;
            a = f;
            b = g;
            i += 5;
        } else {
            let c = a + b;
            a = b;
            b = c;
            i += 1;
        }
    }
    b
}

Implementation 3: Matrix Exponentiation

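use nalgebra::Matrix2;

#[inline(never)]
pub fn fibonacci_3(n: u32) -> u64 {
    // Guard n == 0: `n - 1` would underflow, and F(0) is 0 by definition.
    if n == 0 {
        return 0;
    }
    // [[1, 1], [1, 0]]^(n - 1) holds F(n) in its top-left entry.
    Matrix2::new(1, 1, 1, 0).pow(n - 1)[(0, 0)]
}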

Use #[inline(never)] on functions you want to see clearly in the profile, preventing the compiler from inlining them.

Running the Example

cd examples/profiling
RISC0_PPROF_OUT=./profile.pb RISC0_DEV_MODE=1 cargo run
go tool pprof -http=127.0.0.1:8000 profile.pb

The flamegraph will show the relative performance of the three implementations, allowing you to compare algorithmic approaches.
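One way to reason about why the matrix approach can behave differently from the loops is to look at the technique without the nalgebra dependency. The following is a std-only sketch of exponentiation by squaring, not the example's actual code; `mul2`, `fib_matrix`, and `fib_iter` are hypothetical names, and both functions are intended for n up to 90 so that all intermediate values fit in a `u64`.

```rust
/// Multiplies two 2x2 matrices stored as [[a, b], [c, d]].
fn mul2(x: [[u64; 2]; 2], y: [[u64; 2]; 2]) -> [[u64; 2]; 2] {
    [
        [
            x[0][0] * y[0][0] + x[0][1] * y[1][0],
            x[0][0] * y[0][1] + x[0][1] * y[1][1],
        ],
        [
            x[1][0] * y[0][0] + x[1][1] * y[1][0],
            x[1][0] * y[0][1] + x[1][1] * y[1][1],
        ],
    ]
}

/// Fibonacci via exponentiation by squaring: O(log n) matrix products.
pub fn fib_matrix(n: u32) -> u64 {
    if n == 0 {
        return 0;
    }
    let mut result = [[1, 0], [0, 1]]; // identity matrix
    let mut base = [[1, 1], [1, 0]];
    let mut e = n - 1;
    while e > 0 {
        if e & 1 == 1 {
            result = mul2(result, base);
        }
        e >>= 1;
        // Only square when another bit remains, to avoid needless overflow.
        if e > 0 {
            base = mul2(base, base);
        }
    }
    result[0][0]
}

/// Reference implementation: n additions.
pub fn fib_iter(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}
```

The iterative version performs n additions, while the matrix version performs a number of 2x2 products that grows with the bit length of n. Each product costs several multiplications, so which one is cheaper for a given n is exactly the kind of question the profile answers.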

Advanced Features

Tracking Inline Functions

Compilers often inline functions, which can make profiles less detailed. If you’ve compiled with debug symbols, enable inline function tracking:

RISC0_PPROF_ENABLE_INLINE_FUNCTIONS=yes RISC0_PPROF_OUT=./profile.pb RISC0_DEV_MODE=1 cargo run

Enabling inline function tracking makes the profiler significantly slower. Only use it when you need the extra detail.

Other pprof Views

The pprof web interface provides several visualization options:

Top: List of functions by cycle count
Graph: Call graph with cycle counts
Peek: Examine specific functions
Source: View source code with cycle annotations (requires debug symbols)
Disassemble: View assembly with cycle counts

Refer to the pprof documentation for details on each view.

Profiling Best Practices

Always profile in dev mode

Use RISC0_DEV_MODE=1 to skip proving and focus on execution performance.

Profile representative workloads

Ensure your test inputs are similar to production workloads in size and complexity.

Look for the widest sections first

In flamegraphs, optimize the widest sections first for maximum impact.

Compare before and after

pprof can compare two profiles to show improvements:

go tool pprof -http=:8000 -base=before.pb after.pb
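Conceptually, a comparison subtracts the baseline cycle count of each function from its new count. The sketch below shows that idea on plain maps of function name to cycles; it does not read pprof files, and the name `diff_profiles` and the map-based input are assumptions for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns (function, after - before) for every function seen in either
/// profile, sorted so the largest reductions come first.
pub fn diff_profiles(
    before: &BTreeMap<String, u64>,
    after: &BTreeMap<String, u64>,
) -> Vec<(String, i64)> {
    // Union of function names from both profiles.
    let mut names: BTreeSet<&String> = before.keys().collect();
    names.extend(after.keys());

    let mut rows: Vec<(String, i64)> = names
        .into_iter()
        .map(|name| {
            // A function missing from one profile counts as zero cycles there.
            let b = before.get(name).copied().unwrap_or(0) as i64;
            let a = after.get(name).copied().unwrap_or(0) as i64;
            (name.clone(), a - b)
        })
        .collect();
    // Stable sort: ties keep their alphabetical order.
    rows.sort_by_key(|(_, delta)| *delta);
    rows
}
```

A negative delta means the function became cheaper; a function that only appears in the second profile shows up with a positive delta equal to its full cost.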

Use cycle counting for micro-benchmarks

For detailed micro-benchmarks, use env::cycle_count() directly:

use risc0_zkvm::guest::env;

fn main() {
    // Read the cycle counter before and after the code under test.
    let start = env::cycle_count();
    // Operation to benchmark
    let end = env::cycle_count();
    println!("Operation took {} cycles", end - start);
}
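When several operations need to be measured, the start/end pattern can be wrapped in a small helper. The sketch below is hypothetical and std-only: the counter is passed in as a closure so the helper can be exercised outside the zkVM. Inside a guest you would pass `env::cycle_count`, assuming it returns a `u64` in your version of `risc0_zkvm`.

```rust
/// Runs `op` and returns its result together with the counter delta.
/// `read_counter` is any function returning a monotonically increasing count.
pub fn measure<T>(read_counter: impl Fn() -> u64, op: impl FnOnce() -> T) -> (T, u64) {
    let start = read_counter();
    let result = op();
    let end = read_counter();
    (result, end - start)
}
```

In a guest, a call might look like `let (digest, cycles) = measure(env::cycle_count, || hash(&input));`, where `hash` and `input` stand in for your own code. Reading the counter may itself cost a few cycles, so treat very small deltas with caution.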

Interpreting Results

Understanding Cycle Counts

Proving time scales roughly linearly with cycle count, so a function that takes 1 million cycles will take roughly twice as long to prove as one that takes 500,000 cycles.

Page-In/Page-Out Detection

If you see functions with names like page_in or page_out consuming significant cycles, consider:

Reducing memory usage
Improving memory locality
Using more compact data structures

See the Optimization Guide for strategies.
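To build intuition for why locality and compact layouts matter, the sketch below counts how many distinct memory pages an access pattern touches. It is a simplified model, not a description of the zkVM's paging: the name `pages_touched` is hypothetical, `page_size` must be non-zero, and the 1024-byte page size used in the usage note is an assumption.

```rust
use std::collections::BTreeSet;

/// Counts distinct pages touched when reading `count` elements of `elem_size`
/// bytes, starting at byte address `base` and advancing `stride` bytes per element.
pub fn pages_touched(
    base: u64,
    stride: u64,
    count: u64,
    elem_size: u64,
    page_size: u64,
) -> usize {
    let mut pages = BTreeSet::new();
    for i in 0..count {
        let first = base + i * stride;
        let last = first + elem_size.max(1) - 1;
        // An element may straddle a page boundary, so record every page it covers.
        for page in (first / page_size)..=(last / page_size) {
            pages.insert(page);
        }
    }
    pages.len()
}
```

Under this model, reading 256 contiguous 4-byte values with 1024-byte pages touches a single page, the same values stored as 8-byte elements touch two pages, and reading them with a 1024-byte stride touches 256 pages.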

Cryptographic Operations

If cryptographic operations dominate your profile, check if you’re using precompiled implementations. Precompiles can reduce cycle counts by 10-100x for supported operations.

Next Steps

Apply insights from profiling using the Optimization Guide
Explore Precompiles for cryptographic acceleration
Learn about GPU Acceleration for faster proving
