Profiling Guest Programs

Profiling is one of the most important tools for understanding and optimizing zkVM guest code. This guide shows you how to generate cycle-count profiles and visualize them using flamegraphs.

Background

Profiling tools like pprof and perf collect performance information over the entire execution of a program. RISC Zero has experimental support for generating pprof files that record cycle counts.

How Profiling Works

Sampling CPU profilers record the current call stack at regular intervals to show where your program spends its time. RISC Zero’s profiler instead captures the call stack at every cycle of program execution. This is practical because zkVM executions are short and synchronous, and the profiler adds no measurement overhead.
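As a rough illustration of per-cycle capture, the sketch below aggregates a trace of call stacks, one per executed cycle, into cycle counts per distinct stack. It is a toy model only: the function name `count_cycles_per_stack` and the `a;b;c` folded-stack key format are assumptions made for this example, not RISC Zero's internal representation.

```rust
use std::collections::HashMap;

/// Toy model of a per-cycle profiler (not RISC Zero's implementation).
/// `trace` holds the call stack that was active at each executed cycle,
/// outermost frame first. Returns the number of cycles spent under each
/// distinct stack, keyed by the frames joined with ';'.
pub fn count_cycles_per_stack(trace: &[Vec<&str>]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for stack in trace {
        // Every cycle contributes exactly one sample, so nothing is missed
        // between sampling intervals.
        *counts.entry(stack.join(";")).or_insert(0) += 1;
    }
    counts
}
```

Because every cycle is recorded, the per-stack counts always sum to the total number of cycles executed.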

Prerequisites

1. Install RISC Zero tools
   Follow the installation guide if you haven’t already.
2. Install Go
   The pprof tool is bundled with Go. Install Go for your platform.

Generating Profiles

Basic Usage

Set the RISC0_PPROF_OUT environment variable to specify where to write profiling data:
RISC0_PPROF_OUT=./profile.pb RISC0_DEV_MODE=1 cargo run
Always profile in dev mode to avoid unnecessary proving time. Use RISC0_DEV_MODE=1 to enable it.

Example: Profiling ECDSA Verification

# In the RISC Zero repository
cd examples/ecdsa/k256
RISC0_PPROF_OUT=ecdsa_verify.pb RISC0_DEV_MODE=1 cargo run

Environment Variables

Variable | Purpose | Example
RISC0_PPROF_OUT | Output file path for profiling data | ./profile.pb
RISC0_DEV_MODE | Enable dev mode (skip proving) | 1
RISC0_INFO | Show execution statistics | 1
RISC0_PPROF_ENABLE_INLINE_FUNCTIONS | Track inlined functions (slower) | yes

Visualizing Profiles

Starting the pprof Web Interface

After generating a profile, visualize it with:
go tool pprof -http=127.0.0.1:8000 profile.pb
Then open http://localhost:8000 in your browser.

Flamegraph View

The flamegraph is one of the most useful visualizations. Access it at:
http://localhost:8000/ui/flamegraph
In a flamegraph:
  • The x-axis represents the proportion of total cycles
  • The y-axis represents the call stack depth
  • Wider sections indicate more cycles spent
  • Click on sections to zoom in

Example Flamegraph

In a flamegraph from ECDSA signature verification, you’ll typically see that the lincomb (linear combination) operation accounts for over 95% of the total cycle count.

Profiling Example: Fibonacci

The profiling example compares three Fibonacci implementations:

Implementation 1: Basic Iterative

#[inline(never)]
pub fn fibonacci_1(n: u32) -> u64 {
    let (mut a, mut b) = (0, 1);
    if n <= 1 {
        return n as u64;
    }
    let mut i = 2;
    while i <= n {
        let c = a + b;
        a = b;
        b = c;
        i += 1;
    }
    b
}

Implementation 2: Loop Unrolling

#[inline(never)]
pub fn fibonacci_2(n: u32) -> u64 {
    let (mut a, mut b) = (0, 1);
    if n <= 1 {
        return n as u64;
    }
    let mut i = 2;
    while i <= n {
        if i + 5 <= n {
            // Compute 5 iterations at once
            let c = a + b;
            let d = b + c;
            let e = c + d;
            let f = d + e;
            let g = e + f;
            a = f;
            b = g;
            i += 5;
        } else {
            let c = a + b;
            a = b;
            b = c;
            i += 1;
        }
    }
    b
}

Implementation 3: Matrix Exponentiation

use nalgebra::Matrix2;

#[inline(never)]
pub fn fibonacci_3(n: u32) -> u64 {
    Matrix2::new(1, 1, 1, 0).pow(n - 1)[(0, 0)]
}
Use #[inline(never)] on functions you want to see clearly in the profile, preventing the compiler from inlining them.

Running the Example

cd examples/profiling
RISC0_PPROF_OUT=./profile.pb RISC0_DEV_MODE=1 cargo run
go tool pprof -http=127.0.0.1:8000 profile.pb
The flamegraph will show the relative performance of the three implementations, allowing you to compare algorithmic approaches.
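Before comparing cycle counts, it can help to confirm on the host that the implementations return the same values. The sketch below uses only the standard library: `fib_iterative` mirrors the basic iterative version, and `fib_matrix` is a hand-rolled stand-in for the nalgebra-based version, not the example's actual code. Both names are hypothetical. Unlike `pow(n - 1)` on a `u32`, the stand-in handles `n == 0` explicitly so the subtraction cannot underflow.

```rust
/// Reference implementation, equivalent to the basic iterative version.
pub fn fib_iterative(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    if n <= 1 {
        return n as u64;
    }
    for _ in 2..=n {
        let c = a + b;
        a = b;
        b = c;
    }
    b
}

/// Multiply two 2x2 matrices stored in row-major order.
fn mat_mul(x: [[u64; 2]; 2], y: [[u64; 2]; 2]) -> [[u64; 2]; 2] {
    let mut out = [[0u64; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            out[i][j] = x[i][0] * y[0][j] + x[i][1] * y[1][j];
        }
    }
    out
}

/// Fibonacci via exponentiation by squaring of [[1, 1], [1, 0]].
/// Hand-rolled stand-in for a matrix library; returns 0 for n == 0.
pub fn fib_matrix(n: u32) -> u64 {
    if n == 0 {
        return 0;
    }
    let mut result = [[1, 0], [0, 1]]; // identity matrix
    let mut base = [[1, 1], [1, 0]];
    let mut exp = n - 1;
    while exp > 0 {
        if exp & 1 == 1 {
            result = mat_mul(result, base);
        }
        exp >>= 1;
        // Square only if another bit remains, to avoid a needless overflow.
        if exp > 0 {
            base = mat_mul(base, base);
        }
    }
    result[0][0]
}
```

The same kind of check can be extended to the loop-unrolled version, so that any difference seen in the flamegraph reflects cost rather than a correctness bug.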

Advanced Features

Tracking Inline Functions

Compilers often inline functions, which can make profiles less detailed. If you’ve compiled with debug symbols, enable inline function tracking:
RISC0_PPROF_ENABLE_INLINE_FUNCTIONS=yes RISC0_PPROF_OUT=./profile.pb RISC0_DEV_MODE=1 cargo run
Enabling inline function tracking makes the profiler significantly slower. Only use it when you need the extra detail.

Other pprof Views

The pprof web interface provides several visualization options:
  • Top: List of functions by cycle count
  • Graph: Call graph with cycle counts
  • Peek: Examine specific functions
  • Source: View source code with cycle annotations (requires debug symbols)
  • Disassemble: View assembly with cycle counts
Refer to the pprof documentation for details on each view.

Profiling Best Practices

1. Always profile in dev mode
   Use RISC0_DEV_MODE=1 to skip proving and focus on execution performance.
2. Profile representative workloads
   Ensure your test inputs are similar to production workloads in size and complexity.
3. Look for the widest sections first
   In flamegraphs, optimize the widest sections first for maximum impact.
4. Compare before and after
   pprof can compare two profiles to show improvements:
   go tool pprof -http=:8000 -base=before.pb after.pb
5. Use cycle counting for micro-benchmarks
   For detailed micro-benchmarks, use env::cycle_count() directly:
use risc0_zkvm::guest::env;

let start = env::cycle_count();
// Operation to benchmark
let end = env::cycle_count();
println!("Operation took {} cycles", end - start);
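If you take many such measurements, the start/end pattern can be wrapped in a small helper. The sketch below is one possible shape, not part of the RISC Zero API: `measure` is a hypothetical name, and the counter is passed in as a parameter so the helper can also be exercised on the host with a fake counter. It assumes the counter returns `u64`.

```rust
/// Run `op` and return its result together with the number of ticks that
/// elapsed according to `counter`. In a guest, `counter` would be
/// `risc0_zkvm::guest::env::cycle_count` (assumed here to return `u64`).
pub fn measure<T>(counter: impl Fn() -> u64, op: impl FnOnce() -> T) -> (T, u64) {
    let start = counter();
    let value = op();
    let end = counter();
    // The difference also includes the small cost of reading the counter.
    (value, end - start)
}
```

In a guest this might be called as `measure(env::cycle_count, || verify(&sig))`, where `verify` and `sig` stand in for your own code.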

Interpreting Results

Understanding Cycle Counts

Remember that proving time is roughly proportional to cycle count. A function that takes 1 million cycles will require roughly twice as long to prove as a function that takes 500,000 cycles.
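The arithmetic above can be written down as a first-order estimate. This is a sketch under the stated assumption that proving time scales linearly with cycles. It ignores fixed costs and any padding effects, so treat the result as a guide rather than a prediction. Both function names are hypothetical.

```rust
/// First-order estimate of the proving-time ratio between two versions,
/// assuming proving time scales linearly with cycle count.
/// `before_cycles` must be non-zero for the result to be meaningful.
pub fn estimated_proving_ratio(before_cycles: u64, after_cycles: u64) -> f64 {
    after_cycles as f64 / before_cycles as f64
}

/// Relative change in cycle count as a fraction of the baseline.
/// Negative values mean the new version uses fewer cycles.
pub fn relative_change(before_cycles: u64, after_cycles: u64) -> f64 {
    (after_cycles as f64 - before_cycles as f64) / before_cycles as f64
}
```

For the example in the text, going from 1,000,000 to 500,000 cycles gives a ratio of 0.5 and a relative change of -0.5.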

Page-In/Page-Out Detection

If you see functions with names like page_in or page_out consuming significant cycles, consider:
  • Reducing memory usage
  • Improving memory locality
  • Using more compact data structures
See the Optimization Guide for strategies.
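To see why locality matters, the sketch below counts how many distinct memory pages an access pattern touches. It is a simplified model: the 1 KiB `PAGE_SIZE` is an assumption made for this example (check the zkVM documentation for the actual value), the name `pages_touched` is hypothetical, and elements are assumed to be non-empty and no larger than one page.

```rust
use std::collections::HashSet;

/// Assumed page size in bytes, used only for this illustration.
const PAGE_SIZE: usize = 1024;

/// Count the distinct pages touched when reading `count` elements of
/// `elem_size` bytes each, starting at byte offset 0 and advancing by
/// `stride` elements between reads.
pub fn pages_touched(count: usize, elem_size: usize, stride: usize) -> usize {
    let mut pages = HashSet::new();
    for i in 0..count {
        let first_byte = i * stride * elem_size;
        let last_byte = first_byte + elem_size - 1;
        // An element may straddle a page boundary, so record both ends.
        pages.insert(first_byte / PAGE_SIZE);
        pages.insert(last_byte / PAGE_SIZE);
    }
    pages.len()
}
```

Under these assumptions, reading 256 contiguous 4-byte values touches a single page, while reading the same number of values with a stride of 256 elements touches 256 pages. Fewer pages touched generally means less paging work.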

Cryptographic Operations

If cryptographic operations dominate your profile, check if you’re using precompiled implementations. Precompiles can reduce cycle counts by 10-100x for supported operations.
