PERF_EVENT_OPEN(2) Linux Programmer's Manual PERF_EVENT_OPEN(2)
NAME
perf_event_open - open a performance event
SYNOPSIS
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
syscall(__NR_perf_event_open, struct perf_event_attr *attr_uptr,
pid_t pid, int cpu, int group_fd, unsigned long flags);
DESCRIPTION
perf_event_open() opens a performance event and associates it with a
task and/or a CPU.
This system call is part of the performance counters for Linux.
Performance counters are special hardware registers available on most
modern CPUs. These registers count the number of certain types of
hardware events, such as instructions executed, cache misses suffered,
or branches mispredicted, without slowing down the kernel or
applications. These registers can also trigger interrupts when a
threshold number of events has passed, and can thus be used to profile
the code that runs on that CPU.
The Linux Performance Counter subsystem provides an abstraction of
performance counter hardware capabilities. It provides per-task and
per-CPU counters, and it provides event capabilities on top of those.
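The per-task and per-CPU modes mentioned above are selected through the pid and cpu arguments. The following is a minimal sketch, not a definitive implementation: the helper names perf_open(), init_ctx_switch_attr(), open_task_counter(), and open_cpu_counter() are hypothetical, and it assumes the usual convention that pid 0 with cpu -1 follows the calling task on any CPU, while pid -1 with a non-negative cpu counts every task on that CPU.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical convenience wrapper; the C library provides none. */
int perf_open(struct perf_event_attr *attr, pid_t pid, int cpu)
{
    /* group_fd = -1: not part of an event group; flags = 0 */
    return (int) syscall(__NR_perf_event_open, attr, pid, cpu, -1, 0UL);
}

/* Describe a software counter for context switches. */
void init_ctx_switch_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CONTEXT_SWITCHES;
}

/* Per-task counter: follows the calling task on whichever CPU it runs
   (assumed convention: pid = 0, cpu = -1). */
int open_task_counter(struct perf_event_attr *attr)
{
    return perf_open(attr, 0, -1);
}

/* Per-CPU counter: counts all tasks on one CPU (assumed convention:
   pid = -1, cpu >= 0); this typically requires extra privileges. */
int open_cpu_counter(struct perf_event_attr *attr, int cpu)
{
    return perf_open(attr, -1, cpu);
}
```

Both helpers return a file descriptor on success or -1 with errno set, exactly as the underlying system call does, so the caller is still responsible for checking the result and closing the descriptor.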
The goal of the Performance Counter implementation is similar to the
perfmon mechanism accessible with the perfmonctl(2) system call. How‐
ever, it is based on a fundamentally different design:
· The API is based on a single counter abstraction.
· Only a single new system call is needed: perf_event_open(2).
All performance-counter operations are implemented via standard
VFS APIs such as read(2), fcntl(2), and poll(2).
· User space is not exposed to low-level details such as contexts or
arrays of counters. Opening and reading a basic counter takes only
a few lines of C code:
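#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    uint64_t buf[3];    /* count, time enabled, time running */
    struct perf_event_attr attr;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
    attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
                       PERF_FORMAT_TOTAL_TIME_RUNNING;
    /* pid 0, cpu -1: this task on any CPU; no group, no flags */
    fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd == -1)
        return 1;
    if (read(fd, buf, sizeof(buf)) == sizeof(buf))
        printf("Current count: %llu context switches\n",
               (unsigned long long) buf[0]);
    return 0;
}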
· No interaction with ptrace is required: any task (with sufficient
permissions) can monitor other tasks without having to stop them.
· The mapping of counters to hardware counters is not static:
counters are scheduled dynamically on each CPU where a task runs.
· There is a /sys-based reservation facility that allows a certain
number of hardware counters to be allocated for guaranteed
sysadmin access.
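Because counters are scheduled dynamically, a counter may be active for only part of the time it is enabled. When read_format contains PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING, read(2) returns the raw count followed by the two times, and a caller can extrapolate an estimate from them. The sketch below illustrates one common way to do so; the names struct read_values and scaled_count() are hypothetical, and the linear scaling is an approximation chosen for illustration, not something this page prescribes.

```c
#include <stdint.h>

/* Layout returned by read(2) for a single counter when read_format is
   PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING. */
struct read_values {
    uint64_t value;         /* raw event count */
    uint64_t time_enabled;  /* time the event was enabled */
    uint64_t time_running;  /* time the event was actually counting */
};

/* Estimate the count the event would have reached had it been running
   for the whole time it was enabled (hypothetical helper). */
uint64_t scaled_count(const struct read_values *rv)
{
    if (rv->time_running == 0)
        return 0;           /* never scheduled: nothing to extrapolate */
    if (rv->time_running >= rv->time_enabled)
        return rv->value;   /* counted the whole time: exact value */
    /* Scale linearly; double avoids 64-bit overflow in the product. */
    return (uint64_t) ((double) rv->value * (double) rv->time_enabled /
                       (double) rv->time_running);
}
```

A caller would fill a struct read_values with a single read(2) on the counter's file descriptor and pass it to scaled_count(); when the two times are equal, the raw value is returned unchanged.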
RETURN VALUE
On success, perf_event_open() returns a new file descriptor that can be
used with VFS system calls such as read(2). On error, -1 is returned
and errno is set to indicate the error.
ERRORS
EINVAL flags contains unsupported values, or the sampling frequency is
too high.
EACCES User space requested tracing of the kernel while
/proc/sys/kernel/perf_event_paranoid is set to 1 and the caller does
not possess the CAP_SYS_ADMIN capability.
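A caller might turn the errors listed above into more specific diagnostics. The following is a minimal sketch; perf_open_hint() is a hypothetical helper, and its messages only restate the conditions documented in this section, falling back to strerror(3) for any other error.

```c
#include <errno.h>
#include <string.h>

/* Map an errno value from a failed perf_event_open() call to a hint
   (hypothetical helper; covers only the errors documented above). */
const char *perf_open_hint(int err)
{
    switch (err) {
    case EINVAL:
        return "invalid argument: check flags and the sampling frequency";
    case EACCES:
        return "permission denied: check "
               "/proc/sys/kernel/perf_event_paranoid or CAP_SYS_ADMIN";
    default:
        return strerror(err);   /* generic description for other errors */
    }
}
```

After a call that returns -1, the hint can be printed with, for example, fprintf(stderr, "%s\n", perf_open_hint(errno)).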
VERSIONS
perf_event_open() appeared on Linux in kernel 2.6.32.
CONFORMING TO
This system call is Linux-specific, and should be avoided in portable
programs.
NOTES
This system call is specific to the x86 architecture.
SEE ALSO
read(2), poll(2), fcntl(2).
Linux 2011-07-19 PERF_EVENT_OPEN(2)