.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007,2023 The FreeBSD Foundation
.\"
.\" Portions of this software were developed by A. Joseph Koshy under
.\" sponsorship from the FreeBSD Foundation and Google, Inc.
.\"
.\" Portions of this documentation were written by Mitchell Horne
.\" under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Sh NAME
.Nm hwpmc
.Nd "Hardware Performance Monitoring Counter support"
.Sh SYNOPSIS
The following option must be present in the kernel configuration file:
.Bd -ragged -offset indent
.Cd "options HWPMC_HOOKS"
.Ed
.Pp
Additionally, for i386 systems:
.Bd -ragged -offset indent
.Cd "device apic"
.Ed
.Pp
To load the driver as a module at boot time:
.Bd -literal -offset indent
.Ed
.Pp
Alternatively, to compile the driver into the kernel:
.Bd -ragged -offset indent
.Cd "device hwpmc"
.Ed
.Pp
To enable debugging features:
.Bd -ragged -offset indent
.Cd "options KTR"
.Cd "options KTR_COMPILE=(KTR_SUBSYS)"
.Cd "options KTR_MASK=(KTR_SUBSYS)"
.Cd "options HWPMC_DEBUG"
.Ed
.Sh DESCRIPTION
The
.Nm
driver virtualizes the hardware performance monitoring facilities in
modern CPUs and provides support for using these facilities from
user level processes.
.Pp
The driver supports multi-processor systems.
.Pp
PMCs are allocated using the
.Dv PMC_OP_PMCALLOCATE
request.
A successful
.Dv PMC_OP_PMCALLOCATE
request will return a handle to the requesting process.
Subsequent operations on the allocated PMC use this handle to denote
the specific PMC.
A process that has successfully allocated a PMC is termed an
.Dq "owner process" .
.Pp
PMCs may be allocated with process or system scope.
.Bl -tag -width ".Em Process-scope"
.It Em "Process-scope"
The PMC is active only when a thread belonging
to a process it is attached to is scheduled on a CPU.
.It Em "System-scope"
The PMC operates independently of processes and
measures hardware events for the system as a whole.
.El
.Pp
PMCs may be allocated for counting or for sampling:
.Bl -tag -width ".Em Counting"
.It Em Counting
In counting modes, the PMCs count hardware events.
These counts are retrievable using the
.Dv PMC_OP_PMCRW
request on all architectures.
Some architectures offer faster methods of reading these counts.
.It Em Sampling
In sampling modes, the PMCs are configured to sample the CPU
instruction pointer (and optionally to capture the call chain leading
up to the sampled instruction pointer) after a configurable number of
hardware events have been observed.
Instruction pointer samples and call chain records are usually
directed to a log file for subsequent analysis.
.El
.Pp
Scope and operational mode are orthogonal; a PMC may thus be
configured to operate in one of the following four modes:
.Bl -tag -width indent
.It Process-scope, counting
These PMCs count hardware events whenever a thread in their attached process is
scheduled on a CPU.
These PMCs normally count from zero, but the initial count may be
set using the
.Dv PMC_OP_PMCSETCOUNT
operation.
Applications can read the value of the PMC at any time using the
.Dv PMC_OP_PMCRW
operation.
.It Process-scope, sampling
These PMCs sample the target process's instruction pointer after they
have seen the configured number of hardware events.
The PMCs only count events when a thread belonging to their attached
process is active.
The desired frequency of sampling is set using the
.Dv PMC_OP_PMCSETCOUNT
operation prior to starting the PMC.
Log files are configured using the
.Dv PMC_OP_CONFIGURELOG
operation.
.It System-scope, counting
These PMCs count hardware events seen by them independent of the
processes that are executing.
The current count on these PMCs can be read using the
.Dv PMC_OP_PMCRW
request.
These PMCs normally count from zero, but the initial count may be
set using the
.Dv PMC_OP_PMCSETCOUNT
operation.
.It System-scope, sampling
These PMCs will periodically sample the instruction pointer of the CPU
they are allocated on, and will write the sample to a log for further
processing.
The desired frequency of sampling is set using the
.Dv PMC_OP_PMCSETCOUNT
operation prior to starting the PMC.
Log files are configured using the
.Dv PMC_OP_CONFIGURELOG
operation.
.Pp
System-wide statistical sampling can only be enabled by a process with
super-user privileges.
.El
.Pp
Processes are allowed to allocate as many PMCs as the hardware and
current operating conditions permit.
Processes may mix allocations of system-wide and process-private
PMCs.
Multiple processes may be using PMCs simultaneously.
.Pp
Allocated PMCs are started using the
.Dv PMC_OP_PMCSTART
operation, and stopped using the
.Dv PMC_OP_PMCSTOP
operation.
Stopping and starting a PMC is permitted at any time the owner process
has a valid handle to the PMC.
.Pp
Process-private PMCs need to be attached to a target process before
they can be used.
Attaching a process to a PMC is done using the
.Dv PMC_OP_PMCATTACH
operation.
An already attached PMC may be detached from its target process
using the converse
.Dv PMC_OP_PMCDETACH
operation.
Issuing a
.Dv PMC_OP_PMCSTART
operation on an as yet unattached PMC will cause it to be attached
to its owner process.
The following rules determine whether a given process may attach
a PMC to another target process:
.Bl -bullet
.It
A non-jailed process with super-user privileges is allowed to attach
to any other process in the system.
.It
Other processes are only allowed to attach to targets that they would
be able to attach to for debugging (as determined by
.Xr p_candebug 9 ) .
.El
.Pp
PMCs are released using
.Dv PMC_OP_PMCRELEASE .
After a successful
.Dv PMC_OP_PMCRELEASE
operation the handle to the PMC will become invalid.
.Pp
The
.Dv PMC_OP_PMCALLOCATE
operation supports the following flags that modify the behavior
of an allocated PMC:
.Bl -tag -width indent
.It Dv PMC_F_CALLCHAIN
This modifier informs sampling PMCs to record a call chain when
capturing a sample.
The maximum depth to which call chains are recorded is specified
by the
.Va "kern.hwpmc.callchaindepth"
kernel tunable.
.It Dv PMC_F_DESCENDANTS
This modifier is valid only for a PMC being allocated in process-private
mode.
It signifies that the PMC will track hardware events for its
target process and the target's current and future descendants.
.It Dv PMC_F_LOG_PROCCSW
This modifier is valid only for a PMC being allocated in process-private
mode.
When this modifier is present, at every context switch,
.Nm
will log a record containing the number of hardware events
seen by the target process when it was scheduled on the CPU.
.It Dv PMC_F_LOG_PROCEXIT
This modifier is valid only for a PMC being allocated in process-private
mode.
With this modifier present,
.Nm
will maintain per-process counts for each target process attached to
a PMC.
At process exit time, a record containing the target process's PID and
the accumulated per-process count for that process will be written to the
configured log file.
.El
.Pp
Modifiers
.Dv PMC_F_LOG_PROCEXIT
and
.Dv PMC_F_LOG_PROCCSW
may be used in combination with modifier
.Dv PMC_F_DESCENDANTS
to track the behavior of complex pipelines of processes.
PMCs with modifiers
.Dv PMC_F_LOG_PROCEXIT
and
.Dv PMC_F_LOG_PROCCSW
cannot be started until their owner process has configured a log file.
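.Pp
The constraints on the modifier flags described above can be modelled compactly.
The following Python sketch is illustrative only and is not part of the driver:
the numeric flag values, the scope strings, and the helper names
check_allocate and can_start are hypothetical and were chosen for this example.
.Bd -literal -offset indent
```python
from enum import Flag, auto


class PmcFlag(Flag):
    """Allocation modifiers; the numeric values are placeholders."""
    CALLCHAIN = auto()
    DESCENDANTS = auto()
    LOG_PROCCSW = auto()
    LOG_PROCEXIT = auto()


# Modifiers documented as valid only for process-private PMCs.
PROCESS_ONLY = (PmcFlag.DESCENDANTS | PmcFlag.LOG_PROCCSW
                | PmcFlag.LOG_PROCEXIT)
# Modifiers that need a configured log file before the PMC can start.
NEEDS_LOG = PmcFlag.LOG_PROCCSW | PmcFlag.LOG_PROCEXIT


def check_allocate(scope, flags):
    """Return True if the flags are acceptable for the given scope."""
    if scope not in ("process", "system"):
        raise ValueError("unknown scope: " + scope)
    if scope == "system" and flags & PROCESS_ONLY:
        return False
    return True


def can_start(flags, log_configured):
    """Return True if a PMC with these flags may be started."""
    if flags & NEEDS_LOG:
        return log_configured
    return True
```
.Ed
The sketch covers only the flag rules stated in this section; other reasons
for which an allocation or start request may be rejected are not modelled.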
.Pp
The
.Nm
driver may deliver signals to processes that have allocated PMCs:
.Bl -tag -width ".Dv SIGBUS"
.It Dv SIGIO
A
.Dv PMC_OP_PMCRW
operation was attempted on a process-private PMC that does not have
attached target processes.
.It Dv SIGBUS
The
.Nm
driver is being unloaded from the kernel.
.El
.Ss PMC ROW DISPOSITIONS
A PMC row is defined as the set of PMC resources at the same hardware
address in the CPUs in a system.
Since process scope PMCs need to move between CPUs following their
target threads, allocation of a process scope PMC reserves all PMCs in
a PMC row for use only with process scope PMCs.
Accordingly a PMC row will be in one of the following dispositions:
.Bl -tag -width ".Dv PMC_DISP_STANDALONE" -compact
.It Dv PMC_DISP_FREE
Hardware counters in this row are free and may be used to satisfy
either system scope or process scope allocation requests.
.It Dv PMC_DISP_THREAD
Hardware counters in this row are in use by process scope PMCs
and are only available for process scope allocation requests.
.It Dv PMC_DISP_STANDALONE
Some hardware counters in this row have been administratively
disabled or are in use by system scope PMCs.
Non-disabled hardware counters in such a row may be used
for satisfying system scope allocation requests.
No process scope PMCs will use hardware counters in this row.
.El
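.Pp
The disposition rules above amount to a small state machine.
The following Python sketch is a simplified model, not the driver's
implementation: it tracks a single row, ignores release and administrative
disabling, and the function name allocate and the string constants are
hypothetical.
.Bd -literal -offset indent
```python
FREE, THREAD, STANDALONE = "FREE", "THREAD", "STANDALONE"


def allocate(disposition, scope):
    """Return the row's new disposition, or None if the request fails."""
    if scope == "process":
        # A process scope PMC reserves the row on every CPU.
        return THREAD if disposition in (FREE, THREAD) else None
    if scope == "system":
        # A system scope PMC needs a row not reserved for threads.
        return STANDALONE if disposition in (FREE, STANDALONE) else None
    raise ValueError("unknown scope: " + scope)
```
.Ed
For example, once a row has moved from FREE to THREAD, a later system scope
request against that row fails in this model, which is why mixing both kinds
of allocation reduces the counters available to each.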
.Sh COMPATIBILITY
The API and ABI documented in this manual page may change in the future.
This interface is intended to be consumed by the
.Xr pmc 3
library; other consumers are unsupported.
Applications targeting PMCs should use the
.Xr pmc 3
library API.
.Sh PROGRAMMING API
The
.Nm
driver operates using a system call number that is dynamically
allotted to it when it is loaded into the kernel.
.Pp
The
.Nm
driver supports the following operations:
.Bl -tag -width indent
.It Dv PMC_OP_CONFIGURELOG
Configure a log file for PMCs that require a log file.
The
.Nm
driver will write log data to this file asynchronously.
If it encounters an error, logging will be stopped and the error code
encountered will be saved for subsequent retrieval by a
.Dv PMC_OP_FLUSHLOG
request.
.It Dv PMC_OP_FLUSHLOG
Transfer buffered log data inside
.Nm
to a configured output file.
This operation returns to the caller after the write operation
has returned.
The returned error code reflects any pending error state inside
.Nm .
.It Dv PMC_OP_GETCPUINFO
Retrieve information about the highest possible CPU number for the system,
and the number of hardware performance monitoring counters available per CPU.
.It Dv PMC_OP_GETDRIVERSTATS
Retrieve module statistics (for analyzing the behavior of
.Nm
itself).
.It Dv PMC_OP_GETMODULEVERSION
Retrieve the version number of the API.
.It Dv PMC_OP_GETPMCINFO
Retrieve information about the current state of the PMCs on a
given CPU.
.It Dv PMC_OP_PMCADMIN
Set the administrative state (i.e., whether enabled or disabled) for
the hardware PMCs managed by the
.Nm
driver.
The invoking process needs to possess the
.Dv PRIV_PMC_MANAGE
privilege.
.It Dv PMC_OP_PMCALLOCATE
Allocate and configure a PMC.
On successful allocation, a handle to the PMC (a 32-bit value)
is returned.
.It Dv PMC_OP_PMCATTACH
Attach a process mode PMC to a target process.
The PMC will be active whenever a thread in the target process is
scheduled on a CPU.
.Pp
If the
.Dv PMC_F_DESCENDANTS
flag had been specified at PMC allocation time, then the PMC is
attached to all current and future descendants of the target process.
.It Dv PMC_OP_PMCDETACH
Detach a PMC from its target process.
.It Dv PMC_OP_PMCRELEASE
Release a PMC.
.It Dv PMC_OP_PMCRW
Read and write a PMC.
This operation is valid only for PMCs configured in counting modes.
.It Dv PMC_OP_PMCSETCOUNT
Set the initial count (for counting mode PMCs) or the desired sampling
rate (for sampling mode PMCs).
.It Dv PMC_OP_PMCSTART
Start a PMC.
.It Dv PMC_OP_PMCSTOP
Stop a PMC.
.It Dv PMC_OP_WRITELOG
Insert a timestamped user record into the log file.
.El
.Ss i386 Specific API
Some i386 family CPUs support the RDPMC instruction which allows a
user process to read a PMC value without needing to invoke a
.Dv PMC_OP_PMCRW
operation.
On such CPUs, the machine address associated with an allocated PMC is
retrievable using the
.Dv PMC_OP_PMCX86GETMSR
system call.
.Bl -tag -width indent
.It Dv PMC_OP_PMCX86GETMSR
Retrieve the MSR (machine specific register) number associated with
the given PMC handle.
.Pp
The PMC needs to be in process-private mode and allocated without the
.Dv PMC_F_DESCENDANTS
modifier flag, and should be attached only to its owner process at the
time of the call.
.El
.Ss amd64 Specific API
AMD64 CPUs support the RDPMC instruction which allows a
user process to read a PMC value without needing to invoke a
.Dv PMC_OP_PMCRW
operation.
The machine address associated with an allocated PMC is
retrievable using the
.Dv PMC_OP_PMCX86GETMSR
system call.
.Bl -tag -width indent
.It Dv PMC_OP_PMCX86GETMSR
Retrieve the MSR (machine specific register) number associated with
the given PMC handle.
.Pp
The PMC needs to be in process-private mode and allocated without the
.Dv PMC_F_DESCENDANTS
modifier flag, and should be attached only to its owner process at the
time of the call.
.El
.Sh SYSCTL VARIABLES AND LOADER TUNABLES
The behavior of
.Nm
is influenced by the following
.Xr sysctl 8
and
.Xr loader 8
tunables:
.Bl -tag -width indent
.It Va kern.hwpmc.callchaindepth Pq integer, read-only
The maximum number of call chain records to capture per sample.
.It Va kern.hwpmc.debugflags Pq string, read-write
(Only available if the
.Nm
driver was compiled with
.Cd "options HWPMC_DEBUG" . )
Control the verbosity of debug messages from the
.Nm
driver.
.It Va kern.hwpmc.hashsize Pq integer, read-only
The number of rows in the hash tables used to keep track of owner and
target processes.
.It Va kern.hwpmc.logbuffersize Pq integer, read-only
The size in kilobytes of each log buffer used by
.Nm Ns 's
logging function.
The default buffer size is 4KB.
.It Va kern.hwpmc.mincount Pq integer, read-write
The minimum sampling rate for sampling mode PMCs.
The default count is 1000 events.
.It Va kern.hwpmc.mtxpoolsize Pq integer, read-only
The size of the spin mutex pool used by the PMC driver.
.It Va kern.hwpmc.nbuffers_pcpu Pq integer, read-only
The number of log buffers used by
.Nm
for logging.
.It Va kern.hwpmc.nsamples Pq integer, read-only
The number of entries in the per-CPU ring buffer used during sampling.
.It Va security.bsd.unprivileged_syspmcs Pq boolean, read-write
If set to non-zero, allow unprivileged processes to allocate system-wide
PMCs.
The default value is 0.
.It Va security.bsd.unprivileged_proc_debug Pq boolean, read-write
If set to 0, the
.Nm
driver will only allow privileged processes to attach PMCs to other
processes.
.El
.Pp
These variables may be set in the kernel environment using
.Xr kenv 1
before
.Nm
is loaded.
.Sh IMPLEMENTATION NOTES
.Ss SMP Symmetry
The kernel driver requires all physical CPUs in an SMP system to have
identical performance monitoring counter hardware.
.Ss Sparse CPU Numbering
On platforms that sparsely number CPUs and which support hot-plugging
of CPUs, requests that specify non-existent or disabled CPUs will fail
with an error.
Applications allocating system-scope PMCs need to be aware of
the possibility of such transient failures.
.Ss x86 TSC Handling
Historically, on the x86 architecture,
.Fx
has permitted user processes running at a processor CPL of 3 to
read the TSC using the RDTSC instruction.
The
.Nm
driver preserves this behavior.
.Ss Intel P4/HTT Handling
On CPUs with HTT support, Intel P4 PMCs are capable of qualifying
only a subset of hardware events on a per-logical CPU basis.
Consequently, if HTT is enabled on a system with Intel Pentium P4
PMCs, then the
.Nm
driver will reject allocation requests for process-private PMCs that
request counting of hardware events that cannot be counted separately
for each logical CPU.
.Sh DIAGNOSTICS
.Bl -diag
.It "hwpmc: [class/npmc/capabilities]..."
Announce the presence of
.Va npmc
PMCs of class
.Va class ,
with capabilities described by bit string
.Va capabilities .
.It "hwpmc: kernel version (0x%x) does not match module version (0x%x)."
The module loading process failed because a version mismatch was detected
between the currently executing kernel and the module being loaded.
.It "hwpmc: this kernel has not been compiled with 'options HWPMC_HOOKS'."
The module loading process failed because the currently executing kernel
was not configured with the required configuration option
.Dv HWPMC_HOOKS .
.It "hwpmc: tunable hashsize=%d must be greater than zero."
A zero or negative value was supplied for tunable
.Va kern.hwpmc.hashsize .
.It "hwpmc: tunable logbuffersize=%d must be greater than zero."
A zero or negative value was supplied for tunable
.Va kern.hwpmc.logbuffersize .
.It "hwpmc: tunable nlogbuffers=%d must be greater than zero."
A zero or negative value was supplied for tunable
.Va kern.hwpmc.nlogbuffers .
.It "hwpmc: tunable nsamples=%d out of range."
The value for tunable
.Va kern.hwpmc.nsamples
was negative or greater than 65535.
.El
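.Pp
The tunable checks behind the last four diagnostics can be sketched as follows.
This Python fragment is an illustration derived from the message texts above,
not the driver's code: the function name check_tunables is hypothetical, the
"less than or equal to zero" test is an assumption read off the wording
"must be greater than zero", and what the driver does after printing a
message is not modelled.
.Bd -literal -offset indent
```python
def check_tunables(hashsize, logbuffersize, nlogbuffers, nsamples):
    """Return the list of diagnostics the given values would trigger."""
    problems = []
    for name, value in (("hashsize", hashsize),
                        ("logbuffersize", logbuffersize),
                        ("nlogbuffers", nlogbuffers)):
        # Assumed check: anything not strictly positive is rejected.
        if value <= 0:
            problems.append("hwpmc: tunable %s=%d must be greater than zero."
                            % (name, value))
    # Documented range for nsamples: 0 through 65535 inclusive.
    if nsamples < 0 or nsamples > 65535:
        problems.append("hwpmc: tunable nsamples=%d out of range." % nsamples)
    return problems
```
.Ed
Calling the function with arbitrary in-range values returns an empty list;
an out-of-range value yields the corresponding message text.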
.Sh DEBUGGING
The
.Nm
module can be configured to record trace entries using the
.Xr ktr 4
interface.
This is useful for debugging the driver's functionality, primarily during
development.
This debugging functionality is not enabled by default, and requires
recompiling the kernel and
.Nm
module after adding the following to the kernel config:
.Bd -literal -offset indent
.Cd options KTR
.Cd options KTR_COMPILE=(KTR_SUBSYS)
.Cd options KTR_MASK=(KTR_SUBSYS)
.Cd options HWPMC_DEBUG
.Ed
.Pp
This alone is not enough to enable tracing; one must also configure the
.Va kern.hwpmc.debugflags
.Xr sysctl 8
variable, which provides fine-grained control over which types of events are
logged to the trace buffer.
.Pp
.Nm
trace events are grouped by 'major' and 'minor' flag types.
The major flag names are as follows:
.Pp
.Bl -tag -width "sampling" -compact -offset indent
.It csw
Context switch events
.It md
Machine-dependent/class-dependent events
.It pmc
PMC management events
.El
.Pp
The minor flags for each major flag group can vary.
The individual minor flag names are:
.Bd -ragged -offset indent
.Ed
.Pp
The
.Va kern.hwpmc.debugflags
variable is a string with a custom format.
The string should contain a space-separated list of event specifiers.
Each event specifier consists of the major flag name, followed by an equal sign
(=), followed by a comma-separated list of minor event types.
To track all events for a major group, an asterisk (*) can be given instead of
minor event names.
.Pp
For example, to trace all allocation and release events, set
.Va debugflags
as follows:
.Bd -literal -offset indent
kern.hwpmc.debugflags="pmc=allocate,release md=allocate,release"
.Ed
.Pp
To trace all events in the process and context switch major flag groups:
.Bd -literal -offset indent
kern.hwpmc.debugflags="process=* csw=*"
.Ed
.Pp
To disable all trace events, set the variable to an empty string.
.Bd -literal -offset indent
kern.hwpmc.debugflags=""
.Ed
.Pp
Trace events are recorded by
.Xr ktr 4 ,
and can be inspected at run-time using the
.Xr ktrdump 8
utility, or at the
.Xr ddb 4
prompt after a panic with the 'show ktr' command.
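.Pp
The string format of
.Va kern.hwpmc.debugflags
described above can be checked or generated by a small helper before it is
handed to
.Xr sysctl 8 .
The following Python sketch is purely syntactic and is an assumption about
the format as documented here, not a copy of the kernel's parser: it does
not know which major or minor names the driver accepts, and the function
names parse_debugflags and format_debugflags are hypothetical.
.Bd -literal -offset indent
```python
def parse_debugflags(value):
    """Parse 'major=minor,minor major=*' into {major: minors or '*'}."""
    result = {}
    for spec in value.split():
        major, sep, minors = spec.partition("=")
        if not sep or not major or not minors:
            raise ValueError("bad event specifier: " + spec)
        if minors == "*":
            # An asterisk selects every minor event in the group.
            result[major] = "*"
        else:
            names = minors.split(",")
            if "" in names:
                raise ValueError("empty minor flag in: " + spec)
            result[major] = names
    return result


def format_debugflags(flags):
    """Inverse of parse_debugflags: build the sysctl string."""
    parts = []
    for major, minors in flags.items():
        body = "*" if minors == "*" else ",".join(minors)
        parts.append(major + "=" + body)
    return " ".join(parts)
```
.Ed
An empty string parses to an empty mapping, matching the documented way of
disabling all trace events.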
A command issued to the
driver may fail with the following errors:
Helper process creation failed for a
.Dv PMC_OP_CONFIGURELOG
request due to a temporary resource shortage in the kernel.
.Dv PMC_OP_CONFIGURELOG
operation was requested while an existing log was active.
A DISABLE operation was requested using the
request for a set of hardware resources currently in use for
process-private PMCs.
operation was requested on an active system mode PMC.
operation was requested for a target process that already had another
PMC using the same hardware resources attached to it.
request writing a new value was issued on a PMC that was active.
.Dv PMC_OP_PMCSETCOUNT
request was issued on a PMC that was active.
operation was requested without a log file being configured for a
.Dv PMC_F_LOG_PROCCSW
.Dv PMC_F_LOG_PROCEXIT
operation was requested on a system-wide sampling PMC without a log
file being configured.
request was reissued for a target process that already is the target
A bad address was passed in to the driver.
An invalid PMC handle was specified.
An invalid CPU number was passed in for a
.Dv PMC_OP_GETPMCINFO
.Dv PMC_OP_CONFIGURELOG
request contained unknown flags.
.Dv PMC_OP_CONFIGURELOG
request to de-configure a log file was issued without a log file
request was issued without a log file being configured.
An invalid CPU number was passed in for a
An invalid operation request was passed in for a
An invalid PMC ID was passed in for a
A suitable PMC matching the parameters passed in to a
.Dv PMC_OP_PMCALLOCATE
request could not be allocated.
An invalid PMC mode was requested during a
.Dv PMC_OP_PMCALLOCATE
An invalid CPU number was specified during a
.Dv PMC_OP_PMCALLOCATE
.Dv PMC_OP_PMCALLOCATE
request for a process-private PMC.
.Dv PMC_OP_PMCALLOCATE
request for a system-wide PMC.
.Dv PMC_OP_PMCALLOCATE
request contained unknown flags.
(On Intel Pentium 4 CPUs with HTT support)
.Dv PMC_OP_PMCALLOCATE
request for a process-private PMC was issued for an event that does
not support counting on a per-logical CPU basis.
A PMC allocated for system-wide operation was specified with a
request specified an illegal process ID.
request was issued for a PMC not attached to the target process.
request contained illegal flags.
.Dv PMC_OP_PMCX86GETMSR
operation was requested for a PMC not in process-virtual mode, or
for a PMC that is not solely attached to its owner process, or for
a PMC that was allocated with flag
.Dv PMC_F_DESCENDANTS .
request was issued for an owner process without a log file
The system was not able to allocate kernel memory.
(On i386 and amd64 architectures)
.Dv PMC_OP_PMCX86GETMSR
operation was requested for hardware that does not support reading
PMCs directly with the RDPMC instruction.
.Dv PMC_OP_GETPMCINFO
operation was requested for an absent or disabled CPU.
.Dv PMC_OP_PMCALLOCATE
operation specified allocation of a system-wide PMC on an absent or
request was issued for a system-wide PMC that was allocated on a CPU
that is currently absent or disabled.
.Dv PMC_OP_PMCALLOCATE
request was issued for PMC capabilities not supported
by the specified PMC class.
A sampling mode PMC was requested on a CPU lacking an APIC.
request was issued by a process without super-user
privilege or by a jailed super-user process.
operation was issued for a target process that the current process
does not have permission to attach to.
(i386 and amd64 architectures)
operation was issued on a PMC whose MSR has been retrieved using
.Dv PMC_OP_PMCX86GETMSR .
A process issued a PMC operation request without having allocated any
A process issued a PMC operation request after the PMC was detached
from all of its target processes.
request specified a non-existent process ID.
The target process for a
operation is not being monitored by
.Sh HISTORY
The
.Nm
driver first appeared in
.Sh AUTHORS
The
.Nm
driver was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .
.Sh BUGS
The driver samples the state of the kernel's logical processor support
at the time of initialization (i.e., at module load time).
On CPUs supporting logical processors, the driver could misbehave if
logical processors are subsequently enabled or disabled while the
driver is active.
.Pp
On the i386 architecture, the driver requires that the local APIC on the
CPU be enabled for sampling mode to be supported.
Many single-processor motherboards keep the APIC disabled in BIOS; on
such systems
.Nm
will not support sampling PMCs.
.Sh SECURITY CONSIDERATIONS
PMCs may be used to monitor the actual behavior of the system on hardware.
In situations where this constitutes an undesirable information leak,
the following options are available:
.Bl -enum
.It
Set the
.Xr sysctl 8
tunable
.Va security.bsd.unprivileged_syspmcs
to 0.
This ensures that unprivileged processes cannot allocate system-wide
PMCs and thus cannot observe the hardware behavior of the system
as a whole.
This tunable may also be set at boot time using
.Xr loader 8 ,
or with
.Xr kenv 1
prior to loading the
.Nm
driver into the kernel.
.It
Set the
.Xr sysctl 8
tunable
.Va security.bsd.unprivileged_proc_debug
to 0.
This will ensure that an unprivileged process cannot attach a PMC
to any process other than itself and thus cannot observe the hardware
behavior of other processes with the same credentials.
.El
.Pp
System administrators should note that on IA-32 platforms
.Fx
makes the content of the IA-32 TSC counter available to all processes
via the RDTSC instruction.