.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\" Portions of this software were developed by A. Joseph Koshy under
.\" sponsorship from the FreeBSD Foundation and Google, Inc.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.Nd "Hardware Performance Monitoring Counter support"
.Cd "options HWPMC_HOOKS"
Additionally, for i386 systems:
driver virtualizes the hardware performance monitoring facilities in
modern CPUs and provides support for using these facilities from
The driver supports multi-processor systems.
PMCs are allocated using the
.Dv PMC_OP_PMCALLOCATE
.Dv PMC_OP_PMCALLOCATE
request will return a handle to the requesting process.
Subsequent operations on the allocated PMC use this handle to denote
A process that has successfully allocated a PMC is termed an
PMCs may be allocated with process or system scope.
.Bl -tag -width ".Em Process-scope"
.It Em "Process-scope"
The PMC is active only when a thread belonging
to a process it is attached to is scheduled on a CPU.
The PMC operates independently of processes and
measures hardware events for the system as a whole.
PMCs may be allocated for counting or for sampling:
.Bl -tag -width ".Em Counting"
In counting modes, the PMCs count hardware events.
These counts are retrievable using the
system call on all architectures.
Some architectures offer faster methods of reading these counts.
In sampling modes, the PMCs are configured to sample the CPU
instruction pointer (and optionally to capture the call chain leading
up to the sampled instruction pointer) after a configurable number of
hardware events have been observed.
Instruction pointer samples and call chain records are usually
directed to a log file for subsequent analysis.
Scope and operational mode are orthogonal; a PMC may thus be
configured to operate in one of the following four modes:
.Bl -tag -width indent
.It Process-scope, counting
These PMCs count hardware events whenever a thread in their attached process is
These PMCs normally count from zero, but the initial count may be
Applications can read the value of the PMC at any time using the
.It Process-scope, sampling
These PMCs sample the target process's instruction pointer after they
have seen the configured number of hardware events.
The PMCs only count events when a thread belonging to their attached
The desired frequency of sampling is set using the
operation prior to starting the PMC.
Log files are configured using the
.Dv PMC_OP_CONFIGURELOG
.It System-scope, counting
These PMCs count hardware events seen by them independently of the
processes that are executing.
The current count on these PMCs can be read using the
These PMCs normally count from zero, but the initial count may be
.It System-scope, sampling
These PMCs will periodically sample the instruction pointer of the CPU
they are allocated on, and will write the sample to a log for further
The desired frequency of sampling is set using the
operation prior to starting the PMC.
Log files are configured using the
.Dv PMC_OP_CONFIGURELOG
System-wide statistical sampling can only be enabled by a process with
super-user privileges.
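.Pp
Because scope and operational mode are independent, the four modes listed
above are simply the cross product of two two-valued choices.
The following C sketch illustrates that idea only; the enum and function
names are hypothetical and are not taken from the driver's headers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model types; not the driver's own definitions. */
enum pmc_scope { SCOPE_PROCESS, SCOPE_SYSTEM };
enum pmc_op    { OP_COUNTING, OP_SAMPLING };

/*
 * Return a human-readable label for a (scope, operational mode) pair.
 * The table is indexed first by scope and then by operational mode,
 * which makes the orthogonality of the two choices explicit.
 */
const char *
pmc_mode_label(enum pmc_scope scope, enum pmc_op op)
{
	static const char *const labels[2][2] = {
		{ "process-scope, counting", "process-scope, sampling" },
		{ "system-scope, counting",  "system-scope, sampling"  },
	};

	assert(scope == SCOPE_PROCESS || scope == SCOPE_SYSTEM);
	assert(op == OP_COUNTING || op == OP_SAMPLING);
	return (labels[scope][op]);
}
```

A caller would pass one value of each enum and receive the name of the
corresponding entry in the list above.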
Processes are allowed to allocate as many PMCs as the hardware and
current operating conditions permit.
Processes may mix allocations of system-wide and process-private
Multiple processes may be using PMCs simultaneously.
Allocated PMCs are started using the
operation, and stopped using the
Stopping and starting a PMC is permitted at any time the owner process
has a valid handle to the PMC.
Process-private PMCs need to be attached to a target process before
Attaching a process to a PMC is done using the
An already attached PMC may be detached from its target process
operation on an as yet unattached PMC will cause it to be attached
to its owner process.
The following rules determine whether a given process may attach
a PMC to another target process:
A non-jailed process with super-user privileges is allowed to attach
to any other process in the system.
Other processes are only allowed to attach to targets that they would
be able to attach to for debugging (as determined by
PMCs are released using
.Dv PMC_OP_PMCRELEASE .
.Dv PMC_OP_PMCRELEASE
operation the handle to the PMC will become invalid.
.Dv PMC_OP_PMCALLOCATE
operation supports the following flags that modify the behavior
.Bl -tag -width indent
.It Dv PMC_F_CALLCHAIN
This modifier instructs sampling PMCs to record a call chain when
The maximum depth to which call chains are recorded is specified
.Va "kern.hwpmc.callchaindepth"
.It Dv PMC_F_DESCENDANTS
This modifier is valid only for a PMC being allocated in process-private
It signifies that the PMC will track hardware events for its
target process and the target's current and future descendants.
This modifier is valid only for a PMC being allocated in system-wide
It signifies that the PMC's sampling interrupt is to be used to drive
This functionality is currently unimplemented.
.It Dv PMC_F_LOG_PROCCSW
This modifier is valid only for a PMC being allocated in process-private
When this modifier is present, at every context switch,
will log a record containing the number of hardware events
seen by the target process when it was scheduled on the CPU.
.It Dv PMC_F_LOG_PROCEXIT
This modifier is valid only for a PMC being allocated in process-private
With this modifier present,
will maintain per-process counts for each target process attached to
At process exit time, a record containing the target process's PID and
the accumulated per-process count for that process will be written to the
.Dv PMC_F_LOG_PROCEXIT
.Dv PMC_F_LOG_PROCCSW
may be used in combination with modifier
.Dv PMC_F_DESCENDANTS
to track the behavior of complex pipelines of processes.
.Dv PMC_F_LOG_PROCEXIT
.Dv PMC_F_LOG_PROCCSW
cannot be started until their owner process has configured a log file.
driver may deliver signals to processes that have allocated PMCs:
.Bl -tag -width ".Dv SIGBUS"
operation was attempted on a process-private PMC that does not have
attached target processes.
driver is being unloaded from the kernel.
.Ss PMC ROW DISPOSITIONS
A PMC row is defined as the set of PMC resources at the same hardware
address in the CPUs in a system.
Since process scope PMCs need to move between CPUs following their
target threads, allocation of a process scope PMC reserves all PMCs in
a PMC row for use only with process scope PMCs.
Accordingly, a PMC row will be in one of the following dispositions:
.Bl -tag -width ".Dv PMC_DISP_STANDALONE" -compact
Hardware counters in this row are free and may be used to satisfy
either system scope or process scope allocation requests.
.It Dv PMC_DISP_THREAD
Hardware counters in this row are in use by process scope PMCs
and are only available for process scope allocation requests.
.It Dv PMC_DISP_STANDALONE
Some hardware counters in this row have been administratively
disabled or are in use by system scope PMCs.
Non-disabled hardware counters in such a row may be used
for satisfying system scope allocation requests.
No process scope PMCs will use hardware counters in this row.
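.Pp
The disposition rules above amount to a small compatibility table between
a row's current disposition and the scope of an incoming allocation request.
The C sketch below models only that table.
All names in it are hypothetical, and it deliberately ignores the
per-counter administrative state (a disabled counter inside a row is
not modelled).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three row dispositions described above. */
enum row_disp  { ROW_FREE, ROW_THREAD, ROW_STANDALONE };
/* Scope of an allocation request. */
enum req_scope { REQ_PROCESS, REQ_SYSTEM };

/* Can a row in disposition 'd' satisfy a request of scope 's'? */
bool
row_can_satisfy(enum row_disp d, enum req_scope s)
{
	switch (d) {
	case ROW_FREE:		/* free rows accept either scope */
		return (true);
	case ROW_THREAD:	/* reserved for process scope PMCs */
		return (s == REQ_PROCESS);
	case ROW_STANDALONE:	/* only system scope requests */
		return (s == REQ_SYSTEM);
	}
	return (false);
}

/* Disposition of a row after a successful allocation of scope 's'. */
enum row_disp
row_after_alloc(enum row_disp d, enum req_scope s)
{
	assert(row_can_satisfy(d, s));
	return (s == REQ_PROCESS ? ROW_THREAD : ROW_STANDALONE);
}
```

In this model a free row takes on the disposition of the first request it
satisfies, and a request of the other scope is refused from then on.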
The recommended way for application programs to use the facilities of
driver is using the API provided by the
driver operates using a system call number that is dynamically
allotted to it when it is loaded into the kernel.
driver supports the following operations:
.Bl -tag -width indent
.It Dv PMC_OP_CONFIGURELOG
Configure a log file for PMCs that require a log file.
driver will write log data to this file asynchronously.
If it encounters an error, logging will be stopped and the error code
encountered will be saved for subsequent retrieval by a
.It Dv PMC_OP_FLUSHLOG
Transfer buffered log data inside
to a configured output file.
This operation returns to the caller after the write operation
The returned error code reflects any pending error state inside
.It Dv PMC_OP_GETCPUINFO
Retrieve information about the highest possible CPU number for the system,
and the number of hardware performance monitoring counters available per CPU.
.It Dv PMC_OP_GETDRIVERSTATS
Retrieve module statistics (for analyzing the behavior of
.It Dv PMC_OP_GETMODULEVERSION
Retrieve the version number of the API.
.It Dv PMC_OP_GETPMCINFO
Retrieve information about the current state of the PMCs on a
.It Dv PMC_OP_PMCADMIN
Set the administrative state (i.e., whether enabled or disabled) for
the hardware PMCs managed by the
The invoking process needs to possess the
.It Dv PMC_OP_PMCALLOCATE
Allocate and configure a PMC.
On successful allocation, a handle to the PMC (a 32-bit value)
.It Dv PMC_OP_PMCATTACH
Attach a process mode PMC to a target process.
The PMC will be active whenever a thread in the target process is
.Dv PMC_F_DESCENDANTS
flag had been specified at PMC allocation time, then the PMC is
attached to all current and future descendants of the target process.
.It Dv PMC_OP_PMCDETACH
Detach a PMC from its target process.
.It Dv PMC_OP_PMCRELEASE
Read and write a PMC.
This operation is valid only for PMCs configured in counting modes.
.It Dv PMC_OP_PMCSETCOUNT
Set the initial count (for counting mode PMCs) or the desired sampling
rate (for sampling mode PMCs).
.It Dv PMC_OP_PMCSTART
.It Dv PMC_OP_PMCSTOP
.It Dv PMC_OP_WRITELOG
Insert a timestamped user record into the log file.
.Ss i386 Specific API
Some i386 family CPUs support the RDPMC instruction which allows a
user process to read a PMC value without needing to invoke a
On such CPUs, the machine address associated with an allocated PMC is
retrievable using the
.Dv PMC_OP_PMCX86GETMSR
.Bl -tag -width indent
.It Dv PMC_OP_PMCX86GETMSR
Retrieve the MSR (machine specific register) number associated with
the given PMC handle.
The PMC needs to be in process-private mode and allocated without the
.Dv PMC_F_DESCENDANTS
modifier flag, and should be attached only to its owner process at the
.Ss amd64 Specific API
AMD64 CPUs support the RDPMC instruction which allows a
user process to read a PMC value without needing to invoke a
The machine address associated with an allocated PMC is
retrievable using the
.Dv PMC_OP_PMCX86GETMSR
.Bl -tag -width indent
.It Dv PMC_OP_PMCX86GETMSR
Retrieve the MSR (machine specific register) number associated with
the given PMC handle.
The PMC needs to be in process-private mode and allocated without the
.Dv PMC_F_DESCENDANTS
modifier flag, and should be attached only to its owner process at the
.Sh SYSCTL VARIABLES AND LOADER TUNABLES
is influenced by the following
.Bl -tag -width indent
.It Va kern.hwpmc.callchaindepth Pq integer, read-only
The maximum number of call chain records to capture per sample.
.It Va kern.hwpmc.debugflags Pq string, read-write
(Only available if the
driver was compiled with
Control the verbosity of debug messages from the
.It Va kern.hwpmc.hashsize Pq integer, read-only
The number of rows in the hash tables used to keep track of owner and
.It Va kern.hwpmc.logbuffersize Pq integer, read-only
The size in kilobytes of each log buffer used by
The default buffer size is 4KB.
.It Va kern.hwpmc.mtxpoolsize Pq integer, read-only
The size of the spin mutex pool used by the PMC driver.
.It Va kern.hwpmc.nbuffers Pq integer, read-only
The number of log buffers used by
.It Va kern.hwpmc.nsamples Pq integer, read-only
The number of entries in the per-CPU ring buffer used during sampling.
.It Va security.bsd.unprivileged_syspmcs Pq boolean, read-write
If set to non-zero, allow unprivileged processes to allocate system-wide
The default value is 0.
.It Va security.bsd.unprivileged_proc_debug Pq boolean, read-write
driver will only allow privileged processes to attach PMCs to other
These variables may be set in the kernel environment using
.Sh IMPLEMENTATION NOTES
The kernel driver requires all physical CPUs in an SMP system to have
identical performance monitoring counter hardware.
.Ss Sparse CPU Numbering
On platforms that sparsely number CPUs and which support hot-plugging
of CPUs, requests that specify non-existent or disabled CPUs will fail
Applications allocating system-scope PMCs need to be aware of
the possibility of such transient failures.
Historically, on the x86 architecture,
has permitted user processes running at a processor CPL of 3 to
read the TSC using the RDTSC instruction.
driver preserves this behavior.
.Ss Intel P4/HTT Handling
On CPUs with HTT support, Intel P4 PMCs are capable of qualifying
only a subset of hardware events on a per-logical CPU basis.
Consequently, if HTT is enabled on a system with Intel Pentium 4
driver will reject allocation requests for process-private PMCs that
request counting of hardware events that cannot be counted separately
for each logical CPU.
.Ss Intel Pentium-Pro Handling
Writing a value to the PMC MSRs found in Intel Pentium-Pro style PMCs
.Tn "Intel Pentium Pro" ,
processors) will replicate bit 31 of the
value being written into the upper 8 bits of the MSR,
bringing down the usable width of these PMCs to 31 bits.
For process-virtual PMCs, the
driver implements a workaround in software and makes the corrected
64-bit count available via the
Processes that intend to use RDPMC instructions directly or
that intend to write values larger than 2^31 into these PMCs with
need to be aware of this hardware limitation.
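.Pp
The effect of this limitation on a written value can be modelled in a few
lines of C.
The sketch below is an illustration under stated assumptions, not driver
code: it assumes a 40-bit counter register (32 written bits plus the
"upper 8 bits" mentioned above), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model the value held by a Pentium-Pro style counter MSR after
 * software writes 'v' to it.  Assumption: the register is 40 bits
 * wide, only the low 32 bits of 'v' are taken as written, and
 * bit 31 is replicated into bits 32..39.
 */
uint64_t
ppro_msr_after_write(uint64_t v)
{
	uint64_t low = v & 0xFFFFFFFFULL;	/* the 32 bits written */
	uint64_t bit31 = (low >> 31) & 1ULL;	/* bit that gets replicated */
	uint64_t upper = bit31 ? (0xFFULL << 32) : 0ULL;

	return (upper | low);
}
```

Under this model any value below 2^31 is stored unchanged, while a value
with bit 31 set comes back with the upper eight bits forced to one, which
is why only 31 bits are usable.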
.It "hwpmc: [class/npmc/capabilities]..."
Announce the presence of
with capabilities described by bit string
.It "hwpmc: kernel version (0x%x) does not match module version (0x%x)."
The module loading process failed because a version mismatch was detected
between the currently executing kernel and the module being loaded.
.It "hwpmc: this kernel has not been compiled with 'options HWPMC_HOOKS'."
The module loading process failed because the currently executing kernel
was not configured with the required configuration option
.It "hwpmc: tunable hashsize=%d must be greater than zero."
A zero or negative value was supplied for tunable
.Va kern.hwpmc.hashsize .
.It "hwpmc: tunable logbuffersize=%d must be greater than zero."
A zero or negative value was supplied for tunable
.Va kern.hwpmc.logbuffersize .
.It "hwpmc: tunable nlogbuffers=%d must be greater than zero."
A zero or negative value was supplied for tunable
.Va kern.hwpmc.nbuffers .
.It "hwpmc: tunable nsamples=%d out of range."
The value for tunable
.Va kern.hwpmc.nsamples
was negative or greater than 65535.
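.Pp
The range checks implied by the tunable diagnostics above can be summarised
as two predicates.
The C sketch below is illustrative: the function names are hypothetical,
and the treatment of a zero value for
.Va kern.hwpmc.nsamples
simply follows the wording of this page rather than the driver source.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Tunables reported as "must be greater than zero"
 * (hashsize, logbuffersize, nlogbuffers in the messages above).
 */
bool
tunable_positive_ok(int value)
{
	return (value > 0);
}

/*
 * nsamples: rejected when negative or greater than 65535,
 * per the description above.
 */
bool
tunable_nsamples_ok(int value)
{
	return (value >= 0 && value <= 65535);
}
```

A value that fails either predicate corresponds to one of the diagnostic
messages listed above.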
The API and ABI documented in this manual page may change in
The recommended method of accessing this driver is using the
A command issued to the
driver may fail with the following errors:
Helper process creation failed for a
.Dv PMC_OP_CONFIGURELOG
request due to a temporary resource shortage in the kernel.
.Dv PMC_OP_CONFIGURELOG
operation was requested while an existing log was active.
A DISABLE operation was requested using the
request for a set of hardware resources currently in use for
process-private PMCs.
operation was requested on an active system mode PMC.
operation was requested for a target process that already had another
PMC using the same hardware resources attached to it.
request writing a new value was issued on a PMC that was active.
.Dv PMC_OP_PMCSETCOUNT
request was issued on a PMC that was active.
operation was requested without a log file being configured for a
.Dv PMC_F_LOG_PROCCSW
.Dv PMC_F_LOG_PROCEXIT
operation was requested on a system-wide sampling PMC without a log
file being configured.
request was reissued for a target process that already is the target
A bad address was passed in to the driver.
An invalid PMC handle was specified.
An invalid CPU number was passed in for a
.Dv PMC_OP_GETPMCINFO
.Dv PMC_OP_CONFIGURELOG
request to de-configure a log file was issued without a log file
request was issued without a log file being configured.
An invalid CPU number was passed in for a
An invalid operation request was passed in for a
An invalid PMC ID was passed in for a
A suitable PMC matching the parameters passed in to a
.Dv PMC_OP_PMCALLOCATE
request could not be allocated.
An invalid PMC mode was requested during a
.Dv PMC_OP_PMCALLOCATE
An invalid CPU number was specified during a
.Dv PMC_OP_PMCALLOCATE
.Dv PMC_OP_PMCALLOCATE
request for a process-private PMC.
.Dv PMC_OP_PMCALLOCATE
request for a system-wide PMC.
.Dv PMC_OP_PMCALLOCATE
request contained unknown flags.
(On Intel Pentium 4 CPUs with HTT support)
.Dv PMC_OP_PMCALLOCATE
request for a process-private PMC was issued for an event that does
not support counting on a per-logical CPU basis.
A PMC allocated for system-wide operation was specified with a
request specified an illegal process ID.
request was issued for a PMC not attached to the target process.
request contained illegal flags.
.Dv PMC_OP_PMCX86GETMSR
operation was requested for a PMC not in process-virtual mode, or
for a PMC that is not solely attached to its owner process, or for
a PMC that was allocated with flag
.Dv PMC_F_DESCENDANTS .
request was issued for an owner process without a log file
The system was not able to allocate kernel memory.
(On i386 and amd64 architectures)
.Dv PMC_OP_PMCX86GETMSR
operation was requested for hardware that does not support reading
PMCs directly with the RDPMC instruction.
.Dv PMC_OP_GETPMCINFO
operation was requested for an absent or disabled CPU.
.Dv PMC_OP_PMCALLOCATE
operation specified allocation of a system-wide PMC on an absent or
request was issued for a system-wide PMC that was allocated on a CPU
that is currently absent or disabled.
.Dv PMC_OP_PMCALLOCATE
request was issued for PMC capabilities not supported
by the specified PMC class.
A sampling mode PMC was requested on a CPU lacking an APIC.
request was issued by a process without super-user
privilege or by a jailed super-user process.
operation was issued for a target process that the current process
does not have permission to attach to.
(i386 and amd64 architectures)
operation was issued on a PMC whose MSR has been retrieved using
.Dv PMC_OP_PMCX86GETMSR .
A process issued a PMC operation request without having allocated any
A process issued a PMC operation request after the PMC was detached
from all of its target processes.
request specified a non-existent process ID.
The target process for a
operation is not being monitored by
driver first appeared in
driver was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .
The driver samples the state of the kernel's logical processor support
at the time of initialization (i.e., at module load time).
On CPUs supporting logical processors, the driver could misbehave if
logical processors are subsequently enabled or disabled while the
On the i386 architecture, the driver requires that the local APIC on the
CPU be enabled for sampling mode to be supported.
Many single-processor motherboards keep the APIC disabled in BIOS; on
will not support sampling PMCs.
.Sh SECURITY CONSIDERATIONS
PMCs may be used to monitor the actual behavior of the system on hardware.
In situations where this constitutes an undesirable information leak,
the following options are available:
.Va security.bsd.unprivileged_syspmcs
This ensures that unprivileged processes cannot allocate system-wide
PMCs and thus cannot observe the hardware behavior of the system
This tunable may also be set at boot time using
driver into the kernel.
.Va security.bsd.unprivileged_proc_debug
This will ensure that an unprivileged process cannot attach a PMC
to any process other than itself and thus cannot observe the hardware
behavior of other processes with the same credentials.
System administrators should note that on IA-32 platforms
makes the content of the IA-32 TSC counter available to all processes
via the RDTSC instruction.