.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.Nd measurement events for
AMD K8 PMCs are present in the
They are documented in the
.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
.%N "Publication No. 26094"
.%Q "Advanced Micro Devices, Inc."
AMD K8 PMCs are 48 bits wide.
Each CPU contains 4 PMCs with the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
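.Pp
Because the counters are 48 bits wide, arithmetic on raw counter values
has to be done modulo 2^48.
The following is a minimal sketch of that arithmetic only; the names
.Li K8_PMC_WIDTH ,
.Li k8_pmc_delta
and
.Li k8_pmc_reload
are hypothetical helpers and are not part of any library API.
Preloading a counter so that it overflows after a chosen number of events
is shown as one plausible use of a writable, interrupt-capable counter,
not as a description of how the driver does it.

```c
#include <assert.h>
#include <stdint.h>

#define K8_PMC_WIDTH 48 /* counter width stated in this manual page */
#define K8_PMC_MASK  ((UINT64_C(1) << K8_PMC_WIDTH) - 1)

/*
 * Hypothetical helper: number of events between two raw reads of a
 * 48-bit counter, assuming the counter wrapped at most once.
 */
static inline uint64_t
k8_pmc_delta(uint64_t before, uint64_t after)
{
    return ((after - before) & K8_PMC_MASK);
}

/*
 * Hypothetical helper: initial value that makes a 48-bit up-counter
 * wrap to zero after exactly 'count' further events.
 */
static inline uint64_t
k8_pmc_reload(uint64_t count)
{
    assert(count != 0 && count <= K8_PMC_MASK);
    return ((UINT64_C(0) - count) & K8_PMC_MASK);
}
```

Masking after the subtraction keeps the result correct even when the
second read is numerically smaller than the first because of a wrap.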
Event specifiers for AMD K8 PMCs can have the following optional
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other fields.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
Invert the sense of comparison when the
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
.It Li mask= Ns Ar qualifier
Many event specifiers for AMD K8 PMCs need to be additionally
qualified using a mask qualifier.
These additional qualifiers are event-specific and are documented
along with their associated event specifiers below.
Configure the PMC to count events happening at privilege level 0.
Configure the PMC to count events occurring at privilege levels 1, 2
qualifiers were specified, the default is to enable both.
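.Pp
The interaction of the threshold, inversion and edge qualifiers described
above can be illustrated with a small software model.
This is a sketch under stated assumptions, not the hardware or library
logic: the structure
.Li k8_quals
and the function
.Li k8_model_count
are hypothetical, a threshold of zero is taken to mean that no threshold
was given, and the condition is assumed to be false before the first cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the qualifiers described above. */
struct k8_quals {
    unsigned count; /* threshold value; 0 is assumed to mean "none" */
    bool inv;       /* compare with "less than" instead */
    bool edge;      /* count only false-to-true transitions */
};

/*
 * Simplified model: events[i] is the number of configured events
 * observed in cycle i.  Returns the value the counter would reach.
 */
static inline uint64_t
k8_model_count(const unsigned *events, size_t ncycles, struct k8_quals q)
{
    uint64_t total = 0;
    bool prev = false; /* assumption: condition starts out false */

    assert(events != NULL || ncycles == 0);
    for (size_t i = 0; i < ncycles; i++) {
        bool cond;

        if (q.count == 0 && !q.edge) {
            /* No threshold and no edge: add the raw event count. */
            total += events[i];
            continue;
        }
        if (q.count == 0)
            cond = events[i] > 0;        /* assumption for edge only */
        else if (q.inv)
            cond = events[i] < q.count;  /* inverted comparison */
        else
            cond = events[i] >= q.count; /* threshold comparison */

        /* With edge, count only when the condition becomes true. */
        if (q.edge ? (cond && !prev) : cond)
            total++;
        prev = cond;
    }
    return (total);
}
```

For the per-cycle sequence 0, 2, 3, 3, 0, 1 and a threshold of 2, the model
counts three cycles without the edge qualifier and a single transition
with it.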
.Ss AMD K8 Event Specifiers
The event specifiers supported on AMD K8 PMCs are:
.Bl -tag -width indent
.It Li k8-bu-cpu-clk-unhalted
Count the number of clock cycles when the CPU is not in the HLT or
.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
Count fill requests that missed in the L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count data cache fill requests.
Count instruction cache fill requests.
The default is to count all types of requests.
.It Li k8-bu-fill-into-l2 Op Li ,mask= Ns Ar qualifier
Count lines written into the L2 cache or written back from it to the system.
The event may be further qualified by using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li dirty-l2-victim
Count lines written into L2 cache due to victim writebacks from the
Icache or Dcache, TLB page table walks or hardware data prefetches.
.It Li victim-from-l2
Count writebacks of dirty lines from L2 to the system.
.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
Count internally generated requests to the L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count cancelled requests.
Count data cache fill requests.
Count instruction cache fill requests.
Count tag snoop requests.
The default is to count all types of requests.
Count data cache accesses including microcode scratch pad accesses.
.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
Count data cache copyback operations.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
Count data cache accesses by lock instructions.
This event is only available on processors of revision C or later
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count data cache accesses by lock instructions.
Count data cache misses by lock instructions.
The default is to count all accesses.
.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
Count the number of dispatched prefetch instructions.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count load operations.
Count non-temporal operations.
Count store operations.
The default is to count all operations.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
Count L1 DTLB misses that are also misses in the L2 DTLB.
.It Li k8-dc-microarchitectural-early-cancel-of-an-access
Count microarchitectural early cancels of data cache accesses.
.It Li k8-dc-microarchitectural-late-cancel-of-an-access
Count microarchitectural late cancels of data cache accesses.
.It Li k8-dc-misaligned-data-reference
Count misaligned data references.
Count data cache misses.
.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
Count one-bit ECC errors found by the scrubber.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count scrubber-detected errors.
Count piggyback scrubber errors.
The default is to count both kinds of errors.
.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
Count data cache refills from L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
Count data cache refills from system memory.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-fp-cycles-with-no-fpu-ops-retired
Count cycles when no FPU ops were retired.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-fast-flag-ops
Count dispatched FPU ops that use the fast flag interface.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
Count the number of dispatched FPU ops.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li add-pipe-excluding-junk-ops
Count add pipe ops excluding junk ops.
.It Li add-pipe-junk-ops
Count junk ops in the add pipe.
.It Li multiply-pipe-excluding-junk-ops
Count multiply pipe ops excluding junk ops.
.It Li multiply-pipe-junk-ops
Count junk ops in the multiply pipe.
.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
.It Li store-pipe-junk-ops
Count junk ops in the store pipe.
The default is to count all types of ops.
.It Li k8-fr-decoder-empty
Count cycles when there was nothing to dispatch (i.e., the decoder
.It Li k8-fr-dispatch-stall-for-segment-load
Count dispatch stalls for segment loads.
.It Li k8-fr-dispatch-stall-for-serialization
Count dispatch stalls for serialization.
.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
Count dispatch stalls from branch abort to retiral.
.It Li k8-fr-dispatch-stall-when-fpu-is-full
Count dispatch stalls when the FPU is full.
.It Li k8-fr-dispatch-stall-when-ls-is-full
Count dispatch stalls when the load/store unit is full.
.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
Count dispatch stalls when the reorder buffer is full.
.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
Count dispatch stalls when reservation stations are full.
.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
Count dispatch stalls when a far control transfer or a resync branch
.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
Count dispatch stalls when waiting for all to be quiet.
.\" XXX What does "waiting for all to be quiet" mean?
.It Li k8-fr-dispatch-stalls
Count all dispatch stalls.
.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
Count FPU exceptions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li sse-and-x87-microtraps
Count SSE and x87 microtraps.
.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
.It Li x87-reclass-microfaults
Count x87 reclass microfaults.
The default is to count all types of exceptions.
.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., cycles when CPU RFLAGS
field IF was zero).
.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles when interrupts were masked while pending (i.e., cycles
when INTR was asserted while CPU RFLAGS field IF was zero).
.It Li k8-fr-number-of-breakpoints-for-dr0
Count the number of breakpoints for DR0.
.It Li k8-fr-number-of-breakpoints-for-dr1
Count the number of breakpoints for DR1.
.It Li k8-fr-number-of-breakpoints-for-dr2
Count the number of breakpoints for DR2.
.It Li k8-fr-number-of-breakpoints-for-dr3
Count the number of breakpoints for DR3.
.It Li k8-fr-retired-branches
Count retired branches including exceptions and interrupts.
.It Li k8-fr-retired-branches-mispredicted
Count mispredicted retired branches.
.It Li k8-fr-retired-far-control-transfers
Count retired far control transfers (which are always mispredicted).
.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
Count retired fastpath double op instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count instructions with the low op in position 0.
Count instructions with the low op in position 1.
Count instructions with the low op in position 2.
The default is to count all types of instructions.
.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
Count retired FPU instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count MMX and 3DNow!\& instructions.
.It Li packed-sse-sse2
Count packed SSE and SSE2 instructions.
.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
Count x87 instructions.
The default is to count all types of instructions.
.It Li k8-fr-retired-near-returns
Count retired near returns.
.It Li k8-fr-retired-near-returns-mispredicted
Count mispredicted near returns.
.It Li k8-fr-retired-resyncs
Count retired resyncs (non-control transfer branches).
.It Li k8-fr-retired-taken-branches
Count retired taken branches.
.It Li k8-fr-retired-taken-branches-mispredicted
Count retired taken branches that were mispredicted.
.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
Count retired taken branches that were mispredicted only due to an
.It Li k8-fr-retired-taken-hardware-interrupts
Count retired taken hardware interrupts.
.It Li k8-fr-retired-uops
.It Li k8-fr-retired-x86-instructions
Count retired x86 instructions including exceptions and interrupts.
Count instruction cache fetches.
.It Li k8-ic-instruction-fetch-stall
Count cycles in stalls due to instruction fetch.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
Count ITLB misses that miss in both L1 and L2 ITLBs.
.It Li k8-ic-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
Count instruction cache misses.
.It Li k8-ic-refill-from-l2
Count instruction cache refills from L2 cache.
.It Li k8-ic-refill-from-system
Count instruction cache refills from system memory.
.It Li k8-ic-return-stack-hits
Count hits to the return stack.
.It Li k8-ic-return-stack-overflow
Count overflows of the return stack.
.It Li k8-ls-buffer2-full
Count load/store buffer2 full events.
.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
Count locked operations.
For revision C and later CPUs, the following qualifiers are supported:
.Bl -tag -width indent -compact
.It Li cycles-in-request
Count the number of cycles in the lock request/grant stage.
.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the oldest load/store operation.
.It Li locked-instructions
Count the number of lock instructions executed.
The default is to count the number of lock instructions executed.
.It Li k8-ls-microarchitectural-late-cancel
Count microarchitectural late cancels of operations in the load/store
.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
Count microarchitectural resyncs caused by self-modifying code.
.It Li k8-ls-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ls-retired-cflush-instructions
Count retired CLFLUSH instructions.
.It Li k8-ls-retired-cpuid-instructions
Count retired CPUID instructions.
.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
Count segment register loads.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count CS register loads.
Count DS register loads.
Count ES register loads.
Count FS register loads.
Count GS register loads.
.\" Count HS register loads.
.\" XXX "HS" register?
Count SS register loads.
The default is to count all types of loads.
.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
.Pq Events F6H, F7H and F8H respectively
Count events on the HyperTransport(tm) buses.
These events may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li buffer-release
Count buffer release messages sent.
Count command messages sent.
Count data messages sent.
Count nop messages sent.
The default is to count all types of messages.
.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
Count memory controller bypass counter saturation events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li dram-controller-interface-bypass
Count DRAM controller interface bypass.
.It Li dram-controller-queue-bypass
Count DRAM controller queue bypass.
.It Li memory-controller-hi-pri-bypass
Count memory controller high priority bypasses.
.It Li memory-controller-lo-pri-bypass
Count memory controller low priority bypasses.
.It Li k8-nb-memory-controller-dram-slots-missed
Count memory controller DRAM command slots missed (in MemClks).
.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
Count memory controller page access events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count page conflicts.
The default is to count all types of events.
.It Li k8-nb-memory-controller-page-table-overflow
Count memory controller page table overflow events.
.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory controller turnaround events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.\" XXX doc is unclear whether these are cycle counts or event counts
.It Li dimm-turnaround
Count DIMM turnarounds.
.It Li read-to-write-turnaround
Count read to write turnarounds.
.It Li write-to-read-turnaround
Count write to read turnarounds.
The default is to count all types of events.
.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count all probe hits.
.It Li probe-hit-dirty-no-memory-cancel
Count probe hits without memory cancels.
.It Li probe-hit-dirty-with-memory-cancel
Count probe hits with memory cancels.
.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
Count sized commands issued.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li nonpostwrszbyte
.It Li nonpostwrszdword
The default is to count all types of commands.
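.Pp
An event specifier with a mask qualifier is an event name followed by
.Li ,mask=
and a set of keywords.
The following is a minimal sketch of composing such a string; the function
.Li k8_build_spec
is a hypothetical helper, not a library routine, and it assumes that the
keywords are joined with a
.Li +
character.
It only formats text and does not check that a keyword is valid for the
given event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "event[,mask=kw1+kw2...]" into buf.
 * Returns 0 on success, or -1 if the result does not fit in len bytes.
 * Joining keywords with '+' is an assumption of this sketch.
 */
static inline int
k8_build_spec(char *buf, size_t len, const char *event,
    const char *const *keywords, size_t nkeywords)
{
    size_t used;
    int n;

    assert(event != NULL);
    assert(keywords != NULL || nkeywords == 0);

    n = snprintf(buf, len, "%s", event);
    if (n < 0 || (size_t)n >= len)
        return (-1);
    used = (size_t)n;

    for (size_t i = 0; i < nkeywords; i++) {
        /* The first keyword introduces the qualifier itself. */
        n = snprintf(buf + used, len - used, "%s%s",
            i == 0 ? ",mask=" : "+", keywords[i]);
        if (n < 0 || (size_t)n >= len - used)
            return (-1);
        used += (size_t)n;
    }
    return (0);
}
```

Called with the event
.Li k8-fp-dispatched-fpu-ops
and the keywords
.Li add-pipe-junk-ops
and
.Li store-pipe-junk-ops ,
the helper produces a single specifier string that selects junk ops in
those two pipes.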
.Ss Event Name Aliases
The following table shows the mapping between the PMC-independent
and the underlying hardware events used.
.Bl -column "branch-mispredicts" "Description"
.It Em Alias Ta Em Event
.It Li branches Ta Li k8-fr-retired-taken-branches
.It Li branch-mispredicts Ta Li k8-fr-retired-taken-branches-mispredicted
.It Li dc-misses Ta Li k8-dc-miss
.It Li ic-misses Ta Li k8-ic-miss
.It Li instructions Ta Li k8-fr-retired-x86-instructions
.It Li interrupts Ta Li k8-fr-taken-hardware-interrupts
.It Li unhalted-cycles Ta Li k8-bu-cpu-clk-unhalted
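.Pp
The alias table above is a plain name-to-name mapping.
As an illustration only, it can be expressed as a lookup table; the names
.Li k8_aliases
and
.Li k8_resolve_alias
are hypothetical, and the entries are copied verbatim from the table rather
than taken from the library source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the alias table in this manual page. */
struct k8_alias {
    const char *alias;
    const char *event;
};

/* Entries copied verbatim from the table above. */
static const struct k8_alias k8_aliases[] = {
    { "branches",           "k8-fr-retired-taken-branches" },
    { "branch-mispredicts", "k8-fr-retired-taken-branches-mispredicted" },
    { "dc-misses",          "k8-dc-miss" },
    { "ic-misses",          "k8-ic-miss" },
    { "instructions",       "k8-fr-retired-x86-instructions" },
    { "interrupts",         "k8-fr-taken-hardware-interrupts" },
    { "unhalted-cycles",    "k8-bu-cpu-clk-unhalted" },
};

/*
 * Hypothetical helper: return the K8 event name for an alias, or NULL
 * if the name is not one of the aliases listed above.
 */
static inline const char *
k8_resolve_alias(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(k8_aliases) / sizeof(k8_aliases[0]); i++) {
        if (strcmp(name, k8_aliases[i].alias) == 0)
            return (k8_aliases[i].event);
    }
    return (NULL);
}
```

A caller can fall back to treating the name as a K8 event specifier when
the lookup returns NULL.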
library first appeared in
library was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .