.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.Nd measurement events for
AMD K8 PMCs are present in the
They are documented in the
.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
.%N "Publication No. 26094"
.%Q "Advanced Micro Devices, Inc."
AMD K8 PMCs are 48 bits wide.
Each CPU contains 4 PMCs with the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
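.Pp
Because the counters are 48 bits wide, a raw counter value wraps to zero
after 2^48 events.
The following Python sketch shows one way to compute the number of events
between two raw readings while tolerating a single wraparound.
The names
.Li PMC_WIDTH_BITS
and
.Li counter_delta
are hypothetical helpers introduced for illustration; they are not part of
any library interface, and a given read interface may already compensate
for wraparound.

```python
# Hypothetical helper names; only the 48-bit width comes from this page.
PMC_WIDTH_BITS = 48
PMC_MASK = (1 << PMC_WIDTH_BITS) - 1


def counter_delta(previous: int, current: int) -> int:
    """Return the events counted between two raw 48-bit readings.

    Modular subtraction gives the right answer as long as the counter
    wrapped at most once between the two readings.
    """
    return (current - previous) & PMC_MASK


if __name__ == "__main__":
    # No wraparound: plain difference.
    print(counter_delta(10, 25))
    # Wraparound: the counter passed 2**48 - 1 and restarted from zero.
    print(counter_delta(PMC_MASK - 1, 3))
```

Masking the difference avoids a separate branch for the wrapped case.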
Event specifiers for AMD K8 PMCs can have the following optional
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other fields.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
Invert the sense of comparison when the
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
.It Li mask= Ns Ar qualifier
Many event specifiers for AMD K8 PMCs need to be additionally
qualified using a mask qualifier.
These additional qualifiers are event-specific and are documented
along with their associated event specifiers below.
Configure the PMC to count events happening at privilege level 0.
Configure the PMC to count events occurring at privilege levels 1, 2
qualifiers were specified, the default is to enable both.
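.Pp
The interaction between the threshold, inversion and edge-detection
qualifiers described above can be modelled in a few lines of Python.
This is a simplified sketch and not a description of the hardware:
it assumes that a thresholded counter increments by one in each cycle
in which the comparison holds, and that the condition is false before
the first cycle.
The function and parameter names are this sketch's own.

```python
from typing import Iterable


def simulate_counter(events_per_cycle: Iterable[int], threshold: int,
                     inv: bool = False, edge: bool = False) -> int:
    """Model a thresholded counter and return its final value.

    threshold: the value given to count= (must be at least 1 here).
    inv:       invert the comparison (count cycles with events < threshold).
    edge:      count only false-to-true transitions of the condition.
    """
    if threshold < 1:
        raise ValueError("this sketch only models a non-zero threshold")
    counter = 0
    previous = False  # assumption: condition is false before cycle 0
    for events in events_per_cycle:
        condition = (events < threshold) if inv else (events >= threshold)
        if edge:
            # Increment once when the condition becomes true.
            if condition and not previous:
                counter += 1
        elif condition:
            # Increment in every cycle in which the condition holds.
            counter += 1
        previous = condition
    return counter


if __name__ == "__main__":
    trace = [0, 2, 3, 1, 0, 4]  # made-up events per cycle
    print(simulate_counter(trace, 2))
    print(simulate_counter(trace, 2, edge=True))
    print(simulate_counter(trace, 2, inv=True))
```

With the made-up trace above and a threshold of 2, the condition holds in
three cycles but becomes true only twice, so edge detection yields a
smaller count.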
.Ss AMD K8 Event Specifiers
The event specifiers supported on AMD K8 PMCs are:
.Bl -tag -width indent
.It Li k8-bu-cpu-clk-unhalted
Count the number of clock cycles when the CPU is not in the HLT or
.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
Count fill requests that missed in the L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count data cache fill requests.
Count instruction cache fill requests.
The default is to count all types of requests.
.It Li k8-bu-fill-into-l2 Op Li ,mask= Ns Ar qualifier
Count the number of lines written to and from the L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li dirty-l2-victim
Count lines written into L2 cache due to victim writebacks from the
Icache or Dcache, TLB page table walks or hardware data prefetches.
.It Li victim-from-l2
Count writebacks of dirty lines from L2 to the system.
.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
Count internally generated requests to the L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count cancelled requests.
Count data cache fill requests.
Count instruction cache fill requests.
Count tag snoop requests.
The default is to count all types of requests.
Count data cache accesses including microcode scratch pad accesses.
.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
Count data cache copyback operations.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
Count data cache accesses by lock instructions.
This event is only available on processors of revision C or later
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count data cache accesses by lock instructions.
Count data cache misses by lock instructions.
The default is to count all accesses.
.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
Count the number of dispatched prefetch instructions.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count load operations.
Count non-temporal operations.
Count store operations.
The default is to count all operations.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
Count L1 DTLB misses that are also misses in the L2 DTLB.
.It Li k8-dc-microarchitectural-early-cancel-of-an-access
Count microarchitectural early cancels of data cache accesses.
.It Li k8-dc-microarchitectural-late-cancel-of-an-access
Count microarchitectural late cancels of data cache accesses.
.It Li k8-dc-misaligned-data-reference
Count misaligned data references.
Count data cache misses.
.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
Count one bit ECC errors found by the scrubber.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count scrubber detected errors.
Count piggyback scrubber errors.
The default is to count both kinds of errors.
.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
Count data cache refills from L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
Count data cache refills from system memory.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-fp-cycles-with-no-fpu-ops-retired
Count cycles when no FPU ops were retired.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-fast-flag-ops
Count dispatched FPU ops that use the fast flag interface.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
Count the number of dispatched FPU ops.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li add-pipe-excluding-junk-ops
Count add pipe ops excluding junk ops.
.It Li add-pipe-junk-ops
Count junk ops in the add pipe.
.It Li multiply-pipe-excluding-junk-ops
Count multiply pipe ops excluding junk ops.
.It Li multiply-pipe-junk-ops
Count junk ops in the multiply pipe.
.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
.It Li store-pipe-junk-ops
Count junk ops in the store pipe.
The default is to count all types of ops.
.It Li k8-fr-decoder-empty
Count cycles when there was nothing to dispatch (i.e., the decoder
.It Li k8-fr-dispatch-stall-for-segment-load
Count dispatch stalls for segment loads.
.It Li k8-fr-dispatch-stall-for-serialization
Count dispatch stalls for serialization.
.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
Count dispatch stalls from branch abort to retirement.
.It Li k8-fr-dispatch-stall-when-fpu-is-full
Count dispatch stalls when the FPU is full.
.It Li k8-fr-dispatch-stall-when-ls-is-full
Count dispatch stalls when the load/store unit is full.
.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
Count dispatch stalls when the reorder buffer is full.
.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
Count dispatch stalls when reservation stations are full.
.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
Count dispatch stalls when a far control transfer or a resync branch
.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
Count dispatch stalls when waiting for all to be quiet.
.\" XXX What does "waiting for all to be quiet" mean?
.It Li k8-fr-dispatch-stalls
Count all dispatch stalls.
.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
Count FPU exceptions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li sse-and-x87-microtraps
Count SSE and x87 microtraps.
.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
.It Li x87-reclass-microfaults
Count x87 reclass microfaults.
The default is to count all types of exceptions.
.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., cycles when the CPU
RFLAGS field IF was zero).
.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles when interrupts were masked while pending (i.e., cycles
when INTR was asserted while CPU RFLAGS field IF was zero).
.It Li k8-fr-number-of-breakpoints-for-dr0
Count the number of breakpoints for DR0.
.It Li k8-fr-number-of-breakpoints-for-dr1
Count the number of breakpoints for DR1.
.It Li k8-fr-number-of-breakpoints-for-dr2
Count the number of breakpoints for DR2.
.It Li k8-fr-number-of-breakpoints-for-dr3
Count the number of breakpoints for DR3.
.It Li k8-fr-retired-branches
Count retired branches including exceptions and interrupts.
.It Li k8-fr-retired-branches-mispredicted
Count mispredicted retired branches.
.It Li k8-fr-retired-far-control-transfers
Count retired far control transfers (which are always mispredicted).
.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
Count retired fastpath double op instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count instructions with the low op in position 0.
Count instructions with the low op in position 1.
Count instructions with the low op in position 2.
The default is to count all types of instructions.
.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
Count retired FPU instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count MMX and 3DNow!\& instructions.
.It Li packed-sse-sse2
Count packed SSE and SSE2 instructions.
.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
Count x87 instructions.
The default is to count all types of instructions.
.It Li k8-fr-retired-near-returns
Count retired near returns.
.It Li k8-fr-retired-near-returns-mispredicted
Count mispredicted near returns.
.It Li k8-fr-retired-resyncs
Count retired resyncs (non-control transfer branches).
.It Li k8-fr-retired-taken-branches
Count retired taken branches.
.It Li k8-fr-retired-taken-branches-mispredicted
Count retired taken branches that were mispredicted.
.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
Count retired taken branches that were mispredicted only due to an
.It Li k8-fr-retired-taken-hardware-interrupts
Count retired taken hardware interrupts.
.It Li k8-fr-retired-uops
.It Li k8-fr-retired-x86-instructions
Count retired x86 instructions including exceptions and interrupts.
Count instruction cache fetches.
.It Li k8-ic-instruction-fetch-stall
Count cycles in stalls due to instruction fetch.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
Count ITLB misses that miss in both L1 and L2 ITLBs.
.It Li k8-ic-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
Count instruction cache misses.
.It Li k8-ic-refill-from-l2
Count instruction cache refills from L2 cache.
.It Li k8-ic-refill-from-system
Count instruction cache refills from system memory.
.It Li k8-ic-return-stack-hits
Count hits to the return stack.
.It Li k8-ic-return-stack-overflow
Count overflows of the return stack.
.It Li k8-ls-buffer2-full
Count load/store buffer2 full events.
.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
Count locked operations.
For revision C and later CPUs, the following qualifiers are supported:
.Bl -tag -width indent -compact
.It Li cycles-in-request
Count the number of cycles in the lock request/grant stage.
.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the oldest load/store operation.
.It Li locked-instructions
Count the number of lock instructions executed.
The default is to count the number of lock instructions executed.
.It Li k8-ls-microarchitectural-late-cancel
Count microarchitectural late cancels of operations in the load/store
.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
Count microarchitectural resyncs caused by self-modifying code.
.It Li k8-ls-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ls-retired-cflush-instructions
Count retired CLFLUSH instructions.
.It Li k8-ls-retired-cpuid-instructions
Count retired CPUID instructions.
.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
Count segment register loads.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count CS register loads.
Count DS register loads.
Count ES register loads.
Count FS register loads.
Count GS register loads.
.\" Count HS register loads.
.\" XXX "HS" register?
Count SS register loads.
The default is to count all types of loads.
.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
.Pq Events F6H, F7H and F8H respectively
Count events on the HyperTransport(tm) buses.
These events may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li buffer-release
Count buffer release messages sent.
Count command messages sent.
Count data messages sent.
Count nop messages sent.
The default is to count all types of messages.
.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
Count memory controller bypass counter saturation events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li dram-controller-interface-bypass
Count DRAM controller interface bypass.
.It Li dram-controller-queue-bypass
Count DRAM controller queue bypass.
.It Li memory-controller-hi-pri-bypass
Count memory controller high priority bypasses.
.It Li memory-controller-lo-pri-bypass
Count memory controller low priority bypasses.
.It Li k8-nb-memory-controller-dram-slots-missed
Count memory controller DRAM command slots missed (in MemClks).
.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
Count memory controller page access events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count page conflicts.
The default is to count all types of events.
.It Li k8-nb-memory-controller-page-table-overflow
Count memory controller page table overflow events.
.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory controller turnaround events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.\" XXX doc is unclear whether these are cycle counts or event counts
.It Li dimm-turnaround
Count DIMM turnarounds.
.It Li read-to-write-turnaround
Count read to write turnarounds.
.It Li write-to-read-turnaround
Count write to read turnarounds.
The default is to count all types of events.
.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count all probe hits.
.It Li probe-hit-dirty-no-memory-cancel
Count probe hits to dirty lines without memory cancels.
.It Li probe-hit-dirty-with-memory-cancel
Count probe hits to dirty lines with memory cancels.
.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
Count sized commands issued.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li nonpostwrszbyte
.It Li nonpostwrszdword
The default is to count all types of commands.
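.Pp
An event specifier combines an event name with optional
comma-separated qualifiers such as
.Li mask=
and
.Li count= .
The Python sketch below assembles such a string from its parts.
It is an illustration only: the helper name
.Li build_event_spec
is hypothetical, and the use of
.Li +
to join mask keywords is an assumption of this sketch, exposed as a
parameter so that it can be changed.

```python
from typing import Optional, Sequence


def build_event_spec(event: str,
                     mask_keywords: Sequence[str] = (),
                     count: Optional[int] = None,
                     separator: str = "+") -> str:
    """Join an event name and optional qualifiers into one specifier.

    separator is the character placed between mask keywords; "+" is an
    assumption made by this sketch.
    """
    parts = [event]
    if mask_keywords:
        parts.append("mask=" + separator.join(mask_keywords))
    if count is not None:
        parts.append(f"count={count}")
    # Qualifiers follow the event name, separated by commas.
    return ",".join(parts)


if __name__ == "__main__":
    # Junk ops in the add and store pipes only.
    print(build_event_spec("k8-fp-dispatched-fpu-ops",
                           ["add-pipe-junk-ops", "store-pipe-junk-ops"]))
    # A single mask keyword combined with a threshold.
    print(build_event_spec("k8-nb-memory-controller-turnaround",
                           ["dimm-turnaround"], count=2))
```

Omitting the mask keywords leaves the event with its documented default
behaviour.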
.Ss Event Name Aliases
The following table shows the mapping between the PMC-independent
and the underlying hardware events used.
.Bl -column "branch-mispredicts" "Description"
.It Em Alias Ta Em Event
.It Li branches Ta Li k8-fr-retired-taken-branches
.It Li branch-mispredicts Ta Li k8-fr-retired-taken-branches-mispredicted
.It Li dc-misses Ta Li k8-dc-miss
.It Li ic-misses Ta Li k8-ic-miss
.It Li instructions Ta Li k8-fr-retired-x86-instructions
.It Li interrupts Ta Li k8-fr-taken-hardware-interrupts
.It Li unhalted-cycles Ta Li k8-bu-cpu-clk-unhalted
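.Pp
The alias table above amounts to a simple name lookup.
The Python sketch below copies the table into a dictionary and resolves
an alias to its hardware event name, passing any other name through
unchanged.
The names
.Li K8_EVENT_ALIASES
and
.Li resolve_event_name
are hypothetical and are not part of any library interface.

```python
# Alias-to-event mapping, copied from the table above.
K8_EVENT_ALIASES = {
    "branches": "k8-fr-retired-taken-branches",
    "branch-mispredicts": "k8-fr-retired-taken-branches-mispredicted",
    "dc-misses": "k8-dc-miss",
    "ic-misses": "k8-ic-miss",
    "instructions": "k8-fr-retired-x86-instructions",
    "interrupts": "k8-fr-taken-hardware-interrupts",
    "unhalted-cycles": "k8-bu-cpu-clk-unhalted",
}


def resolve_event_name(name: str) -> str:
    """Return the hardware event for an alias, or the name unchanged."""
    return K8_EVENT_ALIASES.get(name, name)


if __name__ == "__main__":
    print(resolve_event_name("instructions"))
    print(resolve_event_name("k8-dc-miss"))
```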
library first appeared in
library was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .