1 .\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved.
3 .\" Redistribution and use in source and binary forms, with or without
4 .\" modification, are permitted provided that the following conditions
6 .\" 1. Redistributions of source code must retain the above copyright
7 .\" notice, this list of conditions and the following disclaimer.
8 .\" 2. Redistributions in binary form must reproduce the above copyright
9 .\" notice, this list of conditions and the following disclaimer in the
10 .\" documentation and/or other materials provided with the distribution.
12 .\" This software is provided by Joseph Koshy ``as is'' and
13 .\" any express or implied warranties, including, but not limited to, the
14 .\" implied warranties of merchantability and fitness for a particular purpose
15 .\" are disclaimed. in no event shall Joseph Koshy be liable
16 .\" for any direct, indirect, incidental, special, exemplary, or consequential
17 .\" damages (including, but not limited to, procurement of substitute goods
18 .\" or services; loss of use, data, or profits; or business interruption)
19 .\" however caused and on any theory of liability, whether in contract, strict
20 .\" liability, or tort (including negligence or otherwise) arising in any way
21 .\" out of the use of this software, even if advised of the possibility of
31 .Nd measurement events for
40 AMD K8 PMCs are present in the
45 They are documented in the
47 .%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
48 .%N "Publication No. 26094"
50 .%Q "Advanced Micro Devices, Inc."
53 AMD K8 PMCs are 48 bits wide.
54 Each CPU contains 4 PMCs with the following capabilities:
55 .Bl -column "PMC_CAP_INTERRUPT" "Support"
56 .It Em Capability Ta Em Support
57 .It PMC_CAP_CASCADE Ta \&No
58 .It PMC_CAP_EDGE Ta Yes
59 .It PMC_CAP_INTERRUPT Ta Yes
60 .It PMC_CAP_INVERT Ta Yes
61 .It PMC_CAP_READ Ta Yes
62 .It PMC_CAP_PRECISE Ta \&No
63 .It PMC_CAP_SYSTEM Ta Yes
64 .It PMC_CAP_TAGGING Ta \&No
65 .It PMC_CAP_THRESHOLD Ta Yes
66 .It PMC_CAP_USER Ta Yes
67 .It PMC_CAP_WRITE Ta Yes
71 Event specifiers for AMD K8 PMCs can have the following optional
73 .Bl -tag -width indent
74 .It Li count= Ns Ar value
75 Configure the counter to increment only if the number of configured
76 events measured in a cycle is greater than or equal to
79 Configure the counter to only count negated-to-asserted transitions
80 of the conditions expressed by the other fields.
81 In other words, the counter will increment only once whenever a given
82 condition becomes true, irrespective of the number of clocks during
83 which the condition remains true.
85 Invert the sense of comparison when the
87 qualifier is present, making the counter increment when the
88 number of events per cycle is less than the value specified by
92 .It Li mask= Ns Ar qualifier
93 Many event specifiers for AMD K8 PMCs need to be additionally
94 qualified using a mask qualifier.
95 These additional qualifiers are event-specific and are documented
96 along with their associated event specifiers below.
98 Configure the PMC to count events happening at privilege level 0.
100 Configure the PMC to count events occurring at privilege levels 1, 2
108 qualifiers were specified, the default is to enable both.
109 .Ss AMD K8 Event Specifiers
110 The event specifiers supported on AMD K8 PMCs are:
111 .Bl -tag -width indent
112 .It Li k8-bu-cpu-clk-unhalted
114 Count the number of clock cycles when the CPU is not in the HLT or
116 .It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
118 Count fill requests that missed in the L2 cache.
119 This event may be further qualified using
123 separated set of the following keywords:
125 .Bl -tag -width indent -compact
127 Count data cache fill requests.
129 Count instruction cache fill requests.
134 The default is to count all types of requests.
135 .It Li k8-bu-fill-into-l2 Op Li ,mask= Ns Ar qualifier
137 Count the number of lines written to and from the L2 cache.
138 The event may be further qualified by using
142 separated set of the following keywords:
144 .Bl -tag -width indent -compact
145 .It Li dirty-l2-victim
146 Count lines written into L2 cache due to victim writebacks from the
147 Icache or Dcache, TLB page table walks or hardware data prefetches.
148 .It Li victim-from-l2
149 Count writebacks of dirty lines from L2 to the system.
151 .It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
153 Count internally generated requests to the L2 cache.
154 This event may be further qualified using
158 separated set of the following keywords:
160 .Bl -tag -width indent -compact
162 Count cancelled requests.
164 Count data cache fill requests.
166 Count instruction cache fill requests.
168 Count tag snoop requests.
173 The default is to count all types of requests.
176 Count data cache accesses including microcode scratchpad accesses.
177 .It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
179 Count data cache copyback operations.
180 This event may be further qualified using
184 separated set of the following keywords:
186 .Bl -tag -width indent -compact
188 Count operations for lines in the
192 Count operations for lines in the
196 Count operations for lines in the
200 Count operations for lines in the
204 Count operations for lines in the
209 The default is to count operations for lines in all the
211 .It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
213 Count data cache accesses by lock instructions.
214 This event is only available on processors of revision C or later
216 This event may be further qualified using
220 separated set of the following keywords:
222 .Bl -tag -width indent -compact
224 Count data cache accesses by lock instructions.
226 Count data cache misses by lock instructions.
229 The default is to count all accesses.
230 .It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
232 Count the number of dispatched prefetch instructions.
233 This event may be further qualified using
237 separated set of the following keywords:
239 .Bl -tag -width indent -compact
241 Count load operations.
243 Count non-temporal operations.
245 Count store operations.
248 The default is to count all operations.
249 .It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
251 Count L1 DTLB misses that are L2 DTLB hits.
252 .It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
254 Count L1 DTLB misses that are also misses in the L2 DTLB.
255 .It Li k8-dc-microarchitectural-early-cancel-of-an-access
257 Count microarchitectural early cancels of data cache accesses.
258 .It Li k8-dc-microarchitectural-late-cancel-of-an-access
260 Count microarchitectural late cancels of data cache accesses.
261 .It Li k8-dc-misaligned-data-reference
263 Count misaligned data references.
266 Count data cache misses.
267 .It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
269 Count one-bit ECC errors found by the scrubber.
270 This event may be further qualified using
274 separated set of the following keywords:
276 .Bl -tag -width indent -compact
278 Count scrubber-detected errors.
280 Count piggyback scrubber errors.
283 The default is to count both kinds of errors.
284 .It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
286 Count data cache refills from L2 cache.
287 This event may be further qualified using
291 separated set of the following keywords:
293 .Bl -tag -width indent -compact
295 Count operations for lines in the
299 Count operations for lines in the
303 Count operations for lines in the
307 Count operations for lines in the
311 Count operations for lines in the
316 The default is to count operations for lines in all the
318 .It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
320 Count data cache refills from system memory.
321 This event may be further qualified using
325 separated set of the following keywords:
327 .Bl -tag -width indent -compact
329 Count operations for lines in the
333 Count operations for lines in the
337 Count operations for lines in the
341 Count operations for lines in the
345 Count operations for lines in the
350 The default is to count operations for lines in all the
352 .It Li k8-fp-cycles-with-no-fpu-ops-retired
354 Count cycles when no FPU ops were retired.
355 This event is supported in revision B and later CPUs.
356 .It Li k8-fp-dispatched-fpu-fast-flag-ops
358 Count dispatched FPU ops that use the fast flag interface.
359 This event is supported in revision B and later CPUs.
360 .It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
362 Count the number of dispatched FPU ops.
363 This event is supported in revision B and later CPUs.
364 This event may be further qualified using
368 separated set of the following keywords:
370 .Bl -tag -width indent -compact
371 .It Li add-pipe-excluding-junk-ops
372 Count add pipe ops excluding junk ops.
373 .It Li add-pipe-junk-ops
374 Count junk ops in the add pipe.
375 .It Li multiply-pipe-excluding-junk-ops
376 Count multiply pipe ops excluding junk ops.
377 .It Li multiply-pipe-junk-ops
378 Count junk ops in the multiply pipe.
379 .It Li store-pipe-excluding-junk-ops
380 Count store pipe ops excluding junk ops.
381 .It Li store-pipe-junk-ops
382 Count junk ops in the store pipe.
385 The default is to count all types of ops.
386 .It Li k8-fr-decoder-empty
388 Count cycles when there was nothing to dispatch (i.e., the decoder
390 .It Li k8-fr-dispatch-stall-for-segment-load
392 Count dispatch stalls for segment loads.
393 .It Li k8-fr-dispatch-stall-for-serialization
395 Count dispatch stalls for serialization.
396 .It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
398 Count dispatch stalls from branch abort to retiral.
399 .It Li k8-fr-dispatch-stall-when-fpu-is-full
401 Count dispatch stalls when the FPU is full.
402 .It Li k8-fr-dispatch-stall-when-ls-is-full
404 Count dispatch stalls when the load/store unit is full.
405 .It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
407 Count dispatch stalls when the reorder buffer is full.
408 .It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
410 Count dispatch stalls when reservation stations are full.
411 .It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
413 Count dispatch stalls when a far control transfer or a resync branch
415 .It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
417 Count dispatch stalls when waiting for all to be quiet.
418 .\" XXX What does "waiting for all to be quiet" mean?
419 .It Li k8-fr-dispatch-stalls
421 Count all dispatch stalls.
422 .It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
424 Count FPU exceptions.
425 This event is supported in revision B and later CPUs.
426 This event may be further qualified using
430 separated set of the following keywords:
432 .Bl -tag -width indent -compact
433 .It Li sse-and-x87-microtraps
434 Count SSE and x87 microtraps.
435 .It Li sse-reclass-microfaults
436 Count SSE reclass microfaults.
437 .It Li sse-retype-microfaults
438 Count SSE retype microfaults.
439 .It Li x87-reclass-microfaults
440 Count x87 reclass microfaults.
443 The default is to count all types of exceptions.
444 .It Li k8-fr-interrupts-masked-cycles
446 Count cycles when interrupts were masked (i.e., when CPU RFLAGS field IF was zero).
447 .It Li k8-fr-interrupts-masked-while-pending-cycles
449 Count cycles when interrupts were masked while pending (i.e., cycles
450 when INTR was asserted while CPU RFLAGS field IF was zero).
451 .It Li k8-fr-number-of-breakpoints-for-dr0
453 Count the number of breakpoints for DR0.
454 .It Li k8-fr-number-of-breakpoints-for-dr1
456 Count the number of breakpoints for DR1.
457 .It Li k8-fr-number-of-breakpoints-for-dr2
459 Count the number of breakpoints for DR2.
460 .It Li k8-fr-number-of-breakpoints-for-dr3
462 Count the number of breakpoints for DR3.
463 .It Li k8-fr-retired-branches
465 Count retired branches including exceptions and interrupts.
466 .It Li k8-fr-retired-branches-mispredicted
468 Count mispredicted retired branches.
469 .It Li k8-fr-retired-far-control-transfers
471 Count retired far control transfers (which are always mispredicted).
472 .It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
474 Count retired fastpath double op instructions.
475 This event is supported in revision B and later CPUs.
476 This event may be further qualified using
480 separated set of the following keywords:
482 .Bl -tag -width indent -compact
484 Count instructions with the low op in position 0.
486 Count instructions with the low op in position 1.
488 Count instructions with the low op in position 2.
491 The default is to count all types of instructions.
492 .It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
494 Count retired FPU instructions.
495 This event is supported in revision B and later CPUs.
496 This event may be further qualified using
500 separated set of the following keywords:
502 .Bl -tag -width indent -compact
504 Count MMX and 3DNow!\& instructions.
505 .It Li packed-sse-sse2
506 Count packed SSE and SSE2 instructions.
507 .It Li scalar-sse-sse2
508 Count scalar SSE and SSE2 instructions.
510 Count x87 instructions.
513 The default is to count all types of instructions.
514 .It Li k8-fr-retired-near-returns
516 Count retired near returns.
517 .It Li k8-fr-retired-near-returns-mispredicted
519 Count mispredicted near returns.
520 .It Li k8-fr-retired-resyncs
522 Count retired resyncs (non-control transfer branches).
523 .It Li k8-fr-retired-taken-branches
525 Count retired taken branches.
526 .It Li k8-fr-retired-taken-branches-mispredicted
528 Count retired taken branches that were mispredicted.
529 .It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
531 Count retired taken branches that were mispredicted only due to an
533 .It Li k8-fr-retired-taken-hardware-interrupts
535 Count retired taken hardware interrupts.
536 .It Li k8-fr-retired-uops
539 .It Li k8-fr-retired-x86-instructions
541 Count retired x86 instructions including exceptions and interrupts.
544 Count instruction cache fetches.
545 .It Li k8-ic-instruction-fetch-stall
547 Count cycles in stalls due to instruction fetch.
548 .It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
550 Count L1 ITLB misses that are L2 ITLB hits.
551 .It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
553 Count ITLB misses that miss in both L1 and L2 ITLBs.
554 .It Li k8-ic-microarchitectural-resync-by-snoop
556 Count microarchitectural resyncs caused by snoops.
559 Count instruction cache misses.
560 .It Li k8-ic-refill-from-l2
562 Count instruction cache refills from L2 cache.
563 .It Li k8-ic-refill-from-system
565 Count instruction cache refills from system memory.
566 .It Li k8-ic-return-stack-hits
568 Count hits to the return stack.
569 .It Li k8-ic-return-stack-overflow
571 Count overflows of the return stack.
572 .It Li k8-ls-buffer2-full
574 Count load/store buffer2 full events.
575 .It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
577 Count locked operations.
578 For revision C and later CPUs, the following qualifiers are supported:
580 .Bl -tag -width indent -compact
581 .It Li cycles-in-request
582 Count the number of cycles in the lock request/grant stage.
583 .It Li cycles-to-complete
584 Count the number of cycles a lock takes to complete once it is
585 non-speculative and is the oldest load/store operation.
586 .It Li locked-instructions
587 Count the number of lock instructions executed.
590 The default is to count the number of lock instructions executed.
591 .It Li k8-ls-microarchitectural-late-cancel
593 Count microarchitectural late cancels of operations in the load/store
595 .It Li k8-ls-microarchitectural-resync-by-self-modifying-code
597 Count microarchitectural resyncs caused by self-modifying code.
598 .It Li k8-ls-microarchitectural-resync-by-snoop
600 Count microarchitectural resyncs caused by snoops.
601 .It Li k8-ls-retired-cflush-instructions
603 Count retired CLFLUSH instructions.
604 .It Li k8-ls-retired-cpuid-instructions
606 Count retired CPUID instructions.
607 .It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
609 Count segment register loads.
610 This event may be further qualified using
614 separated set of the following keywords:
615 .Bl -tag -width indent -compact
617 Count CS register loads.
619 Count DS register loads.
621 Count ES register loads.
623 Count FS register loads.
625 Count GS register loads.
627 .\" Count HS register loads.
628 .\" XXX "HS" register?
630 Count SS register loads.
633 The default is to count all types of loads.
634 .It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
635 .It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
636 .It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
637 .Pq Events F6H, F7H and F8H respectively
638 Count events on the HyperTransport(tm) buses.
639 These events may be further qualified using
643 separated set of the following keywords:
645 .Bl -tag -width indent -compact
646 .It Li buffer-release
647 Count buffer release messages sent.
649 Count command messages sent.
651 Count data messages sent.
653 Count nop messages sent.
656 The default is to count all types of messages.
657 .It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
659 Count memory controller bypass counter saturation events.
660 This event may be further qualified using
664 separated set of the following keywords:
666 .Bl -tag -width indent -compact
667 .It Li dram-controller-interface-bypass
668 Count DRAM controller interface bypass.
669 .It Li dram-controller-queue-bypass
670 Count DRAM controller queue bypass.
671 .It Li memory-controller-hi-pri-bypass
672 Count memory controller high priority bypasses.
673 .It Li memory-controller-lo-pri-bypass
674 Count memory controller low priority bypasses.
677 .It Li k8-nb-memory-controller-dram-slots-missed
679 Count memory controller DRAM command slots missed (in MemClks).
680 .It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
682 Count memory controller page access events.
683 This event may be further qualified using
687 separated set of the following keywords:
689 .Bl -tag -width indent -compact
691 Count page conflicts.
698 The default is to count all types of events.
699 .It Li k8-nb-memory-controller-page-table-overflow
701 Count memory controller page table overflow events.
702 .It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
704 Count memory controller turnaround events.
705 This event may be further qualified using
709 separated set of the following keywords:
711 .Bl -tag -width indent -compact
712 .\" XXX doc is unclear whether these are cycle counts or event counts
713 .It Li dimm-turnaround
714 Count DIMM turnarounds.
715 .It Li read-to-write-turnaround
716 Count read to write turnarounds.
717 .It Li write-to-read-turnaround
718 Count write to read turnarounds.
721 The default is to count all types of events.
722 .It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
725 This event may be further qualified using
729 separated set of the following keywords:
731 .Bl -tag -width indent -compact
733 Count all probe hits.
734 .It Li probe-hit-dirty-no-memory-cancel
735 Count probe hits without memory cancels.
736 .It Li probe-hit-dirty-with-memory-cancel
737 Count probe hits with memory cancels.
741 .It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
743 Count sized commands issued.
744 This event may be further qualified using
748 separated set of the following keywords:
750 .Bl -tag -width indent -compact
751 .It Li nonpostwrszbyte
752 .It Li nonpostwrszdword
760 The default is to count all types of commands.
762 .Ss Event Name Aliases
763 The following table shows the mapping between the PMC-independent
766 and the underlying hardware events used.
767 .Bl -column "branch-mispredicts" "Description"
768 .It Em Alias Ta Em Event
769 .It Li branches Ta Li k8-fr-retired-taken-branches
770 .It Li branch-mispredicts Ta Li k8-fr-retired-taken-branches-mispredicted
771 .It Li dc-misses Ta Li k8-dc-miss
772 .It Li ic-misses Ta Li k8-ic-miss
773 .It Li instructions Ta Li k8-fr-retired-x86-instructions
774 .It Li interrupts Ta Li k8-fr-taken-hardware-interrupts
775 .It Li unhalted-cycles Ta Li k8-bu-cpu-clk-unhalted
793 library first appeared in
798 library was written by
800 .Aq jkoshy@FreeBSD.org .