.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.Nd measurement events for
AMD K8 PMCs are present in the
They are documented in the
.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
.%N "Publication No. 26094"
.%Q "Advanced Micro Devices, Inc."
AMD K8 PMCs are 48 bits wide.
Each CPU contains 4 PMCs with the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
Event specifiers for AMD K8 PMCs can have the following optional
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other fields.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
Invert the sense of comparison when the
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
.It Li mask= Ns Ar qualifier
Many event specifiers for AMD K8 PMCs need to be additionally
qualified using a mask qualifier.
These additional qualifiers are event-specific and are documented
along with their associated event specifiers below.
Configure the PMC to count events happening at privilege level 0.
Configure the PMC to count events occurring at privilege levels 1, 2
qualifiers were specified, the default is to enable both.
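.Pp
As a minimal sketch of how these qualifiers combine into a single event
specifier string, the C fragment below joins an event name, an optional
.Li mask=
list and an optional
.Li count=
value.
The helper
.Fn k8_build_spec
is hypothetical and is not part of any library; the sketch also assumes
that mask keywords are joined with
.Ql +
and that qualifiers are separated from the event name by commas.
.Bd -literal -offset indent
```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of any library: write
 * "event[,mask=kw1+kw2...][,count=N]" into buf.
 * A count of 0 or less omits the count= qualifier.
 * Returns 0 on success, -1 if buf is too small.
 */
int
k8_build_spec(char *buf, size_t len, const char *event,
    const char *const *kw, size_t nkw, int count)
{
	size_t used;
	size_t i;
	int n;

	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;

	for (i = 0; i < nkw; i++) {
		/* The first keyword opens the mask= list; later ones use '+'. */
		n = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? ",mask=" : "+", kw[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}

	if (count > 0) {
		n = snprintf(buf + used, len - used, ",count=%d", count);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
	}
	return (0);
}
```
.Ed
Called with the event
.Li k8-fp-dispatched-fpu-ops ,
the keywords
.Li add-pipe-excluding-junk-ops
and
.Li multiply-pipe-excluding-junk-ops ,
and a count of 2, the sketch produces
.Li k8-fp-dispatched-fpu-ops,mask=add-pipe-excluding-junk-ops+multiply-pipe-excluding-junk-ops,count=2 .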
.Ss AMD K8 Event Specifiers
The event specifiers supported on AMD K8 PMCs are:
.Bl -tag -width indent
.It Li k8-bu-cpu-clk-unhalted
Count the number of clock cycles when the CPU is not in the HLT or
.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
Count fill requests that missed in the L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count data cache fill requests.
Count instruction cache fill requests.
The default is to count all types of requests.
.It Li k8-bu-fill-into-l2 Op Li ,mask= Ns Ar qualifier
Count lines written to and from the L2 cache.
The event may be further qualified by using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li dirty-l2-victim
Count lines written into L2 cache due to victim writebacks from the
Icache or Dcache, TLB page table walks or hardware data prefetches.
.It Li victim-from-l2
Count writebacks of dirty lines from L2 to the system.
.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
Count internally generated requests to the L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count cancelled requests.
Count data cache fill requests.
Count instruction cache fill requests.
Count tag snoop requests.
The default is to count all types of requests.
Count data cache accesses including microcode scratch pad accesses.
.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
Count data cache copyback operations.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
Count data cache accesses by lock instructions.
This event is only available on processors of revision C or later
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count data cache accesses by lock instructions.
Count data cache misses by lock instructions.
The default is to count all accesses.
.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
Count the number of dispatched prefetch instructions.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count load operations.
Count non-temporal operations.
Count store operations.
The default is to count all operations.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
Count L1 DTLB misses that are also misses in the L2 DTLB.
.It Li k8-dc-microarchitectural-early-cancel-of-an-access
Count microarchitectural early cancels of data cache accesses.
.It Li k8-dc-microarchitectural-late-cancel-of-an-access
Count microarchitectural late cancels of data cache accesses.
.It Li k8-dc-misaligned-data-reference
Count misaligned data references.
Count data cache misses.
.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
Count one bit ECC errors found by the scrubber.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count scrubber detected errors.
Count piggyback scrubber errors.
The default is to count both kinds of errors.
.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
Count data cache refills from L2 cache.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
Count data cache refills from system memory.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
Count operations for lines in the
The default is to count operations for lines in all the
.It Li k8-fp-cycles-with-no-fpu-ops-retired
Count cycles when no FPU ops were retired.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-fast-flag-ops
Count dispatched FPU ops that use the fast flag interface.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
Count the number of dispatched FPU ops.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li add-pipe-excluding-junk-ops
Count add pipe ops excluding junk ops.
.It Li add-pipe-junk-ops
Count junk ops in the add pipe.
.It Li multiply-pipe-excluding-junk-ops
Count multiply pipe ops excluding junk ops.
.It Li multiply-pipe-junk-ops
Count junk ops in the multiply pipe.
.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
.It Li store-pipe-junk-ops
Count junk ops in the store pipe.
The default is to count all types of ops.
.It Li k8-fr-decoder-empty
Count cycles when there was nothing to dispatch (i.e., the decoder
.It Li k8-fr-dispatch-stall-for-segment-load
Count dispatch stalls for segment loads.
.It Li k8-fr-dispatch-stall-for-serialization
Count dispatch stalls for serialization.
.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
Count dispatch stalls from branch abort to retiral.
.It Li k8-fr-dispatch-stall-when-fpu-is-full
Count dispatch stalls when the FPU is full.
.It Li k8-fr-dispatch-stall-when-ls-is-full
Count dispatch stalls when the load/store unit is full.
.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
Count dispatch stalls when the reorder buffer is full.
.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
Count dispatch stalls when reservation stations are full.
.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
Count dispatch stalls when a far control transfer or a resync branch
.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
Count dispatch stalls when waiting for all to be quiet.
.\" XXX What does "waiting for all to be quiet" mean?
.It Li k8-fr-dispatch-stalls
Count all dispatch stalls.
.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
Count FPU exceptions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li sse-and-x87-microtraps
Count SSE and x87 microtraps.
.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
.It Li x87-reclass-microfaults
Count x87 reclass microfaults.
The default is to count all types of exceptions.
.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., cycles when the CPU
RFLAGS field IF was zero).
.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles when interrupts were masked while pending (i.e., cycles
when INTR was asserted while the CPU RFLAGS field IF was zero).
.It Li k8-fr-number-of-breakpoints-for-dr0
Count the number of breakpoints for DR0.
.It Li k8-fr-number-of-breakpoints-for-dr1
Count the number of breakpoints for DR1.
.It Li k8-fr-number-of-breakpoints-for-dr2
Count the number of breakpoints for DR2.
.It Li k8-fr-number-of-breakpoints-for-dr3
Count the number of breakpoints for DR3.
.It Li k8-fr-retired-branches
Count retired branches including exceptions and interrupts.
.It Li k8-fr-retired-branches-mispredicted
Count mispredicted retired branches.
.It Li k8-fr-retired-far-control-transfers
Count retired far control transfers (which are always mispredicted).
.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
Count retired fastpath double op instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count instructions with the low op in position 0.
Count instructions with the low op in position 1.
Count instructions with the low op in position 2.
The default is to count all types of instructions.
.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
Count retired FPU instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count MMX and 3DNow!\& instructions.
.It Li packed-sse-sse2
Count packed SSE and SSE2 instructions.
.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
Count x87 instructions.
The default is to count all types of instructions.
.It Li k8-fr-retired-near-returns
Count retired near returns.
.It Li k8-fr-retired-near-returns-mispredicted
Count mispredicted near returns.
.It Li k8-fr-retired-resyncs
Count retired resyncs (non-control transfer branches).
.It Li k8-fr-retired-taken-branches
Count retired taken branches.
.It Li k8-fr-retired-taken-branches-mispredicted
Count retired taken branches that were mispredicted.
.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
Count retired taken branches that were mispredicted only due to an
.It Li k8-fr-retired-taken-hardware-interrupts
Count retired taken hardware interrupts.
.It Li k8-fr-retired-uops
.It Li k8-fr-retired-x86-instructions
Count retired x86 instructions including exceptions and interrupts.
Count instruction cache fetches.
.It Li k8-ic-instruction-fetch-stall
Count cycles in stalls due to instruction fetch.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
Count ITLB misses that miss in both L1 and L2 ITLBs.
.It Li k8-ic-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
Count instruction cache misses.
.It Li k8-ic-refill-from-l2
Count instruction cache refills from L2 cache.
.It Li k8-ic-refill-from-system
Count instruction cache refills from system memory.
.It Li k8-ic-return-stack-hits
Count hits to the return stack.
.It Li k8-ic-return-stack-overflow
Count overflows of the return stack.
.It Li k8-ls-buffer2-full
Count load/store buffer2 full events.
.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
Count locked operations.
For revision C and later CPUs, the following qualifiers are supported:
.Bl -tag -width indent -compact
.It Li cycles-in-request
Count the number of cycles in the lock request/grant stage.
.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the oldest load/store operation.
.It Li locked-instructions
Count the number of lock instructions executed.
The default is to count the number of lock instructions executed.
.It Li k8-ls-microarchitectural-late-cancel
Count microarchitectural late cancels of operations in the load/store
.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
Count microarchitectural resyncs caused by self-modifying code.
.It Li k8-ls-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ls-retired-cflush-instructions
Count retired CLFLUSH instructions.
.It Li k8-ls-retired-cpuid-instructions
Count retired CPUID instructions.
.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
Count segment register loads.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count CS register loads.
Count DS register loads.
Count ES register loads.
Count FS register loads.
Count GS register loads.
.\" Count HS register loads.
.\" XXX "HS" register?
Count SS register loads.
The default is to count all types of loads.
.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
.Pq Events F6H, F7H and F8H respectively
Count events on the HyperTransport(tm) buses.
These events may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li buffer-release
Count buffer release messages sent.
Count command messages sent.
Count data messages sent.
Count nop messages sent.
The default is to count all types of messages.
.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
Count memory controller bypass counter saturation events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li dram-controller-interface-bypass
Count DRAM controller interface bypass.
.It Li dram-controller-queue-bypass
Count DRAM controller queue bypass.
.It Li memory-controller-hi-pri-bypass
Count memory controller high priority bypasses.
.It Li memory-controller-lo-pri-bypass
Count memory controller low priority bypasses.
.It Li k8-nb-memory-controller-dram-slots-missed
Count memory controller DRAM command slots missed (in MemClks).
.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
Count memory controller page access events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count page conflicts.
The default is to count all types of events.
.It Li k8-nb-memory-controller-page-table-overflow
Count memory controller page table overflow events.
.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory controller turnaround events.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.\" XXX doc is unclear whether these are cycle counts or event counts
.It Li dimm-turnaround
Count DIMM turnarounds.
.It Li read-to-write-turnaround
Count read to write turnarounds.
.It Li write-to-read-turnaround
Count write to read turnarounds.
The default is to count all types of events.
.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
Count all probe hits.
.It Li probe-hit-dirty-no-memory-cancel
Count probe hits to dirty lines without memory cancels.
.It Li probe-hit-dirty-with-memory-cancel
Count probe hits to dirty lines with memory cancels.
.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
Count sized commands issued.
This event may be further qualified using
separated set of the following keywords:
.Bl -tag -width indent -compact
.It Li nonpostwrszbyte
.It Li nonpostwrszdword
The default is to count all types of commands.
.Ss Event Name Aliases
The following table shows the mapping between the PMC-independent
and the underlying hardware events used.
.Bl -column "branch-mispredicts" "Description"
.It Em Alias Ta Em Event
.It Li branches Ta Li k8-fr-retired-taken-branches
.It Li branch-mispredicts Ta Li k8-fr-retired-taken-branches-mispredicted
.It Li dc-misses Ta Li k8-dc-miss
.It Li ic-misses Ta Li k8-ic-miss
.It Li instructions Ta Li k8-fr-retired-x86-instructions
.It Li interrupts Ta Li k8-fr-taken-hardware-interrupts
.It Li unhalted-cycles Ta Li k8-bu-cpu-clk-unhalted
library first appeared in
library was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .