1 .\" Copyright (c) 2003-2006 Joseph Koshy. All rights reserved.
3 .\" Redistribution and use in source and binary forms, with or without
4 .\" modification, are permitted provided that the following conditions
6 .\" 1. Redistributions of source code must retain the above copyright
7 .\" notice, this list of conditions and the following disclaimer.
8 .\" 2. Redistributions in binary form must reproduce the above copyright
9 .\" notice, this list of conditions and the following disclaimer in the
10 .\" documentation and/or other materials provided with the distribution.
12 .\" This software is provided by Joseph Koshy ``as is'' and
13 .\" any express or implied warranties, including, but not limited to, the
14 .\" implied warranties of merchantability and fitness for a particular purpose
15 .\" are disclaimed. In no event shall Joseph Koshy be liable
16 .\" for any direct, indirect, incidental, special, exemplary, or consequential
17 .\" damages (including, but not limited to, procurement of substitute goods
18 .\" or services; loss of use, data, or profits; or business interruption)
19 .\" however caused and on any theory of liability, whether in contract, strict
20 .\" liability, or tort (including negligence or otherwise) arising in any way
21 .\" out of the use of this software, even if advised of the possibility of
32 .Nm pmc_capabilities ,
33 .Nm pmc_configure_logfile ,
38 .Nm pmc_event_names_of_class ,
39 .Nm pmc_flush_logfile ,
40 .Nm pmc_get_driver_stats ,
43 .Nm pmc_name_of_capability ,
44 .Nm pmc_name_of_class ,
45 .Nm pmc_name_of_cputype ,
46 .Nm pmc_name_of_event ,
47 .Nm pmc_name_of_mode ,
48 .Nm pmc_name_of_state ,
61 .Nd programming API for using hardware performance monitoring counters
68 .Fa "const char *eventspecifier"
69 .Fa "enum pmc_mode mode"
75 .Fn pmc_attach "pmc_id_t pmcid" "pid_t pid"
77 .Fn pmc_capabilities "pmc_id_t pmc" "uint32_t *caps"
79 .Fn pmc_configure_logfile "int fd"
81 .Fn pmc_cpuinfo "const struct pmc_cpuinfo **cpu_info"
83 .Fn pmc_detach "pmc_id_t pmcid" "pid_t pid"
85 .Fn pmc_disable "int cpu" "int pmc"
87 .Fn pmc_enable "int cpu" "int pmc"
89 .Fo pmc_event_names_of_class
90 .Fa "enum pmc_class cl"
91 .Fa "const char ***eventnames"
95 .Fn pmc_flush_logfile void
97 .Fn pmc_get_driver_stats "struct pmc_driverstats *gms"
99 .Fn pmc_get_msr "pmc_id_t pmc" "uint32_t *msr"
103 .Fn pmc_name_of_capability "enum pmc_caps pc"
105 .Fn pmc_name_of_class "enum pmc_class pc"
107 .Fn pmc_name_of_cputype "enum pmc_cputype ct"
109 .Fn pmc_name_of_disposition "enum pmc_disp pd"
111 .Fn pmc_name_of_event "enum pmc_event pe"
113 .Fn pmc_name_of_mode "enum pmc_mode pm"
115 .Fn pmc_name_of_state "enum pmc_state ps"
119 .Fn pmc_npmc "int cpu"
121 .Fn pmc_pmcinfo "int cpu" "struct pmc_pmcinfo **pmc_info"
123 .Fn pmc_read "pmc_id_t pmc" "pmc_value_t *value"
125 .Fn pmc_release "pmc_id_t pmc"
127 .Fn pmc_rw "pmc_id_t pmc" "pmc_value_t newvalue" "pmc_value_t *oldvaluep"
129 .Fn pmc_set "pmc_id_t pmc" "pmc_value_t value"
131 .Fn pmc_start "pmc_id_t pmc"
133 .Fn pmc_stop "pmc_id_t pmc"
135 .Fn pmc_write "pmc_id_t pmc" "pmc_value_t value"
137 .Fn pmc_writelog "uint32_t userdata"
139 .Fn pmc_width "pmc_id_t pmc" "uint32_t *width"
141 These functions implement a high-level library for using the
142 system's hardware performance counters.
144 PMCs are allocated using
150 Allocated PMCs may be started or stopped at any time using
155 An allocated PMC may be of
157 scope, meaning that the PMC measures system-wide events, or
159 scope, meaning that the PMC only counts hardware events when
160 the allocating process (or, optionally, its children)
163 PMCs may further be in
164 .Dq "counting mode" ,
166 .Dq "sampling mode" .
167 Sampling mode PMCs deliver an interrupt to the CPU after
168 a configured number of hardware events have been seen.
169 A process-private sampling mode PMC will cause its owner
170 process to get periodic
172 interrupts, while a global sampling mode PMC is used to
173 do system-wide statistical sampling (see
175 The sampling rate desired of a sampling-mode PMC is set using
177 Counting mode PMCs do not interrupt the CPU; their values
181 System-wide statistical sampling is enabled by allocating
182 at least one sampling mode PMC with
183 global scope and configuring a log file using
184 .Fn pmc_configure_logfile .
187 driver manages system-wide statistical sampling; for more
188 information please see
190 .Ss Application Programming Interface
196 This function must be called first, before any of the other
197 functions in the library.
201 allocates a counter that counts the events named by
203 and writes the allocated counter ID to
207 comprises a PMC event name followed by an optional comma-separated
208 list of keywords and qualifiers.
209 The allowed syntax for
211 is processor architecture specific and is listed in section
212 .Sx "EVENT SPECIFIERS"
214 The desired PMC mode is specified by
216 and any mode specific modifiers are specified using
220 argument is the value
222 or names the CPU the allocation is to be on.
223 Requesting a specific CPU only makes sense for global PMCs;
224 process-private PMC allocations should always specify
227 By default, a PMC configured in process-virtual counting mode is set up
228 to profile its owner process.
231 may be used to attach the PMC to a different process.
233 needs to be called before the counter is first started
238 may be used to detach a PMC from a process it was attached to
239 using a prior call to
244 releases a PMC previously allocated with
246 This function call implicitly detaches the PMC from all its target
249 An allocated PMC may be started and stopped using
255 The current value of a PMC may be read with
259 provided the underlying hardware supports these operations on
261 The read and write operations may be combined using
268 to a bitmask of capabilities supported by the PMC denoted by
275 to the width of the PMC denoted by argument
279 .Fn pmc_configure_logfile
282 driver to log performance data to the file corresponding
283 to the process' file handle
287 is \-1, then any previously configured logging is reset
288 and all data queued to be written are discarded.
291 .Fn pmc_flush_logfile
292 function will send all data queued inside the
294 driver to the configured log file before returning.
297 function will append a log entry containing the argument
303 configures a sampling PMC
310 sets the initial value of the PMC to
314 .Fn pmc_get_driver_stats
315 copies a snapshot of the usage statistics maintained by
317 into the memory area pointed to by argument
319 .Ss Signal Handling Requirements
320 Applications using PMCs are required to handle the following signals:
321 .Bl -tag -width indent
325 module is unloaded using
327 processes that have PMCs allocated to them will be sent a
333 driver will send a PMC owning process a
338 If any process-mode PMC allocated by it loses all its
341 If the driver encounters an error when writing log data to a
343 This error may be retrieved by a subsequent call to
344 .Fn pmc_flush_logfile .
347 .Ss Convenience Functions
350 returns the number of CPUs present in the system.
354 returns the number of PMCs supported on CPU
360 to point to an internal structure with information about the system's CPUs.
361 The caller should not
366 returns information about the current state of CPU
369 This function sets argument
371 to point to a memory area allocated with
373 The caller is expected to
378 .Fn pmc_name_of_capability ,
379 .Fn pmc_name_of_class ,
380 .Fn pmc_name_of_cputype ,
381 .Fn pmc_name_of_disposition ,
382 .Fn pmc_name_of_event ,
385 .Fn pmc_name_of_state
386 are useful for code wanting to print error messages.
389 pointers to human-readable representations of their arguments.
390 These return values should not be freed using
394 .Fn pmc_event_names_of_class
395 returns a list of event names supported by a given PMC class
397 On successful return, an array of
399 pointers to the names of valid events supported by class
401 is allocated by the library using
403 and a pointer to this array is returned in the location pointed to by
405 The number of pointers allocated is returned in the location pointed
409 Individual PMCs may be enabled or disabled on a given CPU using
416 is the CPU number, and
418 is the index of the PMC to be operated on.
419 Only the super-user is allowed to enable and disable PMCs.
420 .Ss x86 Architecture Specific API
423 function returns the processor model specific register number
426 Applications may use the x86
428 instruction to directly read the contents of the PMC.
430 Event specifiers are strings comprising an event name, followed by
431 optional parameters modifying the semantics of the hardware event
433 Event names are PMC architecture dependent, but the
435 library defines machine independent aliases for commonly used
437 .Ss Event Name Aliases
438 Event name aliases are CPU architecture independent names for commonly
440 The following aliases are known to this version of the
443 .Bl -tag -width indent
445 Measure the number of branches retired.
446 .It Li branch-mispredicts
447 Measure the number of retired branches that were mispredicted.
449 Measure processor cycles.
450 This event is implemented using the processor's Time Stamp Counter
453 Measure the number of data cache misses.
455 Measure the number of instruction cache misses.
457 Measure the number of instructions retired.
459 Measure the number of interrupts seen.
460 .It Li unhalted-cycles
461 Measure the number of cycles the processor is not in a halted
464 .Ss Time Stamp Counter (TSC)
465 The timestamp counter is a monotonically non-decreasing counter that
466 counts processor cycles.
468 In the i386 architecture, this counter may
469 be selected by requesting an event with event specifier
473 event does not support any further qualifiers.
474 It can only be allocated in system-wide counting mode,
475 and is a read-only counter.
476 Multiple processes are allowed to allocate the TSC.
477 Once allocated, it may be read using the
479 function, or by using the RDTSC instruction.
481 These PMCs are present in the
483 series of CPUs and are documented in:
485 .%B "AMD Athlon Processor x86 Code Optimization Guide"
486 .%N "Publication No. 22007"
488 .%Q "Advanced Micro Devices, Inc."
491 Event specifiers for AMD K7 PMCs can have the following optional
493 .Bl -tag -width indent
494 .It Li count= Ns Ar value
495 Configure the counter to increment only if the number of configured
496 events measured in a cycle is greater than or equal to
499 Configure the counter to only count negated-to-asserted transitions
500 of the conditions expressed by the other qualifiers.
501 In other words, the counter will increment only once whenever a given
502 condition becomes true, irrespective of the number of clocks during
503 which the condition remains true.
505 Invert the sense of comparison when the
507 qualifier is present, making the counter increment when the
508 number of events per cycle is less than the value specified by
513 Configure the PMC to count events happening at privilege level 0.
514 .It Li unitmask= Ns Ar mask
515 This qualifier is used to further qualify a select few events,
516 .Dq Li k7-dc-refills-from-l2 ,
517 .Dq Li k7-dc-refills-from-system
519 .Dq Li k7-dc-writebacks .
522 is a string of the following characters optionally separated by
526 .Bl -tag -width indent -compact
528 Count operations for lines in the
532 Count operations for lines in the
536 Count operations for lines in the
540 Count operations for lines in the
544 Count operations for lines in the
551 qualifier is specified, the default is to count events for cache
552 lines in any of the above states.
554 Configure the PMC to count events occurring at privilege levels 1, 2
562 qualifiers were specified, the default is to enable both.
564 The event specifiers supported on AMD K7 PMCs are:
565 .Bl -tag -width indent
566 .It Li k7-dc-accesses
567 Count data cache accesses.
569 Count data cache misses.
570 .It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask
571 Count data cache refills from L2 cache.
572 This event may be further qualified using the
575 .It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask
576 Count data cache refills from system memory.
577 This event may be further qualified using the
580 .It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask
581 Count data cache writebacks.
582 This event may be further qualified using the
585 .It Li k7-l1-dtlb-miss-and-l2-dtlb-hits
586 Count L1 DTLB misses and L2 DTLB hits.
587 .It Li k7-l1-and-l2-dtlb-misses
588 Count L1 and L2 DTLB misses.
589 .It Li k7-misaligned-references
590 Count misaligned data references.
592 Count instruction cache fetches.
594 Count instruction cache misses.
595 .It Li k7-l1-itlb-misses
596 Count L1 ITLB misses that are L2 ITLB hits.
597 .It Li k7-l1-l2-itlb-misses
598 Count L1 (and L2) ITLB misses.
599 .It Li k7-retired-instructions
600 Count all retired instructions.
601 .It Li k7-retired-ops
603 .It Li k7-retired-branches
604 Count all retired branches (conditional, unconditional, exceptions
606 .It Li k7-retired-branches-mispredicted
607 Count all mispredicted retired branches.
608 .It Li k7-retired-taken-branches
609 Count retired taken branches.
610 .It Li k7-retired-taken-branches-mispredicted
611 Count mispredicted taken branches that were retired.
612 .It Li k7-retired-far-control-transfers
613 Count retired far control transfers.
614 .It Li k7-retired-resync-branches
615 Count retired resync branches (non control transfer branches).
616 .It Li k7-interrupts-masked-cycles
617 Count the number of cycles when the processor's
620 .It Li k7-interrupts-masked-while-pending-cycles
621 Count the number of cycles interrupts were masked while pending due
625 .It Li k7-hardware-interrupts
626 Count the number of taken hardware interrupts.
629 These PMCs are present in the
634 They are documented in:
636 .%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
637 .%N "Publication No. 26094"
639 .%Q "Advanced Micro Devices, Inc."
642 Event specifiers for AMD K8 PMCs can have the following optional
644 .Bl -tag -width indent
645 .It Li count= Ns Ar value
646 Configure the counter to increment only if the number of configured
647 events measured in a cycle is greater than or equal to
650 Configure the counter to only count negated-to-asserted transitions
651 of the conditions expressed by the other fields.
652 In other words, the counter will increment only once whenever a given
653 condition becomes true, irrespective of the number of clocks during
654 which the condition remains true.
656 Invert the sense of comparison when the
658 qualifier is present, making the counter increment when the
659 number of events per cycle is less than the value specified by
663 .It Li mask= Ns Ar qualifier
664 Many event specifiers for AMD K8 PMCs need to be additionally
665 qualified using a mask qualifier.
666 These additional qualifiers are event-specific and are documented
667 along with their associated event specifiers below.
669 Configure the PMC to count events happening at privilege level 0.
671 Configure the PMC to count events occurring at privilege levels 1, 2
679 qualifiers were specified, the default is to enable both.
681 The event specifiers supported on AMD K8 PMCs are:
682 .Bl -tag -width indent
683 .It Li k8-bu-cpu-clk-unhalted
684 Count the number of clock cycles when the CPU is not in the HLT or
686 .It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
687 Count fill requests that missed in the L2 cache.
688 This event may be further qualified using
692 separated set of the following keywords:
694 .Bl -tag -width indent -compact
696 Count data cache fill requests.
698 Count instruction cache fill requests.
703 The default is to count all types of requests.
704 .It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
705 Count internally generated requests to the L2 cache.
706 This event may be further qualified using
710 separated set of the following keywords:
712 .Bl -tag -width indent -compact
714 Count cancelled requests.
716 Count data cache fill requests.
718 Count instruction cache fill requests.
720 Count tag snoop requests.
725 The default is to count all types of requests.
727 Count data cache accesses including microcode scratchpad accesses.
728 .It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
729 Count data cache copyback operations.
730 This event may be further qualified using
734 separated set of the following keywords:
736 .Bl -tag -width indent -compact
738 Count operations for lines in the
742 Count operations for lines in the
746 Count operations for lines in the
750 Count operations for lines in the
754 Count operations for lines in the
759 The default is to count operations for lines in all the
761 .It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
762 Count data cache accesses by lock instructions.
763 This event is only available on processors of revision C or later
765 This event may be further qualified using
769 separated set of the following keywords:
771 .Bl -tag -width indent -compact
773 Count data cache accesses by lock instructions.
775 Count data cache misses by lock instructions.
778 The default is to count all accesses.
779 .It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
780 Count the number of dispatched prefetch instructions.
781 This event may be further qualified using
785 separated set of the following keywords:
787 .Bl -tag -width indent -compact
789 Count load operations.
791 Count non-temporal operations.
793 Count store operations.
796 The default is to count all operations.
797 .It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
798 Count L1 DTLB misses that are L2 DTLB hits.
799 .It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
800 Count L1 DTLB misses that are also misses in the L2 DTLB.
801 .It Li k8-dc-microarchitectural-early-cancel-of-an-access
802 Count microarchitectural early cancels of data cache accesses.
803 .It Li k8-dc-microarchitectural-late-cancel-of-an-access
804 Count microarchitectural late cancels of data cache accesses.
805 .It Li k8-dc-misaligned-data-reference
806 Count misaligned data references.
808 Count data cache misses.
809 .It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
810 Count one bit ECC errors found by the scrubber.
811 This event may be further qualified using
815 separated set of the following keywords:
817 .Bl -tag -width indent -compact
819 Count scrubber detected errors.
821 Count piggyback scrubber errors.
824 The default is to count both kinds of errors.
825 .It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
826 Count data cache refills from L2 cache.
827 This event may be further qualified using
831 separated set of the following keywords:
833 .Bl -tag -width indent -compact
835 Count operations for lines in the
839 Count operations for lines in the
843 Count operations for lines in the
847 Count operations for lines in the
851 Count operations for lines in the
856 The default is to count operations for lines in all the
858 .It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
859 Count data cache refills from system memory.
860 This event may be further qualified using
864 separated set of the following keywords:
866 .Bl -tag -width indent -compact
868 Count operations for lines in the
872 Count operations for lines in the
876 Count operations for lines in the
880 Count operations for lines in the
884 Count operations for lines in the
889 The default is to count operations for lines in all the
891 .It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
892 Count the number of dispatched FPU ops.
893 This event is supported in revision B and later CPUs.
894 This event may be further qualified using
898 separated set of the following keywords:
900 .Bl -tag -width indent -compact
901 .It Li add-pipe-excluding-junk-ops
902 Count add pipe ops excluding junk ops.
903 .It Li add-pipe-junk-ops
904 Count junk ops in the add pipe.
905 .It Li multiply-pipe-excluding-junk-ops
906 Count multiply pipe ops excluding junk ops.
907 .It Li multiply-pipe-junk-ops
908 Count junk ops in the multiply pipe.
909 .It Li store-pipe-excluding-junk-ops
910 Count store pipe ops excluding junk ops.
911 .It Li store-pipe-junk-ops
912 Count junk ops in the store pipe.
915 The default is to count all types of ops.
916 .It Li k8-fp-cycles-with-no-fpu-ops-retired
917 Count cycles when no FPU ops were retired.
918 This event is supported in revision B and later CPUs.
919 .It Li k8-fp-dispatched-fpu-fast-flag-ops
920 Count dispatched FPU ops that use the fast flag interface.
921 This event is supported in revision B and later CPUs.
922 .It Li k8-fr-decoder-empty
923 Count cycles when there was nothing to dispatch (i.e., the decoder
925 .It Li k8-fr-dispatch-stalls
926 Count all dispatch stalls.
927 .It Li k8-fr-dispatch-stall-for-segment-load
928 Count dispatch stalls for segment loads.
929 .It Li k8-fr-dispatch-stall-for-serialization
930 Count dispatch stalls for serialization.
931 .It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
932 Count dispatch stalls from branch abort to retiral.
933 .It Li k8-fr-dispatch-stall-when-fpu-is-full
934 Count dispatch stalls when the FPU is full.
935 .It Li k8-fr-dispatch-stall-when-ls-is-full
936 Count dispatch stalls when the load/store unit is full.
937 .It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
938 Count dispatch stalls when the reorder buffer is full.
939 .It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
940 Count dispatch stalls when reservation stations are full.
941 .It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
942 Count dispatch stalls when waiting for all to be quiet.
943 .\" XXX What does "waiting for all to be quiet" mean?
944 .It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
945 Count dispatch stalls when a far control transfer or a resync branch
947 .It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
948 Count FPU exceptions.
949 This event is supported in revision B and later CPUs.
950 This event may be further qualified using
954 separated set of the following keywords:
956 .Bl -tag -width indent -compact
957 .It Li sse-and-x87-microtraps
958 Count SSE and x87 microtraps.
959 .It Li sse-reclass-microfaults
960 Count SSE reclass microfaults.
961 .It Li sse-retype-microfaults
962 Count SSE retype microfaults.
963 .It Li x87-reclass-microfaults
964 Count x87 reclass microfaults.
967 The default is to count all types of exceptions.
968 .It Li k8-fr-interrupts-masked-cycles
969 Count cycles when interrupts were masked (i.e., when CPU RFLAGS field IF was zero).
970 .It Li k8-fr-interrupts-masked-while-pending-cycles
971 Count cycles when interrupts were masked while pending (i.e., cycles
972 when INTR was asserted while CPU RFLAGS field IF was zero).
973 .It Li k8-fr-number-of-breakpoints-for-dr0
974 Count the number of breakpoints for DR0.
975 .It Li k8-fr-number-of-breakpoints-for-dr1
976 Count the number of breakpoints for DR1.
977 .It Li k8-fr-number-of-breakpoints-for-dr2
978 Count the number of breakpoints for DR2.
979 .It Li k8-fr-number-of-breakpoints-for-dr3
980 Count the number of breakpoints for DR3.
981 .It Li k8-fr-retired-branches
982 Count retired branches including exceptions and interrupts.
983 .It Li k8-fr-retired-branches-mispredicted
984 Count mispredicted retired branches.
985 .It Li k8-fr-retired-far-control-transfers
986 Count retired far control transfers (which are always mispredicted).
987 .It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
988 Count retired fastpath double op instructions.
989 This event is supported in revision B and later CPUs.
990 This event may be further qualified using
994 separated set of the following keywords:
996 .Bl -tag -width indent -compact
998 Count instructions with the low op in position 0.
1000 Count instructions with the low op in position 1.
1002 Count instructions with the low op in position 2.
1005 The default is to count all types of instructions.
1006 .It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
1007 Count retired FPU instructions.
1008 This event is supported in revision B and later CPUs.
1009 This event may be further qualified using
1013 separated set of the following keywords:
1015 .Bl -tag -width indent -compact
1017 Count MMX and 3DNow!\& instructions.
1018 .It Li packed-sse-sse2
1019 Count packed SSE and SSE2 instructions.
1020 .It Li scalar-sse-sse2
1021 Count scalar SSE and SSE2 instructions.
1023 Count x87 instructions.
1026 The default is to count all types of instructions.
1027 .It Li k8-fr-retired-near-returns
1028 Count retired near returns.
1029 .It Li k8-fr-retired-near-returns-mispredicted
1030 Count mispredicted near returns.
1031 .It Li k8-fr-retired-resyncs
1032 Count retired resyncs (non-control transfer branches).
1033 .It Li k8-fr-retired-taken-hardware-interrupts
1034 Count retired taken hardware interrupts.
1035 .It Li k8-fr-retired-taken-branches
1036 Count retired taken branches.
1037 .It Li k8-fr-retired-taken-branches-mispredicted
1038 Count retired taken branches that were mispredicted.
1039 .It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
1040 Count retired taken branches that were mispredicted only due to an
1042 .It Li k8-fr-retired-uops
1044 .It Li k8-fr-retired-x86-instructions
1045 Count retired x86 instructions including exceptions and interrupts.
1047 Count instruction cache fetches.
1048 .It Li k8-ic-instruction-fetch-stall
1049 Count cycles in stalls due to instruction fetch.
1050 .It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
1051 Count L1 ITLB misses that are L2 ITLB hits.
1052 .It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
1053 Count ITLB misses that miss in both L1 and L2 ITLBs.
1054 .It Li k8-ic-microarchitectural-resync-by-snoop
1055 Count microarchitectural resyncs caused by snoops.
1057 Count instruction cache misses.
1058 .It Li k8-ic-refill-from-l2
1059 Count instruction cache refills from L2 cache.
1060 .It Li k8-ic-refill-from-system
1061 Count instruction cache refills from system memory.
1062 .It Li k8-ic-return-stack-hits
1063 Count hits to the return stack.
1064 .It Li k8-ic-return-stack-overflow
1065 Count overflows of the return stack.
1066 .It Li k8-ls-buffer2-full
1067 Count load/store buffer2 full events.
1068 .It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
1069 Count locked operations.
1070 For revision C and later CPUs, the following qualifiers are supported:
1072 .Bl -tag -width indent -compact
1073 .It Li cycles-in-request
1074 Count the number of cycles in the lock request/grant stage.
1075 .It Li cycles-to-complete
1076 Count the number of cycles a lock takes to complete once it is
1077 non-speculative and is the older load/store operation.
1078 .It Li locked-instructions
1079 Count the number of lock instructions executed.
1082 The default is to count the number of lock instructions executed.
1083 .It Li k8-ls-microarchitectural-late-cancel
1084 Count microarchitectural late cancels of operations in the load/store
1086 .It Li k8-ls-microarchitectural-resync-by-self-modifying-code
1087 Count microarchitectural resyncs caused by self-modifying code.
1088 .It Li k8-ls-microarchitectural-resync-by-snoop
1089 Count microarchitectural resyncs caused by snoops.
1090 .It Li k8-ls-retired-cflush-instructions
1091 Count retired CLFLUSH instructions.
1092 .It Li k8-ls-retired-cpuid-instructions
1093 Count retired CPUID instructions.
1094 .It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
1095 Count segment register loads.
1096 This event may be further qualified using
1100 separated set of the following keywords:
1101 .Bl -tag -width indent -compact
1103 Count CS register loads.
1105 Count DS register loads.
1107 Count ES register loads.
1109 Count FS register loads.
1111 Count GS register loads.
1113 .\" Count HS register loads.
1114 .\" XXX "HS" register?
1116 Count SS register loads.
1119 The default is to count all types of loads.
1120 .It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
1121 Count memory controller bypass counter saturation events.
1122 This event may be further qualified using
1126 separated set of the following keywords:
1128 .Bl -tag -width indent -compact
1129 .It Li dram-controller-interface-bypass
1130 Count DRAM controller interface bypass.
1131 .It Li dram-controller-queue-bypass
1132 Count DRAM controller queue bypass.
1133 .It Li memory-controller-hi-pri-bypass
1134 Count memory controller high priority bypasses.
1135 .It Li memory-controller-lo-pri-bypass
1136 Count memory controller low priority bypasses.
1139 .It Li k8-nb-memory-controller-dram-slots-missed
1140 Count memory controller DRAM command slots missed (in MemClks).
1141 .It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
1142 Count memory controller page access events.
1143 This event may be further qualified using
1147 separated set of the following keywords:
1149 .Bl -tag -width indent -compact
1150 .It Li page-conflict
1151 Count page conflicts.
1158 The default is to count all types of events.
1159 .It Li k8-nb-memory-controller-page-table-overflow
1160 Count memory controller page table overflow events.
1161 .It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
1163 This event may be further qualified using
1167 separated set of the following keywords:
1169 .Bl -tag -width indent -compact
1171 Count all probe hits.
1172 .It Li probe-hit-dirty-no-memory-cancel
1173 Count probe hits without memory cancels.
1174 .It Li probe-hit-dirty-with-memory-cancel
1175 Count probe hits with memory cancels.
1179 .It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
1180 Count sized commands issued.
1181 This event may be further qualified using
1185 separated set of the following keywords:
1187 .Bl -tag -width indent -compact
1188 .It Li nonpostwrszbyte
1189 .It Li nonpostwrszdword
1191 .It Li postwrszdword
1197 The default is to count all types of commands.
1198 .It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
1199 Count memory controller turnaround events.
1200 This event may be further qualified using
1204 separated set of the following keywords:
1206 .Bl -tag -width indent -compact
1207 .\" XXX doc is unclear whether these are cycle counts or event counts
1208 .It Li dimm-turnaround
1209 Count DIMM turnarounds.
1210 .It Li read-to-write-turnaround
1211 Count read to write turnarounds.
1212 .It Li write-to-read-turnaround
1213 Count write to read turnarounds.
1216 The default is to count all types of events.
1217 .It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
1218 .It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
1219 .It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
1220 Count events on the HyperTransport(tm) buses.
1221 These events may be further qualified using
1225 separated set of the following keywords:
1227 .Bl -tag -width indent -compact
1228 .It Li buffer-release
1229 Count buffer release messages sent.
1231 Count command messages sent.
1233 Count data messages sent.
1235 Count nop messages sent.
1238 The default is to count all types of messages.
1241 Intel P6 PMCs are present in Intel
1250 These CPUs have two counters.
1251 Some events may only be used on specific counters and some events are
1252 defined only on specific processor models.
1254 These PMCs are documented in
1256 .%B "IA-32 Intel(R) Architecture Software Developer's Manual"
1257 .%T "Volume 3: System Programming Guide"
1258 .%N "Order Number 245472-012"
1260 .%Q "Intel Corporation"
1263 Some of these events are affected by processor errata described in
1265 .%B "Intel(R) Pentium(R) III Processor Specification Update"
1266 .%N "Document Number: 244453-054"
1268 .%Q "Intel Corporation"
1271 Event specifiers for Intel P6 PMCs can have the following common
1273 .Bl -tag -width indent
1274 .It Li cmask= Ns Ar value
1275 Configure the PMC to increment only if the number of configured
1276 events measured in a cycle is greater than or equal to
1279 Configure the PMC to count the number of deasserted to asserted
1280 transitions of the conditions expressed by the other qualifiers.
1281 If specified, the counter will increment only once whenever a
1282 condition becomes true, irrespective of the number of clocks during
1283 which the condition remains true.
1285 Invert the sense of comparison when the
1287 qualifier is present, making the counter increment when the number of
1288 events per cycle is less than the value specified by the
1292 Configure the PMC to count events happening at processor privilege
1294 .It Li umask= Ns Ar value
1295 This qualifier is used to further qualify the event selected (see
1298 Configure the PMC to count events occurring at privilege levels 1, 2
1306 qualifiers are specified, the default is to enable both.
1308 The event specifiers supported by Intel P6 PMCs are:
1309 .Bl -tag -width indent
1311 Count the number of times a static branch prediction was made by the
1312 branch decoder because the BTB did not have a prediction.
1313 .It Li p6-br-bac-missp-exec
1315 Count the number of branch instructions executed that were
1316 mispredicted at the Front End (BAC).
1318 Count the number of bogus branches.
1319 .It Li p6-br-call-exec
1321 Count the number of call instructions executed.
1322 .It Li p6-br-call-missp-exec
1324 Count the number of call instructions executed that were mispredicted.
1325 .It Li p6-br-cnd-exec
1327 Count the number of conditional branch instructions executed.
1328 .It Li p6-br-cnd-missp-exec
1330 Count the number of conditional branch instructions executed that were
1332 .It Li p6-br-ind-call-exec
1334 Count the number of indirect call instructions executed.
1335 .It Li p6-br-ind-exec
1337 Count the number of indirect branch instructions executed.
1338 .It Li p6-br-ind-missp-exec
1340 Count the number of indirect branch instructions executed that were
1342 .It Li p6-br-inst-decoded
1343 Count the number of branch instructions decoded.
1344 .It Li p6-br-inst-exec
1346 Count the number of branch instructions executed but not necessarily retired.
1347 .It Li p6-br-inst-retired
1348 Count the number of branch instructions retired.
1349 .It Li p6-br-miss-pred-retired
1350 Count the number of mispredicted branch instructions retired.
1351 .It Li p6-br-miss-pred-taken-ret
1352 Count the number of taken mispredicted branches retired.
1353 .It Li p6-br-missp-exec
1355 Count the number of branch instructions executed that were
1356 mispredicted at execution.
1357 .It Li p6-br-ret-bac-missp-exec
1359 Count the number of return instructions executed that were
1360 mispredicted at the Front End (BAC).
1361 .It Li p6-br-ret-exec
1363 Count the number of return instructions executed.
1364 .It Li p6-br-ret-missp-exec
1366 Count the number of return instructions executed that were
1367 mispredicted at execution.
1368 .It Li p6-br-taken-retired
1369 Count the number of taken branches retired.
1370 .It Li p6-btb-misses
1371 Count the number of branches for which the BTB did not produce a
1373 .It Li p6-bus-bnr-drv
1374 Count the number of bus clock cycles during which this processor is
1375 driving the BNR# pin.
1376 .It Li p6-bus-data-rcv
1377 Count the number of bus clock cycles during which this processor is
1379 .It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier
1380 Count the number of clocks during which DRDY# is asserted.
1381 An additional qualifier may be specified, and comprises one of the
1384 .Bl -tag -width indent -compact
1386 Count transactions generated by any agent on the bus.
1388 Count transactions generated by this processor.
1391 The default is to count operations generated by this processor.
1392 .It Li p6-bus-hit-drv
1393 Count the number of bus clock cycles during which this processor is
1394 driving the HIT# pin.
1395 .It Li p6-bus-hitm-drv
1396 Count the number of bus clock cycles during which this processor is
1397 driving the HITM# pin.
1398 .It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier
1399 Count the number of clocks during which LOCK# is asserted on the
1400 external system bus.
1401 An additional qualifier may be specified and comprises one of the following
1404 .Bl -tag -width indent -compact
1406 Count transactions generated by any agent on the bus.
1408 Count transactions generated by this processor.
1411 The default is to count operations generated by this processor.
1412 .It Li p6-bus-req-outstanding
1413 Count the number of bus requests outstanding in any given cycle.
1414 .It Li p6-bus-snoop-stall
1415 Count the number of clock cycles during which the bus is snoop stalled.
1416 .It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier
1417 Count the number of completed bus transactions of any kind.
1418 An additional qualifier may be specified and comprises one of the following
1421 .Bl -tag -width indent -compact
1423 Count transactions generated by any agent on the bus.
1425 Count transactions generated by this processor.
1428 The default is to count operations generated by this processor.
1429 .It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier
1430 Count the number of burst read transactions.
1431 An additional qualifier may be specified and comprises one of the following
1434 .Bl -tag -width indent -compact
1436 Count transactions generated by any agent on the bus.
1438 Count transactions generated by this processor.
1441 The default is to count operations generated by this processor.
1442 .It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier
1443 Count the number of completed burst transactions.
1444 An additional qualifier may be specified and comprises one of the following
1447 .Bl -tag -width indent -compact
1449 Count transactions generated by any agent on the bus.
1451 Count transactions generated by this processor.
1454 The default is to count operations generated by this processor.
1455 .It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier
1456 Count the number of completed deferred transactions.
1457 An additional qualifier may be specified and comprises one of the following
1460 .Bl -tag -width indent -compact
1462 Count transactions generated by any agent on the bus.
1464 Count transactions generated by this processor.
1467 The default is to count operations generated by this processor.
1468 .It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier
1469 Count the number of completed instruction fetch transactions.
1470 An additional qualifier may be specified and comprises one of the following
1473 .Bl -tag -width indent -compact
1475 Count transactions generated by any agent on the bus.
1477 Count transactions generated by this processor.
1480 The default is to count operations generated by this processor.
1481 .It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier
1482 Count the number of completed invalidate transactions.
1483 An additional qualifier may be specified and comprises one of the following
1486 .Bl -tag -width indent -compact
1488 Count transactions generated by any agent on the bus.
1490 Count transactions generated by this processor.
1493 The default is to count operations generated by this processor.
1494 .It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier
1495 Count the number of completed memory transactions.
1496 An additional qualifier may be specified and comprises one of the following
1499 .Bl -tag -width indent -compact
1501 Count transactions generated by any agent on the bus.
1503 Count transactions generated by this processor.
1506 The default is to count operations generated by this processor.
1507 .It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier
1508 Count the number of completed partial write transactions.
1509 An additional qualifier may be specified and comprises one of the following
1512 .Bl -tag -width indent -compact
1514 Count transactions generated by any agent on the bus.
1516 Count transactions generated by this processor.
1519 The default is to count operations generated by this processor.
1520 .It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier
1521 Count the number of completed read-for-ownership transactions.
1522 An additional qualifier may be specified and comprises one of the following
1525 .Bl -tag -width indent -compact
1527 Count transactions generated by any agent on the bus.
1529 Count transactions generated by this processor.
1532 The default is to count operations generated by this processor.
1533 .It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier
1534 Count the number of completed I/O transactions.
1535 An additional qualifier may be specified and comprises one of the following
1538 .Bl -tag -width indent -compact
1540 Count transactions generated by any agent on the bus.
1542 Count transactions generated by this processor.
1545 The default is to count operations generated by this processor.
1546 .It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier
1547 Count the number of completed partial transactions.
1548 An additional qualifier may be specified and comprises one of the following
1551 .Bl -tag -width indent -compact
1553 Count transactions generated by any agent on the bus.
1555 Count transactions generated by this processor.
1558 The default is to count operations generated by this processor.
1559 .It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier
1560 Count the number of completed write-back transactions.
1561 An additional qualifier may be specified and comprises one of the following
1564 .Bl -tag -width indent -compact
1566 Count transactions generated by any agent on the bus.
1568 Count transactions generated by this processor.
1571 The default is to count operations generated by this processor.
1572 .It Li p6-cpu-clk-unhalted
1573 Count the number of cycles during which the processor was not halted.
1576 Count the number of cycles during which the processor was not halted
1577 and not in a thermal trip.
1578 .It Li p6-cycles-div-busy
1579 Count the number of cycles during which the divider is busy and cannot
1581 This event is only allocated on counter 0.
1582 .It Li p6-cycles-int-pending-and-masked
1583 Count the number of processor cycles for which interrupts were
1584 disabled and interrupts were pending.
1585 .It Li p6-cycles-int-masked
1586 Count the number of processor cycles for which interrupts were
1588 .It Li p6-data-mem-refs
1589 Count all loads and all stores using any memory type, including
1591 Each part of a split store is counted separately.
1592 .It Li p6-dcu-lines-in
1593 Count the total lines allocated in the data cache unit.
1594 .It Li p6-dcu-m-lines-in
1595 Count the number of M state lines allocated in the data cache unit.
1596 .It Li p6-dcu-m-lines-out
1597 Count the number of M state lines evicted from the data cache unit.
1598 .It Li p6-dcu-miss-outstanding
1599 Count the weighted number of cycles while a data cache unit miss is
1600 outstanding, incremented by the number of outstanding cache misses at
1603 Count the number of integer and floating-point divides including
1604 speculative divides.
1605 This event is only allocated on counter 1.
1606 .It Li p6-emon-esp-uops
1608 Count the total number of micro-ops.
1609 .It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier
1612 .Tn "Enhanced Intel SpeedStep"
1614 An additional qualifier may be specified, and can be one of the
1617 .Bl -tag -width indent -compact
1619 Count all transitions.
1621 Count only frequency transitions.
1624 The default is to count all transitions.
1625 .It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier
1627 Count the number of retired fused micro-ops.
1628 An additional qualifier may be specified, and may be one of the
1631 .Bl -tag -width indent -compact
1633 Count all fused micro-ops.
1635 Count only load and op micro-ops.
1637 Count only STD/STA micro-ops.
1640 The default is to count all fused micro-ops.
1641 .It Li p6-emon-kni-comp-inst-ret Op Li ,umask= Ns Ar qualifier
1642 .Pq Tn "Pentium III"
1643 Count the number of SSE computational instructions retired.
1644 An additional qualifier may be specified, and comprises one of the
1647 .Bl -tag -width indent -compact
1648 .It Li packed-and-scalar
1649 Count packed and scalar operations.
1651 Count scalar operations only.
1654 The default is to count packed and scalar operations.
1655 .It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier
1656 .Pq Tn "Pentium III"
1657 Count the number of SSE instructions retired.
1658 An additional qualifier may be specified, and comprises one of the
1661 .Bl -tag -width indent -compact
1662 .It Li packed-and-scalar
1663 Count packed and scalar operations.
1665 Count scalar operations only.
1668 The default is to count packed and scalar operations.
1669 .It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier
1670 .Pq Tn "Pentium III"
1671 Count the number of SSE prefetch or weakly ordered instructions
1672 dispatched (including speculative prefetches).
1673 An additional qualifier may be specified, and comprises one of the
1676 .Bl -tag -width indent -compact
1678 Count non-temporal prefetches.
1680 Count prefetches to L1.
1682 Count prefetches to L2.
1684 Count weakly ordered stores.
1687 The default is to count non-temporal prefetches.
1688 .It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier
1689 .Pq Tn "Pentium III"
1690 Count the number of prefetch or weakly ordered instructions that miss
1692 An additional qualifier may be specified, and comprises one of the
1695 .Bl -tag -width indent -compact
1697 Count non-temporal prefetches.
1699 Count prefetches to L1.
1701 Count prefetches to L2.
1703 Count weakly ordered stores.
1706 The default is to count non-temporal prefetches.
1707 .It Li p6-emon-pref-rqsts-dn
1709 Count the number of downward prefetches issued.
1710 .It Li p6-emon-pref-rqsts-up
1712 Count the number of upward prefetches issued.
1713 .It Li p6-emon-simd-instr-retired
1715 Count the number of retired
1718 .It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier
1720 Count the number of computational SSE instructions retired.
1721 An additional qualifier may be specified and can be one of the
1724 .Bl -tag -width indent -compact
1725 .It Li sse-packed-single
1726 Count SSE packed-single instructions.
1727 .It Li sse-scalar-single
1728 Count SSE scalar-single instructions.
1729 .It Li sse2-packed-double
1730 Count SSE2 packed-double instructions.
1731 .It Li sse2-scalar-double
1732 Count SSE2 scalar-double instructions.
1735 The default is to count SSE packed-single instructions.
1736 .It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifier
1739 Count the number of SSE instructions retired.
1740 An additional qualifier can be specified, and can be one of the
1743 .Bl -tag -width indent -compact
1744 .It Li sse-packed-single
1745 Count SSE packed-single instructions.
1746 .It Li sse-packed-single-scalar-single
1747 Count SSE packed-single and scalar-single instructions.
1748 .It Li sse2-packed-double
1749 Count SSE2 packed-double instructions.
1750 .It Li sse2-scalar-double
1751 Count SSE2 scalar-double instructions.
1754 The default is to count SSE packed-single instructions.
1755 .It Li p6-emon-synch-uops
1757 Count the number of sync micro-ops.
1758 .It Li p6-emon-thermal-trip
1760 Count the duration or occurrences of thermal trips.
1763 qualifier to count occurrences of thermal trips.
1764 .It Li p6-emon-unfusion
1766 Count the number of unfusion events in the reorder buffer.
1768 Count the number of computational floating point operations retired.
1769 This event is only allocated on counter 0.
1771 Count the number of floating point exceptions handled by microcode.
1772 This event is only allocated on counter 1.
1773 .It Li p6-fp-comps-ops-exe
1774 Count the number of computational floating point operations executed.
1775 This event is only allocated on counter 0.
1776 .It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier
1777 .Pq Tn "Pentium II" , Tn "Pentium III"
1778 Count the number of transitions between MMX and floating-point
1780 An additional qualifier may be specified, and comprises one of the
1783 .Bl -tag -width indent -compact
1785 Count transitions from MMX instructions to floating-point instructions.
1787 Count transitions from floating-point instructions to MMX instructions.
1790 The default is to count MMX to floating-point transitions.
1792 Count the number of hardware interrupts received.
1794 Count the number of instruction fetches, both cacheable and non-cacheable.
1795 .It Li p6-ifu-fetch-miss
1796 Count the number of instruction fetch misses (i.e., those that produce
1798 .It Li p6-ifu-mem-stall
1799 Count the number of cycles instruction fetch is stalled for any reason.
1801 Count the number of cycles the instruction length decoder is stalled.
1802 .It Li p6-inst-decoded
1803 Count the number of instructions decoded.
1804 .It Li p6-inst-retired
1805 Count the number of instructions retired.
1807 Count the number of instruction TLB misses.
1809 Count the number of L2 address strobes.
1810 .It Li p6-l2-dbus-busy
1811 Count the number of cycles during which the L2 cache data bus was busy.
1812 .It Li p6-l2-dbus-busy-rd
1813 Count the number of cycles during which the L2 cache data bus was busy
1814 transferring read data from L2 to the processor.
1815 .It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier
1816 Count the number of L2 instruction fetches.
1817 An additional qualifier may be specified and comprises a list of the following
1818 keywords separated by
1822 .Bl -tag -width indent -compact
1824 Count operations affecting E (exclusive) state lines.
1826 Count operations affecting I (invalid) state lines.
1828 Count operations affecting M (modified) state lines.
1830 Count operations affecting S (shared) state lines.
1833 The default is to count operations affecting all (MESI) state lines.
1834 .It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier
1835 Count the number of L2 data loads.
1836 An additional qualifier may be specified and comprises a list of the following
1837 keywords separated by
1841 .Bl -tag -width indent -compact
1844 Count both hardware-prefetched lines and non-hardware-prefetched lines.
1846 Count operations affecting E (exclusive) state lines.
1849 Count hardware-prefetched lines only.
1851 Count operations affecting I (invalid) state lines.
1853 Count operations affecting M (modified) state lines.
1856 Exclude hardware-prefetched lines.
1858 Count operations affecting S (shared) state lines.
1861 The default on processors other than
1863 processors is to count operations affecting all (MESI) state lines.
1866 processors is to count both hardware-prefetched and
1867 non-hardware-prefetched operations on all (MESI) state lines.
1869 This event is affected by processor erratum E53.
1870 .It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier
1871 Count the number of L2 lines allocated.
1872 An additional qualifier may be specified and comprises a list of the following
1873 keywords separated by
1877 .Bl -tag -width indent -compact
1880 Count both hardware-prefetched lines and non-hardware-prefetched lines.
1882 Count operations affecting E (exclusive) state lines.
1885 Count hardware-prefetched lines only.
1887 Count operations affecting I (invalid) state lines.
1889 Count operations affecting M (modified) state lines.
1892 Exclude hardware-prefetched lines.
1894 Count operations affecting S (shared) state lines.
1897 The default on processors other than
1899 processors is to count operations affecting all (MESI) state lines.
1902 processors is to count both hardware-prefetched and
1903 non-hardware-prefetched operations on all (MESI) state lines.
1905 This event is affected by processor erratum E45.
1906 .It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier
1907 Count the number of L2 lines evicted.
1908 An additional qualifier may be specified and comprises a list of the following
1909 keywords separated by
1913 .Bl -tag -width indent -compact
1916 Count both hardware-prefetched lines and non-hardware-prefetched lines.
1918 Count operations affecting E (exclusive) state lines.
1921 Count hardware-prefetched lines only.
1923 Count operations affecting I (invalid) state lines.
1925 Count operations affecting M (modified) state lines.
1927 .Pq Tn "Pentium M" only
1928 Exclude hardware-prefetched lines.
1930 Count operations affecting S (shared) state lines.
1933 The default on processors other than
1935 processors is to count operations affecting all (MESI) state lines.
1938 processors is to count both hardware-prefetched and
1939 non-hardware-prefetched operations on all (MESI) state lines.
1941 This event is affected by processor erratum E45.
1942 .It Li p6-l2-m-lines-inm
1943 Count the number of modified lines allocated in L2 cache.
1944 .It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier
1945 Count the number of L2 M-state lines evicted.
1948 On these processors an additional qualifier may be specified and
1949 comprises a list of the following keywords separated by
1953 .Bl -tag -width indent -compact
1955 Count both hardware-prefetched lines and non-hardware-prefetched lines.
1957 Count hardware-prefetched lines only.
1959 Exclude hardware-prefetched lines.
1962 The default is to count both hardware-prefetched and
1963 non-hardware-prefetched operations.
1965 This event is affected by processor erratum E53.
1966 .It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier
1967 Count the total number of L2 requests.
1968 An additional qualifier may be specified and comprises a list of the following
1969 keywords separated by
1973 .Bl -tag -width indent -compact
1975 Count operations affecting E (exclusive) state lines.
1977 Count operations affecting I (invalid) state lines.
1979 Count operations affecting M (modified) state lines.
1981 Count operations affecting S (shared) state lines.
1984 The default is to count operations affecting all (MESI) state lines.
1986 Count the number of L2 data stores.
1987 An additional qualifier may be specified and comprises a list of the following
1988 keywords separated by
1992 .Bl -tag -width indent -compact
1994 Count operations affecting E (exclusive) state lines.
1996 Count operations affecting I (invalid) state lines.
1998 Count operations affecting M (modified) state lines.
2000 Count operations affecting S (shared) state lines.
2003 The default is to count operations affecting all (MESI) state lines.
2005 Count the number of load operations delayed due to store buffer blocks.
2006 .It Li p6-misalign-mem-ref
2007 Count the number of misaligned data memory references (crossing a 64
2009 .It Li p6-mmx-assist
2010 .Pq Tn "Pentium II" , Tn "Pentium III"
2011 Count the number of MMX assists executed.
2012 .It Li p6-mmx-instr-exec
2013 .Pq Tn Celeron , Tn "Pentium II"
2014 Count the number of MMX instructions executed, except MOVQ and MOVD
2015 stores from register to memory.
2016 .It Li p6-mmx-instr-ret
2018 Count the number of MMX instructions retired.
2019 .It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier
2020 .Pq Tn "Pentium II" , Tn "Pentium III"
2021 Count the number of MMX instructions executed.
2022 An additional qualifier may be specified and comprises a list of
2023 the following keywords separated by
2027 .Bl -tag -width indent -compact
2029 Count MMX pack operation instructions.
2030 .It Li packed-arithmetic
2031 Count MMX packed arithmetic instructions.
2032 .It Li packed-logical
2033 Count MMX packed logical instructions.
2034 .It Li packed-multiply
2035 Count MMX packed multiply instructions.
2037 Count MMX packed shift instructions.
2039 Count MMX unpack operation instructions.
2042 The default is to count all operations.
2043 .It Li p6-mmx-sat-instr-exec
2044 .Pq Tn "Pentium II" , Tn "Pentium III"
2045 Count the number of MMX saturating instructions executed.
2046 .It Li p6-mmx-uops-exec
2047 .Pq Tn "Pentium II" , Tn "Pentium III"
2048 Count the number of MMX micro-ops executed.
2050 Count the number of integer and floating-point multiplies, including
2051 speculative multiplies.
2052 This event is only allocated on counter 1.
2053 .It Li p6-partial-rat-stalls
2054 Count the number of cycles or events for partial stalls.
2055 .It Li p6-resource-stalls
2056 Count the number of cycles during which there was a resource-related stall of any kind.
2057 .It Li p6-ret-seg-renames
2058 .Pq Tn "Pentium II" , Tn "Pentium III"
2059 Count the number of segment register rename events retired.
2061 Count the number of cycles the store buffer is draining.
2062 .It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier
2063 .Pq Tn "Pentium II" , Tn "Pentium III"
2064 Count the number of segment register renames.
2065 An additional qualifier may be specified, and comprises a list of the
2066 following keywords separated by
2070 .Bl -tag -width indent -compact
2072 Count renames for segment register DS.
2074 Count renames for segment register ES.
2076 Count renames for segment register FS.
2078 Count renames for segment register GS.
2081 The default is to count operations affecting all segment registers.
2082 .It Li p6-seg-rename-stalls Op Li ,umask= Ns Ar qualifier
2083 .Pq Tn "Pentium II" , Tn "Pentium III"
2084 Count the number of segment register renaming stalls.
2085 An additional qualifier may be specified, and comprises a list of the
2086 following keywords separated by
2090 .Bl -tag -width indent -compact
2092 Count stalls for segment register DS.
2094 Count stalls for segment register ES.
2096 Count stalls for segment register FS.
2098 Count stalls for segment register GS.
2101 The default is to count operations affecting all the segment registers.
2102 .It Li p6-segment-reg-loads
2103 Count the number of segment register loads.
2104 .It Li p6-uops-retired
2105 Count the number of micro-ops retired.
2108 Intel P4 PMCs are present in Intel
2113 These PMCs are documented in
2115 .%B "IA-32 Intel(R) Architecture Software Developer's Manual"
2116 .%T "Volume 3: System Programming Guide"
2117 .%N "Order Number 245472-012"
2119 .%Q "Intel Corporation"
2121 Further information about using these PMCs may be found in
2123 .%B "IA-32 Intel(R) Architecture Optimization Guide"
2125 .%N "Order Number 248966-009"
2126 .%Q "Intel Corporation"
2128 Some of these events are affected by processor errata described in
2130 .%B "Intel(R) Pentium(R) 4 Processor Specification Update"
2131 .%N "Document Number: 249199-059"
2133 .%Q "Intel Corporation"
2136 Event specifiers for Intel P4 PMCs can have the following common
2138 .Bl -tag -width indent
2139 .It Li active= Ns Ar choice
2140 (On P4 HTT CPUs) Filter event counting based on which logical
2141 processors are active.
2142 The allowed values of
2146 .Bl -tag -width indent -compact
2148 Count when either logical processor is active.
2150 Count when both logical processors are active.
2152 Count only when neither logical processor is active.
2154 Count only when one logical processor is active.
2160 Configure the PMC to cascade onto its partner.
2162 .Sx "Cascading P4 PMCs"
2163 below for more information.
2165 Configure the counter to count false to true transitions of the threshold
2167 This qualifier only takes effect if a threshold qualifier has also been
2170 Configure the counter to increment only when the event count seen is
2171 less than the threshold qualifier value specified.
2172 .It Li mask= Ns Ar qualifier
2173 Many event specifiers for Intel P4 PMCs need to be additionally
2174 qualified using a mask qualifier.
2175 The allowed syntax for these qualifiers is event specific and is
2176 described along with the events.
2178 Configure the PMC to count when the CPL of the processor is 0.
2180 Select precise event based sampling.
2181 Precise sampling is supported by the hardware for a limited set of
2183 .It Li tag= Ns Ar value
2184 Configure the PMC to tag the internal uop selected by the other
2185 fields in this event specifier with value
2187 This feature is used when cascading PMCs.
2188 .It Li threshold= Ns Ar value
2189 Configure the PMC to increment only when the event counts seen are
2190 greater than the specified threshold value
2193 Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
2200 qualifiers are specified, the default is to enable both.
2202 On Intel Pentium 4 processors with HTT, events are
2203 divided into two classes:
2205 .Bl -tag -width indent -compact
2207 are those where hardware can differentiate events
2208 generated on one logical processor from those generated on the
2211 are those where hardware cannot differentiate between events
2212 generated by multiple logical processors in a package.
2215 Only TS events are allowed for use with process-mode PMCs on
2218 The event specifiers supported by Intel P4 PMCs are:
2220 .Bl -tag -width indent
2221 .It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
2223 Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
2227 can take the following value (which is also the default):
2229 .Bl -tag -width indent -compact
2231 Count all uops operating on 128 bit SIMD integer operands in memory or
2235 If an instruction contains more than one 128 bit MMX uop, then each
2236 uop will be counted.
2237 .It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
2239 Count MMX instructions that operate on 64 bit SIMD operands.
2242 can take the following value (which is also the default):
2244 .Bl -tag -width indent -compact
2246 Count all uops operating on 64 bit SIMD integer operands in memory or
2250 If an instruction contains more than one 64 bit MMX uop, then each
2251 uop will be counted.
2252 .It Li p4-b2b-cycles
2254 Count back-to-back bus cycles.
2255 Further documentation for this event is unavailable.
2258 Count bus-not-ready conditions.
2259 Further documentation for this event is unavailable.
2260 .It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
2262 Count instruction fetch requests qualified by additional
2265 At this point only one flag is supported:
2267 .Bl -tag -width indent -compact
2269 Count trace cache lookup misses.
2272 The default qualifier is also
2273 .Dq Li mask=tcmiss .
2274 .It Li p4-branch-retired Op Li ,mask= Ns Ar flags
2276 Count retired branches.
2279 is a list of the following
2283 .Bl -tag -width indent -compact
2285 Count branches not-taken and predicted.
2287 Count branches not-taken and mis-predicted.
2289 Count branches taken and predicted.
2291 Count branches taken and mis-predicted.
2294 The default qualifier counts all four kinds of branches.
2295 .It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
2297 Count the number of entries (clipped at 15) currently active in the
2303 separated set of the following flags:
2305 .Bl -tag -width indent -compact
2306 .It Li req-type0 , Li req-type1
2307 Forms a 2-bit number used to select the request type encoding:
2309 .Bl -tag -width indent -compact
2311 reads excluding read invalidate
2315 writes other than writebacks
2322 is the MSB for this two-bit number.
2323 .It Li req-len0 , Li req-len1
2324 Forms a two-bit number that specifies the request length encoding:
2326 .Bl -tag -width indent -compact
2337 is the MSB for this two-bit number.
2339 Count requests that are input or output requests.
2340 .It Li req-lock-type
2341 Count requests that lock the bus.
2342 .It Li req-lock-cache
2343 Count requests that lock the cache.
2344 .It Li req-split-type
2345 Count requests that are bus 8-byte chunks split across an
2348 Count requests that are demand (not prefetches) if set.
2349 Count requests that are prefetches if not set.
2351 Count requests that are ordered.
2352 .It Li mem-type0 , Li mem-type1 , Li mem-type2
2353 Forms a 3-bit number that specifies a memory type encoding:
2355 .Bl -tag -width indent -compact
2370 is the MSB of this 3-bit number.
2373 The default qualifier has all the above bits set.
2375 Edge triggering using the
2377 qualifier should not be used with this event when counting cycles.
2378 .It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
2380 Count allocations in the bus sequence unit according to the flags
2385 separated set of the following flags:
2387 .Bl -tag -width indent -compact
2388 .It Li req-type0 , Li req-type1
2389 Forms a 2-bit number used to select the request type encoding:
2391 .Bl -tag -width indent -compact
2393 reads excluding read invalidate
2397 writes other than writebacks
2404 is the MSB for this two-bit number.
2405 .It Li req-len0 , Li req-len1
2406 Forms a two-bit number that specifies the request length encoding:
2408 .Bl -tag -width indent -compact
2419 is the MSB for this two-bit number.
2421 Count requests that are input or output requests.
2422 .It Li req-lock-type
2423 Count requests that lock the bus.
2424 .It Li req-lock-cache
2425 Count requests that lock the cache.
2426 .It Li req-split-type
2427 Count requests for a bus 8-byte chunk that is split across an
2430 Count requests that are demand (not prefetches) if set.
2431 Count requests that are prefetches if not set.
2433 Count requests that are ordered.
2434 .It Li mem-type0 , Li mem-type1 , Li mem-type2
2435 Forms a 3-bit number that specifies a memory type encoding:
2437 .Bl -tag -width indent -compact
2452 is the MSB of this 3-bit number.
2455 The default qualifier has all the above bits set.
2457 This event is usually used along with the
2459 qualifier to avoid multiple counting.
2460 .It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
2462 Count cache references as seen by the bus unit (2nd or 3rd level
2468 separated list of the following keywords:
2470 .Bl -tag -width indent -compact
2472 Count 2nd level cache hits in the shared state.
2474 Count 2nd level cache hits in the exclusive state.
2476 Count 2nd level cache hits in the modified state.
2478 Count 3rd level cache hits in the shared state.
2480 Count 3rd level cache hits in the exclusive state.
2482 Count 3rd level cache hits in the modified state.
2484 Count 2nd level cache misses.
2486 Count 3rd level cache misses.
2488 Count write-back lookups from the data access cache that miss the 2nd
2492 The default is to count all the above events.
2493 .It Li p4-execution-event Op Li ,mask= Ns Ar flags
2495 Count the retirement of tagged uops selected through the execution
2499 can contain the following strings separated by
2503 .Bl -tag -width indent -compact
2504 .It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
2505 The marked uops are not bogus.
2506 .It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
2507 The marked uops are bogus.
2510 This event requires additional (upstream) events to be allocated to
2511 perform the desired uop tagging.
2512 The default is to set all the above flags.
2513 This event can be used for precise event based sampling.
2514 .It Li p4-front-end-event Op Li ,mask= Ns Ar flags
2516 Count the retirement of tagged uops selected through the front-end
2520 can contain the following strings separated by
2524 .Bl -tag -width indent -compact
2526 The marked uops are not bogus.
2528 The marked uops are bogus.
2531 This event requires additional (upstream) events to be allocated to
2532 perform the desired uop tagging.
2533 The default is to select both kinds of events.
2534 This event can be used for precise event based sampling.
2535 .It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
2537 Count each DBSY or DRDY event selected by qualifier
2543 separated set of the following flags:
2545 .Bl -tag -width indent -compact
2547 Count when this processor is driving data onto the bus.
2549 Count when this processor is reading data from the bus.
2551 Count when data is on the bus but not being sampled by this processor.
2553 Count when this processor reserves the bus for use in the next cycle
2554 in order to drive data.
2556 Count when some agent reserves the bus for use in the next bus cycle
2557 to drive data that this processor will sample.
2559 Count when some agent reserves the bus for use in the next bus cycle
2560 to drive data that this processor will not sample.
2567 are mutually exclusive.
2572 are mutually exclusive.
2573 The default value for
2576 .Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
2577 .It Li p4-global-power-events Op Li ,mask= Ns Ar flags
2579 Count cycles during which the processor is not stopped.
2582 can take the following value (which is also the default):
2584 .Bl -tag -width indent -compact
2586 Count cycles when the processor is active.
2589 .It Li p4-instr-retired Op Li ,mask= Ns Ar flags
2591 Count instructions retired during a clock cycle.
2594 comprises the following strings separated by
2598 .Bl -tag -width indent -compact
2600 Count non-bogus instructions that are not tagged.
2602 Count non-bogus instructions that are tagged.
2604 Count bogus instructions that are not tagged.
2606 Count bogus instructions that are tagged.
2609 The default qualifier counts all the above kinds of instructions.
2610 .It Li p4-ioq-active-entries Xo
2611 .Op Li ,mask= Ns Ar qualifier
2612 .Op Li ,busreqtype= Ns Ar req-type
2615 Count the number of entries (clipped at 15) in the IOQ that are
2617 The event masks are specified by qualifier
2626 separated set of the following flags:
2628 .Bl -tag -width indent -compact
2632 Count write entries.
2634 Count entries accessing uncacheable memory.
2636 Count entries accessing write-combining memory.
2638 Count entries accessing write-through memory.
2640 Count entries accessing write-protected memory.
2642 Count entries accessing write-back memory.
2644 Count store requests driven by the processor (i.e., not by other
2645 processors or by DMA).
2647 Count store requests driven by other processors or by DMA.
2649 Include hardware and software prefetch requests in the count.
2652 The default value for
2654 is to enable all the above flags.
2658 qualifier is a 5-bit number that can additionally be used to select a
2659 specific bus request type.
2664 qualifier should not be used when counting cycles with this event.
2665 The exact behaviour of this event depends on the processor revision.
2666 .It Li p4-ioq-allocation Xo
2667 .Op Li ,mask= Ns Ar qualifier
2668 .Op Li ,busreqtype= Ns Ar req-type
2671 Count various types of transactions on the bus matching the flags set
2681 separated set of the following flags:
2683 .Bl -tag -width indent -compact
2687 Count write entries.
2689 Count entries accessing uncacheable memory.
2691 Count entries accessing write-combining memory.
2693 Count entries accessing write-through memory.
2695 Count entries accessing write-protected memory.
2697 Count entries accessing write-back memory.
2699 Count store requests driven by the processor (i.e., not by other
2700 processors or by DMA).
2702 Count store requests driven by other processors or by DMA.
2704 Include hardware and software prefetch requests in the count.
2707 The default value for
2709 is to enable all the above flags.
2713 qualifier is a 5-bit number that can additionally be used to select a
2714 specific bus request type.
2719 qualifier is normally used with this event to prevent multiple
2721 The exact behaviour of this event depends on the processor revision.
2722 .It Li p4-itlb-reference Op Li ,mask= Ns Ar qualifier
2724 Count translations using the instruction translation look-aside
2728 argument is a list of the following strings separated by
2732 .Bl -tag -width indent -compact
2738 Count uncacheable ITLB hits.
2743 is specified, the default is to count all three kinds of ITLB
2745 .It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
2747 Count replayed events at the load port.
2750 can take on one value:
2752 .Bl -tag -width indent -compact
2757 The default value for
2761 .It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
2763 Count mispredicted IA-32 branch instructions.
2766 can take the following value (which is also the default):
2768 .Bl -tag -width indent -compact
2770 Count non-bogus retired branch instructions.
2772 .It Li p4-machine-clear Op Li ,mask= Ns Ar flags
2774 Count the number of pipeline clears seen by the processor.
2777 is a list of the following strings separated by
2781 .Bl -tag -width indent -compact
2783 Count for a portion of the many cycles when the machine is being
2784 cleared for any reason.
2786 Count machine clears due to memory ordering issues.
2788 Count machine clears due to self-modifying code.
2793 to get a count of occurrences of machine clears.
2794 The default qualifier is
2796 .It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
2798 Count the cancelling of various kinds of requests in the data cache
2799 address control unit of the CPU.
2802 is a list of the following strings separated by
2806 .Bl -tag -width indent -compact
2808 Requests cancelled because no store request buffer was available.
2810 Requests that conflict due to 64K aliasing.
2815 is not specified, then the default is to count both kinds of events.
2816 .It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
2818 Count the completion of load split, store split, uncacheable split and
2819 uncacheable load operations selected by qualifier
2825 separated list of the following flags:
2827 .Bl -tag -width indent -compact
2829 Count load splits completed, excluding loads from uncacheable or
2830 write-combining areas.
2832 Count any split stores completed.
2835 The default is to count both kinds of operations.
2836 .It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
2838 Count load replays triggered by the memory order buffer.
2843 separated list of the following flags:
2845 .Bl -tag -width indent -compact
2847 Count replays because of unknown store addresses.
2849 Count replays because of unknown store data.
2851 Count replays because of partially overlapped data accesses between
2852 load and store operations.
2854 Count replays because of mismatches in the lower 4 bits of load and
2858 The default qualifier is
2859 .Dq Li no-sta+no-std+partial-data+unalgn-addr .
2860 .It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
2862 Count packed double-precision uops.
2865 can take the following value (which is also the default):
2867 .Bl -tag -width indent -compact
2869 Count all uops operating on packed double-precision operands.
2871 .It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
2873 Count packed single-precision uops.
2876 can take the following value (which is also the default):
2878 .Bl -tag -width indent -compact
2880 Count all uops operating on packed single-precision operands.
2882 .It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
2884 Count page walks performed by the page miss handler.
2889 separated list of the following keywords:
2891 .Bl -tag -width indent -compact
2893 Count page walks for data TLB misses.
2895 Count page walks for instruction TLB misses.
2898 The default value for
2901 .Dq Li dtmiss+itmiss .
2902 .It Li p4-replay-event Op Li ,mask= Ns Ar flags
2904 Count the retirement of tagged uops selected through the replay
2910 separated set of the following strings:
2912 .Bl -tag -width indent -compact
2914 The marked uops are not bogus.
2916 The marked uops are bogus.
2919 This event requires additional (upstream) events to be allocated to
2920 perform the desired uop tagging.
2921 The default qualifier counts both kinds of uops.
2922 This event can be used for precise event based sampling.
2923 .It Li p4-resource-stall Op Li ,mask= Ns Ar flags
2925 Count the occurrence or latency of stalls in the allocator.
2928 can take the following value (which is also the default):
2930 .Bl -tag -width indent -compact
2932 A stall due to the lack of store buffers.
2936 Count different types of responses.
2937 Further documentation on this event is not available.
2938 .It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
2940 Count branches retired.
2945 separated list of strings:
2947 .Bl -tag -width indent -compact
2949 Count conditional jumps.
2951 Count direct and indirect call branches.
2953 Count return branches.
2955 Count returns, indirect calls or indirect jumps.
2958 The default qualifier counts all the above branch types.
2959 .It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
2961 Count mispredicted branches retired.
2966 separated list of strings:
2968 .Bl -tag -width indent -compact
2970 Count conditional jumps.
2972 Count indirect call branches.
2974 Count return branches.
2976 Count returns, indirect calls or indirect jumps.
2979 The default qualifier counts all the above branch types.
2980 .It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
2982 Count the number of scalar double-precision uops.
2985 can take the following value (which is also the default):
2987 .Bl -tag -width indent -compact
2989 Count the number of scalar double-precision uops.
2991 .It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
2993 Count the number of scalar single-precision uops.
2996 can take the following value (which is also the default):
2998 .Bl -tag -width indent -compact
3000 Count all uops operating on scalar single-precision operands.
3004 Count snoop traffic.
3005 Further documentation on this event is not available.
3006 .It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
3008 Count the number of times an assist is required to handle problems
3009 with the operands for SSE and SSE2 operations.
3012 can take the following value (which is also the default):
3014 .Bl -tag -width indent -compact
3016 Count assists for all SSE and SSE2 uops.
3018 .It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
3020 Count events replayed at the store port.
3023 can take on one value:
3025 .Bl -tag -width indent -compact
3030 The default value for
3034 .It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
3036 Count the duration in cycles of operating modes of the trace cache and
3038 The desired operating mode is selected by
3040 which is a list of the following strings separated by
3044 .Bl -tag -width indent -compact
3046 Both logical processors are in deliver mode.
3048 Logical processor 0 is in deliver mode while logical processor 1 is in
3051 Logical processor 0 is in deliver mode while logical processor 1 is
3052 halted, or in machine clear, or transitioning to a long microcode
3055 Logical processor 0 is in build mode while logical processor 1 is in
3058 Both logical processors are in build mode.
3060 Logical processor 0 is in build mode while logical processor 1 is
3061 halted, or in machine clear or transitioning to a long microcode
3064 Logical processor 0 is halted, or in machine clear or transitioning to
3065 a long microcode flow while logical processor 1 is in deliver mode.
3067 Logical processor 0 is halted, or in machine clear or transitioning to
3068 a long microcode flow while logical processor 1 is in build mode.
3071 If there is only one logical processor in the processor package then
3072 the qualifier for logical processor 1 is ignored.
3073 If no qualifier is specified, the default qualifier is
3074 .Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
3075 .It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
3077 Count the number of times uop delivery changed from the trace cache to
3081 can take the following value (which is also the default):
3083 .Bl -tag -width indent -compact
3085 Count TC to MS transfers.
3087 .It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
3089 Count the number of valid uops written to the uop queue.
3092 is a list of the following strings, separated by
3096 .Bl -tag -width indent -compact
3097 .It Li from-tc-build
3098 Count uops being written from the trace cache in build mode.
3099 .It Li from-tc-deliver
3100 Count uops being written from the trace cache in deliver mode.
3102 Count uops being written from microcode ROM.
3105 The default qualifier counts all the above kinds of uops.
3106 .It Li p4-uop-type Op Li ,mask= Ns Ar flags
3108 This event is used in conjunction with the front-end at-retirement
3109 mechanism to tag load and store uops.
3112 comprises the following strings separated by
3116 .Bl -tag -width indent -compact
3118 Mark uops that are load operations.
3120 Mark uops that are store operations.
3123 The default qualifier counts both kinds of uops.
3124 .It Li p4-uops-retired Op Li ,mask= Ns Ar flags
3126 Count uops retired during a clock cycle.
3129 comprises the following strings separated by
3133 .Bl -tag -width indent -compact
3135 Count marked uops that are not bogus.
3137 Count marked uops that are bogus.
3140 The default qualifier counts both kinds of uops.
3141 .It Li p4-wc-buffer Op Li ,mask= Ns Ar flags
3143 Count write-combining buffer operations.
3146 contains the following strings separated by
3150 .Bl -tag -width indent -compact
3152 WC buffer evictions due to any cause.
3153 .It Li wcb-full-evict
3154 WC buffer evictions due to no WC buffer being available.
3157 The default qualifier counts both kinds of evictions.
3158 .It Li p4-x87-assist Op Li ,mask= Ns Ar flags
3160 Count the retirement of x87 instructions that required special
3164 contains the following strings separated by
3168 .Bl -tag -width indent -compact
3170 Count instructions that saw an FP stack underflow.
3172 Count instructions that saw an FP stack overflow.
3174 Count instructions that saw an x87 output overflow.
3176 Count instructions that saw an x87 output underflow.
3178 Count instructions that needed an x87 input assist.
3181 The default qualifier counts all the above types of instruction
3183 .It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags
3185 Count x87 floating-point uops.
3188 can take the following value (which is also the default):
3190 .Bl -tag -width indent -compact
3192 Count all x87 floating-point uops.
3195 If an instruction contains more than one x87 floating-point uop, then
3196 all x87 floating-point uops will be counted.
3197 This event does not count x87 floating-point data movement operations.
3198 .It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags
3200 Count each x87 FPU, MMX, SSE, or SSE2 uop that loads data, stores
3201 data, or performs a register-to-register move.
3202 This event does not count integer move uops.
3205 may contain the following keywords separated by
3209 .Bl -tag -width indent -compact
3211 Count all x87 and SIMD store and move uops.
3213 Count all x87 and SIMD load uops.
3216 The default is to count all uops.
3218 This event may be affected by processor erratum N43.
3220 .Ss "Cascading P4 PMCs"
3221 PMC cascading support is currently poorly implemented.
3222 While individual event counters may be allocated with a
3224 qualifier, the current API does not offer the ability
3225 to name and allocate all the resources needed for a
3226 cascaded event counter pair in a single operation.
3227 .Ss "Precise Event Based Sampling"
3228 Support for precise event based sampling is currently
3231 .Sh IMPLEMENTATION NOTES
3232 On the i386 architecture,
3234 has historically allowed the use of the RDTSC instruction from
3235 user-mode (i.e., at a processor CPL of 3) by any process.
3236 This behaviour is preserved by
3240 .Fn pmc_name_of_capability ,
3241 .Fn pmc_name_of_class ,
3242 .Fn pmc_name_of_cputype ,
3243 .Fn pmc_name_of_disposition ,
3244 .Fn pmc_name_of_event ,
3245 .Fn pmc_name_of_mode ,
3247 .Fn pmc_name_of_state
3248 functions return a pointer to the human-readable form of their argument.
3249 These pointers may point to statically allocated storage and must
3252 In case of an error, these functions return
3254 and set the global variable
3261 return the number of CPUs and number of PMCs configured respectively;
3262 in case of an error they return the value
3264 and set the global variable
3267 All other functions return the value
3269 if successful; otherwise the value
3271 is returned and the global variable
3273 is set to indicate the error.
3275 The interface between the
3279 driver is intended to be private to the implementation and may
3281 In order to ease forward compatibility with future versions of the
3283 driver, applications are urged to dynamically link with the
3294 may fail with the following errors in addition to those returned by
3301 An unknown CPU type was encountered during initialization.
3302 .It Bq Er EPROGMISMATCH
3303 The version number of the
3305 kernel module did not match that compiled into the
3311 .Fn pmc_capabilities ,
3312 .Fn pmc_name_of_capability ,
3313 .Fn pmc_name_of_disposition ,
3314 .Fn pmc_name_of_state ,
3315 .Fn pmc_name_of_event ,
3316 .Fn pmc_name_of_mode ,
3317 .Fn pmc_name_of_class
3320 may fail with the following error:
3323 An invalid argument was passed to the function.
3330 may fail with the following error:
3335 has not been initialized.
3340 may fail with the following errors:
3343 The argument passed in was out of range.
3347 library has not been initialized.
3352 may fail with the following errors, in addition to those returned by
3358 library is not yet initialized.
3363 may fail with the following errors, in addition to those returned by
3369 argument passed in had an illegal value, or the event specification
3371 was unrecognized for this CPU type.
3376 .Fn pmc_configure_logfile ,
3380 .Fn pmc_get_driver_stats ,
3391 may fail with the errors described in
3394 If a log file was configured using
3395 .Fn pmc_configure_logfile
3398 driver encountered an error while logging data to it, then
3399 logging will be stopped and a subsequent call to
3400 .Fn pmc_flush_logfile
3401 will fail with the error code seen by the
3415 library first appeared in
3418 The information returned by
3423 should really be available all the time, through a better-designed
3424 interface and not just when
3426 is present in the kernel.