1 .\" Copyright (c) 2008 Joseph Koshy. All rights reserved.
3 .\" Redistribution and use in source and binary forms, with or without
4 .\" modification, are permitted provided that the following conditions
6 .\" 1. Redistributions of source code must retain the above copyright
7 .\" notice, this list of conditions and the following disclaimer.
8 .\" 2. Redistributions in binary form must reproduce the above copyright
9 .\" notice, this list of conditions and the following disclaimer in the
10 .\" documentation and/or other materials provided with the distribution.
12 .\" This software is provided by Joseph Koshy ``as is'' and
13 .\" any express or implied warranties, including, but not limited to, the
14 .\" implied warranties of merchantability and fitness for a particular purpose
15 .\" are disclaimed. in no event shall Joseph Koshy be liable
16 .\" for any direct, indirect, incidental, special, exemplary, or consequential
17 .\" damages (including, but not limited to, procurement of substitute goods
18 .\" or services; loss of use, data, or profits; or business interruption)
19 .\" however caused and on any theory of liability, whether in contract, strict
20 .\" liability, or tort (including negligence or otherwise) arising in any way
21 .\" out of the use of this software, even if advised of the possibility of
26 .Dd September 21, 2008
31 .Nd measurement events for
46 CPUs contain PMCs conforming to version 1 of the
48 performance measurement architecture.
50 These PMCs are documented in
52 .%B "IA-32 Intel(R) Architecture Software Developer's Manual"
53 .%T "Volume 3: System Programming Guide"
54 .%N "Order Number 253669-027US"
56 .%Q "Intel Corporation"
59 CPUs conforming to version 1 of the
61 performance measurement architecture contain two programmable PMCs of
64 The PMCs are 40 bits wide and offer the following capabilities:
65 .Bl -column "PMC_CAP_INTERRUPT" "Support"
66 .It Em Capability Ta Em Support
67 .It PMC_CAP_CASCADE Ta \&No
68 .It PMC_CAP_EDGE Ta Yes
69 .It PMC_CAP_INTERRUPT Ta Yes
70 .It PMC_CAP_INVERT Ta Yes
71 .It PMC_CAP_READ Ta Yes
72 .It PMC_CAP_PRECISE Ta \&No
73 .It PMC_CAP_SYSTEM Ta Yes
74 .It PMC_CAP_TAGGING Ta \&No
75 .It PMC_CAP_THRESHOLD Ta Yes
76 .It PMC_CAP_USER Ta Yes
77 .It PMC_CAP_WRITE Ta Yes
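.Pp
Because the counters are 40 bits wide, the difference between two raw
reads has to be taken modulo 2^40.
The following is a minimal sketch of that arithmetic, not library code.
It assumes that a counter wraps to zero after reaching its maximum value
and that it wraps at most once between the two reads; the names
COUNTER_BITS and counter_delta are hypothetical.

```python
# Hypothetical helper: the 40-bit width is taken from the text above,
# the wrap-to-zero behaviour is an assumption of this sketch.
COUNTER_BITS = 40
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def counter_delta(previous, current):
    """Return the events counted between two raw reads.

    The subtraction is reduced modulo 2**40, so a single wraparound
    between the reads still yields the correct difference.
    """
    return (current - previous) & COUNTER_MASK
```

For example, a read of COUNTER_MASK followed by a read of 4 gives a delta
of 5 under these assumptions.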
80 Event specifiers for these PMCs support the following common
82 .Bl -tag -width indent
83 .It Li cmask= Ns Ar value
84 Configure the PMC to increment only if the number of configured
85 events measured in a cycle is greater than or equal to
88 Configure the PMC to count the number of deasserted to asserted
89 transitions of the conditions expressed by the other qualifiers.
90 If specified, the counter will increment only once whenever a
91 condition becomes true, irrespective of the number of clocks during
92 which the condition remains true.
94 Invert the sense of comparison when the
96 qualifier is present, making the counter increment when the number of
97 events per cycle is less than the value specified by the
101 Configure the PMC to count events happening at processor privilege
103 .It Li umask= Ns Ar value
104 This qualifier is used to further qualify the event selected (see
107 Configure the PMC to count events occurring at privilege levels 1, 2
115 qualifiers are specified, the default is to enable both.
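.Pp
The interaction between the threshold, inversion and edge-detect
qualifiers described above can be illustrated with a small software model.
This is only a sketch of the documented semantics, not a description of
the hardware: the function name model_count and its parameters are
hypothetical, the model requires a threshold of at least 1, and it assumes
the condition is false before the first cycle.

```python
def model_count(events_per_cycle, cmask, inv=False, edge=False):
    """Model how a counter advances over a sequence of cycles.

    events_per_cycle: iterable with the number of events seen in each cycle.
    cmask: threshold; the condition holds when events >= cmask.
    inv:   invert the comparison, so the condition holds when events < cmask.
    edge:  count only false-to-true transitions of the condition.
    """
    if cmask < 1:
        raise ValueError("this model assumes cmask >= 1")
    count = 0
    previous = False  # assumption: condition is false before the first cycle
    for events in events_per_cycle:
        condition = events < cmask if inv else events >= cmask
        if edge:
            # Increment once per deasserted-to-asserted transition.
            if condition and not previous:
                count += 1
        elif condition:
            # Increment once for every cycle in which the condition holds.
            count += 1
        previous = condition
    return count
```

With the per-cycle trace 0, 2, 3, 1, 0, 4 and a threshold of 2, the model
counts three cycles without edge detection and two transitions with it.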
117 Events that require core-specificity to be specified use a
119 .Dq Li core= Ns Ar value ,
123 .Bl -tag -width indent -compact
125 Measure event conditions on all cores.
127 Measure event conditions on this core.
132 Events that require an agent qualifier to be specified use an
134 .Dq Li agent= Ns Ar value ,
138 .Bl -tag -width indent -compact
140 Measure events associated with this bus agent.
142 Measure events caused by any bus agent.
147 Events that require a hardware prefetch qualifier to be specified use an
149 .Dq Li prefetch= Ns Ar value ,
153 .Bl -tag -width "exclude" -compact
155 Include all prefetches.
157 Only count hardware prefetches.
159 Exclude hardware prefetches.
164 Events that require a cache coherence qualifier to be specified use an
166 .Dq Li cachestate= Ns Ar value ,
169 contains one or more of the following letters:
170 .Bl -tag -width indent -compact
172 Count cache lines in the exclusive state.
174 Count cache lines in the invalid state.
176 Count cache lines in the modified state.
178 Count cache lines in the shared state.
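.Pp
A cache coherence qualifier value can be checked before it is placed in an
event specifier.
The sketch below assumes that the letters are m, e, s and i, as suggested
by the mesi argument name used in the event list; accepting upper case
and ignoring repeated letters are choices of this example only, and
parse_cachestate is a hypothetical name.

```python
# Assumed letter-to-state mapping, inferred from the "mesi" argument name.
CACHE_STATES = {
    "m": "modified",
    "e": "exclusive",
    "s": "shared",
    "i": "invalid",
}


def parse_cachestate(value):
    """Return the cache states named by a cachestate value, in order."""
    if not value:
        raise ValueError("cachestate needs at least one letter")
    states = []
    for letter in value.lower():
        if letter not in CACHE_STATES:
            raise ValueError(f"unknown cache state letter: {letter!r}")
        name = CACHE_STATES[letter]
        if name not in states:  # tolerate repeated letters
            states.append(name)
    return states
```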
183 The following event names are case insensitive.
184 Whitespace, hyphens and underscore characters in these names are ignored.
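.Pp
The name matching rule can be sketched as a normalization step applied
before comparison.
This is an illustration only, assuming that the whitespace, hyphen and
underscore characters mentioned above are skipped when names are compared;
normalize_event_name and same_event are hypothetical helpers and are not
part of the library.

```python
def normalize_event_name(name):
    """Lower-case a name and drop whitespace, hyphens and underscores."""
    kept = (ch for ch in name if not (ch.isspace() or ch in "-_"))
    return "".join(kept).lower()


def same_event(first, second):
    """Return True if two spellings refer to the same event name."""
    return normalize_event_name(first) == normalize_event_name(second)
```

Under this rule the spellings Br_Instr_Ret, br-instr-ret and
BR INSTR RET all compare equal.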
187 Core PMCs support the following events:
188 .Bl -tag -width indent
191 The number of BAClear conditions asserted.
194 The number of branches for which the branch table buffer did not
195 produce a prediction.
196 .It Li Br_BAC_Missp_Exec
198 The number of branch instructions executed that were mispredicted at
202 The number of bogus branches.
207 instructions executed.
208 .It Li Br_Call_Missp_Exec
212 instructions executed that were mispredicted.
215 The number of conditional branch instructions executed.
216 .It Li Br_Cnd_Missp_Exec
218 The number of conditional branch instructions executed that were mispredicted.
219 .It Li Br_Ind_Call_Exec
221 The number of indirect
223 instructions executed.
226 The number of indirect branches executed.
227 .It Li Br_Ind_Missp_Exec
229 The number of indirect branch instructions executed that were mispredicted.
232 The number of branch instructions executed including speculative branches.
233 .It Li Br_Instr_Decoded
235 The number of branch instructions decoded.
238 The number of branch instructions retired.
239 .It Li Br_MisPred_Ret
241 The number of mispredicted branch instructions retired.
242 .It Li Br_MisPred_Taken_Ret
244 The number of taken and mispredicted branches retired.
247 The number of branch instructions executed and mispredicted at
248 execution including branches that were not predicted.
249 .It Li Br_Ret_BAC_Missp_Exec
251 The number of return branch instructions that were mispredicted at the
255 The number of return branch instructions executed.
256 .It Li Br_Ret_Missp_Exec
258 The number of return branch instructions executed that were mispredicted.
261 The number of taken branches retired.
262 .It Li Bus_BNR_Clocks
264 The number of external bus cycles while BNR (bus not ready) was asserted.
265 .It Li Bus_DRDY_Clocks Op ,agent= Ns Ar agent
267 The number of external bus cycles while DRDY was asserted.
270 .\" XXX Using the description in Core2 PMC documentation.
271 The number of cycles during which the processor is busy receiving data.
272 .It Li Bus_Locks_Clocks Op ,core= Ns Ar core
274 The number of external bus cycles while the bus lock signal was asserted.
275 .It Li Bus_Not_In_Use Op ,core= Ns Ar core
277 The number of cycles when there is no transaction from the core.
278 .It Li Bus_Req_Outstanding Xo
279 .Op ,agent= Ns Ar agent
280 .Op ,core= Ns Ar core
283 The weighted cycles of cacheable bus data read requests
284 from the data cache unit or hardware prefetcher.
285 .It Li Bus_Snoop_Stall
287 The number of bus cycles while a bus snoop is stalled.
289 .Op ,agent= Ns Ar agent
290 .Op ,cachestate= Ns Ar mesi
293 .\" XXX Using the description in Core2 PMC documentation.
294 The number of snoop responses to bus transactions.
295 .It Li Bus_Trans_Any Op ,agent= Ns Ar agent
297 The number of completed bus transactions.
298 .It Li Bus_Trans_Brd Op ,core= Ns Ar core
300 The number of read bus transactions.
301 .It Li Bus_Trans_Burst Op ,agent= Ns Ar agent
303 The number of completed burst transactions.
304 Retried transactions may be counted more than once.
305 .It Li Bus_Trans_Def Op ,core= Ns Ar core
307 The number of completed deferred transactions.
308 .It Li Bus_Trans_IO Xo
309 .Op ,agent= Ns Ar agent
310 .Op ,core= Ns Ar core
313 The number of completed I/O transactions counting both reads and
315 .It Li Bus_Trans_Ifetch Xo
316 .Op ,agent= Ns Ar agent
317 .Op ,core= Ns Ar core
320 The number of completed instruction fetch transactions.
321 .It Li Bus_Trans_Inval Xo
322 .Op ,agent= Ns Ar agent
323 .Op ,core= Ns Ar core
326 The number of completed invalidate transactions.
327 .It Li Bus_Trans_Mem Op ,agent= Ns Ar agent
329 The number of completed memory transactions.
330 .It Li Bus_Trans_P Xo
331 .Op ,agent= Ns Ar agent
332 .Op ,core= Ns Ar core
335 The number of completed partial transactions.
336 .It Li Bus_Trans_Pwr Xo
337 .Op ,agent= Ns Ar agent
338 .Op ,core= Ns Ar core
341 The number of completed partial write transactions.
342 .It Li Bus_Trans_RFO Xo
343 .Op ,agent= Ns Ar agent
344 .Op ,core= Ns Ar core
347 The number of completed read-for-ownership transactions.
348 .It Li Bus_Trans_WB Op ,agent= Ns Ar agent
350 The number of completed writeback transactions from the data cache
351 unit, excluding L2 writebacks.
352 .It Li Cycles_Div_Busy
354 The number of cycles the divider is busy.
355 The event is only available on PMC0.
356 .It Li Cycles_Int_Masked
358 The number of cycles while interrupts were disabled.
359 .It Li Cycles_Int_Pending_Masked
361 The number of cycles while interrupts were disabled and interrupts
363 .It Li DCU_Snoop_To_Share Op ,core= Ns Ar core
365 The number of data cache unit snoops to L1 cache lines in the shared
367 .It Li DCache_Cache_Lock Op ,cachestate= Ns Ar mesi
368 .\" XXX needs clarification
370 The number of cacheable locked read operations to invalid state.
371 .It Li DCache_Cache_LD Op ,cachestate= Ns Ar mesi
373 The number of cacheable L1 data read operations.
374 .It Li DCache_Cache_ST Op ,cachestate= Ns Ar mesi
376 The number of cacheable L1 data write operations.
377 .It Li DCache_M_Evict
379 The number of M state data cache lines that were evicted.
382 The number of M state data cache lines that were allocated.
383 .It Li DCache_Pend_Miss
385 The weighted cycles an L1 miss was outstanding.
388 The number of data cache line replacements.
389 .It Li Data_Mem_Cache_Ref
391 The number of cacheable read and write operations to L1 data cache.
394 The number of L1 data reads and writes, both cacheable and
396 .It Li Dbus_Busy Op ,core= Ns Ar core
398 The number of core cycles during which the data bus was busy.
399 .It Li Dbus_Busy_Rd Op ,core= Ns Ar core
401 The number of cycles during which the data bus was busy transferring
405 The number of divide operations including speculative operations for
406 integer and floating point divides.
407 This event can only be counted on PMC1.
410 The number of data references that missed the TLB.
413 The number of ESP folding instructions decoded.
414 .It Li EST_Trans Op ,trans= Ns Ar transition
416 Count the number of Intel Enhanced SpeedStep transitions.
419 can be one of the following values:
420 .Bl -tag -width indent -compact
422 (Umask 00H) Count all transitions.
424 (Umask 01H) Count frequency transitions.
430 The number of floating point operations that required microcode
432 The event is only available on PMC1.
433 .It Li FP_Comp_Instr_Ret
435 The number of X87 floating point compute instructions retired.
436 The event is only available on PMC0.
437 .It Li FP_Comps_Op_Exe
439 The number of floating point computational instructions executed.
441 .Pq Event CCH , Umask 01H
442 The number of transitions from X87 to MMX.
443 .It Li Fused_Ld_Uops_Ret
444 .Pq Event DAH , Umask 01H
445 The number of fused load uops retired.
446 .It Li Fused_St_Uops_Ret
447 .Pq Event DAH , Umask 02H
448 The number of fused store uops retired.
449 .It Li Fused_Uops_Ret
450 .Pq Event DAH , Umask 00H
451 The number of fused uops retired.
454 The number of hardware interrupts received.
457 The number of instruction fetch misses in the instruction cache and
461 The number of instruction fetches from the instruction cache and
462 streaming buffers counting both cacheable and uncacheable fetches.
465 The number of cycles the instruction fetch unit was stalled while
466 waiting for data from memory.
469 The number of instruction length decoder stalls.
472 The number of instruction TLB misses.
475 The number of instructions decoded.
478 The number of instructions retired.
481 The number of L1 prefetch requests due to data cache misses.
482 .It Li L2_ADS Op ,core= Ns Ar core
484 The number of L2 address strobes.
486 .Op ,cachestate= Ns Ar mesi
487 .Op ,core= Ns Ar core
490 The number of instruction fetches by the instruction fetch unit from
491 L2 cache including speculative fetches.
493 .Op ,cachestate= Ns Ar mesi
494 .Op ,core= Ns Ar core
497 The number of L2 cache reads.
498 .It Li L2_Lines_In Xo
499 .Op ,core= Ns Ar core
500 .Op ,prefetch= Ns Ar prefetch
503 The number of L2 cache lines allocated.
504 .It Li L2_Lines_Out Xo
505 .Op ,core= Ns Ar core
506 .Op ,prefetch= Ns Ar prefetch
509 The number of L2 cache lines evicted.
510 .It Li L2_M_Lines_In Op ,core= Ns Ar core
512 The number of L2 M state cache lines allocated.
513 .It Li L2_M_Lines_Out Xo
514 .Op ,core= Ns Ar core
515 .Op ,prefetch= Ns Ar prefetch
518 The number of L2 M state cache lines evicted.
519 .It Li L2_No_Request_Cycles Xo
520 .Op ,cachestate= Ns Ar mesi
521 .Op ,core= Ns Ar core
522 .Op ,prefetch= Ns Ar prefetch
525 The number of cycles there was no request to access L2 cache.
526 .It Li L2_Reject_Cycles Xo
527 .Op ,cachestate= Ns Ar mesi
528 .Op ,core= Ns Ar core
529 .Op ,prefetch= Ns Ar prefetch
532 The number of cycles the L2 cache was busy and rejecting new requests.
534 .Op ,cachestate= Ns Ar mesi
535 .Op ,core= Ns Ar core
536 .Op ,prefetch= Ns Ar prefetch
539 The number of L2 cache requests.
541 .Op ,cachestate= Ns Ar mesi
542 .Op ,core= Ns Ar core
545 The number of L2 cache writes including speculative writes.
548 The number of load operations delayed due to store buffer blocks.
551 The number of EMMX instructions executed.
553 .Pq Event CCH , Umask 00H
554 The number of transitions from MMX to X87.
555 .It Li MMX_Instr_Exec
557 The number of MMX instructions executed excluding
564 The number of MMX instructions retired.
565 .It Li Misalign_Mem_Ref
567 The number of misaligned data memory references, counting loads and
571 The number of multiply operations including speculative floating point
572 and integer multiplies.
573 This event is available on PMC1 only.
574 .It Li NonHlt_Ref_Cycles
575 .Pq Event 3CH , Umask 01H
576 The number of non-halted bus cycles.
579 The number of hardware prefetch requests issued in backward streams.
582 The number of hardware prefetch requests issued in forward streams.
583 .It Li Resource_Stall
585 The number of cycles where there is a resource related stall.
588 The number of cycles while draining store buffers.
589 .It Li SIMD_FP_DP_P_Ret
590 .Pq Event D8H , Umask 02H
591 The number of SSE/SSE2 packed double precision instructions retired.
592 .It Li SIMD_FP_DP_P_Comp_Ret
593 .Pq Event D9H , Umask 02H
594 The number of SSE/SSE2 packed double precision compute instructions
596 .It Li SIMD_FP_DP_S_Ret
597 .Pq Event D8H , Umask 03H
598 The number of SSE/SSE2 scalar double precision instructions retired.
599 .It Li SIMD_FP_DP_S_Comp_Ret
600 .Pq Event D9H , Umask 03H
601 The number of SSE/SSE2 scalar double precision compute instructions
603 .It Li SIMD_FP_SP_P_Comp_Ret
604 .Pq Event D9H , Umask 00H
605 The number of SSE/SSE2 packed single precision compute instructions
607 .It Li SIMD_FP_SP_Ret
608 .Pq Event D8H , Umask 00H
609 The number of SSE/SSE2 single precision instructions retired,
610 both packed and scalar.
611 .It Li SIMD_FP_SP_S_Ret
612 .Pq Event D8H , Umask 01H
613 The number of SSE/SSE2 scalar single precision instructions retired.
614 .It Li SIMD_FP_SP_S_Comp_Ret
615 .Pq Event D9H , Umask 01H
616 The number of SSE/SSE2 scalar single precision compute instructions retired.
617 .It Li SIMD_Int_128_Ret
618 .Pq Event D8H , Umask 04H
619 The number of SSE2 128-bit integer instructions retired.
620 .It Li SIMD_Int_Pari_Exec
621 .Pq Event B3H , Umask 20H
622 The number of SIMD integer packed arithmetic instructions executed.
623 .It Li SIMD_Int_Pck_Exec
624 .Pq Event B3H , Umask 04H
625 The number of SIMD integer pack instructions executed.
626 .It Li SIMD_Int_Plog_Exec
627 .Pq Event B3H , Umask 10H
628 The number of SIMD integer packed logical instructions executed.
629 .It Li SIMD_Int_Pmul_Exec
630 .Pq Event B3H , Umask 01H
631 The number of SIMD integer packed multiply instructions executed.
632 .It Li SIMD_Int_Psft_Exec
633 .Pq Event B3H , Umask 02H
634 The number of SIMD integer packed shift instructions executed.
635 .It Li SIMD_Int_Sat_Exec
637 The number of SIMD integer saturating instructions executed.
638 .It Li SIMD_Int_Upck_Exec
639 .Pq Event B3H , Umask 08H
640 The number of SIMD integer unpack instructions executed.
643 The number of times self-modifying code was detected.
644 .It Li SSE_NTStores_Miss
645 .Pq Event 4BH , Umask 03H
646 The number of times an SSE streaming store instruction missed all caches.
647 .It Li SSE_NTStores_Ret
648 .Pq Event 07H , Umask 03H
649 The number of SSE streaming store instructions executed.
650 .It Li SSE_PrefNta_Miss
651 .Pq Event 4BH , Umask 00H
655 .It Li SSE_PrefNta_Ret
656 .Pq Event 07H , Umask 00H
659 instructions retired.
660 .It Li SSE_PrefT1_Miss
661 .Pq Event 4BH , Umask 01H
665 .It Li SSE_PrefT1_Ret
666 .Pq Event 07H , Umask 01H
669 instructions retired.
670 .It Li SSE_PrefT2_Miss
671 .Pq Event 4BH , Umask 02H
675 .It Li SSE_PrefT2_Ret
676 .Pq Event 07H , Umask 02H
679 instructions retired.
682 The number of segment register loads.
683 .It Li Serial_Execution_Cycles
684 .Pq Event 3CH , Umask 02H
685 The number of non-halted bus cycles of this core while the other core
688 .Pq Event 3BH , Umask C0H
689 The duration in a thermal trip based on the current core clock.
692 The number of unfusion events.
695 The number of micro-ops retired.
697 .Ss Event Name Aliases
698 The following table shows the mapping between the PMC-independent
701 and the underlying hardware events used.
702 .Bl -column "branch-mispredicts" "Description"
703 .It Em Alias Ta Em Event
704 .It Li branches Ta Li Br_Instr_Ret
705 .It Li branch-mispredicts Ta Li Br_MisPred_Ret
706 .It Li dc-misses Ta (unsupported)
707 .It Li ic-misses Ta Li ICache_Misses
708 .It Li instructions Ta Li Instr_Ret
709 .It Li interrupts Ta Li HW_Int_Rx
710 .It Li unhalted-cycles Ta (unsupported)
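.Pp
The alias table above can be expressed as a simple lookup.
The sketch below copies the mapping from the table and marks the two
unsupported aliases with None; the names EVENT_ALIASES and resolve_alias
are hypothetical, and the choice of exceptions is this example's own, not
the library's behaviour.

```python
# Alias-to-event mapping copied from the table above.
# None marks an alias listed as unsupported on these PMCs.
EVENT_ALIASES = {
    "branches": "Br_Instr_Ret",
    "branch-mispredicts": "Br_MisPred_Ret",
    "dc-misses": None,
    "ic-misses": "ICache_Misses",
    "instructions": "Instr_Ret",
    "interrupts": "HW_Int_Rx",
    "unhalted-cycles": None,
}


def resolve_alias(alias):
    """Return the hardware event name for a PMC-independent alias."""
    if alias not in EVENT_ALIASES:
        raise KeyError(f"unknown alias: {alias!r}")
    event = EVENT_ALIASES[alias]
    if event is None:
        raise ValueError(f"alias {alias!r} is unsupported on these PMCs")
    return event
```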
728 library first appeared in
733 library was written by
735 .Aq jkoshy@FreeBSD.org .