1 .\" Copyright (c) 2008 Joseph Koshy. All rights reserved.
3 .\" Redistribution and use in source and binary forms, with or without
4 .\" modification, are permitted provided that the following conditions
6 .\" 1. Redistributions of source code must retain the above copyright
7 .\" notice, this list of conditions and the following disclaimer.
8 .\" 2. Redistributions in binary form must reproduce the above copyright
9 .\" notice, this list of conditions and the following disclaimer in the
10 .\" documentation and/or other materials provided with the distribution.
12 .\" This software is provided by Joseph Koshy ``as is'' and
13 .\" any express or implied warranties, including, but not limited to, the
14 .\" implied warranties of merchantability and fitness for a particular purpose
15 .\" are disclaimed. in no event shall Joseph Koshy be liable
16 .\" for any direct, indirect, incidental, special, exemplary, or consequential
17 .\" damages (including, but not limited to, procurement of substitute goods
18 .\" or services; loss of use, data, or profits; or business interruption)
19 .\" however caused and on any theory of liability, whether in contract, strict
20 .\" liability, or tort (including negligence or otherwise) arising in any way
21 .\" out of the use of this software, even if advised of the possibility of
31 .Nd measurement events for
46 CPUs contain PMCs conforming to version 1 of the
48 performance measurement architecture.
50 These PMCs are documented in
52 .%B "IA-32 Intel(R) Architecture Software Developer's Manual"
53 .%T "Volume 3: System Programming Guide"
54 .%N "Order Number 253669-027US"
56 .%Q "Intel Corporation"
59 CPUs conforming to version 1 of the
61 performance measurement architecture contain two programmable PMCs of
64 The PMCs are 40 bits wide and offer the following capabilities:
65 .Bl -column "PMC_CAP_INTERRUPT" "Support"
66 .It Em Capability Ta Em Support
67 .It PMC_CAP_CASCADE Ta \&No
68 .It PMC_CAP_EDGE Ta Yes
69 .It PMC_CAP_INTERRUPT Ta Yes
70 .It PMC_CAP_INVERT Ta Yes
71 .It PMC_CAP_READ Ta Yes
72 .It PMC_CAP_PRECISE Ta \&No
73 .It PMC_CAP_SYSTEM Ta Yes
74 .It PMC_CAP_TAGGING Ta \&No
75 .It PMC_CAP_THRESHOLD Ta Yes
76 .It PMC_CAP_USER Ta Yes
77 .It PMC_CAP_WRITE Ta Yes
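.Pp
Because the counters are 40 bits wide, code that works with raw counter
values has to allow for wraparound at 2^40.
The following is a minimal sketch of computing the number of events
between two raw readings; the function name is hypothetical, it is not
part of any library, and it assumes at most one wraparound between the
two readings.

```c
#include <stdint.h>

/* Width of the programmable counters, as stated above. */
#define PMC_CORE_WIDTH	40
#define PMC_CORE_MASK	((UINT64_C(1) << PMC_CORE_WIDTH) - 1)

/*
 * Hypothetical helper: events counted between two raw readings of a
 * 40-bit counter.  Unsigned subtraction followed by masking yields the
 * correct difference even if the counter wrapped once in between.
 */
uint64_t
pmc40_delta(uint64_t prev, uint64_t cur)
{
	return ((cur - prev) & PMC_CORE_MASK);
}
```

Masking after the subtraction avoids a separate branch for the
wraparound case.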
80 Event specifiers for these PMCs support the following common
82 .Bl -tag -width indent
83 .It Li cmask= Ns Ar value
84 Configure the PMC to increment only if the number of configured
85 events measured in a cycle is greater than or equal to
88 Configure the PMC to count the number of deasserted to asserted
89 transitions of the conditions expressed by the other qualifiers.
90 If specified, the counter will increment only once whenever a
91 condition becomes true, irrespective of the number of clocks during
92 which the condition remains true.
94 Invert the sense of comparison when the
96 qualifier is present, making the counter increment when the number of
97 events per cycle is less than the value specified by the
101 Configure the PMC to count events happening at processor privilege
103 .It Li umask= Ns Ar value
104 This qualifier is used to further qualify the event selected (see
107 Configure the PMC to count events occurring at privilege levels 1, 2
115 qualifiers are specified, the default is to enable both.
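.Pp
The common qualifiers above correspond to fields of the event select
register described in the Intel manual cited earlier.
The following is a minimal sketch of how a parsed specifier could be
packed into such a value.
The structure and function names are hypothetical and this is not the
code used by the library; the bit positions are the architectural
IA32_PERFEVTSELx layout (event select in bits 0-7, unit mask in bits
8-15, USR bit 16, OS bit 17, edge bit 18, invert bit 23, counter mask in
bits 24-31).
The enable and interrupt bits are left clear on purpose.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVSEL_USR	(UINT32_C(1) << 16)	/* count at privilege levels 1-3 */
#define EVSEL_OS	(UINT32_C(1) << 17)	/* count at privilege level 0 */
#define EVSEL_EDGE	(UINT32_C(1) << 18)	/* count deasserted-to-asserted edges */
#define EVSEL_INV	(UINT32_C(1) << 23)	/* invert the cmask comparison */

/* Hypothetical container for the qualifiers described above. */
struct evspec {
	uint8_t	event;		/* event select code, e.g. 0xC0 */
	uint8_t	umask;		/* umask=value */
	uint8_t	cmask;		/* cmask=value */
	bool	edge;		/* edge */
	bool	inv;		/* inv */
	bool	os;		/* os */
	bool	usr;		/* usr */
};

/* Pack a specifier into an event select value (enable bit not set). */
uint32_t
evsel_encode(const struct evspec *s)
{
	uint32_t v;

	v = (uint32_t)s->event |
	    ((uint32_t)s->umask << 8) |
	    ((uint32_t)s->cmask << 24);
	if (s->edge)
		v |= EVSEL_EDGE;
	if (s->inv)
		v |= EVSEL_INV;
	/* If neither os nor usr is given, the default is to enable both. */
	if (!s->os && !s->usr)
		v |= EVSEL_OS | EVSEL_USR;
	if (s->os)
		v |= EVSEL_OS;
	if (s->usr)
		v |= EVSEL_USR;
	return (v);
}
```

For example, event C0H with no qualifiers encodes to 000300C0H under
these assumptions, and the same event with only the usr qualifier
encodes to 000100C0H.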
117 Events that require core-specificity to be specified use a
119 .Dq Li core= Ns Ar value ,
123 .Bl -tag -width indent -compact
125 Measure event conditions on all cores.
127 Measure event conditions on this core.
132 Events that require an agent qualifier to be specified use an
134 .Dq Li agent= Ns Ar value ,
138 .Bl -tag -width indent -compact
140 Measure events associated with this bus agent.
142 Measure events caused by any bus agent.
147 Events that require a hardware prefetch qualifier to be specified use an
149 .Dq Li prefetch= Ns Ar value ,
153 .Bl -tag -width "exclude" -compact
155 Include all prefetches.
157 Only count hardware prefetches.
159 Exclude hardware prefetches.
164 Events that require a cache coherence qualifier to be specified use an
166 .Dq Li cachestate= Ns Ar value ,
169 contains one or more of the following letters:
170 .Bl -tag -width indent -compact
172 Count cache lines in the exclusive state.
174 Count cache lines in the invalid state.
176 Count cache lines in the modified state.
178 Count cache lines in the shared state.
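.Pp
A cache coherence qualifier is a string of state letters that selects
one or more cache line states.
The following is a minimal sketch of turning such a string into a bit
mask.
The function name is hypothetical, the letters are assumed to be the
lowercase initials of the four states listed above, and the bit values
are an assumption based on the MESI sub-field that Intel documents for
the unit mask (modified 08H, exclusive 04H, shared 02H, invalid 01H);
the library may represent this differently.

```c
/* Assumed unit-mask bits for the four cache line states. */
#define CS_INVALID	0x01
#define CS_SHARED	0x02
#define CS_EXCLUSIVE	0x04
#define CS_MODIFIED	0x08

/*
 * Hypothetical helper: convert a string of state letters into a mask.
 * Returns the mask (1..15), or -1 if the string is empty or contains
 * a letter other than 'e', 'i', 'm' or 's'.
 */
int
cachestate_mask(const char *s)
{
	int mask = 0;

	if (*s == '\0')
		return (-1);
	for (; *s != '\0'; s++) {
		switch (*s) {
		case 'e':
			mask |= CS_EXCLUSIVE;
			break;
		case 'i':
			mask |= CS_INVALID;
			break;
		case 'm':
			mask |= CS_MODIFIED;
			break;
		case 's':
			mask |= CS_SHARED;
			break;
		default:
			return (-1);
		}
	}
	return (mask);
}
```

Repeated letters are harmless in this sketch because each letter only
sets its own bit.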
183 The following event names are case insensitive.
184 Whitespace, hyphens and underscore characters in these names are
187 Core PMCs support the following events:
188 .Bl -tag -width indent
190 .Pq Event E6H , Umask 00H
191 The number of BAClear conditions asserted.
193 .Pq Event E2H , Umask 00H
194 The number of branches for which the branch target buffer did not
195 produce a prediction.
196 .It Li Br_BAC_Missp_Exec
197 .Pq Event 8AH , Umask 00H
198 The number of branch instructions executed that were mispredicted at
201 .Pq Event E4H , Umask 00H
202 The number of bogus branches.
204 .Pq Event 92H , Umask 00H
207 instructions executed.
208 .It Li Br_Call_Missp_Exec
209 .Pq Event 93H , Umask 00H
212 instructions executed that were mispredicted.
214 .Pq Event 8BH , Umask 00H
215 The number of conditional branch instructions executed.
216 .It Li Br_Cnd_Missp_Exec
217 .Pq Event 8CH , Umask 00H
218 The number of conditional branch instructions executed that were mispredicted.
219 .It Li Br_Ind_Call_Exec
220 .Pq Event 94H , Umask 00H
221 The number of indirect
223 instructions executed.
225 .Pq Event 8DH , Umask 00H
226 The number of indirect branches executed.
227 .It Li Br_Ind_Missp_Exec
228 .Pq Event 8EH , Umask 00H
229 The number of indirect branch instructions executed that were mispredicted.
231 .Pq Event 88H , Umask 00H
232 The number of branch instructions executed including speculative branches.
233 .It Li Br_Instr_Decoded
234 .Pq Event E0H , Umask 00H
235 The number of branch instructions decoded.
237 .Pq Event C4H, Umask 00H
238 .Pq Alias Qq "Branch Instruction Retired"
239 The number of branch instructions retired.
240 This is an architectural performance event.
241 .It Li Br_MisPred_Ret
242 .Pq Event C5H, Umask 00H
243 .Pq Alias Qq "Branch Misses Retired"
244 The number of mispredicted branch instructions retired.
245 This is an architectural performance event.
246 .It Li Br_MisPred_Taken_Ret
247 .Pq Event CAH , Umask 00H
248 The number of taken and mispredicted branches retired.
250 .Pq Event 89H , Umask 00H
251 The number of branch instructions executed and mispredicted at
252 execution including branches that were not predicted.
253 .It Li Br_Ret_BAC_Missp_Exec
254 .Pq Event 91H , Umask 00H
255 The number of return branch instructions that were mispredicted at the
258 .Pq Event 8FH , Umask 00H
259 The number of return branch instructions executed.
260 .It Li Br_Ret_Missp_Exec
261 .Pq Event 90H , Umask 00H
262 The number of return branch instructions executed that were mispredicted.
264 .Pq Event C9H , Umask 00H
265 The number of taken branches retired.
266 .It Li Bus_BNR_Clocks
267 .Pq Event 61H , Umask 00H
268 The number of external bus cycles while BNR (bus not ready) was asserted.
269 .It Li Bus_DRDY_Clocks Op ,agent= Ns Ar agent
270 .Pq Event 62H , Umask 00H
271 The number of external bus cycles while DRDY was asserted.
273 .Pq Event 64H , Umask 40H
274 .\" XXX Using the description in Core2 PMC documentation.
275 The number of cycles during which the processor is busy receiving data.
276 .It Li Bus_Locks_Clocks Op ,core= Ns Ar core
278 The number of external bus cycles while the bus lock signal was asserted.
279 .It Li Bus_Not_In_Use Op ,core= Ns Ar core
281 The number of cycles when there is no transaction from the core.
282 .It Li Bus_Req_Outstanding Xo
283 .Op ,agent= Ns Ar agent
284 .Op ,core= Ns Ar core
287 The weighted cycles of cacheable bus data read requests
288 from the data cache unit or hardware prefetcher.
289 .It Li Bus_Snoop_Stall
290 .Pq Event 7EH , Umask 00H
291 The number of bus cycles while a bus snoop is stalled.
293 .Op ,agent= Ns Ar agent
294 .Op ,cachestate= Ns Ar mesi
297 .\" XXX Using the description in Core2 PMC documentation.
298 The number of snoop responses to bus transactions.
299 .It Li Bus_Trans_Any Op ,agent= Ns Ar agent
301 The number of completed bus transactions.
302 .It Li Bus_Trans_Brd Op ,core= Ns Ar core
304 The number of read bus transactions.
305 .It Li Bus_Trans_Burst Op ,agent= Ns Ar agent
307 The number of completed burst transactions.
308 Retried transactions may be counted more than once.
309 .It Li Bus_Trans_Def Op ,core= Ns Ar core
311 The number of completed deferred transactions.
312 .It Li Bus_Trans_IO Xo
313 .Op ,agent= Ns Ar agent
314 .Op ,core= Ns Ar core
317 The number of completed I/O transactions counting both reads and
319 .It Li Bus_Trans_Ifetch Xo
320 .Op ,agent= Ns Ar agent
321 .Op ,core= Ns Ar core
324 The number of completed instruction fetch transactions.
325 .It Li Bus_Trans_Inval Xo
326 .Op ,agent= Ns Ar agent
327 .Op ,core= Ns Ar core
330 The number of completed invalidate transactions.
331 .It Li Bus_Trans_Mem Op ,agent= Ns Ar agent
333 The number of completed memory transactions.
334 .It Li Bus_Trans_P Xo
335 .Op ,agent= Ns Ar agent
336 .Op ,core= Ns Ar core
339 The number of completed partial transactions.
340 .It Li Bus_Trans_Pwr Xo
341 .Op ,agent= Ns Ar agent
342 .Op ,core= Ns Ar core
345 The number of completed partial write transactions.
346 .It Li Bus_Trans_RFO Xo
347 .Op ,agent= Ns Ar agent
348 .Op ,core= Ns Ar core
351 The number of completed read-for-ownership transactions.
352 .It Li Bus_Trans_WB Op ,agent= Ns Ar agent
354 The number of completed writeback transactions from the data cache
355 unit, excluding L2 writebacks.
356 .It Li Cycles_Div_Busy
357 .Pq Event 14H , Umask 00H
358 The number of cycles the divider is busy.
359 The event is only available on PMC0.
360 .It Li Cycles_Int_Masked
361 .Pq Event C6H , Umask 00H
362 The number of cycles while interrupts were disabled.
363 .It Li Cycles_Int_Pending_Masked
364 .Pq Event C7H , Umask 00H
365 The number of cycles while interrupts were disabled and interrupts
367 .It Li DCU_Snoop_To_Share Op ,core= Ns Ar core
369 The number of data cache unit snoops to L1 cache lines in the shared
371 .It Li DCache_Cache_Lock Op ,cachestate= Ns Ar mesi
372 .\" XXX needs clarification
374 The number of cacheable locked read operations to invalid state.
375 .It Li DCache_Cache_LD Op ,cachestate= Ns Ar mesi
377 The number of cacheable L1 data read operations.
378 .It Li DCache_Cache_ST Op ,cachestate= Ns Ar mesi
380 The number of cacheable L1 data write operations.
381 .It Li DCache_M_Evict
382 .Pq Event 47H , Umask 00H
383 The number of M state data cache lines that were evicted.
385 .Pq Event 46H , Umask 00H
386 The number of M state data cache lines that were allocated.
387 .It Li DCache_Pend_Miss
388 .Pq Event 48H , Umask 00H
389 The weighted cycles an L1 miss was outstanding.
391 .Pq Event 45H , Umask 0FH
392 The number of data cache line replacements.
393 .It Li Data_Mem_Cache_Ref
394 .Pq Event 44H , Umask 02H
395 The number of cacheable read and write operations to L1 data cache.
397 .Pq Event 43H , Umask 01H
398 The number of L1 data reads and writes, both cacheable and
400 .It Li Dbus_Busy Op ,core= Ns Ar core
402 The number of core cycles during which the data bus was busy.
403 .It Li Dbus_Busy_Rd Op ,core= Ns Ar core
405 The number of cycles during which the data bus was busy transferring
408 .Pq Event 13H , Umask 00H
409 The number of divide operations including speculative operations for
410 integer and floating point divides.
411 This event can only be counted on PMC1.
413 .Pq Event 49H , Umask 00H
414 The number of data references that missed the TLB.
416 .Pq Event D7H , Umask 00H
417 The number of ESP folding instructions decoded.
418 .It Li EST_Trans Op ,trans= Ns Ar transition
420 Count the number of Intel Enhanced SpeedStep transitions.
423 can be one of the following values:
424 .Bl -tag -width indent -compact
426 (Umask 00H) Count all transitions.
428 (Umask 01H) Count frequency transitions.
433 .Pq Event 11H , Umask 00H
434 The number of floating point operations that required microcode
436 The event is only available on PMC1.
437 .It Li FP_Comp_Instr_Ret
438 .Pq Event C1H , Umask 00H
439 The number of X87 floating point compute instructions retired.
440 The event is only available on PMC0.
441 .It Li FP_Comps_Op_Exe
442 .Pq Event 10H , Umask 00H
443 The number of floating point computational instructions executed.
445 .Pq Event CCH , Umask 01H
446 The number of transitions from X87 to MMX.
447 .It Li Fused_Ld_Uops_Ret
448 .Pq Event DAH , Umask 01H
449 The number of fused load uops retired.
450 .It Li Fused_St_Uops_Ret
451 .Pq Event DAH , Umask 02H
452 The number of fused store uops retired.
453 .It Li Fused_Uops_Ret
454 .Pq Event DAH , Umask 00H
455 The number of fused uops retired.
457 .Pq Event C8H , Umask 00H
458 The number of hardware interrupts received.
460 .Pq Event 81H , Umask 00H
461 The number of instruction fetch misses in the instruction cache and
464 .Pq Event 80H , Umask 00H
465 The number of instruction fetches from the instruction cache and
466 streaming buffers counting both cacheable and uncacheable fetches.
468 .Pq Event 86H , Umask 00H
469 The number of cycles the instruction fetch unit was stalled while
470 waiting for data from memory.
472 .Pq Event 87H , Umask 00H
473 The number of instruction length decoder stalls.
475 .Pq Event 85H , Umask 00H
476 The number of instruction TLB misses.
478 .Pq Event D0H , Umask 00H
479 The number of instructions decoded.
481 .Pq Event C0H , Umask 00H
482 .Pq Alias Qq "Instruction Retired"
483 The number of instructions retired.
484 This is an architectural performance event.
486 .Pq Event 4FH , Umask 00H
487 The number of L1 prefetch requests due to data cache misses.
488 .It Li L2_ADS Op ,core= Ns Ar core
490 The number of L2 address strobes.
492 .Op ,cachestate= Ns Ar mesi
493 .Op ,core= Ns Ar core
496 The number of instruction fetches by the instruction fetch unit from
497 L2 cache including speculative fetches.
499 .Op ,cachestate= Ns Ar mesi
500 .Op ,core= Ns Ar core
503 The number of L2 cache reads.
504 .It Li L2_Lines_In Xo
505 .Op ,core= Ns Ar core
506 .Op ,prefetch= Ns Ar prefetch
509 The number of L2 cache lines allocated.
510 .It Li L2_Lines_Out Xo
511 .Op ,core= Ns Ar core
512 .Op ,prefetch= Ns Ar prefetch
515 The number of L2 cache lines evicted.
516 .It Li L2_M_Lines_In Op ,core= Ns Ar core
518 The number of L2 M state cache lines allocated.
519 .It Li L2_M_Lines_Out Xo
520 .Op ,core= Ns Ar core
521 .Op ,prefetch= Ns Ar prefetch
524 The number of L2 M state cache lines evicted.
525 .It Li L2_No_Request_Cycles Xo
526 .Op ,cachestate= Ns Ar mesi
527 .Op ,core= Ns Ar core
528 .Op ,prefetch= Ns Ar prefetch
531 The number of cycles there was no request to access L2 cache.
532 .It Li L2_Reject_Cycles Xo
533 .Op ,cachestate= Ns Ar mesi
534 .Op ,core= Ns Ar core
535 .Op ,prefetch= Ns Ar prefetch
538 The number of cycles the L2 cache was busy and rejecting new requests.
540 .Op ,cachestate= Ns Ar mesi
541 .Op ,core= Ns Ar core
542 .Op ,prefetch= Ns Ar prefetch
545 The number of L2 cache requests.
547 .Op ,cachestate= Ns Ar mesi
548 .Op ,core= Ns Ar core
551 The number of L2 cache writes including speculative writes.
553 .Pq Event 03H , Umask 00H
554 The number of load operations delayed due to store buffer blocks.
556 .Pq Event 2EH, Umask 41H
557 The number of cache misses for references to the last level cache,
558 excluding misses due to hardware prefetches.
559 This is an architectural performance event.
561 The number of references to the last level cache,
562 excluding those due to hardware prefetches.
563 This is an architectural performance event.
564 .Pq Event 2EH, Umask 4FH
567 .Pq Event CDH , Umask 00H
568 The number of EMMS instructions executed.
570 .Pq Event CCH , Umask 00H
571 The number of transitions from MMX to X87.
572 .It Li MMX_Instr_Exec
573 .Pq Event B0H , Umask 00H
574 The number of MMX instructions executed excluding
580 .Pq Event CEH , Umask 00H
581 The number of MMX instructions retired.
582 .It Li Misalign_Mem_Ref
583 .Pq Event 05H , Umask 00H
584 The number of misaligned data memory references, counting loads and
587 .Pq Event 12H , Umask 00H
588 The number of multiply operations including speculative floating point
589 and integer multiplies.
590 This event is available on PMC1 only.
591 .It Li NonHlt_Ref_Cycles
592 .Pq Event 3CH , Umask 01H
593 .Pq Alias Qq "Unhalted Reference Cycles"
594 The number of non-halted bus cycles.
595 This is an architectural performance event.
597 .Pq Event F8H , Umask 00H
598 The number of hardware prefetch requests issued in backward streams.
600 .Pq Event F0H , Umask 00H
601 The number of hardware prefetch requests issued in forward streams.
602 .It Li Resource_Stall
603 .Pq Event A2H , Umask 00H
604 The number of cycles where there is a resource related stall.
606 .Pq Event 04H , Umask 00H
607 The number of cycles while draining store buffers.
608 .It Li SIMD_FP_DP_P_Ret
609 .Pq Event D8H , Umask 02H
610 The number of SSE/SSE2 packed double precision instructions retired.
611 .It Li SIMD_FP_DP_P_Comp_Ret
612 .Pq Event D9H , Umask 02H
613 The number of SSE/SSE2 packed double precision compute instructions
615 .It Li SIMD_FP_DP_S_Ret
616 .Pq Event D8H , Umask 03H
617 The number of SSE/SSE2 scalar double precision instructions retired.
618 .It Li SIMD_FP_DP_S_Comp_Ret
619 .Pq Event D9H , Umask 03H
620 The number of SSE/SSE2 scalar double precision compute instructions
622 .It Li SIMD_FP_SP_P_Comp_Ret
623 .Pq Event D9H , Umask 00H
624 The number of SSE/SSE2 packed single precision compute instructions
626 .It Li SIMD_FP_SP_Ret
627 .Pq Event D8H , Umask 00H
628 The number of SSE/SSE2 single precision instructions retired,
629 both packed and scalar.
630 .It Li SIMD_FP_SP_S_Ret
631 .Pq Event D8H , Umask 01H
632 The number of SSE/SSE2 scalar single precision instructions retired.
633 .It Li SIMD_FP_SP_S_Comp_Ret
634 .Pq Event D9H , Umask 01H
635 The number of SSE/SSE2 scalar single precision compute instructions retired.
636 .It Li SIMD_Int_128_Ret
637 .Pq Event D8H , Umask 04H
638 The number of SSE2 128-bit integer instructions retired.
639 .It Li SIMD_Int_Pari_Exec
640 .Pq Event B3H , Umask 20H
641 The number of SIMD integer packed arithmetic instructions executed.
642 .It Li SIMD_Int_Pck_Exec
643 .Pq Event B3H , Umask 04H
644 The number of SIMD integer pack instructions executed.
645 .It Li SIMD_Int_Plog_Exec
646 .Pq Event B3H , Umask 10H
647 The number of SIMD integer packed logical instructions executed.
648 .It Li SIMD_Int_Pmul_Exec
649 .Pq Event B3H , Umask 01H
650 The number of SIMD integer packed multiply instructions executed.
651 .It Li SIMD_Int_Psft_Exec
652 .Pq Event B3H , Umask 02H
653 The number of SIMD integer packed shift instructions executed.
654 .It Li SIMD_Int_Sat_Exec
655 .Pq Event B1H , Umask 00H
656 The number of SIMD integer saturating instructions executed.
657 .It Li SIMD_Int_Upck_Exec
658 .Pq Event B3H , Umask 08H
659 The number of SIMD integer unpack instructions executed.
662 The number of times self-modifying code was detected.
663 .It Li SSE_NTStores_Miss
664 .Pq Event 4BH , Umask 03H
665 The number of times an SSE streaming store instruction missed all caches.
666 .It Li SSE_NTStores_Ret
667 .Pq Event 07H , Umask 03H
668 The number of SSE streaming store instructions executed.
669 .It Li SSE_PrefNta_Miss
670 .Pq Event 4BH , Umask 00H
674 .It Li SSE_PrefNta_Ret
675 .Pq Event 07H , Umask 00H
678 instructions retired.
679 .It Li SSE_PrefT1_Miss
680 .Pq Event 4BH , Umask 01H
684 .It Li SSE_PrefT1_Ret
685 .Pq Event 07H , Umask 01H
688 instructions retired.
689 .It Li SSE_PrefT2_Miss
690 .Pq Event 4BH , Umask 02H
694 .It Li SSE_PrefT2_Ret
695 .Pq Event 07H , Umask 02H
698 instructions retired.
700 .Pq Event 06H , Umask 00H
701 The number of segment register loads.
702 .It Li Serial_Execution_Cycles
703 .Pq Event 3CH , Umask 02H
704 The number of non-halted bus cycles of this core while the other core
707 .Pq Event 3BH , Umask C0H
708 The duration in a thermal trip based on the current core clock.
710 .Pq Event DBH , Umask 00H
711 The number of unfusion events.
712 .It Li Unhalted_Core_Cycles
713 .Pq Event 3CH , Umask 00H
714 The number of core clock cycles when the clock signal on a specific
716 This is an architectural performance event.
718 .Pq Event C2H , Umask 00H
719 The number of micro-ops retired.
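.Pp
Event names in the list above are case insensitive, and whitespace,
hyphens and underscores in them are not significant.
The following is a minimal sketch of a comparison that follows this
rule, assuming those characters are simply skipped on both sides.
The function names are hypothetical and this is not the matching code
used by the library.

```c
#include <ctype.h>
#include <stdbool.h>

/* Skip characters that are not significant in an event name. */
static const char *
skip_ignored(const char *p)
{
	while (*p != '\0' &&
	    (isspace((unsigned char)*p) || *p == '-' || *p == '_'))
		p++;
	return (p);
}

/*
 * Hypothetical helper: compare two event names without regard to
 * case, whitespace, hyphens or underscores.
 */
bool
evname_equal(const char *a, const char *b)
{
	for (;;) {
		a = skip_ignored(a);
		b = skip_ignored(b);
		/* Equal only if both names end at the same point. */
		if (*a == '\0' || *b == '\0')
			return (*a == *b);
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return (false);
		a++;
		b++;
	}
}
```

Under this rule, Instr_Ret, instr-ret and INSTRRET all name the same
event.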
721 .Ss Event Name Aliases
722 The following table shows the mapping between the PMC-independent
725 and the underlying hardware events used.
726 .Bl -column "branch-mispredicts" "Description"
727 .It Em Alias Ta Em Event
728 .It Li branches Ta Li Br_Instr_Ret
729 .It Li branch-mispredicts Ta Li Br_MisPred_Ret
730 .It Li dc-misses Ta (unsupported)
731 .It Li ic-misses Ta Li ICache_Misses
732 .It Li instructions Ta Li Instr_Ret
733 .It Li interrupts Ta Li HW_Int_Rx
734 .It Li unhalted-cycles Ta (unsupported)
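.Pp
The alias table above can be read as a simple lookup from alias to
hardware event name.
The following is a minimal sketch of such a lookup, with the entries
copied from the table; the type and function names are hypothetical,
and unsupported aliases are represented by a null event name as an
assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry; event is NULL when the alias is unsupported. */
struct core_alias {
	const char	*alias;
	const char	*event;
};

/* Entries taken from the alias table above. */
static const struct core_alias core_aliases[] = {
	{ "branches",		"Br_Instr_Ret" },
	{ "branch-mispredicts",	"Br_MisPred_Ret" },
	{ "dc-misses",		NULL },
	{ "ic-misses",		"ICache_Misses" },
	{ "instructions",	"Instr_Ret" },
	{ "interrupts",		"HW_Int_Rx" },
	{ "unhalted-cycles",	NULL },
};

/*
 * Hypothetical helper: return the hardware event for an alias, or
 * NULL if the alias is unsupported or not in the table.
 */
const char *
core_alias_event(const char *alias)
{
	size_t i;

	for (i = 0; i < sizeof(core_aliases) / sizeof(core_aliases[0]); i++) {
		if (strcmp(core_aliases[i].alias, alias) == 0)
			return (core_aliases[i].event);
	}
	return (NULL);
}
```

A caller that needs to tell an unsupported alias from an unknown one
would have to return a separate status; this sketch does not make that
distinction.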
752 library first appeared in
757 library was written by
759 .Aq jkoshy@FreeBSD.org .