lib/libpmc/pmc.mips24k.3

   1 .\" Copyright (c) 2010 George Neville-Neil.  All rights reserved.
   2 .\"
   3 .\" Redistribution and use in source and binary forms, with or without
   4 .\" modification, are permitted provided that the following conditions
   5 .\" are met:
   6 .\" 1. Redistributions of source code must retain the above copyright
   7 .\"    notice, this list of conditions and the following disclaimer.
   8 .\" 2. Redistributions in binary form must reproduce the above copyright
   9 .\"    notice, this list of conditions and the following disclaimer in the
  10 .\"    documentation and/or other materials provided with the distribution.
  11 .\"
  12 .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  13 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  14 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  15 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  16 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  17 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  18 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  19 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  20 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  21 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  22 .\" SUCH DAMAGE.
  23 .\"
  24 .\" $FreeBSD$
  25 .\"
  26 .Dd March 24, 2012
  27 .Dt PMC.MIPS24K 3
  28 .Os
  29 .Sh NAME
  30 .Nm pmc.mips24k
  31 .Nd measurement events for
  32 .Tn MIPS24K
  33 family CPUs
  34 .Sh LIBRARY
  35 .Lb libpmc
  36 .Sh SYNOPSIS
  37 .In pmc.h
  38 .Sh DESCRIPTION
  39 MIPS PMCs are present in MIPS
  40 .Tn "24k"
  41 and other processors in the MIPS family.
  42 .Pp
  43 There are two counters supported by the hardware and each is 32 bits
  44 wide.
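.Pp
Because each hardware counter is 32 bits wide, a raw count wraps after
2^32 events.
The following is a minimal sketch, not part of libpmc, of computing the
difference between two raw 32-bit samples and folding it into a 64-bit total.
The names counter_delta32 and accumulate32 are hypothetical, and the sketch
assumes the counter wrapped at most once between the two samples.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a libpmc function): difference between two
 * raw samples of a 32-bit counter.  Unsigned subtraction is performed
 * modulo 2^32, so a single wrap between the samples is handled.
 */
static uint32_t
counter_delta32(uint32_t before, uint32_t after)
{
	return (after - before);
}

/*
 * Hypothetical helper: add the delta between two raw samples to a
 * 64-bit running total kept in software.
 */
static uint64_t
accumulate32(uint64_t total, uint32_t before, uint32_t after)
{
	return (total + counter_delta32(before, after));
}
```

Sampling often enough that fewer than 2^32 events occur between reads keeps
the single-wrap assumption valid.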
  45 .Pp
  46 MIPS PMCs are documented in
  47 .Rs
  48 .%B "MIPS32 24K Processor Core Family Software User's Manual"
  49 .%D December 2008
  50 .%Q "MIPS Technologies Inc."
  51 .Re
  52 .Ss Event Specifiers (Programmable PMCs)
  53 MIPS programmable PMCs support the following events:
  54 .Bl -tag -width indent
  55 .It Li CYCLE
  56 .Pq Event 0, Counter 0/1
  57 Total number of cycles.
  58 The performance counters are clocked by the
  59 top-level gated clock.
  60 If the core is built with that clock gater
  61 present, none of the counters will increment while the clock is
  62 stopped due to a WAIT instruction.
  63 .It Li INSTR_EXECUTED
  64 .Pq Event 1, Counter 0/1
  65 Total number of instructions completed.
  66 .It Li BRANCH_COMPLETED
  67 .Pq Event 2, Counter 0
  68 Total number of branch instructions completed.
  69 .It Li BRANCH_MISPRED
  70 .Pq Event 2, Counter 1
  71 Counts all branch instructions which completed, but were mispredicted.
  72 .It Li RETURN
  73 .Pq Event 3, Counter 0
  74 Counts all JR $31 instructions completed.
  75 .It Li RETURN_MISPRED
  76 .Pq Event 3, Counter 1
  77 Counts all JR $31 instructions which completed, used the RPS for a prediction, but were mispredicted.
  78 .It Li RETURN_NOT_31
  79 .Pq Event 4, Counter 0
  80 Counts all JR $xx (not $31) and JALR instructions (indirect jumps).
  81 .It Li RETURN_NOTPRED
  82 .Pq Event 4, Counter 1
  83 Counts JR $31 instructions that completed without a prediction; when RPS use is disabled, no JR $31 is predicted.
  84 .It Li ITLB_ACCESS
  85 .Pq Event 5, Counter 0
  86 Counts ITLB accesses that are due to fetches showing up in the
  87 instruction fetch stage of the pipeline and which do not use a fixed
  88 mapping or are not in unmapped space.
  89 If an address is fetched twice from the pipe (as in the case of a
  90 cache miss), that instruction will count as 2 ITLB accesses.
  91 Since each fetch gets us 2 instructions, there is one access marked per double
  92 word.
  93 .It Li ITLB_MISS
  94 .Pq Event 5, Counter 1
  95 Counts all misses in the ITLB except ones that are on the back of another
  96 miss.
  97 We cannot process back to back misses and thus those are
  98 ignored.
  99 They are also ignored if there is some form of address error.
 100 .It Li DTLB_ACCESS
 101 .Pq Event 6, Counter 0
 102 Counts DTLB accesses, including those in unmapped address spaces.
 103 .It Li DTLB_MISS
 104 .Pq Event 6, Counter 1
 105 Counts DTLB misses.
 106 Back to back misses that result in only one DTLB
 107 entry getting refilled are counted as a single miss.
 108 .It Li JTLB_IACCESS
 109 .Pq Event 7, Counter 0
 110 Instruction JTLB accesses are counted exactly the same as ITLB misses.
 111 .It Li JTLB_IMISS
 112 .Pq Event 7, Counter 1
 113 Counts instruction JTLB accesses that result in no match or a match on
 114 an invalid translation.
 115 .It Li JTLB_DACCESS
 116 .Pq Event 8, Counter 0
 117 Data JTLB accesses.
 118 .It Li JTLB_DMISS
 119 .Pq Event 8, Counter 1
 120 Counts data JTLB accesses that result in no match or a match on an invalid translation.
 121 .It Li IC_FETCH
 122 .Pq Event 9, Counter 0
 123 Counts every time the instruction cache is accessed.
 124 All replays,
 125 wasted fetches etc. are counted.
 126 For example, following a branch, even though the prediction is taken,
 127 the fall through access is counted.
 128 .It Li IC_MISS
 129 .Pq Event 9, Counter 1
 130 Counts all instruction cache misses that result in a bus request.
 131 .It Li DC_LOADSTORE
 132 .Pq Event 10, Counter 0
 133 Counts cached loads and stores.
 134 .It Li DC_WRITEBACK
 135 .Pq Event 10, Counter 1
 136 Counts cache lines written back to memory due to replacement or cacheops.
 137 .It Li DC_MISS
 138 .Pq Event 11, Counter 0/1
 139 Counts loads and stores that miss in the cache.
 140 .It Li LOAD_MISS
 141 .Pq Event 13, Counter 0
 142 Counts number of cacheable loads that miss in the cache.
 143 .It Li STORE_MISS
 144 .Pq Event 13, Counter 1
 145 Counts number of cacheable stores that miss in the cache.
 146 .It Li INTEGER_COMPLETED
 147 .Pq Event 14, Counter 0
 148 Non-floating point, non-Coprocessor 2 instructions.
 149 .It Li FP_COMPLETED
 150 .Pq Event 14, Counter 1
 151 Floating point instructions completed.
 152 .It Li LOAD_COMPLETED
 153 .Pq Event 15, Counter 0
 154 Integer and co-processor loads completed.
 155 .It Li STORE_COMPLETED
 156 .Pq Event 15, Counter 1
 157 Integer and co-processor stores completed.
 158 .It Li BARRIER_COMPLETED
 159 .Pq Event 16, Counter 0
 160 Direct jump (and link) instructions completed.
 161 .It Li MIPS16_COMPLETED
 162 .Pq Event 16, Counter 1
 163 MIPS16e instructions completed.
 164 .It Li NOP_COMPLETED
 165 .Pq Event 17, Counter 0
 166 NOPs completed.
 167 This includes all instructions that normally write to a general
 168 purpose register, but where the destination register was set to r0.
 169 .It Li INTEGER_MULDIV_COMPLETED
 170 .Pq Event 17, Counter 1
 171 Integer multiply and divide instructions completed (MULxx, DIVx, MADDx, MSUBx).
 172 .It Li RF_STALL
 173 .Pq Event 18, Counter 0
 174 Counts the total number of cycles where no instructions are issued
 175 from the IFU to the ALU (the RF stage does not advance), which includes
 176 both IFU stalls and ALU stalls.
 177 RF_STALL differs from the sum of the two, though, because cycles
 178 when both stalls are active are only counted once.
 179 .It Li INSTR_REFETCH
 180 .Pq Event 18, Counter 1
 181 Counts replay traps (other than uTLB).
 182 .It Li STORE_COND_COMPLETED
 183 .Pq Event 19, Counter 0
 184 Conditional stores completed.
 185 Counts all events, including failed stores.
 186 .It Li STORE_COND_FAILED
 187 .Pq Event 19, Counter 1
 188 Conditional store instruction that did not update memory.
 189 Note: While this event and the SC instruction count event can be configured to
 190 count in specific operating modes, the timing of the events is much
 191 different and the observed operating mode could change between them,
 192 causing some inaccuracy in the measured ratio.
 193 .It Li ICACHE_REQUESTS
 194 .Pq Event 20, Counter 0
 195 Counts only PREF instructions that are actually attempted.
 196 PREFs to uncached addresses or ones with translation errors are not counted.
 197 .It Li ICACHE_HIT
 198 .Pq Event 20, Counter 1
 199 Counts PREF instructions that hit in the cache.
 200 .It Li L2_WRITEBACK
 201 .Pq Event 21, Counter 0
 202 Counts cache lines written back to memory due to replacement or cacheops.
 203 .It Li L2_ACCESS
 204 .Pq Event 21, Counter 1
 205 Number of accesses to L2 Cache.
 206 .It Li L2_MISS
 207 .Pq Event 22, Counter 0
 208 Number of accesses that missed in the L2 cache.
 209 .It Li L2_ERR_CORRECTED
 210 .Pq Event 22, Counter 1
 211 Single bit errors in L2 Cache that were detected and corrected.
 212 .It Li EXCEPTIONS
 213 .Pq Event 23, Counter 0
 214 Any type of exception taken.
 215 .It Li RF_CYCLES_STALLED
 216 .Pq Event 24, Counter 0
 217 Counts cycles where the LSU is in fixup and cannot accept a new
 218 instruction from the ALU.
 219 Fixups are replays within the LSU that occur when an instruction needs
 220 to re-access the cache or the DTLB.
 221 .It Li IFU_CYCLES_STALLED
 222 .Pq Event 25, Counter 0
 223 Counts the number of cycles where the fetch unit is not providing a
 224 valid instruction to the ALU.
 225 .It Li ALU_CYCLES_STALLED
 226 .Pq Event 25, Counter 1
 227 Counts the number of cycles where the ALU pipeline cannot advance.
 228 .It Li UNCACHED_LOAD
 229 .Pq Event 33, Counter 0
 230 Counts uncached and uncached accelerated loads.
 231 .It Li UNCACHED_STORE
 232 .Pq Event 33, Counter 1
 233 Counts uncached and uncached accelerated stores.
 234 .It Li CP2_REG_TO_REG_COMPLETED
 235 .Pq Event 35, Counter 0
 236 Co-processor 2 register to register instructions completed.
 237 .It Li MFTC_COMPLETED
 238 .Pq Event 35, Counter 1
 239 Co-processor 2 move to and from instructions as well as loads and stores.
 240 .It Li IC_BLOCKED_CYCLES
 241 .Pq Event 37, Counter 0
 242 Cycles when IFU stalls because an instruction miss caused the IFU not
 243 to have any runnable instructions.
 244 Ignores the stalls due to ITLB misses as well as the 4 cycles
 245 following a redirect.
 246 .It Li DC_BLOCKED_CYCLES
 247 .Pq Event 37, Counter 1
 248 Counts all cycles where integer pipeline waits on Load return data due
 249 to a D-cache miss.
 250 The LSU can signal a "long stall" on a D-cache miss, in which case
 251 the waiting TC might be rescheduled so other TCs can execute
 252 instructions until the data returns.
 253 .It Li L2_IMISS_STALL_CYCLES
 254 .Pq Event 38, Counter 0
 255 Cycles where the main pipeline is stalled waiting for a SYNC to complete.
 256 .It Li L2_DMISS_STALL_CYCLES
 257 .Pq Event 38, Counter 1
 258 Cycles where the main pipeline is stalled because of an index conflict
 259 in the Fill Store Buffer.
 260 .It Li DMISS_CYCLES
 261 .Pq Event 39, Counter 0
 262 Data miss is outstanding, but not necessarily stalling the pipeline.
 263 The difference between this and D$ miss stall cycles can show the gain
 264 from non-blocking cache misses.
 265 .It Li L2_MISS_CYCLES
 266 .Pq Event 39, Counter 1
 267 L2 miss is outstanding, but not necessarily stalling the pipeline.
 268 .It Li UNCACHED_BLOCK_CYCLES
 269 .Pq Event 40, Counter 0
 270 Cycles where the processor is stalled on an uncached fetch, load, or store.
 271 .It Li MDU_STALL_CYCLES
 272 .Pq Event 41, Counter 0
 273 Counts all cycles where integer pipeline waits on MDU return data.
 274 .It Li FPU_STALL_CYCLES
 275 .Pq Event 41, Counter 1
 276 Counts all cycles where integer pipeline waits on FPU return data.
 277 .It Li CP2_STALL_CYCLES
 278 .Pq Event 42, Counter 0
 279 Counts all cycles where integer pipeline waits on CP2 return data.
 280 .It Li COREXTEND_STALL_CYCLES
 281 .Pq Event 42, Counter 1
 282 Counts all cycles where integer pipeline waits on CorExtend return data.
 283 .It Li ISPRAM_STALL_CYCLES
 284 .Pq Event 43, Counter 0
 285 Counts all pipeline bubbles that are a result of multicycle ISPRAM
 286 access.
 287 Pipeline bubbles are defined as all cycles that IFU doesn't present an
 288 instruction to ALU.
 289 The four cycles after a redirect are not counted.
 290 .It Li DSPRAM_STALL_CYCLES
 291 .Pq Event 43, Counter 1
 292 Counts stall cycles created by an instruction waiting for access to DSPRAM.
 293 .It Li CACHE_STALL_CYCLES
 294 .Pq Event 44, Counter 0
 295 Counts all cycles where the pipeline is stalled due to CACHE
 296 instructions.
 297 Includes cycles where CACHE instructions themselves are
 298 stalled in the ALU, and cycles where CACHE instructions cause
 299 subsequent instructions to be stalled.
 300 .It Li LOAD_TO_USE_STALLS
 301 .Pq Event 45, Counter 0
 302 Counts all cycles where integer pipeline waits on Load return data.
 303 .It Li BASE_MISPRED_STALLS
 304 .Pq Event 45, Counter 1
 305 Counts stall cycles due to skewed ALU where the bypass to the address
 306 generation takes an extra cycle.
 307 .It Li CPO_READ_STALLS
 308 .Pq Event 46, Counter 0
 309 Counts all cycles where integer pipeline waits on return data from
 310 MFC0, RDHWR instructions.
 311 .It Li BRANCH_MISPRED_CYCLES
 312 .Pq Event 46, Counter 1
 313 This counts the number of cycles from a mispredicted branch until the
 314 next non-delay slot instruction executes.
 315 .It Li IFETCH_BUFFER_FULL
 316 .Pq Event 48, Counter 0
 317 Counts the number of times an instruction cache miss was detected, but
 318 both fill buffers were already allocated.
 319 .It Li FETCH_BUFFER_ALLOCATED
 320 .Pq Event 48, Counter 1
 321 Number of cycles where at least one of the IFU fill buffers is
 322 allocated (miss pending).
 323 .It Li EJTAG_ITRIGGER
 324 .Pq Event 49, Counter 0
 325 Number of times an EJTAG Instruction Trigger Point condition matched.
 326 .It Li EJTAG_DTRIGGER
 327 .Pq Event 49, Counter 1
 328 Number of times an EJTAG Data Trigger Point condition matched.
 329 .It Li FSB_LT_QUARTER
 330 .Pq Event 50, Counter 0
 331 Fill store buffer less than one quarter full.
 332 .It Li FSB_QUARTER_TO_HALF
 333 .Pq Event 50, Counter 1
 334 Fill store buffer between one quarter and one half full.
 335 .It Li FSB_GT_HALF
 336 .Pq Event 51, Counter 0
 337 Fill store buffer more than half full.
 338 .It Li FSB_FULL_PIPELINE_STALLS
 339 .Pq Event 51, Counter 1
 340 Cycles where the pipeline is stalled because the Fill-Store Buffer in LSU is full.
 341 .It Li LDQ_LT_QUARTER
 342 .Pq Event 52, Counter 0
 343 Load data queue less than one quarter full.
 344 .It Li LDQ_QUARTER_TO_HALF
 345 .Pq Event 52, Counter 1
 346 Load data queue between one quarter and one half full.
 347 .It Li LDQ_GT_HALF
 348 .Pq Event 53, Counter 0
 349 Load data queue more than one half full.
 350 .It Li LDQ_FULL_PIPELINE_STALLS
 351 .Pq Event 53, Counter 1
 352 Cycles where the pipeline is stalled because the Load Data Queue in the LSU is full.
 353 .It Li WBB_LT_QUARTER
 354 .Pq Event 54, Counter 0
 355 Write back buffer less than one quarter full.
 356 .It Li WBB_QUARTER_TO_HALF
 357 .Pq Event 54, Counter 1
 358 Write back buffer between one quarter and one half full.
 359 .It Li WBB_GT_HALF
 360 .Pq Event 55, Counter 0
 361 Write back buffer more than one half full.
 362 .It Li WBB_FULL_PIPELINE_STALLS
 363 .Pq Event 55, Counter 1
 364 Cycles where the pipeline is stalled because the write back buffer is full.
 365 .It Li REQUEST_LATENCY
 366 .Pq Event 61, Counter 0
 367 Measures latency from miss detection until critical dword of response
 368 is returned.  Only counts for cacheable reads.
 369 .It Li REQUEST_COUNT
 370 .Pq Event 61, Counter 1
 371 Counts number of cacheable read requests used for previous latency counter.
 372 .El
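.Pp
Each event above is available on counter 0, counter 1, or both, and the
hardware has only two counters.
The following is a minimal sketch, under the assumption that two events can
be measured at the same time only when they can be placed on different
counters.
The table holds a handful of entries copied from the list above; the
structure, the macro names, and the function names are hypothetical and are
not part of libpmc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	CTR0	0x1u	/* event may be counted on counter 0 */
#define	CTR1	0x2u	/* event may be counted on counter 1 */

/* Hypothetical description of one event specifier. */
struct mips24k_event {
	const char	*name;
	int		 event;		/* hardware event number */
	unsigned	 counters;	/* mask of CTR0 and/or CTR1 */
};

/* A small subset of the event list, for illustration only. */
static const struct mips24k_event events[] = {
	{ "CYCLE",		0, CTR0 | CTR1 },
	{ "INSTR_EXECUTED",	1, CTR0 | CTR1 },
	{ "BRANCH_COMPLETED",	2, CTR0 },
	{ "BRANCH_MISPRED",	2, CTR1 },
	{ "RETURN",		3, CTR0 },
	{ "RETURN_MISPRED",	3, CTR1 },
};

/* Look an event up by name; returns NULL when it is not in the table. */
static const struct mips24k_event *
find_event(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(events) / sizeof(events[0]); i++)
		if (strcmp(events[i].name, name) == 0)
			return (&events[i]);
	return (NULL);
}

/*
 * True when the two named events can occupy different counters:
 * one on counter 0 and the other on counter 1, in either order.
 */
static bool
can_count_together(const char *a, const char *b)
{
	const struct mips24k_event *ea = find_event(a);
	const struct mips24k_event *eb = find_event(b);

	if (ea == NULL || eb == NULL)
		return (false);
	return (((ea->counters & CTR0) && (eb->counters & CTR1)) ||
	    ((ea->counters & CTR1) && (eb->counters & CTR0)));
}
```

With this table, BRANCH_COMPLETED and BRANCH_MISPRED can be paired, while
BRANCH_COMPLETED and RETURN cannot, since both are limited to counter 0.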
 373 .Ss Event Name Aliases
 374 The following table shows the mapping between the PMC-independent
 375 aliases supported by
 376 .Lb libpmc
 377 and the underlying hardware events used.
 378 .Bl -column "branch-mispredicts" "cpu_clk_unhalted.core_p"
 379 .It Em Alias Ta Em Event
 380 .It Li instructions Ta Li INSTR_EXECUTED
 381 .It Li branches Ta Li BRANCH_COMPLETED
 382 .It Li branch-mispredicts Ta Li BRANCH_MISPRED
 383 .El
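.Pp
The alias table above can be read as a simple name lookup.
The following is a minimal sketch of that mapping, using only the three
aliases listed; the function name resolve_alias is hypothetical and does not
describe how libpmc resolves aliases internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: PMC-independent alias and hardware event. */
struct alias {
	const char	*alias;
	const char	*event;
};

/* The three aliases listed in the table above. */
static const struct alias aliases[] = {
	{ "instructions",	"INSTR_EXECUTED" },
	{ "branches",		"BRANCH_COMPLETED" },
	{ "branch-mispredicts",	"BRANCH_MISPRED" },
};

/*
 * Return the hardware event name for an alias.  A name that is not a
 * known alias is returned unchanged, on the assumption that it is
 * already an event name.
 */
static const char *
resolve_alias(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
		if (strcmp(aliases[i].alias, name) == 0)
			return (aliases[i].event);
	return (name);
}
```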
 384 .Sh SEE ALSO
 385 .Xr pmc 3 ,
 386 .Xr pmc.atom 3 ,
 387 .Xr pmc.core 3 ,
 388 .Xr pmc.iaf 3 ,
 389 .Xr pmc.k7 3 ,
 390 .Xr pmc.k8 3 ,
 391 .Xr pmc.octeon 3 ,
 392 .Xr pmc.p4 3 ,
 393 .Xr pmc.p5 3 ,
 394 .Xr pmc.p6 3 ,
 395 .Xr pmc.soft 3 ,
 396 .Xr pmc.tsc 3 ,
 397 .Xr pmc_cpuinfo 3 ,
 398 .Xr pmclog 3 ,
 399 .Xr hwpmc 4
 400 .Sh HISTORY
 401 The
 402 .Nm pmc
 403 library first appeared in
 404 .Fx 6.0 .
 405 .Sh AUTHORS
 406 The
 407 .Lb libpmc
 408 library was written by
 409 .An "Joseph Koshy"
 410 .Aq jkoshy@FreeBSD.org .
 411 MIPS support was added by
 412 .An "George Neville-Neil"
 413 .Aq gnn@FreeBSD.org .