[Beowulf] itanium vs. x86-64
kyron at neuralbs.com
Tue Feb 10 07:09:59 PST 2009
>> Next caliper allows to get a lot of diagnostics from the cpu (also because
>> ia64 supports all that while x86-64 does not AFAICT) like number of bubbles
>> in the pipeline, L2-cache misses, clock-cycles per line of C-code etc.
>
> these are just the performance-counting MSR's, which are available
> on Opterons as well as Xeons too.
These counters go back at least as far as the PIII processors (and possibly
further). Check out PAPI (http://icl.cs.utk.edu/papi/) for more details. As
an example, here is the papi_avail output from an old cluster node:
eric at thinkbig1 ~ $ papi_avail -a
Available events and hardware information.
-------------------------------------------------------------------------
Vendor string and code : AuthenticAMD (2)
Model string and code : AMD K7 (9)
CPU Revision : 0.000000
CPU Megahertz : 2083.157959
CPU's in this Node : 1
Nodes in this System : 1
Total CPU's : 1
Number Hardware Counters : 4
Max Multiplex Counters : 32
-------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.
Name Derived Description (Mgr. Note)
PAPI_L1_DCM Yes Level 1 data cache misses
PAPI_L1_ICM No Level 1 instruction cache misses
PAPI_L2_DCM No Level 2 data cache misses
PAPI_L2_ICM No Level 2 instruction cache misses
PAPI_L1_TCM Yes Level 1 cache misses
PAPI_L2_TCM Yes Level 2 cache misses
PAPI_TLB_DM No Data translation lookaside buffer misses
PAPI_TLB_IM No Instruction translation lookaside buffer misses
PAPI_TLB_TL Yes Total translation lookaside buffer misses
PAPI_L1_LDM No Level 1 load misses
PAPI_L1_STM No Level 1 store misses
PAPI_L2_LDM No Level 2 load misses
PAPI_L2_STM No Level 2 store misses
PAPI_HW_INT No Hardware interrupts
PAPI_BR_UCN No Unconditional branch instructions
PAPI_BR_CN No Conditional branch instructions
PAPI_BR_TKN No Conditional branch instructions taken
PAPI_BR_NTK Yes Conditional branch instructions not taken
PAPI_BR_MSP No Conditional branch instructions mispredicted
PAPI_BR_PRC Yes Conditional branch instructions correctly predicted
PAPI_TOT_INS No Instructions completed
PAPI_BR_INS No Branch instructions
PAPI_RES_STL No Cycles stalled on any resource
PAPI_TOT_CYC No Total cycles
PAPI_L1_DCH Yes Level 1 data cache hits
PAPI_L2_DCH No Level 2 data cache hits
PAPI_L1_DCA No Level 1 data cache accesses
PAPI_L2_DCA Yes Level 2 data cache accesses
PAPI_L2_DCR No Level 2 data cache reads
PAPI_L2_DCW No Level 2 data cache writes
PAPI_L1_ICA No Level 1 instruction cache accesses
PAPI_L2_ICA No Level 2 instruction cache accesses
PAPI_L1_ICR No Level 1 instruction cache reads
PAPI_L1_TCA Yes Level 1 total cache accesses
-------------------------------------------------------------------------
avail.c PASSED
And here is the same output from a newer cluster node. Note the floating
point metrics that are now available:
eric at h2 ~ $ papi_avail -a
Available events and hardware information.
--------------------------------------------------------------------------------
Vendor string and code : GenuineIntel (1)
Model string and code : Intel Core 2 (18)
CPU Revision : 11.000000
CPU Megahertz : 2394.000000
CPU Clock Megahertz : 2394
CPU's in this Node : 4
Nodes in this System : 1
Total CPU's : 4
Number Hardware Counters : 5
Max Multiplex Counters : 32
--------------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.
Name Code Deriv Description (Note)
PAPI_L1_DCM 0x80000000 No Level 1 data cache misses
PAPI_L1_ICM 0x80000001 No Level 1 instruction cache misses
PAPI_L2_DCM 0x80000002 Yes Level 2 data cache misses
PAPI_L2_ICM 0x80000003 No Level 2 instruction cache misses
PAPI_L1_TCM 0x80000006 No Level 1 cache misses
PAPI_L2_TCM 0x80000007 No Level 2 cache misses
PAPI_CA_SHR 0x8000000a No Requests for exclusive access to shared cache line
PAPI_CA_CLN 0x8000000b No Requests for exclusive access to clean cache line
PAPI_CA_ITV 0x8000000d No Requests for cache line intervention
PAPI_TLB_DM 0x80000014 No Data translation lookaside buffer misses
PAPI_TLB_IM 0x80000015 No Instruction translation lookaside buffer misses
PAPI_L1_LDM 0x80000017 No Level 1 load misses
PAPI_L1_STM 0x80000018 No Level 1 store misses
PAPI_L2_LDM 0x80000019 Yes Level 2 load misses
PAPI_L2_STM 0x8000001a No Level 2 store misses
PAPI_HW_INT 0x80000029 No Hardware interrupts
PAPI_BR_CN 0x8000002b No Conditional branch instructions
PAPI_BR_TKN 0x8000002c No Conditional branch instructions taken
PAPI_BR_NTK 0x8000002d No Conditional branch instructions not taken
PAPI_BR_MSP 0x8000002e No Conditional branch instructions mispredicted
PAPI_BR_PRC 0x8000002f Yes Conditional branch instructions correctly predicted
PAPI_TOT_IIS 0x80000031 No Instructions issued
PAPI_TOT_INS 0x80000032 No Instructions completed
PAPI_FP_INS 0x80000034 No Floating point instructions
PAPI_BR_INS 0x80000037 No Branch instructions
PAPI_VEC_INS 0x80000038 No Vector/SIMD instructions
PAPI_RES_STL 0x80000039 No Cycles stalled on any resource
PAPI_TOT_CYC 0x8000003b No Total cycles
PAPI_L1_DCH 0x8000003e Yes Level 1 data cache hits
PAPI_L1_DCA 0x80000040 No Level 1 data cache accesses
PAPI_L2_DCA 0x80000041 Yes Level 2 data cache accesses
PAPI_L2_DCR 0x80000044 No Level 2 data cache reads
PAPI_L2_DCW 0x80000047 No Level 2 data cache writes
PAPI_L1_ICH 0x80000049 Yes Level 1 instruction cache hits
PAPI_L2_ICH 0x8000004a Yes Level 2 instruction cache hits
PAPI_L1_ICA 0x8000004c No Level 1 instruction cache accesses
PAPI_L2_ICA 0x8000004d No Level 2 instruction cache accesses
PAPI_L2_TCH 0x80000056 Yes Level 2 total cache hits
PAPI_L1_TCA 0x80000058 Yes Level 1 total cache accesses
PAPI_L2_TCA 0x80000059 No Level 2 total cache accesses
PAPI_L2_TCR 0x8000005c Yes Level 2 total cache reads
PAPI_L2_TCW 0x8000005f No Level 2 total cache writes
PAPI_FML_INS 0x80000061 No Floating point multiply instructions
PAPI_FDV_INS 0x80000063 No Floating point divide instructions
PAPI_FP_OPS 0x80000066 No Floating point operations
-------------------------------------------------------------------------
Of 45 available events, 10 are derived.
avail.c PASSED
The limiting factor here is the number of available hardware counters (i.e.,
5 for the Q6600), since that caps how many events can be counted at once
without multiplexing. Check out Blue Gene's table ;) :
http://www.nic.uoregon.edu/mediawiki-tau/index.php?title=Guide:BlueGene_PAPI_Counter_Analysis&printable=yes#PAPI_Events_Available_on_Blue_Gene
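To put a number on that limit, here is a rough Python sketch of how many
separate runs it would take to cover every event when only a handful can be
counted at once. It assumes each event occupies exactly one counter and that
any events can share a run, which real hardware does not guarantee (derived
events may need more than one counter, and some events are tied to specific
counters). The function names are hypothetical.

```python
import math


def passes_needed(num_events, num_counters):
    """Lower bound on runs needed if each event takes exactly one counter."""
    if num_counters <= 0:
        raise ValueError("num_counters must be positive")
    return math.ceil(num_events / num_counters)


def batch_events(events, num_counters):
    """Split events into consecutive groups of at most num_counters."""
    if num_counters <= 0:
        raise ValueError("num_counters must be positive")
    return [events[i:i + num_counters]
            for i in range(0, len(events), num_counters)]


# 45 events and 5 counters, as reported for the Core 2 node above.
print(passes_needed(45, 5))
```

So even under these optimistic assumptions, the 45 events on the Q6600 need at
least 9 runs of the same code, unless you let PAPI multiplex the counters and
accept the loss of precision that comes with time-slicing them.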
Eric