Comparison of CPU microarchitectures

The following is a comparison of CPU microarchitectures.

Microarchitecture Year Pipeline stages Misc
Elbrus-8S 2014 VLIW, Elbrus (proprietary, closed) version 5, 64-bit
AMD K5 1996 5 Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming[a]
AMD K6 1997 6 Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming[b]
AMD K6-III 1999 Branch prediction, speculative execution, out-of-order execution[1]
AMD K7 1999 Out-of-order execution, branch prediction, Harvard architecture
AMD K8 2003 64-bit, integrated memory controller, 16-byte instruction prefetching
AMD K10 2007 Superscalar, out-of-order execution, 32-way set associative L3 victim cache, 32-byte instruction prefetching
ARM7TDMI (-S) 2001 3
ARM7EJ-S 2001 5
ARM810 5 Static branch prediction, double-bandwidth memory
ARM9TDMI 1998 5
ARM1020E 6
XScale PXA210/PXA250 2002 7
ARM1136J(F)-S 8
ARM1156T2(F)-S 9
ARM Cortex-A5 8 Multi-core, single issue, in-order
ARM Cortex-A7 MPCore 8 Partial dual-issue, in-order, 2-way set associative level 1 instruction cache
ARM Cortex-A8 2005 13 Dual-issue, in-order, speculative execution, superscalar, 2-way pipeline decode
ARM Cortex-A9 MPCore 2007 8–11 Out-of-order, speculative issue, superscalar
ARM Cortex-A15 MPCore 2010 15 Multi-core (up to 16), out-of-order, speculative issue, 3-way superscalar
ARM Cortex-A53 2012 Partial dual-issue, in-order
ARM Cortex-A55 2017 8 In-order, speculative execution
ARM Cortex-A57 2012 Deeply out-of-order, wide multi-issue, 3-way superscalar
ARM Cortex-A72 2015
ARM Cortex-A73 2016 Out-of-order superscalar
ARM Cortex-A75 2017 11–13 Out-of-order superscalar, speculative execution, register renaming, 3-way
ARM Cortex-A76 2018 13 Out-of-order superscalar, 4-way pipeline decode
ARM Cortex-A77 2019 13 Out-of-order superscalar, speculative execution, register renaming, 6-way pipeline decode, 10-issue, branch prediction, L3 cache
ARM Cortex-A78 2020 13 Out-of-order superscalar, register renaming, 4-way pipeline decode, 6 instructions per cycle, branch prediction, L3 cache
ARM Cortex-A710 2021 10
ARM Cortex-X1 2020 13 5-wide decode out-of-order superscalar, L3 cache
ARM Cortex-X2 2021 10
ARM Cortex-X3 2022 9
ARM Cortex-X4 2023 10
AVR32 AP7 7
AVR32 UC3 3 Harvard architecture
Bobcat 2011 Out-of-order execution
Bulldozer 2011 20 Shared multithreaded L2 cache, multithreading, multi-core, pipeline around 20 stages long, integrated memory controller, out-of-order, superscalar, up to 16 cores per chip, up to 16 MB L3 cache, virtualization, Turbo Core, FlexFPU, which uses simultaneous multithreading[2]
Piledriver 2012 Shared multithreaded L2 cache, multithreading, multi-core, pipeline around 20 stages long, integrated memory controller, out-of-order, superscalar, up to 16 MB L2 cache, up to 16 MB L3 cache, virtualization, FlexFPU, which uses simultaneous multithreading,[2] up to 16 cores per chip, up to 5 GHz clock speed, up to 220 W TDP, Turbo Core
Steamroller 2014 Multi-core, branch prediction
Excavator 2015 20 Multi-core
Zen 2017 19 Multi-core, superscalar, 2-way simultaneous multithreading, 4-way decode, out-of-order execution, L3 cache
Zen+ 2018 19 Multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache
Zen 2 2019 19 Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache
Zen 3 2020 19 Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, SMT, L3 cache
Zen 4 2022 Multi-chip module, multi-core, superscalar, L3 cache
Crusoe 2000 In-order execution, 128-bit VLIW, integrated memory controller
Efficeon 2004 In-order execution, 256-bit VLIW, fully integrated memory controller
Cyrix Cx5x86 1995 6[3] Branch prediction
Cyrix 6x86 1996 Superscalar, superpipelined, register renaming, speculative execution, out-of-order execution
DLX 5
eSi-3200 5 In-order, speculative issue
eSi-3250 5 In-order, speculative issue
EV4 (Alpha 21064) Superscalar
EV7 (Alpha 21364) Superscalar design with out-of-order execution, branch prediction, 4-way simultaneous multithreading, integrated memory controller
EV8 (Alpha 21464) Superscalar design with out-of-order execution
65k Ultra-low power consumption, register renaming, out-of-order execution, branch prediction, multi-core, module, capable of reaching higher clock speeds
P5 (Pentium) 1993 5 Superscalar
P6 (Pentium Pro) 14 Speculative execution, register renaming, superscalar design with out-of-order execution
P6 (Pentium II) 14[4] Branch prediction
P6 (Pentium III) 1995 14[4]
Intel Itanium "Merced" 2001 Single core, L3 cache
Intel Itanium 2 "McKinley" 2002 11[5] Speculative execution, branch prediction, register renaming, 30 execution units, multithreading, multi-core, coarse-grained multithreading, 2-way simultaneous multithreading, Dual-domain multithreading, Turbo Boost, Virtualization, VLIW, RAS with Advanced Machine Check Architecture, Instruction Replay technology, Cache Safe technology, Enhanced SpeedStep technology
Intel NetBurst (Willamette) 2000 20 2-way simultaneous multithreading (Hyper-threading), Rapid Execution Engine, Execution Trace Cache, quad-pumped Front-Side Bus, Hyper-pipelined Technology, superscalar, out-of-order
NetBurst (Northwood) 2002 20 2-way simultaneous multithreading
NetBurst (Prescott) 2004 31 2-way simultaneous multithreading
NetBurst (Cedar Mill) 2006 31 2-way simultaneous multithreading
Intel Core 2006 12 Multi-core, out-of-order, 4-way superscalar
Intel Atom 16 2-way simultaneous multithreading, in-order, no instruction reordering, speculative execution, or register renaming
Intel Atom Oak Trail 2-way simultaneous multithreading, in-order, burst mode, 512 KB L2 cache
Intel Atom Bonnell 2008 SMT
Intel Atom Silvermont 2013 Out-of-order execution
Intel Atom Goldmont 2016 Multi-core, out-of-order execution, 3-wide superscalar pipeline, L2 cache
Intel Atom Goldmont Plus 2017 Multi-core
Intel Atom Tremont 2019 Multi-core, superscalar, out-of-order execution, speculative execution, register renaming
Intel Atom Gracemont 2021 Multi-core, superscalar, out-of-order execution, speculative execution, register renaming
Intel Atom Crestmont 2023 Multi-core
Intel Atom Skymont 2024 Multi-core
Nehalem 2008 14 2-way simultaneous multithreading, out-of-order, 6-way superscalar, integrated memory controller, L1/L2/L3 cache, Turbo Boost
Sandy Bridge 2011 14 2-way simultaneous multithreading, multi-core, on-die graphics and PCIe controller, system agent with integrated memory and display controller, ring interconnect, L1/L2/L3 cache, micro-op cache, 2 threads per core, Turbo Boost
Intel Haswell 2013 14–19 SoC design, multi-core, multithreading, 2-way simultaneous multithreading, hardware-based transactional memory (in selected models), L4 cache (in GT3 models), Turbo Boost, out-of-order execution, superscalar, up to 8 MB L3 cache (mainstream), up to 20 MB L3 cache (Extreme)
Broadwell 2014 14–19 Multi-core, multithreading
Skylake 2015 14–19 Multi-core; L4 cache on certain Skylake-R, Skylake-U and Skylake-Y models; on-package PCH on U, Y, m3, m5 and m7 models; 5-wide superscalar/5-issue
Kaby Lake 2016 14–19 Multi-core, L4 cache on certain low-power and ultra-low-power models (Kaby Lake-U and Kaby Lake-Y)
Intel Sunny Cove 2019 14–20 Multi-core, 2-way multithreading, large out-of-order execution engine, 5-wide superscalar/5-issue
Intel Cypress Cove 2021 14 Multi-core, 5-wide superscalar/6-issue, large out-of-order execution engine, big-core design
Intel Willow Cove 2020 Multicore, SMT
Intel Golden Cove 2021 Multicore, SMT
Intel Redwood Cove 2023 Multicore, SMT
Intel Lion Cove 2024 Multicore, without SMT
Intel Xeon Phi 7120x 2013 7-stage integer, 6-stage vector Multi-core, multithreading, 4 hardware-based simultaneous threads per core (which, unlike regular Hyper-Threading, cannot be disabled), time-multiplexed multithreading, 61 cores per chip, 244 threads per chip, 30.5 MB L2 cache, 300 W TDP, Turbo Boost, in-order dual-issue pipelines, coprocessor, floating-point accelerator, 512-bit-wide vector FPU
LatticeMico32 2006 6 Harvard architecture
Nvidia Denver 2014 Multi-core, superscalar, 2-way decode, L2 cache
Nvidia Carmel 2018 Multi-core, 10-way superscalar, L3 cache
POWER1 1990 Superscalar, out-of-order execution
POWER3 1998 Superscalar, out-of-order execution
POWER4 2001 Superscalar, speculative execution, out-of-order execution
POWER5 2004 2-way simultaneous multithreading, out-of-order execution, integrated memory controller
IBM POWER6 2007 2-way simultaneous multithreading, in-order execution, up to 5 GHz
IBM POWER7+ Multi-core, multithreading, out-of-order, superscalar, 4 intelligent simultaneous threads per core, 12 execution units per core, 8 cores per chip, 80 MB L3 cache, true hardware entropy generator, hardware-assisted cryptographic acceleration, fixed-point unit, decimal fixed-point unit, Turbo Core, decimal floating-point unit
IBM POWER8 2013 15–23 Superscalar, L4 cache
IBM POWER9 2017 12–16 Superscalar, out-of-order execution, L4 cache
IBM Power10 2021 Superscalar
IBM Cell 2006 Multi-core, multithreading, 2-way simultaneous multithreading (PPE), Power Processor Element, Synergistic Processing Elements, Element Interconnect Bus, in-order execution
IBM Cyclops64 Multi-core, multithreading, 2 threads per core, in-order
IBM zEnterprise zEC12 2012 15/16/17 Multi-core, 6 cores per chip, up to 5.5 GHz, superscalar, out-of-order, 48 MB L3 cache, 384 MB shared L4 cache
IBM A2 15 Multi-core, 4-way simultaneous multithreading
PowerPC 401 1996 3
PowerPC 405 1998 5
PowerPC 440 1999 7
PowerPC 470 2009 9 Symmetric multiprocessing (SMP)
PowerPC e300 4 Superscalar, branch prediction
PowerPC e500 Dual 7 stage Multi-core
PowerPC e600 3-issue 7 stage Superscalar out-of-order execution, branch prediction
PowerPC e5500 2010 4-issue 7 stage Out-of-order, multi-core
PowerPC e6500 2012 Multi-core
PowerPC 603 4 5 execution units, branch prediction, no SMP
PowerPC 603q 1996 5 In-order
PowerPC 604 1994 6 Superscalar, out-of-order execution, 6 execution units, SMP support
PowerPC 620 1997 5 Out-of-order execution, SMP support
PWRficient PA6T 2007 Superscalar, out-of-order execution, 6 execution units
R4000 1991 8 Scalar
StrongARM SA-110 1996 5 Scalar, in-order
SuperH SH2 5
SuperH SH2A 2006 5 Superscalar, Harvard architecture
SPARC Superscalar
hyperSPARC 1993 Superscalar
SuperSPARC 1992 Superscalar, in-order
SPARC64 VI/VII/VII+ 2007 Superscalar, out-of-order[6]
UltraSPARC 1995 9
UltraSPARC T1 2005 6 Open source, multithreading, multi-core, 4 threads per core, scalar, in-order, integrated memory controller, 1 FPU
UltraSPARC T2 2007 8 Open source, multithreading, multi-core, 8 threads per core
SPARC T3 2010 8 Multithreading, multi-core, 8 threads per core, SMP, 16 cores per chip, 2 MB L3 cache, in-order, hardware random number generator
Oracle SPARC T4 2011 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, SMP, 8 cores per chip, out-of-order, 4 MB L3 cache, hardware random number generator
Oracle SPARC T5 2013 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 16 cores per chip, out-of-order, 16-way associative shared 8 MB L3 cache, hardware-assisted cryptographic acceleration, stream-processing unit, RAS features, 16 cryptography units per chip, hardware random number generator
Oracle SPARC M5 16 Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 6 cores per chip, out-of-order, 48 MB L3 cache, RAS features, stream-processing unit, hardware-assisted cryptographic acceleration, 6 cryptography units per chip, hardware random number generator
Fujitsu SPARC64 X Multithreading, multi-core, 2-way simultaneous multithreading, 16 cores per chip, out-of-order, 24 MB L2 cache, RAS features
Imagination Technologies MIPS Warrior
VIA C7 2005 In-order execution
VIA Nano (Isaiah) 2008 Superscalar out-of-order execution, branch prediction, 7 execution units
WinChip 1997 4 In-order execution

Notes

  a. According to AMD's K5 data sheet. The design incorporates many ideas and functional parts from AMD's Am29000 32-bit RISC microprocessor design.
  b. According to AMD's K6 data sheet. The design is based on NexGen's Nx686 and is therefore not a direct successor to the K5.

References

  1. "Products We Design". amd.com. Retrieved 19 January 2014.
  2. "wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer". cdn3.wccftech.com. Archived from the original on 17 October 2013. Retrieved 19 January 2014.
  3. "Cyrix 5x86 ("M1sc")". pcguide.com. Retrieved 19 January 2014.
  4. "Computer Science 246: Computer Architecture" (PDF). Harvard University. Archived from the original (PDF) on 24 December 2013. Retrieved 23 December 2013. P6 pipeline.
  5. Intel Itanium 2 Processor Hardware Developer's Manual (2002). p. 14. http://www.intel.com/design/itanium2/manuals/25110901.pdf. Retrieved 28 November 2011.
  6. "Multi Core Processor SPARC64 Series : Fujitsu Global". fujitsu.com. Retrieved 19 January 2014.