Multi-core processor

Diagram of a generic dual-core processor with CPU-local level-1 caches and a shared, on-die level-2 cache
An Intel Core 2 Duo E6750 dual-core processor
An AMD Athlon X2 6400+ dual-core processor
An embedded system on a plug-in card with processor, memory, power supply, and external interfaces

Computer processor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions.
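A concrete way to see the cores is to ask the operating system how many it exposes; a minimal sketch in Python (the count reported depends on the machine, and includes hardware threads on processors with simultaneous multithreading):

```python
import os

# Logical CPUs visible to the OS: physical cores, or hardware
# threads if the processor supports simultaneous multithreading.
print(os.cpu_count())
```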

CPU cache

Hardware cache used by the central processing unit of a computer to reduce the average cost (time or energy) to access data from the main memory.

Motherboard of a NeXTcube computer (1990). At the lower edge of the image, left of the middle, is the Motorola 68040 CPU, operated at 25 MHz, with two separate on-chip level-1 caches of 4 KiB each, one for instructions and one for data. The board has no external L2 cache.
An illustration of different ways in which memory locations can be cached by particular cache locations
Memory hierarchy of an AMD Bulldozer server
Cache hierarchy of the K8 core in the AMD Athlon 64 CPU.
Read path for a 2-way associative cache

Every core of a multi-core processor has a dedicated L1 cache, which is usually not shared between the cores.

Superscalar processor

CPU that implements a form of parallelism called instruction-level parallelism within a single processor.

Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed. (IF = instruction fetch, ID = instruction decode, EX = execute, MEM = memory access, WB = register write-back, i = instruction number, t = clock cycle [i.e. time])
Processor board of a CRAY T3e supercomputer with four superscalar Alpha 21164 processors

An execution unit is not a separate processor (or a core, if the processor is a multi-core processor), but an execution resource within a single CPU, such as an arithmetic logic unit.

Software

Set of computer programs and associated documentation and data.

A diagram showing how the user interacts with application software on a typical desktop computer. The application software layer interfaces with the operating system, which in turn communicates with the hardware. The arrows indicate information flow.
Blender, a free software program

Most personal computers, smartphones, and servers now have processors with multiple execution units or multiple processors performing computation together, and computing has become a much more concurrent activity than in the past.

Symmetric multiprocessing

Symmetric multiprocessing or shared-memory multiprocessing (SMP) involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally, reserving none for special purposes.

Diagram of a symmetric multiprocessing system
Diagram of a typical SMP system. Three processors are connected to the same memory module through a system bus or crossbar switch

In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors.
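One observable consequence of that equal treatment is that a newly started process is eligible to run on any core; a minimal sketch, assuming Python on Linux (os.sched_getaffinity is not available on every platform):

```python
import os

# CPU indices this process may be scheduled on; under SMP the
# default affinity set contains every core in the system.
eligible = os.sched_getaffinity(0)  # 0 means the calling process
print(f"eligible to run on {len(eligible)} CPUs: {sorted(eligible)}")
```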

Heterogeneous computing

Heterogeneous computing refers to systems that use more than one kind of processor or cores.

Multiprocessing

Use of two or more central processing units within a single computer system.

EDVAC, one of the first stored-program computers

There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined (multiple cores on one die, multiple dies in one package, multiple packages in one system unit, etc.).

Parallel computing

Type of computation in which many calculations or processes are carried out simultaneously.

IBM's Blue Gene/P massively parallel supercomputer
A graphical representation of Amdahl's law. The speedup of a program from parallelization is limited by how much of the program can be parallelized; for example, if 90% of the program can be parallelized, the theoretical maximum speedup from parallel computing is 10 times, no matter how many processors are used (a compact statement of the law follows this list of captions).
Assume that a task has two independent parts, A and B, and that part B takes roughly 25% of the total computation time. With great effort, one may make part B five times faster, but this shortens the whole computation only slightly. In contrast, less work may suffice to make part A twice as fast, which speeds up the computation far more, even though part B's speedup is greater by ratio (5 times versus 2 times).
Taiwania 3, a Taiwanese parallel supercomputer that was used in COVID-19 research.
A canonical processor without a pipeline. It takes five clock cycles to complete one instruction, so the processor's performance is subscalar.
A canonical five-stage pipelined processor. In the best case, it completes one instruction per clock cycle, so the processor's performance is scalar.
A canonical five-stage pipelined processor with two execution units. In the best case, it completes two instructions per clock cycle, so the processor's performance is superscalar.
A logical view of a non-uniform memory access (NUMA) architecture. Processors in one directory can access that directory's memory with less latency than they can access memory in the other directory.
A Beowulf cluster
A cabinet from IBM's Blue Gene/L massively parallel supercomputer
Nvidia's Tesla GPGPU card
The Cray-1 is a vector processor
ILLIAC IV, "the most infamous of supercomputers"
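The bound illustrated in the Amdahl's law caption above can be stated compactly. A minimal LaTeX sketch, writing $p$ for the parallelizable fraction of the program and $n$ for the number of processors:

```latex
\[
  S(n) \;=\; \frac{1}{(1 - p) + p/n},
  \qquad
  \lim_{n \to \infty} S(n) \;=\; \frac{1}{1 - p}.
\]
% For p = 0.9 the ceiling is 1/0.1 = 10, matching the caption's example.
% Likewise for the two-part task: speeding up B (25% of the time) by 5x
% gives 1 / (0.75 + 0.25/5) = 1.25x overall, while speeding up A (75%)
% by 2x gives 1 / (0.75/2 + 0.25) = 1.6x.
```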

As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
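As a minimal sketch of the multi-core form of this paradigm, the following Python fragment divides a sum across a pool of worker processes, one per core by default (the chunking scheme here is an illustrative choice):

```python
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, step = 10_000_000, 2_500_000
    chunks = [(i, min(i + step, n)) for i in range(0, n, step)]
    with Pool() as pool:               # one worker per core by default
        total = sum(pool.map(partial_sum, chunks))
    print(total == sum(range(n)))      # True: same result, computed in parallel
```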

Multithreading (computer architecture)

A process with two threads of execution, running on a single processor.

In computer architecture, multithreading is the ability of a central processing unit (CPU) (or a single core in a multi-core processor) to provide multiple threads of execution concurrently, supported by the operating system.
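A minimal sketch of two threads of execution within one process, using Python's threading module (the interleaving of the output is chosen by the scheduler and varies from run to run):

```python
import threading

def worker(name):
    for step in range(3):
        print(f"{name}: step {step}")  # lines from the two threads interleave

threads = [threading.Thread(target=worker, args=(f"thread-{i}",)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```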

Central processing unit

Electronic circuitry that executes instructions comprising a computer program.

EDVAC, one of the first stored-program computers
IBM PowerPC 604e processor
Fujitsu board with SPARC64 VIIIfx processors
CPU, core memory and external bus interface of a DEC PDP-8/I, made of medium-scale integrated circuits
Inside of laptop, with CPU removed from socket
Block diagram of a basic uniprocessor-CPU computer. Black lines indicate data flow, whereas red lines indicate control flow; arrows indicate flow directions.
Symbolic representation of an ALU and its input and output signals
A six-bit word containing the binary encoded representation of decimal value 40. Most modern CPUs employ word sizes that are a power of two, for example 8, 16, 32 or 64 bits.
Model of a subscalar CPU, in which it takes fifteen clock cycles to complete three instructions
Basic five-stage pipeline. In the best case scenario, this pipeline can sustain a completion rate of one instruction per clock cycle.
A simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per clock cycle can be completed.

Microprocessor chips with multiple CPUs are multi-core processors.

Concurrent computing

Form of computing in which several computations are executed concurrently (during overlapping time periods) instead of sequentially (with one completing before the next starts).

Computer simulation, one of the main cross-computing methodologies.

In parallel computing, execution occurs at the same physical instant, for example on separate processors of a multi-processor machine, with the goal of speeding up computations. Parallel computing is impossible on a single one-core processor, as only one computation can occur at any instant (during any single clock cycle).
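The standard CPython interpreter makes the distinction concrete: its global interpreter lock lets threads overlap in time (concurrency) while only one executes Python bytecode at any instant, so CPU-bound threads gain no parallel speedup. A minimal sketch (timings are machine-dependent):

```python
import threading
import time

def busy():
    n = 0
    for _ in range(10_000_000):
        n += 1

start = time.perf_counter()
threads = [threading.Thread(target=busy) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Takes roughly as long as two sequential calls to busy(): the threads
# are concurrent, but not parallel, for CPU-bound work under the GIL.
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```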