Systems Architecture
Machine Instructions
A computer’s fundamental building blocks give it the capability to perform simple operations like:
- Add two numbers
- Check if a number is zero
- Copy data from one memory location to another
Layers of abstraction are built upon these basic machine instructions, leading to more human-friendly ways of describing logic and ultimately building out complex systems, user interfaces, networking, etc…: everything that makes up a modern computer.
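As a rough illustration, a single high-level statement compiles down to a handful of these primitives. The assembly in the comment below is illustrative pseudo-assembly only, not the exact output of any particular compiler:

```c
#include <stdio.h>

int main(void) {
    int a = 2, b = 3;

    /* This one statement compiles to a few machine instructions,
       roughly (illustrative x86-style pseudo-assembly):
           mov eax, a     ; copy a from memory into a register
           add eax, b     ; add b to it
           mov sum, eax   ; copy the result back to memory */
    int sum = a + b;

    printf("%d\n", sum);
    return 0;
}
```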
Six-level Architecture
While it’s not always a simple delineation, it’s handy to think of most computer architectures as being roughly split into the following levels:
- Digital logic/hardware: Gates, registers, and combinational circuits built from boolean operations (AND, OR, etc…)
- Microarchitecture: Concrete registers and the ALU circuitry that carry out basic math operations
- Instruction set: A standardized set of operations supported by varied implementations of the microarchitecture level and the hardware
- Operating system: Orchestrates processes; manages memory, storage, and I/O devices; handles permissions
- Assembly language: Human-readable mnemonics that an assembler converts into architecture-specific instruction-set byte strings
- Problem space/user code: Application logic written in high-level languages
“Any operation performed by software can also be built directly into the hardware, preferably after it is sufficiently well understood.” - Structured Computer Organization (pg. 8)
Operating systems date back to the 1960s, when they were designed to replace much of the tedium performed by human operators, like loading interpreters and programs from cards into memory. Beyond simply automating tasks, the operating system became an abstraction layer on top of the ISA on which application logic could be built; this layer includes what we refer to as system calls. The operating system also introduced capabilities for scheduling workloads and remote access.
The microprogramming layer has historically been in flux, with instruction counts growing and shrinking depending on whether it was more performant or cost-effective to implement those instructions in hardware. For software engineers, though, how this microprogramming layer behaves is not terribly important; we tend to care about the abstractions at and above the ISA.
Due to problems with heat dissipation, it’s been difficult for personal computer clock speeds to break the 4 GHz threshold. Because of this, CPU manufacturers have invested in other ways to squeeze performance out of their chips, whether through clever branch prediction or by packing multiple processing cores into a single CPU die. To take advantage of multiple cores, however, programs must be written to handle the problems of parallelism.
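A minimal sketch of what that looks like in practice, using POSIX threads to split a sum across two threads (the workload here is hypothetical; thread overhead only pays off when the work is large enough):

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000

static long data[N];

struct range { int start, end; long sum; };

/* Each thread sums its own half of the array independently. */
static void *partial_sum(void *arg) {
    struct range *r = arg;
    for (int i = r->start; i < r->end; i++)
        r->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = i;

    struct range halves[2] = { {0, N / 2, 0}, {N / 2, N, 0} };
    pthread_t tids[2];

    /* Run both halves in parallel on separate cores (if available). */
    for (int t = 0; t < 2; t++)
        pthread_create(&tids[t], NULL, partial_sum, &halves[t]);
    for (int t = 0; t < 2; t++)
        pthread_join(tids[t], NULL);

    printf("total: %ld\n", halves[0].sum + halves[1].sum);
    return 0;
}
```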
Digital Logic/Hardware Level
The Arithmetic Logic Unit is a multi-functional digital logic circuit composed of rudimentary computational pieces like boolean logic gates and full adders. It is the primary workhorse of a microprocessor.
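To make the full adder concrete, here is a sketch that mimics one in software: chaining four 1-bit adders produces a 4-bit ripple-carry adder, the same structure an ALU builds out of gates:

```c
#include <stdio.h>

/* A 1-bit full adder expressed as boolean logic:
   sum   = a XOR b XOR carry_in
   carry = majority(a, b, carry_in) */
static void full_adder(int a, int b, int cin, int *sum, int *cout) {
    *sum  = a ^ b ^ cin;
    *cout = (a & b) | (a & cin) | (b & cin);
}

int main(void) {
    /* Add two 4-bit numbers by chaining four full adders,
       carry rippling from one bit position to the next. */
    int x = 5, y = 3, carry = 0, result = 0;
    for (int i = 0; i < 4; i++) {
        int s;
        full_adder((x >> i) & 1, (y >> i) & 1, carry, &s, &carry);
        result |= s << i;
    }
    printf("%d + %d = %d (carry out %d)\n", x, y, result, carry);
    return 0;
}
```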
Instruction Set Architectures
x86
The x86 architecture refers to a class of compatible CPUs descended from the Intel 8086/8088 and carried forward through successors like the 80386 and 80486. It is the predominant architecture found in personal computers and server systems.
ARM
ARM is chiefly used in low-power, mobile, and embedded applications, including newer video game consoles; however, Apple’s M-series chips have notably pulled ARM into the personal computing space.
RISC-V
RISC-V is a relatively new open-source architecture with a rapidly growing ecosystem.
AVR
AVR is a common ISA used in microcontrollers, chiefly the ATmega line (used in the Arduino).
MIPS
MIPS is mostly used for educational purposes.
Pipelines, Instruction- and Processor-level Parallelism
The five-stage pipeline has the following components:
- Instruction fetch unit
- Instruction decode unit
- Operand fetch unit
- Instruction execution unit
- Write back unit
Processors can improve performance by using multiple pipelines or “superscalar” parallelism (multiple operations running on different parts of the CPU at once). Instruction-level parallelism commonly only helps by a factor of 5-10, but processor-level parallelism pushes beyond that.
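A rough feel for the distinction: hardware can overlap independent operations, but a dependency chain forces serial execution. This fragment is illustrative; the actual scheduling happens inside the CPU:

```c
/* Independent operations: the CPU can issue these to separate
   execution units and overlap their latency. */
double independent(double w, double x, double y, double z) {
    double a = w * 2.0;
    double b = x * 2.0;
    double c = y * 2.0;
    double d = z * 2.0;
    return a + b + c + d;
}

/* Dependent chain: each multiply needs the previous result,
   so the pipeline must run them one after another. */
double chained(double w, double x, double y, double z) {
    return ((w * x) * y) * z;
}
```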
Additional Terminology
- Superscalar: Single pipeline leads into many specialized instruction execution units that can operate in parallel. This way long-running memory fetches and stores don’t block operations.
- SIMD (Single Instruction Stream, Multiple Data Stream) is used by GPUs to run data in parallel through the same instruction over multiple processing units (see the sketch after this list).
- Vector processors are capable of operating on arrays of values held in vector registers.
- Multiprocessors are individual CPUs with shared and/or local memory.
- Multicomputers are networked computer systems that compute large-scale problems together.
- Direct Memory Access allows I/O devices to transfer data directly to and from memory without CPU involvement, triggering an interrupt on completion.
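CPUs expose the same SIMD idea through vector instructions. A minimal sketch using x86 SSE intrinsics (this assumes an x86 machine and a compiler shipping <xmmintrin.h>); a single add instruction operates on four floats at once:

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics, x86 only */

int main(void) {
    float a[4]   = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4]   = {10.0f, 20.0f, 30.0f, 40.0f};
    float out[4];

    /* One instruction adds all four lanes at once. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));

    for (int i = 0; i < 4; i++)
        printf("%.1f ", out[i]);
    printf("\n");
    return 0;
}
```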
Operating System
The operating system provides a layer of abstraction over the ISA level to (among other things) simplify tedious tasks like:
File I/O
File I/O is a pretty grueling task through CPU instructions alone, so the OS provides an opinionated abstraction layer over these operations, establishing a standardized way to assemble bytes into named files.
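A minimal sketch of that abstraction on a POSIX system; the open/write/close calls hide driver commands, block allocation, buffering, and permission checks entirely:

```c
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    /* The OS turns this into driver commands, block allocation,
       and permission checks -- none of which we see. */
    int fd = open("example.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    const char msg[] = "hello, file system\n";
    write(fd, msg, sizeof msg - 1);
    close(fd);
    return 0;
}
```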
Virtual Memory and Paging
Direct memory management is another difficult problem, especially because operating systems allow timesharing between multiple processes which may compete for resources. Application code uses virtual memory, which is mapped (by a combination of hardware and OS software) to physical memory through either paging or segmentation. Either way, as memory becomes scarce or multiple virtual memory locations collide on the same physical address, pages are swapped on and off of disk.
Note: When resources are taxed, frequent swaps can lead to a performance degradation called thrashing.
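One place this machinery is directly visible to user code is mmap on POSIX systems: the kernel returns a virtual address immediately and only assigns a physical frame when the page is first touched. A sketch (MAP_ANONYMOUS is a widely supported extension on Linux and the BSDs):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 4096;  /* one typical page */

    /* Ask the kernel for an anonymous page. No physical frame is
       assigned until the first access triggers a page fault. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    p[0] = 'x';  /* first touch: the OS maps a physical frame here */
    printf("virtual address: %p\n", (void *)p);

    munmap(p, len);
    return 0;
}
```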
Process Synchronization and IPC
- Threads
- Message Queues
- Shared Memory
- Pipes
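As a sketch of one of these mechanisms, a POSIX pipe lets a parent and child process pass bytes through a kernel-managed channel:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fds[2];               /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) < 0)
        return 1;

    if (fork() == 0) {        /* child: write a message and exit */
        close(fds[0]);
        const char msg[] = "ping";
        write(fds[1], msg, strlen(msg));
        _exit(0);
    }

    close(fds[1]);            /* parent: read what the child sent */
    char buf[16] = {0};
    read(fds[0], buf, sizeof buf - 1);
    printf("parent got: %s\n", buf);
    return 0;
}
```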
System Calls
Hardware (ALU, registers, memory, bus devices) is accessed by microcode and the instruction set. The operating system uses the instruction set to create a running metaprogram that orchestrates user code and everything else the OS needs to do. Commonly, the OS provides an API out to user-space code to allow it to do some of these things when it’s necessary and permitted. Additionally, these OS-level abstractions make it much simpler to make syscalls, as there are complex constraints on how parameters are supplied to the kernel. UNIX includes a great number of helper functions in its C library that perform syscalls on the caller’s behalf.
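For instance, on Linux the C library’s write() wrapper hides the kernel calling convention behind an ordinary function call; the raw syscall(2) interface below shows roughly what it wraps (a Linux-specific sketch):

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    const char msg[] = "hello via the libc wrapper\n";

    /* The friendly libc wrapper... */
    write(STDOUT_FILENO, msg, sizeof msg - 1);

    /* ...and the same call made through the raw syscall interface,
       where we supply the syscall number and arguments ourselves. */
    const char raw[] = "hello via raw syscall\n";
    syscall(SYS_write, STDOUT_FILENO, raw, sizeof raw - 1);

    return 0;
}
```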
Common roles of system calls are:
- Process control
- File management
- Device management
- Information maintenance
- Communication
- Protection
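To ground each role, here is a sketch pairing one representative POSIX call with each category (the pairings are illustrative, not an official taxonomy; error handling is omitted for brevity):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                    /* process control: create a child */
    if (pid == 0)
        _exit(0);                          /* process control: terminate      */

    int fd = open("demo.txt",
                  O_RDWR | O_CREAT, 0600); /* file management: create/open    */
    write(fd, "x", 1);                     /* device management: descriptors
                                              treat devices like files        */
    getpid();                              /* information: query the OS       */

    int fds[2];
    pipe(fds);                             /* communication: set up IPC       */

    chmod("demo.txt", 0400);               /* protection: change access bits  */

    close(fd);
    return 0;
}
```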