Teaching

Our curriculum spans fundamental circuit theory through advanced integrated circuit design, preparing students for research and industry work in analog IC design and AI systems.

Graduate Courses

RF Integrated Circuit (RFIC) Design, 2019 ~ 2024
Memory Interface Design, 2025

Undergraduate Courses

Analog Circuit Design/Lab (AC), Fall 2019 ~ Present
Electric Circuit 1, 2 (EC), 2010 ~ Present
Circuit Theory (CT), 2017 ~ 2023

Capstone Design Topics

Click a topic below to explore architecture slides, key specs, and a step-by-step design guide for each capstone project.

>9600 MT/s: Max Data Rate

DDR5RCD05: Latest JEDEC

Dual-Ch: Sub-Channel

DFE 8-Tap: CA Equalization

7-bit DDR: CA Input

14-bit SDR: CA Output

I3C 12.5 MHz: Sideband

1.1 V PODL: Low Power

CD — Clock Driver for DDR5 Memory Modules

The CD (Clock Driver; JEDEC calls this part the Registering Clock Driver, RCD) sits between the CPU and multiple DRAM chips on a DDR5 RDIMM/LRDIMM server memory module. It re-drives the address/command (CA) bus, chip-select signals, and clock, like an amplifier that boosts a signal before broadcasting it to many speakers. Features: dual independent channels, low-jitter PLL (Phase-Locked Loop), 8-tap DFE (Decision Feedback Equalizer, a circuit from your Electronics course) receiver, parity error checking, and I2C/I3C serial control bus. Silicon-proven at 4800–9600+ MT/s.

⚡

CA Bus Buffer & Re-Driver

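Receives 7-bit DDR CA signals and re-drives them as two copies of a 14-bit SDR CA bus to the DRAM ranks. PODL (pseudo-open-drain logic) terminates each line to VDD, so static (DC) current flows only while a line is driven low, whereas SSTL's mid-rail termination draws current in both logic states.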

CA Bus

⚡

Low-Jitter PLL & Clock

Provides 4 independent differential clock pairs per board side. Each DRAM group gets its own clock, minimizing clock skew and maximizing signal integrity.

Clock Driver

⚡

DFE-Enabled CA Receivers

8-tap DFE (Decision Feedback Equalizer, a classic feedback circuit from Electronic Circuits) with 1.5 mV step size — corrects inter-symbol interference (ISI) at 9600+ MT/s.

Signal Integrity

⚡

Parity & I3C Sideband

Per-channel parity checking with ALERT_n error flag. I3C serial bus (12.5 MHz) for register access and BMC communication.

Control Bus

📐 Project Roadmap — Step-by-Step Design Guide

Phase 1

Why a Clock Driver? — Signal Fan-Out Basics

The CD chip sits between the CPU and multiple DRAM chips on a DDR5 RDIMM. Like a speaker amplifier, it re-drives the address/command (CA) bus and clock so many DRAMs can be driven at full speed.

Find the CD chip on an RDIMM photo; draw the CPU → CD → DRAM fan-out block diagram.

Explain why the CA bus is duplicated ×2 and how PODL reduces static DC power compared with SSTL.

Signal fan-out · RDIMM architecture · PODL vs SSTL signaling

📖 Reference Keywords

DDR5 RDIMM clock driver fan-out · PODL SSTL signaling comparison · Razavi buffer amplifier

Phase 2

PLL and DFE Receiver Design

Inside the CD: a PLL synthesizes multiple DRAM clocks from the CPU reference; an 8-tap DFE receiver corrects ISI on the high-speed CA bus.

Trace the PLL loop: Phase Detector → RC Loop Filter → VCO → Divider → back. Simulate in LTspice.

Build a 2-tap DFE in MATLAB: show how subtracting h1×bit[n−1] + h2×bit[n−2] from the received sample opens the eye at 9600 MT/s.

PLL Phase-Locked Loop · VCO · DFE equalization · ISI inter-symbol interference

📖 Reference Keywords

PLL design Razavi Microelectronics · DFE decision feedback equalizer high-speed · LTspice PLL simulation
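
The 2-tap DFE exercise in Phase 2 can be tried in a few lines before moving to MATLAB. The sketch below is only an illustration, not the CD chip's receiver: it assumes a hypothetical channel with a main cursor of 1.0 and post-cursor taps h1 = 0.6 and h2 = 0.3, and it measures the worst-case vertical eye opening before and after the feedback subtraction. The function names (`channel`, `dfe`, `eye_height`) are made up for this example.

```python
import itertools

# Assumed (illustrative) channel: main cursor plus two post-cursor ISI taps.
H0, H1, H2 = 1.0, 0.6, 0.3


def channel(bits):
    """Map bits to +/-1 symbols and add ISI from the two previous symbols."""
    sym = [1.0 if b else -1.0 for b in bits]
    out = []
    for n, s in enumerate(sym):
        p1 = sym[n - 1] if n >= 1 else 0.0
        p2 = sym[n - 2] if n >= 2 else 0.0
        out.append(H0 * s + H1 * p1 + H2 * p2)
    return out


def dfe(rx, h1=H1, h2=H2):
    """2-tap DFE: subtract h1*d[n-1] + h2*d[n-2] using past decisions."""
    d1 = d2 = 0.0
    corrected, decisions = [], []
    for y in rx:
        z = y - h1 * d1 - h2 * d2      # remove the predicted ISI
        d = 1.0 if z >= 0 else -1.0    # slicer decision
        corrected.append(z)
        decisions.append(d)
        d2, d1 = d1, d
    return corrected, decisions


def eye_height(samples, bits):
    """Worst-case vertical opening: lowest '1' level minus highest '0' level."""
    ones = [v for v, b in zip(samples, bits) if b]
    zeros = [v for v, b in zip(samples, bits) if not b]
    return min(ones) - max(zeros)


# All 3-bit patterns back to back, so the worst-case ISI cases appear.
bits = [b for p in itertools.product([0, 1], repeat=3) for b in p]
rx = channel(bits)
eq, dec = dfe(rx)
print(f"eye height without DFE: {eye_height(rx, bits):.2f}")
print(f"eye height with DFE:    {eye_height(eq, bits):.2f}")
```

With these assumed taps the opening grows from about 0.2 to about 2.0 in units of the main cursor. A real link also has noise and imperfect tap estimates, so the improvement there would be smaller.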

Phase 3

FSM Control and I2C/I3C Serial Bus

An FSM decodes commands arriving over I2C/I3C (2-wire serial bus), updates control registers, and asserts ALERT_n on parity errors.

Write a 4-state Verilog FSM (IDLE→RECEIVE→EXECUTE→DONE); simulate with ModelSim.

Implement a parity checker with XOR gates; inject an error and verify that ALERT_n asserts (goes low).

FSM Finite State Machine · I2C serial bus · Parity check XOR · Verilog HDL

📖 Reference Keywords

Verilog FSM design tutorial · I2C protocol waveform oscilloscope · parity checker Verilog
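
Before writing the Verilog for Phase 3, the parity check can be modeled in software. This is a minimal sketch that assumes even parity over a single 7-bit CA word; the parity convention and the sample word are assumptions for illustration, not taken from the JEDEC specification. `alert_n` is a hypothetical helper that mimics the active-low flag.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """Even parity: XOR of all data bits (what a tree of XOR gates computes)."""
    return reduce(xor, bits, 0)


def alert_n(ca_bits, par):
    """Active-low flag: 0 (asserted) on a parity mismatch, 1 otherwise."""
    return 0 if parity_bit(ca_bits) != par else 1


ca = [1, 0, 1, 1, 0, 0, 1]      # hypothetical 7-bit CA word
par = parity_bit(ca)            # parity computed on the transmit side
corrupted = ca.copy()
corrupted[3] ^= 1               # inject a single-bit error
print(alert_n(ca, par), alert_n(corrupted, par))
```

A single XOR parity bit catches any odd number of flipped bits but misses an even number, which is worth noting in the report.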

Phase 4

PCB Signal Integrity and Bring-Up

Validate signal quality on a real DDR5 board: measure the Eye Diagram, verify impedance matching (50 Ω), and confirm ZQ calibration and PLL lock time.

Capture an Eye Diagram on a DDR5 eval board; record eye height (mV) and width (ps).

Demonstrate reflection from a mismatched termination using Γ = (ZL−Z0)/(ZL+Z0).

Eye Diagram margin · 50 Ω impedance matching · ZQ calibration · Transmission line reflection

📖 Reference Keywords

DDR5 eye diagram measurement · transmission line impedance matching · ZQ calibration DDR5
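
The reflection formula from Phase 4, Γ = (ZL−Z0)/(ZL+Z0), is easy to tabulate before going to the bench. The load values below are arbitrary examples chosen to show a matched, a high, a low, and a nearly open termination on a 50 Ω line.

```python
def reflection_coefficient(z_load, z0=50.0):
    """Gamma = (ZL - Z0) / (ZL + Z0) for a line of characteristic impedance Z0."""
    return (z_load - z0) / (z_load + z0)


for z_load in (50.0, 75.0, 25.0, 1e9):   # matched, high, low, near-open
    gamma = reflection_coefficient(z_load)
    print(f"ZL = {z_load:>14.1f} ohm  ->  Gamma = {gamma:+.3f}")
```

A matched load gives Γ = 0 (no reflection), while a load below Z0 gives a negative Γ, meaning the reflected wave is inverted.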

>9600 MT/s: Max Transfer

DDR5DB01: JEDEC Compliant

10 DBs: Per LRDIMM

Dual 4-bit: Host & DRAM

DFE: Signal Recovery

BCOM: RCD Sideband

Si-Proven

2× Capacity: vs RDIMM

DB — Data Buffer for High-Capacity LRDIMM

The DB (Data Buffer) buffers the DQ (data) and DQS (data-strobe sampling clock) signals between the CPU memory controller and DRAM chips on an LRDIMM module. One DB connects a 4-bit host-side data bus to two ×4 DRAM chips, reducing the electrical load so the system can fit more DRAM chips — doubling memory capacity vs. standard RDIMM. Uses DFE equalization (a feedback circuit you study in Electronic Circuits), DQS clock regeneration, and BCOM serial control from the CD chip.

⚡

Host-Side DQ/DQS I/F

Two 4-bit bidirectional DQ (data) buses with DQS (data-strobe sampling clock) signals. VDD-terminated output drivers present a well-defined impedance to the memory controller — impedance matching from Circuit Theory.

Host I/F

⚡

DRAM-Side DQ/DQS I/F

Each DB drives two ×4 DRAM chips with a fresh 4-bit DQ interface. DQS (sampling clock) is regenerated — not just buffered — to restore clean clock edges for the DRAM. Capacitive load on the bus is drastically reduced.

DRAM I/F

⚡

Signal Recovery & DFE

DFE (Decision Feedback Equalizer) applied on both host-side and DRAM-side RX paths. Removes ISI caused by PCB trace inductance and capacitance — a direct application of your circuit theory filter concepts.

Signal Integrity

⚡

BCOM Control & ZQ Cal

BCOM (4-bit sideband bus from the CD chip) carries control commands to the DB: enter loopback, change impedance, power-down mode. Dedicated ZQ output impedance calibration and parity error alert.

Control

📐 Project Roadmap — Step-by-Step Design Guide

Phase 1

Why a Data Buffer? — Reducing Bus Load for 2× Capacity

On a standard RDIMM, every DQ (data) line runs directly from the memory controller to the DRAM chips, one load per rank, so adding ranks adds capacitive load. The DB isolates this load, letting the system double the number of DRAM chips and thus double capacity.

Compare RDIMM vs LRDIMM: draw the DQ bus with 8 vs 20 DRAM loads; calculate RC bandwidth limit.

Understand DQS (data-strobe sampling clock): it toggles at each bit center so the receiver samples DQ at the right instant.

LRDIMM architecture · DQ data bus loading · DQS data-strobe clock

📖 Reference Keywords

LRDIMM vs RDIMM memory module · DDR5 DQ DQS signal role · bus capacitance bandwidth limit

Phase 2

Bidirectional TX/RX and DFE Design

The DB core is a bidirectional tristate buffer: it switches direction (Write: CPU→DRAM / Read: DRAM→CPU) based on the DQS preamble, and applies DFE to restore a clean eye.

Design a tristate buffer in Verilog; simulate Write vs Read mode switching on DQS edge.

Implement a 2-tap DFE in LTspice; show eye opening improvement at 9600 MT/s.

Bidirectional tristate buffer · D flip-flop clocked by DQS · DFE equalization

📖 Reference Keywords

DDR5 data buffer bidirectional design · DFE eye diagram improvement · LTspice tristate buffer

Phase 3

BCOM Decoder and ZQ Calibration

The DB receives 4-bit BCOM commands from the CD chip (BCK clock, BCS_n select). A decoder interprets each code — loopback mode, power-down, drive-strength change — and updates internal registers.

Implement a 4-to-16 command decoder + register file in Verilog; simulate BCOM write/read.

Code a ZQ binary-search in C: compare driver resistance to 240 Ω reference, converge in 7 steps.

4-to-16 decoder · BCOM command bus · ZQ binary search · Register file

📖 Reference Keywords

BCOM protocol DDR5 data buffer · binary search ZQ calibration · Verilog register file design
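
The ZQ binary search in Phase 3 converges in 7 steps because a 7-bit code needs one comparator decision per bit. The sketch below shows that idea in Python before porting it to C. The driver model (identical 10 kΩ legs switched in parallel) is a made-up stand-in for the real output stage; only the 240 Ω reference and the 7-step count come from the text.

```python
R_REF = 240.0        # external reference resistor in ohms, from the text
R_LEG = 10_000.0     # assumed resistance of one driver leg (hypothetical)
N_BITS = 7           # 7-bit code -> 7 comparator decisions


def driver_resistance(code):
    """Toy model: (code + 1) identical legs in parallel."""
    return R_LEG / (code + 1)


def zq_calibrate():
    """Successive-approximation search, MSB first, one compare per bit."""
    code, steps = 0, 0
    for bit in reversed(range(N_BITS)):
        trial = code | (1 << bit)
        steps += 1
        # Comparator: keep the bit while the driver is still too resistive.
        if driver_resistance(trial) >= R_REF:
            code = trial
    return code, steps


code, steps = zq_calibrate()
print(code, steps, round(driver_resistance(code), 1))
```

The search ends on the largest code whose resistance is still at or above the reference, so the residual error is at most one code step.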

Phase 4

Full LRDIMM System Verification

Run a Write-then-Read test across CD + 10 DBs + 20 DRAMs. Verify the Eye Diagram stays open after passing through a DB, and that latency (CL, tRCD, tRP) meets JEDEC spec.

Write a Verilog testbench: apply WRITE 0xDEADBEEF → READ back; verify data integrity end-to-end.

Measure Eye Diagram height and width; compare to JEDEC DDR5 minimum margins.

Eye Diagram height/width margin · Write-Read data integrity · CL tRCD tRP latency

📖 Reference Keywords

LRDIMM integration test · DDR5 eye diagram margin spec · Verilog testbench memory simulation

RCD+DB+MC: Integrated BIST

PRBS-31: Pattern Gen

Per-DQ/DQS: Per-Lane Test

BER <10⁻¹²: Target

Loop-Back: DQ/DQS Path

2D Eye Shmoo: Margin Map

No ATE: No ATE Cost

I3C Report: Pass/Fail

BIST — Built-in Self-Test for Memory Buffers

BIST (Built-in Self-Test) integrates three test engines on a single chip: the CD BIST (tests the clock/address path), the DB BIST (tests each DQ data line and DQS strobe clock), and a MC (Memory Controller) exerciser. Instead of expensive ATE (Automatic Test Equipment worth millions of dollars), the chip tests itself at power-on: generating PRBS-31 pseudo-random patterns, counting bit errors (BER) per DQ data lane, sweeping voltage and timing to draw a 2D eye-opening map, and reporting pass/fail over I2C/I3C.

⚡

RCD BIST Engine

Clock/CA path self-test: PLL lock verification, parity checking, DCS/DCA training, ALERT_n assertion test, per-channel loopback — all without external test equipment.

Clock/CA Path

⚡

DB BIST Engine

Tests all 10 DB chips simultaneously via BCOM: sends PRBS (pseudo-random bit sequence) patterns through every DQ (data) line, verifies DFE equalization, ZQ impedance calibration, and signal recovery.

DQ/DQS Path

⚡

MC Protocol Exerciser

MC (Memory Controller) exerciser replays full JEDEC command sequences — ACTIVATE, READ, WRITE, PRECHARGE, REFRESH — with multi-rank interleaving and power state transitions, stressing the complete path.

Protocol

⚡

Eye Margin & BER Analyzer

Sweeps sampling voltage (VREF) and sampling time across a 2D grid for every DQ (data) lane — drawing an eye-opening "shmoo" map. BER (Bit Error Rate) threshold from 10⁻⁹ to 10⁻¹⁵; result reported via I2C/I3C.

Diagnostics

📐 Project Roadmap — Step-by-Step Design Guide

Phase 1

Why BIST? — On-Chip Self-Test Replaces $10 M ATE

ATE (Automatic Test Equipment) costs $2 M–$10 M. BIST moves the test logic onto the chip itself. At power-on, the chip runs three loopback segments — CD path, DB path, full path — and reports pass/fail over I2C/I3C.

Build a comparison table: ATE vs BIST (cost, coverage, test speed, field re-testability).

Draw the three loopback paths: ① CD CA/clock loop, ② DB DQ/DQS loop (10 DBs), ③ full CPU→CD→DB→DRAM round-trip.

BIST Built-in Self-Test · Loopback path · ATE Automatic Test Equipment

📖 Reference Keywords

BIST vs ATE memory test · loopback test principle · BER bit error rate measurement

Phase 2

PRBS Pattern Generator — LFSR with XOR Feedback

PRBS (Pseudo-Random Binary Sequence) is generated by an LFSR (Linear Feedback Shift Register): D flip-flops in a chain with XOR feedback. PRBS-31 produces 2³¹−1 bits before repeating — ideal for catching all bit-transition errors.

Draw a 7-bit LFSR schematic; verify it cycles through 2⁷−1 = 127 unique states.

Write PRBS-7 in Verilog; add a 32-bit error counter that increments on expected ≠ received.

PRBS pattern generator · LFSR Linear Feedback Shift Register · BER error counter

📖 Reference Keywords

LFSR Verilog implementation · PRBS self-synchronizing receiver · BER counter circuit
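
A software model of the 7-bit LFSR from Phase 2 is a useful reference when checking the Verilog version. The sketch assumes the common PRBS-7 feedback (XOR of the two most significant register bits, polynomial x⁷ + x⁶ + 1) and an all-ones seed; the text does not fix either choice. It confirms the 127-state cycle and counts injected bit errors the way the error counter would.

```python
def lfsr_step(state):
    """One shift of a 7-bit LFSR: feedback is the XOR of bits 6 and 5."""
    new_bit = ((state >> 6) ^ (state >> 5)) & 1
    return ((state << 1) | new_bit) & 0x7F, new_bit


def prbs7(seed=0x7F):
    """Yield PRBS-7 bits forever. The seed must be non-zero (zero locks up)."""
    state = seed & 0x7F
    while True:
        state, bit = lfsr_step(state)
        yield bit


def lfsr_period(seed=0x7F):
    """Count shifts until the register returns to its seed state."""
    state, count = seed & 0x7F, 0
    while True:
        state, _ = lfsr_step(state)
        count += 1
        if state == seed:
            return count


def count_errors(expected, received):
    """Error counter: increments whenever expected != received."""
    return sum(e != r for e, r in zip(expected, received))


tx = [b for _, b in zip(range(127), prbs7())]
rx = tx.copy()
for i in (5, 40, 99):            # flip three bits to emulate channel errors
    rx[i] ^= 1
print(lfsr_period(), count_errors(tx, rx))
```

Extending this to PRBS-31 only changes the register width and tap positions; the checker structure stays the same.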

Phase 3

2D Eye Shmoo — Mapping Voltage × Timing Margin

Sweep VREF (Y axis, DAC-controlled) and sampling time (X axis, delay-chain-controlled) across a 2D grid. At each point, measure BER. The "green" (pass) region forms the eye contour — wider = better design.

Connect a delay chain (inverter pairs, ~15 ps/step) to shift the sampling clock across ±UI/2.

Use an R-2R DAC (Circuit Theory) to sweep VREF; plot the 2D BER map in MATLAB.

Setup/Hold time violation · Delay chain inverter · VREF DAC R-2R ladder · 2D Eye Shmoo

📖 Reference Keywords

D flip-flop setup hold time metastability · R-2R DAC circuit theory · eye shmoo measurement DDR

Phase 4

Full BIST Integration and Pass/Fail Reporting

Chain all BIST blocks into a top-level FSM: PLL Lock → CD Loopback → DB Loopback (×10) → Full Path → Report. Results written to a register file; MCU reads pass/fail via I2C.

Implement the 5-state FSM in Verilog with per-state timeout counters.

Write C firmware (Arduino/STM32): start BIST over I2C, poll DONE bit, print per-lane BER.

BIST top FSM · I2C firmware MCU · Pass/Fail decision BER threshold

📖 Reference Keywords

memory BIST FSM design · I2C C firmware MCU · BER threshold pass fail criterion

36 Gb/s: Peak Bandwidth

18 Gb/s: Uni-directional

On-Chip BIST: Self-Test

CRC/ECC: Error Correction

−40 to 125 °C: Industrial Temp

Si-Proven

Chiplet: Chiplet Ready

RDL Routing: Pro-Grade

D2D — Die-to-Die High-Speed Link

A D2D (Die-to-Die) link connects two separate chips placed side-by-side on the same package substrate, like a very short, very fast cable between chips. Supports 18 Gb/s and 36 Gb/s using SerDes (Serializer/Deserializer: converts parallel data to serial and back). Features a forwarded clock (the transmitter sends the clock along with data, like the clock line in SPI), on-chip PRBS self-test, and optional CRC/ECC error correction. Validated over glass and organic interposers at −40°C to +125°C.

⚡

TX Circuit

High-speed TX with a low-jitter PLL (Phase-Locked Loop) and differential clock pair (like the differential amplifier in your Electronics course) for reliable data transmission at 18–36 Gb/s.

⚡

RX Circuit

RX with automatic timing calibration (aligns the sampling clock to the eye center) and embedded BIST for end-of-line production testing without external ATE.

⚡

BIST Circuit

Per-lane PRBS pattern generator and checker built in. Read/write leveling, lane-to-lane skew de-skew, and timing calibration all run on-chip automatically at power-on.

Self-Test

⚡

Clocking Circuit

TX sends a forwarded clock alongside data lanes (source-synchronous clocking — same concept as SPI/I2S protocols you know). Each channel has a programmable delay element for fine-grained skew compensation.

Clock

📐 Project Roadmap — Step-by-Step Design Guide

Phase 1

Chiplets and Die-to-Die Links — Why Connect Two Chips So Fast?

Modern CPUs (AMD, Intel) are built from multiple small chiplets on one package. A D2D link connects them at 18–36 Gb/s over a millimeter-scale distance — shorter than USB, so lower voltage swing and no need for heavy equalization.

Look up AMD Ryzen 9000 or Intel Meteor Lake die shot; identify compute die, I/O die, memory die.

Compare monolithic vs chiplet yield: large chip area → low yield → expensive. Chiplet = high yield per die.

Chiplet architecture · Package substrate · SerDes Serializer/Deserializer

📖 Reference Keywords

chiplet D2D interface overview · AMD chiplet Intel Foveros tile die-to-die · SerDes high-speed principle
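
The monolithic-versus-chiplet yield comparison in Phase 1 can be made quantitative with the simple Poisson yield model, Y = exp(−A·D0). Every number below is an assumption chosen for illustration: a defect density of 0.1 defects/cm², an 8 cm² monolithic die, and the same logic split into four 2 cm² chiplets that are tested before assembly.

```python
import math


def poisson_yield(area_cm2, d0_per_cm2):
    """Poisson yield model: fraction of dies with zero defects."""
    return math.exp(-area_cm2 * d0_per_cm2)


D0 = 0.1                     # assumed defect density, defects per cm^2
MONO_AREA = 8.0              # hypothetical monolithic die area, cm^2
N_CHIPLETS = 4
CHIPLET_AREA = MONO_AREA / N_CHIPLETS

y_mono = poisson_yield(MONO_AREA, D0)
y_chip = poisson_yield(CHIPLET_AREA, D0)

# Silicon spent per good product, assuming only known-good dies are assembled.
si_mono = MONO_AREA / y_mono
si_chip = N_CHIPLETS * CHIPLET_AREA / y_chip

print(f"yield: monolithic {y_mono:.1%}, per chiplet {y_chip:.1%}")
print(f"silicon per good product: {si_mono:.1f} vs {si_chip:.1f} cm^2")
```

The model ignores packaging yield and the area overhead of the D2D links themselves, both of which reduce the chiplet advantage in practice.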

Phase 2

SerDes TX/RX — CML Driver and DLL Clock Alignment

The TX uses a CML (Current Mode Logic) differential pair driver (from Electronic Circuits). The RX uses a DLL (Delay-Locked Loop) to align the forwarded clock to the center of the received data eye.

Simulate a CML TX in LTspice: tail-current differential pair driving 50 Ω; observe the output eye.

Compare DLL vs PLL: DLL shifts an existing clock with a delay chain; PLL multiplies frequency with a VCO.

CML Current Mode Logic · DLL Delay-Locked Loop · 50 Ω termination · Lane skew deskew

📖 Reference Keywords

CML driver CMOS design · DLL delay-locked loop principle · transmission line termination reflection

Phase 3

PRBS BIST and CRC Error Detection

At power-on, the TX sends PRBS through a loopback back to the RX — any bit error flags a faulty link. CRC-8 (polynomial division) appends an 8-bit checksum to each packet; the receiver re-computes and compares.

Extend PRBS-7 Verilog to PRBS-31; instantiate per lane with loopback and error counter.

Implement CRC-8 with an XOR shift-register; flip 1 bit in simulation and verify detection.

PRBS loopback BIST · CRC-8 error detection · RDL trace RC model

📖 Reference Keywords

CRC8 Verilog XOR shift register · PRBS D2D loopback · RC trace model low-pass filter
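
The CRC-8 step in Phase 3 is the same XOR shift-register operation whether it runs in hardware or software. The sketch below assumes the generator polynomial x⁸ + x² + x + 1 (0x07) with a zero initial value, since the text does not name a polynomial; the packet contents are arbitrary.

```python
POLY = 0x07  # assumed generator x^8 + x^2 + x + 1 (the text does not name one)


def crc8(data: bytes, poly: int = POLY) -> int:
    """Bitwise CRC-8, MSB first, zero initial value: an XOR shift register."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


packet = b"123456789"
tx = packet + bytes([crc8(packet)])     # append the 8-bit checksum
bad = bytearray(tx)
bad[2] ^= 0x10                          # flip one bit in transit

# A clean packet leaves a zero remainder; a corrupted one does not.
print(crc8(tx) == 0, crc8(bytes(bad)) != 0)
```

Running the CRC over data plus checksum and testing for zero mirrors how a hardware receiver can check a packet without storing it.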

Phase 4

PVT Corner Validation (−40 °C to +125 °C)

As temperature rises, MOSFET mobility μ decreases and threshold voltage Vth drops. Run 9 corners (3 temperatures × 3 VDD values) in LTspice; verify the eye stays open at the worst-case corner.

Set up a temperature sweep (.step temp list -40 25 125) in LTspice on the CML TX; record eye height.

Compile a corner table: data rate, BER, power at each PVT corner for the final report.

PVT corner simulation · MOSFET μ Vth vs temperature · Power supply sensitivity PSRR

📖 Reference Keywords

PVT corner simulation SPICE · MOSFET temperature characteristics · SerDes power breakdown

14.4 Gb/s: LPDDR6 Max

LPDDR6: JESD209-6 (2025)

DFE+FFE+CTLE: Triple EQ

WCK CKR 2:1: Forwarded Clock

48-bit: Sub-Channel

20% ↓ Power: vs LPDDR5

ISSCC 2026: ISSCC Validated

1.025 V VDD2C: Supply

PHY — Physical Layer Interface IP

The PHY (Physical Layer) is the circuit block that physically drives and receives the electrical signals between the memory controller inside the CPU and the LPDDR6/DDR5 DRAM chip. Think of it as the final analog amplifier stage before the signal leaves the chip. Validated at ISSCC 2026 (the top IC design conference): 14.4 Gb/s per pin using three equalizer stages — FFE (pre-emphasis, from Circuit Theory: boost high freq before sending), CTLE (continuous-time linear equalizer, a passive RC filter), and DFE (decision-feedback: subtract the echo of past bits). 20% lower power than LPDDR5 using dynamic voltage/frequency scaling (DVFS).

⚡

TX Driver & FFE

Output driver with ZQ-calibrated impedance matching (50 Ω termination, from Circuit Theory transmission-line concepts) and 2-tap FFE pre-emphasis — boosts high-frequency content to pre-compensate for cable/PCB loss before transmission.

⚡

RX & DFE/CTLE

Three-stage equalization chain: CTLE (passive RC high-pass filter from Circuit Theory) → FFE → DFE (4-tap feedback from Electronic Circuits). Per-pin independent DFE adapts to each lane's unique channel loss.

⚡

DLL/PLL & WCK LDO

Shared WCK (Write Clock) LDO (Low-Dropout Regulator, a linear voltage regulator from Electronics) reduces clock jitter by 30% (measured at ISSCC 2026). WCK runs at twice the CK frequency (CKR 2:1) for precise DQ sampling.

Clock

⚡

Per-Bit Calibration

Per-bit Phase Interpolator (PI): interpolates between two clock phases to achieve sub-picosecond timing resolution. Full training suite: gate training, read/write leveling, VREF sweep, CA training, ZQ impedance calibration.

Calibration

📐 Project Roadmap — Step-by-Step Design Guide

Phase 1

What Is a PHY? — The Final Analog Stage to DRAM

The PHY is the last analog circuit inside the CPU that physically drives DQ (data) wires to DRAM at 14.4 Gb/s — one bit every 69 ps. It also samples incoming DQ using the DQS strobe clock and calibrates timing/impedance at every power-on.

Calculate: UI = 1 / 14.4 Gb/s = 69 ps. Express the required timing margin as % of UI.

Summarize the PHY's three jobs: ① TX drive, ② RX sample on DQS edge, ③ calibration (ZQ, VREF, leveling).

PHY Physical Layer · UI Unit Interval · DFI DDR PHY Interface · DQS strobe clock

📖 Reference Keywords

LPDDR6 PHY overview · DFI DDR PHY interface standard · memory PHY design introduction
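
The unit-interval calculation in Phase 1 is worth scripting so it can be reused at other data rates. The 14.4 Gb/s rate comes from the text; the 20 ps timing margin is a made-up budget used only to show the percentage conversion.

```python
def unit_interval_ps(rate_gbps):
    """UI in picoseconds for a per-pin rate in Gb/s."""
    return 1000.0 / rate_gbps


def margin_percent_of_ui(margin_ps, rate_gbps):
    """Express a timing margin as a percentage of one UI."""
    return 100.0 * margin_ps / unit_interval_ps(rate_gbps)


ui = unit_interval_ps(14.4)                   # rate from the text
margin = margin_percent_of_ui(20.0, 14.4)     # 20 ps is a hypothetical budget
print(f"UI = {ui:.1f} ps, 20 ps margin = {margin:.1f}% of UI")
```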

Phase 2

TX FFE and RX DFE/CTLE Equalizer Design

PCB traces act as RC low-pass filters, smearing bits (ISI). The TX applies FFE (pre-emphasize the first edge); the RX applies CTLE (RC high-pass) then DFE (subtract predicted ISI of previous bits).

In LTspice, show how a 50 Ω / 1 pF RC load rounds a 10 Gb/s pulse; calculate f_3dB = 1/(2πRC).

Simulate a 2-tap FFE driver (pre-cursor + main tap); compare eye height with vs without FFE.

FFE Feed-Forward Equalization · DFE Decision Feedback Equalizer · CTLE RC filter · ISI Inter-Symbol Interference

📖 Reference Keywords

FFE pre-emphasis high-speed link · DFE equalizer MATLAB simulation · CTLE transfer function
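
The RC exercise in Phase 2 can be checked by hand or with a few lines of code before opening LTspice. The values (50 Ω, 1 pF, 10 Gb/s) are the ones given in the exercise; treating the channel as a single-pole RC is the simplifying assumption.

```python
import math

R = 50.0          # ohms, from the exercise
C = 1e-12         # farads, from the exercise
BIT_RATE = 10e9   # bits per second, from the exercise

tau = R * C                                  # RC time constant
f_3db = 1.0 / (2.0 * math.pi * tau)          # f_3dB = 1 / (2*pi*R*C)
ui = 1.0 / BIT_RATE                          # one bit period
settled = 1.0 - math.exp(-ui / tau)          # step response after one UI

print(f"tau = {tau * 1e12:.0f} ps, f_3dB = {f_3db / 1e9:.2f} GHz")
print(f"a lone bit reaches {settled:.1%} of full swing within one UI")
```

The bandwidth of roughly 3.2 GHz sits below the 5 GHz Nyquist frequency of a 10 Gb/s signal, and the residue that has not settled after one bit period is what spills into the next bit as ISI.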

Phase 3

Phase Interpolator and Training Sequence

A Phase Interpolator (PI) blends two quadrature clocks to produce any phase with ~1 ps resolution. The training sequence auto-corrects every DQ lane: gate training → write leveling → VREF sweep → ZQ calibration.

Derive PI output: α×CK_0° + (1−α)×CK_90°; draw the weighted current-mode diff-pair circuit.

Write VREF training pseudocode: binary search low/high until BER = 0; converges in 7 iterations.

Phase Interpolator PI · Write Leveling DQ alignment · VREF binary search · ZQ impedance calibration

📖 Reference Keywords

phase interpolator current mode design · DDR write leveling algorithm · VREF binary search PHY training
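
The phase-interpolator expression in Phase 3, α×CK_0° + (1−α)×CK_90°, has a closed-form phase if the two clocks are treated as ideal sinusoids, cos(ωt) and sin(ωt). That idealization is an assumption (real clocks are closer to square waves), but it shows why equal steps in α do not give equal phase steps.

```python
import math


def pi_phase_deg(alpha):
    """Output phase of alpha*cos(wt) + (1-alpha)*sin(wt), in degrees."""
    return math.degrees(math.atan2(1.0 - alpha, alpha))


def pi_amplitude(alpha):
    """Output amplitude of the blended clock (1.0 at either end point)."""
    return math.hypot(alpha, 1.0 - alpha)


for alpha in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"alpha = {alpha:.2f}: phase = {pi_phase_deg(alpha):5.1f} deg, "
          f"amplitude = {pi_amplitude(alpha):.3f}")
```

The phase is nonlinear in α and the amplitude dips to about 0.707 at the midpoint, two effects a current-mode design has to budget for.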

Phase 4

Low-Power Design — DVFS and Clock Gating

LPDDR6 PHY saves 20% power vs LPDDR5. Key techniques: Clock Gating (cut clock to idle blocks), DVFS (scale VDD and frequency together: P = C × VDD² × f). Verify Eye Diagram at worst-case PVT meets JEDEC spec.

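Implement clock gating in Verilog: assign gated_clk = clk & enable; explain why it cuts toggle power, and why production designs use a latch-based ICG cell to avoid glitches when enable changes while clk is high.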

Calculate DVFS savings: if VDD 1.1→0.9 V and f 7.2→5.0 GHz, compute P_new/P_old using P = CfV².

Clock Gating ICG cell · DVFS Dynamic Voltage Frequency Scaling · P = CfV² power equation

📖 Reference Keywords

clock gating Verilog low power · DVFS dynamic voltage frequency scaling · DDR PHY power breakdown
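
The DVFS calculation in Phase 4 follows directly from P = C × VDD² × f, since the capacitance cancels in the ratio. The operating points are the ones given in the exercise; the function name is just for this example.

```python
def dynamic_power_ratio(v_old, v_new, f_old, f_new):
    """P_new / P_old for P = C * V^2 * f with unchanged capacitance C."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


ratio = dynamic_power_ratio(v_old=1.1, v_new=0.9, f_old=7.2, f_new=5.0)
print(f"P_new / P_old = {ratio:.3f}  (saving {1.0 - ratio:.1%})")
```

This covers dynamic (switching) power only; leakage and the static current of analog blocks do not scale the same way.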

>12.8 Gb/s: HBM4E per pin

HBM3E: 8.4 Gb/s

2.5D / 3D: Package

TSV: Through-Si Via

RDL 2 µm: Fine Pitch

BIST: Loopback Test

Si-Proven

AI / HPC: Target Platform

HBM — High Bandwidth Memory Interface

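HBM (High Bandwidth Memory) is a TSV-based, 3D-stacked DRAM interface for AI and HPC SoCs. The silicon-proven HBM3E controller and PHY are paired with 2.5D packaging that uses TGV/TSV vertical connections, micro-bumps (solder balls at 55 µm pitch), and RDL routing at 2 µm, 5 µm, or 10 µm pitch. The HBM4E generation targets 12.8 Gb/s per pin with DVFS.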

⚡

HBM4E / HBM4 Controller

HBM4E/4 MC (Memory Controller) maximizes data throughput with advanced scheduling. On-chip digital calibration and training with BIST loopback — verifies all 16 channels at power-on.

HBM4E Ctrl

⚡

HBM3E / HBM3 Controller

Silicon-proven HBM3E controller at 8.4 Gb/s per pin. Multi-channel architecture (16 channels × 64-bit DQ data bus each, 1024 bits per stack) with built-in BIST and full loopback self-test.

HBM3E Ctrl

⚡

HBM PHY & Interposer I/F

HBM PHY connects to the GPU through micro-bumps (tiny solder balls at ~55 µm pitch) and RDL (Re-Distribution Layer: metal routing on the interposer, like PCB traces but at µm scale). Interposer uses TGV or pseudo-TSV for vertical connections.

PHY

⚡

2.5D/3D Package & RDL

Professional RDL routing at 2 µm, 5 µm, and 10 µm pitch options. Full SI/PI (Signal Integrity / Power Integrity) co-design: minimizes crosstalk between DQ data lanes and IR-drop on the power grid.

Package

📐 Project Roadmap — Step-by-Step Design Guide

Phase 1

Why HBM? — AI Needs >1 TB/s of Memory Bandwidth

Training an LLM requires reading billions of weight values per second. DDR5 delivers ~50 GB/s; HBM3E delivers >1 TB/s per stack by stacking 8–12 DRAM dies vertically with TSV (Through-Silicon Via) and placing the stack millimeters from the GPU.

Fill a comparison table: DDR5 vs HBM3E — bandwidth, capacity, power/GB/s, pin count, package distance.

Explain TSV: a vertical copper wire drilled through a silicon die (~5 µm diameter) — draw the 3D cross-section.

HBM High Bandwidth Memory · TSV Through-Silicon Via · 3D stacking · AI memory bandwidth bottleneck

📖 Reference Keywords

HBM vs DDR5 bandwidth comparison · TSV 3D stacked memory · NVIDIA H100 HBM3 specification

Phase 2

HBM PHY — Ultra-Short-Reach High-Density Interface

HBM link distance is <2 mm, so voltage swing is only ~200 mV (vs 600 mV for DDR5 on PCB). The very wide DQ interface (1024 bits per HBM3E stack) is possible because routing stays on the interposer, with no PCB trace fan-out.

Explain LVDS: two wires carry DQ and DQ_n; the differential receiver cancels common-mode noise (CMRR from Electronic Circuits).

Calculate micro-bump density at 55 µm pitch: (1000/55)² ≈ 330 bumps/mm². Compare to BGA (1 mm pitch).

LVDS Low-Voltage Differential · Micro-bump 55 µm pitch · RDL Re-Distribution Layer · CMRR differential receiver

📖 Reference Keywords

HBM PHY micro-bump interposer · LVDS differential signaling circuit · RDL routing SI PI
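
The micro-bump density estimate in Phase 2 assumes bumps on a square grid, so density is simply (1000/pitch)² per mm². The two pitches are the ones named in the exercise.

```python
def bumps_per_mm2(pitch_um):
    """Connections per mm^2 on a square grid with the given pitch in µm."""
    return (1000.0 / pitch_um) ** 2


micro = bumps_per_mm2(55)       # micro-bump pitch from the text
bga = bumps_per_mm2(1000)       # 1 mm BGA pitch from the text
print(f"micro-bump: {micro:.0f} /mm^2, BGA: {bga:.0f} /mm^2, "
      f"ratio: {micro / bga:.0f}x")
```

A staggered (hexagonal) bump layout would pack somewhat more connections into the same area than this square-grid estimate.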

Phase 3

Memory Controller — Scheduling, ECC, and BIST

The MC (Memory Controller) schedules READ/WRITE requests (Open-Page vs Closed-Page policy), corrects 1-bit errors with SECDED Hamming ECC, and self-tests with BIST patterns (Address Sweep, Checkerboard, March C).

Compare Open-Page vs Closed-Page scheduling: when does each give lower average latency?

Implement Hamming(72,64) ECC encoder/decoder in Verilog; flip 1 bit and verify auto-correction.

Open-Page vs Closed-Page scheduling · SECDED Hamming ECC · Memory BIST patterns · Thermal throttling

📖 Reference Keywords

memory controller scheduling policy · Hamming SECDED ECC Verilog · memory BIST march C pattern
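
The Hamming(72,64) code in Phase 3 is large to trace by hand, so it helps to first see the mechanism on the smallest member of the family. The sketch below is Hamming(7,4), single-error-correcting only; it is a teaching stand-in, not the (72,64) SECDED code, which adds more parity bits plus an overall parity bit for double-error detection.

```python
def hamming74_encode(d):
    """4 data bits -> 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Return (corrected data bits, 1-based error position or 0 if clean)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)   # binary index of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1                # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
word = hamming74_encode(data)
word[4] ^= 1                                # flip position 5 in "memory"
print(hamming74_decode(word))
```

The syndrome reads out the position of the flipped bit directly, which is the same property the Verilog decoder for the wider code relies on.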

Phase 4

2.5D Package and AI System Validation

The 2.5D package places GPU + HBM stacks on a silicon interposer. AI matrix-multiply operations are memory-bandwidth-limited (Roofline model). Verify bandwidth, power, and DVFS optimal operating point.

Draw the 2.5D cross-section: PCB BGA → C4 bumps → Silicon interposer → micro-bumps → HBM stack + GPU.

Use the Roofline model: compute arithmetic intensity (FLOP/byte) for GEMM at batch size 1 vs 1024 — show the memory-bound regime.

2.5D interposer package · Roofline model compute vs memory bound · GEMM matrix multiply bandwidth · DVFS optimal point

📖 Reference Keywords

2.5D package silicon interposer HBM GPU · Roofline model AI inference · HBM DVFS power optimization
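
The Roofline exercise in Phase 4 comes down to comparing arithmetic intensity with the ratio of peak compute to memory bandwidth. The sketch below assumes a (batch × K) by (K × N) GEMM with K = N = 4096, 2-byte elements, each matrix moved once, a 1 TB/s memory system, and a 500 TFLOP/s compute peak. All of these are illustrative numbers, not the specification of any particular GPU.

```python
def gemm_intensity(batch, k, n, bytes_per_elem=2):
    """FLOP per byte for (batch x k) @ (k x n), each matrix moved once."""
    flops = 2 * batch * k * n
    traffic = bytes_per_elem * (batch * k + k * n + batch * n)
    return flops / traffic


def attainable_tflops(intensity, peak_tflops, bw_tb_s):
    """Roofline: performance is capped by compute or by memory bandwidth."""
    return min(peak_tflops, intensity * bw_tb_s)


PEAK_TFLOPS = 500.0   # assumed compute peak (hypothetical)
BW_TB_S = 1.0         # assumed memory bandwidth, about one HBM3E stack

for batch in (1, 1024):
    ai = gemm_intensity(batch, 4096, 4096)
    perf = attainable_tflops(ai, PEAK_TFLOPS, BW_TB_S)
    regime = "memory-bound" if perf < PEAK_TFLOPS else "compute-bound"
    print(f"batch {batch:>4}: {ai:7.2f} FLOP/byte -> {perf:6.1f} TFLOP/s ({regime})")
```

At batch size 1 the weight matrix dominates the traffic and intensity stays near 1 FLOP/byte, which is why single-request inference is limited by memory bandwidth rather than compute.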

Helpful Educational Videos

What is Integrated Circuits?

What is 5G HPC IoT?

Future 6G and AI Systems

Advanced Circuit Design Concepts