Teaching

Our comprehensive curriculum spans fundamental circuit theory through advanced integrated circuit design, preparing students for cutting-edge research and industry work in analog IC design and AI systems.

Graduate Courses

  • RF Integrated Circuit (RFIC) Design 2019 ~ 2024
  • Memory Interface Design 2025

Undergraduate Courses

  • Analog Circuit Design/Lab (AC) Fall 2019 ~ Present
  • Electric Circuit 1, 2 (EC) 2010 ~ Present
  • Circuit Theory (CT) 2017 ~ 2023

Capstone Design Topics

Each capstone topic below comes with architecture slides, key specs, and a step-by-step design guide.

  • Max data rate: >9600 MT/s
  • Latest JEDEC generation: DDR5RCD05
  • Dual sub-channels
  • CA equalization: 8-tap DFE
  • CA input: 7-bit DDR
  • CA output: 14-bit SDR
  • Sideband: I3C at 12.5 MHz
  • Low power: 1.1 V PODL
CD — Clock Driver for DDR5 Memory Modules

The CD (Clock Driver) sits between the CPU and multiple DRAM chips on a DDR5 RDIMM/LRDIMM server memory module. It re-drives the address/command (CA) bus, chip-select signals, and clock — like an amplifier that boosts a signal before broadcasting it to many speakers. Features: dual independent channels, low-jitter PLL (Phase-Locked Loop), 8-tap DFE (Decision Feedback Equalizer, a circuit from your Electronics course) receiver, parity error checking, and I2C/I3C serial control bus. Silicon-proven at 4800–9600+ MT/s.

CA Bus Buffer & Re-Driver

Receives 7-bit DDR CA signals and re-drives them as 14-bit SDR CA ×2 to the DRAM ranks. PODL signaling draws static (DC) termination current only while driving low, whereas center-terminated SSTL draws current in both logic states.

CA Bus
Low-Jitter PLL & Clock

Provides 4 independent differential clock pairs per board side. Each DRAM group gets its own clock, minimizing clock skew and maximizing signal integrity.

Clock Driver
DFE-Enabled CA Receivers

8-tap DFE (Decision Feedback Equalizer, a classic feedback circuit from Electronic Circuits) with 1.5 mV step size — corrects inter-symbol interference (ISI) at 9600+ MT/s.

Signal Integrity
Parity & I3C Sideband

Per-channel parity checking with ALERT_n error flag. I3C serial bus (12.5 MHz) for register access and BMC communication.

Control Bus

📐 Project Roadmap — Step-by-Step Design Guide
Phase 1
Why a Clock Driver? — Signal Fan-Out Basics

The CD chip sits between the CPU and multiple DRAM chips on a DDR5 RDIMM. Like a speaker amplifier, it re-drives the address/command (CA) bus and clock so that many DRAMs can be driven at full speed.

Find the CD chip on an RDIMM photo; draw the CPU → CD → DRAM fan-out block diagram.
Explain why the CA bus is duplicated ×2 and how PODL reduces static DC power compared with SSTL.
Signal fan-out · RDIMM architecture · PODL vs SSTL signaling
📖 Reference Keywords
DDR5 RDIMM clock driver fan-out · PODL SSTL signaling comparison · Razavi buffer amplifier
Phase 2
PLL and DFE Receiver Design

Inside the CD: a PLL synthesizes multiple DRAM clocks from the CPU reference; an 8-tap DFE receiver corrects ISI on the high-speed CA bus.

Trace the PLL loop: Phase Detector → RC Loop Filter → VCO → Divider → back. Simulate in LTspice.
Build a 2-tap DFE in MATLAB: show how subtracting h1×bit[n-1] + h2×bit[n-2] (the ISI predicted from past decisions) opens the eye at 9600 MT/s.
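A minimal numeric sketch of the 2-tap DFE task, written in Python rather than MATLAB. It assumes a toy channel with only post-cursor ISI and hypothetical taps h0 = 1.0, h1 = 0.45, h2 = 0.2 (not measured values); the function names are illustrative. The DFE subtracts h1·d[n−1] + h2·d[n−2] using its own past decisions, and eye height is taken as the gap between the lowest '1' level and the highest '0' level.

```python
H0, H1, H2 = 1.0, 0.45, 0.2   # hypothetical pulse-response taps: cursor, 1st and 2nd post-cursor

def channel(bits):
    """Map bits {0,1} to symbols {-1,+1} and add post-cursor ISI."""
    sym = [2 * b - 1 for b in bits]
    out = []
    for n, s in enumerate(sym):
        prev1 = sym[n - 1] if n >= 1 else -1   # assume the line idled at '0'
        prev2 = sym[n - 2] if n >= 2 else -1
        out.append(H0 * s + H1 * prev1 + H2 * prev2)
    return out

def dfe_2tap(samples, h1=H1, h2=H2):
    """Subtract h1*d[n-1] + h2*d[n-2] built from past decisions, then slice at 0."""
    d1 = d2 = -1                               # same idle assumption as the channel
    decisions, equalized = [], []
    for x in samples:
        y = x - h1 * d1 - h2 * d2
        d = 1 if y > 0 else -1
        equalized.append(y)
        decisions.append(d)
        d1, d2 = d, d1                         # shift the decision history
    return decisions, equalized

def eye_height(values, symbols):
    """Vertical opening: lowest '1' level minus highest '0' level."""
    ones = [v for v, s in zip(values, symbols) if s > 0]
    zeros = [v for v, s in zip(values, symbols) if s < 0]
    return min(ones) - max(zeros)

bits = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]          # contains every 3-bit combination
sym = [2 * b - 1 for b in bits]
rx = channel(bits)
decisions, eq = dfe_2tap(rx)
print(f"eye height without DFE: {eye_height(rx, sym):.2f}, with DFE: {eye_height(eq, sym):.2f}")
```

With this pattern the raw eye height is 0.7, while the DFE output returns to ±1 (eye height 2.0) because correct past decisions cancel the ISI exactly. A wrong decision would subtract the wrong ISI, which is why DFE errors can propagate.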
PLL (Phase-Locked Loop) · VCO · DFE equalization · ISI (inter-symbol interference)
📖 Reference Keywords
PLL design Razavi Microelectronics · DFE decision feedback equalizer high-speed · LTspice PLL simulation
Phase 3
FSM Control and I2C/I3C Serial Bus

An FSM decodes commands arriving over I2C/I3C (2-wire serial bus), updates control registers, and asserts ALERT_n on parity errors.

Write a 4-state Verilog FSM (IDLE→RECEIVE→EXECUTE→DONE); simulate with ModelSim.
Implement a parity checker with XOR gates; inject an error and verify ALERT_n toggles.
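A behavioral sketch of the parity checker in Python (the XOR reduction maps directly onto Verilog's `^` reduction operator). It assumes even parity over a 7-bit CA word and an active-low ALERT_n; the function names and the example CA word are hypothetical, and the exact DDR5 parity coverage and ALERT_n timing come from the JEDEC spec.

```python
from functools import reduce
from operator import xor

def even_parity_bit(ca_bits):
    """XOR-reduce the CA bits; this PAR value makes the total count of 1s even."""
    return reduce(xor, ca_bits, 0)

def alert_n(ca_bits, par):
    """Active-low ALERT_n: 0 on a parity error, 1 when the check passes."""
    error = reduce(xor, ca_bits, 0) ^ par
    return 0 if error else 1

ca = [1, 0, 1, 1, 0, 0, 1]        # hypothetical 7-bit CA word
par = even_parity_bit(ca)
bad = ca.copy()
bad[2] ^= 1                       # inject a single-bit error
print("clean ALERT_n:", alert_n(ca, par), " corrupted ALERT_n:", alert_n(bad, par))
```

One parity bit catches any odd number of flipped bits but misses an even number, so a double-bit error passes silently.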
FSM (Finite State Machine) · I2C serial bus · Parity check XOR · Verilog HDL
📖 Reference Keywords
Verilog FSM design tutorial · I2C protocol waveform oscilloscope · parity checker Verilog
Phase 4
PCB Signal Integrity and Bring-Up

Validate signal quality on a real DDR5 board: measure the Eye Diagram, verify impedance matching (50 Ω), and confirm ZQ calibration and PLL lock time.

Capture an Eye Diagram on a DDR5 eval board; record eye height (mV) and width (ps).
Demonstrate reflection from a mismatched termination using Γ = (ZL−Z0)/(ZL+Z0).
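A quick calculator for the reflection task, assuming purely resistive loads on a lossless 50 Ω line (the test load values are illustrative choices).

```python
def reflection_coefficient(z_load, z0=50.0):
    """Gamma = (ZL - Z0) / (ZL + Z0) for a resistive load on a lossless line."""
    return (z_load - z0) / (z_load + z0)

for zl in (50.0, 25.0, 100.0, 1e9):            # matched, low, high, nearly open
    g = reflection_coefficient(zl)
    print(f"ZL = {zl:>12g} ohm: Gamma = {g:+.3f}, reflected = {g * 100:+.1f}% of the incident step")
```

A 25 Ω load reflects an inverted step one third the size of the incident one, and an almost-open end reflects nearly all of it.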
Eye Diagram margin · 50 Ω impedance matching · ZQ calibration · Transmission line reflection
📖 Reference Keywords
DDR5 eye diagram measurement · transmission line impedance matching · ZQ calibration DDR5
  • Max transfer rate: >9600 MT/s
  • JEDEC compliant: DDR5DB01
  • 10 DBs per LRDIMM
  • Dual 4-bit host and DRAM interfaces
  • Signal recovery: DFE
  • RCD sideband: BCOM
  • Silicon-proven
  • 2× capacity vs RDIMM
DB — Data Buffer for High-Capacity LRDIMM

The DB (Data Buffer) buffers the DQ (data) and DQS (data-strobe sampling clock) signals between the CPU memory controller and DRAM chips on an LRDIMM module. One DB connects a 4-bit host-side data bus to two ×4 DRAM chips, reducing the electrical load so the system can fit more DRAM chips — doubling memory capacity vs. standard RDIMM. Uses DFE equalization (a feedback circuit you study in Electronic Circuits), DQS clock regeneration, and BCOM serial control from the CD chip.

Host-Side DQ/DQS I/F

Two 4-bit bidirectional DQ (data) buses with DQS (data-strobe sampling clock) signals. VDD-terminated output drivers present a well-defined impedance to the memory controller — impedance matching from Circuit Theory.

Host I/F
DRAM-Side DQ/DQS I/F

Each DB drives two ×4 DRAM chips with a fresh 4-bit DQ interface. DQS (sampling clock) is regenerated — not just buffered — to restore clean clock edges for the DRAM. Capacitive load on the bus is drastically reduced.

DRAM I/F
Signal Recovery & DFE

DFE (Decision Feedback Equalizer) applied on both host-side and DRAM-side RX paths. Removes ISI caused by PCB trace inductance and capacitance — a direct application of your circuit theory filter concepts.

Signal Integrity
BCOM Control & ZQ Cal

BCOM (4-bit sideband bus from the CD chip) carries control commands to the DB: enter loopback, change impedance, power-down mode. Dedicated ZQ output impedance calibration and parity error alert.

Control

📐 Project Roadmap — Step-by-Step Design Guide
Phase 1
Why a Data Buffer? — Reducing Bus Load for 2× Capacity

On a standard RDIMM, every DQ (data) line connects directly to a DRAM chip in every rank, so its capacitive load grows with the number of ranks. The DB isolates this load, letting the system double the number of DRAM chips and therefore the capacity.

Compare RDIMM vs LRDIMM: draw the DQ bus with 8 vs 20 DRAM loads; calculate RC bandwidth limit.
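Understand DQS (data-strobe sampling clock): it toggles once per bit. On writes its edges are centered in each DQ bit so the DRAM samples at the right instant; on reads it arrives edge-aligned and the receiver delays it by a quarter clock cycle.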
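A lumped single-pole estimate for the RDIMM vs LRDIMM loading task. It treats the DQ net as one driver resistance charging the sum of the DRAM input capacitances and ignores transmission-line effects; the 40 Ω and 1 pF per load values are hypothetical placeholders, not DDR5 datasheet numbers.

```python
import math

R_DRIVER = 40.0       # ohms, hypothetical driver + trace resistance
C_PER_LOAD = 1.0e-12  # farads per DRAM input, hypothetical

def f3db(n_loads, r=R_DRIVER, c_load=C_PER_LOAD):
    """Lumped single-pole estimate: f_3dB = 1 / (2*pi*R*C_total)."""
    c_total = n_loads * c_load
    return 1.0 / (2 * math.pi * r * c_total)

for n in (8, 20):
    print(f"{n:2d} loads: C = {n * C_PER_LOAD * 1e12:.0f} pF, f_3dB = {f3db(n) / 1e6:.0f} MHz")
```

Because f_3dB scales as 1/N in this model, going from 8 to 20 loads cuts the bandwidth by 2.5×, which is the effect the DB's load isolation is meant to avoid.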
LRDIMM architecture · DQ data bus loading · DQS data-strobe clock
📖 Reference Keywords
LRDIMM vs RDIMM memory module · DDR5 DQ DQS signal role · bus capacitance bandwidth limit
Phase 2
Bidirectional TX/RX and DFE Design

The DB core is a bidirectional tristate buffer: it switches direction (Write: CPU→DRAM / Read: DRAM→CPU) based on the DQS preamble, and applies DFE to restore a clean eye.

Design a tristate buffer in Verilog; simulate Write vs Read mode switching on DQS edge.
Implement a 2-tap DFE in LTspice; show eye opening improvement at 9600 MT/s.
Bidirectional tristate buffer · D flip-flop clocked by DQS · DFE equalization
📖 Reference Keywords
DDR5 data buffer bidirectional design · DFE eye diagram improvement · LTspice tristate buffer
Phase 3
BCOM Decoder and ZQ Calibration

The DB receives 4-bit BCOM commands from the CD chip (BCK clock, BCS_n select). A decoder interprets each code — loopback mode, power-down, drive-strength change — and updates internal registers.

Implement a 4-to-16 command decoder + register file in Verilog; simulate BCOM write/read.
Code a ZQ binary-search in C: compare driver resistance to 240 Ω reference, converge in 7 steps.
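The task asks for C; this sketch uses Python for brevity, and the loop translates line for line into C. It models the driver as a hypothetical 7-bit pull-up whose resistance falls as more legs are enabled (R = 15360 / (code + 1), so code 63 gives exactly 240 Ω) and runs a successive-approximation binary search that keeps each bit while the driver is still at least as resistive as the 240 Ω reference.

```python
R_REF = 240.0   # external ZQ reference resistor (ohms)
N_BITS = 7      # 7-bit code -> 128 settings -> 7 comparisons

def driver_resistance(code):
    """Hypothetical pull-up model: more enabled legs -> lower resistance."""
    return 15360.0 / (code + 1)

def zq_calibrate(r_of_code=driver_resistance, r_ref=R_REF, n_bits=N_BITS):
    """Successive-approximation (binary) search, MSB first.

    Keeps a bit set while the driver is still at least as resistive as the
    reference, so the result is the largest code with R(code) >= r_ref.
    """
    code, steps = 0, 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        steps += 1
        if r_of_code(trial) >= r_ref:   # comparator: driver still too weak (or exact)
            code = trial
    return code, steps

code, steps = zq_calibrate()
print(f"ZQ code = {code} after {steps} steps, R = {driver_resistance(code):.1f} ohm")
```

Each iteration resolves one bit, so a 128-step code range always converges in 7 comparisons; with this model the result is code 63.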
4-to-16 decoder · BCOM command bus · ZQ binary search · Register file
📖 Reference Keywords
BCOM protocol DDR5 data buffer · binary search ZQ calibration · Verilog register file design
Phase 4
Full LRDIMM System Verification

Run a Write-then-Read test across CD + 10 DBs + 20 DRAMs. Verify the Eye Diagram stays open after passing through a DB, and that latency (CL, tRCD, tRP) meets JEDEC spec.

Write a Verilog testbench: apply WRITE 0xDEADBEEF → READ back; verify data integrity end-to-end.
Measure Eye Diagram height and width; compare to JEDEC DDR5 minimum margins.
Eye Diagram height/width margin · Write-Read data integrity · CL tRCD tRP latency
📖 Reference Keywords
LRDIMM integration test · DDR5 eye diagram margin spec · Verilog testbench memory simulation
  • Integrated BIST: RCD + DB + MC
  • Pattern generator: PRBS-31
  • Per-lane test: every DQ/DQS
  • Target BER: <10⁻¹²
  • Loopback: DQ/DQS path
  • Margin map: 2D eye shmoo
  • No ATE cost
  • Pass/fail report over I3C
BIST — Built-in Self-Test for Memory Buffers

BIST (Built-in Self-Test) integrates three test engines on a single chip: the CD BIST (tests the clock/address path), the DB BIST (tests each DQ data line and DQS strobe clock), and an MC (Memory Controller) exerciser. Instead of expensive ATE (Automatic Test Equipment worth millions of dollars), the chip tests itself at power-on: it generates PRBS-31 pseudo-random patterns, counts bit errors per DQ data lane to compute the BER, sweeps voltage and timing to draw a 2D eye-opening map, and reports pass/fail over I2C/I3C.

RCD BIST Engine

Clock/CA path self-test: PLL lock verification, parity checking, DCS/DCA training, ALERT_n assertion test, per-channel loopback — all without external test equipment.

Clock/CA Path
DB BIST Engine

Tests all 10 DB chips simultaneously via BCOM: sends PRBS (pseudo-random bit sequence) patterns through every DQ (data) line, verifies DFE equalization, ZQ impedance calibration, and signal recovery.

DQ/DQS Path
MC Protocol Exerciser

MC (Memory Controller) exerciser replays full JEDEC command sequences — ACTIVATE, READ, WRITE, PRECHARGE, REFRESH — with multi-rank interleaving and power state transitions, stressing the complete path.

Protocol
Eye Margin & BER Analyzer

Sweeps sampling voltage (VREF) and sampling time across a 2D grid for every DQ (data) lane — drawing an eye-opening "shmoo" map. BER (Bit Error Rate) threshold from 10⁻⁹ to 10⁻¹⁵; result reported via I2C/I3C.

Diagnostics

📐 Project Roadmap — Step-by-Step Design Guide
Phase 1
Why BIST? — On-Chip Self-Test Replaces $10 M ATE

ATE (Automatic Test Equipment) costs $2 M–$10 M. BIST moves the test logic onto the chip itself. At power-on, the chip runs three loopback segments — CD path, DB path, full path — and reports pass/fail over I2C/I3C.

Build a comparison table: ATE vs BIST (cost, coverage, test speed, field re-testability).
Draw the three loopback paths: ① CD CA/clock loop, ② DB DQ/DQS loop (10 DBs), ③ full CPU→CD→DB→DRAM round-trip.
BIST (Built-in Self-Test) · Loopback path · ATE (Automatic Test Equipment)
📖 Reference Keywords
BIST vs ATE memory test · loopback test principle · BER bit error rate measurement
Phase 2
PRBS Pattern Generator — LFSR with XOR Feedback

PRBS (Pseudo-Random Binary Sequence) is generated by an LFSR (Linear Feedback Shift Register): D flip-flops in a chain with XOR feedback. PRBS-31 repeats only every 2³¹−1 bits and contains every 31-bit pattern except all zeros, so it exercises long runs of identical bits and dense transitions alike.

Draw a 7-bit LFSR schematic; verify it cycles through 2⁷−1 = 127 unique states.
Write PRBS-7 in Verilog; add a 32-bit error counter that increments on expected ≠ received.
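A bit-accurate Python model of the PRBS-7 task, useful as a golden reference for the Verilog. It assumes the common PRBS-7 polynomial x^7 + x^6 + 1 with feedback from the two oldest stages, an all-ones seed, and a checker that already shares the transmitter's seed and alignment (a real checker would self-synchronize); the saturating 32-bit counter is a design choice, not something the task specifies.

```python
def prbs7_step(state):
    """Advance a 7-bit Fibonacci LFSR for PRBS-7 (x^7 + x^6 + 1) by one clock.

    Returns (next_state, output_bit); the new bit is shifted in at bit 0.
    """
    new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR feedback from taps 7 and 6
    return ((state << 1) | new_bit) & 0x7F, new_bit

def lfsr_states(seed=0x7F):
    """Collect every state visited until the register returns to the seed."""
    states, state = [seed], seed
    while True:
        state, _ = prbs7_step(state)
        if state == seed:
            return states
        states.append(state)

def count_errors(received, seed=0x7F):
    """Compare received bits with a local PRBS-7 copy; 32-bit saturating counter."""
    state, errors = seed, 0
    for rx_bit in received:
        state, expected = prbs7_step(state)
        if rx_bit != expected:
            errors = min(errors + 1, 0xFFFFFFFF)
    return errors

state, tx = 0x7F, []
for _ in range(300):                 # transmit 300 bits of PRBS-7
    state, bit = prbs7_step(state)
    tx.append(bit)
rx = tx.copy()
for i in (5, 100, 250):              # inject three bit errors
    rx[i] ^= 1
print("period:", len(lfsr_states()), " errors counted:", count_errors(rx))
```

Any nonzero seed gives the same 127-state cycle; the all-zero state is excluded because XOR feedback would keep the register stuck at zero.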
PRBS pattern generator · LFSR (Linear Feedback Shift Register) · BER error counter
📖 Reference Keywords
LFSR Verilog implementation · PRBS self-synchronizing receiver · BER counter circuit
Phase 3
2D Eye Shmoo — Mapping Voltage × Timing Margin

Sweep VREF (Y axis, DAC-controlled) and sampling time (X axis, delay-chain-controlled) across a 2D grid. At each point, measure BER. The "green" (pass) region forms the eye contour — wider = better design.

Connect a delay chain (inverter pairs, ~15 ps/step) to shift the sampling clock across ±UI/2.
Use an R-2R DAC (Circuit Theory) to sweep VREF; plot the 2D BER map in MATLAB.
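A toy version of the shmoo task in Python that prints an ASCII pass/fail map instead of a MATLAB plot. Everything about the eye is assumed: a diamond-shaped opening centered at 0.55 V and 0 ps, ±0.15 V tall and ±40 ps wide; the 4-bit DAC with a 1.1 V full scale is also hypothetical. The 15 ps step comes from the delay-chain task, and the R-2R output follows the ideal relation V = V_FS × code / 2^N.

```python
N_DAC = 4        # hypothetical 4-bit R-2R DAC
V_FS = 1.1       # DAC full-scale reference (V), hypothetical
STEP_PS = 15.0   # delay per inverter-pair step

def r2r_dac(code, n_bits=N_DAC, v_fs=V_FS):
    """Ideal R-2R ladder output: V = V_FS * code / 2**N."""
    return v_fs * code / (1 << n_bits)

def sample_passes(v, t_ps, v_center=0.55, half_h=0.15, t_center=0.0, half_w=40.0):
    """Toy diamond-shaped eye: True if (v, t) lies inside the opening."""
    return abs(v - v_center) / half_h + abs(t_ps - t_center) / half_w < 1.0

def shmoo(delay_steps=range(-3, 4)):
    """One text row per VREF code, top row = highest VREF; '#' = pass, '.' = fail."""
    rows = []
    for code in reversed(range(1 << N_DAC)):
        v = r2r_dac(code)
        row = "".join("#" if sample_passes(v, s * STEP_PS) else "." for s in delay_steps)
        rows.append(f"{v:5.3f} V  {row}")
    return rows

print("\n".join(shmoo()))
```

On silicon each grid point would be a measured BER compared against the pass threshold rather than a geometric test.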
Setup/Hold time violation · Delay chain inverter · VREF DAC R-2R ladder · 2D Eye Shmoo
📖 Reference Keywords
D flip-flop setup hold time metastability · R-2R DAC circuit theory · eye shmoo measurement DDR
Phase 4
Full BIST Integration and Pass/Fail Reporting

Chain all BIST blocks into a top-level FSM: PLL Lock → CD Loopback → DB Loopback (×10) → Full Path → Report. Results written to a register file; MCU reads pass/fail via I2C.

Implement the 5-state FSM in Verilog with per-state timeout counters.
Write C firmware (Arduino/STM32): start BIST over I2C, poll DONE bit, print per-lane BER.
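A cycle-level Python model of the five-state top FSM, intended as a reference for the Verilog version. Each test is a hypothetical callback that returns None while running and True/False when finished; the 1000-cycle per-state timeout and the result dictionary standing in for the register file are assumptions.

```python
from enum import Enum

class State(Enum):
    PLL_LOCK = 0
    CD_LOOPBACK = 1
    DB_LOOPBACK = 2
    FULL_PATH = 3
    REPORT = 4

def run_bist(checks, timeout=1000, n_db=10):
    """Clock-by-clock model of the top-level BIST FSM.

    checks maps each test state to f(cycle, unit) -> None while running,
    True on pass, False on fail. 'unit' is the DB index in DB_LOOPBACK.
    """
    results = {}
    state = State.PLL_LOCK
    while state is not State.REPORT:
        units = range(n_db) if state is State.DB_LOOPBACK else [None]
        for unit in units:
            name = state.name if unit is None else f"{state.name}[{unit}]"
            outcome = "TIMEOUT"
            for cycle in range(timeout):          # per-state timeout counter
                status = checks[state](cycle, unit)
                if status is not None:
                    outcome = "PASS" if status else "FAIL"
                    break
            results[name] = outcome
        state = State(state.value + 1)            # advance to the next state
    # REPORT: per-step results plus the PASS/DONE bits the MCU polls over I2C
    results["PASS"] = all(v == "PASS" for v in results.values())
    results["DONE"] = True
    return results

checks = {
    State.PLL_LOCK: lambda c, u: True if c >= 50 else None,      # locks after 50 cycles
    State.CD_LOOPBACK: lambda c, u: True,
    State.DB_LOOPBACK: lambda c, u: (u != 7) if c >= 3 else None,  # DB #7 fails
    State.FULL_PATH: lambda c, u: None,                           # never finishes
}
print(run_bist(checks))
```

This model keeps running after a failure so every lane gets a result; a real design might abort once PLL_LOCK fails, since the later loopbacks are meaningless without a locked clock.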
BIST top FSM · I2C firmware MCU · Pass/Fail decision BER threshold
📖 Reference Keywords
memory BIST FSM design · I2C C firmware MCU · BER threshold pass fail criterion
  • Peak bandwidth: 36 Gb/s
  • Unidirectional: 18 Gb/s
  • Self-test: on-chip BIST
  • Error detection/correction: CRC/ECC
  • Industrial temperature range: −40 °C to 125 °C
  • Silicon-proven
  • Chiplet ready
  • Professional-grade RDL routing
D2D — Die-to-Die High-Speed Link

A D2D (Die-to-Die) link connects two separate chips placed side by side on the same package substrate, acting like a very short, very fast cable between them. It supports 18 Gb/s and 36 Gb/s using SerDes (Serializer/Deserializer: converts parallel data to serial and back). Features include a forwarded clock (the transmitter sends the clock along with the data, like SPI in your lab course), on-chip PRBS self-test, and optional CRC/ECC error correction. Validated over glass and organic interposers at −40°C to +125°C.

TX Circuit

High-speed TX with a low-jitter PLL (Phase-Locked Loop) and differential clock pair (like the differential amplifier in your Electronics course) for reliable data transmission at 18–36 Gb/s.

TX
RX Circuit

RX with automatic timing calibration (aligns the sampling clock to the eye center) and embedded BIST for end-of-line production testing without external ATE.

RX
BIST Circuit

Per-lane PRBS pattern generator and checker are built in. Read/write leveling, lane-to-lane deskew, and timing calibration all run on-chip automatically at power-on.

Self-Test
Clocking Circuit

TX sends a forwarded clock alongside data lanes (source-synchronous clocking — same concept as SPI/I2S protocols you know). Each channel has a programmable delay element for fine-grained skew compensation.

Clock

📐 Project Roadmap — Step-by-Step Design Guide
Phase 1
Chiplets and Die-to-Die Links — Why Connect Two Chips So Fast?

Modern CPUs (AMD, Intel) are built from multiple small chiplets on one package. A D2D link connects them at 18–36 Gb/s over a millimeter-scale distance, far shorter than a USB cable or PCB trace, so it can use a lower voltage swing without heavy equalization.

Look up AMD Ryzen 9000 or Intel Meteor Lake die shot; identify compute die, I/O die, memory die.
Compare monolithic vs chiplet yield: large chip area → low yield → expensive. Chiplet = high yield per die.
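A back-of-the-envelope yield comparison using the simple Poisson model Y = exp(−A·D0). The defect density (0.1 /cm²), the 600 mm² monolithic die, and the split into four 150 mm² chiplets are all hypothetical, and the model ignores the extra area and cost of D2D PHYs and advanced packaging.

```python
import math

D0 = 0.1   # defects per cm^2, hypothetical

def poisson_yield(area_cm2, d0=D0):
    """Poisson die-yield model: Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * d0)

mono_area = 6.0                    # one 600 mm^2 monolithic die (hypothetical)
chiplet_area, n_chiplets = 1.5, 4  # four 150 mm^2 chiplets (hypothetical)

y_mono = poisson_yield(mono_area)
y_chip = poisson_yield(chiplet_area)
# Wafer area spent per *good* product, assuming chiplets are tested before assembly.
si_mono = mono_area / y_mono
si_chip = n_chiplets * chiplet_area / y_chip
print(f"monolithic: yield {y_mono:.1%}, {si_mono:.2f} cm^2 of wafer per good part")
print(f"4 chiplets: yield {y_chip:.1%} each, {si_chip:.2f} cm^2 of wafer per good part")
```

Poisson is the simplest yield model; clustered-defect models such as negative binomial give less pessimistic numbers for large dies, but the trend is the same.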
Chiplet architecture · Package substrate · SerDes (Serializer/Deserializer)
📖 Reference Keywords
chiplet D2D interface overview · AMD chiplet · Intel Foveros tile die-to-die · SerDes high-speed principle
Phase 2
SerDes TX/RX — CML Driver and DLL Clock Alignment

The TX uses a CML (Current Mode Logic) differential pair driver (from Electronic Circuits). The RX uses a DLL (Delay-Locked Loop) to align the forwarded clock to the center of the received data eye.

Simulate a CML TX in LTspice: tail-current differential pair driving 50 Ω; observe the output eye.
Compare DLL vs PLL: DLL shifts an existing clock with a delay chain; PLL multiplies frequency with a VCO.
CML (Current Mode Logic) · DLL (Delay-Locked Loop) · 50 Ω termination · Lane skew deskew
📖 Reference Keywords
CML driver CMOS design · DLL delay-locked loop principle · transmission line termination reflection
Phase 3
PRBS BIST and CRC Error Detection

At power-on, the TX sends PRBS through a loopback back to the RX — any bit error flags a faulty link. CRC-8 (polynomial division) appends an 8-bit checksum to each packet; the receiver re-computes and compares.

Extend PRBS-7 Verilog to PRBS-31; instantiate per lane with loopback and error counter.
Implement CRC-8 with an XOR shift-register; flip 1 bit in simulation and verify detection.
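A bitwise CRC-8 sketch in Python that mirrors the XOR shift-register hardware: one shift per bit, XORing in the polynomial when the MSB is 1. The source does not name a polynomial; this assumes 0x07 (x^8 + x^2 + x + 1), init 0x00, no reflection, and no final XOR, the variant catalogued as CRC-8/SMBUS with check value 0xF4.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, MSB first, modeling an 8-bit XOR shift register."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:                        # MSB set: shift and XOR the polynomial
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

packet = bytes([0xDE, 0xAD, 0xBE, 0xEF])
tx = packet + bytes([crc8(packet)])               # append the 8-bit checksum
corrupted = bytearray(tx)
corrupted[1] ^= 0x08                              # flip one bit in flight
print("clean frame remainder:", crc8(tx))         # 0 means the check passes
print("corrupted frame remainder:", crc8(bytes(corrupted)))
```

Because there is no final XOR, running the CRC over the packet plus its appended checksum yields 0, which gives the receiver a one-step check.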
PRBS loopback BIST · CRC-8 error detection · RDL trace RC model
📖 Reference Keywords
CRC8 Verilog XOR shift register · PRBS D2D loopback · RC trace model low-pass filter
Phase 4
PVT Corner Validation (−40 °C to +125 °C)

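MOSFET mobility μ decreases and threshold voltage Vth drops as temperature rises. Run 9 voltage-temperature corners (3 temperatures × 3 VDD levels) in LTspice, repeating for each process corner your models provide, and verify the eye stays open at the worst-case corner.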

Set up a parametric sweep (.step temp list -40 25 125) in LTspice on the CML TX; record the eye height at each temperature.
Compile a corner table: data rate, BER, power at each PVT corner for the final report.
PVT corner simulation · MOSFET μ, Vth vs temperature · Power supply sensitivity PSRR
📖 Reference Keywords
PVT corner simulation SPICE · MOSFET temperature characteristics · SerDes power breakdown
  • LPDDR6 max: 14.4 Gb/s
  • Standard: LPDDR6, JESD209-6 (2025)
  • Triple EQ: DFE + FFE + CTLE
  • Forwarded clock: WCK, CKR 2:1
  • 48-bit sub-channel
  • 20% lower power vs LPDDR5
  • Validated at ISSCC 2026
  • Supply: 1.025 V VDD2C
PHY — Physical Layer Interface IP

The PHY (Physical Layer) is the circuit block that physically drives and receives the electrical signals between the memory controller inside the CPU and the LPDDR6/DDR5 DRAM chip. Think of it as the final analog amplifier stage before the signal leaves the chip. Validated at ISSCC 2026 (the top IC design conference): 14.4 Gb/s per pin using three equalizer stages — FFE (pre-emphasis, from Circuit Theory: boost high freq before sending), CTLE (continuous-time linear equalizer, a passive RC filter), and DFE (decision-feedback: subtract the echo of past bits). 20% lower power than LPDDR5 using dynamic voltage/frequency scaling (DVFS).

TX Driver & FFE

Output driver with ZQ-calibrated impedance matching (50 Ω termination, from Circuit Theory transmission-line concepts) and 2-tap FFE pre-emphasis — boosts high-frequency content to pre-compensate for cable/PCB loss before transmission.

TX
RX & DFE/CTLE

Three-stage equalization chain: CTLE (passive RC high-pass filter from Circuit Theory) → FFE → DFE (4-tap feedback from Electronic Circuits). Per-pin independent DFE adapts to each lane's unique channel loss.

RX
DLL/PLL & WCK LDO

Shared WCK (Write Clock) LDO (Low-Dropout Regulator, a linear voltage regulator from Electronics) reduces clock jitter by 30%, as reported at ISSCC 2026. DQ is sampled on both WCK edges, so WCK runs at half the per-pin data rate (7.2 GHz for 14.4 Gb/s).

Clock
Per-Bit Calibration

Per-bit Phase Interpolator (PI): interpolates between two clock phases to achieve sub-picosecond timing resolution. Full training suite: gate training, read/write leveling, VREF sweep, CA training, ZQ impedance calibration.

Calibration

📐 Project Roadmap — Step-by-Step Design Guide
Phase 1
What Is a PHY? — The Final Analog Stage to DRAM

The PHY is the last analog circuit inside the CPU that physically drives DQ (data) wires to DRAM at 14.4 Gb/s — one bit every 69 ps. It also samples incoming DQ using the DQS strobe clock and calibrates timing/impedance at every power-on.

Calculate: UI = 1 / 14.4 Gb/s = 69 ps. Express the required timing margin as % of UI.
Summarize the PHY's three jobs: ① TX drive, ② RX sample on DQS edge, ③ calibration (ZQ, VREF, leveling).
PHY (Physical Layer) · UI (Unit Interval) · DFI (DDR PHY Interface) · DQS strobe clock
📖 Reference Keywords
LPDDR6 PHY overview · DFI DDR PHY interface standard · memory PHY design introduction
Phase 2
TX FFE and RX DFE/CTLE Equalizer Design

PCB traces act as RC low-pass filters, smearing bits (ISI). The TX applies FFE (pre-emphasize the first edge); the RX applies CTLE (RC high-pass) then DFE (subtract predicted ISI of previous bits).

In LTspice, show how a 50 Ω / 1 pF RC load rounds a 10 Gb/s pulse; calculate f_3dB = 1/(2πRC).
Simulate a 2-tap FFE driver (pre-cursor + main tap); compare eye height with vs without FFE.
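A first-order RC sketch of the 50 Ω / 1 pF task in Python: τ = RC = 50 ps gives f_3dB = 1/(2πRC) ≈ 3.18 GHz. The code feeds one 100 ps bit (10 Gb/s) into the RC and samples the output at the end of each UI, showing how much of the pulse spills into later bits as ISI. An ideal step source and a single-pole load are assumed.

```python
import math

R, C = 50.0, 1e-12      # the 50 ohm / 1 pF load from the task
UI = 1 / 10e9           # 10 Gb/s -> 100 ps per bit
tau = R * C
f3db = 1 / (2 * math.pi * tau)

def pulse_response(t):
    """Output of a first-order RC low-pass to a 0 -> 1 -> 0 pulse lasting one UI."""
    if t < 0:
        return 0.0
    if t <= UI:
        return 1 - math.exp(-t / tau)
    peak = 1 - math.exp(-UI / tau)
    return peak * math.exp(-(t - UI) / tau)

# Sample at the end of each bit: the main cursor, then the post-cursor ISI left behind.
cursors = [pulse_response(k * UI) for k in range(1, 4)]
print(f"tau = {tau * 1e12:.0f} ps, f_3dB = {f3db / 1e9:.2f} GHz")
print("cursor, h1, h2 =", [round(c, 3) for c in cursors])
```

The main cursor reaches only 1 − e⁻² ≈ 0.86 of full swing and about 0.12 lands in the next bit; that post-cursor is the ISI the equalizers have to remove.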
FFE (Feed-Forward Equalization) · DFE (Decision Feedback Equalizer) · CTLE RC filter · ISI (Inter-Symbol Interference)
📖 Reference Keywords
FFE pre-emphasis high-speed link · DFE equalizer MATLAB simulation · CTLE transfer function
Phase 3
Phase Interpolator and Training Sequence

A Phase Interpolator (PI) blends two quadrature clocks to produce any phase with ~1 ps resolution. The training sequence auto-corrects every DQ lane: gate training → write leveling → VREF sweep → ZQ calibration.

Derive PI output: α×CK_0° + (1−α)×CK_90°; draw the weighted current-mode diff-pair circuit.
Write VREF training pseudocode: binary search low/high until BER = 0; converges in 7 iterations.
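A numeric check of the PI derivation, assuming ideal sinusoidal quadrature clocks: CK_0 = cos(ωt) and CK_90 = sin(ωt). The weighted sum α·cos(ωt) + (1−α)·sin(ωt) equals A·cos(ωt − φ) with φ = atan2(1−α, α) and A = √(α² + (1−α)²); the function name is illustrative.

```python
import math

def pi_output(alpha):
    """alpha*CK_0 + (1 - alpha)*CK_90 written as A*cos(wt - phi).

    Returns (phi in degrees, amplitude A relative to one input clock).
    """
    phi = math.degrees(math.atan2(1 - alpha, alpha))
    amp = math.hypot(alpha, 1 - alpha)
    return phi, amp

for alpha in (1.0, 0.75, 0.5, 0.25, 0.0):
    phi, amp = pi_output(alpha)
    ideal = 90 * (1 - alpha)           # perfectly linear phase, for comparison
    print(f"alpha={alpha:4.2f}  phase={phi:5.1f} deg (ideal {ideal:4.1f})  amplitude={amp:.3f}")
```

Phase is exact at α = 0, 0.5, and 1 but bends in between (about 18.4° instead of 22.5° at α = 0.75), and the amplitude dips to 0.707 at α = 0.5, so equal weight steps do not give perfectly equal phase steps.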
Phase Interpolator (PI) · Write Leveling DQ alignment · VREF binary search · ZQ impedance calibration
📖 Reference Keywords
phase interpolator current mode design · DDR write leveling algorithm · VREF binary search PHY training
Phase 4
Low-Power Design — DVFS and Clock Gating

LPDDR6 PHY saves 20% power vs LPDDR5. Key techniques: Clock Gating (cut clock to idle blocks), DVFS (scale VDD and frequency together: P = C × VDD² × f). Verify Eye Diagram at worst-case PVT meets JEDEC spec.

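Implement clock gating in Verilog (assign gated_clk = clk & enable;) and explain why it cuts toggle power; then explain why a bare AND gate can glitch if enable changes while clk is high, which is why ICG cells latch enable while clk is low.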
Calculate DVFS savings: if VDD 1.1→0.9 V and f 7.2→5.0 GHz, compute P_new/P_old using P = CfV².
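The DVFS arithmetic from the task, assuming the switched capacitance and activity factor stay fixed and ignoring leakage.

```python
def dynamic_power_ratio(v_old, v_new, f_old, f_new):
    """P = C * f * V^2 with C (and activity) unchanged -> P_new / P_old."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

ratio = dynamic_power_ratio(1.1, 0.9, 7.2e9, 5.0e9)
energy_ratio = (0.9 / 1.1) ** 2          # energy per transition scales with V^2 only
print(f"P_new/P_old = {ratio:.3f} -> {100 * (1 - ratio):.1f}% dynamic-power saving")
print(f"E_new/E_old per bit = {energy_ratio:.3f}")
```

Power falls by more than half, but energy per bit falls only by about a third, because the lower frequency also delivers fewer bits per second.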
Clock Gating ICG cell · DVFS (Dynamic Voltage Frequency Scaling) · P = CfV² power equation
📖 Reference Keywords
clock gating Verilog low power · DVFS dynamic voltage frequency scaling · DDR PHY power breakdown
  • HBM4E per pin: >12.8 Gb/s
  • HBM3E: 8.4 Gb/s
  • Package: 2.5D / 3D
  • TSV (through-silicon via)
  • Fine-pitch RDL: 2 µm
  • Loopback test: BIST
  • Silicon-proven
  • Target platform: AI / HPC
HBM — High Bandwidth Memory Interface

HBM is a 3D-stacked (TSV-based) DRAM interface for AI and HPC SoCs. A silicon-proven HBM3E controller + PHY supports 2.5D packaging with TGV/TSV interposers, micro-bumps (solder balls, 55 µm pitch), and 2 µm/5 µm/10 µm RDL. The next target is HBM4E (12.8 Gb/s) with DVFS.

HBM4E / HBM4 Controller

HBM4E/4 MC (Memory Controller) maximizes data throughput with advanced scheduling. On-chip digital calibration and training with BIST loopback verifies every channel at power-on.

HBM4E Ctrl
HBM3E / HBM3 Controller

Silicon-proven HBM3E controller at 8.4 Gb/s per pin. Multi-channel architecture (16 channels × 64-bit DQ data bus each, 1024 bits total) with built-in BIST and full loopback self-test.

HBM3E Ctrl
HBM PHY & Interposer I/F

HBM PHY connects to the GPU through micro-bumps (tiny solder balls ~40 µm pitch) and RDL (Re-Distribution Layer: metal routing on the interposer, like PCB traces but at µm scale). Interposer uses TGV or pseudo-TSV for vertical connections.

PHY
2.5D/3D Package & RDL

Professional RDL routing at 2 µm, 5 µm, and 10 µm pitch options. Full SI/PI (Signal Integrity / Power Integrity) co-design: minimizes crosstalk between DQ data lanes and IR-drop on the power grid.

Package

📐 Project Roadmap — Step-by-Step Design Guide
Phase 1
Why HBM? — AI Needs >1 TB/s of Memory Bandwidth

Training an LLM requires reading billions of weight values per second. A DDR5-6400 DIMM delivers ~51 GB/s; HBM3E delivers >1 TB/s per stack by stacking 8–12 DRAM dies vertically with TSV (Through-Silicon Via) and placing the stack millimeters from the GPU.

Fill a comparison table: DDR5 vs HBM3E — bandwidth, capacity, power/GB/s, pin count, package distance.
Explain TSV: a vertical copper wire drilled through a silicon die (~5 µm diameter) — draw the 3D cross-section.
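Peak bandwidth is bus width times per-pin rate; this sketch fills in the bandwidth row of the comparison table. It assumes a DDR5-6400 DIMM (two 32-bit sub-channels, ECC bits excluded) and one 1024-bit HBM3E stack at the 8.4 Gb/s per-pin rate quoted on this page.

```python
def peak_bandwidth_gbs(width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s = data pins x per-pin rate (Gb/s) / 8 bits per byte."""
    return width_bits * gbps_per_pin / 8

ddr5 = peak_bandwidth_gbs(64, 6.4)      # one DDR5-6400 DIMM, 64 data bits
hbm3e = peak_bandwidth_gbs(1024, 8.4)   # one HBM3E stack, 1024 data bits
print(f"DDR5-6400: {ddr5:.1f} GB/s, HBM3E stack: {hbm3e:.1f} GB/s, ratio ~{hbm3e / ddr5:.0f}x")
```

Most of the roughly 21× gap comes from the 16× wider bus, which only short interposer routing makes practical.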
HBM (High Bandwidth Memory) · TSV (Through-Silicon Via) · 3D stacking · AI memory bandwidth bottleneck
📖 Reference Keywords
HBM vs DDR5 bandwidth comparison · TSV 3D stacked memory · NVIDIA H100 HBM3 specification
Phase 2
HBM PHY — Ultra-Short-Reach High-Density Interface

HBM link distance is <2 mm, so voltage swing is only ~200 mV (vs 600 mV for DDR5 on PCB). The 1024-bit DQ interface per stack (16 channels × 64 bits in HBM3/HBM3E) is possible because routing stays on the interposer, with no PCB trace fan-out.

Explain LVDS: two wires carry DQ and DQ_n; the differential receiver cancels common-mode noise (CMRR from Electronic Circuits).
Calculate micro-bump density at 55 µm pitch: (1000/55)² ≈ 330 bumps/mm². Compare to BGA (1 mm pitch).
LVDS (Low-Voltage Differential Signaling) · Micro-bump 55 µm pitch · RDL (Re-Distribution Layer) · CMRR differential receiver
📖 Reference Keywords
HBM PHY micro-bump interposer · LVDS differential signaling circuit · RDL routing SI PI
Phase 3
Memory Controller — Scheduling, ECC, and BIST

The MC (Memory Controller) schedules READ/WRITE requests (Open-Page vs Closed-Page policy), corrects 1-bit errors with SECDED Hamming ECC, and self-tests with BIST patterns (Address Sweep, Checkerboard, March C).

Compare Open-Page vs Closed-Page scheduling: when does each give lower average latency?
Implement Hamming(72,64) ECC encoder/decoder in Verilog; flip 1 bit and verify auto-correction.
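A bit-level Python reference for the Hamming(72,64) SECDED task, handy for checking the Verilog. It uses the textbook layout (an assumption; real controllers often use other column orderings): Hamming check bits at positions 1, 2, 4, ..., 64, the 64 data bits at the remaining positions 1–71, and an overall parity bit at position 0 that upgrades single-error correction to double-error detection.

```python
DATA_POS = [p for p in range(1, 72) if p & (p - 1)]   # 64 non-power-of-two positions
PARITY_POS = [1, 2, 4, 8, 16, 32, 64]                 # 7 Hamming check bits
# Position 0 holds the overall parity bit (72 bits total).

def encode(data):
    """Encode a 64-bit integer into a 72-entry list of bits."""
    word = [0] * 72
    for i, pos in enumerate(DATA_POS):
        word[pos] = (data >> i) & 1
    for p in PARITY_POS:
        word[p] = sum(word[q] for q in range(1, 72) if q & p) % 2
    word[0] = sum(word) % 2           # make the total number of 1s even
    return word

def decode(word):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    w = list(word)
    syndrome = 0
    for q in range(1, 72):
        if w[q]:
            syndrome ^= q             # XOR of the positions holding a 1
    overall = sum(w) % 2
    if overall == 0 and syndrome != 0:
        return None, "uncorrectable"  # even number of flips: double-bit error
    if overall == 1:
        if syndrome > 71:
            return None, "uncorrectable"
        w[syndrome] ^= 1              # syndrome 0 means the overall bit itself flipped
        status = "corrected"
    else:
        status = "ok"
    data = 0
    for i, pos in enumerate(DATA_POS):
        data |= w[pos] << i
    return data, status

d = 0xDEADBEEFCAFEF00D
cw = encode(d)
bad = cw.copy()
bad[37] ^= 1                          # single-bit error
print(decode(cw)[1], decode(bad)[1], hex(decode(bad)[0]))
```

A single flip anywhere in the 72 bits, including a check bit, is corrected; two flips leave the overall parity even with a nonzero syndrome and are reported as uncorrectable.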
Open-Page vs Closed-Page scheduling · SECDED Hamming ECC · Memory BIST patterns · Thermal throttling
📖 Reference Keywords
memory controller scheduling policy · Hamming SECDED ECC Verilog · memory BIST march C pattern
Phase 4
2.5D Package and AI System Validation

The 2.5D package places GPU + HBM stacks on a silicon interposer. AI matrix multiplies at small batch sizes are memory-bandwidth-limited (Roofline model), while large batches become compute-bound. Verify bandwidth, power, and the DVFS optimal operating point.

Draw the 2.5D cross-section: PCB BGA → C4 bumps → Silicon interposer → micro-bumps → HBM stack + GPU.
Use the Roofline model: compute arithmetic intensity (FLOP/byte) for GEMM at batch size 1 vs 1024 — show the memory-bound regime.
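A Roofline sketch for the GEMM task. It assumes a 4096 × 4096 FP16 weight matrix multiplied by a 4096 × batch activation matrix, every operand moved to or from memory exactly once, and a hypothetical accelerator with 1000 TFLOP/s peak and 3 TB/s of HBM bandwidth; none of these are measured figures.

```python
def gemm_intensity(m, k, n, bytes_per_elem=2):
    """Arithmetic intensity (FLOP/byte) of C[m x n] = A[m x k] @ B[k x n].

    Assumes every operand is moved to/from memory exactly once (ideal reuse).
    """
    flops = 2 * m * k * n
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic

def roofline(ai, peak_flops, mem_bw):
    """Attainable FLOP/s = min(compute roof, bandwidth x intensity)."""
    return min(peak_flops, mem_bw * ai)

PEAK = 1000e12   # hypothetical accelerator: 1000 TFLOP/s (FP16)
BW = 3.0e12      # hypothetical memory bandwidth: 3 TB/s
for batch in (1, 1024):
    ai = gemm_intensity(4096, 4096, batch)
    perf = roofline(ai, PEAK, BW)
    bound = "memory" if BW * ai < PEAK else "compute"
    print(f"batch {batch:4d}: AI = {ai:7.1f} FLOP/B, attainable = {perf / 1e12:6.1f} TFLOP/s ({bound}-bound)")
```

At batch 1 each weight byte feeds only about one FLOP, so performance is capped near 3 TFLOP/s by bandwidth; at batch 1024 the intensity (~683 FLOP/byte) exceeds the ridge point of 1000/3 ≈ 333 FLOP/byte and the kernel becomes compute-bound.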
2.5D interposer package · Roofline model compute vs memory bound · GEMM matrix multiply bandwidth · DVFS optimal point
📖 Reference Keywords
2.5D package silicon interposer HBM GPU · Roofline model AI inference · HBM DVFS power optimization

Helpful Educational Videos

What is Integrated Circuits?

What is 5G HPC IoT?

Future 6G and AI Systems

Advanced Circuit Design Concepts