Teaching
Our comprehensive curriculum covers fundamental circuit theory to advanced integrated circuit design, preparing students for cutting-edge research and industry applications in analog IC design and AI systems.
Graduate Courses
- RF Integrated Circuit (RFIC) Design 2019 ~ 2024
- Memory Interface Design 2025
Undergraduate Courses
- Analog Circuit Design/Lab (AC) Fall 2019 ~ Present
- Electric Circuit 1, 2 (EC) 2010 ~ Present
- Circuit Theory (CT) 2017 ~ 2023
Capstone Design Topics
Click a topic below to explore architecture slides, key specs, and a step-by-step design guide for each capstone project.
The CD (Clock Driver) sits between the CPU and multiple DRAM chips on a DDR5 RDIMM/LRDIMM server memory module. It re-drives the address/command (CA) bus, chip-select signals, and clock — like an amplifier that boosts a signal before broadcasting it to many speakers. Features: dual independent channels, low-jitter PLL (Phase-Locked Loop), 8-tap DFE (Decision Feedback Equalizer, a circuit from your Electronics course) receiver, parity error checking, and I2C/I3C serial control bus. Silicon-proven at 4800–9600+ MT/s.
- CA Bus: Receives 7-bit DDR CA signals and re-drives them as 14-bit SDR CA ×2 to the DRAM ranks. PODL signaling draws no static (DC) current while a line is driven high, unlike SSTL, which draws current in both logic states.
- Clock Driver: Provides 4 independent differential clock pairs per board side. Each DRAM group gets its own clock, minimizing clock skew and maximizing signal integrity.
- Signal Integrity: 8-tap DFE (Decision Feedback Equalizer, a classic feedback circuit from Electronic Circuits) with 1.5 mV step size. Corrects inter-symbol interference (ISI) at 9600+ MT/s.
- Control Bus: Per-channel parity checking with ALERT_n error flag. I3C serial bus (12.5 MHz) for register access and BMC communication.
The CD chip sits between the CPU and multiple DRAM chips on a DDR5 RDIMM. Like a speaker amplifier, it re-drives the address/command (CA) bus and clock so many DRAMs can be driven at full speed.
Inside the CD: a PLL synthesizes multiple DRAM clocks from the CPU reference; an 8-tap DFE receiver corrects ISI on the high-speed CA bus.
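To make the DFE idea concrete, here is a minimal behavioral sketch in Python. It assumes a toy channel with two post-cursor ISI taps (the real receiver uses 8 taps) and assumes the equalizer taps match the channel exactly; the tap values and function names are illustrative only, not taken from the actual design.

```python
def channel(symbols, post_cursors):
    """Toy channel: main cursor 1.0 plus trailing post-cursor ISI."""
    out = []
    for n, s in enumerate(symbols):
        y = float(s)
        for k, h in enumerate(post_cursors, start=1):
            if n - k >= 0:
                y += h * symbols[n - k]
        out.append(y)
    return out


def dfe_receive(samples, taps):
    """Subtract ISI predicted from past decisions, then slice at 0."""
    history = [0.0] * len(taps)          # most recent decision first
    decisions = []
    for x in samples:
        corrected = x - sum(t * d for t, d in zip(taps, history))
        bit = 1.0 if corrected >= 0 else -1.0
        decisions.append(bit)
        history = [bit] + history[:-1]
    return decisions


tx = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1]   # NRZ symbols (+1 / -1)
isi = [0.6, 0.5]                            # hypothetical post-cursor taps
rx = channel(tx, isi)

no_eq = [1.0 if x >= 0 else -1.0 for x in rx]   # plain slicer, no equalization
with_dfe = dfe_receive(rx, isi)

print("errors without DFE:", sum(a != b for a, b in zip(no_eq, tx)))
print("errors with DFE:   ", sum(a != b for a, b in zip(with_dfe, tx)))
```

In this sketch the ISI taps sum to more than the main cursor, so the plain slicer makes mistakes whenever both previous bits oppose the current one, while the DFE recovers every bit.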
An FSM decodes commands arriving over I2C/I3C (2-wire serial bus), updates control registers, and asserts ALERT_n on parity errors.
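The parity check that drives ALERT_n can be sketched in a few lines of Python. This is a simplified model that assumes even parity over a single 7-bit CA word; the function names are hypothetical, and the real parity coverage is defined by the JEDEC specification.

```python
def even_parity(bits):
    """XOR of all bits: 0 when the number of ones is even."""
    p = 0
    for b in bits:
        p ^= b
    return p


def alert_n(ca_bits, par_bit):
    """Return the ALERT_n level (active low): 1 = no error, 0 = parity error."""
    return 1 if even_parity(ca_bits) == par_bit else 0


ca = [1, 0, 1, 1, 0, 0, 1]          # hypothetical 7-bit CA word
par = even_parity(ca)               # parity bit sent by the host
print(alert_n(ca, par))             # clean transfer

corrupted = ca.copy()
corrupted[2] ^= 1                   # single bit flipped on the bus
print(alert_n(corrupted, par))      # parity mismatch pulls ALERT_n low
```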
Validate signal quality on a real DDR5 board: measure the Eye Diagram, verify impedance matching (50 Ω), and confirm ZQ calibration and PLL lock time.
The DB (Data Buffer) buffers the DQ (data) and DQS (data-strobe sampling clock) signals between the CPU memory controller and DRAM chips on an LRDIMM module. One DB connects a 4-bit host-side data bus to two ×4 DRAM chips, reducing the electrical load so the system can fit more DRAM chips — doubling memory capacity vs. standard RDIMM. Uses DFE equalization (a feedback circuit you study in Electronic Circuits), DQS clock regeneration, and BCOM serial control from the CD chip.
- Host I/F: Two 4-bit bidirectional DQ (data) buses with DQS (data-strobe sampling clock) signals. VDD-terminated output drivers present a well-defined impedance to the memory controller, an application of impedance matching from Circuit Theory.
- DRAM I/F: Each DB drives two ×4 DRAM chips with a fresh 4-bit DQ interface. DQS (sampling clock) is regenerated, not just buffered, to restore clean clock edges for the DRAM. Capacitive load on the bus is drastically reduced.
- Signal Integrity: DFE (Decision Feedback Equalizer) applied on both host-side and DRAM-side RX paths. Removes ISI caused by PCB trace inductance and capacitance, a direct application of your circuit theory filter concepts.
- Control: BCOM (4-bit sideband bus from the CD chip) carries control commands to the DB: enter loopback, change impedance, power-down mode. Dedicated ZQ output impedance calibration and parity error alert.
On a standard RDIMM, every DQ (data) line connects directly to all DRAM chips, creating heavy capacitive load. The DB isolates this load, letting the system double the number of DRAM chips and therefore the capacity.
The DB core is a bidirectional tristate buffer: it switches direction (Write: CPU→DRAM / Read: DRAM→CPU) based on the DQS preamble, and applies DFE to restore a clean eye.
The DB receives 4-bit BCOM commands from the CD chip (BCK clock, BCS_n select). A decoder interprets each code — loopback mode, power-down, drive-strength change — and updates internal registers.
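A command decoder of this kind can be prototyped in Python before writing RTL. The sketch below is only illustrative: the opcode values, register names, and the "step the drive strength" behavior are hypothetical placeholders, not the actual BCOM encoding.

```python
# Opcode values are hypothetical placeholders, not the real BCOM encoding.
OPCODES = {0x1: "loopback", 0x2: "power_down", 0x3: "drive_strength"}


def decode_bcom(code, bcs_n, regs):
    """Apply one 4-bit BCOM code to a register dict; ignore it unless BCS_n is low."""
    if bcs_n != 0:
        return dict(regs)                    # not selected: registers unchanged
    name = OPCODES.get(code & 0xF)
    if name is None:
        return dict(regs, error=True)        # unknown code: flag it
    if name == "drive_strength":
        step = (regs["drive_strength"] + 1) % 4
        return dict(regs, drive_strength=step)
    return dict(regs, **{name: True})


reset = {"loopback": False, "power_down": False, "drive_strength": 0, "error": False}
state = decode_bcom(0x1, bcs_n=0, regs=reset)    # enter loopback
state = decode_bcom(0x3, bcs_n=0, regs=state)    # bump drive strength
print(state)
```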
Run a Write-then-Read test across CD + 10 DBs + 20 DRAMs. Verify the Eye Diagram stays open after passing through a DB, and that latency (CL, tRCD, tRP) meets JEDEC spec.
BIST (Built-in Self-Test) integrates three test engines on a single chip: the CD BIST (tests the clock/address path), the DB BIST (tests each DQ data line and DQS strobe clock), and an MC (Memory Controller) exerciser. Instead of relying on expensive ATE (Automatic Test Equipment worth millions of dollars), the chip tests itself at power-on: it generates PRBS-31 pseudo-random patterns, counts bit errors per DQ data lane to obtain the BER, sweeps voltage and timing to draw a 2D eye-opening map, and reports pass/fail over I2C/I3C.
- Clock/CA Path: Clock/CA path self-test: PLL lock verification, parity checking, DCS/DCA training, ALERT_n assertion test, and per-channel loopback, all without external test equipment.
- DQ/DQS Path: Tests all 10 DB chips simultaneously via BCOM: sends PRBS (pseudo-random bit sequence) patterns through every DQ (data) line, verifies DFE equalization, ZQ impedance calibration, and signal recovery.
- Protocol: MC (Memory Controller) exerciser replays full JEDEC command sequences (ACTIVATE, READ, WRITE, PRECHARGE, REFRESH) with multi-rank interleaving and power state transitions, stressing the complete path.
- Diagnostics: Sweeps sampling voltage (VREF) and sampling time across a 2D grid for every DQ (data) lane, drawing an eye-opening "shmoo" map. BER (Bit Error Rate) threshold from 10⁻⁹ to 10⁻¹⁵; result reported via I2C/I3C.
ATE (Automatic Test Equipment) costs $2 M–$10 M. BIST moves the test logic onto the chip itself. At power-on, the chip runs three loopback segments (CD path, DB path, full path) and reports pass/fail over I2C/I3C.
PRBS (Pseudo-Random Binary Sequence) is generated by an LFSR (Linear Feedback Shift Register): D flip-flops in a chain with XOR feedback. PRBS-31 produces 2³¹−1 bits before repeating — ideal for catching all bit-transition errors.
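The LFSR described above is easy to model in software. The following Python sketch is a bit-level model, not the on-chip implementation: the function name and seed are illustrative, and the tap sets used are the standard PRBS-7 (x⁷ + x⁶ + 1) and PRBS-31 (x³¹ + x²⁸ + 1) polynomials.

```python
def prbs(taps, nbits, seed=1):
    """Fibonacci LFSR: taps are 1-indexed stages XORed to form the feedback bit."""
    order = max(taps)
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("an all-zero state locks up the LFSR")
    out = []
    for _ in range(nbits):
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        out.append((state >> (order - 1)) & 1)   # output the last stage
        state = ((state << 1) | fb) & mask       # shift and insert feedback
    return out


# PRBS-7 is short enough to check by hand: it repeats every 2**7 - 1 = 127 bits.
seq7 = prbs((7, 6), 254)
print(seq7[:127] == seq7[127:])

# PRBS-31 uses the same structure with 31 flip-flops and taps at stages 31 and 28.
first_bits = prbs((31, 28), 16, seed=0x7FFFFFFF)
```

PRBS-7 is used for the period check because stepping through all 2³¹ − 1 states of PRBS-31 in Python would take far too long; the structure is identical.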
Sweep VREF (Y axis, DAC-controlled) and sampling time (X axis, delay-chain-controlled) across a 2D grid. At each point, measure BER. The "green" (pass) region forms the eye contour — wider = better design.
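The sweep itself is a pair of nested loops. The Python sketch below shows the bookkeeping only: `toy_ber` is a made-up stand-in for a real BER measurement (it models the eye as an ellipse with hypothetical dimensions), and the grid spacing is arbitrary.

```python
def toy_ber(vref_mv, t_ps, eye_h_mv=60.0, eye_w_ps=30.0):
    """Hypothetical BER model: very low inside an elliptical eye, high outside."""
    r = (vref_mv / (eye_h_mv / 2)) ** 2 + (t_ps / (eye_w_ps / 2)) ** 2
    return 1e-15 if r <= 1.0 else 1e-3


def shmoo(vrefs, times, ber_fn, threshold=1e-9):
    """Return a 2D pass/fail grid: rows follow vrefs, columns follow times."""
    return [[ber_fn(v, t) <= threshold for t in times] for v in vrefs]


vrefs = range(-40, 41, 10)     # VREF offset in mV (Y axis)
times = range(-20, 21, 5)      # sampling offset in ps (X axis)
grid = shmoo(vrefs, times, toy_ber)

for row in grid:
    print("".join("#" if ok else "." for ok in row))

center = grid[len(grid) // 2]                 # row at VREF offset 0
eye_width_ps = (sum(center) - 1) * 5          # passing points times grid step
print("eye width at center VREF:", eye_width_ps, "ps")
```

On hardware, `toy_ber` would be replaced by a routine that programs the VREF DAC and delay chain, runs PRBS traffic, and reads back the error counter.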
Chain all BIST blocks into a top-level FSM: PLL Lock → CD Loopback → DB Loopback (×10) → Full Path → Report. Results written to a register file; MCU reads pass/fail via I2C.
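The top-level sequencing can be prototyped as a small state machine in Python before it is written in HDL. In this sketch the stage names follow the sequence above, while the `checks` callables are hypothetical stand-ins for the real hardware tests; stopping at the first failure is an assumed policy.

```python
STAGES = ["PLL_LOCK", "CD_LOOPBACK", "DB_LOOPBACK", "FULL_PATH"]


def run_bist(checks, num_db=10):
    """Run the stages in order, stop at the first failure, then build the report.

    checks maps each stage name to a callable that returns True on pass.
    The DB_LOOPBACK callable takes the DB index (0 .. num_db - 1).
    """
    results = {}
    for stage in STAGES:
        if stage == "DB_LOOPBACK":
            passed = all(checks[stage](i) for i in range(num_db))
        else:
            passed = checks[stage]()
        results[stage] = passed
        if not passed:
            break                                # later stages are skipped
    results["REPORT"] = all(results.get(s, False) for s in STAGES)
    return results


all_pass = {
    "PLL_LOCK": lambda: True,
    "CD_LOOPBACK": lambda: True,
    "DB_LOOPBACK": lambda i: True,
    "FULL_PATH": lambda: True,
}
print(run_bist(all_pass))
```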
A D2D (Die-to-Die) link connects two separate chips placed side-by-side on the same package substrate, like a very short, very fast cable between chips. It supports 18 Gb/s and 36 Gb/s using SerDes (Serializer/Deserializer: converts parallel data to serial and back). Features a forwarded clock (the transmitter sends the clock along with the data, like the SCK line in SPI), on-chip PRBS self-test, and optional CRC/ECC error detection and correction. Validated over glass and organic interposers at −40°C to +125°C.
- TX: High-speed TX with a low-jitter PLL (Phase-Locked Loop) and differential clock pair (like the differential amplifier in your Electronics course) for reliable data transmission at 18–36 Gb/s.
- RX: RX with automatic timing calibration (aligns the sampling clock to the eye center) and embedded BIST for end-of-line production testing without external ATE.
- Self-Test: Per-lane PRBS pattern generator and checker built in. Read/write leveling, lane-to-lane de-skew, and timing calibration all run on-chip automatically at power-on.
- Clock: TX sends a forwarded clock alongside the data lanes (source-synchronous clocking, the same concept as the SPI/I2S protocols you know). Each channel has a programmable delay element for fine-grained skew compensation.
Modern CPUs (AMD, Intel) are built from multiple small chiplets on one package. A D2D link connects them at 18–36 Gb/s over a millimeter-scale distance, far shorter than a USB cable, so it needs a lower voltage swing and no heavy equalization.
The TX uses a CML (Current Mode Logic) differential pair driver (from Electronic Circuits). The RX uses a DLL (Delay-Locked Loop) to align the forwarded clock to the center of the received data eye.
At power-on, the TX sends PRBS through a loopback path to the RX; any bit error flags a faulty link. CRC-8 (polynomial division) appends an 8-bit checksum to each packet; the receiver re-computes the checksum and compares.
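The CRC-8 step is a bitwise polynomial division and can be sketched in Python as follows. The generator polynomial is not specified here, so this example assumes the common x⁸ + x² + x + 1 (0x07) with a zero initial value; the payload bytes are arbitrary.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, MSB first. poly=0x07 is an assumed generator polynomial."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF   # subtract (XOR) the divisor
            else:
                crc = (crc << 1) & 0xFF
    return crc


payload = bytes([0xDE, 0xAD, 0xBE, 0xEF])          # arbitrary example packet
packet = payload + bytes([crc8(payload)])          # TX appends the checksum

# RX side: dividing the whole packet leaves a zero remainder when it is intact.
print(crc8(packet) == 0)

damaged = bytes([packet[0] ^ 0x01]) + packet[1:]   # flip one bit in transit
print(crc8(damaged) == 0)
```

With a zero initial value and no final XOR, the receiver can either recompute and compare, or run the CRC over payload plus checksum and test for zero; the two checks are equivalent.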
As temperature rises, MOSFET mobility μ decreases and threshold voltage Vth drops. Run 9 PVT corners (3 temperatures × 3 VDD values) in LTspice and verify the eye stays open at the worst-case corner.
The PHY (Physical Layer) is the circuit block that physically drives and receives the electrical signals between the memory controller inside the CPU and the LPDDR6/DDR5 DRAM chip. Think of it as the final analog amplifier stage before the signal leaves the chip. Validated at ISSCC 2026 (the top IC design conference): 14.4 Gb/s per pin using three equalizer stages: FFE (pre-emphasis, from Circuit Theory: boost high-frequency content before sending), CTLE (continuous-time linear equalizer, a passive RC filter), and DFE (decision feedback: subtract the echo of past bits). 20% lower power than LPDDR5 using dynamic voltage/frequency scaling (DVFS).
- TX: Output driver with ZQ-calibrated impedance matching (50 Ω termination, from Circuit Theory transmission-line concepts) and 2-tap FFE pre-emphasis, which boosts high-frequency content to pre-compensate for cable/PCB loss before transmission.
- RX: Three-stage equalization chain: CTLE (passive RC high-pass filter from Circuit Theory) → FFE → DFE (4-tap feedback from Electronic Circuits). Per-pin independent DFE adapts to each lane's unique channel loss.
- Clock: Shared WCK (Write Clock) LDO (Low-Dropout Regulator, a linear voltage regulator from Electronics) reduces clock jitter by 30% (measured at ISSCC 2026). WCK is a dedicated data clock that runs faster than the command clock for precise DQ sampling.
- Calibration: Per-bit Phase Interpolator (PI): interpolates between two clock phases to achieve picosecond-scale timing resolution. Full training suite: gate training, read/write leveling, VREF sweep, CA training, ZQ impedance calibration.
The PHY is the last analog circuit inside the CPU that physically drives DQ (data) wires to DRAM at 14.4 Gb/s, or one bit every 69 ps. It also samples incoming DQ using the DQS strobe clock and calibrates timing/impedance at every power-on.
PCB traces act as RC low-pass filters, smearing bits (ISI). The TX applies FFE (pre-emphasize the first edge); the RX applies CTLE (RC high-pass) then DFE (subtract predicted ISI of previous bits).
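The TX-side FFE can be illustrated with a two-tap FIR filter in Python. This is a sketch under assumptions: the tap weight of 0.25 is hypothetical, and the filter is normalized so the peak output equals the full swing (the de-emphasis form of a 2-tap FFE).

```python
def ffe_2tap(symbols, post=0.25):
    """2-tap TX FFE: y[n] = (1 - post) * x[n] - post * x[n-1]."""
    main = 1.0 - post
    out, prev = [], 0.0
    for s in symbols:
        out.append(main * s - post * prev)
        prev = s
    return out


bits = [-1, -1, 1, 1, 1, -1]          # NRZ symbols
print(ffe_2tap(bits))
```

A bit that differs from its predecessor is sent at full swing (±1.0), while repeated bits drop to half swing (±0.5). Transitions carry the high-frequency content, so this shaping counteracts the low-pass behavior of the trace.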
A Phase Interpolator (PI) blends two quadrature clocks to produce any phase with ~1 ps resolution. The training sequence auto-corrects every DQ lane: gate training → write leveling → VREF sweep → ZQ calibration.
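The blending can be written down directly: mixing an in-phase clock cos(x) and a quadrature clock sin(x) with weights (1 − w) and w gives a sinusoid delayed by φ = atan2(w, 1 − w). The Python sketch below evaluates that expression; the 5-bit control code and the 138.9 ps clock period are hypothetical values chosen only to land near the ~1 ps step mentioned above.

```python
import math


def pi_phase_deg(code, bits=5):
    """Output phase of (1 - w) * I + w * Q for quadrature clocks I (0°) and Q (90°)."""
    w = code / (1 << bits)
    return math.degrees(math.atan2(w, 1.0 - w))


def avg_step_ps(period_ps, bits=5):
    """Average time step when one quadrant (period / 4) is split into 2**bits codes."""
    return period_ps / 4 / (1 << bits)


for code in (0, 8, 16, 24, 32):
    print(code, round(pi_phase_deg(code), 2))

print(round(avg_step_ps(138.9), 3), "ps per code (hypothetical 138.9 ps clock)")
```

With linear weights the phase steps are not perfectly uniform across the quadrant, which is one reason practical interpolators are calibrated during training.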
LPDDR6 PHY saves 20% power vs LPDDR5. Key techniques: Clock Gating (cut clock to idle blocks), DVFS (scale VDD and frequency together: P = C × VDD² × f). Verify Eye Diagram at worst-case PVT meets JEDEC spec.
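The DVFS formula is easy to check numerically. In this Python sketch the capacitance, voltage, and frequency values are hypothetical and chosen only to show the scaling; they are not measurements from the LPDDR6 PHY.

```python
def dynamic_power(c_eff, vdd, freq):
    """Dynamic switching power P = C * VDD^2 * f (in watts for SI inputs)."""
    return c_eff * vdd ** 2 * freq


nominal = dynamic_power(1e-9, 1.0, 1.0e9)    # hypothetical: 1 nF, 1.0 V, 1 GHz
scaled = dynamic_power(1e-9, 0.9, 0.9e9)     # VDD and f both reduced by 10%

saving = 1.0 - scaled / nominal
print(f"power saving: {saving:.1%}")
```

Because VDD enters squared, lowering voltage and frequency by 10% each cuts dynamic power by about 27%, which is why the two are scaled together.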
3D-stacked (TSV-based) DRAM interface for AI/HPC SoCs. Silicon-proven HBM3E controller + PHY. 2.5D packaging with TGV/TSV, micro-bumps (solder balls, 55 µm pitch), and 2 µm/5 µm/10 µm RDL. Target: HBM4E (12.8 Gb/s) with DVFS.
- HBM4E Ctrl: HBM4E/4 MC (Memory Controller) maximizes data throughput with advanced scheduling. On-chip digital calibration and training with BIST loopback verifies all 16 channels at power-on.
- HBM3E Ctrl: Silicon-proven HBM3E controller at 8.4 Gb/s per pin. Multi-channel (16 channels × 128-bit DQ data bus each) architecture with built-in BIST and full loopback self-test.
- PHY: HBM PHY connects to the GPU through micro-bumps (tiny solder balls at ~40 µm pitch) and RDL (Re-Distribution Layer: metal routing on the interposer, like PCB traces but at µm scale). Interposer uses TGV or pseudo-TSV for vertical connections.
- Package: Professional RDL routing at 2 µm, 5 µm, and 10 µm pitch options. Full SI/PI (Signal Integrity / Power Integrity) co-design minimizes crosstalk between DQ data lanes and IR-drop on the power grid.
Training an LLM requires reading billions of weight values per second. DDR5 delivers ~50 GB/s; HBM3E delivers >1 TB/s per stack by stacking 8–12 DRAM dies vertically with TSV (Through-Silicon Via) and placing the stack millimeters from the GPU.
HBM link distance is <2 mm, so voltage swing is only ~200 mV (vs 600 mV for DDR5 on PCB). The wide 128-bit DQ bus per channel is possible because routing stays on the interposer — no PCB trace fan-out.
The MC (Memory Controller) schedules READ/WRITE requests (Open-Page vs Closed-Page policy), corrects 1-bit errors with SECDED Hamming ECC, and self-tests with BIST patterns (Address Sweep, Checkerboard, March C).
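SECDED can be demonstrated with the smallest extended Hamming code. The Python sketch below uses a Hamming(8,4) code (4 data bits, 3 Hamming parity bits, 1 overall parity bit) purely for illustration; real memory controllers use much wider codewords, and the bit layout here is an assumption.

```python
def secded_encode(d):
    """4 data bits -> 8-bit codeword [p0, p1, p2, d1, p4, d2, d3, d4]."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]          # covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]          # covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]          # covers positions 4, 5, 6, 7
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]   # overall parity
    return c


def secded_decode(c):
    """Return (data_bits, status); status is 'ok', 'corrected' or 'double_error'."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos            # XOR of the positions holding a 1
    overall = 0
    for b in c:
        overall ^= b
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        c[syndrome] ^= 1               # syndrome 0 means p0 itself flipped
        status = "corrected"
    else:
        status = "double_error"        # detected but not correctable
    return [c[3], c[5], c[6], c[7]], status


data = [1, 0, 1, 1]
word = secded_encode(data)
word[5] ^= 1                           # inject a single-bit error
print(secded_decode(word))
```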
The 2.5D package places GPU + HBM stacks on a silicon interposer. AI matrix-multiply operations are memory-bandwidth-limited (Roofline model). Verify bandwidth, power, and the optimal DVFS operating point.
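The Roofline model reduces to one line: attainable throughput is the smaller of the compute peak and memory bandwidth times arithmetic intensity. The Python sketch below uses hypothetical numbers (a 100 TFLOP/s accelerator fed by 1 TB/s of memory bandwidth) purely to show the two regimes.

```python
def roofline_tflops(peak_tflops, bw_tb_per_s, intensity_flop_per_byte):
    """Attainable TFLOP/s = min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_tflops, bw_tb_per_s * intensity_flop_per_byte)


PEAK = 100.0      # hypothetical compute peak, TFLOP/s
BW = 1.0          # hypothetical memory bandwidth, TB/s

ridge = PEAK / BW                     # intensity where the two limits meet
for intensity in (2.0, 50.0, ridge, 500.0):
    perf = roofline_tflops(PEAK, BW, intensity)
    regime = "memory-bound" if intensity < ridge else "compute-bound"
    print(f"{intensity:6.1f} FLOP/byte -> {perf:6.1f} TFLOP/s ({regime})")
```

Below the ridge point, adding bandwidth (for example, more HBM stacks) raises performance directly; above it, only more compute helps.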