Case Study

AI-Powered Data Science Mock Interview Platform

Built a full-stack interview simulator that uses dual LLM agents — one for real-time answer evaluation against weighted rubrics, one for neutral interviewer delivery — to conduct adaptive Data Science interviews over WebSocket. The engine groups questions by domain (depth-first), branches follow-ups based on answer quality, enforces realistic pacing (~4 min/question), and generates post-interview reports with 6-dimension scoring and personalized 7-day training plans.

Role: Sole Developer
Timeline: Mar 2026
Reading Time: 4 min read
Tech: LLM Agents · FastAPI · WebSocket · Next.js · Prompt Engineering · Supabase · Docker · Zustand
10 DS domains · 100+ questions · 6 scoring dimensions

Overview

Built a full-stack mock interview platform that simulates realistic Data Science technical interviews using LLM-powered adaptive questioning. The system conducts real-time interviews over WebSocket, evaluates answers against weighted rubrics with 6 scoring dimensions, dynamically adjusts follow-up depth based on answer quality, and generates personalized post-interview reports with 7-day training plans. The interviewer persona is deliberately neutral and probing — modeled after real interview dynamics, not a chatbot.

Problem

Existing interview prep tools fall into two categories: static flashcard-style Q&A (no adaptivity, no follow-ups) and generic chatbot conversations (encouraging tone, no rubric-based evaluation, no time pressure). Neither replicates what a real DS interview actually feels like: a neutral evaluator who probes weak spots, challenges assumptions, manages time, and moves on when you're stuck — not a cheerleader who says "Great answer!" regardless.

The core design question was: how do you make an LLM behave like an interviewer, not a tutor?

Why It Mattered

Interview preparation is high-stakes and deeply personal. Generic tools don't surface the specific gaps in a candidate's understanding — they either accept everything or reject everything. A realistic simulator needs to:

  • Detect answer quality in real-time and branch accordingly
  • Apply time pressure naturally (not just a countdown timer, but verbal cues from the interviewer)
  • Score across multiple dimensions (technical correctness alone misses communication, self-awareness, problem-solving approach)
  • Provide actionable feedback, not just a pass/fail grade

Approach

Interview Engine (State Machine)

Designed a deterministic state machine (WARMUP → CORE → DEEP_PROBE → RETESTING → WRAPUP) with probabilistic transitions driven by LLM evaluation output. Questions are grouped by domain (depth-first, not breadth-first) with adaptive follow-up trees up to 2 levels deep. The engine enforces realistic pacing (~4 minutes per question including follow-ups) and automatically redirects after 5 minutes on a single question.
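A minimal sketch of the phase machine, assuming the evaluator emits a coarse quality label ("strong" / "partial" / "weak") per answer; the transition rules and the 300-second redirect cap are illustrative, not the exact production logic:

```python
from enum import Enum, auto

class Phase(Enum):
    WARMUP = auto()
    CORE = auto()
    DEEP_PROBE = auto()
    RETESTING = auto()
    WRAPUP = auto()

def next_phase(current: Phase, answer_quality: str,
               seconds_on_question: float) -> Phase:
    """Pick the next interview phase from the evaluator's quality label.

    Illustrative rules only: strong answers in CORE escalate to DEEP_PROBE,
    weak answers under probing get queued for RETESTING, and any question
    running past 5 minutes is redirected back to CORE.
    """
    if seconds_on_question > 300:      # hard cap: redirect after 5 min
        return Phase.CORE
    if current is Phase.WARMUP:
        return Phase.CORE
    if current is Phase.CORE:
        return Phase.DEEP_PROBE if answer_quality == "strong" else Phase.CORE
    if current is Phase.DEEP_PROBE:
        return Phase.RETESTING if answer_quality == "weak" else Phase.CORE
    return Phase.WRAPUP
```

Keeping the transitions in plain code (rather than inside the prompt) is what makes the engine deterministic: the LLM only supplies the quality label, never the control flow.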

LLM Integration (Dual-Agent Architecture)

Two separate LLM roles operate in parallel per answer:

  1. Evaluator agent: Scores the answer against a per-question weighted rubric and 6 behavioral dimensions (Technical Correctness 40%, Problem-Solving 25%, Communication 15%, Depth 10%, Challenge Handling 5%, Self-Awareness 5%). Returns structured JSON with quality classification, misconceptions, and missing concepts.

  2. Interviewer agent: Receives the evaluation result and generates the next response. Prompt engineering enforces neutral tone ("Okay.", "I see.", not "Great answer!"), three probing modes (strong answer → escalate difficulty, partial → probe the gap, weak → one probe then move on), and time-pressure verbal cues.

The key insight: separating evaluation from delivery lets the interviewer reference specific things the candidate said while still following a structured assessment framework.
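The weighted rubric above can be collapsed into a single composite score that drives the interviewer's probing mode. A sketch, assuming the evaluator returns per-dimension scores on a 0–10 scale and that the mode cutoffs (7.5 / 4.0) are hypothetical:

```python
# Rubric weights as stated in the write-up.
WEIGHTS = {
    "technical_correctness": 0.40,
    "problem_solving": 0.25,
    "communication": 0.15,
    "depth": 0.10,
    "challenge_handling": 0.05,
    "self_awareness": 0.05,
}

def composite_score(evaluation: dict) -> float:
    """Collapse the 6-dimension evaluator JSON into one weighted score."""
    return sum(WEIGHTS[dim] * evaluation["scores"][dim] for dim in WEIGHTS)

def probing_mode(score: float) -> str:
    """Map the composite score onto the three interviewer probing modes.

    Cutoffs are illustrative: strong -> escalate difficulty,
    partial -> probe the gap, weak -> one probe then move on.
    """
    if score >= 7.5:
        return "strong"
    if score >= 4.0:
        return "partial"
    return "weak"
```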

Question Bank

100+ questions across 10 Data Science domains, each with:

  • Weighted rubric criteria with key points
  • 2-level follow-up trees triggered by answer quality
  • Common mistakes and misconceptions for targeted probing
  • Difficulty tiers (foundational, intermediate, advanced) with level-appropriate distribution
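One question-bank entry might look like the following sketch; the field names and sample question are hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class RubricCriterion:
    key_point: str
    weight: float

@dataclass
class Question:
    domain: str                 # e.g. "statistics", "ml_fundamentals"
    difficulty: str             # "foundational" | "intermediate" | "advanced"
    text: str
    rubric: list                # list[RubricCriterion], weights sum to 1.0
    followups: dict             # answer quality -> level-1 follow-up text
    common_mistakes: list = field(default_factory=list)

q = Question(
    domain="statistics",
    difficulty="foundational",
    text="Explain the difference between Type I and Type II errors.",
    rubric=[RubricCriterion("defines both error types", 0.5),
            RubricCriterion("relates them to the alpha/power trade-off", 0.5)],
    followups={"strong": "How does sample size affect Type II error rate?",
               "partial": "Which error does the significance level control?"},
    common_mistakes=["conflates significance level with p-value"],
)
```

Storing follow-ups keyed by answer quality is what lets the engine branch without a second LLM call: the evaluator's label indexes straight into the tree.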

Real-Time Communication

WebSocket protocol handles bidirectional text and audio streams. The server injects active silence signals (2-second pauses before responding) and time-warning messages at 15/5/2 minute thresholds — creating natural interview pressure without artificial UI elements.
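The time-warning side of that protocol can be sketched framework-agnostically; the message shape and `send` callback are assumptions, but the 15/5/2-minute thresholds and 2-second active-silence pause come from the description above:

```python
import asyncio
from typing import Awaitable, Callable

# Warning thresholds from the write-up: cues at 15/5/2 minutes remaining.
THRESHOLDS = (15 * 60, 5 * 60, 2 * 60)

def warning_schedule(total_seconds: int,
                     thresholds=THRESHOLDS) -> list:
    """Return (sleep_seconds, remaining_minutes) pairs in firing order."""
    schedule, elapsed = [], 0
    for t in sorted(thresholds, reverse=True):
        if total_seconds > t:
            schedule.append((total_seconds - t - elapsed, t // 60))
            elapsed = total_seconds - t
    return schedule

async def run_time_warnings(send: Callable[[dict], Awaitable[None]],
                            total_seconds: int) -> None:
    """Background task: push verbal time-pressure cues over the socket."""
    for sleep_for, minutes in warning_schedule(total_seconds):
        await asyncio.sleep(sleep_for)
        await send({"type": "time_warning", "remaining_minutes": minutes})

async def reply_with_active_silence(send: Callable[[dict], Awaitable[None]],
                                    text: str) -> None:
    """Pause ~2s before the interviewer responds, per the design above."""
    await asyncio.sleep(2)
    await send({"type": "interviewer", "text": text})
```

In the real handler these would run as `asyncio` tasks alongside the WebSocket receive loop, so warnings fire even while the candidate is mid-answer.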

Architecture

  • Backend: FastAPI (async Python), WebSocket interview handler, LLM client via OpenAI-compatible API
  • Frontend: Next.js 16, React 19, Zustand state management, real-time transcript with auto-scroll
  • Database: PostgreSQL (Supabase) with Row-Level Security for multi-tenancy
  • Voice (optional): OpenAI Whisper (STT) + TTS for spoken interviews
  • Infrastructure: Docker Compose orchestration (6 services), self-hosted Supabase stack

Results & Impact

  • Built a complete interview simulator that adapts difficulty, probes weaknesses, and generates actionable reports — not a chatbot wrapper
  • 100+ questions across 10 DS domains with structured follow-up trees and weighted rubrics
  • 6-dimension scoring with behavioral anchors replaces binary pass/fail evaluation
  • Realistic pacing: ~7-8 questions per 30-minute session (vs. 15 in the initial rapid-fire version), matching real interview cadence
  • Interview persona validated through E2E testing: neutral tone, no cheerleading, challenges weak answers, references candidate's specific words

Lessons Learned

  • The hardest part of building an LLM-powered interviewer isn't the technology — it's prompt engineering the absence of helpfulness. LLMs default to being encouraging and explanatory; making one behave like a neutral evaluator who doesn't coach requires explicit negative instructions ("Do NOT say 'Great answer'", "Do NOT explain the correct answer")
  • Separating evaluation from response generation was the key architectural decision — it lets each agent optimize for its role without conflicting objectives
  • Real interviews are depth-first, not breadth-first. The initial version hopped between domains question-by-question; grouping by domain and completing each before moving on felt dramatically more realistic
  • Time pressure is more about verbal cues than UI timers. "We have about 5 minutes left, let's make sure we cover the remaining topics" from the interviewer creates more pressure than a red countdown clock
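The negative-instruction pattern from the first lesson can be sketched as a system-prompt fragment; the exact wording here is illustrative, not the production prompt:

```python
# Hypothetical interviewer system prompt: the point is the explicit
# negative instructions that suppress the LLM's default helpfulness.
INTERVIEWER_SYSTEM_PROMPT = """\
You are a neutral technical interviewer, not a tutor.
- Do NOT say "Great answer" or otherwise praise the candidate.
- Do NOT explain the correct answer or coach the candidate toward it.
- Acknowledge briefly ("Okay.", "I see.") and move to the next probe.
- If the candidate is stuck, ask one probing question, then move on.
- When time is short, say so verbally and redirect to remaining topics.
"""
```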