Resume

Harry (Jiajun) Hu

New York, Open for Relocation · [email protected]

Summary

AI/ML Engineer and Data Scientist building deployable machine learning systems across computer vision, multimodal AI, and decision-focused analytics.

Experience
Machine Learning Engineer, Washington University in St. Louis · May 2025 – Dec 2025
  • Built and deployed a geospatial ML outlier-detection pipeline (Isolation Forest, DBSCAN + tree-based models) to rank rooms for expert review — uncovered 14% under-utilized space and projected 9% property revenue uplift.
  • Integrated Zillow API to collect real-time valuations for 220 properties and ran geospatial analysis in ArcGIS Pro — revealing over 10% latent revenue upside to guide long-term space-use strategy.
  • Optimized form UX via A/B testing, cutting auditor input time by 25% and halving error rates.
  • Built live Power BI dashboards and delivered weekly insight reviews, raising daily audit coverage by 20%.
Machine Learning Engineer, Glodon · Jun 2024 – Sep 2024
  • Fine-tuned YOLO with a frozen CNN backbone on limited blueprint data and chained OCR with rule-based aggregation to compute linear meters, reducing takeoff time from 60 min to 30 sec at 97% accuracy.
  • Designed domain-specific augmentation (rotation, scaling, line-width/contrast perturbation) that boosted minority-class recall by 20% under severe class imbalance.
  • Identified YOLO's architectural ceiling on dense small targets with structural relationships, an insight that directly informed the segmentation pivot in my next project.
Machine Learning Engineer, Sigtica · Jan 2023 – May 2023
  • Built an end-to-end ML pipeline combining OCR, LLM-assisted semantic extraction, and data-centric optimization to classify 50,000+ historical documents — achieving 87% precision and 93% recall on 3,000 labeled PDFs.
  • Designed a lightweight labeling QA loop and data quality controls that improved training signal without large-scale data collection, favoring a data-centric approach over added model complexity.
  • Implemented LLM-assisted outlier detection to flag edge cases and rare document types, preventing noisy samples from degrading model performance.
  • Packaged a repeatable inference workflow and delivered stakeholder reports translating model predictions into actionable research insights.
Data Scientist, The 11th Impact LLC · May 2022 – Sep 2022
  • Decomposed multi-stage ad funnel (impression → click → conversion → revenue) to identify bottleneck segments, driving budget reallocation that lifted ROAS by 18%.
  • Designed and supported A/B tests on creative layouts and ad placements with proper sample sizing and significance thresholds.
  • Developed time-series analyses over rolling windows (7/15/30 days) to identify seasonality patterns and optimize campaign timing.
  • Built Power BI dashboards with ETL pipelines for self-serve insights, enabling non-technical stakeholders to act without analyst support.
Education
M.Sc. in Data Analytics and Statistics · Aug 2024 – Dec 2025

Washington University in St. Louis, McKelvey School of Engineering

B.Sc. in Data Science and Mathematics · Jan 2021 – May 2024

New York University, College of Arts & Science

Skills

Programming: Python, Pandas, NumPy, PyTorch, TensorFlow, scikit-learn, SQL, Java, Git/GitHub

Techniques: Machine Learning, Deep Learning, Neural Networks, Computer Vision, Classification/Regression, Feature Engineering, Hyperparameter Tuning, Cross-Validation, A/B Testing, Statistical Analysis, Causal Inference, Clustering, RAG, LLM Integration, Prompt Engineering

MLOps & Engineering: Model Deployment/Serving, Data Preprocessing, Data Pipeline, MLOps (CI/CD, Monitoring, Retraining), Fairness/Bias Testing, Data Visualization

Tools & Platforms: AWS, SageMaker, Docker, Kubernetes, Hugging Face, LangChain, Claude Code, ArcGIS, Power BI, Tableau