About
This course covers the fundamentals and recent advances in deep learning for image processing. The first half builds foundational knowledge (CNNs, RNNs, Transformers, ViT, detection/segmentation, self-supervised learning), while the second half explores modern topics (foundation models, diffusion, vision-language models, video understanding, embodied AI) through lectures and student-led paper presentations.
- Instructor
- Kihyun Na (Research Professor)
- Affiliation
- BK21 AI Education and Research Group & Institute for ICT, Handong Global University
- Schedule
- Spring 2026, Weekly
- Format
- Lecture (Wk 1–7) → Review Workshop (Wk 8) → Lecture + Paper Seminar (Wk 9–15) → Miniconference (Wk 16)
- Prerequisites
- Deep learning fundamentals; working knowledge of Python/PyTorch
- LMS
- Handong LMS (enrolled students)
Schedule
Slides will be posted after each lecture. The schedule may be adjusted as the semester progresses.
- Wk 2–7: 90 min lecture + 40–50 min challenge feedback & discussion
- Wk 8: Review writing workshop + paper seminar role explanation
- Wk 9–15: 60–70 min lecture + ~80 min paper presentation session
| Wk | Topic | Type | Materials |
|---|---|---|---|
| 1 | OT + Introduction | Lecture | slides |
| 2 | DL Fundamentals Review | Lecture | slides |
| 3 | Convolutional Neural Networks | Lecture | slides, notes |
| 4 | From Sequence Modeling to Transformer | Lecture | slides, notes |
| 5 | Transformer in Vision | Lecture | slides, notes |
| 6 | Detection & Segmentation | Lecture | slides |
| 7 | Self-Supervised Learning | Lecture | slides |
| 8 | Review Literacy + Role Explanation | Lecture | |
| 9 | Foundation Models (CLIP, SAM) | Lecture + Paper #1 | |
| 10 | Diffusion Models | Lecture + Paper #2 | |
| 11 | Conditional Generation | Lecture + Paper #3 | |
| 12 | Vision-Language Models | Lecture + Paper #4 | |
| 13 | VLM Applications | Lecture + Paper #5 | |
| 14 | Video Understanding | Lecture + Paper #6 | |
| 15 | Embodied AI & Robot Vision | Lecture + Paper #7 | |
| 16 | Miniconference (Final Project Presentations) | Conference | |
Paper Presentation Sessions
Starting from Week 9, each class includes a student-led paper presentation session (~80 min). Students rotate through all roles over the 7-week seminar period.
| Role | Responsibility |
|---|---|
| Author | Present the paper + rebuttal |
| Area Chair | Synthesize reviews, accept/reject decision |
| Reviewer ×2 | Submit written review (Pro & Con) |
| Archaeologist | Prior work context briefing |
| Future Researcher | Limitations + follow-up ideas |
| Reproducibility Engineer | Code availability, hyperparameters, re-implementation assessment |
The paper presentation format is inspired by Raffel & Jacobson's role-playing seminar model.
Challenge & Final Project
Students participate in ML/CV challenges in teams throughout the semester. The final project (30% of the grade) takes the form of a technical report or blog post based on challenge results or independent research. Teams present their work at the Week 16 miniconference.
- Team Formation
- Self-organized; participation in multiple teams is allowed
- Deliverable
- Written report with Author Contributions section
- Presentation
- Week 16 Miniconference (team-based)
Acknowledgements
Course materials draw inspiration from the following open resources.
- Stanford CS231n: Deep Learning for Computer Vision, Fei-Fei Li, Ehsan Adeli, et al. (cs231n.stanford.edu)
- UMich EECS 498/598: Deep Learning for Computer Vision, Justin Johnson (web.eecs.umich.edu/~justincj)
- MIT 6.8300: Advances in Computer Vision, Vincent Sitzmann (scenerepresentations.org)
- MIT 6.7960: Deep Learning, Phillip Isola, Sara Beery, Jeremy Bernstein (MIT OCW)
- CMU 16-824: Visual Learning and Recognition, Jun-Yan Zhu (visual-learning.cs.cmu.edu)