About
This course covers the fundamentals and recent advances in deep learning for image processing. The first half builds foundational knowledge (CNNs, RNNs, Transformers, ViT, detection/segmentation, self-supervised learning), while the second half explores modern topics (foundation models, diffusion, vision-language models, video understanding, embodied AI) through lectures and student-led paper presentations.
- Instructor
- Kihyun Na (Research Professor)
- Affiliation
- BK21 AI Education and Research Group & Institute for ICT, Handong Global University
- Schedule
- Spring 2026, Weekly
- Format
- Lecture (Wk 1–7) → Review Workshop (Wk 8) → Lecture + Paper Seminar (Wk 9–15) → Miniconference (Wk 16)
- Prerequisites
- Deep learning fundamentals; working knowledge of Python/PyTorch
- LMS
- Handong LMS (enrolled students)
Schedule
Slides will be posted after each lecture. The schedule may be adjusted as the semester progresses.
- Wk 2–7: 90 min lecture + 40–50 min challenge feedback & discussion
- Wk 8: Review writing workshop + paper seminar role explanation
- Wk 9–15: 60–70 min lecture + ~80 min paper presentation session
| Wk | Topic | Type | Materials |
|---|---|---|---|
| 1 | OT + Introduction | Lecture | slides |
| 2 | DL Fundamentals Review | Lecture | slides |
| 3 | Convolutional Neural Networks | Lecture | slides, notes |
| 4 | From Sequence Modeling to Transformer | Lecture | slides, notes |
| 5 | Transformer in Vision | Lecture | slides, notes |
| 6 | Detection & Segmentation | Lecture | slides |
| 7 | Self-Supervised Learning | Lecture | slides |
| 8 | Review Literacy + Role Explanation | Lecture | |
| 9 | Foundation Models (CLIP, SAM) | Lecture + Paper #1 | |
| 10 | Diffusion Models | Lecture + Paper #2 | |
| 11 | Conditional Generation | Lecture + Paper #3 | |
| 12 | Vision-Language Models | Lecture + Paper #4 | |
| 13 | VLM Applications | Lecture + Paper #5 | |
| 14 | Video Understanding | Lecture + Paper #6 | |
| 15 | Embodied AI & Robot Vision | Lecture + Paper #7 | |
| 16 | Miniconference (Final Project Presentations) | Conference | |
Paper Presentation Sessions
Starting from Week 9, each class includes a student-led paper presentation session (~80 min). Students rotate through all roles over the 7-week seminar period.
| Role | Responsibility |
|---|---|
| Author | Present the paper + rebuttal |
| Area Chair | Synthesize reviews, accept/reject decision |
| Reviewer ×2 | Submit written review (Pro & Con) |
| Archaeologist | Prior work context briefing |
| Future Researcher | Limitations + follow-up ideas |
| Reproducibility Engineer | Code availability, hyperparameters, re-implementation assessment |
The paper presentation format is inspired by Raffel & Jacobson's role-playing seminar model.
Challenge & Final Project
Students participate in ML/CV challenges in teams throughout the semester. The final project (30% of the grade) takes the form of a technical report or blog post based on challenge results or independent research. Teams present their work at the Week 16 miniconference.
- Team Formation
- Self-organized; participation in multiple teams is allowed
- Deliverable
- Written report with Author Contributions section
- Presentation
- Week 16 Miniconference (team-based)
Acknowledgements
Course materials draw inspiration from the following open resources.
- Stanford CS231n: Deep Learning for Computer Vision, Fei-Fei Li, Ehsan Adeli, et al. (cs231n.stanford.edu)
- UMich EECS 498/598: Deep Learning for Computer Vision, Justin Johnson (web.eecs.umich.edu/~justincj)
- MIT 6.8300: Advances in Computer Vision, Vincent Sitzmann (scenerepresentations.org)
- MIT 6.7960: Deep Learning, Phillip Isola, Sara Beery, Jeremy Bernstein (MIT OCW)
- CMU 16-824: Visual Learning and Recognition, Jun-Yan Zhu (visual-learning.cs.cmu.edu)