Syllabus & Logistics

Logistics

Objectives

  • Students gain a solid understanding of the foundations and applications of computer vision
  • A strong focus is placed on geometric and 3D visual understanding
  • After this course, students should be able to
    • Process and analyze images with efficient tools
    • Train modern computer vision models for visual applications
    • Explore advanced vision topics for potential research

Grading

  • in-class quizzes & attendance: 2% $\times$ 10 = 20%
    • closed book
    • 12 in total; you can miss at most two
    • if you take more than 10 quizzes, the 10 highest scores will count toward the final grade
    • 5-7 questions per quiz
    • short and simple
    • might include bonus questions
  • assignments & projects (in English): 10% $\times$ 4 = 40%
    • roughly once every 3 weeks
    • mixture of programming and written problems
    • late submissions are penalized 30% per day
    • you have 5 “late days” for the entire semester
    • assignments & projects should be completed independently
  • final project: 30%
    • team: work in a small group of up to 3
    • duration: completed in the last month of class
    • topic: chosen from a pool; topics are exclusive among teams; reserve online
    • delivery:
      • 15%: a short presentation
      • 15%: a technical report (written in English, in LaTeX) of top-conference quality
        • insights of the problem
        • related work / literature review
        • major components of the system or algorithms
        • technical details, e.g., learning, training, parameters
        • experimental results
        • publicly available code, e.g., on GitHub

Plagiarism

Errata/Typos

  • Please contact the instructor or TA

Syllabus

01: Introduction

  • Course logistics
    • Meet the team
    • Logistics: homework, projects, attendance, and grading
    • What you can learn from this course
  • What is computer vision?
    • What is computer vision
    • Computer vision vs. biological vision
    • Computer vision vs. image processing & graphics
    • Why study computer vision
    • Why is visual perception hard (challenges)
    • Applications
  • A (very) brief history of AI and CV
    • DARPA’s perspective on AI
    • Waves of development of CV
    • Some milestones
  • How to make a computer understand an image?
    • Marr’s Vision
    • Computational
    • Algorithmic
    • Implementational

02: Image Formation

  • Projection
    • Pinhole camera
    • Perspective projection
    • Homogeneous coordinates
    • Orthographic projection
  • Camera parameters
    • Intrinsics
    • Extrinsics
  • Camera with lens
    • Thin lens
    • The eye
    • Depth of field
    • Field of view
    • Lens aberrations
  • Digital sensor
  • Light and shading
    • Radiometry
    • Reflectance & BRDF
    • Photometric stereo
      • Shape from normal
      • Shape from shading
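
The pinhole projection pipeline above (perspective projection via homogeneous coordinates and an intrinsic matrix) can be previewed in a few lines. The focal length and principal point below are illustrative values chosen for this sketch, not parameters used in the course:

```python
import numpy as np

# Intrinsic matrix K with an assumed focal length f = 500 px and
# principal point (cx, cy) = (320, 240) -- illustrative values only.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, X):
    """Project a 3D point X = (X, Y, Z), given in the camera frame,
    to pixel coordinates."""
    x_h = K @ X                 # homogeneous image coordinates
    return x_h[:2] / x_h[2]     # perspective division by depth Z

uv = project(K, np.array([0.1, -0.2, 2.0]))
print(uv)  # a point 2 m in front of the camera lands at pixel (345, 190)
```

Note that the projection is linear in homogeneous coordinates; the nonlinearity of perspective is isolated in the final division by depth.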

03: Image Processing

  • Image filtering
    • Pixel processing
    • LSIS (linear shift-invariant systems) and convolution
    • Linear image filters
    • Non-linear image filters
    • Template matching by correlation
  • Edge
    • What is an edge?
    • Canny edge detector
    • Role of edge detection in image understanding
  • Resampling
    • Subsampling and aliasing
    • Gaussian pyramids
    • Interpolation
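
A minimal sketch of the filtering idea above: 2D convolution as an LSIS operation, applied as a box (mean) filter. This is a direct teaching implementation, not an optimized one:

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct 2D convolution, 'valid' region only -- a teaching sketch."""
    kh, kw = kernel.shape
    k = np.flip(kernel)                      # convolution flips the kernel
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

img = np.zeros((5, 5))
img[2, 2] = 9.0                              # a single bright impulse
box = np.ones((3, 3)) / 9.0                  # 3x3 box (mean) filter
print(convolve2d(img, box))                  # the impulse spreads into a 3x3 patch
```

Replacing `box` with a Gaussian kernel gives the smoothing used before subsampling in Gaussian pyramids.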

04: Features and Matching

  • Motivation and intuition
    • Are edges enough?
    • Interesting point/feature
    • Why local?
    • Main components
  • Feature detector and descriptor
    • What are good features?
    • Harris corner
    • The scaling issue
    • Blob (structured edge)
    • SIFT
    • Local feature matching
  • Image stitching
    • Image transformation and warping
    • Computing homography
    • RANSAC
    • Panorama
    • Blending
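
The RANSAC loop used for homography estimation can be previewed on a simpler model. This sketch fits a line to points with gross outliers; homography fitting in stitching follows the same hypothesize-and-verify pattern with 4-point samples instead of 2-point ones. Thresholds and iteration counts below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_line(pts, n_iters=200, thresh=0.1):
    """Robustly fit y = a*x + b by RANSAC: sample minimal sets,
    count inliers, keep the best hypothesis, then refit on its inliers."""
    best_inliers = None
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                          # degenerate sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        resid = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares refit on the consensus set
    return np.polyfit(pts[best_inliers, 0], pts[best_inliers, 1], 1)

x = np.linspace(0.0, 1.0, 50)
pts = np.stack([x, 2.0 * x + 1.0], axis=1)    # points on y = 2x + 1
pts[::10, 1] += 5.0                           # gross outliers every 10th point
a, b = ransac_line(pts)
print(a, b)                                   # close to (2.0, 1.0) despite outliers
```

A plain least-squares fit on the same data would be dragged toward the outliers; RANSAC ignores them by scoring hypotheses only on their consensus set.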

05: Calibration

  • Perspective projection model: review
  • Intrinsic camera parameters
  • Extrinsic camera parameters
  • Calibration

06: Stereo

  • Simple (Binocular) stereo
  • Epipolar geometry
  • Two-view stereo
  • Active stereo

07: Multi-View Stereo & Structure from Motion

  • Application and motivation
  • Plane sweep stereo
  • Depth map fusion
  • Patch-based multi-view stereo
  • Stereo from Internet photo collections
  • Recent trends

08: Recognition

  • Brief summary
  • The statistical learning approach
  • “Shallow” vs. “deep” recognition pipelines
  • Taxonomy of prediction tasks and supervision types

09: Classification

  • Image classification
  • KNN
  • Linear classifier
  • Loss function
    • Cross-entropy loss
    • SVM loss
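
The two loss functions listed above can be computed directly from raw classifier scores. The score vector below is an arbitrary illustrative example:

```python
import numpy as np

def cross_entropy_loss(scores, y):
    """Softmax followed by negative log-likelihood of the correct class y."""
    z = scores - scores.max()            # shift scores for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[y])

def svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss: penalize classes that score within
    `margin` of the correct class's score."""
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0                     # the correct class contributes nothing
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])      # illustrative raw scores; class 0 is correct
print(cross_entropy_loss(scores, y=0))
print(svm_loss(scores, y=0))             # = max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```

Note the qualitative difference: hinge loss is exactly zero once every wrong class is beaten by the margin, while cross-entropy keeps pushing the correct-class probability toward 1.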

10: Optimization and Neural Networks (NN)

  • Optimization
  • Neural network

11: Backpropagation (BP)

  • Computational graphs
  • Backpropagation
  • Flat BP
  • Modular BP
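
The computational-graph view of backpropagation can be previewed on the classic toy function f(x, y, z) = (x + y) * z: run the graph forward, then propagate gradients backward node by node with the chain rule:

```python
# f(x, y, z) = (x + y) * z, evaluated at an illustrative point.
x, y, z = -2.0, 5.0, -4.0

# forward pass: record intermediate values at each node
q = x + y               # q = 3.0
f = q * z               # f = -12.0

# backward pass: one local gradient per node, combined by the chain rule
df_dq = z               # d(q*z)/dq
df_dz = q               # d(q*z)/dz
dq_dx = 1.0             # d(x+y)/dx
dq_dy = 1.0             # d(x+y)/dy
df_dx = df_dq * dq_dx   # chain rule: -4.0
df_dy = df_dq * dq_dy   # chain rule: -4.0
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```

"Modular" BP packages each node's forward value and local gradient into a reusable layer object; the arithmetic per node is exactly what is written out flat above.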

12-13: CNN

  • Fully-connected layers
  • Activation function
  • Convolution layers
  • Pooling layers
  • Normalization
  • Training
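
The spatial-size arithmetic shared by convolution and pooling layers is worth having at hand; the layer sizes below are illustrative:

```python
def conv_out_size(n, k, stride=1, pad=0):
    """Output spatial size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - k) // stride + 1

# a 32x32 input through a 5x5 conv with pad 2, stride 1 keeps its size
print(conv_out_size(32, 5, stride=1, pad=2))   # 32
# a 2x2 max pool with stride 2 halves it
print(conv_out_size(32, 2, stride=2))          # 16
```

The same formula applies independently to height and width, so non-square inputs are handled by calling it twice.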

14: Visualization and Understanding Models

  • Visualizing what models have learned
  • Understanding input pixels
  • Adversarial perturbations
  • Style transfer

15: Detection and Segmentation

  • Semantic segmentation
  • Object detection
  • Instance segmentation

16: RNN and Transformer

  • RNN
  • Attention
  • Transformer
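
The attention mechanism at the heart of the Transformer reduces to one formula, softmax(QKᵀ/√d)V, sketched here with arbitrary random queries, keys, and values:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # stabilized softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))   # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))   # 3 queries of dimension 4
K = rng.standard_normal((5, 4))   # 5 keys of dimension 4
V = rng.standard_normal((5, 2))   # 5 values of dimension 2
out = attention(Q, K, V)
print(out.shape)                  # (3, 2): one value-space vector per query
```

Each output row is a convex combination of the value vectors, weighted by how well its query matches each key; multi-head attention runs several such maps in parallel.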

17: 3D Vision and 3D Scene Understanding

  • Representations for 3D vision
  • Shape representation
  • AI + shape
  • 3D scene reconstruction and synthesis
  • Physical and social interactions in 3D scenes
  • Multimodal understanding in 3D scenes
  • Planning in 3D scenes

18: Vision meets Cognition

  • Physical commonsense
  • Social commonsense
  • Applications in robotics