Syllabus & Logistics

Logistics

Objectives

  • Students gain a solid understanding of the foundations and applications of computer vision
  • A strong focus is placed on geometric and 3D visual understanding
  • After this course, students should be able to
    • Process and analyze images with efficient tools
    • Train modern computer vision models for visual applications
    • Explore advanced vision topics for potential research

Grading

  • in-class quizzes & attendance: 2% $\times$ 10 = 20%
    • closed book
    • 12 in total; you can miss at most two
    • if you take more than 10 quizzes, the 10 highest scores will count toward the final grade
    • 5-7 questions per quiz
    • short and simple
    • might include bonus questions
  • assignments & projects (in English): 10% $\times$ 4 = 40%
    • roughly once every 3 weeks
    • mixture of programming and written problems
    • late submissions are penalized 30% per day
    • you have 5 “late days” for the entire semester
    • assignments & projects should be completed independently
  • final project: 30%
    • team: work in a small group of up to 3
    • duration: completed in the last month of class
    • topic: chosen from a pool; topics are exclusive among teams; reserve online
    • delivery:
      • 15%: a short presentation
      • 15%: a technical report (written in English, in LaTeX) of top-conference quality
        • insights of the problem
        • related work / literature review
        • major components of the system or algorithms
        • technical details, e.g., learning, training, parameters
        • experimental results
        • publicly available code, e.g., on GitHub

Plagiarism

Errata/Typos

  • Please contact the instructor or TA

Syllabus

01: Introduction

  • Course logistics
    • Meet the team
    • Logistics: homework, projects, attendance, and grading
    • What you can learn from this course
  • What is computer vision?
    • What is computer vision
    • Computer vision vs. biological vision
    • Computer vision vs. image processing & graphics
    • Why study computer vision
    • Why is visual perception hard (challenges)
    • Applications
  • A (very) brief history of AI and CV
    • DARPA’s perspective on AI
    • Waves of development of CV
    • Some milestones
  • How to make a computer understand an image?
    • Marr’s Vision
    • Computational
    • Algorithmic
    • Implementational

02: Image Formation

  • Projection
    • Pinhole camera
    • Perspective projection
    • Homogeneous coordinates
    • Orthographic projection
  • Camera parameters
    • Intrinsics
    • Extrinsics
  • Camera with lens
    • Thin lens
    • The eye
    • Depth of field
    • Field of view
    • Lens aberrations
  • Digital sensor
  • Light and shading
    • Radiometry
    • Reflectance & BRDF
    • Photometric stereo
      • Shape from normal
      • Shape from shading
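
The pinhole projection pipeline above (perspective projection via homogeneous coordinates and an intrinsic matrix) can be previewed in a few lines. The focal length and principal point below are illustrative values chosen for this sketch, not parameters used in the course:

```python
import numpy as np

# Intrinsic matrix K with an assumed focal length f = 500 px and
# principal point (cx, cy) = (320, 240) -- illustrative values only.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, X):
    """Project a 3D point X = (X, Y, Z), given in the camera frame,
    to pixel coordinates."""
    x_h = K @ X                 # homogeneous image coordinates
    return x_h[:2] / x_h[2]     # perspective division by depth Z

uv = project(K, np.array([0.1, -0.2, 2.0]))
print(uv)  # a point 2 m in front of the camera lands at pixel (345, 190)
```

Note that the projection is linear in homogeneous coordinates; the nonlinearity of perspective is isolated in the final division by depth.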

03: Image Processing

  • Image filtering
    • Pixel processing
    • LSIS (linear shift-invariant systems) and convolution
    • Linear image filters
    • Non-linear image filters
    • Template matching by correlation
  • Edge
    • What is an edge?
    • Canny edge detector
    • Role of edge detection in image understanding
  • Resampling
    • Subsampling and aliasing
    • Gaussian pyramids
    • Interpolation
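
A minimal sketch of the filtering idea above: 2D convolution as an LSIS operation, applied as a box (mean) filter. This is a direct teaching implementation, not an optimized one:

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct 2D convolution, 'valid' region only -- a teaching sketch."""
    kh, kw = kernel.shape
    k = np.flip(kernel)                      # convolution flips the kernel
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

img = np.zeros((5, 5))
img[2, 2] = 9.0                              # a single bright impulse
box = np.ones((3, 3)) / 9.0                  # 3x3 box (mean) filter
print(convolve2d(img, box))                  # the impulse spreads into a 3x3 patch
```

Replacing `box` with a Gaussian kernel gives the smoothing used before subsampling in Gaussian pyramids.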

04: Features and Matching

  • Motivation and intuition
    • Are edges enough?
    • Interesting point/feature
    • Why local?
    • Main components
  • Feature detector and descriptor
    • What are good features?
    • Harris corner
    • The scaling issue
    • Blob (structured edge)
    • SIFT
    • Local feature matching
  • Image stitching
    • Image transformation and warping
    • Computing homography
    • RANSAC
    • Panorama
    • Blending
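
The RANSAC loop used for homography estimation can be previewed on a simpler model. This sketch fits a line to points with gross outliers; homography fitting in stitching follows the same hypothesize-and-verify pattern with 4-point samples instead of 2-point ones. Thresholds and iteration counts below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_line(pts, n_iters=200, thresh=0.1):
    """Robustly fit y = a*x + b by RANSAC: sample minimal sets,
    count inliers, keep the best hypothesis, then refit on its inliers."""
    best_inliers = None
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                          # degenerate sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        resid = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares refit on the consensus set
    return np.polyfit(pts[best_inliers, 0], pts[best_inliers, 1], 1)

x = np.linspace(0.0, 1.0, 50)
pts = np.stack([x, 2.0 * x + 1.0], axis=1)    # points on y = 2x + 1
pts[::10, 1] += 5.0                           # gross outliers every 10th point
a, b = ransac_line(pts)
print(a, b)                                   # close to (2.0, 1.0) despite outliers
```

A plain least-squares fit on the same data would be dragged toward the outliers; RANSAC ignores them by scoring hypotheses only on their consensus set.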

05: Calibration

  • Perspective projection model: review
  • Intrinsic camera parameters
  • Extrinsic camera parameters
  • Calibration

06: Stereo

  • Simple (Binocular) stereo
  • Epipolar geometry
  • Two-view stereo
  • Active stereo

07: Multi-View Stereo & Structure from Motion

  • Application and motivation
  • Plane sweep stereo
  • Depth map fusion
  • Patch-based multi-view stereo
  • Stereo from Internet photo collections
  • Recent trends

08: Recognition

  • Brief summary
  • The statistical learning approach
  • “Shallow” vs. “deep” recognition pipelines
  • Taxonomy of prediction tasks and supervision types

09: Classification

  • Image classification
  • KNN
  • Linear classifier
  • Loss function
    • Cross-entropy loss
    • SVM loss
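
The two loss functions listed above can be computed directly from raw classifier scores. The score vector below is an arbitrary illustrative example:

```python
import numpy as np

def cross_entropy_loss(scores, y):
    """Softmax followed by negative log-likelihood of the correct class y."""
    z = scores - scores.max()            # shift scores for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[y])

def svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss: penalize classes that score within
    `margin` of the correct class's score."""
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0                     # the correct class contributes nothing
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])      # illustrative raw scores; class 0 is correct
print(cross_entropy_loss(scores, y=0))
print(svm_loss(scores, y=0))             # = max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```

Note the qualitative difference: hinge loss is exactly zero once every wrong class is beaten by the margin, while cross-entropy keeps pushing the correct-class probability toward 1.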

10: Optimization and Neural Networks (NN)

  • Optimization
  • Neural network

11: Backpropagation (BP)

  • Computational graphs
  • Backpropagation
  • Flat BP
  • Modular BP
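
The computational-graph view of backpropagation can be previewed on the classic toy function f(x, y, z) = (x + y) * z: run the graph forward, then propagate gradients backward node by node with the chain rule:

```python
# f(x, y, z) = (x + y) * z, evaluated at an illustrative point.
x, y, z = -2.0, 5.0, -4.0

# forward pass: record intermediate values at each node
q = x + y               # q = 3.0
f = q * z               # f = -12.0

# backward pass: one local gradient per node, combined by the chain rule
df_dq = z               # d(q*z)/dq
df_dz = q               # d(q*z)/dz
dq_dx = 1.0             # d(x+y)/dx
dq_dy = 1.0             # d(x+y)/dy
df_dx = df_dq * dq_dx   # chain rule: -4.0
df_dy = df_dq * dq_dy   # chain rule: -4.0
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```

"Modular" BP packages each node's forward value and local gradient into a reusable layer object; the arithmetic per node is exactly what is written out flat above.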

12-13: CNN

  • Fully-connected layers
  • Activation function
  • Convolution layers
  • Pooling layers
  • Normalization
  • Training
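
The spatial-size arithmetic shared by convolution and pooling layers is worth having at hand; the layer sizes below are illustrative:

```python
def conv_out_size(n, k, stride=1, pad=0):
    """Output spatial size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - k) // stride + 1

# a 32x32 input through a 5x5 conv with pad 2, stride 1 keeps its size
print(conv_out_size(32, 5, stride=1, pad=2))   # 32
# a 2x2 max pool with stride 2 halves it
print(conv_out_size(32, 2, stride=2))          # 16
```

The same formula applies independently to height and width, so non-square inputs are handled by calling it twice.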

14: Visualization and Understanding Models

  • Visualizing what models have learned
  • Understanding input pixels
  • Adversarial perturbations
  • Style transfer

15: Detection and Segmentation

  • Semantic segmentation
  • Object detection
  • Instance segmentation

16: RNN and Transformer

  • RNN
  • Attention
  • Transformer
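
The attention mechanism at the heart of the Transformer reduces to one formula, softmax(QKᵀ/√d)V, sketched here with arbitrary random queries, keys, and values:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # stabilized softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))   # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))   # 3 queries of dimension 4
K = rng.standard_normal((5, 4))   # 5 keys of dimension 4
V = rng.standard_normal((5, 2))   # 5 values of dimension 2
out = attention(Q, K, V)
print(out.shape)                  # (3, 2): one value-space vector per query
```

Each output row is a convex combination of the value vectors, weighted by how well its query matches each key; multi-head attention runs several such maps in parallel.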

17: 3D Vision and 3D Scene Understanding

  • Representations for 3D vision
  • Shape representation
  • AI + shape
  • 3D scene reconstruction and synthesis
  • Physical and social interactions in 3D scenes
  • Multimodal understanding in 3D scenes
  • Planning in 3D scenes

18: Vision meets Cognition

  • Physical commonsense
  • Social commonsense
  • Applications in robotics