Syllabus & Logistics
Logistics
Objectives
- Students gain a solid understanding of the foundation and application of computer vision
- A strong focus is put on geometric and 3D visual understanding
- After this course, students should be able to
- Process and analyze images with efficient tools
- Train modern computer vision models for visual applications
- Explore advanced vision topics for potential research
Grading
- in-class quiz & attendance: 2% $\times$ 10 = 20%
- closed book
- 12 in total; you can miss at most twice
- if you take more than 10 quizzes, the 10 highest scores will be used for the final grade
- 5-7 questions per quiz
- short and simple
- might include bonus questions
- assignments & projects (in English): 10% $\times$ 4 = 40%
- once per 3 weeks (roughly)
- mixture of programming and written problems
- late submission penalized 30% per day
- you have 5 “late days” for the entire semester
- projects should be done independently
- final project: 30%
- team: work in a small group of up to 3 students
- duration: completed during the last month of class
- topic: chosen from a pool; topics are exclusive among teams; reserve online
- delivery:
- 15%: a short presentation
- 15%: a technical report (written in English and LaTeX) of top-conference quality
- insights of the problem
- related work / literature review
- major components of the system or algorithms
- technical details, e.g., learning, training, parameters
- experimental results
- publicly available code, e.g., on GitHub
Plagiarism
Errata/Typo
- Please contact the instructor or TA
Syllabus
01: Introduction
- Course logistics
- Meet the team
- Logistics: homework, projects, attendance, and grading
- What you can learn from this course
- What is computer vision?
- Computer vision vs. biological vision
- Computer vision vs. image processing & graphics
- Why study computer vision
- Why is visual perception hard (challenges)
- Applications
- A (very) brief history of AI and CV
- DARPA’s perspective on AI
- Waves of development of CV
- Some milestones
- How to make a computer understand an image?
- Marr’s Vision
- Computational
- Algorithmic
- Implementational
02: Image Formation
- Projection
- Pinhole camera
- Perspective projection
- Homogeneous coordinates
- Orthographic projection
- Camera parameters
- Camera with lens
- Thin lens
- The eye
- Depth of field
- Field of view
- Lens aberrations
- Digital sensor
- Light and shading
- Radiometry
- Reflectance & BRDF
- Photometric stereo
- Shape from normals
- Shape from shading
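As a preview of the projection topics above, here is a minimal numerical sketch of perspective projection with homogeneous coordinates; the intrinsic values below are illustrative, not specified by the course:

```python
import numpy as np

# Pinhole camera: intrinsics K with focal length f (pixels)
# and principal point (cx, cy) -- illustrative values only.
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

def project(X):
    """Perspective projection of a 3D point X in camera coordinates."""
    x = K @ X            # homogeneous image coordinates
    return x[:2] / x[2]  # divide by depth to get pixel coordinates

# A point 0.1 m right, 0.2 m below the axis, 2 m in front of the camera.
u, v = project(np.array([0.1, -0.2, 2.0]))
```

The division by the third homogeneous coordinate is exactly the perspective divide discussed in lecture; orthographic projection would drop it.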
03: Image Processing
- Image filtering
- Pixel processing
- LSIS (linear shift-invariant systems) and convolution
- Linear image filters
- Non-linear image filters
- Template matching by correlation
- Edge
- What is an edge?
- Canny edge detector
- Role of edge detection in image understanding
- Resampling
- Subsampling and aliasing
- Gaussian pyramids
- Interpolation
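The filtering topics above (LSIS and convolution) can be sketched with a direct 2D convolution; the box-filter example is illustrative and not taken from the course materials:

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct 2D convolution ('valid' region only), a linear shift-invariant filter."""
    kh, kw = kernel.shape
    k = kernel[::-1, ::-1]  # convolution flips the kernel (unlike correlation)
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# A 3x3 box filter averages each pixel's neighborhood (a simple linear filter).
box = np.ones((3, 3)) / 9.0
img = np.arange(25, dtype=float).reshape(5, 5)
smoothed = convolve2d(img, box)
```

Template matching by correlation uses the same sliding-window sum, just without the kernel flip.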
04: Features and Matching
- Motivation and intuition
- Is edge enough?
- Interesting point/feature
- Why local?
- Main components
- Feature detector and descriptor
- What are good features?
- Harris corner
- The scaling issue
- Blob (structured edge)
- SIFT
- Local feature matching
- Image stitching
- Image transformation and warping
- Computing homography
- RANSAC
- Panorama
- Blending
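RANSAC, used above for homography estimation, can be sketched on the simpler problem of robust line fitting; all names and data below are illustrative:

```python
import numpy as np

def ransac_line(points, iters=200, thresh=0.1, rng=None):
    """Fit y = a*x + b robustly: sample minimal sets, keep the model with most inliers."""
    rng = np.random.default_rng(0) if rng is None else rng
    best_inliers, best_model = 0, None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # degenerate minimal sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = int(np.sum(residuals < thresh))
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (a, b)
    return best_model, best_inliers

# 20 points on y = 2x + 1, with two gross outliers that would wreck least squares.
xs = np.linspace(0, 1, 20)
pts = np.stack([xs, 2 * xs + 1], axis=1)
pts[3] = [0.5, 10.0]
pts[7] = [0.9, -5.0]
(a, b), n = ransac_line(pts)
```

For homographies the minimal sample is 4 point correspondences instead of 2 points, but the hypothesize-and-verify loop is the same.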
05: Calibration
- Perspective projection model: review
- Intrinsic camera parameters
- Extrinsic camera parameters
- Calibration
06: Stereo
- Simple (Binocular) stereo
- Epipolar geometry
- Two-view stereo
- Active stereo
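For rectified binocular stereo, the central relation between disparity and depth can be written as a one-liner; the focal length and baseline below are illustrative values:

```python
# Rectified binocular stereo: Z = f * B / d, where f is the focal length
# (pixels), B the baseline (meters), and d the disparity (pixels).
def depth_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.12):
    """Illustrative camera values; disparity must be positive."""
    return focal_px * baseline_m / disparity_px

z = depth_from_disparity(42.0)  # a 42-pixel disparity maps to 2.0 m here
```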
07: Multi-View Stereo & Structure from Motion
- Application and motivation
- Plane sweep stereo
- Depth map fusion
- Patch-based multi-view stereo
- Stereo from Internet photo collections
- Recent trends
08: Recognition
- Brief summary
- The statistical learning approach
- “Shallow” vs. “deep” recognition pipelines
- Taxonomy of prediction tasks and supervision types
09: Classification
- Image classification
- KNN
- Linear classifier
- Loss function
- Cross-entropy loss
- SVM loss
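The cross-entropy loss listed above can be sketched for a single example; the class scores are made up for illustration:

```python
import numpy as np

def cross_entropy(scores, label):
    """Softmax cross-entropy loss for one example (scores: raw class scores)."""
    shifted = scores - scores.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

scores = np.array([3.0, 1.0, 0.2])
loss_correct = cross_entropy(scores, 0)  # confident and correct: small loss
loss_wrong = cross_entropy(scores, 2)    # wrong class: large loss
```

The SVM (hinge) loss covered alongside it penalizes margins instead of log-probabilities, but consumes the same raw scores.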
10: Optimization and Neural Networks (NN)
- Optimization
- Neural network
11: Backpropagation (BP)
- Computational graphs
- Backpropagation
- Flat BP
- Modular BP
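Backpropagation on a computational graph can be previewed with a tiny hand-worked example (not from the course materials): forward-compute intermediates, then apply the chain rule in reverse.

```python
# Computational graph for f(x, y) = (x + y) * x.
def f_and_grads(x, y):
    # Forward pass: store the intermediate node.
    s = x + y   # node 1
    f = s * x   # node 2
    # Backward pass (df/df = 1), chain rule through each node:
    ds = x          # d(s*x)/ds
    dx = s          # d(s*x)/dx, direct path
    dx += ds * 1.0  # plus the path through s = x + y
    dy = ds * 1.0
    return f, dx, dy

f, dx, dy = f_and_grads(3.0, 2.0)  # f = 15, df/dx = 2x + y = 8, df/dy = x = 3
```

"Flat" BP writes out these derivatives by hand as above; "modular" BP packages each node as a layer with its own forward/backward methods.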
12-13: CNN
- Fully-connected layers
- Activation function
- Convolution layers
- Pooling layers
- Normalization
- Training
14: Visualization and Understanding Models
- Visualizing what models have learned
- Understanding input pixels
- Adversarial perturbations
- Style transfer
15: Detection and Segmentation
- Semantic segmentation
- Object detection
- Instance segmentation
17: 3D Vision and 3D Scene Understanding
- Representations for 3D vision
- Shape representation
- AI + shape
- 3D scene reconstruction and synthesis
- Physical and social interactions in 3D scenes
- Multimodal understanding in 3D scenes
- Planning in 3D scenes
18: Vision meets Cognition
- Physical commonsense
- Social commonsense
- Applications in robotics
Last updated on Aug 1, 2022