Syllabus & Logistics
Logistics
Objectives
- Students gain a solid understanding of the foundation and application of computer vision
- A strong focus is put on geometric and 3D visual understanding
- After this course, students should be able to:
  - Process and analyze images with efficient tools
  - Train modern computer vision models for visual applications
  - Explore advanced vision topics for potential research
 
Grading
- in-class quizzes & attendance: 2% $\times$ 10 = 20%
  - closed book
  - 12 in total; you can miss at most two
  - if you take more than 10, the 10 highest scores count toward the final grade
  - 5-7 questions per quiz
  - short and simple
  - may include bonus questions
 
- assignments & projects (in English): 10% $\times$ 4 = 40%
  - roughly one every 3 weeks
  - a mixture of programming and written problems
  - late submissions are penalized 30% per day
  - you have 5 “late days” for the entire semester
  - these should be done independently
 
- final project: 30%
  - team: work in a small group of up to 3
  - duration: completed in the last month of class
  - topic: chosen from a pool; topics are exclusive among teams; reserve yours online
  - delivery:
    - 15%: a short presentation
    - 15%: a technical report (written in English and LaTeX) of top-conference quality, covering:
      - insights into the problem
      - related work / literature review
      - major components of the system or algorithms
      - technical details, e.g., learning, training, parameters
      - experimental results
      - publicly available code, e.g., on GitHub
 
 
 
Plagiarism
Errata/Typo
- Please contact the instructor or a TA
Syllabus
01: Introduction
- Course logistics
  - Meet the team
  - Logistics: homework, projects, attendance, and grading
  - What you can learn from this course
 
- What is computer vision?
  - Computer vision vs. biological vision
  - Computer vision vs. image processing & graphics
  - Why study computer vision?
  - Why is visual perception hard? (challenges)
  - Applications
 
- A (very) brief history of AI and CV
  - DARPA’s perspective on AI
  - Waves of CV development
  - Some milestones
 
- How to make a computer understand an image?
  - Marr’s vision: three levels of analysis
    - Computational
    - Algorithmic
    - Implementational
 
- Projection
  - Pinhole camera
  - Perspective projection
  - Homogeneous coordinates
  - Orthographic projection
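As a sketch of the perspective-projection and homogeneous-coordinates topics above, a pinhole projection can be written in a few lines of NumPy. The intrinsics (focal length `f`, principal point `cx`, `cy`) are made-up illustration values, not course data.

```python
import numpy as np

# Pinhole projection via homogeneous coordinates.
# f, cx, cy are illustrative values, not from the course.
f, cx, cy = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

def project(X):
    """Project a 3D point X (camera coordinates) to pixel coordinates."""
    x_h = K @ X              # homogeneous image point [f*X + cx*Z, f*Y + cy*Z, Z]
    return x_h[:2] / x_h[2]  # perspective division by depth Z

X = np.array([0.2, -0.1, 2.0])   # a point 2 m in front of the camera
u, v = project(X)                # pixel coordinates (370.0, 215.0)
```

The perspective division at the end is exactly the nonlinearity that homogeneous coordinates let us postpone until after a linear map.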
 
- Camera parameters
- Camera with lens
  - Thin lens
  - The eye
  - Depth of field
  - Field of view
  - Lens aberrations
 
- Digital sensor
- Light and shading
  - Radiometry
  - Reflectance & BRDF
  - Photometric stereo
    - Shape from normals
    - Shape from shading
 
 
03: Image Processing
- Image filtering
  - Pixel processing
  - LSIS (linear shift-invariant systems) and convolution
  - Linear image filters
  - Non-linear image filters
  - Template matching by correlation
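As a sketch of the LSIS/convolution material, a direct (unoptimized) 2D convolution with a 3x3 box filter might look like the following; a real pipeline would use an optimized routine such as `scipy.signal.convolve2d` instead.

```python
import numpy as np

def conv2d(image, kernel):
    """Direct 2D convolution (valid region only). For an LSIS the output
    is a weighted sum of shifted copies of the input; convolution flips
    the kernel, which is what distinguishes it from correlation."""
    k = np.flipud(np.fliplr(kernel))          # flip kernel for true convolution
    kh, kw = k.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# A 3x3 box (mean) filter is the simplest linear smoothing filter.
box = np.ones((3, 3)) / 9.0
img = np.arange(25, dtype=float).reshape(5, 5)
smoothed = conv2d(img, box)   # each output pixel is a local 3x3 mean
```

Template matching by correlation uses the same sliding-window structure, just without the kernel flip.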
 
- Edges
  - What is an edge?
  - Canny edge detector
  - The role of edge detection in image understanding
 
- Resampling
  - Subsampling and aliasing
  - Gaussian pyramids
  - Interpolation
 
04: Features and Matching
- Motivation and intuition
  - Are edges enough?
  - Interest points/features
  - Why local?
  - Main components
 
- Feature detectors and descriptors
  - What are good features?
  - Harris corners
  - The scaling issue
  - Blobs (structured edges)
  - SIFT
  - Local feature matching
 
- Image stitching
  - Image transformations and warping
  - Computing homographies
  - RANSAC
  - Panoramas
  - Blending
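The RANSAC idea used in stitching (estimate a homography robustly from noisy feature matches) can be illustrated on a simpler model, robust 2D line fitting, since the loop structure is identical: draw a minimal sample, fit a candidate model, count inliers, keep the best. The threshold and iteration count below are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 80 points near the line y = 2x + 1, plus 20 gross outliers.
x = rng.uniform(0, 10, 80)
pts = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.05, 80)])
pts = np.vstack([pts, rng.uniform(0, 10, (20, 2))])

def ransac_line(pts, n_iters=200, thresh=0.2):
    """RANSAC: repeatedly fit a line to a minimal 2-point sample and keep
    the model with the most inliers. Hyperparameters are illustrative."""
    best_ab = (0.0, 0.0)
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), 2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if abs(x2 - x1) < 1e-9:
            continue                          # skip degenerate vertical samples
        a = (y2 - y1) / (x2 - x1)             # slope of the candidate line
        b = y1 - a * x1                       # intercept
        # perpendicular distance of every point to the candidate line
        resid = np.abs(pts[:, 1] - (a * pts[:, 0] + b)) / np.sqrt(a * a + 1)
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_ab, best_inliers = (a, b), inliers
    return best_ab, best_inliers

(a, b), inliers = ransac_line(pts)
```

For stitching, the minimal sample becomes 4 point correspondences and the model a 3x3 homography, but the outer loop is unchanged.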
 
05: Calibration
- Perspective projection model: review
- Intrinsic camera parameters
- Extrinsic camera parameters
- Calibration
06: Stereo
- Simple (Binocular) stereo
- Epipolar geometry
- Two-view stereo
- Active stereo
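The simple binocular-stereo relation, depth $Z = fB/d$ for a rectified pair, can be checked numerically. The focal length and baseline below are made-up illustration values.

```python
# Depth from disparity in a rectified binocular stereo rig.
# f (focal length in pixels) and B (baseline in meters) are made-up values.
f = 700.0   # focal length, pixels
B = 0.12    # baseline between the two cameras, meters

def depth_from_disparity(d):
    """Triangulation for rectified stereo: Z = f * B / d.
    Larger disparity d (in pixels) means the point is closer."""
    return f * B / d

z_near = depth_from_disparity(84.0)   # large disparity -> 1 m
z_far = depth_from_disparity(8.4)     # small disparity -> 10 m
```

Note the inverse relationship: halving the disparity doubles the estimated depth, which is why stereo depth precision degrades quadratically with distance.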
07: Multi-View Stereo & Structure from Motion
- Application and motivation
- Plane sweep stereo
- Depth map fusion
- Patch-based multi-view stereo
- Stereo from Internet photo collections
- Recent trends
08: Recognition
- Brief summary
- The statistical learning approach
- “Shallow” vs. “deep” recognition pipelines
- Taxonomy of prediction tasks and supervision types
09: Classification
- Image classification
- KNN (k-nearest neighbors)
- Linear classifiers
- Loss functions
  - Cross-entropy loss
  - SVM loss
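A minimal NumPy sketch of the cross-entropy (softmax) loss for a single example; the class scores here are illustrative, not from a trained model.

```python
import numpy as np

def cross_entropy_loss(scores, y):
    """Softmax + cross-entropy for one example.
    scores: raw class scores (e.g., from a linear classifier); y: true class.
    Subtracting the max score is the standard numerical-stability trick."""
    shifted = scores - np.max(scores)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y]        # negative log-probability of the true class

scores = np.array([2.0, 1.0, 0.1])
loss = cross_entropy_loss(scores, y=0)   # about 0.417
```

The loss is 0 only when the classifier puts all probability mass on the true class, and grows without bound as that probability goes to 0.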
 
10: Optimization and Neural Networks (NN)
- Optimization
- Neural network
11: Backpropagation (BP)
- Computational graphs
- Backpropagation
- Flat BP
- Modular BP
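"Flat" backpropagation can be sketched on a tiny computational graph, here the classic textbook example f(x, y, z) = (x + y) * z, chosen for illustration rather than taken from the lecture: the forward pass stores intermediates, and the backward pass applies the chain rule node by node in reverse order.

```python
# Flat backpropagation on the graph f(x, y, z) = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# forward pass: compute and cache every intermediate value
q = x + y          # q = 3.0
f_val = q * z      # f = -12.0

# backward pass: chain rule, one node at a time, in reverse order
df_df = 1.0                # gradient of f w.r.t. itself
df_dq = z * df_df          # d(q*z)/dq = z  -> -4.0
df_dz = q * df_df          # d(q*z)/dz = q  ->  3.0
df_dx = 1.0 * df_dq        # d(x+y)/dx = 1  -> -4.0
df_dy = 1.0 * df_dq        # d(x+y)/dy = 1  -> -4.0
```

Modular BP packages exactly this pattern into per-layer `forward`/`backward` functions so the same chain-rule sweep scales to deep networks.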
12-13: CNNs
- Fully-connected layers
- Activation functions
- Convolutional layers
- Pooling layers
- Normalization
- Training
14: Visualizing and Understanding Models
- Visualizing what models have learned
- Understanding input pixels
- Adversarial perturbations
- Style transfer
15: Detection and Segmentation
- Semantic segmentation
- Object detection
- Instance segmentation
17: 3D Vision and 3D Scene Understanding
- Representations for 3D vision
- Shape representation
- AI + shape
- 3D scene reconstruction and synthesis
- Physical and social interactions in 3D scenes
- Multimodal understanding in 3D scenes
- Planning in 3D scenes
18: Vision Meets Cognition
- Physical commonsense
- Social commonsense
- Applications in robotics
Last updated on Aug 1, 2022