FaceNet Notes

Paper: FaceNet: A Unified Embedding for Face Recognition and Clustering
Authors: Florian Schroff (Google), Dmitry Kalenichenko (Google) and James Philbin (Google)
Area: Computer Vision, Clustering, Classification, Deep Learning
Year: 2015
Highlighted Paper

Background:

  • Euclidean space, distance and vector norms
  • Nearest neighbor/k-means/clustering
  • CNNs: Stanford's CS231n course notes; Deep Learning book, chapter 9
  • Important architectures for this paper:
    1. Zeiler&Fergus
    2. GoogLeNet (Inception) model
  • Key Contributions:

    1. Learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. The method uses a CNN to directly optimize the embedding itself.
    2. Training uses triplets of roughly aligned matching/non-matching face patches, generated with an online triplet mining method.

    Why is this novel? Unlike previous approaches, where a final classification layer predicts the class, here we learn an embedding that can then be reused for several tasks, such as recognition, verification, and clustering.
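
To make the "one embedding, several tasks" point concrete, here is a minimal sketch in plain Python. It assumes the embeddings have already been produced by some trained network (not shown). The helper names `same_identity` and `nearest_identity` and the threshold value are illustrative placeholders, not values taken from any particular implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def same_identity(emb_a, emb_b, threshold=1.1):
    """Verification: two faces match if their embeddings are close enough.

    The threshold is a placeholder; in practice it is tuned on held-out pairs.
    """
    return squared_l2(emb_a, emb_b) <= threshold


def nearest_identity(query, gallery):
    """Recognition: return the name whose embedding is closest to the query.

    gallery is a list of (name, embedding) pairs.
    """
    return min(gallery, key=lambda item: squared_l2(query, item[1]))[0]
```

Verification reduces to thresholding a distance, and recognition to a nearest-neighbor lookup. Clustering can run k-means or a similar method on the same vectors.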

    Model Triplet Loss:
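    $$\sum_{i}^{N}\left[\lVert f(x_{i}^{a}) - f(x_{i}^{p}) \rVert_{2}^{2} - \lVert f(x_{i}^{a}) - f(x_{i}^{n}) \rVert_{2}^{2} + \alpha\right]_{+}$$
    Here \(a\) is the anchor, \(p\) is a positive (same identity as the anchor), \(n\) is a negative (a different identity), \(\alpha\) is the margin, and \([\cdot]_{+}\) denotes \(\max(\cdot, 0)\). Minimizing this loss pulls the positive closer to the anchor than the negative by at least the margin \(\alpha\).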
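
The triplet loss can be sketched in a few lines of plain Python. This version applies the hinge \(\max(\cdot, 0)\) to each term, as in the paper, so triplets that already satisfy the margin contribute nothing. The function names and the default margin are assumptions for illustration. A real training loop would compute this on tensors in a framework such as PyTorch so that gradients flow back to the network.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchors, positives, negatives, alpha=0.2):
    """Sum of hinged triplet terms over a batch of embedding triplets.

    anchors, positives, negatives: equal-length lists of embedding vectors.
    alpha: margin between positive and negative distances (illustrative default).
    """
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        # Positive when the negative is not yet at least alpha farther
        # from the anchor than the positive is.
        violation = squared_l2(a, p) - squared_l2(a, n) + alpha
        total += max(violation, 0.0)
    return total
```

For example, with anchor `[0, 0]`, positive `[1, 0]` and negative `[0, 2]`, the term is `1 - 4 + 0.2 < 0`, so that triplet adds zero loss: the negative is already far enough away.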

    Triplet Selection:
    The paper focuses on online triplet generation: it uses large mini-batches, on the order of a few thousand exemplars, and computes the \(\arg\max\) (hardest positive) and \(\arg\min\) (hardest negative) only within each mini-batch.
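
A rough sketch of in-batch mining, assuming a mini-batch is given as a list of embedding vectors plus a parallel list of identity labels. The function name `mine_hard_triplets` is hypothetical. For each anchor it picks the farthest same-identity example and the closest different-identity example, which is the simplest reading of the argmax/argmin step; it is not a reproduction of the paper's exact procedure.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mine_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets for one mini-batch."""
    triplets = []
    for a, (emb_a, lab_a) in enumerate(zip(embeddings, labels)):
        pos = [i for i, lab in enumerate(labels) if lab == lab_a and i != a]
        neg = [i for i, lab in enumerate(labels) if lab != lab_a]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in this batch
        # Hardest positive: same identity, largest distance (argmax).
        hardest_pos = max(pos, key=lambda i: squared_l2(emb_a, embeddings[i]))
        # Hardest negative: different identity, smallest distance (argmin).
        hardest_neg = min(neg, key=lambda i: squared_l2(emb_a, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets
```

The paper notes that always taking the very hardest negatives can lead to bad local minima early in training, and it also uses "semi-hard" negatives, which are farther from the anchor than the positive but still inside the margin. That variant would add a distance filter on `neg` before the `min`.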

    CNN Architectures:
    The paper builds on two existing architectures: the Inception model and the Zeiler&Fergus model.

    Some implementations:
    tbmoon's implementation using PyTorch
    timesler's implementation

    Datasets
    Labeled Faces in the Wild (LFW), courtesy of UMass
    YouTube Faces DB

    On Spectral Clustering

    Learn By Doing Rl Actor Critic

    Implementing and Understanding Actor Critic Algorithms

    I was following Sergey Levine's lecture and using the following pseudocode on page 19 to implement the algorithm: