Applications of Foundation Models in Biometrics
In this section, we review recent papers on the applications of foundation models in biometrics:
- Foundation Models for Biometric Recognition
- Foundation Models for Soft-biometric Detection
- Foundation Models for Deepfake and Forgery Detection
- Foundation Models for Anti-spoofing
- Foundation Models for Synthetic Biometric Generation
Foundation Models for Biometric Recognition
| Paper Title | Year | Modality / Task | Paper | Code |
|---|---|---|---|---|
| Exploring wav2vec 2.0 on speaker verification and language identification | 2020 | speaker verification, language identification | link | NA |
| ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities | 2024 | face verification, gender detection, age estimation | link | NA |
| How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability | 2024 | face verification | link | NA |
| ChatGPT Meets Iris Biometrics | 2024 | iris recognition | link | NA |
| Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition | 2025 | face verification | link | NA |
| Benchmarking Foundation Models for Zero-Shot Biometric Tasks | 2025 | face verification, soft biometric attribute prediction (gender and race), iris recognition, iris presentation attack detection, face morph detection, and face deepfake detection | link | NA |
| A fine-tuned wav2vec 2.0/hubert benchmark for speech emotion recognition, speaker verification and spoken language understanding | 2021 | speaker verification | link | link |
| Iris-SAM: Iris Segmentation Using a Foundation Model | 2024 | iris segmentation | link | link |
| SAM-Iris: A SAM-Based Iris Segmentation Algorithm | 2025 | iris segmentation | link | NA |
| Froundation: Are foundation models ready for face recognition? | 2024 | face recognition | link | link |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | 2025 | audio-visual human video recognition (emotion recognition, expression description, and action understanding) | link | link |
| FaceLLM: A Multimodal Large Language Model for Face Understanding | 2025 | face recognition, anti-spoofing, deepfake detection, attribute prediction, expression, parsing, pose, crowd counting | link | link |
| FaceXBench: Evaluating Multimodal LLMs on Face Understanding | 2025 | face recognition, anti-spoofing, deepfake detection, attribute prediction, expression, parsing, pose, crowd counting | link | link |
| Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants | 2025 | facial attributes, age estimation, expression recognition, attack detection, recognition; human attributes, action, spatial/social relations, re-ID | link | link |
| From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing | 2024 | face recognition explainability | link | NA |
| FaceOracle: Chat with a Face Image Oracle | 2025 | face image quality assessment | link | NA |
| Unispeech-sat: Universal speech representation learning with speaker aware pre-training | 2022 | speaker ID, verification, diarization, phoneme recognition, keyword spotting, emotion recognition | link | link |
| Large-scale self-supervised speech representation learning for automatic speaker verification | 2022 | speaker verification | link | link |
| General facial representation learning in a visual-linguistic manner | 2022 | face parsing, alignment, attribute recognition | link | link |
| Marlin: Masked autoencoder for facial video representation learning | 2023 | face attribute recognition, expression recognition, deepfake detection, lip synchronization | link | link |
| Self-Supervised Facial Representation Learning with Facial Region Awareness | 2024 | face expression and attribute recognition | link | link |
| Pose-disentangled contrastive learning for self-supervised facial representation | 2023 | face expression, face recognition, head pose estimation | link | link |
| Pros: Facial omni-representation learning via prototype-based self-distillation | 2024 | face parsing, attribute recognition, emotion detection, landmark detection | link | link |
| ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces | 2024 | face expression change, weight change, age change estimation | link | NA |
| SwinFace: a multi-task transformer for face recognition, expression recognition, age estimation and attribute estimation | 2023 | face attributes, age estimation, expression recognition, face recognition | link | link |
| FaceXFormer: A Unified Transformer for Facial Analysis | 2024 | face parsing, landmarks, head pose estimation, age/gender/race estimation, attribute recognition, expression recognition | link | link |
| Task-adaptive Q-Face | 2024 | head pose estimation, face attribute recognition, age estimation, expression recognition | link | NA |
| Faceptor: A generalist model for face perception | 2024 | face parsing, landmarks, age and gender estimation, attribute recognition, expression recognition, face recognition | link | link |
Foundation Models for Soft-biometric Detection
| Paper Title | Year | Modality / Task | Paper | Code |
|---|---|---|---|---|
| Robust light-weight facial affective behavior recognition with clip | 2024 | facial expression classification; action unit detection | link | link |
| Cliper: A unified vision-language framework for in-the-wild facial expression recognition | 2024 | face static & dynamic expression recognition | link | link |
| Emoclip: A vision-language method for zero-shot video facial expression recognition | 2024 | video facial emotion recognition | link | link |
| Finecliper: Multi-modal fine-grained clip for dynamic facial expression recognition with adapters | 2024 | dynamic facial expression recognition | link | NA |
| Face-mllm: A large face perception model | 2024 | face age/gender, expression, action units, attributes | link | NA |
| FaceGPT: Self-supervised Learning to Chat about 3D Human Faces | 2024 | face 3DMM parameter generation | link | NA |
| FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | 2025 | face attribute detection | link | link |
| Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | 2025 | face expression recognition, action unit detection, facial attribute detection, age estimation, and deepfake detection | link | link |
| FaceInsight: A Multimodal Large Language Model for Face Perception | 2025 | face attribute recognition, age/gender/race estimation, and expression prediction | link | NA |
| R1-omni: Explainable omni-multimodal emotion recognition with reinforcement learning | 2025 | audio-visual emotion recognition with reasoning | link | link |
| ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities | 2024 | face gender detection, age estimation | link | NA |
| How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability | 2024 | age, gender, ethnicity, hair color | link | NA |
| ChatGPT Meets Iris Biometrics | 2024 | iris-face matching; soft-biometrics | link | NA |
Foundation Models for Deepfake and Forgery Detection
| Paper Title | Year | Modality / Task | Paper | Code |
|---|---|---|---|---|
| MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection | 2024 | face forgery detection | link | link |
| Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection | 2024 | face forgery detection | link | link |
| MADation: Face Morphing Attack Detection with Foundation Models | 2025 | face morph attack detection | link | link |
| FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning | 2024 | deepfake detection, anti-spoofing, unseen diffusion forgery | link | link |
| Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation | 2022 | voice spoofing & deepfake detection | link | link |
| X2-dfd: A framework for explainable and extendable deepfake detection | 2024 | face deepfake detection | link | link |
| Ffaa: Multimodal large language model based explainable open-world face forgery analysis assistant | 2024 | forgery analysis assistant | link | link |
| Towards general visual-linguistic face forgery detection (v2) | 2025 | face forgery detection | link | link |
| Evaluating the Effectiveness of Attack-Agnostic Features for Morphing Attack Detection | 2024 | face morph attack detection | link | link |
| Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | 2024 | singing voice deepfake detection | link | NA |
| Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector | 2025 | face deepfake detection + description | link | link |
| Standing on the shoulders of giants: Reprogramming visual-language model for general deepfake detection | 2025 | face deepfake detection | link | link |
| Can chatgpt detect deepfakes? a study of using multimodal large language models for media forensics | 2024 | face deepfake detection | link | link |
| How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception | 2024 | audio-visual deepfake detection | link | NA |
| ChatGPT Encounters Morphing Attack Detection: Zero-Shot MAD with Multi-Modal Large Language Models and General Vision Models | 2025 | face morph detection | link | NA |
Foundation Models for Anti-spoofing
| Paper Title | Year | Modality / Task | Paper | Code |
|---|---|---|---|---|
| Flip: Cross-domain face anti-spoofing with language guidance | 2023 | fine-tune CLIP image encoder for face (FLIP alignment) | link | link |
| On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection | 2023 | SSL via masked-fingerprint prediction with prompt tuning | link | NA |
| CPL-CLIP: Compound Prompt Learning for Flexible-Modal Face Anti-Spoofing | 2024 | face anti-spoofing | link | NA |
| Fm-clip: Flexible modal clip for face anti-spoofing | 2024 | cross-modal anti-spoofing | link | NA |
| La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection | 2024 | Unified physical-digital face attack detection | link | NA |
| Cfpl-fas: Class free prompt learning for generalizable face anti-spoofing | 2024 | face anti-spoofing | link | NA |
| InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing | 2025 | face anti-spoofing | link | link |
| Reliable and Balanced Transfer Learning for Generalized Multimodal Face Anti-Spoofing | 2025 | Multimodal face anti-spoofing | link | link |
| FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models | 2025 | face anti-spoofing (classification and attack localization) | link | link |
| Interpretable face anti-spoofing: Enhancing generalization with multimodal large language models | 2025 | face anti-spoofing | link | NA |
| Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning | 2025 | face anti-spoofing (spoofing detection and reasoning) | link | NA |
| VL-FAS: Domain Generalization via Vision-Language Model For Face Anti-Spoofing | 2024 | face anti-spoofing | link | NA |
| FoundPAD: Foundation Models Reloaded for Face Presentation Attack Detection | 2025 | face anti-spoofing | link | link |
| Towards Iris Presentation Attack Detection with Foundation Models | 2025 | iris anti-spoofing | link | NA |
| Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning | 2025 | face presentation attack detection | link | link |
| Are Foundation Models All You Need for Zero-shot Face Presentation Attack Detection? | 2025 | face presentation attack detection | link | link |
| Shield: An evaluation benchmark for face spoofing and forgery detection with multimodal large language models | 2025 | face anti-spoofing (RGB, infrared, depth) and forgery detection | link | link |
| ChatGPT Meets Iris Biometrics | 2024 | iris presentation-attack detection | link | NA |
Foundation Models for Synthetic Biometric Generation
| Paper Title | Year | Modality / Task | Paper | Code |
|---|---|---|---|---|
| Toward open-world text-driven face generation and manipulation via stylegan3 | 2024 | Text-to-face synthesis | link | NA |
| AnyFace++: A unified framework for free-style text-to-face synthesis and manipulation | 2024 | Text-guided face editing | link | NA |
| AnyFace: Free-style text-to-face synthesis and manipulation | 2022 | Text-to-face generation | link | NA |
| Towards counterfactual image manipulation via clip | 2022 | Controllable text-to-face | link | link |
| Prompt-Based Modality Bridging for Unified Text-to-Face Generation and Manipulation | 2024 | Prompt-based face synthesis | link | NA |
| Tecm-clip: Text-based controllable multi-attribute face image manipulation | 2022 | face attribute / expression editing | link | link |
| Stylemc: Multi-channel based fast text-guided image generation and manipulation | 2022 | face multi-attribute editing | link | link |
| Photoverse: Tuning-free image customization with text-to-image diffusion models | 2023 | Few-shot personalised face portrait generation | link | link |
| Fastcomposer: Tuning-free multi-subject image generation with localized attention | 2024 | fast subject-driven face text-to-image | link | link |
| Moa: Mixture-of-attention for subject-context disentanglement in personalized image generation | 2024 | multi-concept face portrait generation | link | NA |
| Photomaker: Customizing realistic human photos via stacked id embedding | 2024 | high-fidelity face personalisation | link | link |
| Face0: Instantaneously conditioning a text-to-image model on a face | 2023 | Identity-preserving face text-to-image | link | NA |
| Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models | 2023 | face instant personalisation | link | link |
| Dreamidentity: Improved editability for efficient face-identity preserved image generation | 2023 | face identity-guided generation | link | NA |
| Portraitbooth: A versatile portrait model for fast identity-preserved personalization | 2024 | face few-shot portrait generation | link | NA |
| Instantid: Zero-shot identity-preserving generation in seconds | 2024 | face real-time personalisation | link | link |
| ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning | 2024 | face identity-consistent generation | link | link |
| Facestudio: Put your face everywhere in seconds | 2023 | face ID & style controllable text-to-image | link | link |
| IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models | 2024 | identity-aware face editing | link | NA |
| Arc2face: A foundation model for id-consistent human faces | 2024 | identity-conditioned face generation | link | link |
| Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model | 2024 | General identity-conditioned face generation | link | link |
| Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance | 2025 | Identity-conditioned 3D head / avatar generation | link | link |
| ClipSwap: Towards High Fidelity Face Swapping via Attributes and CLIP-Informed Loss | 2024 | Face swapping | link | NA |