🔥 Applications of Foundation Models in Biometrics

In this section, we review recent papers on the applications of foundation models in biometrics:

Foundation Models for Biometric Recognition

| Paper Title | Year | Modality / Task | Paper | Code |
| --- | --- | --- | --- | --- |
| Exploring wav2vec 2.0 on speaker verification and language identification | 2020 | speaker and language identification | link | NA |
| ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities | 2024 | face verification, gender detection, age estimation | link | NA |
| How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability | 2024 | face verification | link | NA |
| ChatGPT Meets Iris Biometrics | 2024 | iris recognition | link | NA |
| Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition | 2025 | face verification | link | NA |
| Benchmarking Foundation Models for Zero-Shot Biometric Tasks | 2025 | face verification, soft biometric attribute prediction (gender and race), iris recognition, iris presentation attack detection, face morph detection, and face deepfake detection | link | NA |
| A fine-tuned wav2vec 2.0/hubert benchmark for speech emotion recognition, speaker verification and spoken language understanding | 2021 | speaker verification | link | link |
| Iris-SAM: Iris Segmentation Using a Foundation Model | 2024 | iris segmentation | link | link |
| SAM-Iris: A SAM-Based Iris Segmentation Algorithm | 2025 | iris segmentation | link | NA |
| Froundation: Are foundation models ready for face recognition? | 2024 | face recognition | link | link |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | 2025 | audio-visual human video recognition (emotion recognition, expression description, and action understanding) | link | link |
| FaceLLM: A Multimodal Large Language Model for Face Understanding | 2025 | face recognition, anti-spoofing, deepfake detection, attribute prediction, expression, parsing, pose, crowd counting | link | link |
| FaceXBench: Evaluating Multimodal LLMs on Face Understanding | 2025 | face recognition, anti-spoofing, deepfake detection, attribute prediction, expression, parsing, pose, crowd counting | link | link |
| Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants | 2025 | facial attributes, age estimation, expression recognition, attack detection, recognition; human attributes, action, spatial/social relations, re-ID | link | link |
| From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing | 2024 | face recognition explainability | link | NA |
| FaceOracle: Chat with a Face Image Oracle | 2025 | face image quality assessment | link | NA |
| Unispeech-sat: Universal speech representation learning with speaker aware pre-training | 2022 | speaker ID, verification, diarization, phoneme recognition, keyword spotting, emotion recognition | link | link |
| Large-scale self-supervised speech representation learning for automatic speaker verification | 2022 | speaker verification | link | link |
| General facial representation learning in a visual-linguistic manner | 2022 | face parsing, alignment, attribute recognition | link | link |
| Marlin: Masked autoencoder for facial video representation learning | 2023 | face attribute recognition, expression recognition, deepfake detection, lip synchronization | link | link |
| Self-Supervised Facial Representation Learning with Facial Region Awareness | 2024 | face expression and attribute recognition | link | link |
| Pose-disentangled contrastive learning for self-supervised facial representation | 2023 | face expression, face recognition, head pose estimation | link | link |
| Pros: Facial omni-representation learning via prototype-based self-distillation | 2024 | face parsing, attribute recognition, emotion detection, landmark detection | link | link |
| ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces | 2024 | face expression change, weight change, age change estimation | link | NA |
| SwinFace: a multi-task transformer for face recognition, expression recognition, age estimation and attribute estimation | 2023 | face attributes, age estimation, expression recognition, face recognition | link | link |
| FaceXFormer: A Unified Transformer for Facial Analysis | 2024 | face parsing, landmarks, head pose estimation, age/gender/race estimation, attribute recognition, expression recognition | link | link |
| Task-adaptive Q-Face | 2024 | head pose estimation, face attribute recognition, age estimation, expression recognition | link | NA |
| Faceptor: A generalist model for face perception | 2024 | face parsing, landmarks, age and gender estimation, attribute recognition, expression recognition, face recognition | link | link |

Foundation Models for Soft-biometric Detection

| Paper Title | Year | Modality / Task | Paper | Code |
| --- | --- | --- | --- | --- |
| Robust light-weight facial affective behavior recognition with clip | 2024 | facial expression classification; action unit detection | link | link |
| Cliper: A unified vision-language framework for in-the-wild facial expression recognition | 2024 | face static & dynamic expression recognition | link | link |
| Emoclip: A vision-language method for zero-shot video facial expression recognition | 2024 | video facial emotion recognition | link | link |
| Finecliper: Multi-modal fine-grained clip for dynamic facial expression recognition with adapters | 2024 | dynamic facial expression recognition | link | NA |
| Face-mllm: A large face perception model | 2024 | face age/gender, expression, action units, attributes | link | NA |
| FaceGPT: Self-supervised Learning to Chat about 3D Human Faces | 2024 | face 3DMM parameter generation | link | NA |
| FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | 2025 | face attribute detection | link | link |
| Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | 2025 | face expression recognition, action unit detection, facial attribute detection, age estimation, and deepfake detection | link | link |
| FaceInsight: A Multimodal Large Language Model for Face Perception | 2025 | face attribute recognition, age/gender/race estimation, and expression prediction | link | NA |
| R1-omni: Explainable omni-multimodal emotion recognition with reinforcement learning | 2025 | audio-visual emotion recognition with reasoning | link | link |
| ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities | 2024 | face gender detection, age estimation | link | NA |
| How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability | 2024 | age, gender, ethnicity, hair color | link | NA |
| ChatGPT Meets Iris Biometrics | 2024 | iris–face matching; soft-biometrics | link | NA |

Foundation Models for Deepfake and Forgery Detection

| Paper Title | Year | Modality / Task | Paper | Code |
| --- | --- | --- | --- | --- |
| MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection | 2024 | face forgery detection | link | link |
| Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection | 2024 | face forgery detection | link | link |
| MADation: Face Morphing Attack Detection with Foundation Models | 2025 | face morph attack detection | link | link |
| FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning | 2024 | deepfake detection, anti-spoofing, unseen diffusion forgery | link | link |
| Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation | 2022 | voice spoofing & deepfake detection | link | link |
| X2-dfd: A framework for explainable and extendable deepfake detection | 2024 | face deepfake detection | link | link |
| Ffaa: Multimodal large language model based explainable open-world face forgery analysis assistant | 2024 | forgery analysis assistant | link | link |
| Towards general visual-linguistic face forgery detection (v2) | 2025 | face forgery detection | link | link |
| Evaluating the Effectiveness of Attack-Agnostic Features for Morphing Attack Detection | 2024 | face morph attack detection | link | link |
| Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | 2024 | speaker deepfake detection | link | NA |
| Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector | 2025 | face deepfake detection + description | link | link |
| Standing on the shoulders of giants: Reprogramming visual-language model for general deepfake detection | 2025 | face deepfake detection | link | link |
| Can chatgpt detect deepfakes? a study of using multimodal large language models for media forensics | 2024 | face deepfake detection | link | link |
| How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception | 2024 | audio-visual deepfake detection | link | NA |
| ChatGPT Encounters Morphing Attack Detection: Zero-Shot MAD with Multi-Modal Large Language Models and General Vision Models | 2025 | face morph detection | link | NA |

Foundation Models for Anti-spoofing

| Paper Title | Year | Modality / Task | Paper | Code |
| --- | --- | --- | --- | --- |
| Flip: Cross-domain face anti-spoofing with language guidance | 2023 | fine-tune CLIP image encoder for face (FLIP alignment) | link | link |
| On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection | 2023 | SSL via masked-fingerprint prediction with prompt tuning | link | NA |
| CPL-CLIP: Compound Prompt Learning for Flexible-Modal Face Anti-Spoofing | 2024 | face anti-spoofing | link | NA |
| Fm-clip: Flexible modal clip for face anti-spoofing | 2024 | cross-modal anti-spoofing | link | NA |
| La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection | 2024 | unified physical-digital face attack detection | link | NA |
| Cfpl-fas: Class free prompt learning for generalizable face anti-spoofing | 2024 | face anti-spoofing | link | NA |
| InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing | 2025 | face anti-spoofing | link | link |
| Reliable and Balanced Transfer Learning for Generalized Multimodal Face Anti-Spoofing | 2025 | multimodal face anti-spoofing | link | link |
| FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models | 2025 | face anti-spoofing (classification and attack localization) | link | link |
| Interpretable face anti-spoofing: Enhancing generalization with multimodal large language models | 2025 | face anti-spoofing | link | NA |
| Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning | 2025 | face anti-spoofing (spoofing detection and reasoning) | link | NA |
| VL-FAS: Domain Generalization via Vision-Language Model For Face Anti-Spoofing | 2024 | face anti-spoofing | link | NA |
| FoundPAD: Foundation Models Reloaded for Face Presentation Attack Detection | 2025 | face anti-spoofing | link | link |
| Towards Iris Presentation Attack Detection with Foundation Models | 2025 | iris anti-spoofing | link | NA |
| Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning | 2025 | face presentation attack detection | link | link |
| Are Foundation Models All You Need for Zero-shot Face Presentation Attack Detection? | 2025 | face presentation attack detection | link | link |
| Shield: An evaluation benchmark for face spoofing and forgery detection with multimodal large language models | 2025 | face anti-spoofing (RGB, infrared, depth) and forgery detection | link | link |
| ChatGPT Meets Iris Biometrics | 2024 | iris presentation-attack detection | link | NA |

Foundation Models for Synthetic Biometric Generation

| Paper Title | Year | Modality / Task | Paper | Code |
| --- | --- | --- | --- | --- |
| Toward open-world text-driven face generation and manipulation via stylegan3 | 2024 | text-to-face synthesis | link | NA |
| AnyFace++: A unified framework for free-style text-to-face synthesis and manipulation | 2024 | text-guided face editing | link | NA |
| AnyFace: Free-style text-to-face synthesis and manipulation | 2022 | text-to-face generation | link | NA |
| Towards counterfactual image manipulation via clip | 2022 | controllable text-to-face | link | link |
| Prompt-Based Modality Bridging for Unified Text-to-Face Generation and Manipulation | 2024 | prompt-based face synthesis | link | NA |
| Tecm-clip: Text-based controllable multi-attribute face image manipulation | 2022 | face attribute / expression editing | link | link |
| Stylemc: Multi-channel based fast text-guided image generation and manipulation | 2022 | face multi-attribute editing | link | link |
| Photoverse: Tuning-free image customization with text-to-image diffusion models | 2023 | few-shot personalised face portrait generation | link | link |
| Fastcomposer: Tuning-free multi-subject image generation with localized attention | 2024 | fast subject-driven face text-to-image | link | link |
| Moa: Mixture-of-attention for subject-context disentanglement in personalized image generation | 2024 | multi-concept face portrait generation | link | NA |
| Photomaker: Customizing realistic human photos via stacked id embedding | 2024 | high-fidelity face personalisation | link | link |
| Face0: Instantaneously conditioning a text-to-image model on a face | 2023 | identity-preserving face text-to-image | link | NA |
| Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models | 2023 | face instant personalisation | link | link |
| Dreamidentity: Improved editability for efficient face-identity preserved image generation | 2023 | face identity-guided generation | link | NA |
| Portraitbooth: A versatile portrait model for fast identity-preserved personalization | 2024 | face few-shot portrait generation | link | NA |
| Instantid: Zero-shot identity-preserving generation in seconds | 2024 | face real-time personalisation | link | link |
| ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning | 2024 | face identity-consistent generation | link | link |
| Facestudio: Put your face everywhere in seconds | 2023 | face ID & style controllable text-to-image | link | link |
| IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models | 2024 | identity-aware face editing | link | NA |
| Arc2face: A foundation model for id-consistent human faces | 2024 | identity-conditioned face generation | link | link |
| Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model | 2024 | general identity-conditioned face generation | link | link |
| Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance | 2025 | identity-conditioned 3D head / avatar generation | link | link |
| ClipSwap: Towards High Fidelity Face Swapping via Attributes and CLIP-Informed Loss | 2024 | face swapping | link | NA |
