Multi-modal (11)
- [Paper Review] ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models
- [Paper Review] Med-PaLM M: Towards Generalist Biomedical AI
- [Paper Review] PaLM-E: An Embodied Multimodal Language Model
- [Paper Review] BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
- [Paper Review] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- [Paper Review] GLIP: Grounded Language-Image Pre-training
- [Paper Review] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- [Paper Review] DALL-E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents (unCLIP)
- [Paper Review] DINOv2: Learning Robust Visual Features without Supervision
- [Paper Review] DALL-E: Zero-Shot Text-to-Image Generation
- [Paper Review] CLIP: Learning Transferable Visual Models From Natural Language Supervision