This website is currently used only as a personal record of my paper reading.
The Campfire Tales
Please see the original post for the full version. The Neural Tangent Kernel (NTK) is well known as a powerful tool that elegantly proves the convergence and generalisation of neural networks. However, as an engineering student who knows basically nothing about mathematics, I found the original NTK paper a little opaque. Driven by my intrinsic inquisitiveness (“vegetable but addicted”), I spent some time trying to figure out what on earth NTK is, and in the end managed to form a rough picture of it. This article aims to help people who are in the same boat: vegetable, but still struggling for a free ride in deep learning theory.
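For readers skimming this page, a one-display reminder of the object being discussed may help. The block below is the standard empirical NTK definition in the usual notation; it is not quoted from the post itself.

```latex
% Standard (empirical) NTK definition, not taken from the post:
% for a network f(x; \theta) with stacked parameters \theta,
% the kernel is the inner product of parameter gradients.
\[
  \Theta(x, x') \;=\; \nabla_{\theta} f(x;\theta)^{\top}\, \nabla_{\theta} f(x';\theta)
\]
% Jacot et al. (2018) show that in the infinite-width limit this kernel
% stays constant along gradient-flow training, which is what makes the
% convergence and generalisation analysis tractable.
```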
This list only includes papers I have read thoroughly; the full collection is much larger.
DeSAM: Decoupling Segment Anything Model for Generalizable Medical Image Segmentation (arXiv 2306)
How to Efficiently Adapt Large Segmentation Model(SAM) to Medical Images (arXiv 2306)
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day (arXiv 2306)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering (arXiv 2305)
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey (arXiv 2305)
UniverSeg: Universal Medical Image Segmentation (arXiv 2304, ICCV 2023)
Customized Segment Anything Model for Medical Image Segmentation (arXiv 2304)
Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation (arXiv 2304)
MI-SegNet: Mutual Information-Based US Segmentation for Unseen Domain Generalization (arXiv 2303)
Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing (arXiv 2303)
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts (arXiv 2302)
Medical Visual Question Answering via Conditional Reasoning and Contrastive Learning (TMI 2023)
Diffusion Deformable Model for 4D Temporal Medical Image Generation (arXiv 2206, MICCAI 2022)
Medical Visual Question Answering via Conditional Reasoning (ACM MM 2020)
Overcoming Data Limitation in Medical Visual Question Answering (arXiv 1909, MICCAI 2019)
LLaVA-ϕ: Efficient Multi-Modal Assistant with Small Language Model (arXiv 2401)
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices (arXiv 2312)
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models (arXiv 2312)
PaLI-3 Vision Language Models: Smaller, Faster, Stronger (arXiv 2310, ICLR 2024 submission)
Improved Baselines with Visual Instruction Tuning (arXiv 2310)
Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic (arXiv 2306)
Kosmos-2: Grounding Multimodal Large Language Models to the World (arXiv 2306)
Improving CLIP Training with Language Rewrites (arXiv 2305, NeurIPS 2023)
PaLI-X: On Scaling up a Multilingual Vision and Language Model (arXiv 2305)
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (arXiv 2305)
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model (arXiv 2304)
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models (arXiv 2304)
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention (arXiv 2303)
Language Is Not All You Need: Aligning Perception with Language Models (arXiv 2302)
Scaling Language-Image Pre-training via Masking (arXiv 2212, CVPR 2023)
PaLI: A Jointly-Scaled Multilingual Language-Image Model (arXiv 2209, ICLR 2023)
Flamingo: a Visual Language Model for Few-Shot Learning (arXiv 2204, NeurIPS 2022)
SLIP: Self-supervision meets Language-Image Pre-training (arXiv 2112, ECCV 2022)
LiT: Zero-Shot Transfer with Locked-image Text Tuning (arXiv 2111, CVPR 2022)
FILIP: Fine-grained Interactive Language-Image Pre-Training (arXiv 2111, ICLR 2022)
An Empirical Study of Training End-to-End Vision-and-Language Transformers (arXiv 2111, CVPR 2022)
Multimodal Few-Shot Learning with Frozen Language Models (arXiv 2106, NeurIPS 2021)
LoRA: Low-Rank Adaptation of Large Language Models (arXiv 2106, ICLR 2022)
Learning Transferable Visual Models From Natural Language Supervision (arXiv 2103, ICML 2021)