Notes and slides for papers presented during our paper reading sessions.
Papers:
- Efficiently Modeling Long Sequences with Structured State Spaces [paper] [slides]
- Simplified State Space Layers for Sequence Modeling [paper] [slides]
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces [paper] [slides]
- Representation Learning on Hyper-Relational and Numeric Knowledge Graphs with Transformers [paper] [slides]
- Chemical-Reaction-Aware Molecule Representation Learning [paper] [slides]
- Towards Foundation Models for Knowledge Graph Reasoning [paper]
- Pre-training Molecular Graph Representation with 3D Geometry [paper] [slides]
- AdaProp: Learning Adaptive Propagation for Graph Neural Network based Knowledge Graph Reasoning [paper] [slides]
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [paper] [slides]
- Translation between Molecules and Natural Language [paper] [slides]
- SpecFormer: Spectral Graph Neural Networks Meet Transformers [paper] [slides]
- FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning [paper] [slides]
- Context-Enriched Molecule Representations Improve Few-Shot Drug Discovery [paper]
- Cooperative Graph Neural Networks [paper]
- Bi-Level Contrastive Learning for Knowledge-Enhanced Molecule Representations [paper] [slides]
- Equivariant Subgraph Aggregation Networks [paper] [slides]
- Prodigy: Enabling In-context Learning Over Graphs [paper] [slides]
- Learning Rule-Induced Subgraph Representations for Inductive Relation Prediction [paper] [slides]
- Knowledge graph-enhanced molecular contrastive learning with functional prompt [paper] [slides]
- Multi-Grained Multimodal Interaction Network for Entity Linking [paper] [slides]
- Enhancing Molecular Property Prediction with Auxiliary Learning and Task-Specific Adaptation [paper] [slides]
- PolyGCL: Graph Contrastive Learning via Learnable Spectral Polynomial Filters [paper] [slides]
- Less is More: One-Shot-Subgraph Link Prediction on Large-Scale Knowledge Graph [paper] [slides]
- Vision Transformers Need Registers [paper]
- Denoising Diffusion Probabilistic Models [paper]
- Mitigating Memorisation in Diffusion Models [paper]
- Interpreting CLIP’s Image Representation via Text-Based Decomposition [paper]
- Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion [paper]
- Do Transformers Really Perform Bad for Graph Representation? (Graphormer) [paper] [slides]
- Rank-N-Contrast [paper] [slides]
- EGRU [paper]
- Deep Bidirectional Language-Knowledge Graph Pretraining (DRAGON) [paper] [slides]
- Resurrecting Recurrent Neural Networks for Long Sequences (LRU) [paper]
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models [paper]
- Linguistic Binding in Diffusion Models [paper]
- Evolutionary Optimization of Model Merging Recipes [paper] [slides]
- Editing Models with Task Arithmetic [paper] [slides]
- WARM: On the Benefits of Weight Averaged Reward Models [paper] [slides]
- Rewarded Soups: Towards Pareto-Optimal Alignment by Interpolating Weights Fine-Tuned on Diverse Rewards [paper] [slides]
- Arcee’s MergeKit: A Toolkit for Merging Large Language Models [paper] [slides]
- TIES-Merging: Resolving Interference When Merging Models [paper] [slides]
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition [paper] [slides]
- ZipIt! Merging Models from Different Tasks without Training [paper] [slides]
- Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization [paper] [slides]
- Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries [paper] [slides]
- MOPO: Model-based Offline Policy Optimization [paper] [notes]
- DETR: End-to-End Object Detection with Transformers [paper] [notes]
- Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation [paper] [notes]
- PipeDream: Generalized Pipeline Parallelism for DNN Training [paper] [notes]
- Lottery Ticket Hypothesis [paper] [notes]
- Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies [paper] [notes]
- Designing and Interpreting Probes with Control Tasks [paper] [notes]
- What Does BERT Learn about the Structure of Language? [paper] [notes]
- GNN [notes]
- Universal Adversarial Triggers [paper] [notes]
- Confidence-Aware Learning for Deep Neural Networks (CRL) [paper] [notes]
- A How-to-Model Guide for Neuroscience [paper] [notes]
- Neural ODEs [paper] [notes]
- Model-Based Reinforcement Learning [notes]
- Learning to Describe Scenes with Programs [paper] [notes]
- DeepSynth: Automata Synthesis for Automatic Task Segmentation in RL [paper] [notes]
- Model-Free Conventions in MARL with Heterogeneous Preferences [paper] [notes]
- Accelerating Reinforcement Learning with Learned Skill Priors [paper] [notes]
- Progressive Domain Adaptation for Object Detection [paper] [notes]
- Convolutional Networks with Adaptive Inference Graphs [paper] [notes]