Directory of Links By Section
"Attention Is All You Need" – The foundational paper introducing the Transformer architecture.
"Language Models are Few-Shot Learners" – The GPT-3 paper, describing LLMs' ability to perform a wide range of tasks from only a few in-context examples.
"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" – Explores how prompting a model to write out intermediate reasoning steps improves its performance on reasoning tasks.
"Self-Consistency Improves Chain of Thought Reasoning in Language Models" – Shows how sampling multiple reasoning paths and majority-voting over their final answers improves reasoning consistency.
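A minimal sketch of how these two ideas combine in practice: a few-shot chain-of-thought prompt, sampled several times, with a majority vote over the extracted final answers. The prompt wording, the "The answer is" extraction convention, and the `sample_fn` callable are all illustrative assumptions; wrap whatever LLM client you actually use.

```python
from collections import Counter

# Illustrative few-shot CoT prompt (the worked example is made up).
COT_PROMPT = (
    "Q: A farmer has 15 sheep and buys 8 more. How many sheep are there now?\n"
    "A: Let's think step by step. 15 + 8 = 23. The answer is 23.\n"
    "Q: {question}\n"
    "A: Let's think step by step."
)

def self_consistent_answer(question, sample_fn, k=5):
    """Sample k chain-of-thought completions and majority-vote on the final
    answer. `sample_fn` is any callable mapping a prompt string to one
    sampled completion string (hypothetical; plug in your LLM API here)."""
    answers = []
    for _ in range(k):
        completion = sample_fn(COT_PROMPT.format(question=question))
        # Assumed convention: each chain ends with "The answer is X."
        if "The answer is" in completion:
            answers.append(completion.rsplit("The answer is", 1)[1].strip(" ."))
    return Counter(answers).most_common(1)[0][0] if answers else None
```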
"Tree of Thoughts: Deliberate Problem Solving with Large Language Models" – Introduces the Tree of Thoughts approach, which searches over branching reasoning steps, to enhance reasoning capabilities in LLMs.
"Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models" – Proposes deriving propositional-logic expressions from the input and injecting them back into the prompt to strengthen logical reasoning.
– Investigates the potential of large language models (LLMs) to act as symbolic reasoners.
"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" – Introduces RAG, a method that combines a retriever with a generator to improve performance on knowledge-intensive tasks.
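A toy sketch of the retrieve-then-generate pattern. TF-IDF retrieval from scikit-learn stands in for the dense retriever used in the paper, and the generation call itself is omitted; only the prompt assembly is shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(query, documents, k=2):
    """Toy TF-IDF retriever standing in for a dense-vector index."""
    vec = TfidfVectorizer().fit(documents + [query])
    doc_matrix = vec.transform(documents)
    query_vec = vec.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def rag_prompt(query, documents, k=2):
    """Prepend the retrieved passages to the question; an LLM would then
    generate an answer grounded in that context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```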
"Distilling the Knowledge in a Neural Network" – The foundational paper on knowledge distillation, in which a small student model is trained to match a larger teacher's softened output distribution.
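The core recipe from that paper, sketched in PyTorch: the student's loss blends ordinary cross-entropy on the labels with a KL term against the teacher's temperature-softened distribution. The temperature and mixing weight here are illustrative defaults, not recommendations.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style distillation: KL between temperature-softened teacher and
    student distributions, blended with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitude is comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```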
– An overview of ensemble methods and voting systems for combining the predictions of multiple models.
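For a concrete instance of voting, scikit-learn's VotingClassifier combines heterogeneous models; a small runnable example on synthetic data follows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average predicted probabilities; "hard" = majority vote
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```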
– Discusses the use of stochastic gradient descent (SGD) for machine learning.
– Introduces reinforcement learning from human feedback (RLHF), in which human preference data is used to train a reward model that steers policy optimization.
torch.optim.AdamW – Official PyTorch documentation for the AdamW optimizer, widely used for fine-tuning models.
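A minimal usage sketch. The tiny linear model stands in for a real network, and the hyperparameters are typical fine-tuning defaults rather than recommendations; swapping in torch.optim.SGD gives the plain SGD variant discussed above.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(768, 2)  # stand-in for a fine-tuning head
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# One optimization step on a random batch.
x, y = torch.randn(8, 768), torch.randint(0, 2, (8,))
loss = F.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```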
– The paper introducing Layer-wise Fine-Tuning (LIFT) for model adaptation.
Hugging Face Transformers – A popular library for working with pre-trained transformer models, including support for fine-tuning tasks.
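The quickest entry point is the library's pipeline API; the checkpoint named below is one common sentiment model, not a requirement.

```python
from transformers import pipeline

# Any sentiment-analysis checkpoint works here; this one is a common default.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("A gripping, beautifully shot film."))
```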
"LoRA: Low-Rank Adaptation of Large Language Models" – The original paper on LoRA, a technique for efficient fine-tuning of large pre-trained models via trainable low-rank weight updates.
Hugging Face PEFT – Documentation for the PEFT (Parameter-Efficient Fine-Tuning) library, covering the implementation and configuration of LoRA and other efficient fine-tuning approaches.
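A minimal LoRA setup with PEFT. The base checkpoint, the rank and dropout values, and the target module names (DistilBERT's attention projections) are illustrative choices, not prescriptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],    # DistilBERT attention projections
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # only the adapters are trainable
```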
– The Rotten Tomatoes dataset, used for sentiment analysis and fine-tuning tasks in this example.
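Loading it via the datasets library (assuming the `rotten_tomatoes` Hub id):

```python
from datasets import load_dataset

ds = load_dataset("rotten_tomatoes")
print(ds["train"][0])  # e.g. {'text': '...', 'label': 1}
```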
– Official documentation for PyTorch, whose cross-entropy loss functions are the usual building block for perplexity calculations.
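PyTorch has no dedicated perplexity function; the usual route, sketched below, is to exponentiate the mean token-level cross-entropy.

```python
import torch
import torch.nn.functional as F

def perplexity(logits, targets):
    """Perplexity = exp(mean token-level cross-entropy).
    logits: (seq_len, vocab_size); targets: (seq_len,) token ids."""
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return torch.exp(nll)
```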
– Official documentation for the NLTK library, which includes BLEU score calculation (nltk.translate.bleu_score).
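Sentence-level BLEU with NLTK; smoothing is applied because short sentences often have no higher-order n-gram matches and would otherwise score zero.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]  # list of references
candidate = ["the", "cat", "is", "on", "the", "mat"]
score = sentence_bleu(
    reference, candidate, smoothing_function=SmoothingFunction().method1
)
print(round(score, 3))
```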
– GitHub repository for the ROUGE metric, used for evaluating text summaries.
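Assuming the repository in question is Google Research's rouge-score package, usage looks like this:

```python
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(
    "the cat sat on the mat",  # reference summary
    "the cat is on the mat",   # candidate summary
)
print(scores["rougeL"].fmeasure)
```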
– NLTK’s official documentation for the METEOR score, an alternative to BLEU that also credits stem and synonym matches.
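METEOR in recent NLTK versions expects pre-tokenized input and uses WordNet for synonym matching, hence the download step:

```python
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")  # WordNet data for synonym matching

reference = ["the", "cat", "sat", "on", "the", "mat"]
candidate = ["the", "cat", "is", "on", "the", "mat"]
print(meteor_score([reference], candidate))
```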
– GitHub repository for the python-Levenshtein package, which implements Levenshtein (edit) distance.
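Basic usage of the package:

```python
import Levenshtein  # pip install python-Levenshtein

print(Levenshtein.distance("kitten", "sitting"))  # 3 edits
print(Levenshtein.ratio("kitten", "sitting"))     # normalized similarity
```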
– GitHub repository for scikit-learn, which provides implementations of the F1 score and other classification metrics.
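Computing F1 alongside the full per-class report:

```python
from sklearn.metrics import classification_report, f1_score

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))               # binary F1
print(classification_report(y_true, y_pred))  # per-class precision/recall/F1
```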
GLUE – A widely used NLP benchmark that tests models on tasks such as sentiment analysis, textual entailment, and question answering to assess general-purpose language understanding (a sketch for loading these benchmarks follows this list).
SuperGLUE – A more challenging successor to GLUE that includes harder tasks such as commonsense reasoning, multi-hop inference, and complex question answering to push the boundaries of LLM capabilities.
MMLU – A benchmark that evaluates LLMs on 57 diverse tasks, from elementary-school math to advanced subjects like law and medicine, testing reasoning and domain-specific knowledge.
SQuAD – A benchmark for evaluating a model's reading comprehension and question answering, with two versions: SQuAD 1.1 (answer extraction) and SQuAD 2.0 (which adds unanswerable questions).
HellaSwag – A benchmark for commonsense reasoning and contextual understanding in which models pick the most likely continuation of an incomplete sentence or narrative from multiple choices.
– A benchmark focused on coreference resolution, particularly resolving pronouns in complex sentences, to assess a model's ability to understand relationships between entities in a document.
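As noted above, most of these benchmarks can also be pulled from the Hugging Face Hub for local evaluation. The dataset ids below are the commonly used ones; mirrors vary, so treat them as assumptions rather than canonical sources.

```python
from datasets import load_dataset

glue_sst2 = load_dataset("glue", "sst2")  # one GLUE task (sentiment)
squad_v2 = load_dataset("squad_v2")       # SQuAD 2.0 with unanswerable questions
hellaswag = load_dataset("hellaswag")     # multiple-choice continuations
print(squad_v2["train"][0]["question"])
```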