Directory of Links By Section

Introduction

Prompt Engineering

Neuro-Symbolic Methods

Retrieval-Augmented Generation (RAG)

Honorable Mentions

Fine-Tuning

Full Parameter Fine-Tuning

  • AdamW Documentation – Official documentation for the AdamW optimizer in PyTorch, widely used for fine-tuning models; a minimal usage sketch follows this list.

  • LIFT – The paper introducing Layer-wise Fine-Tuning (LIFT) for model adaptation.

  • Hugging Face Transformers Library – A popular library for working with pre-trained transformer models, including support for fine-tuning tasks.

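The AdamW and Transformers links above pair naturally in practice. Below is a minimal, illustrative sketch of full-parameter fine-tuning with torch.optim.AdamW and a Transformers model; the bert-base-uncased checkpoint, the toy two-example batch, and the hyperparameters are assumptions for illustration, not recommendations from the linked resources.

```python
# Minimal sketch: full-parameter fine-tuning with AdamW + Transformers.
# Checkpoint, batch, and hyperparameters are illustrative assumptions.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# In full-parameter fine-tuning, every weight in the model is trainable.
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# One illustrative training step on a toy batch.
batch = tokenizer(["a great movie", "a dull movie"], return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()   # gradients flow to all (~110M) parameters
optimizer.step()
optimizer.zero_grad()
```

In a real run, this step would sit inside a loop over a DataLoader (or be handled by transformers.Trainer), but the optimizer setup is the part that distinguishes full-parameter fine-tuning from the parameter-efficient approaches in the next section.
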
Parameter-Efficient Fine-Tuning (PEFT)

  • LoRA (Low-Rank Adaptation) Paper – The original paper discussing Low-Rank Adaptation (LoRA), a technique for efficient fine-tuning of large pre-trained models.

  • PEFT Documentation – Documentation for the PEFT (Parameter-Efficient Fine-Tuning) library, including implementation details and configuration options for LoRA and other efficient fine-tuning approaches; a minimal LoRA sketch follows this list.

  • Rotten Tomatoes Dataset – The Rotten Tomatoes dataset, used for sentiment analysis and fine-tuning tasks in this example.

  • Hugging Face Transformers Library – A popular library for working with pre-trained transformer models, including support for fine-tuning tasks.

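As a companion to the PEFT, LoRA, and dataset links above, here is a minimal, illustrative LoRA sketch using the peft library on the Rotten Tomatoes dataset. The distilbert-base-uncased checkpoint and all hyperparameter values are assumptions chosen for brevity, not settings taken from the linked resources.

```python
# Minimal sketch: LoRA fine-tuning with the PEFT library on Rotten Tomatoes.
# Checkpoint and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Wrap the base model so only small low-rank adapter matrices are trained,
# while the original weights stay frozen.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,               # rank of the low-rank update
    lora_alpha=16,     # scaling factor applied to the update
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Load and tokenize the Rotten Tomatoes sentiment dataset.
dataset = load_dataset("rotten_tomatoes")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)
# From here, the tokenized dataset and wrapped model can be passed to
# transformers.Trainer exactly as in the full-parameter case.
```
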
Task-Specific Evaluation Metrics

  • GLUE (General Language Understanding Evaluation) – A widely used NLP benchmark that tests models on tasks such as sentiment analysis, textual entailment, and sentence similarity to assess general-purpose language understanding.

  • SuperGLUE – An advanced version of GLUE that includes more complex tasks like commonsense reasoning, multi-sentence reading comprehension, and harder question answering to push the boundaries of LLM capabilities.

  • MMLU (Massive Multitask Language Understanding) – A benchmark designed to evaluate LLMs on 57 diverse tasks, from elementary school-level math to advanced subjects like law and medicine, testing reasoning and domain-specific knowledge.

  • SQuAD (Stanford Question Answering Dataset) – A benchmark for evaluating a model's reading comprehension and question-answering capabilities, with two versions: SQuAD 1.1 (answer extraction) and SQuAD 2.0 (which adds unanswerable questions); a minimal scoring sketch follows this list.

  • HellaSwag – A benchmark for testing commonsense reasoning and contextual understanding by predicting the most likely continuation of incomplete sentences or narratives in multiple-choice format.

  • WinoGrande – A large-scale Winograd-schema-style benchmark for commonsense pronoun resolution, in which the model must decide which of two candidate referents an ambiguous pronoun in a sentence refers to.

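Many of the benchmarks above ship with reference loaders and metrics on the Hugging Face Hub. The sketch below uses SQuAD as an example and shows how a prediction can be scored with the datasets and evaluate libraries; the single-example slice and the trivially correct prediction are illustrative assumptions.

```python
# Minimal sketch: loading a benchmark (SQuAD) and scoring a prediction with
# its standard metric. The one-example slice and prediction are toy assumptions.
from datasets import load_dataset
import evaluate

squad = load_dataset("squad", split="validation[:1]")  # one example for illustration
metric = evaluate.load("squad")  # reports exact-match and F1

example = squad[0]
prediction = {"id": example["id"], "prediction_text": example["answers"]["text"][0]}
reference = {"id": example["id"], "answers": example["answers"]}

print(metric.compute(predictions=[prediction], references=[reference]))
# -> {'exact_match': 100.0, 'f1': 100.0} for this trivially correct prediction
```
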