Directory of Links By Section

Introduction

Prompt Engineering

Neuro-Symbolic Methods

Retrieval-Augmented Generation (RAG)

Honorable Mentions

Fine-Tuning

Full Parameter Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT)

Task-Specific Evaluation Metrics

  • GLUE (General Language Understanding Evaluation) - A widely used benchmark in NLP that tests models on tasks such as sentiment analysis, textual entailment, and paraphrase detection to assess general-purpose language understanding (a loading-and-scoring sketch follows this list).

  • SuperGLUE - An advanced version of GLUE that includes more complex tasks like commonsense reasoning, multi-hop inference, and complex question answering to push the boundaries of LLM capabilities.

  • MMLU (Massive Multitask Language Understanding) - A benchmark designed to evaluate LLMs on 57 diverse tasks, from elementary school-level math to advanced subjects like law and medicine, testing reasoning and domain-specific knowledge.

  • SQuAD (Stanford Question Answering Dataset) - A benchmark for evaluating a model's reading comprehension and question-answering capabilities, with two versions: SQuAD 1.1 (answer extraction) and SQuAD 2.0 (which adds unanswerable questions).

  • HellaSwag - A benchmark for testing commonsense reasoning and contextual understanding by predicting the most likely continuation of incomplete sentences or narratives in multiple-choice format (see the multiple-choice scoring sketch after this list).

  • WinoGrande - A benchmark of Winograd-schema-style pronoun resolution problems, assessing a model's ability to use commonsense reasoning to determine which entity an ambiguous pronoun refers to.
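
Classification-style benchmarks such as GLUE and SQuAD are usually consumed through an evaluation library rather than scored by hand. The following is a minimal sketch of that workflow, assuming the Hugging Face datasets, evaluate, and transformers packages are installed; the DistilBERT checkpoint named below is an arbitrary public example chosen only for illustration, not something this guide prescribes.

```python
from datasets import load_dataset
from transformers import pipeline
import evaluate

# GLUE's SST-2 task: binary sentiment classification on movie-review sentences.
dataset = load_dataset("glue", "sst2", split="validation")
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder checkpoint
)
metric = evaluate.load("glue", "sst2")  # SST-2's official metric is accuracy

predictions = []
for example in dataset:
    result = classifier(example["sentence"])[0]
    # Map the pipeline's string label back onto the dataset's 0/1 label space.
    predictions.append(1 if result["label"] == "POSITIVE" else 0)

print(metric.compute(predictions=predictions, references=dataset["label"]))
```

The same pattern applies to other GLUE and SuperGLUE tasks: swap the task name passed to load_dataset and evaluate.load, and the library reports that task's official metric.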

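Multiple-choice benchmarks such as MMLU, HellaSwag, and WinoGrande are typically scored differently: each answer option is appended to the context, and the option to which the model assigns the highest (often length-normalised) log-likelihood is taken as its prediction. The sketch below illustrates that selection rule with a small causal LM; the gpt2 checkpoint and the toy question are placeholders, and harnesses that report official numbers (for example EleutherAI's lm-evaluation-harness) handle prompt formats, few-shot examples, and normalisation far more carefully.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def option_logprob(context: str, option: str) -> float:
    """Length-normalised log-probability of `option` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given everything before it.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.numel()), targets]
    # Keep only the tokens that belong to the candidate option.
    option_lp = token_lp[ctx_ids.shape[1] - 1:]
    return option_lp.mean().item()

# Toy example (not an actual benchmark item) to show the selection rule.
context = "The capital of France is"
options = [" Paris.", " Berlin.", " Madrid.", " a kind of cheese."]
scores = [option_logprob(context, o) for o in options]
print("predicted option:", options[scores.index(max(scores))])
```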