AI Evaluation

How AI-powered automatic evaluation works and how to write effective grading prompts.

Updated 2026/06/03

AI Evaluation lets you automatically score open-ended responses using a large language model. Instead of manually reviewing every submission, you provide a grading prompt and the AI evaluates each candidate’s response against it.

Supported Question Types

AI evaluation is available on the following input question types:

  • Short Answer — evaluate brief text responses
  • Long Answer — evaluate extended written responses
  • Code — evaluate code submissions in any supported language
  • Audio — transcribe and evaluate spoken responses
  • Video — evaluate video responses

How It Works

  1. You enable AI evaluation on the question and write a grading prompt that instructs the AI on what to look for.
  2. When a candidate submits a response, the AI evaluates it against your prompt and returns a score, expressed as a success rate between -100% and 100%, together with feedback text explaining the evaluation.
  3. The success rate is multiplied by the question’s point multiplier to produce the final score — the same mechanism as all other question types.
  4. For audio responses, the AI also produces a transcription of the spoken content alongside the evaluation.
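
The scoring arithmetic in step 3 can be sketched as follows. This is a minimal illustration, not the platform's actual code: the function name `final_score` and its parameters are hypothetical, and the success rate is assumed to be passed as a percentage.

```python
def final_score(success_rate_percent: float, point_multiplier: float) -> float:
    """Convert an AI success rate (in percent) into question points.

    Hypothetical helper for illustration only; the name and signature
    are assumptions, not part of the product.
    """
    # The documented range for the success rate is -100% to 100%.
    if not -100.0 <= success_rate_percent <= 100.0:
        raise ValueError("success rate must be between -100 and 100 percent")
    # Scale the percentage to a fraction, then apply the point multiplier.
    return success_rate_percent / 100.0 * point_multiplier


print(final_score(75, 4))   # 3.0
print(final_score(-50, 4))  # -2.0
```

A success rate of 75% on a question with a point multiplier of 4 therefore yields 3 points, while a negative success rate yields a negative score.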

Writing an Effective Grading Prompt

The quality of AI evaluation depends entirely on the quality of your grading prompt. A good prompt:

  • Clearly states what a correct or high-quality answer looks like
  • Lists the key criteria the response must meet
  • Specifies the expected language, format, or depth where relevant
  • States how to handle partially correct answers
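
One way to make sure a prompt covers all four points is to assemble it from named parts. The sketch below is only an illustration: `build_grading_prompt` and its parameters are hypothetical, the sample question content is invented, and the resulting text is simply what you would paste into the grading prompt field.

```python
def build_grading_prompt(ideal_answer, criteria, expectations, partial_credit_rule):
    """Assemble a grading prompt from the four elements listed above.

    Hypothetical helper for illustration; the platform only needs the
    final text, not this structure.
    """
    lines = ["A high-quality answer: " + ideal_answer, "", "Key criteria:"]
    lines += [f"- {criterion}" for criterion in criteria]
    lines += ["", "Expected language, format, or depth: " + expectations]
    lines += ["", "Partially correct answers: " + partial_credit_rule]
    return "\n".join(lines)


prompt = build_grading_prompt(
    ideal_answer="Explains that a database index speeds up reads at the cost "
                 "of slower writes and extra storage.",
    criteria=[
        "Mentions faster lookups or queries",
        "Mentions the write or storage overhead",
        "Gives at least one concrete example",
    ],
    expectations="English, two to four sentences, no code required.",
    partial_credit_rule="Award roughly half credit if only the read benefit "
                        "is explained.",
)
print(prompt)
```

Keeping each criterion on its own line makes it easier to see later which requirement a low score refers to in the AI's feedback text.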

[Screenshot: AI evaluation settings with grading prompt field]

Reviewing AI Results

After an assessment, the session report shows each AI score together with the AI’s feedback text and, for audio responses, the transcription. Reviewers can manually override the AI score if needed.

Tip: AI evaluation is best treated as a first-pass scorer that handles the bulk of the work. For high-stakes assessments, plan to have a human reviewer sample-check AI-scored responses.
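
A sample-check can be as simple as drawing a random subset of AI-scored responses for human review. The sketch below assumes you have a list of response identifiers exported from your reports; `pick_for_review`, the 10% default, and the ID format are all hypothetical choices, not platform features.

```python
import random


def pick_for_review(response_ids, fraction=0.1, seed=None):
    """Return a random subset of response IDs for a human reviewer.

    Hypothetical helper for illustration; the sampling fraction is an
    assumption and should match how high the stakes are.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    ids = list(response_ids)
    if not ids:
        return []
    # A fixed seed makes the selection reproducible for audits.
    rng = random.Random(seed)
    # Always review at least one response, even for small batches.
    sample_size = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, sample_size))


all_ids = [f"resp-{n:03d}" for n in range(1, 51)]  # 50 made-up IDs
to_review = pick_for_review(all_ids, fraction=0.1, seed=42)
print(len(to_review))  # 5
```

If reviewers frequently override the AI score in the sampled set, that is a signal to raise the sampling fraction or revise the grading prompt.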