! Rubrics assessing robustness

20 Apr 2026

Phase 1: Testing AI Recall (The Coding Phase)

  • The Hypothesis: Adding custom columns for your three rubrics (Uniqueness, Triangulation, Factuality) to the initial AI coding prompt will increase the AI's reasoning burden, leading to a decrease in recall (fewer total links identified) compared to a standard prompt.
  • The Process: You will run the AI-assisted coding in two variations.
    • Variation A (Control): Uses your standard minimalist prompt, extracting just the causal links and basic metadata (like sentiment and type).
    • Variation B (Complex): Uses the expanded prompt, asking the AI to extract the links and simultaneously evaluate the text to populate the Uniqueness, Triangulation, and Factuality columns.
  • The Metric: You will compare the total number of links found (recall) between the two variations to measure the impact of the increased reasoning burden.
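The recall comparison above can be sketched in code. This is a minimal, hypothetical sketch: it assumes each variation's output has been reduced to (cause, effect) pairs, and the function name and normalisation (lowercasing labels before matching) are my own assumptions, not part of any existing tooling.

```python
def recall_comparison(links_control, links_complex):
    """Summarise the recall impact of the heavier prompt (Variation B).

    Both arguments are lists of (cause, effect) label pairs, one per
    extracted causal link. Labels are lowercased so that matching is
    case-insensitive (an assumption about how links should be compared).
    """
    pairs_a = {(c.lower(), e.lower()) for c, e in links_control}
    pairs_b = {(c.lower(), e.lower()) for c, e in links_complex}
    return {
        "control_total": len(links_control),   # raw recall, Variation A
        "complex_total": len(links_complex),   # raw recall, Variation B
        "recall_drop": len(links_control) - len(links_complex),
        "shared_pairs": len(pairs_a & pairs_b),  # links both runs found
        "lost_pairs": len(pairs_a - pairs_b),    # links only A found
    }
```

For example, if Variation A finds three links and Variation B finds two of the same ones, `recall_drop` is 1 and `lost_pairs` is 1.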

Phase 2: Links Assessment & Robustness (Post-Coding)

  • The Process: Once the text is coded and your custom columns are populated, you move into the formal Bundle Assessment phase. You group the individual causal claims into bundles (co-terminal links sharing the same cause and effect).
  • The Assessment: You use the assessment panel to review the bundles. Here, you calculate your 9-point Robustness score based on the combined performance of your three rubrics (Uniqueness + Triangulation + Factuality).
  • The Output: Based on this 9-point score, you decide whether the evidence is robust enough to collapse the raw claims into a single "assessed link." If the score is too low, you decline to create an assessed link.
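The scoring step can be sketched as code, under loudly stated assumptions: I am assuming each rubric is scored 1-3 so the sum tops out at 9, and the cut-off of 6 is a placeholder, since the notes do not specify either the per-rubric scale or the threshold.

```python
# Assumption: each rubric (Uniqueness, Triangulation, Factuality) is
# scored 1-3, giving a combined score of 3-9. The threshold is a
# placeholder value, not specified in the notes.
ASSUMED_THRESHOLD = 6

def assess_bundle(uniqueness, triangulation, factuality,
                  threshold=ASSUMED_THRESHOLD):
    """Return (robustness score, whether to create an assessed link)."""
    for score in (uniqueness, triangulation, factuality):
        if not 1 <= score <= 3:
            raise ValueError("each rubric is scored 1-3 (assumed scale)")
    robustness = uniqueness + triangulation + factuality
    # Only collapse the raw claims into an assessed link if robust enough
    return robustness, robustness >= threshold
```

So a bundle scoring 3 + 2 + 2 = 7 would pass, while 1 + 1 + 2 = 4 would be declined.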

Draft

Use Tree Aid? Choose a map and work on the bundles only on that map.

Choose a dataset (maybe example-file or Lonely in London) to compare between:

  • our standard AI-coding approach (Steve suggested Mermaid, but I'm not really used to it, so I might go with our normal approach with sentiment and type columns)
  • a rubrics approach (come up with one after reading the articles), adding around three more columns

  • Experiment/Phase 1: Check how much link coverage (recall) decreases when new columns/coding tasks, i.e. the rubrics, are added.

Phase 2: Robustness/Certainty: how certain is the respondent about the causal claim?

My question: do I also assess the links after raw coding, or only check link coverage with the new columns?

Look at the aggregate numbers on each bundle, like citation count, source count, and average sentiment.
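Those per-bundle aggregates could be computed with a simple groupby, as in this hypothetical sketch. The column names (`cause`, `effect`, `source_id`, `sentiment`) and the toy data are assumptions for illustration, not the actual schema of any coding tool.

```python
import pandas as pd

# Toy link-level data: one row per raw causal claim (columns are assumed)
links = pd.DataFrame({
    "cause":     ["training", "training", "training", "funding"],
    "effect":    ["income",   "income",   "income",   "income"],
    "source_id": ["s1",       "s1",       "s2",       "s3"],
    "sentiment": [1, 1, -1, 1],
})

# A bundle = all claims sharing the same cause and effect
bundles = links.groupby(["cause", "effect"]).agg(
    citation_count=("source_id", "size"),    # number of raw claims
    source_count=("source_id", "nunique"),   # distinct respondents
    avg_sentiment=("sentiment", "mean"),
).reset_index()
```

Here the training → income bundle would show 3 citations from 2 distinct sources, with an average sentiment of 1/3.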