🌻 A formalisation of causal mapping

Abstract

Draft for an IJSRM submission. This paper proposes a lightweight grammar and logic for encoding, aggregating, and querying causal claims found in qualitative text data.

The specification is grounded in a "Minimalist" (or "Barefoot") approach to causal coding: it prioritises capturing the explicit causal claims made by sources, without imposing complex theoretical frameworks that may not align with how people naturally speak.

1. Data Structures

The foundation of the specification is the Project Data package, which strictly separates the causal claims from the source material.

Definition Rule DS-PROJECTDATA: Project Data

Syntax Rule DS-LINKS: The Basic Links Table

A list of causal connections where each row represents one atomic claim.
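
As a minimal sketch, the Links table could be held as a list of records, one per atomic claim. The field names mirror the `Cause`, `Effect`, and `Source_ID` columns used in Rule COD-ATOM; the `Link` class name and the sample rows are illustrative assumptions, not part of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One atomic causal claim (hypothetical record type)."""
    cause: str
    effect: str
    source_id: str


# Illustrative rows only; not data from the paper.
links = [
    Link("Training", "Better yields", "S1"),
    Link("Better yields", "Income", "S1"),
    Link("Training", "Better yields", "S2"),
]

for link in links:
    print(f"{link.source_id}: {link.cause} -> {link.effect}")
```

Note that the same claim made by two sources stays as two rows: the table records claims, not unique edges.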

Syntax Rule DS-SOURCES: The Sources Table (optional)

A registry of documents, interviews, or respondents.

Coding is strictly evidence-based.

Definition Rule DS-FACTORS: The Factors Derivation

There is no independent table for Factors stored in the Project. Factors are derived entities.
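
One way to read "derived" is that the set of factors is recomputed from the Links table whenever it is needed. The sketch below assumes links are `(cause, effect, source_id)` tuples; the function name `derive_factors` is hypothetical.

```python
def derive_factors(links):
    """Return every label that appears as a cause or an effect, sorted."""
    factors = set()
    for cause, effect, _source_id in links:
        factors.add(cause)
        factors.add(effect)
    return sorted(factors)


# Illustrative rows only.
links = [
    ("Training", "Better yields", "S1"),
    ("Better yields", "Income", "S1"),
    ("Rain", "Better yields", "S2"),
]
factors = derive_factors(links)
print(factors)
```

Under this reading, deleting or relabelling a link changes the factor list automatically, so there is nothing to keep in sync.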

2. Coding and Semantics

This section defines how text is translated into the data structures defined above.

Definition Rule COD-DEF: The Definition of Coding

Coding is the process of extracting links from source text into the Links table.

Semantics Rule COD-ATOM: The Atomic Causal Claim

A single row in the Links table, Link | Cause="A" | Effect="B" | Source_ID="S1", is interpreted semantically as:

Semantics Rule COD-BARE: Bare Causation

The relationship A -> B implies influence, not determination.

Example:

Semantics Rule COD-MIN: Minimalist Coding Principles

The coding schema prioritises explicit claims over complex theoretical frameworks.

Surface cognition

A table containing multiple links simply asserts the logical conjunction of the links:

So, unless specific contexts are specified, if Source S1 claims that A -> C and also that B -> C, this is neither a contradiction nor (by default) a claim that A-and-B jointly influenced C as a package. It is simply two separate claims.

3. The Filter Pipeline (Query Language)

Analysis is performed by passing a links table through a sequence of filters.

Syntax Rule FIL-PIPE: The Semantic Pipeline

The meaning of a result is defined by the cumulative semantic restrictions or transformations of the filters applied.

Input 
    |> Filter1 
    |> Filter2 
    |> Output
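
The `|>` notation can be read as left-to-right function composition over a links table. The following is a minimal sketch under that assumption; `pipeline`, `only_source`, and `only_effect` are hypothetical helper names, and links are `(cause, effect, source_id)` tuples.

```python
from functools import reduce


def pipeline(links, *filters):
    """Apply each filter in order; every filter maps a links table to a links table."""
    return reduce(lambda table, apply_filter: apply_filter(table), filters, links)


def only_source(source_id):
    """Build a filter that keeps links from one source."""
    return lambda links: [link for link in links if link[2] == source_id]


def only_effect(label):
    """Build a filter that keeps links with the given effect."""
    return lambda links: [link for link in links if link[1] == label]


# Illustrative rows only.
links = [
    ("Training", "Yield", "S1"),
    ("Yield", "Income", "S1"),
    ("Yield", "Income", "S2"),
]
result = pipeline(links, only_source("S1"), only_effect("Income"))
print(result)
```

Because each step returns a table of the same shape, the meaning of `result` is the accumulation of the restrictions applied: here, "what S1 says influences Income".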

Types of Filters

In practice (as in the app), filter behaviour is often multi-effect. For example, a filter may rewrite labels and add tracking columns. So instead of forcing each filter into exactly one "type", we treat each filter as having an effect signature:

Below, filters are grouped by their primary intent, and each rule declares its Effects: line.

Row-selection filters

Syntax Rule FIL-CTX: Context Filters

Reduces the evidentiary base based on Source metadata.
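
A context filter can be sketched as a join against the Sources table followed by a row selection. The name `filter_sources` and the `Gender` field are taken from Example A; the dictionary layout of the Sources table and the sample rows are assumptions made for illustration.

```python
def filter_sources(links, sources, **criteria):
    """Keep links whose source metadata matches every key=value criterion."""
    def matches(source_id):
        metadata = sources.get(source_id, {})
        return all(metadata.get(key) == value for key, value in criteria.items())

    return [link for link in links if matches(link[2])]


# Illustrative data only: links are (cause, effect, source_id) tuples.
sources = {
    "S1": {"Gender": "Female"},
    "S2": {"Gender": "Male"},
}
links = [
    ("Training", "Income", "S1"),
    ("Remittances", "Income", "S2"),
]
female_links = filter_sources(links, sources, Gender="Female")
print(female_links)
```

Links from sources with no metadata record are dropped here whenever a criterion is given; whether that is the desired default is a design choice the rule leaves open.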

Syntax Rule FIL-FREQUENCY: Content Filters

Reduces the evidentiary base based on signal strength.
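
As a sketch of one such filter, the code below keeps only links whose cause-effect pair is cited at least `min_citations` times. The name `filter_links` and the `min_citations` parameter come from Example A; treating "citations" as the number of rows sharing a cause-effect pair is an assumption.

```python
from collections import Counter


def filter_links(links, min_citations=1):
    """Keep links whose (cause, effect) pair occurs at least min_citations times."""
    counts = Counter((cause, effect) for cause, effect, _source_id in links)
    return [link for link in links if counts[(link[0], link[1])] >= min_citations]


# Illustrative rows only.
links = [
    ("Training", "Income", "S1"),
    ("Training", "Income", "S2"),
    ("Rain", "Income", "S2"),
]
strong = filter_links(links, min_citations=2)
print(strong)
```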

Syntax Rule FIL-TOPO: Topological Filters

Retains links based on their position in a causal chain.
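
A topological filter of the `trace_paths | to=... | steps=...` kind used in Example A could be sketched as a bounded backward search from the target factor. The function name `trace_to` is hypothetical, and this version deliberately ignores which source made each claim.

```python
def trace_to(links, to, steps=1):
    """Keep links lying within `steps` hops upstream of the factor `to`."""
    frontier = {to}
    kept_indices = set()
    for _ in range(steps):
        next_frontier = set()
        for index, (cause, effect, _source_id) in enumerate(links):
            if effect in frontier and index not in kept_indices:
                kept_indices.add(index)
                next_frontier.add(cause)
        frontier = next_frontier
    # Preserve the original row order.
    return [links[index] for index in sorted(kept_indices)]


# Illustrative rows only.
links = [
    ("Training", "Yield", "S1"),
    ("Yield", "Income", "S1"),
    ("Rain", "Yield", "S2"),
    ("Income", "Savings", "S2"),
]
one_step = trace_to(links, "Income", steps=1)
two_steps = trace_to(links, "Income", steps=2)
print(one_step)
print(two_steps)
```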

Label-rewrite filters

Semantics Rule FIL-ZOOM: The Zoom Filter (Hierarchical Syntax)

Extends the logic to handle nested concepts via a separator syntax.
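
To illustrate, zooming can be sketched as truncating each label to its first `zoom_level` components. The text does not name the separator character, so the `;` used here is an assumption, as are the function names `zoom` and `transform_labels_zoom` and the sample labels.

```python
def zoom(label, level=1, separator=";"):
    """Truncate a hierarchical label to its first `level` components."""
    parts = [part.strip() for part in label.split(separator)]
    return (separator + " ").join(parts[:level])


def transform_labels_zoom(links, zoom_level=1, separator=";"):
    """Rewrite cause and effect labels; row count and sources are unchanged."""
    return [
        (zoom(cause, zoom_level, separator), zoom(effect, zoom_level, separator), source_id)
        for cause, effect, source_id in links
    ]


# Illustrative rows only; ";" as separator is an assumption.
links = [
    ("Training; Agronomy course", "Yield; Maize", "S1"),
    ("Training; Field visit", "Yield; Beans", "S2"),
]
zoomed = transform_labels_zoom(links, zoom_level=1)
print(zoomed)
```

After zooming to level 1, both rows make the same higher-level claim, which is what lets later filters count them together.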

Semantics Rule FIL-OPP: The Combine Opposites Filter (Bivalence Syntax)

Extends the logic to handle polarity/negation.
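
A rough sketch of how opposites might be combined: labels carrying a negation marker are rewritten to their positive form, and a tracking column records that the label was flipped. The `~` marker, the column names, and the function names are all assumptions made for illustration; the text above does not specify them.

```python
def split_polarity(label, marker="~"):
    """Return (base_label, flipped) for a possibly negated label."""
    if label.startswith(marker):
        return label[len(marker):].strip(), True
    return label, False


def combine_opposites(links, marker="~"):
    """Rewrite negated labels to their base form and record the flip per row."""
    rows = []
    for cause, effect, source_id in links:
        base_cause, cause_flipped = split_polarity(cause, marker)
        base_effect, effect_flipped = split_polarity(effect, marker)
        rows.append({
            "cause": base_cause,
            "effect": base_effect,
            "source_id": source_id,
            "cause_flipped": cause_flipped,
            "effect_flipped": effect_flipped,
        })
    return rows


# Illustrative rows only; "~" as negation marker is an assumption.
links = [
    ("Employment", "Income", "S1"),
    ("~Employment", "~Income", "S2"),
]
combined = combine_opposites(links)
print(combined)
```

Both rows now share the labels `Employment` and `Income`, while the flip columns preserve the information that S2 spoke about their opposites.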

Column-enrichment filters

Syntax Rule FIL-BUNDLE: The Bundling Filter

This filter aggregates co-terminal links (links with the same cause and effect) to calculate evidence metrics without reducing the row count. We normally think of it as being automatically applied after any other filter.

We measure importance using two distinct metrics:
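
The sketch below shows bundling as column enrichment: every row is kept, and each gains evidence columns computed over its co-terminal group. The two columns used here, a citation count (rows in the group) and a source count (distinct sources in the group), are an assumed reading, and the names `bundle`, `citation_count`, and `source_count` are hypothetical.

```python
from collections import defaultdict


def bundle(links):
    """Annotate each link with counts for its (cause, effect) bundle."""
    citations = defaultdict(int)
    sources = defaultdict(set)
    for cause, effect, source_id in links:
        citations[(cause, effect)] += 1
        sources[(cause, effect)].add(source_id)
    return [
        {
            "cause": cause,
            "effect": effect,
            "source_id": source_id,
            "citation_count": citations[(cause, effect)],
            "source_count": len(sources[(cause, effect)]),
        }
        for cause, effect, source_id in links
    ]


# Illustrative rows only: S1 makes the same claim twice.
links = [
    ("Training", "Yield", "S1"),
    ("Training", "Yield", "S1"),
    ("Training", "Yield", "S2"),
    ("Yield", "Income", "S2"),
]
bundled = bundle(links)
print(bundled[0])
```

Keeping the two counts apart matters when one talkative source repeats a claim: the citation count rises but the source count does not.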

Output filters

Output Rule OUT-FACTORS: Factors table view

Returns a Factors table (one row per factor) derived from a Links table (typically after FIL-BUNDLE).

Output Rule OUT-MAP: Graphical map view

Returns a graphical network view of the current Links table.

Definition Rule MET-NODE: Factor Role Metrics

These metrics describe the topological role of a factor.
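
As an illustration of what a role metric could look like, the sketch below counts incoming and outgoing links per factor. These two counts are an assumed example of "topological role" rather than the metrics this rule defines, and `factor_roles` is a hypothetical name.

```python
from collections import Counter


def factor_roles(links):
    """Count incoming and outgoing links for every factor in the table."""
    incoming = Counter(effect for _cause, effect, _source_id in links)
    outgoing = Counter(cause for cause, _effect, _source_id in links)
    factors = sorted(set(incoming) | set(outgoing))
    return {
        factor: {"incoming": incoming[factor], "outgoing": outgoing[factor]}
        for factor in factors
    }


# Illustrative rows only.
links = [
    ("Training", "Yield", "S1"),
    ("Rain", "Yield", "S2"),
    ("Yield", "Income", "S1"),
]
roles = factor_roles(links)
for factor, counts in roles.items():
    print(factor, counts)
```

A factor with only outgoing links (such as `Training` here) behaves like a driver, one with only incoming links like an outcome, and one with both like an intermediate step.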

4. Example Queries

Example A: The "Drivers" Query

Question: What do female participants say are the main drivers of Income?

Result = ProjectData
  |> filter_sources | Gender="Female"         // Rule FIL-CTX
  |> trace_paths | to="Income" | steps=1      // Rule FIL-TOPO
  |> filter_links | min_citations=2           // Rule FIL-FREQUENCY 


Example B: The "Mechanism" Query

Question: Is there valid narrative evidence that Training leads to Better Yields?

Result = ProjectData
  |> transform_labels | zoom_level=1                 // Rule FIL-ZOOM
  |> trace_paths | from="Training" | to="Yield" | thread_tracing=TRUE   // Rule FIL-TOPO + Rule INF-THREAD

5. Causal Inference?

Inference Rule INF-EVID: Evidence is not effect size

We quantify evidence strength, not causal effect strength.

Inference Rule INF-FACT: Factual Implication

If we observe Link | Cause="A" | Effect="B" | Source_ID="S1":

Inference Rule INF-THREAD: Thread Tracing (Valid Transitivity)

We can infer a long causal chain (indirect influence) only if one source provides every step.
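
A minimal sketch of this check: build one small graph per source and ask whether any single source's graph connects the start factor to the end factor. The function name `threads` and the sample rows are assumptions; links are `(cause, effect, source_id)` tuples.

```python
from collections import defaultdict


def threads(links, start, end):
    """Return the IDs of sources whose own links connect start to end."""
    by_source = defaultdict(lambda: defaultdict(set))
    for cause, effect, source_id in links:
        by_source[source_id][cause].add(effect)

    found = []
    for source_id, graph in by_source.items():
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            if node == end:
                found.append(source_id)
                break
            for successor in graph.get(node, ()):
                if successor not in seen:
                    seen.add(successor)
                    stack.append(successor)
    return sorted(found)


# Illustrative rows only: S1 gives the whole chain, S2 and S3 give one step each.
links = [
    ("Training", "Skills", "S1"),
    ("Skills", "Yield", "S1"),
    ("Training", "Skills", "S2"),
    ("Skills", "Yield", "S3"),
]
print(threads(links, "Training", "Yield"))
```

In this toy data only S1 supports the full chain. The steps supplied separately by S2 and S3 would form a path in the pooled graph, but they are not counted as a thread.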

Inference Rule INF-CTX: The Context Rule (The Transitivity Trap)

We cannot infer causal chains by stitching together different sources without checking context.

Appendix A: AI Extensions

These filters extend the core logic using probabilistic AI models (Embeddings or Clustering).

Semantics Rule FIL-SOFT: The Soft Recode Filter

Extends logic using semantic similarity (vector embeddings).
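
The idea can be sketched as nearest-target matching under cosine similarity: a label is rewritten to a target label only if their embedding vectors are similar enough. Everything specific below is an assumption for illustration: the function names, the `threshold` value, and especially the hand-made two-dimensional vectors, which stand in for the output of a real embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_recode(label, embeddings, targets, threshold=0.8):
    """Return the most similar target label, or the original label if none is close."""
    best = max(targets, key=lambda target: cosine(embeddings[label], embeddings[target]))
    if cosine(embeddings[label], embeddings[best]) >= threshold:
        return best
    return label


# Toy vectors only; a real run would obtain these from an embedding model.
embeddings = {
    "Income": (1.0, 0.0),
    "Earnings": (0.9, 0.1),
    "Rainfall": (0.0, 1.0),
}
targets = ["Income"]
print(soft_recode("Earnings", embeddings, targets))
print(soft_recode("Rainfall", embeddings, targets))
```

Because the match is probabilistic rather than logical, the threshold is a judgment call, and recoded labels are best treated as suggestions to be reviewed.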

Semantics Rule FIL-AUTO: The Auto Recode Filter

Extends logic using unsupervised clustering.