🌻 Workflows gdoc

24 May 2025

https://docs.google.com/document/d/1nexgcqgtK-nTThCot7f35An1TRi3tdp4i4ZrHoPnfbc/edit?tab=t.0


Making sense of mountains of text:

The Causal Map Workflows app.

What is Workflows?

Workflows is a tool used by Causal Map Ltd, supported by generative AI, to make sense of large amounts of text data (interviews, reports). 

At the moment, Workflows is not open for public use. Instead, we use it internally to provide verifiable and transparent analysis and reporting to clients. Clients can then access their workflows in the app to view and, if they wish, verify each step.

Which questions can Workflows help answer?

What do you get?

For Evaluators and Contractors

Why Causal Map Ltd? 

Frequently Asked Questions

What does it cost?

We charge by the day for our time, at £695+VAT per day. 

Most of the work isn't done in the app itself. Most of our time is spent finding out what you want to know and which findings will help you, and iterating our reports until you are happy. 

Caveats

Like humans, the AI processor makes mistakes and needs to be monitored. If you want an overall picture, the "noise" it can produce probably won't matter, and you can get very good results almost out of the box. However, at the moment, if you want to be sure that each individual piece of coding is correct, you'll probably need to correct a significant proportion of the coding it produces. 

Why causal mapping?

At Causal Map Ltd., we promote causal mapping as a really useful and effective way of processing text which can be applied, with surprisingly little adjustment, to a great variety of evaluation tasks. Our Workflows software provides ready-made causal mapping steps, which can be used alone or seamlessly alongside other text processing tasks like thematic analysis. And the software can of course produce graphical causal maps. As the software can also be used without any causal mapping steps or visualisations, it could in principle be seen as occupying a similar space to any of the market-leading CAQDAS applications which now offer AI assistance, but with more explicit, transparent workflows. However, it is not our intention to compete directly with this kind of general-use software.

What is wrong with using AI as a "black box"?

At Causal Map Ltd, we use AI to both collect and analyse qualitative data. 🤖 We believe that the use of AI should follow published guidelines to ensure transparent and valid results.

We are not comfortable with procedures which rely on giving the AI the freedom to make evaluative judgements. So we do not, for example, ask the AI “what are the main or most important causal stories in the document”. We do not ask the AI to make summaries. Instead, we use the AI only as a tireless low-level assistant to exhaustively and transparently identify large numbers of individual causal claims (ideally, every single one of them) within the texts.

🔍 We follow established qualitative social science procedures for coding, with the aim that each step of the process from coding to analysis is transparent and verifiable. We don’t rely on AI to simplify the coded data or decide, for example, which themes are most important.

AI-powered automatic causal coding

The app uses OpenAI's GPT-4o, GPT-4o-mini and other models to automatically identify causal claims and connections within qualitative data such as interview transcripts and documents.

The Story of Workflows

An app for making sense of text (interviews, reports …) at scale: using the power of genAI to synthesise and visualise the meaning of texts and answer high-level questions about them, but with a verifiable data audit trail, step by step from text to final analysis.

A lot of evaluation work involves making sense of texts (interviews, reports…). 

That’s hard to do at any kind of scale, which often means using the most convenient and easy-to-reach sources and ignoring the rest. And it can involve weeks or months of work to create a coding framework and to code, say, 100 pages of text. 

Generative AI seems to provide a solution to scaling. 

That’s a big challenge and a big opportunity. But there is a big problem to solve first.

To allow an AI to make evaluative judgements on its own is to rely on a big black box which is not transparent, or reproducible, or trustworthy, or verifiable. We may as well ask some random person we meet on the street to write our report for us.

The solution should have these three parts.

First: keep track using reproducible workflows.

Real-life research and evaluation assignments involve multiple tasks and sub-tasks. Traditionally, quantitative scientists have used workflows to keep track of their working in a reproducible way. No more “I found the graph but it needs updating and I can’t find which version of the code produces it”. 

For example, we can construct a documented workflow like a Jupyter notebook or an R Markdown document: a text document which includes the instructions for importing data, carrying out tasks and displaying output. It’s like a human-readable computer program which, when run, reproducibly produces outputs like charts and tables. 

Qualitative scientists too have been strong on documenting their research process, but as most of the steps involve human labour and human judgement, you can’t just “rerun” a qualitative researcher’s notebook and see the results reappear somewhere. 

Now, AI is blurring the boundaries between qualitative and quantitative work. Some "qualitative" processes like thematic coding can now be carried out by machine, with some caveats. But hacking around on the prompt until we get some sort of an answer and pasting the answer into a document somewhere – that’s OK while we are learning and exploring, but it isn’t reproducible or verifiable. An evaluation commissioner wouldn’t accept quantitative work from a random person who claims to be a statistician but just sends us some tables and graphs. And nor should they accept the results of “using an AI” if the working is not documented and verifiable. 

Some evaluators are already exploring how to apply reproducible, workflow-based procedures to AI-supported work, mainly with Python.

However, just because an AI workflow is documented does not mean it is transparent. Which brings us to the second demand.

Second: breaking down big, vague tasks into small, easy ones so that we don’t leave the AI to make big judgements all on its own

How can we reformulate hard tasks into easy ones, so we don’t let the AI make significant evaluative judgements for us in an unverifiable way? Here’s how.

The use of rubrics is a good example of this break-it-down-then-build-it-up strategy. Rubrics are usually applied by humans, but when we have many cases, rubric assignment is a great task to hand off to an AI, because we can easily monitor its performance and tweak the prompt and the rubric until the AI becomes as good as a human.  
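
Monitoring the AI's rubric assignments against a human-coded sample could be sketched roughly as below. This is an illustration only: `ai_assign` stands in for a real model call, and the keyword rule, labels and cases are invented for the example.

```python
# Sketch: measure agreement between AI rubric assignments and human coding.

def agreement_rate(cases, human_labels, ai_assign):
    """Fraction of cases where the AI's rubric level matches the human coder's."""
    matches = sum(
        1 for case, label in zip(cases, human_labels)
        if ai_assign(case) == label
    )
    return matches / len(cases)

# Toy stand-in for the AI assistant: a keyword rule, not a real model.
def toy_assign(text):
    return "high" if "improved" in text else "low"

cases = ["My income improved after the training", "No change was reported"]
human_labels = ["high", "low"]
print(agreement_rate(cases, human_labels, toy_assign))  # 1.0
```

In practice you would iterate on the prompt and rubric until this agreement rate is acceptably close to human inter-coder reliability.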

But how should I break down big tasks into smaller steps, and exactly which ones? Where and how should I employ an AI for the low-level steps? And when am I going to learn all that Python? This brings us to the third demand. 

Third: providing a set of basic tools for the small, easy-to-automate steps

Workflow-based solutions are already being supported by AI service providers. Sort of. But if you ask, say, Google’s NotebookLM to process a large table, it may or may not really produce results for each row. 

Tools like this may produce plausible reports of their “thinking”, which is a big step forward, but we don’t really know exactly what steps were actually carried out, only what they feel like telling us. 

To do this properly, we would need to learn to use genAI APIs and something like LangChain or Python, and construct our own reproducible workflows in, say, a Jupyter notebook, selecting appropriate packages. But each kind of solution is different and can be time-consuming. It can be difficult to ensure or check that there are no black boxes left anywhere in the workflow. 

So the third part of the solution we need is:

Each step, whether an AI-powered step or an ordinary data-manipulation step like sorting and filtering, could in principle also be understood and followed by an army of human assistants. For example:

“Take these documents and break them up into paragraph-sized chunks. To each paragraph, add information about the age and role of the speaker. Ask the AI to decide whether each paragraph mentions changes in health behaviour. Reassemble all the examples into a single text, prepending each with the age and role of the speaker. Next, …”
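
The steps quoted above could be sketched as a small pipeline. This is a minimal illustration, not the app's code: `mentions_health` stands in for the AI classification call, and the data, metadata fields and function names are invented for the example.

```python
def chunk_paragraphs(text):
    """Break a document into paragraph-sized chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def health_extract(docs, speakers, mentions_health):
    """Chunk each document, tag chunks with speaker metadata, keep only the
    paragraphs the classifier flags, and reassemble them into one text."""
    kept = []
    for doc_id, text in docs.items():
        meta = speakers[doc_id]
        for para in chunk_paragraphs(text):
            if mentions_health(para):
                kept.append(f"[{meta['age']}, {meta['role']}] {para}")
    return "\n\n".join(kept)

# Toy stand-in for the AI classifier: a keyword rule, not a real model.
def toy_classifier(para):
    return "exercise" in para or "diet" in para

docs = {"int1": "I started to exercise daily.\n\nThe weather was bad."}
speakers = {"int1": {"age": 34, "role": "farmer"}}
print(health_extract(docs, speakers, toy_classifier))
# [34, farmer] I started to exercise daily.
```

The point is that every step here is ordinary, inspectable data manipulation; only the classification decision is handed to the AI.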

All the work of handing off tasks to the AI, reconstructing the results, saving versions of the prompts and the workflow, and so on, should not be carried out by a black-box AI but by ordinary computer software with published code. Only specific, low-level coding tasks should be given to the AI.

So at Causal Map Ltd we looked around for alternatives but in the end decided we had to build our own solution.

Our solution: workflows.causalmap.app

We present software (workflows.causalmap.app) which enables users to construct their own workflows for making sense of text, step by step from raw text to final outputs. Each workflow is simply a stored piece of text: a list of steps, one per line, described in a basic scripting language which is more or less human-readable.

Each line in the command editor has a data table associated with it. 

The workflow starts by importing data from storage, from a web page or a Qualia interview, or from a previously uploaded set of documents, resulting in a table of data. 

Each subsequent line operates on the current data table and applies a verb (pivot, filter, sort, code …) to that table, resulting in another table (or an output like a chart or a map). 

The user can click on any line to see the result of the workflow up to that point, usually the data table itself or alternatively a visualisation of it, like a causal map.

We can insert AI operations into the workflow at any point. We can save and re-use prompts which work line by line on the current data table. Each prompt returns one or more rows with whichever columns we ask for, which are reassembled into a new table. Using AI prompts becomes just like any other manipulation of a data table.
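
The line-per-step idea described above can be sketched as a minimal interpreter. The verb names and syntax here are illustrative only, not the app's actual scripting language:

```python
def run_workflow(script, table):
    """Run a workflow: each line applies one verb to the current table
    (a list of dicts) and yields the next table."""
    for line in script.strip().splitlines():
        verb, *args = line.split()
        if verb == "filter":          # filter <column> <value>
            col, val = args
            table = [row for row in table if str(row.get(col)) == val]
        elif verb == "sort":          # sort <column>
            table = sorted(table, key=lambda row: row[args[0]])
        elif verb == "select":        # select <col1> <col2> ...
            table = [{c: row[c] for c in args} for row in table]
        else:
            raise ValueError(f"unknown verb: {verb}")
    return table

rows = [
    {"speaker": "farmer", "text": "yields rose"},
    {"speaker": "nurse", "text": "clinic visits fell"},
    {"speaker": "farmer", "text": "rain was late"},
]
script = """
filter speaker farmer
sort text
select text
"""
print(run_workflow(script, rows))
# [{'text': 'rain was late'}, {'text': 'yields rose'}]
```

Because each line produces a concrete intermediate table, clicking on any line to inspect "the result of the workflow up to that point" falls out naturally from this design.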

Features

Currently, the following features are implemented:

Data input:

Steps useful for AI processing:

Causal mapping steps: All the same functionality as in the Causal Map app, e.g.

Common steps for manipulating tables:

Output steps:

Evaluative judgements and evaluator responsibility

A clarification: We are not saying that breaking down evaluation tasks into this kind of workflow frees evaluation from human judgement. Quite the opposite. We think the discipline of being more explicit about how evaluative judgements are made is a good thing, whether the steps are carried out by humans (as in the past) or with AI assistance. If you didn’t have a clear workflow from data to judgements before AI, DON’T lean on the black box of the AI to cover that up. Instead, make the workflow explicit and human-verifiable, and then use the AI to speed up the process. 

Even where a workflow does not include explicit human input at any stage, the decision to use this particular workflow, the details of its design, quality assurance checks, and, crucially, presenting the results so they will be correctly interpreted by the evaluation audience: these are all the responsibility of the evaluator.

Current status of the app

Technical details

Backend: Python, Quart (asynchronous server for time-consuming processes), hosted on Heroku. NoSQL caching in MongoDB. SQL data stored on Heroku. Access to Causal Map SQL data. OpenAI API with various models available. Implicit generation of embeddings using OpenAI’s text-embedding-3-small, cached in MongoDB.
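
The implicit embedding caching could be sketched roughly as below. This is an assumption-laden illustration: the in-memory dict stands in for the MongoDB cache collection, and `toy_embed` stands in for the OpenAI text-embedding-3-small call.

```python
import hashlib

cache = {}  # stands in for the MongoDB cache collection

def get_embedding(text, embed):
    """Return the embedding for `text`, calling `embed` only on a cache miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = embed(text)
    return cache[key]

# Toy stand-in for the embedding model, counting how often it is really called.
calls = []
def toy_embed(text):
    calls.append(text)
    return [float(len(text))]

get_embedding("drought", toy_embed)
get_embedding("drought", toy_embed)  # second call served from the cache
print(len(calls))  # 1
```

Keying the cache on a hash of the text means repeated workflow runs over the same documents cost no extra embedding calls.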

Frontend: Alpine, Axios, CodeMirror editor, Tabulator tables …

Why a new app?

Causal Map 3 is written in R-Shiny, which is not a platform of choice for AI work. CM3 is a user-focused platform with lots of buttons and sliders and a fixed approach to applying filters. 

Whereas Workflows: 

How we do automated causal mapping

We import the texts to be coded, together where appropriate with metadata (such as the age and gender of the respondent), into Causal Map Workflows.

We agree on an approach for the coding, e.g. 

Assuming we were successful in helping you turn your research aims into an appropriate research design, generating the results should now be quite straightforward, producing: 

We can produce additional and more detailed outputs for you, but this will mean additional time.
