Agent Mindset
Philosophy · Ruben

Learning in the open —
nothing hidden.

I'm a law researcher who started using AI and couldn't stop thinking about how it works. This page is my attempt to be honest about the logic behind this site — the tools I use, the claims I make, and how I think about this shift in practice.

Context windows fill up fast, and a bloated context makes agents slower, more expensive, and more error-prone. Three practical disciplines help: using /compact to summarise and compress the conversation before the window overflows, splitting work into small self-contained batches so each task starts clean, and routing sub-tasks to (sub)agents rather than doing everything in one long thread. Keeping context small is not a nice-to-have: it directly determines whether the agent stays on track or starts misremembering and inventing its own earlier reasoning.
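As a rough illustration of the batching discipline (a sketch under stated assumptions, not the setup behind this site): `estimate_tokens` uses a crude rule of thumb of about four characters per token, and `MAX_BATCH_TOKENS` is a made-up budget rather than any real model limit.

```python
from typing import Iterable

MAX_BATCH_TOKENS = 2_000  # hypothetical per-batch budget, not a real model limit


def estimate_tokens(text: str) -> int:
    """Very rough estimate: assume about four characters per token."""
    return max(1, len(text) // 4)


def batch_tasks(tasks: Iterable[str], budget: int = MAX_BATCH_TOKENS) -> list[list[str]]:
    """Group tasks into batches whose estimated token total stays within budget.

    A task that alone exceeds the budget gets its own batch rather than being dropped.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for task in tasks:
        cost = estimate_tokens(task)
        # Close the current batch before it would go over budget.
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(task)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each resulting batch could then be handed to a fresh thread or sub-agent, so no single conversation carries the whole workload.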

Ongoing: context hygiene, batching strategy, sub-agent routing.
Token usage · Sub-agents

Citation hallucination remains the hardest problem. Every research output needs source verification. The eval harness I'm building exists because I don't yet have a reliable automated way to catch hallucinated cases before they reach a filing.
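The simplest possible version of such a check might look like the sketch below. It is an assumption-laden illustration, not the harness itself: the function names are hypothetical, the citations are invented placeholders, and it presumes the citations have already been extracted from the draft as plain strings.

```python
def normalise(citation: str) -> str:
    """Collapse whitespace and ignore case so trivial formatting differences match."""
    return " ".join(citation.split()).casefold()


def unverified_citations(cited: list[str], verified: list[str]) -> list[str]:
    """Return the cited strings that have no match in the verified list."""
    known = {normalise(c) for c in verified}
    return [c for c in cited if normalise(c) not in known]


# Placeholder citations, invented for illustration only.
verified_sources = ["Example v. Placeholder [2001] XX 1"]
draft_citations = [
    "example v.  placeholder [2001] xx 1",
    "Invented v. Nobody [2099] XX 9",
]

flagged = unverified_citations(draft_citations, verified_sources)
print(flagged)  # only the second citation is flagged
```

A string match like this only catches citations that are missing from a trusted list. It cannot tell whether a real case actually supports the proposition it is cited for, which still needs a human read.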

Ongoing: citation eval harness.
Hallucination · Evaluation

Privilege and confidentiality: cross-matter context leakage is a real failure mode. I broke this once (documented in the build log) and rebuilt with per-matter isolation. The tooling is improving but this is not a solved problem.
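The core idea of per-matter isolation can be sketched as follows. This is a minimal illustration with hypothetical names (`MatterContextStore`, `context_for`), not the rebuilt system described in the build log; real isolation would also have to cover storage, access control, and whatever the model provider retains.

```python
class MatterContextStore:
    """Keeps notes per matter and only ever returns the notes for one matter."""

    def __init__(self) -> None:
        self._notes: dict[str, list[str]] = {}

    def add(self, matter_id: str, note: str) -> None:
        """Store a note under exactly one matter."""
        self._notes.setdefault(matter_id, []).append(note)

    def context_for(self, matter_id: str) -> list[str]:
        """Return a copy of one matter's notes; unknown matters raise an error."""
        if matter_id not in self._notes:
            raise KeyError(f"no context stored for matter {matter_id!r}")
        return list(self._notes[matter_id])


store = MatterContextStore()
store.add("matter-A", "note about matter A")
store.add("matter-B", "note about matter B")

# Only matter A's notes are assembled into the prompt for matter A.
prompt_context = store.context_for("matter-A")
```

The design choice is that there is no method returning everything at once, so a caller has to name a matter before any context can reach a prompt.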

Ongoing: per-matter isolation.
Privilege · Isolation

The pace of change means that what's true about model capabilities today may not be true in six months. I'm building the study modules to teach judgment, not specific tool instructions, because the tools will change.

Ongoing: judgment-first curriculum.
Pace of change · Judgment

The daily AI & Law digest is fully automated: curation, formatting, and delivery run on a scheduled agent pipeline. I didn't want to select items manually each morning, so I defined the criteria once and the system applies them daily.
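"Define the criteria once, apply them daily" could be pictured like this. The sketch is hypothetical throughout: the `Criteria` fields, the keyword matching, and the sample items are assumptions for illustration, and the real pipeline is not described in code on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """Selection rules written down once and reused on every run."""
    keywords: tuple[str, ...]
    max_items: int


def curate(items: list[dict[str, str]], criteria: Criteria) -> list[dict[str, str]]:
    """Keep items whose title mentions any keyword, capped at max_items."""
    matches = [
        item
        for item in items
        if any(k.casefold() in item["title"].casefold() for k in criteria.keywords)
    ]
    return matches[: criteria.max_items]


# Made-up sample data standing in for one morning's candidate items.
digest_criteria = Criteria(keywords=("AI", "court"), max_items=2)
candidates = [
    {"title": "New AI guidance for law firms"},
    {"title": "Weather report"},
    {"title": "Court rules on automated filings"},
    {"title": "AI and evidence"},
]
selected = curate(candidates, digest_criteria)
```

A scheduler would then call `curate` with the same `digest_criteria` each day, which is what keeps the selection consistent without manual picking.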

Build log entries are drafted from my raw voice notes and written observations. The study modules were structured with an agent working from my outline — I specified the learning arc, the agent proposed the breakdown, I verified and edited each unit.

This site itself: designed with Claude, coded with Cursor, deployed on Vercel, updated with agents. The build log documents every step, including the parts that broke.

Stack: Claude · Cursor · n8n · Vercel · Legora
Every automation is logged in the build log with its failure rate.
Claude · n8n · Cursor · Vercel · Daily digest

I wanted to build a framework to assess how polished my own PhD thesis is: a set of observable markers that distinguish finished academic writing from work that is substantively complete yet ‘unpolished’. Prompting AI in general terms for writing polish can be helpful, but I wanted more.

I contacted five PhD graduates and asked permission to run their theses through AI agents focused purely on writing style. All five agreed. I dispatched five agents simultaneously — one per thesis — each instructed to analyse the same dimensions: paragraph openings and transitions, flow between ideas, signposting, hedging, and repetition.
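The fan-out pattern described above, one agent per thesis with identical instructions, can be sketched with `asyncio`. This is an illustration only: `analyse_thesis` is a stand-in that returns a stub report instead of calling a model, and the thesis identifiers are placeholders.

```python
import asyncio

# The five dimensions every agent was asked to analyse.
DIMENSIONS = [
    "paragraph openings and transitions",
    "flow between ideas",
    "signposting",
    "hedging",
    "repetition",
]


async def analyse_thesis(thesis_id: str) -> dict[str, object]:
    """Stand-in for one agent run; a real version would call a model here."""
    await asyncio.sleep(0)  # yields control, simulating I/O-bound work
    return {"thesis": thesis_id, "dimensions": list(DIMENSIONS)}


async def analyse_all(thesis_ids: list[str]) -> list[dict[str, object]]:
    """Run one analysis per thesis concurrently; reports come back in input order."""
    return await asyncio.gather(*(analyse_thesis(t) for t in thesis_ids))


reports = asyncio.run(analyse_all([f"thesis-{n}" for n in range(1, 6)]))
```

Because every agent receives the same list of dimensions, the five reports share a structure, which makes the later synthesis step a matter of comparing like with like.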

The five reports were synthesised into a single academic polish framework. Now when I pressure-test a section of my own thesis, I run it through this framework and get specific suggestions for where the prose needs work based on the weaknesses I know are part of my writing.

5 simultaneous agents · synthesised into one framework · in ongoing use for thesis revision.