Agent Mindset

Sources & Methods

Methods — how & why

Transparency is the point. Part of learning to work with AI is learning to make it accountable — knowing where a claim comes from, why a source was chosen, and what the model was told to do with it. I want this site to model that, not just talk about it.

Every external source that informs content here is logged in extra-sources.json with an annotation explaining what it contributed. The CLAUDE.md file that governs the AI assistant working on this site contains explicit sourcing rules: the assistant reports a candidate source before recording it, and Ruben decides what gets added. The methodology is built into the tooling, not just stated.
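The report-before-recording rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`title`, `url`, `annotation`) and the `report` and `record` helpers are hypothetical, and the real schema of extra-sources.json is not shown on this page.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class SourceEntry:
    # Hypothetical fields; the site's actual schema may differ.
    title: str
    url: str
    annotation: str  # what the source contributed


def report(entry: SourceEntry) -> str:
    """Step 1: describe the candidate source without writing anything."""
    return (
        f"Proposed source: {entry.title} ({entry.url})\n"
        f"  contributes: {entry.annotation}"
    )


def record(entry: SourceEntry, approved: bool, log_path: Path) -> bool:
    """Step 2: append to the log only after a human has approved the entry."""
    if not approved:
        return False
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    entries.append(asdict(entry))
    log_path.write_text(json.dumps(entries, indent=2))
    return True
```

The point of the split is that `report` has no side effects, so the assistant can surface a source freely, while `record` refuses to touch the file unless the human decision is passed in explicitly.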

All practitioners, articles, case law, and tools referenced across the courses are listed below. Short quotes are used under fair use; follow each link for the full work.

Practitioners

Agent Mindset

Real-world practitioners whose workflows and writing are cited in the Agent Mindset course lessons.

Simon Willison

Creator of Datasette; long-time open-source developer who writes extensively about coding with LLMs

Blog
A computer can never be held accountable. That's your job as the human in the loop.
Cited in Delegate, Then Verify

Andrej Karpathy

Former Director of AI at Tesla; OpenAI founding member; creator of the Neural Networks: Zero to Hero lecture series

Talk
The model has read a large chunk of the internet. It doesn't remember facts the way you do — it has learned statistical patterns across trillions of words. The training data is everything: what it knows, what it doesn't, and where it will confidently confabulate.
Cited in The Raw Material: Pretraining Data

Long Ouyang et al. (OpenAI)

Research team at OpenAI; lead authors of InstructGPT, the paper that established RLHF as the standard approach to aligning large language models

Paper
Our results show that fine-tuning with human feedback significantly improves outputs on a wide range of tasks — and that labelers strongly prefer InstructGPT outputs over those of GPT-3, despite InstructGPT having 100x fewer parameters.
Cited in From Base Model to Assistant: SFT and RLHF

DeepSeek AI Research Team

Chinese AI research lab; authors of DeepSeek-R1, an open-source reasoning model that matched frontier closed-model performance using RL training on verifiable tasks

Paper
We find that through purely reinforcement learning training, without any supervised chain-of-thought demonstrations, the model spontaneously develops sophisticated reasoning behaviours — including self-verification, backtracking, and extended deliberation before answering.
Cited in Reasoning Models: Thinking Before Answering

Harvey

PowerUser

Platform documentation, third-party reviews, and press coverage referenced in the PowerUser course.

Platform

Harvey Platform Overview

Harvey AI

Official overview of Harvey's legal AI capabilities — document drafting, contract analysis, legal research, and matter summarisation.

Platform

Harvey AI

Harvey AI

Harvey's product home — enterprise AI built specifically for law, tax, and professional services firms.

Article

Harvey: Most Innovative Companies 2026

Fast Company

Fast Company's profile of Harvey as one of the most innovative companies of 2026, covering its enterprise growth and legal AI positioning.

Article

An Overview of Harvey AI's Features for Lawyers

Minnesota State Bar Association

Bar association overview of Harvey's feature set written for practising lawyers — useful for grounding capability expectations against professional standards.

Review

Harvey AI Review

Tools for Humans

Independent practitioner review covering Harvey's strengths, limitations, and workflow integration from a user perspective.

Review

Harvey AI Review

GrowLaw

GrowLaw's review of Harvey for law firms — practical coverage of use cases, pricing considerations, and day-to-day workflow fit.

Legora

PowerUser

Official resources and independent coverage of Legora referenced in the PowerUser course.

Platform

Legora for Law Firms

Legora

Legora's solution page for law firms — document intelligence, workflow automation, and due diligence tooling.

Platform

Legora

Legora

Legora's product home — legal AI platform with a strong focus on Nordic and European jurisdictions and data residency.

Blog

Choosing the Right Legal AI Solution: A Practical Guide

Legora

A practical framework for evaluating legal AI tools, covering jurisdiction fit, data governance, accuracy benchmarks, and hallucination risk.

Article

Microsoft Customer Story: Legora

Microsoft

Microsoft's case study on Legora's Azure OpenAI integration — covers technical architecture and enterprise deployment at scale.

Legal ethics & case law

PowerUser

Professional conduct rules and landmark cases that define how lawyers must use AI responsibly.

Rule

ABA Model Rules of Professional Conduct — Rules 1.1, 1.6, 5.1 & 5.3

American Bar Association

The foundational professional conduct rules underpinning responsible legal AI use: Rule 1.1 Comment 8 (technology competence), Rule 1.6 (confidentiality of client data), and Rules 5.1 & 5.3 (supervision of lawyers and non-lawyer assistants, including AI tools).

Case law

Mata v. Avianca, Inc., No. 22-cv-1461 (PKC) (S.D.N.Y. June 22, 2023)

United States District Court, S.D.N.Y.

Landmark 2023 case in which counsel filed a brief citing non-existent cases generated by ChatGPT. The court sanctioned the attorneys. As a district court decision it is not binding precedent, but it is a widely cited illustration that lawyers remain fully accountable for AI-generated content they file.

Academic & technical foundations

PowerUser

Key research underpinning how AI language models and retrieval systems work.

Paper

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis et al. (2020)

The foundational paper introducing RAG — the technique that lets AI models ground responses in retrieved documents rather than training memory alone. Core to how legal AI tools reduce hallucination when working with specific source material.
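The retrieve-then-generate idea can be shown with a toy sketch. It is an assumption-laden illustration, not the method from the paper: Lewis et al. pair a learned dense retriever with a generator model, whereas this sketch swaps in plain word overlap for retrieval and stops at building the prompt. The helper names `tokens`, `retrieve`, and `build_prompt` are hypothetical.

```python
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokens(query)

    def overlap(doc: str) -> int:
        # Counter & Counter keeps the minimum count of each shared word.
        return sum((q & tokens(doc)).values())

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(query: str, documents: list[str]) -> str:
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only these passages:\n{context}\n\nQuestion: {query}"


docs = [
    "The lease terminates on 31 March.",
    "The supplier must deliver within 30 days.",
]
print(build_prompt("When does the lease terminate?", docs))
```

Even in this toy form the limitation is visible: if retrieval picks the wrong passage, the model is grounded in the wrong text, which is why the retrieved passages still need checking.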

Empirical research on AI productivity

PowerUser

Academic empirical studies, systematic reviews, benchmarks, and legal-engineering methodology sources measuring productivity gains, quality effects, hallucination risk, workflow design, and verification in AI-supplemented legal and professional work.

Paper

Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality

Dell'Acqua, F., McFowland III, E., Mollick, E. R. et al. (Harvard Business School / BCG, 2023)

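The landmark RCT on professional AI use: 758 BCG consultants, 18 tasks. Source for the 'jagged frontier' concept: inside the frontier, consultants using AI worked about 25% faster and produced roughly 40% higher-quality output; outside it, they were 19 percentage points less likely to reach a correct solution. Foundation for the Research Evidence section.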

Paper

Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence

Noy, S. & Zhang, W. (MIT, 2023) — published in Science

Preregistered RCT with 453 professionals (444 in the earlier working paper) on writing tasks. Source for the 40% reduction in time taken, the 18% quality improvement, and the skill-compression findings in the Research Evidence section.

Paper

Lawyering in the Age of Artificial Intelligence

Choi, J. H., Monahan, A. & Schwarcz, D. (Minnesota Law Review Vol. 109, 2024)

First legal-specific RCT: law students completed realistic legal tasks with and without GPT-4. Source for the finding that speed gains were consistent while quality improvements were uneven, and for the junior-lawyer uplift pattern.

Paper

AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice

Schwarcz, D., Manning, S., Prescott, J. J. et al. (Journal of Law and Empirical Analysis, 2026)

Primary source for the 2026 legal RCT showing quality gains from o1-preview and Vincent AI, fewer hallucinated citations with the RAG-grounded tool, and task-specific quality effects.

Paper

Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis

Chen, B. M. & Bao, H. (2026)

Source for the claim that untrained LLM access can be counterproductive in legal analysis, while brief training improved adoption and scores.

Paper

Artificial Intelligence and Human Legal Reasoning

Bednar, N., Cleveland, D. R., Erbsen, A. & Schwarcz, D. (2026)

Source for the workflow-placement claim: AI helped early legal synthesis without reducing later comprehension, but AI revision helped weaker memos while degrading stronger ones.

Paper

LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters

van der Meer, V. & Rossi, J. (ICAIL 2026)

Deployment study supporting the legal-engineering claim that curated legal knowledge bases, controlled prompting, and expert-in-the-loop review can produce near-final legal drafts in a bounded workflow.

Paper

Reimagining Legal Fact Verification with GenAI: Toward Effective Human-AI Collaboration

Han, S., Zhang, Y., Huang, Y. et al. (CHI 2026)

Interview study supporting the claim that legal AI fact-verification workflows require auditability, accountability, confidentiality controls, and human legal judgment.

Paper

Benchmarking Legal RAG: The Promise and Limits of AI Statutory Surveys

Afane, M., Hariri, E., Ouyang, D. & Ho, D. E. (ACM CS&Law 2026)

Benchmark source for the claim that specialized statutory RAG and legal error analysis can outperform generic or commercial legal AI setups, but retrieval and reasoning failures remain material.

Paper

Legal RAG Bench: an end-to-end benchmark for legal RAG

Butler, A.-R. & Butler, U. (2026)

Benchmark source for the claim that retrieval quality sets the ceiling for many legal RAG workflows and that groundedness must be evaluated separately from answer fluency.

Paper

Generative AI in public administration: A quasi-experimental analysis of bureaucratic productivity

Kim, E. (Government Information Quarterly, 2026)

Quasi-experimental source for the claim that specialized GenAI can reduce task-level drafting time in rule-bound public-sector workflows, especially for newer employees.

Paper

Generative AI and labour productivity: A quasi experiment on coding

Gambacorta, L., Qiu, H., Shan, S. & Rees, D. M. (Journal of Financial Stability, 2026)

Source for the measurement caution that AI can increase output volume more than useful task completion, making workflow-level productivity metrics essential.

Paper

Generative AI and Worker Productivity: A Systematic Review and Quantitative Evidence Synthesis (2023-2026)

Singh, H. V. (Indian Institute of Management Bangalore, 2026)

Systematic review source for the cross-study synthesis that task-level AI productivity gains are real but heterogeneous by task, expertise, and measurement method.

Paper

LawFlow: Collecting and Simulating Lawyers' Thought Processes

Das, S. et al. (Microsoft Research / arXiv, 2025)

Source for the legal-engineering claim that real legal work is an adaptive workflow with decision points and review loops, not just isolated answer generation.

Paper

An Uncommon Task: Participatory Design in Legal AI

Delgado, F., Barocas, S. & Levy, K. (Proceedings of the ACM on Human-Computer Interaction, 2022)

Methodology source for the claim that legal AI evaluation and tool design benefit from participatory methods where lawyers and technologists co-design tasks, simulations, and criteria.

Paper

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

Guha, N. et al. (2023)

Benchmark source for the evaluation-rubric claim: legal AI should be tested against legal task taxonomies rather than generic model capability claims.

Paper

LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain

Pipitone, N. & Alami, G. (2024)

Benchmark source for the retrieval-grounding claim that legal RAG needs expert-annotated relevant passages and legal-domain groundedness checks.

Paper

Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models

Dahl, M. et al. (2024)

Source for the verification-risk claim that legal hallucinations remain a core failure mode requiring citation checks and human legal review.

Paper

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools

Magesh, V. et al. (Stanford HAI / RegLab, 2024)

Source for the claim that even legal RAG and legal research tools require human verification because hallucinations and unsupported legal propositions can persist.

Paper

Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review

Grossman, M. R. & Cormack, G. V. (Richmond Journal of Law & Technology, 2011)

Foundational empirical paper establishing that TAR matches or exceeds exhaustive manual review recall at dramatically lower cost. Kept as background context for validated, auditable legal automation.

Paper

Comparing the Performance of Artificial Intelligence to Human Lawyers in the Review of Standard Business Contracts

LawGeex / Bucerius Law School (2018)

Vendor-funded controlled comparison retained as supporting context for narrow contract-review automation claims; no longer used as a headline academic evidence card.

Article

Future of Professionals Report 2024

Thomson Reuters Institute (2024)

Industry survey retained as adoption context for legal and compliance professionals; not used as primary academic evidence for productivity or quality claims.

Article

Another New Study of Legal AI Shows Some Models Can Significantly Improve Work Quality and Efficiency

Bob Ambrogi / LawNext (2025)

Practitioner coverage retained as context for how legal AI RCT findings were surfaced in the legal industry; primary claims now use the 2026 journal article directly.

Legal AI news

PowerUser

Publications and outlets tracked daily for legal AI tool launches, court rulings, adoption data, and market developments.

Blog

LawSites (LawNext) by Bob Ambrogi

Robert Ambrogi

The most thorough independent tracker of legal technology developments. Primary source for the Recent Developments feed — covering tool launches, court decisions, and adoption data with detailed analysis.

Blog

Artificial Lawyer

Artificial Lawyer

Legal AI industry publication covering tool launches, market developments, and technology trends. Source for Microsoft Legal Agent and OpenAI legal vertical coverage in the developments feed.

Article

2026 Legal Industry AI Adoption Report

8am

Survey of 1,300+ legal professionals conducted in late 2025. Found that individual AI adoption doubled to 69% in a year while institutional governance (policies, training) lagged significantly. Source for the governance gap entry in the developments feed.

Article

Legal AI's Next Phase: Built With Lawyers, Measured in Practice

National Law Review

Source for the hallucination incidents data: 1,348 cases catalogued worldwide as of April 2026, with the rate of new cases rising from about 2 per week to about 2-3 per day. Covers the shift in legal AI toward reasoning-based approaches.

Rule

Florida Supreme Court: Amendment to Rule 2.515(d)(2) — AI Citation Certification

The Florida Bar / Florida Supreme Court

Effective June 15, 2026, the amendment requires all signers of Florida court filings to certify that cited legal authorities exist and are accurately cited. A direct regulatory response to AI hallucination incidents in court submissions.

Blog

National Law Review

National Law Review

Legal news outlet covering AI regulation, court rulings, and compliance developments. Monitored as part of the daily digest.

Blog

Law.com / The American Lawyer

ALM Media

Industry publication covering law firm strategy, legal tech adoption, and market developments. Monitored as part of the daily digest.

EU AI regulation

PowerUser

Official EU sources, independent trackers, and law firm commentary on the AI Act, GDPR enforcement, and European AI governance.

Rule

European Commission — AI Act

European Commission

Official EU source for AI Act updates, implementation guidance, and enforcement timelines. Primary regulatory reference for EU AI coverage in the digest.

Platform

EU AI Act Tracker

Future of Life Institute

Independent tracker of the EU AI Act's progress, obligations by risk tier, and implementation deadlines. Useful for quickly checking where a specific article or obligation stands.

Rule

European Data Protection Board (EDPB)

EDPB

EDPB guidance and enforcement actions on AI and GDPR intersections — transparency obligations, data subject rights, and coordinated enforcement. Monitored for AI-specific opinions and decisions.

Blog

EURACTIV

EURACTIV

EU policy news outlet with dedicated AI coverage. Strong on legislative process, member-state positions, and Brussels negotiations around the AI Act and digital regulation.

Blog

AlgorithmWatch

AlgorithmWatch

Investigative and policy outlet tracking algorithmic accountability and AI governance in Europe. Covers enforcement gaps, civil society responses, and high-risk AI system incidents.

Blog

Bird & Bird AI Insights

Bird & Bird

Law firm insights on EU and global AI regulation — practical compliance analysis written for practitioners. Useful for understanding legal obligations in plain terms.

Blog

Fieldfisher AI Insights

Fieldfisher

Fieldfisher's technology and AI regulatory commentary. Covers GDPR enforcement, AI Act obligations, and cross-border AI compliance for European businesses.

Frontier model labs

PowerUser

Official news and blogs from the major foundation model providers — monitored for model releases, pricing changes, and capability updates.

Platform

Anthropic News

Anthropic

Official Anthropic announcements covering model releases, safety research, and product updates. Primary source for Claude-related developments in the digest.

Platform

OpenAI News

OpenAI

Official OpenAI announcements covering model releases, API changes, and product launches. Monitored for frontier model developments relevant to legal and enterprise AI.

Blog

Google DeepMind Blog

Google DeepMind

Google DeepMind's research and product announcements, including Gemini model releases. Monitored for capability and pricing developments.

Platform

Mistral AI News

Mistral AI

Mistral's model and product announcements. Relevant for open-weight model releases and European frontier AI developments.

Blog

Meta AI Blog

Meta AI

Meta's AI research and product updates, including Llama model releases. Monitored for open-weight model developments and inference ecosystem changes.

Platform

xAI News

xAI

xAI's announcements covering Grok model releases and updates. Monitored as part of the frontier model landscape.

AI coding tools

PowerUser

Changelogs, blogs, and practitioners covering AI-assisted coding workflows, agents, and developer tooling.

Blog

Cursor Changelog

Anysphere / Cursor

Release notes and feature updates for the Cursor AI code editor. Monitored for prompting and agentic coding workflow developments.

Blog

GitHub Blog

GitHub

GitHub's official blog covering Copilot updates, Actions features, and developer AI tools. Monitored for AI coding workflow and agent developments.

Blog

Cognition Blog (Devin)

Cognition AI

Cognition AI's blog covering Devin — the autonomous software engineering agent. Relevant for understanding agentic coding patterns and capabilities.

Blog

Simon Willison's Weblog

Simon Willison

High-signal blog from the creator of Datasette, covering practical LLM usage, prompting techniques, and agentic coding workflows. One of the most reliable sources for real-world AI coding tips.

Prompting & coding tips

PowerUser

Communities and writers where practical prompting techniques and LLM workflow patterns surface and get stress-tested.

Platform

Hacker News

Y Combinator

Tech community aggregator where popular prompting techniques and LLM workflow threads frequently surface. Front page and search reliably capture what practitioners are finding useful.

Platform

r/ClaudeAI

Reddit

Community for Claude users sharing prompting techniques, workflows, and tips. High-upvote threads reliably surface practical insights that haven't made it to formal writeups yet.

Platform

r/LocalLLaMA

Reddit

Community focused on running and prompting open-weight models locally. Techniques discussed here often generalise to hosted models. Strong signal for prompting patterns and inference optimisations.

Blog

Latent Space

swyx & Alessio Fanelli

Podcast and newsletter covering AI engineering, prompting research, and practitioner workflows. Surfaces Twitter/X discourse and translates it into structured analysis.

Blog

Andrej Karpathy — Blog & Posts

Andrej Karpathy

Karpathy's writing and social posts on LLM behaviour, prompting intuitions, and AI learning. His observations on X regularly generate high-signal discussion worth tracking.

Blog

The Pragmatic Engineer

Gergely Orosz

Engineering-focused newsletter with serious AI coding tool coverage. Covers real-world adoption of AI tools in engineering teams — useful for prompting and workflow patterns in professional contexts.

Digest aggregators

PowerUser

Catch-all aggregators monitored daily to surface AI developments that may not appear in category-specific sources.

Blog

AI Flash Report

AI Flash Report

Daily AI news aggregator covering model releases, research, and industry developments across all major sources. Used as a catch-all to surface developments that may have been missed from primary sources.

LLM Deep Dive

PowerUser

Talk

Deep Dive into LLMs like ChatGPT

Andrej Karpathy

The primary reference for the LLM Deep Dive module. Karpathy's lecture covers the full stack from pretraining data and tokenization through the transformer architecture, alignment, and reasoning models — all without requiring a maths background.

Platform

FineWeb: Pretraining Dataset

Hugging Face

Hugging Face's open pretraining dataset and interactive demo. Shows the quality filtering pipeline used to build web-scale training data, making abstract data curation decisions concrete and explorable.

Platform

Tiktokenizer

Xenova

Interactive tokenizer visualiser. Shows in real time how different text is split into tokens by GPT-4, Llama, and other tokenizers — essential for building intuition about cost, context limits, and tokenization failure modes.

Platform

Transformer Neural Net 3D Visualiser

Brendan Bycroft

A 3D interactive visualisation of the transformer architecture showing how tokens flow through attention and feed-forward layers. Referenced in the architecture lesson as a hands-on companion.

Platform

llm.c: Let's Reproduce GPT-2

Andrej Karpathy

Karpathy's from-scratch C implementation of GPT-2 pretraining. Makes the pretraining process concrete and auditable — valuable for understanding exactly what pretraining does and how base models differ from assistant models.

Paper

The Llama 3 Herd of Models

Meta AI Research

Meta's technical report for Llama 3. One of the most detailed public accounts of a modern LLM training pipeline — pretraining data, tokenizer design, architecture choices, and alignment approach. Referenced throughout the deep dive module.

Platform

Hyperbolic — Base Model Inference

Hyperbolic

Cloud inference platform offering access to base models (pre-alignment) alongside instruction-tuned variants. Allows direct comparison of base vs. fine-tuned model behaviour — used in the pretraining lesson challenge.

Paper

Training Language Models to Follow Instructions with Human Feedback (InstructGPT)

Long Ouyang et al. (OpenAI, 2022)

The paper that established RLHF as the standard approach to LLM alignment. Demonstrated that a 1.3B parameter aligned model outperforms a 175B base model on human preference metrics — making alignment quality as important as scale.

Platform

Hugging Face Inference Playground

Hugging Face

Browser-based interface for running open models via Hugging Face's inference API. Used in the alignment lesson challenge to compare base vs. instruction-tuned model behaviour.

Paper

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek AI (2025)

The paper demonstrating that RL training on verifiable tasks produces emergent chain-of-thought reasoning without explicit demonstrations. The openly released weights let others test and build on the results. Central reference for the reasoning models lesson.

Platform

TogetherAI Playground

Together AI

Cloud inference playground for open models including DeepSeek-R1, Llama 3, and Mistral variants. Used in the reasoning models lesson challenge for side-by-side model comparison.

Paper

Mastering the Game of Go with Deep Neural Networks and Tree Search

David Silver et al. (DeepMind, 2016)

The AlphaGo paper, which combined supervised learning from human expert games with RL through self-play to defeat a professional human player at Go. Its successor, AlphaGo Zero (2017), removed the human games entirely. Provides the conceptual foundation for understanding how RL training enables reasoning models to discover chain-of-thought strategies.

Platform

LM Arena

LMSYS / UC Berkeley

Human preference benchmark for language models. Real users vote on blind model comparisons, producing Elo-based rankings that reflect genuine user preference rather than academic benchmark performance.
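How a single blind vote moves an Elo-style ranking can be sketched with the textbook Elo update. This is an illustration only: the leaderboard's actual rating procedure is not described here, and the K-factor of 32 and the starting rating of 1000 are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(
    rating_a: float, rating_b: float, a_won: bool, k: float = 32.0
) -> tuple[float, float]:
    """Apply one vote: the winner gains what the loser gives up."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (actual_a - expected_a)
    new_b = rating_b + k * ((1.0 - actual_a) - (1.0 - expected_a))
    return new_a, new_b


# Two models start level; one user prefers model A's answer.
print(update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

An upset moves ratings more than an expected result does, because the size of the update depends on how surprising the vote was given the current ratings.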

Blog

AI News Newsletter

swyx / Lior Bar

Daily newsletter summarising significant AI research, model releases, and industry developments. Recommended in the inference ecosystem lesson as a reliable way to stay current in a fast-moving field.

Platform

LM Studio

LM Studio

Desktop application for downloading and running open LLMs locally without any command-line setup. Used in the inference ecosystem lesson challenge to experience fully local, private model inference.