An open notebook · Ruben

Learning AI and automations for law in the open — and sharing it.

I'm teaching myself to work with AI because it is transforming how we do legal work and research, and I'm publishing everything as I go: free study modules, a daily automated AI & Law newsfeed, and small tools and automations with build logs that include both the wins and the mistakes.

Browse the study modules → · Today's AI Digest

This entire page is built and automated with it →

Today · 18 June 2026

Today's AI pulse.

A curated read for legal work — the capabilities, regulation, and court decisions that actually matter. Updated each morning, no feed to scroll.

Artificial Lawyer

Co-founder Gabe Pereyra: Harvey aims to enable 'law firms to build their own specialized models and own their own intelligence,' developing 'the first model in our legal foundation model series' to deliver intelligent capabilities affordably while maintaining security.

Firms that own their trained models retain institutional intelligence rather than contributing it to a shared SaaS provider's dataset — a fundamental shift in the IP dynamic of legal AI. If Harvey succeeds in making fine-tuning accessible, the vendor relationship flips: Harvey becomes infrastructure, not product.

Model

Google

DiffusionGemma 'generates entire blocks of text simultaneously' via text diffusion rather than sequential token prediction, achieving 1,000+ tokens/second on NVIDIA H100 and 700+ on RTX 5090.

Diffusion-based text generation has historically lagged autoregressive models on quality; a 26B MoE model with only 3.8B active parameters hitting 4x throughput on consumer hardware suggests the quality gap may be closing for speed-critical use cases. For local code infilling and in-line editing, this could shift the economics away from cloud APIs.

Model

Simon Willison

Simon Willison: GLM-5.2 is 'probably the most powerful text-only open weights LLM,' leading the Artificial Analysis Intelligence Index v4.1 at score 51 and ranking 2nd on the Code Arena WebDev leaderboard — behind only Claude Fable 5.

A 753B MoE model under MIT license with frontier-class benchmark scores raises the open-weights ceiling to where proprietary alternatives are no longer necessary for many enterprise coding and reasoning tasks. The 1M token context window — 5x its predecessor — opens long-document and large-codebase use cases that previously required closed models.

Model

Artificial Lawyer

Ironclad CEO Dan Springer: the integration enables 'AI systems that work together' for faster regulatory analysis, litigation assessment, and contract impact determination — all within trusted enterprise systems.

Most legal AI platforms are vertical stacks that resist integrating a competing AI layer. A deep bidirectional partnership between two platforms with overlapping capabilities suggests the market is shifting toward AI interoperability as a selling point — and may pressure other CLM and legal research vendors to follow.

Tool

Artificial Lawyer

Perplexity states its competitive edge centres on accuracy: 'most LLMs tolerate a certain amount of hallucinations...at Perplexity, we only reward answers that are factually accurate.'

Perplexity joins OpenAI, Anthropic, Microsoft, and Palantir in making a dedicated push into legal — five major tech brands competing in a vertical previously dominated by specialist vendors like Harvey and Legora. The influx raises the capability floor and compresses the time specialists have to build durable differentiation.

Market

Open the full digest →

Study modules · self-paced & free

Study the use of AI for Law.

Free, self-paced modules — working from prompting up to delegating real legal work to agents, with verification you can defend. No code, any profession.

Module 03

How LLMs Work — A Deep Dive

A standalone technical module following Andrej Karpathy's framework: from raw training data and tokenization through the transformer architecture, pretraining, alignment, reasoning models, and the modern inference ecosystem.

03 The Raw Material: Pretraining Data
04 Tokenization: How Text Becomes Numbers
05 The Architecture: What a Transformer Does
06 Pretraining: Teaching a Model to Predict
07 From Base Model to Assistant: SFT and RLHF
08 Reasoning Models: Thinking Before Answering
09 Running Models: The Inference Ecosystem

Legal AI · Harvey & Legora

Legal AI Mastery

Get fluent with the two platforms lawyers actually use. Eight modules covering every capability on both, plus the judgment for when to use which.

01 Orientation: two platforms, two philosophies
02 Asking & Researching: the Assistant
03 Review at Scale: Vault & Tabular Review
04 Drafting in Word: add-ins & playbooks
05 Workflows & Agents: the heart of the course
06 Tool for the Job: Harvey vs. Legora
07 Using Them Responsibly
08 Capstone: a live matter, end to end

SELF-PACED · FREE · NO CODE REQUIRED · Start learning →

Build log · in public

What I'm building & breaking.

The honest version — experiments, dead ends, and the fixes that finally worked. Documented as it happens.

06 Jun 2026 · Shipped

Research evidence refreshed around 2026 studies and legal-engineering workflows

Refreshed /research from a static evidence page into a more current 2026 research synthesis. The page now opens with a Methodology block explaining the search filters, what counted as primary evidence, and how three research agents were used: one for legal-engineering literature, one for empirical AI legal-workflow studies, and one for the taxonomy of legal-engineering inputs. Rebuilt the study cards around newer academic and empirical sources: the 2026 AI-Powered Lawyering RCT, Chen & Bao's training RCT, Bednar et al. on AI and human legal reasoning, LegalCheck's municipal legal drafting deployment, Legal RAG benchmarks, public-sector drafting evidence, coding productivity evidence, and the 2026 worker-productivity evidence synthesis. Updated the synthesis panel to make the central point sharper: AI improves speed and sometimes quality, but the gains depend on training, task framing, retrieval grounding, evaluation rubrics, verification, and expert review. Added every source used to /sources under Empirical research, and demoted vendor/industry reports to supporting context rather than headline evidence.

06 Jun 2026 · Shipped

First digest run under the new brief — 5 items, 1 tip, sources reported

Ran the daily digest manually for the first time using DIGEST_BRIEF.md as the standing brief. Checked LawNext, Artificial Lawyer, Anthropic news, Hacker News, and EURACTIV (blocked). Selected 5 items: Kirkland + Palantir's PE fund formation engine, Lavern (open-source multi-agent legal system by Finnish lawyer Antti Innanen), a 6,200-matter analysis showing AI effectiveness depends on structured processes, Anthropic's confidential S-1 IPO filing, and a statistical analysis of rsync releases showing Claude-assisted commits are no buggier than baseline. Added 1 tip: the Specify-Encode-Fulfill (SEF) loop for AI-assisted TDD, surfaced from HN with 101 points. Sources used were listed for review before adding to extra-sources.json.

06 Jun 2026 · Shipped

Digest infrastructure: brief, source list, type taxonomy, and selection transparency

Built out the full infrastructure for running the daily digest as a repeatable, automatable routine. Created DIGEST_BRIEF.md — a standing brief that any agent (scheduled or manual) reads before building a digest entry. Defines 5 source tiers with 25+ outlets, selection criteria (meaningful change only, skip restatements), format, and the instruction to report sources to Ruben rather than self-commit. Added a Prompting & Coding Tips source tier (HN, r/ClaudeAI, r/LocalLLaMA, Latent Space, Karpathy, Pragmatic Engineer) alongside the existing legal AI and EU regulation tiers. Added 22 new entries across 6 groups to extra-sources.json and wired curated headings for all of them in the sources page. Added a selection criteria paragraph to the 'Sources checked' panel on the digest page for transparency. Replaced the priority field (high/medium/low, reflecting source tier) with a type taxonomy (regulation/model/tool/research/technique/market, reflecting what the item actually is) across schema, data, components, and filter bar. Renamed legalSignal → signal throughout. Added DigestTip schema and optional tips array to DailyDigest. Updated CLAUDE.md to reframe the project as a personal AI learning notebook and reference DIGEST_BRIEF.md.
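
A minimal sketch of what the reshaped digest schema could look like, written in Python purely for illustration (this log does not say which language the real schema uses). Only the six type values, the `legalSignal` → `signal` rename, `DigestTip`, `DailyDigest`, and the optional `tips` array come from the entry above; every other field name (`source`, `summary`, `title`, `body`, `date`, `items`) and the `migrate_item` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

# The six item types named in the log entry above.
ITEM_TYPES = ("regulation", "model", "tool", "research", "technique", "market")


@dataclass
class DigestItem:
    source: str   # hypothetical field name
    summary: str  # hypothetical field name
    signal: str   # renamed from `legalSignal`
    type: str     # one of ITEM_TYPES; takes the place of the old `priority` field

    def __post_init__(self) -> None:
        # Reject anything outside the taxonomy, including old priority values.
        if self.type not in ITEM_TYPES:
            raise ValueError(f"unknown item type: {self.type!r}")


@dataclass
class DigestTip:
    title: str  # hypothetical field name
    body: str   # hypothetical field name


@dataclass
class DailyDigest:
    date: str  # hypothetical field name
    items: list[DigestItem]
    tips: list[DigestTip] = field(default_factory=list)  # optional tips array


def migrate_item(old: dict[str, Any], item_type: str) -> DigestItem:
    """Convert an old-style record (hypothetical helper).

    The old `priority` reflected the source tier, not what the item is,
    so it is dropped here and the new type is supplied by hand.
    """
    return DigestItem(
        source=old["source"],
        summary=old["summary"],
        signal=old["legalSignal"],
        type=item_type,
    )
```

Under these assumptions, an old record with `priority: "high"` would be migrated with something like `migrate_item(record, "model")`, and a digest with no tips is simply `DailyDigest(date="2026-06-06", items=[...])`. Validating the type at construction time is one way to stop leftover priority values from slipping through the rename.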

05 Jun 2026 · Broke → fixed

Philosophy page looked completely unstyled — two separate bugs, one obvious, one hidden

Built and shipped the /philosophy page from a design handoff: a hero section with a floating particle canvas, three expandable accordion boxes (Automations built, Agents deployed, Long term challenges), and a footer. The page rendered, but it looked like completely unstyled HTML: content started at the left edge of the screen, accordion boxes were fully expanded with no card styling, and tags appeared as a run of plain text with no pill borders or spacing. The header nav links were also dead: clicking Study or Digest on /philosophy did not scroll anywhere. Two bugs, both fixed.

Read the full log →
