Issue #324

June 24, 2026

Announcements

Graph RAG combines the best of both: you still use retrieval, but instead of retrieving raw chunks, you retrieve structured paths through a graph. The LLM gets grounded, traceable context. Not just "here are some relevant paragraphs."

The result: answers that are explainable, multi-hop aware, and far more reliable on complex queries.

This is what I'm teaching on July 11.

You'll build the full pipeline — from raw text through entity extraction, coreference resolution, relation extraction, graph construction, and finally a grounded chatbot. Live. With code you keep.

If you're working in finance, healthcare, enterprise AI, or any domain where hallucinations are a real cost, this is worth cutting into your Saturday.

Use code BRUNO40 at checkout for 40% off.

👉 Production Graph RAG: Build Explainable LLM Apps with Knowledge Graphs

Dear Reader,

This week we continue our 7th anniversary celebration with a new blog post over on the Substack, “Build a Language Model by Counting”, where we use a Markov chain approach to build a simple language model based on the WikiText-103 dataset.
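For readers who want a feel for the idea before clicking through, here is a minimal sketch of a counting-based Markov chain model. It is not the code from the blog post: the toy corpus stands in for WikiText-103, it only looks one word back (a bigram model), and the function names `train_bigram_counts`, `next_word_probs`, and `generate` are hypothetical names chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram_counts(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the raw follower counts for `word` into probabilities."""
    followers = counts.get(word)
    if not followers:
        return {}
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}


def generate(counts, start, length, seed=0):
    """Sample up to `length` words, each drawn from the followers of the last."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return out


# Toy corpus (an assumption for illustration, not WikiText-103).
toy_corpus = "the cat sat on the mat and the cat slept".split()
counts = train_bigram_counts(toy_corpus)
probs = next_word_probs(counts, "the")
sample = generate(counts, "the", 5)
```

In this toy corpus "the" is followed by "cat" twice and "mat" once, so `probs` assigns "cat" a probability of 2/3 and "mat" 1/3. The whole model is a table of counts, which is why the approach scales to a large corpus with nothing more than a single pass over the text.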

Two pieces this week point in opposite directions. The Economist warns that we are not ready for the coming intelligence explosion. It argues that recursive self-improvement could turn irreversible, and that we should set up a verifiable global framework first. Governments move over decades. AI capability moves in months. That gap is the core worry, and the record on regulators acting early gives little reassurance. A second piece pushes back on the claim that everyone now uses AI for everything. About a third of Americans use it actively. A third use it now and then. A third never touch it. That split has barely shifted in a year. The one figure rising fast is frustration, with a recent poll putting anger among Gen Z up about 40 percent year over year. So the real picture reads closer to some people using AI for some things.

The week’s other thread runs quieter and more hopeful. The tools themselves keep getting better. One developer report shows that local models are now genuinely usable. On a 2022 Mac with 64 GB of memory, agentic coding loops now run at roughly 75 percent of the speed and accuracy of frontier models. That falls short of production work, but it covers a lot of real tasks on hardware many people already own. On the research side, a new method translates a model’s raw activations into plain English. It pairs two fine-tuned models. One reads an activation vector and describes it in words. The other rebuilds the vector from that description. A good description has to recreate the original, and that is the whole test. During a pre-deployment audit of Claude Opus 4.6, the method surfaced a case of hidden evaluation awareness. The model believed it was being tested, and never said so out loud.

A few papers this week probe how much these models actually reason. One revisits a stubborn flaw called the reversal curse. Train a model on “A is B” and it often cannot answer “B is A”. The authors put numbers on it. Asked questions like “Who is Tom Cruise’s mother?”, GPT-4 answered correctly 79 percent of the time. Asked the reverse, naming the son from the mother, it managed just 33 percent. The gap holds across model sizes and families. Another study asks whether agents can work out the rules of a hidden environment. Each agent has to uncover a hidden finite-state machine by asking an oracle yes-or-no questions. Bigger target machines drive accuracy down sharply. Reasoning models do better than the rest, but all of them still trail the classic algorithms built for the job. A brighter result comes from a 3-billion-parameter reasoning model. It scores 94.3 on this year’s AIME math exam and 80.2 on a hard coding benchmark, matching systems hundreds of times its size. The catch is narrow scope. The trick works on tasks with a checkable answer, not on open-ended knowledge.

Two more papers turn from the models to the people using them. One ran three experiments with 2,691 people on simple chores like arithmetic and spell-check, and found an efficiency-gain illusion. People often chose AI for tasks it did not speed up. They misjudged their own habits twice over. They thought they used AI less than they did, and they overrated the time it saved. Worse, one session fed the next. Early AI use pushed people toward more of it and locked in the false sense of speed. A second paper steps back to the act of handing decisions to AI. Delegation here means taking an AI’s answer into a real choice with little checking, little comparison, and little independent thought. People now do this in health, law, finance, and education. The risk sits in scale. Millions leaning on the same few systems can shape shared norms and push judgment in one direction.

Our current book recommendation is “Building Applications with AI Agents” by M. Albada. This week’s video is an overview titled “What is OpenClaw? Inside AI Agents, LLMs and the Agentic Loop”.

Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!

Semper discentes,

The D4S Team

Michael Albada spent nine years building machine learning systems at Uber, ServiceNow, and Microsoft, and it shows. His O'Reilly book, Building Applications with AI Agents, treats agents as a design pattern, not magic. Thirteen chapters take you from a single working agent through skills, orchestration, memory, learning, and on to multi-agent systems. Later chapters cover measurement, production monitoring, and security.

The design-first stance is the real draw. Every idea sits inside a case study: customer support, legal work, advertising, and code review agents. Albada compares real frameworks by name, including LangGraph, AutoGen, CrewAI, and OpenAI's SDK, and weighs their trade-offs instead of crowning a winner. A data scientist gets clear patterns for picking tools, structuring memory, and validating output before it ships.

It has two weak spots. Some chapters lean on checklists and can leave you feeling that the core idea would fit in a third of the pages. It also skips runnable, end-to-end code, pointing you to outside docs instead. Still, for the data scientist or ML engineer moving into agent work, this book maps the decisions that matter and saves weeks of trial and error. Worth a spot on the shelf.

Building Applications with AI Agents

1. GraphRAG vs RAG, Claude Mastery, LifeSciBench & 1M-Context Models [packtdatapro1.substack.com]
2. .gitignore Isn’t the Only Way To Ignore Files in Git [nelson.cloud]
3. The founder's playbook: Building an AI-native startup [claude.com]
4. America’s compact between science and politics is broken [www.scientificamerican.com]
5. Humanity isn’t ready for the coming intelligence explosion [www.economist.com]
6. Running local models is good now [vickiboykis.com]
7. Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations [transformer-circuits.pub]
8. No, everyone is not using AI for everything. [gabrielweinberg.com]

• PsychoPass: Geometric Profiling of Multi-Turn Adversarial LLM Conversations (M. Ozmen, S. Majumdar)
• The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" (L. Berglund, M. Tong, M. Kaufmann, M. Balesni, A. C. Stickland, T. Korbak, O. Evans)
• Can LLM Agents Infer World Models? Evidence from Agentic Automata Learning (R. Menaged, G. Lior, S. Ravfogel, R. Aharoni, G. Stanovsky)
• VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models (S. Xu, S. Liu, W. Wang, J. Min, Y. Dai, Z. Yin, Y. Chen, X. Zhou, J. Zhang)
• The efficiency-gain illusion: People underestimate the rate of AI use and overestimate its benefits on simple tasks (S. Yu, M. Cheng, A. Jabbar, I. Sucholutsky, K. M. Collins, D. Jurafsky, R. D. Hawkins)
• The Tone of Awareness: Topic, Sentiment, and Toxicity Maps During Mental Health Month on TikTok (H. F. de Arruda, A. S. Teixeira, P. G. Reddy, A. Mondal, K. A. Oliveira, F. N. Silva)
• Implicit Bias: Evolution of a Powerful Idea (B. K. Payne)
• The social consequences of AI delegation (H. F. de Arruda, Y. Moreno)

What is OpenClaw? Inside AI Agents, LLMs and the Agentic Loop
