
Yes, we’ll say it. Context management is the new buzzword. But it’s not just a buzzword; it’s the next piece in the puzzle of figuring out how to use AI effectively. We’re learning that using AI effectively isn’t about crafting clever prompts. Nor is it about cramming everything you possibly can into a giant context window. It’s about managing what the model knows about the project you’re working on: It should have all the information that’s relevant and none that isn’t. And you should be able to detect when errors arise from a misbehaving context and know how to fix the context or restart your project.
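To make that concrete, here’s a minimal sketch of the idea in Python. Everything in it is invented for illustration: the relevance scores stand in for a real embedding-based ranker, and the whitespace “tokenizer” stands in for a real one.

```python
# Toy sketch of context management: keep only the most relevant
# snippets that fit within a fixed token budget. Scores and token
# counts are stand-ins for a real ranker and tokenizer.

def build_context(snippets, budget):
    """snippets: list of (relevance_score, text); budget: max tokens."""
    context, used = [], 0
    # Consider the most relevant snippets first.
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        tokens = len(text.split())  # crude proxy for a real token count
        if used + tokens > budget:
            continue  # relevant or not, it doesn't fit
        context.append(text)
        used += tokens
    return "\n\n".join(context)

docs = [
    (0.92, "The billing service exposes POST /invoices."),
    (0.15, "Office holiday schedule for 2023."),  # irrelevant: ranked last
    (0.78, "Invoices are persisted in Postgres."),
]
print(build_context(docs, budget=12))
```

The code is trivial by design; the discipline is the point: rank what’s relevant, enforce a budget, and leave the rest out.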
AI
- OpenAI has launched study mode, a version of ChatGPT that’s intended to help students study rather than simply answer questions and solve problems. Like other AI products, it’s vulnerable to hallucination and misinformation derived from its training data.
- GLM-4.5 is yet another important open weight frontier model from a Chinese laboratory. Its performance is on the level of o3 and Claude 4 Opus. It’s a reasoning model that has been optimized for agentic applications and generative coding.
- Mixture of Recursions is a new approach to language models that promises to reduce latency, memory requirements, and processing power. While the details are complex, one key part is deciding early in the process how much “attention” each word needs. (A toy sketch of the routing idea follows this list.)
- What is “subliminal learning”? Anthropic has discovered that, when using synthetic data generated by a “teacher” model to train a “student” model, the student will learn things from the teacher that aren’t in the training data.
- Spotify has published AI-generated songs imitating dead artists without permission from the artists’ estates. The songs were apparently generated by another company and removed from Spotify after their discovery was reported.
- There’s a new release of Qwen3-Coder, one of the top models for agentic coding. It’s a 480B parameter mixture of experts model, with 35B active parameters. Qwen also released Qwen Code, an agentic coding tool derived from Gemini CLI.
- Can treating complex documents as high-resolution images outperform traditional OCR and document parsers when building RAG systems?
- A large group of researchers has proposed chain of thought monitoring as a way of detecting AI misbehavior. They also note that some newer models bypass natural language reasoning (and older models never used it), and that chain of thought transparency may be central to AI safety.
- A limited audit of the CommonPool dataset, which is frequently used to train image generation models, showed that it contains many images of drivers’ licenses, passports, birth certificates, and other documents with personally identifiable information.
- ChatGPT agent brings agentic capabilities to chat. It integrates with your email and calendar, can generate and run code, and can use websites and documents to generate reports, slides, and other kinds of output.
- Machine unlearning is a new technique for making speech generation models forget specific voices. It could be used to prevent a model from generating speech imitating certain people.
- Kimi-K2-Instruct is a new open weights model from the Moonshot AI group, a Chinese lab funded in part by Alibaba and Tencent. It’s a mixture of experts model with 1T total parameters and 32B active parameters.
- xAI released its latest model, Grok 4. While it has excellent benchmark results, we’d caution against relying on a model whose previous versions have advocated antisemitism, denied the Holocaust, and praised Hitler. It was also reported that Grok 4 searches for Elon Musk’s opinions before returning results. While these issues have been fixed, there’s a clear pattern here.
- Ben Recht asks whether AI really needs gigantic scale or whether that’s just marketing. Nathan Lambert’s American DeepSeek Project will find out. More important, though: if you accept that foundation models need enormous scale, you’re accepting a lot of related ideological baggage. And that ideological baggage will only come into the open with fully open source AI.
- Hugging Face has released SmolLM3, a small (3B) reasoning model that’s completely open source, including datasets and training frameworks. The announcement gives a thorough description of the training process. SmolLM3 supports six languages and has a 128K context window.
- Does MCP enable a return to the early days of the web, when it was dominated by people playing with and discovering cool stuff, unlimited by walled gardens? Anil Dash thinks so.
- AI prompts have been found in academic papers. These prompts typically assume that an AI will be responsible for reviewing the paper and tell an AI to generate a good review. The prompts are hidden from human readers using typographical tricks.
- Centaur is a new language model that was designed to simulate human behavior. It was trained on data from human decisions in psychological experiments.
- In a research paper, X describes what could possibly go wrong with xAI’s language model providing “community notes” on Twitter (oops, X). The answer: Just about everything, including the propagation of misinformation and conspiracy theories.
- Playwright MCP is a powerful MCP server that allows an LLM to automate a web browser. Unlike the computer use API, Playwright uses the browser’s accessibility features rather than decoding pixels. It might be the only MCP server you ever need. (A Playwright-based sketch of the accessibility approach follows this list.)
- Microsoft has open-sourced its GitHub Copilot Chat extension for VS Code. This apparently doesn’t include the original Copilot code completion feature, although that’s planned for the future.
- Drew Breunig has two excellent posts on context management. As we learn more about working with AI, we’re all finding that using context effectively is key to getting good results. Just letting the context grow because context windows are large leads to failure.
- OpenAI has released an API for Deep Research, including a document on using Deep Research to build agents. We’re still waiting for Google.
- Artifacts are becoming agents. Claude now allows building artifacts (Claude-created JavaScript programs that run in a sandbox) that can call Claude itself. (Since artifacts can be published, the user will be asked to sign into Claude for billing.)
- So much of generative programming comes down to managing the context—that is, managing what the AI knows about your project. Context management isn’t simple; it’s time to get beyond prompt engineering and think about context engineering.
- Anthropic is adding a memory feature to Claude: Like ChatGPT, Claude will be able to reference the contents of earlier conversations in chats. Whether this is useful remains to be seen. The ability to clear the context is important, and Simon Willison points out that ChatGPT saves a lot of personal information.
- Google has donated the Agent2Agent (A2A) protocol to the Linux Foundation. The specification and Python, Java, JavaScript, and .NET SDKs are available on GitHub.
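Here’s the toy illustration of the Mixture of Recursions routing idea promised above. Everything in it (the router, the shared block, the dimensions) is invented for this sketch; the real architecture is far more sophisticated. The point is what “deciding compute per token up front” can look like: a router assigns each token a recursion depth, and the same weights are reused on every pass.

```python
# Toy sketch of per-token recursion depth: a router decides, early,
# how many passes through one shared block each token gets.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                      # toy hidden size
W = rng.normal(size=(d, d)) / np.sqrt(d)   # one shared (recursive) block
router = rng.normal(size=d)                # scores how much compute a token needs

def forward(tokens, max_depth=3):
    outputs = []
    for h in tokens:
        score = router @ h
        # "Easy" tokens exit after one pass; "harder" ones get more.
        depth = 1 + int(score > 0) + int(score > 1)
        for _ in range(min(depth, max_depth)):
            h = np.tanh(W @ h)             # reuse the same weights each pass
        outputs.append(h)
    return np.stack(outputs)

tokens = rng.normal(size=(5, d))
print(forward(tokens).shape)  # (5, 8): same output shape, uneven compute
```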
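And on Playwright MCP: you can see what the accessibility-tree approach buys you with plain Playwright in Python (pip install playwright, then playwright install chromium). This sketch uses Playwright directly rather than the MCP server, and page.accessibility.snapshot() is an older Playwright API (deprecated in recent releases), but it shows the structured view an LLM can act on without decoding pixels.

```python
# Print a page's accessibility tree: roles and names, not pixels.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    # A tree of roles and accessible names, as assistive technology
    # (or an LLM) would see the page.
    snapshot = page.accessibility.snapshot()
    if snapshot:
        print(snapshot["role"], "-", snapshot.get("name"))
        for child in snapshot.get("children", []):
            print("  ", child["role"], "-", child.get("name"))
    browser.close()
```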
Security
- An attack against self-hosted Microsoft SharePoint servers has allowed threat actors, including ransomware gangs, to steal sensitive data, including authentication tokens. Installing Microsoft’s patch won’t prevent others from accessing systems using stolen tokens. Victims include the US National Nuclear Security Administration.
- There’s a new business model for malware. A startup is selling data stolen from people’s computers to debt collectors, divorce lawyers, and other businesses. Who needs the dark web?
- The US Cybersecurity and Infrastructure Security Agency (CISA) has recommended that “highly targeted individuals” not use personal VPNs; many of them have poor security and privacy policies.
- Several widely used JavaScript linter libraries have been compromised to deliver malware. The libraries were compromised via a phishing attack on the maintainer. Software supply chain attacks will remain an important attack vector for the foreseeable future.
- Malware-as-a-service operators have used GitHub as a channel for delivering malware to their targets. GitHub is an attractive host because few organizations block it. So far, the targets appear to be Ukrainian entities.
- “Code Execution Through Email: How I Used Claude to Hack Itself” is a fascinating read on a new attack vector called “compositional risk.” Every tool can be secure in isolation, but the combination may still be vulnerable. In a masterpiece of vibe pwning, Claude developed an attack against itself and asked to be listed as an author on the vulnerability report.
- Malware can be hidden in DNS records. This isn’t new, but the problem is becoming worse now that DNS requests are increasingly made over HTTPS or TLS, making it difficult for defenders to discover what’s in DNS requests and responses. (A sketch of what smuggled payloads can look like follows this list.)
- GPUhammer is an adaptation of the Rowhammer attack that works on NVIDIA GPUs. The attack repeatedly reads memory with specific access patterns to corrupt data. NVIDIA’s recommended defense reduces GPU performance by up to 10%.
- Be careful with your passwords! McDonald’s lost a database of 64M job applicant chats because the password was 123456.
- Static analysis for secure code is no longer enough. It isn’t fast enough to deal with AI-generated code, malware developers know how to evade static scanners, and there are too many false positives. We need new security tools.
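On the DNS item above, here’s a defender’s-eye sketch of why TXT records are an attractive smuggling channel: they can carry arbitrary strings. It uses dnspython (pip install dnspython) to pull a domain’s TXT records and flag long, high-entropy payloads. The thresholds are arbitrary illustrations, not tuned detection rules, and error handling is omitted.

```python
# Flag TXT records that look more like encoded blobs than SPF/DKIM text.
import math
import dns.resolver

def entropy(s: bytes) -> float:
    """Shannon entropy in bits per byte."""
    counts = {b: s.count(b) for b in set(s)}
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def suspicious_txt(domain: str, min_len: int = 100, min_entropy: float = 4.5):
    flagged = []
    for rdata in dns.resolver.resolve(domain, "TXT"):
        payload = b"".join(rdata.strings)
        if len(payload) >= min_len and entropy(payload) >= min_entropy:
            flagged.append(payload[:40])  # keep a sample for the report
    return flagged

print(suspicious_txt("example.com"))  # most legitimate domains: nothing flagged
```

Of course, the bullet’s real point stands: once resolution moves to DNS over HTTPS or TLS, network defenders may not see these queries in transit at all.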
Programming
- Databases have long been a problem for Kubernetes. It’s good at working with stateless resources, but databases are repositories of state. Here are some ideas for using Kubernetes to manage databases, including database upgrades and schema migrations.
- 89% of organizations say they’ve implemented Infrastructure as Code, but only 6% have actually done it. The bulk of cloud infrastructure management and administration takes place through clicking on dashboards (“click ops”).
- What happens when you run into a usage limit with Claude Code? Claude-auto-resume can automatically continue your job. Clever, but possibly dangerous; Claude Code will be running autonomously, without supervision or permission.
- Contract testing is the process of testing the contract between two services: the consumer’s expectations of the provider’s responses. It’s particularly important for testing microservices, integrating with third parties, and checking for backward compatibility. (A minimal sketch follows this list.)
- GitHub has coined the term “Continuous AI”: any use of AI to support software collaboration, regardless of vendor, tool, or platform. GitHub makes it clear that it’s not a “product”; it’s a set of activities.
- Adrian Holovaty added a scanner for ASCII guitar tablature to his sheet music tool Soundslice because ChatGPT hallucinated that the feature existed, and users sent questions and complaints when they couldn’t find it. Adrian has mixed feelings about the process. Misinformation-driven development?
- For those of us who are comfortable with the command line, the Gemini CLI is essentially a shell with Gemini integrated. It’s open source and available on GitHub. Using it requires a personal Gemini account, though that need not be a paid account.
- Martin Fowler argues that LLMs fundamentally change the nature of abstraction; this is the biggest change in computing since the invention of high-level languages.
- Phoenix.new is an interesting addition to the agentic coding space developed by Fly. It only generates code in Elixir, and that code runs on Fly’s infrastructure. That combination makes it unique; it’s both an agentic coding tool and an application platform.
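Here’s the contract testing sketch promised above: a minimal consumer-side check that validates a provider’s response against a JSON Schema, using the jsonschema package. The endpoint shape and fields are invented for illustration; real contract testing tools (Pact, for example) add contract sharing and provider verification on top of this basic idea.

```python
# A consumer-side contract test: the schema is the "contract."
from jsonschema import validate, ValidationError

INVOICE_CONTRACT = {
    "type": "object",
    "required": ["id", "amount", "currency"],
    "properties": {
        "id": {"type": "string"},
        "amount": {"type": "number"},
        "currency": {"type": "string", "minLength": 3, "maxLength": 3},
    },
}

def test_provider_honors_invoice_contract():
    # In a real suite this response would come from the provider's
    # test instance; here it's stubbed.
    response = {"id": "inv-42", "amount": 99.5, "currency": "USD"}
    try:
        validate(instance=response, schema=INVOICE_CONTRACT)
    except ValidationError as e:
        raise AssertionError(f"provider broke the contract: {e.message}")

test_provider_honors_invoice_contract()
```

If the provider renames a field or changes a type, the consumer’s build fails before anything ships: that’s the backward compatibility check in miniature.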
Things
- Belkin is another company abandoning its smart “Internet of Things” devices (in this case, Wemo products). Some features can be configured to work with Apple HomeKit, but on the whole, devices will be “bricked.” So is Whistle, a maker of network-enabled pet trackers.
- A solar-powered robot for pulling weeds might be a way to reduce the use of weedkillers on commercial farms.
Biology
- DeepMind’s AlphaGenome is a new model that predicts how small changes in a genome will affect biological processes. This promises to be very useful in researching cancer and other genetic diseases.
- Biomni is an agent that includes a language model with broad knowledge of biology, including tools, software, and databases. It can solve problems, design experimental protocols, and perform other tasks that would be difficult for human researchers, who typically have deep expertise in a single field.
Quantum Computing
- HyperQ is a hypervisor for quantum computers. It enables something that was previously thought impossible: multiple users sharing a quantum computer.