The main benefit of using XML here seems to be that it forces clearer thinking and formulation from the user.
RadiozRadioz
> a contrast between Claude’s modern approach [...] XML, a technology dating back to 1998
Are we really at the point where some people see XML as a spooky old technology? The phrasing dotted around this article makes me feel that way. I find this quite strange.
kid64
The thesis here seems to be that delimiters provide important context for Claude, and for that purpose we should use XML.
The article even references English's built-in delimiter, the quotation mark, which is represented as a token for Claude, part of its training data.
So are we sure the lesson isn't simply to leverage delimiters, such as quotation marks, in prompts, period? The article doesn't identify any way in which XML is superior to quotation marks in scenarios requiring the type of disambiguation quotation marks provide.
Rather, the example XML tags shown seem to be serving as a shorthand for notating sections of the prompt ("treat this part of the prompt in this particular way"). That's useful, but seems to be addressing concerns that are separate from those contemplated by the author.
Jcampuzano2
But should this extend to anything that could end up in Claude's context? Should we be using XML even in skills, for instance, or commands, custom subagents, etc.?
And then do we end up over-indexing on Claude, and maybe this ends up hurting other models for those using multiple tools?
I just dislike how much of AI is people saying "do this thing for better results" with no definitive proof, but alas, it comes with the non-determinism.
At least this one has the stamp of approval of the Claude Code team itself.
hkbuilds
This matches my experience building AI-powered analysis tools. Structured output from LLMs is dramatically more reliable when you give the model clear delimiters to work with.
One thing I've found: even with XML tags, you still need to validate and parse defensively. Models will occasionally nest tags wrong, omit closing tags, or hallucinate new tag names. Having a fallback parser that extracts content even from malformed XML has saved me more than once.
The real win is that XML tags give you a natural way to do few-shot prompting with structure. You can show the model exactly what shape the output should take, and it follows remarkably well.
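As a rough illustration of the defensive parsing described above, here is a minimal sketch of a tolerant extractor. It assumes the model was asked to put its result in a named tag; the function name `extract_tag` and the tag names are made up for the example, and it only covers the missing-closing-tag case, not hallucinated tag names.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <tag> section, tolerating a missing close."""
    name = re.escape(tag)
    # Well-formed case: <tag>...</tag>
    strict = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    if strict:
        return strict.group(1).strip()
    # Fallback: opening tag only. Take everything up to the next
    # tag-like token ("<x" or "</x") or the end of the text.
    loose = re.search(rf"<{name}>(.*?)(?=</?\w|\Z)", text, re.DOTALL)
    if loose:
        return loose.group(1).strip()
    return None
```

Calling `extract_tag(reply, "answer")` returns the stripped content when the tag is well formed, falls back to a best-effort slice when the closing tag is missing, and returns `None` when the tag never appears, so the caller can retry or log the failure.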
michaelcampbell
Total tangent, but what vagary of HTML (or the Brave Browser, which I'm using here) causes words to be split in very odd places? The "inspect" devtools certainly didn't show anything unusual to me. (Edit: Chrome, MS Edge, and Firefox do the same thing. I also notice they're all links; wonder if that has something to do with it.)
https://i.imgur.com/HGa0i3m.png
I am unconvinced.
To me it seems like handling symbols that start and end sequences that could contain further start and end symbols is a difficult case.
Humans can't do this very well either; we use visual aids such as indentation and syntax highlighting, or resort to just plain counting of levels.
Obviously it's easy to throw parameters and training at the problem; you can easily synthetically generate all the XML training data you want.
I can't help but think that training data should have a metadata token per content token. A way to encode the known information about each token that is not represented in the literal text.
Especially tagging tokens explicitly as fiction, code, code from a known working project, something generated by itself, something provided by the user.
While it might be fighting the bitter lesson, I think for explicitly structured data there should be benefits. I'd even go as far as to suggest the metadata could handle nesting if it contained dimensions that performed rope operations to keep track of the depth.
If you had such a metadata stream per token, there's also the possibility of fine-tuning instruction models to only follow instructions with a 'said by user' metadata, and then at inference time filtering out that particular metadata signal from all other inputs.
It seems like that would make prompt injection much harder.
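To make the inference-time filtering idea a bit more concrete, here is a toy sketch of the data side only. Everything in it is an assumption for illustration: the `TaggedToken` type, the provenance labels, and whitespace splitting as a stand-in for a real tokenizer. It says nothing about how a model would actually be trained on such a stream.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical provenance labels carried alongside each token.
USER = "said_by_user"
EXTERNAL = "external"


@dataclass(frozen=True)
class TaggedToken:
    text: str
    provenance: str


def tag(text: str, provenance: str) -> List[TaggedToken]:
    # Whitespace splitting stands in for a real tokenizer.
    return [TaggedToken(t, provenance) for t in text.split()]


def sanitize(tokens: List[TaggedToken], from_user_channel: bool) -> List[TaggedToken]:
    """Strip the user-provenance signal from anything not typed by the user."""
    if from_user_channel:
        return tokens
    # Input from other channels may not claim to be the user.
    return [
        TaggedToken(t.text, EXTERNAL) if t.provenance == USER else t
        for t in tokens
    ]
```

The point of the sketch is that the 'said by user' label is attached by the harness based on the channel, not by anything written in the text, so a web page or tool result cannot grant itself that label.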
lmeyerov
My intuition is it comes down to error-correcting codes. We're dealing with lossy systems that get off track, so including parity bits helps.
Ex: <message>...</message> helps keep track. Even better? <message78>...</message78>. That's ugly XML, but great for LLMs. Likewise, using standard ontologies for identifiers (ex: we'll do OCSF, ATT&CK, & CIM for Splunk/Kusto in louie.ai), even if they're not formally XML.
For all these things... these intuitions need backing by evals in practice, and that's part of why I begrudgingly flipped from JSON to XML.
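One way to read the <message78> idea is that a distinctive tag name makes the closing tag unlikely to collide with anything inside the wrapped text. A minimal sketch under that reading follows; the helper names `wrap_with_unique_tag` and `unwrap` and the numeric-suffix scheme are assumptions for illustration, not anything from louie.ai.

```python
import re
import secrets
from typing import Optional, Tuple


def wrap_with_unique_tag(content: str, base: str = "message") -> Tuple[str, str]:
    """Wrap content in a tag whose name does not occur in the content itself."""
    while True:
        tag = f"{base}{secrets.randbelow(10_000)}"
        if tag not in content:
            return tag, f"<{tag}>{content}</{tag}>"


def unwrap(text: str, tag: str) -> Optional[str]:
    """Recover the wrapped content using the exact tag name chosen earlier."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return match.group(1) if match else None
```

Because the suffix is checked against the content, a stray `</message>` inside user-supplied text no longer ends the section early.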
apwheele
I think XML is good to know for prompting (similar to how <think></think> was popular for outputs, you can do that for other sections). But I have had much better experience just writing JSON and using line breaks, colons, etc. to demarcate sections.
My use case is document processing/extraction (with both Haiku and OpenAI models), and there the latter approach works much better than the XML.
N of 1 anecdote anyway for one use case.
strongpigeon
This seems like an actual good use for XML. Using it as a serialization format always rubbed me the wrong way (it’s super verbose, the named closing tags are unnecessary grammar-wise, the attribute-or-child question, etc.). But for marking up and structuring LLM prompts and responses, it feels better than markdown (which doesn’t stream that well).
TutleCpt
I think this article is 100% relevant to you today. Anthropic put out a training video a number of months ago saying that XML should be highly encouraged for prompts. See https://m.youtube.com/watch?v=ysPbXH0LpIE
imglorp
A very minor porcelain on some of the agent input UX could present this structure for you. Instead of a single chat window, have four: task, context, constraints, output format.
And while we're at it, instead of wall-of-text, I also feel like outputs could be structured at least into thinking and content, maybe other sections.
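A four-field input like that could be assembled into a tagged prompt with very little code. The sketch below is only an assumption about how such a UI might do it; the function name `build_prompt` and the exact tag names are made up for the example.

```python
def build_prompt(task: str, context: str, constraints: str, output_format: str) -> str:
    """Assemble the four input fields into XML-tagged prompt sections."""
    sections = {
        "task": task,
        "context": context,
        "constraints": constraints,
        "output_format": output_format,
    }
    # Skip fields the user left empty so the prompt carries no blank sections.
    parts = [
        f"<{name}>\n{body.strip()}\n</{name}>"
        for name, body in sections.items()
        if body.strip()
    ]
    return "\n\n".join(parts)
```

The user never types a tag; the structure comes from which box the text was entered in.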
TheJoeMan
That first image, “Structure Prompts with XML”, just screams AI-written. The bullet lists don’t line up, the numbering starts at (2), random bolding. Why would anyone trust hallucinated documentation for prompting? At least with AI-generated software documentation, the context is the code itself, being regurgitated into bulleted English. But for instructions on using the LLM itself, it seems pretty lazy to not hand-type the preferred usage and human-learned tips.
ryanschneider
Wait am I in the minority talking to Claude in markdown? I just assumed everyone does that, or at least all developers. It seems to work really well.
alansaber
Sounds like: 1. XML is the cleanest/best-quality training data (especially compared to PDF/HTML). 2. It follows that a user providing semantic tags in XML format can get the best alignment with that training (hence the best results). Shame they haven't quantified this assertion here.
twoodfin
This isn’t surprising: XML’s core purpose was to simplify SGML for a wider breadth of applications on the web.
HTML also descended from SGML, and it’s hard to imagine a more deeply grooved structure in these models, given their training data.
So if you want to annotate text with semantics in a way models will understand…
wooptoo
Amazing how an entire profession that until yesterday would pride itself on precision, clarity (in thought and in writing), efficiency, and formality, has now descended into complete quackery.
wolttam
Anthropic’s tool calling was exposed as XML tags at the beginning, before they introduced the JSON API. I expect they’re still templating those tool calls into XML before passing them to the model’s context.
Zebfross
I thought the goal was minimal instruction to let Claude determine the best way to solve the problem. Not adding this to my workflow anytime soon.
ixxie
How about other frontier models, and smaller models?
CactusBlue
I think the main advantage of the XML here is that the model is expected to have a matching end tag that is balanced, which reduces the likelihood of malformed outputs.
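If balanced end tags are the property you care about, it is cheap to check for it on the output side too. Here is a minimal stack-based sketch; the function name `tags_balanced` is made up, and the regex is a simplification that ignores comments, CDATA, and other full-XML features.

```python
import re

# Matches <name ...>, </name>, and <name ... /> in a simplified way.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def tags_balanced(text: str) -> bool:
    """Return True if every opening tag has a matching, properly nested close."""
    stack = []
    for match in TAG_RE.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <br/> style tags open and close at once
        if closing:
            # A close must match the most recently opened tag.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # anything left open means unbalanced
```

A check like this could gate a retry whenever the model's reply fails it.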
spacecadet
This has been the way for a long time. Exploiting XML tags was also a means of exfiltrating data or reversing a model for a while. Some platforms are still vulnerable to this.
esafak
This sounds like something for harnesses, not end users. Are they really expecting us to format prompts as XML??
Eric_WVGG
bemused by how competently designed this is, compared to enshittified blogs and whatnot
To be realistic, this design needs more weirdly sexual etsy garbage, “one weird tip,” and “punch the monkey”