/pages /computing /projects


Semantic Authoring Markdown is a way to write XML by hand.


Writing SAM looks like this:

		h1: Header
		This is a paragraph.
		- This is a bullet
		- This is the next one
		{This}(a|href="this.html") is a link to another page,
		and {this}(strong) is some style information. This
		syntax is for "annotations" of text.

Rendering the above SAM would yield the following XML/HTML:

		<meta charset="utf-8" />
		<link rel="stylesheet" href="css/styles.css" />
		<h1>Header</h1>
		<p>
			This is a paragraph.
		</p>
		<ul>
			<li>This is a bullet</li>
			<li>This is the next one</li>
		</ul>
		<p>
			<a href="this.html">This</a> is a link to another
			page, and <strong>this</strong> is some style
			information. This syntax is for "annotations" of
			text.
		</p>
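
To make the annotation syntax concrete, here is a minimal Rust sketch of how a single line of text with `{text}(name|attrs)` annotations could be turned into XML. It is not the sam-rs code: `render_flow` is a hypothetical helper, and the `name|attrs` split is an assumption inferred only from the example above. It copies a single attribute through verbatim, does no escaping, and would be confused by a `)` inside a quoted value.

```rust
/// Hypothetical helper (not part of sam-rs): turns `{text}(name|attrs)`
/// annotations in one line of text into XML elements.
/// Assumption: the part before `|` is the element name and the rest is
/// copied verbatim as its attribute text.
fn render_flow(flow: &str) -> String {
    let mut out = String::new();
    let mut rest = flow;

    while let Some(open) = rest.find('{') {
        // An annotation needs `{text}` immediately followed by `(spec)`.
        let after_open = &rest[open + 1..];
        let Some(close) = after_open.find("}(") else { break };
        let after_close = &after_open[close + 2..];
        let Some(end) = after_close.find(')') else { break };

        let text = &after_open[..close];
        let spec = &after_close[..end];
        let (name, attrs) = match spec.split_once('|') {
            Some((name, attrs)) => (name, Some(attrs)),
            None => (spec, None),
        };

        // Copy the plain text before the annotation, then emit the element.
        out.push_str(&rest[..open]);
        out.push('<');
        out.push_str(name);
        if let Some(attrs) = attrs {
            out.push(' ');
            out.push_str(attrs);
        }
        out.push('>');
        out.push_str(text);
        out.push_str("</");
        out.push_str(name);
        out.push('>');

        rest = &after_close[end + 1..];
    }

    // Whatever is left contains no complete annotation, so keep it as-is.
    out.push_str(rest);
    out
}
```

Calling `render_flow` on the link line from the example produces an `<a>` element with the `href` attribute wrapped around "This", and `{this}(strong)` becomes a `<strong>` element.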


The main reason to use SAM is that, like the regular Markdown that inspired it, it makes XML documents much easier to write by hand. The nested tag structure of XML, with its <tag></tag> style of annotation, is easy to read but tedious to type. Alternative markup systems aim to eliminate that redundancy.

Unlike Markdown, however, SAM enables the full range of XML notation. Markdown takes a "less is more" approach: it defines a useful subset of HTML and makes that subset simple and easy to write. However, taking advantage of the more emergent and semantic properties of HTML is difficult or impossible in Markdown, and SAM fills that void. For times when you want to hand-author more complex XML-style data (a personal wiki, perhaps), SAM hits the sweet spot between being easy for humans to read and write and keeping the full expressiveness of XML.


Mark Baker is the designer of SAM, and his extensive documentation is the best primary resource on the subject. However, as one of the (most likely very few) people who have written their own parser for the specification, I can add a small amount of insight.

At the core of the design is the concept of "Blocks" and "Flows".


A Block can be thought of as an XML element capable of having child elements, and a Flow is a stream of content within the block, generally Character Data. As an XML element, a Block can contain Attributes.


A Flow, on the other hand, can be divided up into XML elements (with their own Attributes) through the use of "Annotations". From XML's perspective, there's no difference between a Block and an Annotation in a Flow: both end up as XML elements.

The benefit of separating the structure into these two element types is that it lets us impart more meaning into our writing. Dividing a text into Blocks creates the larger, high-level structure, while Annotations add nuance. The result is a more naturally flowing document that is easier to both read and write than if we kept XML's fully hierarchical nature and simply dropped the closing tags.
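
One way to picture the Block/Flow split is as a small data model. The Rust sketch below is only an illustration under my own naming, not the types used in sam-rs: `Block`, `Node`, `Span`, and the `*_to_xml` functions are all hypothetical. It shows that Blocks and Annotations are separate types while authoring, yet both serialize through the same `open_tag` helper into plain XML elements. Escaping is left out to keep it short.

```rust
/// A piece of a Flow: plain character data or an annotated span.
/// (Hypothetical types for illustration; not taken from sam-rs.)
enum Span {
    Text(String),
    Annotation {
        name: String,
        attrs: Vec<(String, String)>,
        content: Vec<Span>,
    },
}

/// A Block child is either a nested Block or a Flow of spans.
enum Node {
    Block(Block),
    Flow(Vec<Span>),
}

/// A Block: an element that can carry attributes and child nodes.
struct Block {
    name: String,
    attrs: Vec<(String, String)>,
    children: Vec<Node>,
}

/// Shared by Blocks and Annotations: in XML both are just elements.
fn open_tag(name: &str, attrs: &[(String, String)]) -> String {
    let mut tag = format!("<{name}");
    for (key, value) in attrs {
        tag.push_str(&format!(" {key}=\"{value}\""));
    }
    tag.push('>');
    tag
}

fn span_to_xml(span: &Span) -> String {
    match span {
        Span::Text(text) => text.clone(),
        Span::Annotation { name, attrs, content } => {
            let inner: String = content.iter().map(span_to_xml).collect();
            format!("{}{inner}</{name}>", open_tag(name, attrs))
        }
    }
}

fn block_to_xml(block: &Block) -> String {
    let inner: String = block
        .children
        .iter()
        .map(|node| match node {
            Node::Block(child) => block_to_xml(child),
            Node::Flow(spans) => spans.iter().map(span_to_xml).collect::<String>(),
        })
        .collect();
    format!("{}{inner}</{}>", open_tag(&block.name, &block.attrs), block.name)
}
```

A `p` Block holding a Flow with some text and one `a` Annotation serializes to a `<p>` element containing the text and an `<a>` element. Once it is XML, nothing marks which element came from which kind of structure.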


My implementation takes some liberties: it ignores some features I didn't need (or feel like implementing) and adds a few that I found helpful, like support for both Block and Flow "literals" for pre-formatted text.

While my code could certainly be refactored into something that's much easier to reason about, I've had very few issues with it, and have been able to fix what few bugs I've encountered and add the few features I needed.

It's available on my SourceHut repository: sr.ht/~jakintosh/sam-rs

/sam Stream

June 15, 2024

/stream /programming /sam

While editing the site today, I realized that commas were not being rendered in alt text. It turned out that my /sam parser treated every comma as the end of an attribute, since commas are how attributes are delimited. I added a new state: once a quote opens an attribute's value, the parser absorbs every character it finds until the closing quote. This should be the default behavior for most XML attribute scenarios. sam-rs was bumped to v0.6.3.
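
The fix described above can be sketched as a tiny state machine. This is an illustration of the idea, not the code that shipped in v0.6.3: `split_attributes` is a hypothetical function name, and the `key="value"` attribute shape is an assumption based on the examples on this page.

```rust
/// Hypothetical sketch: splits an attribute list on commas, except for
/// commas that appear inside a double-quoted value.
fn split_attributes(input: &str) -> Vec<String> {
    let mut attributes = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;

    for ch in input.chars() {
        match ch {
            // A quote toggles the "inside a value" state and is kept.
            '"' => {
                in_quotes = !in_quotes;
                current.push(ch);
            }
            // Outside quotes, a comma ends the current attribute.
            ',' if !in_quotes => {
                attributes.push(current.trim().to_string());
                current.clear();
            }
            // Everything else, including commas inside quotes, is absorbed.
            _ => current.push(ch),
        }
    }

    // Flush the last attribute, which has no trailing comma.
    if !current.trim().is_empty() {
        attributes.push(current.trim().to_string());
    }
    attributes
}
```

With this approach, an input such as `src="cat.png", alt="A cat, sleeping"` splits into two attributes, and the comma in the alt text survives.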

June 8, 2024

/stream /sam

While updating the page for /sam itself, I wanted to demonstrate the output of XML text. However, I found that any XML inside of a Literal Block did not get escaped, which makes sense because it's a literal. Unfortunately, that behavior is not helpful for this use case, so I updated the source code to make sure that literals still escape XML syntax (<, >, &) so that the markup actually ends up on the screen.
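
A minimal sketch of that kind of escaping, assuming only the three characters mentioned above need handling (`escape_xml` is a hypothetical name, not necessarily what sam-rs calls it):

```rust
/// Hypothetical sketch: escapes the XML syntax characters `<`, `>`, and `&`
/// so that literal text is displayed rather than parsed as markup.
fn escape_xml(text: &str) -> String {
    let mut escaped = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => escaped.push_str("&amp;"),
            '<' => escaped.push_str("&lt;"),
            '>' => escaped.push_str("&gt;"),
            _ => escaped.push(ch),
        }
    }
    escaped
}
```

Working one character at a time means an `&` is escaped exactly once, which avoids the double-escaping that chained string replacements can cause if they run in the wrong order.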