just like there’s a distinction between non-information resources and information resources, or between binary resources and text resources, maybe there should be a distinction between descriptor documents and content documents
-
html kinda doesn’t make this distinction. there’s a head-body split but that’s not the same as a metadata-content split. you can embed metadata into body content just as easily as you can embed it in head tags (example: rdfa)
i guess this is basically the distinction between embedded metadata and sidecar metadata, is what i was trying to get at
-
what i’m thinking is that sidecar metadata can be stored 1:1 or 1:n — if it’s 1:1 you might as well embed it if you can, either as some kind of frontmatter or inline with rdfa. but having frontmatter means every single processor that touches your content needs to be aware of the existence of that frontmatter (and strip it). so frontmatter isn’t as portable as i would like. basically a document with frontmatter is no longer that content type; it is a new media type for each combination.
-
example: markdown is text/markdown but if you add frontmatter it is now something different. but there isn’t a standard type for this; instead, every application implements frontmatter parsing independently. there isn’t consensus on the delimiter or on the format. the definition of a new media type should include the delimiter and the format; for example, “delimit with three dashes and serialize frontmatter as yaml” or “delimit with three pluses and serialize frontmatter as toml”
-
earlier i said that html’s head-body split is not the same as the metadata-content split i am after; after some further thought, this isn’t really true. i think what i am trying to model here is a way to be able to detect and handle arbitrary header data, by unwrapping it to get at the body content. but i’m realizing that this body content may itself have its own nested headers and body…
-
more precisely there is a format to the header data and there is a format to the body content
an http request/response can be serialized as a text file which has http headers and http body, and then that http body can be of a certain content type like html, which itself has html headers and html body. the html body content is often also of type html
you can progressively wrap or unwrap “body content” with “header data” in different formats. i’m not sure how best to describe this…
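one way to picture the wrapping: each layer is a function that takes text and returns (header data, body content), and you apply them one after another. this is only a rough sketch. `split_http` and `split_html` are made-up helper names, the html split is a naive regex (not a real html parser), and the sample message is invented for illustration.

```python
import re


def split_http(message: str) -> tuple[str, dict[str, str], str]:
    """Split a serialized HTTP message into (start line, headers, body)."""
    head, _, body = message.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


def split_html(document: str) -> tuple[str, str]:
    """Naively split an HTML document into (head markup, body markup)."""
    head = re.search(r"<head>(.*?)</head>", document, re.S | re.I)
    body = re.search(r"<body>(.*?)</body>", document, re.S | re.I)
    # If there is no <body> element, treat the whole input as body content.
    return (head.group(1) if head else "", body.group(1) if body else document)


# Invented sample: an HTTP response whose body is an HTML document.
message = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><head><title>t</title></head><body><p>hi</p></body></html>"
)

# Layer 1: unwrap the HTTP headers to get at the HTTP body.
start_line, http_headers, http_body = split_http(message)
# Layer 2: the HTTP body is HTML, which has its own head and body.
html_head, html_body = split_html(http_body)
```

what's left after both layers (`html_body`) is still html, which is the "body content is often also of type html" situation.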
-
how can we generalize this header+content pattern, basically
i’m fairly sure you need to at least define header type, delimiter type, content type
example:
- header = toml
- delimiter = +++ to start, +++ to end
- content = html
is this enough to describe a canonical data format?
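a minimal sketch of that header / delimiter / content triple, assuming the delimiters sit on their own lines and the header always comes first. `Envelope` and `unwrap` are hypothetical names, not an existing library, and the header is left unparsed so the sketch doesn't depend on any particular toml or yaml parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical descriptor for a header+content format."""
    header_type: str   # format of the header data, e.g. "toml"
    open_delim: str    # line that starts the header block
    close_delim: str   # line that ends the header block
    content_type: str  # format of the body content, e.g. "html"


def unwrap(text: str, env: Envelope) -> tuple[str, str]:
    """Split text into (raw header, body) according to the envelope."""
    lines = text.split("\n")
    if lines[0] != env.open_delim:
        return "", text  # no header block: everything is body
    for i in range(1, len(lines)):
        if lines[i] == env.close_delim:
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, body
    return "", text  # header never closed: treat it all as body


# The example from the post: toml header, +++ delimiters, html content.
TOML_HTML = Envelope("toml", "+++", "+++", "html")

doc = '+++\ntitle = "hello"\n+++\n<p>hi</p>\n'
header, body = unwrap(doc, TOML_HTML)
```

a processor that only understands `content_type` could then be handed `body` without needing to know the frontmatter existed.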
-
side note: i wish there was a distinction between html content and a full html document… if you try to render html content in a browser and it isn’t a full html document, weird things might happen
-
revisiting: i discovered the iana media type multipart/mixed which could basically be this, just with a little modification https://www.iana.org/assignments/media-types/#multipart
the thing is the "boundary" parameter in multipart media types expects to be concatenated to a -- so you can't express the typical --- or +++ without problems (a markdown horizontal rule --- would get parsed as a multipart boundary)
still there's probably some inspiration to be had there, you could define an application/subtype that does similar
-
it would probably be more correct to define application/mdx or whatever (since the typical intent is to be processed by something like an MDX processor), but i haven't really looked into the particulars of doing this properly and making it modular (instead of hardcoding semantics of "toml frontmatter, --- separator, markdown body")
-
@trwnh multipart/* types have to specify the boundary signature, so why would the classic markdown ---- be identified as one if not indicated as the boundary separator?
-
@oblomov i mean if you say boundary="-" then the separator becomes --- but your markdown content might include --- as what gets rendered into an <hr> element
something like
```
---
foo: bar
---
stuff.
---
more stuff.
```
could get parsed as 3 parts instead of 2
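to make the collision concrete, here's a simplified splitter that treats any line equal to `--` plus the boundary as a delimiter. it's an assumption-laden sketch, not a real multipart parser: `split_parts` is a made-up name, it ignores the closing `--boundary--` delimiter, and it doesn't parse per-part headers.

```python
def split_parts(text: str, boundary: str) -> list[str]:
    """Split text on delimiter lines of the form '--' + boundary."""
    delimiter = "--" + boundary
    parts = []
    current = None  # stays None until the first delimiter (preamble is skipped)
    for line in text.split("\n"):
        if line == delimiter:
            if current is not None:
                parts.append("\n".join(current))
            current = []  # start collecting a new part
        elif current is not None:
            current.append(line)
    if current is not None:
        parts.append("\n".join(current))
    return parts


# boundary="-" makes the delimiter line "---", same as a markdown horizontal rule.
doc = "---\nfoo: bar\n---\nstuff.\n---\nmore stuff."
parts = split_parts(doc, "-")
```

here `parts` has three entries (`foo: bar`, `stuff.`, `more stuff.`), because the `---` that was meant to render as an `<hr>` gets consumed as a delimiter.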