just like there’s a distinction between non-information resources and information resources, or between binary resources and text resources, maybe there should be a distinction between descriptor documents and content documents
-
trwnh@mastodon.social replied to trwnh@mastodon.social
it might not be a problem with smaller content, like say an as:Note, but stuffing the full html structured contents of an entire article into even rss or atom seems like it could get out of hand really quickly. this is why feeds are often limited to 10-20 items or otherwise only include a summary, right?
so maybe it makes sense to treat even text content as a separate thing, just like we do with binary resources.
-
trwnh@mastodon.social replied to trwnh@mastodon.social
html kinda doesn’t make this distinction. there’s a head-body split but that’s not the same as a metadata-content split. you can embed metadata into body content just as easily as you can embed it in head tags (example: rdfa)
-
trwnh@mastodon.social replied to trwnh@mastodon.social
i guess this is basically the distinction between embedded metadata and sidecar metadata, is what i was trying to get at
-
joelving@mastodon.joelving.dk replied to trwnh@mastodon.social
@trwnh imagine how easy on your server sharing links across the Fediverse would be if you could query a URL (either separately via a well-known URI or with an HTTP verb) for a (signed) OpenGraph document instead of extracting it from the full payload.
-
trwnh@mastodon.social replied to joelving@mastodon.joelving.dk
@joelving there is oEmbed, which might be what you are looking for?
-
trwnh@mastodon.social replied to trwnh@mastodon.social
what i’m thinking is that sidecar metadata can be stored 1:1 or 1:n — if it’s 1:1 you might as well embed it if you can, either as some kind of frontmatter or inline with rdfa. but having frontmatter means every single processor that touches your content needs to be aware of the existence of that frontmatter (and strip it). so frontmatter isn’t as portable as i would like. basically a document with frontmatter is no longer that content type; it is a new media type for each combination.
-
trwnh@mastodon.social replied to trwnh@mastodon.social
example: markdown is text/markdown but if you add frontmatter it is now something different. but there isn’t a standard type for this; instead, every application implements frontmatter parsing independently. there isn’t consensus on the delimiter or on the format. the definition of a new media type should include the delimiter and the format; for example, “delimit with three dashes and serialize frontmatter as yaml” or “delimit with three pluses and serialize frontmatter as toml”
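a minimal sketch of what "every application implements frontmatter parsing independently" tends to look like in practice. the function name `split_frontmatter` and the delimiter table are assumptions for illustration; the `---`/yaml and `+++`/toml pairings follow common convention, not any registered media type.

```python
# Hypothetical delimiter table: convention only, not a registered standard.
DELIMITERS = {"---": "yaml", "+++": "toml"}


def split_frontmatter(text):
    """Return (header_format, header_text, body).

    header_format is None when no frontmatter is detected, in which case
    the whole input is returned unchanged as the body.
    """
    lines = text.split("\n")
    delimiter = lines[0].strip()
    header_format = DELIMITERS.get(delimiter)
    if header_format is None:
        return None, "", text
    # Look for the matching closing delimiter on a line of its own.
    for index in range(1, len(lines)):
        if lines[index].strip() == delimiter:
            header = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return header_format, header, body
    # Opening delimiter without a closing one: treat everything as content.
    return None, "", text
```

note that the header format is guessed from the delimiter alone. a plain markdown file that opens with a `---` thematic break and contains another one later would be misread as having yaml frontmatter, which is the kind of ambiguity a declared media type would remove.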
-
alice@gts.void.dog replied to trwnh@mastodon.social
@trwnh it should be specified by markdown (variants) probably,,
-
trwnh@mastodon.social replied to alice@gts.void.dog
@alice im thinking more along the lines of like. what if yaml frontmatter + markdown content = application/markdown-content-with-yaml-frontmatter or whatever. and it had a registered extension .mdyaml or whatever.
i’d be mainly interested in html content and toml frontmatter but saying that this is .html and text/html is not accurate. i can’t directly serve such an html file to a web browser; it needs to be processed/converted/whatever by an application first
-
alice@gts.void.dog replied to trwnh@mastodon.social
@trwnh if it needs so much pre processing is it a markdown template?
-
trwnh@mastodon.social replied to alice@gts.void.dog
@alice not really, it’s “content” plus some “header”; you *can* render it against some template or layout, but the main goal here is portability. i want to be able to know ahead-of-time that this text file is not just html/md/etc, it also has some junk i may or may not care about at the start. if i want just the content then i need to strip that junk first
-
trwnh@mastodon.social replied to trwnh@mastodon.social
earlier i said that html’s head-body split is not the same as the metadata-content split i am after; after some further thought, this isn’t really true. i think what i am trying to model here is a way to be able to detect and handle arbitrary header data, by unwrapping it to get at the body content. but i’m realizing that this body content may itself have its own nested headers and body…
-
trwnh@mastodon.social replied to trwnh@mastodon.social
more precisely there is a format to the header data and there is a format to the body content
an http request/response can be serialized as a text file which has http headers and http body, and then that http body can be of a certain content type like html, which itself has html headers and html body. the html body content is often also of type html
you can progressively wrap or unwrap “body content” with “header data” in different formats. i’m not sure how best to describe this…
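one rough way to picture the progressive unwrapping: each layer has its own function that splits "header data" from "body content", and the body of one layer becomes the input of the next. this is only a sketch; `unwrap_http` and `unwrap_html` are made-up helper names, and the regex-based html split is deliberately naive (a real tool would use a proper html parser).

```python
import re


def unwrap_http(message):
    """Split a serialized HTTP response into (headers dict, body text)."""
    head, _, body = message.partition("\r\n\r\n")
    # The first line is the status line; the rest are "Name: value" headers.
    _status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


def _inner(tag, document):
    """Return the markup between <tag> and </tag>, or "" if absent (naive)."""
    pattern = rf"<{tag}(?:\s[^>]*)?>(.*?)</{tag}>"
    match = re.search(pattern, document, re.DOTALL | re.IGNORECASE)
    return match.group(1) if match else ""


def unwrap_html(document):
    """Split an HTML document into (head markup, body markup)."""
    return _inner("head", document), _inner("body", document)


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><head><title>t</title></head><body><p>hi</p></body></html>"
)
http_headers, http_body = unwrap_http(raw)      # layer 1: http
html_head, html_body = unwrap_html(http_body)   # layer 2: html
```

after both steps, `html_body` is itself html content, so the same kind of unwrapping could in principle be applied again if it carried its own header.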
-
trwnh@mastodon.social replied to trwnh@mastodon.social
how can we generalize this header+content pattern, basically
i’m fairly sure you need to at least define header type, delimiter type, content type
example:
- header = toml
- delimiter = +++ to start, +++ to end
- content = html
is this enough to describe a canonical data format?
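one way to poke at that question is to write the three pieces down as data and see whether wrapping and unwrapping round-trips. a sketch under assumptions: the `Envelope` class and its field names are invented here, and it requires delimiters to sit on lines of their own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical descriptor: header type + delimiters + content type."""
    header_type: str
    start: str
    end: str
    content_type: str

    def wrap(self, header, content):
        """Serialize header and content into a single text document."""
        return f"{self.start}\n{header}\n{self.end}\n{content}"

    def unwrap(self, text):
        """Split a wrapped document back into (header, content)."""
        first_line, _, rest = text.partition("\n")
        if first_line != self.start:
            raise ValueError("missing start delimiter")
        # Prepend a newline so an empty header (end delimiter on the very
        # next line) is found the same way as a non-empty one.
        header, found, content = ("\n" + rest).partition(f"\n{self.end}\n")
        if not found:
            raise ValueError("missing end delimiter")
        return header[1:], content


toml_html = Envelope(header_type="toml", start="+++", end="+++", content_type="html")
document = toml_html.wrap('title = "hi"', "<p>hello</p>")
header, content = toml_html.unwrap(document)
```

the round trip works for this case, but the sketch also shows what the three fields leave unspecified: what happens if the header itself contains a line equal to the end delimiter, and how line endings and character encoding are handled. a real definition would probably need to say something about those too.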
-
trwnh@mastodon.social replied to trwnh@mastodon.social
side note: i wish there was a distinction between html content and a full html document… if you try to render html content in a browser and it isn’t a full html document, weird things might happen