i'm not 100% sure about this but i am starting to think that the way #jsonld context declarations propagate by default is generally an anti-pattern

jens@social.finkhaeuser.de

@trwnh ... brought up MIME as a very robust encoding that allows for namespacing and so keeps processors from having to know everything about everything, prevents them from having to understand a complete generic data model, and still allows them to selectively process individual fields.

There's a reason MIME has lived this long.

trwnh@mastodon.social

@jens i think we’re mostly on the same page except for the bit about what a key’s semantics are. {id: foo, bar: {id: baz}} says that foo has a property bar whose value is baz. it doesn’t say anything about baz.

<foo>.actor and <baz>.actor mean different things in different documents, despite being a top-level key `actor` in both.

the analogy to MIME is that each document can be thought of as having a different media type, despite both being nominally json and/or ld+json
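The point about `{id: foo, bar: {id: baz}}` is easier to see if a document is read as (subject, key, value) statements. This is a minimal sketch, assuming every node has an `id` and every value is either a string or an embedded node. It is not how a JSON-LD processor builds its graph, and the `to_triples` helper is hypothetical.

```python
def to_triples(node: dict) -> list[tuple[str, str, str]]:
    """Toy reading of a node as (subject, key, value) statements.

    Assumes every dict has an "id" and every value is a string or a dict.
    """
    subject = node["id"]
    triples = []
    for key, value in node.items():
        if key == "id":
            continue
        if isinstance(value, dict):
            # The key describes the outer node; it only points at the inner one.
            triples.append((subject, key, value["id"]))
            # Statements about the inner node have the inner node as subject.
            triples.extend(to_triples(value))
        else:
            triples.append((subject, key, value))
    return triples


print(to_triples({"id": "foo", "bar": {"id": "baz"}}))  # [('foo', 'bar', 'baz')]
```

The only statement produced has `foo` as its subject; nothing is said about `baz`, and any `actor` inside the embedded node would become a statement about that node, not about `foo`.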

trwnh@mastodon.social

@jens there is a concept in later jsonld-based specs of “media type specificity”: a document that is application/cid is also implicitly application/ld+json, and also implicitly application/json, and arguably also text/plain, and arguably also application/octet-stream. each specific media type is a refinement of the one before it, or in other words, a more specific media type is just adding more constraints and semantics on top of everything before it.
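The refinement chain can be pictured as a lookup table where each media type points at the less specific type it refines. The table below is hypothetical: it only encodes the chain from the post, including the two "arguably" links, and is not read from the IANA registry.

```python
# Hypothetical table: media type -> the less specific type it refines.
# Mirrors the chain in the post; the text/plain and octet-stream links are the
# "arguably" ones, not registry facts.
REFINES = {
    "application/cid": "application/ld+json",
    "application/ld+json": "application/json",
    "application/json": "text/plain",
    "text/plain": "application/octet-stream",
}


def specificity_chain(media_type: str) -> list[str]:
    """Return media_type followed by every less specific type it implicitly is."""
    chain = [media_type]
    while chain[-1] in REFINES:
        chain.append(REFINES[chain[-1]])
    return chain


def is_also(media_type: str, broader: str) -> bool:
    """True if a document of media_type can also be treated as `broader`."""
    return broader in specificity_chain(media_type)


print(specificity_chain("application/cid"))
# ['application/cid', 'application/ld+json', 'application/json', 'text/plain', 'application/octet-stream']
```

The relation only runs one way: every application/cid document is application/json, but a generic JSON document carries none of the extra constraints.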

trwnh@mastodon.social

@jens so consider one document is application/activity+json and the other document is a hypothetical application/schemadotorg+json. you want to merge them as ld+json and that’s fine; no issues. but the result is no longer purely activity+json, because a subnode is redefining `actor` according to schemadotorg+json. this is still fine if you allow for extended activity+json, but naive agents don’t know how to properly handle extensibility while reserializing for others.

trwnh@mastodon.social

@jens again though, it’s not a fatal error unless a term is protected, which usually happens as part of defining an IANA media type. the newer jsonld-based media types provide a protected context that guarantees e.g. that `service` ALWAYS means the same thing in a document of type application/did. you can’t redefine `service` later in the document and still call the result semantically valid “application/did”. it would, however, be valid ld+json, if not for protected terms

trwnh@mastodon.social

@jens all’s well and good as long as you don’t reserialize two documents into a combined document without accounting for the difference in semantics.

this is, unfortunately, what naive agents are quite likely to do, because they are so completely unaware of the difference in semantics. their processing doesn’t even have a *concept* of semantics higher than JSON.

not *strictly* a problem for now, since activity+json’s context is not protected, but it could be.

trwnh@mastodon.social

@jens this isn’t to say that semantics go entirely unhandled… think of it as if the semantics are being understood at an agent level instead of at a data format level. it’s like, instead of using an svg processor, you open it in a text editor and manually edit the text of the svg. if you’re not careful, you might produce a syntactically valid xml document that still alters the resulting svg image in a way you don't understand.

jens@social.finkhaeuser.de

@trwnh You know, I understand what you're writing, but don't seem to get across what I mean. I'll step away from this. Maybe I'll figure out a different way to say it.

trwnh@mastodon.social

@jens it's mostly this part:

> it's only when you want to use LD to manipulate the entire document as a knowledge graph that you run into issues.

which is incorrect, so there's still some miscommunication on my part.

the issues occur when you use *JSON* to manipulate a *serialization*, like doing a find-and-replace on an English sentence resulting in nonsense.

it's like a student says "I is..." and the teacher says, "no, it's *am*", so the student says "I am the ninth letter of the alphabet"
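A JSON-level find-and-replace can be sketched on the combined document from later in the thread. The scenario is hypothetical: an agent "qualifies" bare `actor` names into local account URLs under an invented prefix. The naive version rewrites every `actor` key at any depth, so the schema.org movie actor gets turned into a local account; the careful version touches only the activity's own `actor`, treating the embedded node as a separate resource it does not reinterpret.

```python
import copy

# Hypothetical prefix a naive agent might use to turn bare names into account URLs.
BASE = "https://example.social/users/"

# The combined document from the thread, as plain JSON data.
like = {
    "id": "foo",
    "actor": "i",
    "type": "Like",
    "object": {"@context": "schema.org", "id": "ghostbusters", "actor": "bill-murray"},
}


def naive_qualify_actors(node):
    """JSON-level find-and-replace: rewrite every string `actor`, at any depth."""
    if isinstance(node, list):
        return [naive_qualify_actors(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: naive_qualify_actors(value) for key, value in node.items()}
    if isinstance(out.get("actor"), str):
        out["actor"] = BASE + out["actor"]
    return out


def qualify_activity_actor(activity):
    """Rewrite only the activity's own `actor`; embedded nodes are left untouched."""
    out = copy.deepcopy(activity)
    if isinstance(out.get("actor"), str):
        out["actor"] = BASE + out["actor"]
    return out


print(naive_qualify_actors(like)["object"]["actor"])    # https://example.social/users/bill-murray
print(qualify_activity_actor(like)["object"]["actor"])  # bill-murray
```

The careful version is not a general fix: it only avoids the damage by declining to guess what `actor` means inside a node it doesn't own.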

trwnh@mastodon.social

@jens i get a sense that we're looking at it from somewhat unrelated angles... like, you're saying

> it should be part of the specifications of the key whether a string, array or object is *valid* here

except they're all valid. we just can't constrain it any further. an `object` of an Activity is... the activity's object. this describes the activity but it doesn't describe its object

so JSON semantics are fully defined, but there's still a higher level of semantics

trwnh@mastodon.social

@jens that's all, really -- if we're going in circles then it probably isn't productive to continue, as you say.

jens@social.finkhaeuser.de

@trwnh So we do arrive at it from two different points of view. That much is clear.

My impression is that you have a linked data type of model firmly in your head, and JSON-LD is compatible with that. A pure JSON processor won't understand that, and will mess with things.

What I'm trying to get across is that having the linked data model as a starting point when you *know* pure JSON processors will be involved is a mistake.

You can still *use* LD, you just can't use it as a precondition.

trwnh@mastodon.social

@jens i guess i don't think "generic json processing" or "generic xml processing" really makes any sense. it's like trying to do "generic plaintext processing" or "generic octet stream processing". there should be *some* precondition of shared understanding; otherwise, communication breaks down. this understanding is baked into the application logic, media type, etc... and it always exists at least implicitly, although it can be explicitly described.

trwnh@mastodon.social

@jens so the goal of the thread was to take something like "i before e except after c" and extend it to account for edge cases: "or unless it's followed by an r or a g"

the takeaway is basically "jsonld context authors should consider disabling propagation when using protected terms across semantic boundaries (like when merging two documents)"

the alternative is to never do a JSON find-and-replace, and instead strictly process the two documents separately.

trwnh@mastodon.social

@jens

basically this is fine:

{"id": "foo", "actor": "i", "type": "Like", "object": "ghostbusters"}

{"@context": "schema.org", "id": "ghostbusters", "type": "Movie", "actor": "bill-murray"}

and this would be problematic if activity+json used a protected context:

{"id": "foo", "actor": "i", "type": "Like", "object": {"@context": "schema.org", "id": "ghostbusters", "actor": "bill-murray"}}

note that activity+json doesn't currently use a protected context... but arguably it does in spirit.
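The combined document above can be run through a toy model of context handling to see when it breaks. This is a sketch, not a conforming JSON-LD processor: the context names, the `as:`/`schema:` placeholder IRIs, and the `registry`/`expand` helpers are hypothetical, `id` and `type` pass through literally, and unknown terms are marked with `?` instead of being dropped as real expansion would. It encodes two JSON-LD 1.1 rules: an embedded context propagates into nested node objects unless it sets `@propagate: false`, and giving a protected term a different definition is an error.

```python
class ProtectedTermRedefinition(Exception):
    """Raised when a context tries to change the meaning of a protected term."""


def registry(protected: bool, propagate: bool) -> dict:
    """Hypothetical contexts; "as:" and "schema:" stand in for full IRIs."""
    return {
        "activity": {"terms": {"actor": "as:actor", "object": "as:object"},
                     "protected": protected, "propagate": propagate},
        "schema.org": {"terms": {"actor": "schema:actor"}},
    }


EMPTY = {"terms": {}, "protected": frozenset()}


def expand(node, active, contexts):
    """Toy expansion: replace each key with the IRI it resolves to."""
    if not isinstance(node, dict):
        return node
    here = child = active
    if "@context" in node:
        local = contexts[node["@context"]]
        terms = dict(active["terms"])
        for term, iri in local["terms"].items():
            if term in active["protected"] and terms[term] != iri:
                raise ProtectedTermRedefinition(term)
            terms[term] = iri
        newly = set(local["terms"]) if local.get("protected") else set()
        here = {"terms": terms, "protected": active["protected"] | newly}
        # Propagates by default; with "propagate": False, nested nodes fall back
        # to the context that was active before this one was applied.
        child = here if local.get("propagate", True) else active
    out = {}
    for key, value in node.items():
        if key in ("@context", "id", "type"):
            out[key] = value  # keywords and aliases stay literal in this toy
        else:
            out[here["terms"].get(key, "?" + key)] = expand(value, child, contexts)
    return out


combined = {"@context": "activity", "id": "foo", "actor": "i", "type": "Like",
            "object": {"@context": "schema.org", "id": "ghostbusters", "actor": "bill-murray"}}

# Today: not protected, propagating. The nested context simply overrides `actor`.
print(expand(combined, EMPTY, registry(protected=False, propagate=True))["as:object"])
# {'@context': 'schema.org', 'id': 'ghostbusters', 'schema:actor': 'bill-murray'}

# Protected and propagating: the nested redefinition of `actor` is rejected.
try:
    expand(combined, EMPTY, registry(protected=True, propagate=True))
except ProtectedTermRedefinition as err:
    print("protected term redefinition:", err)  # protected term redefinition: actor

# Protected but not propagating: the nested node starts from a clean context.
print(expand(combined, EMPTY, registry(protected=True, propagate=False))["as:object"])
# {'@context': 'schema.org', 'id': 'ghostbusters', 'schema:actor': 'bill-murray'}
```

In the first case the nested node also still inherits the activity `object` term it never asked for, which is the kind of leak across the semantic boundary discussed in the thread.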

trwnh@mastodon.social

@jens of course combining the two documents is easy. it's splitting them apart that's challenging. like trying to separate two balls of differently colored play-doh.

the act of injecting movie.json (json, ld+json, hypothetically schemadotorg+json) into like.json (json, ld+json, activity+json) is impure wrt the resource <ghostbusters>. the semantics of activity+json are leaking in and through the semantic boundary of the original two resources.

so either plug the leak, or don't combine. right?

trwnh@mastodon.social

@jens well, people are combining anyway and not plugging the leak, because they don't know the leak is there.

it's like trying to fit two sections of a water pipe together, except instead of properly screwing them into a fitting, you just glue them together without the fitting.

ideally people would stop gluing together their pipes and use proper fittings instead, but at least i can scrape off the glue and use my own fitting, i guess?

trwnh@mastodon.social

@jens this is all from the angle of taking arbitrary JSON and "upgrading" it to JSONLD, sure, but even without JSONLD you are still doing the arithmetic in your head.

the same kinds of complexity issues exist for other media formats too. no one said building a web browser was easy!

/fin

jens@social.finkhaeuser.de

@trwnh Again, though, ".actor" and ".object.actor" have different semantics, because of the semantics of ".object", so there really isn't a problem IMHO.


trwnh@mastodon.social

@jens they can be the same semantics actually

id: foo
actor: i
type: Like
object:
  id: bar
  actor: you
  type: Like
  object: that

".object" doesn't imply any semantics for <bar>, nor does it imply anything for ".object.actor" or any other properties. that's because <foo> and <bar> are in essence separate resources

it's semantically equivalent to saying

- id: foo
  actor: i
  type: Like
  object: bar
- id: bar
  actor: you
  type: Like
  object: that

only the serialization is different.
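The equivalence can be checked mechanically by splitting embedded nodes out and replacing each with its `id`. A minimal sketch, assuming any embedded object with an `id` is a separate node; arrays, `@context`, and objects without an `id` are not handled specially, and the `flatten` helper is hypothetical.

```python
def flatten(node: dict, out=None) -> list:
    """Split embedded nodes into a flat list, replacing each with its id.

    Assumes any dict value carrying an "id" is a separate, embedded node.
    """
    if out is None:
        out = []
    flat = {}
    out.append(flat)  # the outer node first, then the nodes it embeds
    for key, value in node.items():
        if isinstance(value, dict) and "id" in value:
            flatten(value, out)
            flat[key] = value["id"]  # keep only a reference in the outer node
        else:
            flat[key] = value
    return out


nested = {"id": "foo", "actor": "i", "type": "Like",
          "object": {"id": "bar", "actor": "you", "type": "Like", "object": "that"}}

for node in flatten(nested):
    print(node)
# {'id': 'foo', 'actor': 'i', 'type': 'Like', 'object': 'bar'}
# {'id': 'bar', 'actor': 'you', 'type': 'Like', 'object': 'that'}
```

Each `actor` ends up on the node it describes, which is the sense in which ".object.actor" is just `actor` on another resource.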

NodeBB-ActivityPub Bridge Test Instance