Hang on, I think attaching semantics to schemas, rather than data, solves 100% of the problems with both semantics and schemas.

jenniferplusplus@hachyderm.io

Am I wrong? This feels like it's something.

tetron@hachyderm.io

@jenniferplusplus

You might find this interesting:

GitHub - common-workflow-language/schema_salad: Semantic Annotations for Linked Avro Data


Basically, everything defined in the schema has a corresponding semantic node. Documents are written in YAML but have a corresponding RDF representation, and there is robust support for including fields outside the core vocabulary in an unambiguous way.

jenniferplusplus@hachyderm.io

@tetron This appears to be a project to define schemas for linked data documents? And that is, again, backwards. I want to attach (but not embed) vocabularies to schemas. Mostly so that I stop having to deal with it. It can be entirely the problem of the people who want it, instead of them making it my problem.

tetron@hachyderm.io

@jenniferplusplus
I think you want something like a json-ld context, which describes how json fields map to semantic nodes without necessarily specifying a schema, but even then it is hard to avoid asserting schema-like details such as whether a field takes a single value or an array of values.

But ultimately it is a problem for the schema design, because common anti-patterns like reusing the same field name to mean different things in different contexts make it challenging to assign semantics.

jenniferplusplus@hachyderm.io

@tetron No, I extremely don't want that. I want the people who do want that to stop forcing it on me. I promise I know about json-ld, and I hate it.

jenniferplusplus@hachyderm.io

@tetron I want to give my json schema and human readable documentation to the people who want that. And I want them to go off on their own and devise their own method to attach semantic meaning to things that doesn't burden me with solving this problem that I don't have and don't care about.

trwnh@mastodon.social

@jenniferplusplus @tetron this is ironically what json-ld contexts were *supposed* to do -- you can "upgrade" any arbitrary json into json-ld by providing your own context, even if it wasn't explicitly declared by the document producer. but this requires you to "guess" what the producer meant by any given term, instead of the producer telling you explicitly what they meant. and your "guess" might not match someone else's "guess".

anyway, i don't see why this can't be layered on top of a schema.
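
A rough sketch of the "upgrade" idea described above, assuming a toy context that only maps short field names to IRIs. This is not the real JSON-LD expansion algorithm, and every IRI and field name here is made up for illustration. It shows how two consumers who guess differently end up with different meanings for the same plain document.

```python
# Toy illustration only: real JSON-LD expansion has many more rules.
# All IRIs and field names below are invented for this example.

def apply_context(doc: dict, context: dict) -> dict:
    """Rename each known key to the IRI the context assigns; drop unknown keys."""
    return {context[key]: value for key, value in doc.items() if key in context}


# Plain JSON from a producer who never declared any context.
plain = {"name": "Alice", "homepage": "https://alice.example/"}

# Two consumers guess differently at what the producer meant by "name".
guess_a = {
    "name": "https://vocab-a.example/fullName",
    "homepage": "https://vocab-a.example/homepage",
}
guess_b = {
    "name": "https://vocab-b.example/username",
    "homepage": "https://vocab-a.example/homepage",
}

upgraded_a = apply_context(plain, guess_a)
upgraded_b = apply_context(plain, guess_b)
```

Both results carry the same values, but the keys no longer agree on what "name" is, which is the mismatch being described.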

trwnh@mastodon.social

@jenniferplusplus @tetron it's just that usually, the semantic data nerds will insist that the semantics are required while the schema is optional. it feels like the counterargument here is that the schema should be ~required instead, while the semantics should be optional.

trwnh@mastodon.social

@jenniferplusplus @tetron or maybe in an ideal world you could package both together. this is something i've been trying out -- have the context document include not just an intended context mapping, but also schema/ontology information. see https://w3id.org/fep/1985.jsonld for example, which defines a context mapping for 4 terms, but then also separately contains a graph for those term definitions. for example, `orderType` will tell you its domain, range, min/max cardinality (i.e. required/functional).

jenniferplusplus@hachyderm.io

@trwnh @tetron meaning is necessary to do anything meaningful with a document, sure. But the meaning is implicit in the context. We're all out here building AP social networking services, passing each other social messages. We know what these things mean. But without a schema, doing that processing is slow, expensive, and error-prone. We gain nothing by defining these messages semantically, and lose a lot from the lack of structure.

tetron@hachyderm.io

@jenniferplusplus @trwnh
So the problem the semantic stuff is trying to solve is how to have an extensible standard without causing chaos, e.g. if two different implementations decide to add a field called "grilledCheese" but actually each one uses it to mean different things with different structure. Then the semantic markup lets you tell them apart.
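
To make the "grilledCheese" collision concrete, here is a minimal sketch, assuming each document carries a flat inline context that maps the term to an IRI. The two namespaces are invented, and `expand_terms` is a hypothetical helper, not part of any JSON-LD library.

```python
# Hypothetical documents from two implementations; every IRI here is invented.
doc_one = {
    "@context": {"grilledCheese": "https://impl-one.example/ns#grilledCheese"},
    "grilledCheese": True,
}
doc_two = {
    "@context": {"grilledCheese": "https://impl-two.example/ns#grilledCheese"},
    "grilledCheese": {"bread": "rye", "cheese": "cheddar"},
}


def expand_terms(doc: dict) -> dict:
    """Replace each short key with the IRI its own inline context assigns.

    Simplified on purpose: handles only a flat, inline, term-to-IRI context.
    """
    context = doc.get("@context", {})
    return {context[key]: value for key, value in doc.items() if key in context}


expanded_one = expand_terms(doc_one)
expanded_two = expand_terms(doc_two)
```

After expansion the two fields no longer share a key, so a consumer can tell which "grilledCheese" it is looking at before trying to interpret the value.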

tetron@hachyderm.io

@jenniferplusplus @trwnh
But your application probably only cares about or understands a subset of all the terms in use and it makes sense to use a schema to rigorously validate the things you support and ignore the rest.
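
As a sketch of "validate what you support, ignore the rest", the following uses a hand-rolled check rather than a real JSON Schema validator. The field names and the `validate_supported` helper are assumptions made for the example.

```python
# A hand-rolled stand-in for a schema validator; field names are illustrative.
SUPPORTED = {"id": str, "type": str, "content": str}
REQUIRED = {"id", "type"}


def validate_supported(doc: dict) -> dict:
    """Strictly check the fields we understand and silently skip all others."""
    missing = REQUIRED - doc.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    accepted = {}
    for field, expected in SUPPORTED.items():
        if field in doc:
            if not isinstance(doc[field], expected):
                raise ValueError(f"{field} must be {expected.__name__}")
            accepted[field] = doc[field]
    return accepted


message = {
    "id": "https://social.example/notes/1",
    "type": "Note",
    "content": "hello",
    "grilledCheese": True,  # unknown extension: ignored, not rejected
}
accepted = validate_supported(message)
```

The unknown extension passes through without error and without being interpreted, while a wrong type on a supported field is rejected.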

trwnh@mastodon.social

@tetron @jenniferplusplus this assumes you are working purely in one problem space and never cross any boundaries. for example, if your schema is roughly "activitystreams plus some extensions", then you won't know what to do with something that isn't as2. here, the mime type is doing a lot of the semantic work for you. if you want to ensure that certain extensions are understood, you end up basically needing to define a new mime type. but the problem is you can embed documents in other documents

trwnh@mastodon.social

@tetron @jenniferplusplus so the mime type actually changes for only *part of the document* instead of the entire document. i think this is something a lot of people are not prepared to encounter, and generally don't know how to deal with, except by making assumptions based on popular usage. for example, the `publicKey` property is not part of as2. it's from the old deprecated security/v1. if doing ld, you expect some CryptographicKey object(s) inside it, but a "plain json" might use a string!
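
A consumer that wants to tolerate both shapes of `publicKey` might normalize them first. This is only a sketch: `key_entries` is a hypothetical helper, and treating a bare string as a reference to the key is an assumption, since a "plain json" producer could mean something else by it.

```python
def key_entries(actor: dict) -> list:
    """Return the publicKey value as a list of dicts, whatever shape it arrived in."""
    value = actor.get("publicKey")
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    entries = []
    for item in items:
        if isinstance(item, str):
            # Assumption: a bare string is treated as a reference to the key.
            entries.append({"id": item})
        elif isinstance(item, dict):
            entries.append(item)
        else:
            raise TypeError(f"unexpected publicKey item: {type(item).__name__}")
    return entries


# Illustrative actors; the URLs are placeholders.
object_style = {"publicKey": {"id": "https://social.example/users/a#main-key"}}
string_style = {"publicKey": "https://social.example/users/a#main-key"}
```

Both `key_entries(object_style)` and `key_entries(string_style)` then yield a list of dicts, so the rest of the code only has to handle one shape.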

trwnh@mastodon.social

@tetron @jenniferplusplus this basically means that not only does your schema have to account for extensions (especially "required extensions" in the case of how fedi uses the `publicKey` property), you also have to be clear about semantics at some level. either that is done via the mime type, via the context declaration, or perhaps via some schema that indirectly embeds or references something equivalent to a context (as is being proposed at the top of this thread).

NodeBB-ActivityPub Bridge Test Instance
