Internationalise The Fediverse

  • blog@shkspr.mobi

    Internationalise The Fediverse

    https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/

    We live in the future now. It is OK to use Unicode everywhere.

    It seems bizarre to me that modern Internet services sometimes "forget" that there's a world outside the Anglosphere. Some people have the temerity to speak foreign languages! And some of those languages have accents on their letters!! Even worse, some don't use English letters at all!!!

    A decade ago, I was miffed that GitHub only supported some ASCII characters in its project names. There's no technical reason why your repo can't be called "ഹലോ വേൾഡ്".

    Similarly, I'm frustrated that Mastodon (the largest ActivityPub service) doesn't allow Unicode usernames and has resisted efforts to change.

    So I built a small ActivityPub server which publishes content from an Actor called @你好@i18n.viii.fi - it is only a demo account, but it works!

    Some ActivityPub clients report that they are able to follow it and receive messages from it. Others - like Mastodon - simply can't see anything from it. Take a look at the replies on Mastodon to see which services work. You can also see some of its posts on the Fediverse.

    What Does The Fox Spec Say?

    The ActivityPub specification says:

    "Building an international base of users is important in a federated network." (Internationalization)

    I can't find anything in the specifications which limits what languages a username can be written in. But there are a few clues scattered about.

    The user's @ name is defined by preferredUsername which is:

    "A short username which may be used to refer to the actor, with no uniqueness guarantees." (4.1 Actor objects)

    There's nothing in there about what scripts it can contain. However, later on, the spec says:

    "Properties containing natural language values, such as name, preferredUsername, or summary, make use of natural language support defined in ActivityStreams." (4. Actors)

    So it is expected that a preferred username could be written in multiple scripts, which implies that the default need not be limited to A-Z0-9.

    The ActivityStreams specification talks about language mapping.
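As a rough illustration of that language mapping, here is a minimal Python sketch of an Actor document. `nameMap` is the ActivityStreams language-map form of `name`. The actor URL mirrors the demo account's percent-encoded address, while the display names and the helper `build_actor` are invented for this example and are not taken from the demo server.

```python
import json
from urllib.parse import quote


def build_actor(domain: str, username: str) -> dict:
    """Return a minimal ActivityPub-style Actor (illustrative, not a full profile)."""
    # Hypothetical layout: the URL path is the percent-encoded username.
    actor_id = f"https://{domain}/{quote(username)}"
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": actor_id,
        "type": "Person",
        # The username stays as plain Unicode inside the JSON document.
        "preferredUsername": username,
        # Language map: one display name per language tag (values invented).
        "nameMap": {"zh": "你好", "en": "Hello"},
    }


actor = build_actor("i18n.viii.fi", "你好")
# ensure_ascii=False keeps 你好 readable instead of \uXXXX escapes.
print(json.dumps(actor, ensure_ascii=False, indent=2))
```

Only the URL needs percent-encoding; the JSON body can carry the username as ordinary UTF-8 text.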

    Finally, the ActivityPub specification has some examples of non-Latin text in names.

    So, I think that it is acceptable for usernames to be written in a variety of non-Latin scripts.

    But What About...?

    There are usually a few objections to "Unicode Everywhere" zealots like me. I'd like to forestall any arguments.

    What about homograph attacks?

    Well, what about them? ASCII has plenty of similar-looking characters. I doubt most people would notice when a capital I is replaced by a lower-case L - and vice-versa. Similarly, the kerning issue of an r and n looking like an m is well known. Are mixed-language homographs more dangerous? I don't think so.
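If a server did want to flag mixed-script names, a rough check is possible with the standard library alone. Python's `unicodedata` module has no script property, so this sketch uses the first word of each character's Unicode name as a stand-in for its script. That is a heuristic assumed for illustration, and `rough_scripts` and `looks_mixed` are hypothetical helpers.

```python
import unicodedata


def rough_scripts(name: str) -> set[str]:
    """Approximate the scripts used in a name from Unicode character names."""
    scripts = set()
    for ch in name:
        if not ch.isalpha():
            continue  # ignore digits, punctuation and emoji
        # e.g. "LATIN SMALL LETTER A" -> "LATIN", "CJK UNIFIED IDEOGRAPH-4F60" -> "CJK"
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(name: str) -> bool:
    """True when letters from more than one (approximate) script appear."""
    return len(rough_scripts(name)) > 1


print(looks_mixed("paypal"))       # all Latin letters
print(looks_mixed("p\u0430ypal"))  # second letter is Cyrillic U+0430
print(looks_mixed("你好"))          # all CJK ideographs
```

A check like this would not catch a capital I swapped for a lower-case L, since both are Latin.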

    What if people make names that can't be typed?

    Well, what if they do? Maybe not being found by people who can't type your language is a feature, not a bug. But, anyway, clients can let users search for other people, or copy and paste their names.

    What about weird "Zalgo" text?

    It is up to a client to decide how they want to render text input. The "problems" of strange Unicode combinations are well known. This is not a hard computer-science problem.
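One way a client might tame such text is to cap how many combining marks can stack on a single character before rendering. The sketch below assumes a limit of two marks per run, which is an arbitrary number chosen for the example; some scripts legitimately stack marks, so a real client would need to pick its threshold with care. `limit_combining` is a hypothetical helper.

```python
import unicodedata


def limit_combining(text: str, max_marks: int = 2) -> str:
    """Drop combining marks beyond max_marks in any consecutive run."""
    out = []
    run = 0  # marks seen since the last non-mark character
    for ch in text:
        # General categories Mn, Mc and Me are the Unicode "mark" categories.
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue  # skip the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)


zalgo = "a" + "\u0301" * 5 + "b"  # "a" carrying five acute accents, then "b"
cleaned = limit_combining(zalgo)
print(len(zalgo), len(cleaned))
```

Text without stacked marks, such as 你好, passes through unchanged.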

    What about bi-directional text?

    The spec makes clear this is allowed.

    Do people even want a username in their own script?

    I have no evidence for this. But I bet you'd get pretty frustrated if you had to switch keyboard just to type your own name, wouldn't you? In any case, why can't I have a username of @😉?

    What's Next?

    If you build ActivityPub software, give some thought to the billions of people who don't have names which easily fit into ASCII.

    If your software can see @你好@i18n.viii.fi and its posts, please let me know.

    #ActivityPub #fediverse #i18n #mastodon #unicode
    harald@hub.volse.no wrote (#2):
    @Terence Eden’s Blog @nĭ hăo Can connect, at least. Perhaps mention works too?
    From Hubzilla.

    Screenshot of connection to nĭ hăo from Hubzilla.
      mabande@mastodon.social wrote (#3):

      @blog Just to make the answer to "Do people even want a username in their own script?" official:
      Yes. Yes we do.

      Great work and I hope it catches on! 🙂

        tirifto@jam.xwx.moe wrote (#4):

        @blog@shkspr.mobi Yes! English may have an elevated status in software development, but that should absolutely not translate into any kind of favouritism on the user side. I don’t have enough insight into the technical side of international username support to say if there might be issues you haven’t addressed, but I know that custom emoji gets the ASCII treatment as well, and for no good reason whatsoever. :gutkato_malĝojeta:

        #Mastodon and #Firefish have these issues open for the emoji:

        • Unicode in custom emoji for Mastodon
        • Unicode in custom emoji for Firefish
        • Unicode in custom emoji reactions for Pleroma

        #Pleroma has basic support for the emoji, but lacks support for post language. There are two pull requests to add it, but their importance seems severely underestimated:

        • Setting post language in Pleroma
        • Multi-language posting in Pleroma

        Might be a good idea to have open issues and track their status in all the relevant software. It also probably helps if more people talk about this and express support (in appropriate channels!) to show that yes, it is indeed worth it. :sandviĉo:

        • vegafjord@freeradical.zone wrote (#5):

          @blog Yes! I'm so annoyed by the arbitrary #anglocentrism!

          • bonfire@indieweb.social wrote (#6):

            @blog @Edent Good point. We had to fix one thing (URL encoding the webfinger request) but it now works for remote actors in Bonfire.
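For readers wondering what URL-encoding a WebFinger request can look like, here is a minimal Python sketch. It is not Bonfire's actual change, only an illustration of percent-encoding the `resource` query parameter; `webfinger_url` is a hypothetical helper and the domain is assumed to be plain ASCII.

```python
from urllib.parse import urlencode


def webfinger_url(username: str, domain: str) -> str:
    """Build a WebFinger lookup URL with the resource parameter percent-encoded."""
    resource = f"acct:{username}@{domain}"
    # urlencode percent-encodes the UTF-8 bytes of non-ASCII characters,
    # along with reserved characters such as ":" and "@".
    query = urlencode({"resource": resource})
    return f"https://{domain}/.well-known/webfinger?{query}"


url = webfinger_url("你好", "i18n.viii.fi")
print(url)
```

Sending the raw, unencoded username in the query string is the kind of mistake that can make a lookup fail for non-ASCII accounts.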

              meena@cathode.church wrote (#7):

              @blog Tusky on cathode.church (Glitch-social) doesn't automatically reply to @你好@i18n.viii.fi, and can't find it when using the search

                wbpeckham@techhub.social wrote (#8):

                @blog I have no problem with something like original ASCII for localized English-speaking application or database use. For anything general, or applicable internationally or even worldwide, I see no excuse for anything less when we have something suitable for generating bad translations into almost every language! I see no excuse for making anyone code or script in a language foreign to them. This is 2024; we have international solutions for this!

                  arildsen@fosstodon.org wrote (#9):

                  @blog in the Ice Cubes Mastodon client on iOS, I just get this JSON response when tapping the user name:

                  {"subject":"acct:%E4%BD%A0%E5%A5%BD@i18n.viii.fi","links":[{"rel":"self","type":"application\/activity+json","href":"https:\/\/i18n.viii.fi\/%E4%BD%A0%E5%A5%BD"}]}
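That response is a valid WebFinger document with the username percent-encoded. As a small sketch of how a client might read it, the Python below parses the same JSON and decodes the subject. The variable names are arbitrary, and nothing here reflects how Ice Cubes actually handles the response.

```python
import json
from urllib.parse import unquote

# The response quoted above, as a raw string so the \/ escapes reach the JSON parser.
raw = r'{"subject":"acct:%E4%BD%A0%E5%A5%BD@i18n.viii.fi","links":[{"rel":"self","type":"application\/activity+json","href":"https:\/\/i18n.viii.fi\/%E4%BD%A0%E5%A5%BD"}]}'

doc = json.loads(raw)

# Percent-decoding the subject recovers the Unicode username.
subject = unquote(doc["subject"])

# The "self" link points at the Actor document; its URL stays encoded for fetching.
actor_url = next(link["href"] for link in doc["links"] if link["rel"] == "self")

print(subject)
print(actor_url)
```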

                    villavelius@mastodon.online wrote (#10):

                    @blog I generally agree. Homographs do produce problems in science, though, even in articles written in 'English'. For instance, β-carotene is not the same as the nonexistent ß-carotene. (The latter, with the sz ligature, can all too often be found in the scientific literature where β is meant. Not a big problem for the human eye, but a big one for machine-readability.)
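
                    To make the machine-readability point concrete, here is a small Python sketch using only the standard `unicodedata` module. It is an illustration of why a search for one character will not find the other, not a description of how any particular indexing tool works:

```python
import unicodedata

beta = "β"    # what the chemistry literature means
eszett = "ß"  # what sometimes gets typed instead

# Print the code point and official Unicode name of each character.
for ch in (beta, eszett):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Even compatibility normalisation (NFKC) does not map one onto the other,
# so a normalised text search still treats them as different strings.
same_after_nfkc = (
    unicodedata.normalize("NFKC", beta + "-carotene")
    == unicodedata.normalize("NFKC", eszett + "-carotene")
)
print(same_after_nfkc)
```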

                      villavelius@mastodon.online wrote (#11):

                      @blog Not to forget confusing fonts. Fraktur, for example:

                        xtrems876@tech.lgbt wrote (#12):

                        @blog
                        Yeah, the number of times I ended up with a square in the middle of my surname made me really wary of putting my real name on official documents in the West. Instead I operate under a fake name, "Kielinski".

                          timwardcam@c.im wrote (#13):

                          @blog "This is not a hard computer-science problem."

                          😂

                          There is, or at least was for decades, a Cambridge computer science exam question: "Explain why even experienced programmers sometimes have difficulties with character codes."

                          When that question was originally written the expected answers would have been around things like escape sequences on five track paper tape.

                          When I did the exam the sort of answer expected might have been to do with whether your code was portable between ASCII and EBCDIC (with the gaps in the middle of the letters, remember?).

                          These days, your toot would be an answer.
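
                          For readers who never met EBCDIC, the "gaps in the middle of the letters" can be seen with a short Python sketch. It assumes `cp037`, one of the EBCDIC code pages in Python's standard codecs, as a stand-in for whichever variant the exam had in mind; the `gaps` helper is hypothetical and only for illustration:

```python
letters = "abcdefghijklmnopqrstuvwxyz"

# Code values of a-z in ASCII and in EBCDIC (code page 037).
ascii_codes = [ord(c) for c in letters]
ebcdic_codes = list(letters.encode("cp037"))

def gaps(codes):
    """Return the adjacent letter pairs whose code values are not consecutive."""
    return [
        (letters[i], letters[i + 1])
        for i in range(len(codes) - 1)
        if codes[i + 1] - codes[i] != 1
    ]

print(gaps(ascii_codes))   # ASCII letters are contiguous
print(gaps(ebcdic_codes))  # EBCDIC breaks the alphabet into three runs
```

                          Code that loops from 'a' to 'z' by incrementing a character value works in ASCII and silently visits non-letters in EBCDIC, which is the portability trap being described.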

                            cristei@tech.lgbt wrote (#14):

                            @blog Sorry, but text gets pretty hard once you start thinking about anything other than the Latin alphabet. That's the primary technical reason why even basic support is lacking.

                              ruawhitepaw@chitter.xyz wrote (#15):

                              @blog Tusky opens a webpage with some JSON in it instead. Fantastic.

                                jorin@soc.punktrash.club wrote (#16):

                                @blog I'm using Husky against Pleroma. The username is parsed as a link to https://i18n.viii.fi/.well-known/webfinger . It didn't appear in the reply window when I was typing this up.
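
                                For comparison, a complete WebFinger lookup URL also carries a `resource` query parameter naming the account. A minimal Python sketch of building one for a Unicode handle; the `webfinger_url` helper is hypothetical, it is not what Husky or Pleroma actually do, and it does not handle internationalised domain names:

```python
from urllib.parse import quote

def webfinger_url(handle: str) -> str:
    """Build a WebFinger lookup URL for a handle like '@user@example.org'."""
    # Split on the last '@' so the part before it is the username.
    user, _, host = handle.lstrip("@").rpartition("@")
    resource = f"acct:{user}@{host}"
    # quote() percent-encodes non-ASCII characters as UTF-8.
    # ':' and '@' are left as-is; RFC 3986 permits both in a query string.
    return f"https://{host}/.well-known/webfinger?resource={quote(resource, safe=':@')}"

print(webfinger_url("@你好@i18n.viii.fi"))
```

                                The encoded username that comes out matches the `%E4%BD%A0%E5%A5%BD` seen in the JSON response quoted earlier in the thread.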
                                  edent@mastodon.social wrote (#17):

                                  @onemuri@wavebird.party @blog
                                  Thanks for your reply. Re your comment about a "self own".

                                  The purpose of hyperbole in written text is to convey the ridiculous nature of a statement by making it obviously extreme. For example, I used multiple exclamation marks and preceded it with a couple of other statements of a similar nature.

                                  In doing so, I hoped to lead my reader into understanding that I disagreed with the proposition - as set out by the rest of the post.

                                  I'm sorry if that wasn't clear.


                                    What about weird "Zalgo" text?

                                    It is up to a client to decide how they want to render text input. The "problems" of strange Unicode combinations are well known. This is not a hard computer-science problem.

                                    What about bi-directional text?

                                    The spec makes clear this is allowed.

                                    Do people even want a username in their own script?

                                    I have no evidence for this. But I bet you'd get pretty frustrated if you had to switch keyboard just to type your own name, wouldn't you? In any case, why can't I have a username of @😉

                                    What's Next?

                                    If you build ActivityPub software, give some thought to the billions of people who don't have names which easily fit into ASCII.

                                    If your software can see @你好@i18n.viii.fi and its posts, please let me know.

                                    #ActivityPub #fediverse #i18n #mastodon #unicode
                                    jontheniceguy@toot.io wrote (#18):

                                    @blog in @Tusky when I click on the account link it takes me to the webfinger URL.

                                      tito_swineflu@sfba.social wrote (#19):

                                      @blog I think you're aiming too high when half the payment processors and reservation systems I come into contact with can't even accept a hyphenated name.

                                        federico3@oldbytes.space wrote (#20):

                                        @blog
                                        Homographs are a big security problem. Also, many protocols need an easily printable ID for development, debugging and bug reports. Unless you want to replace IDs with QR codes or similar...

                                          edent@mastodon.social wrote (#21):

                                          @federico3 @blog
                                          As I mention in the post, ASCll aIready has a H0M0GRAPH problem.

                                          You also pre-suppose that all programmers are able to read A-Z as well as their own alphabet.

                                          But, even if that's not the case, the IDs can be URl encoded.
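
A minimal sketch of the URL-encoding idea in the reply above, assuming Python's standard `urllib.parse` module. The helper name `printable_id` is hypothetical and not part of any ActivityPub software; it only illustrates that a non-Latin username has a reversible, ASCII-only form that can be pasted into logs or bug reports.

```python
from urllib.parse import quote, unquote


def printable_id(username: str, domain: str) -> str:
    """Return an ASCII-only form of a user@domain handle (hypothetical helper)."""
    # Percent-encode the UTF-8 bytes of the username.
    # safe="" means characters such as "/" are encoded too.
    return f"{quote(username, safe='')}@{domain}"


encoded = printable_id("你好", "i18n.viii.fi")
print(encoded)           # %E4%BD%A0%E5%A5%BD@i18n.viii.fi
print(unquote(encoded))  # 你好@i18n.viii.fi
```

The encoded form is harder to read, but it round-trips exactly, so a debugger or bug tracker limited to ASCII need not lose any information.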
