Copyright © 2021 Daniel Oaks <daniel@danieloaks.net>
Copyright © 2021 Shivaram Lingamneni <slingamn@cs.stanford.edu>
Unlimited redistribution and modification of this document is allowed provided that the above copyright notice and this permission notice remains intact.
IRC predates the Unicode standard. Consequently, although UTF-8 has been widely adopted on IRC, clients cannot assume that all IRC data is UTF-8. This specification defines a way for servers to advertise that they only allow UTF-8 on their network, letting clients change their processing of outgoing and incoming messages accordingly.
UTF8ONLY ISUPPORT token
This specification introduces a new token UTF8ONLY that servers can include in their ISUPPORT (005) output. Servers publishing this token MUST NOT relay content (such as PRIVMSG or NOTICE message data, channel topics, or realnames) containing non-UTF-8 data to clients. Clients implementing this specification MUST NOT send non-UTF-8 data to the server once they have seen this token. Server handling of such messages is implementation-defined; for example, they MAY send the INVALID_UTF8 code described below, or respond in some other way.
If a client implementing this specification sees this token, it MUST set its outgoing encoding to UTF-8 without requiring any user intervention. This allows clients to work transparently on networks that only allow UTF-8 traffic.
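As an illustration of the client-side behaviour described above, the following is a minimal sketch of how a client might detect the token and switch its outgoing encoding. The names `isupport_tokens`, `ClientState`, and `outgoing_encoding`, as well as the `latin-1` default, are hypothetical and not part of this specification; a real client would integrate this into its existing message parser.

```python
def isupport_tokens(line):
    """Return the ISUPPORT tokens carried by one raw 005 line as a dict.

    Tokens without a value (such as UTF8ONLY) map to an empty string.
    Lines that are not 005 numerics yield an empty dict.
    """
    parts = line.rstrip("\r\n").split(" ")
    # Skip optional message tags and the optional source prefix.
    if parts and parts[0].startswith("@"):
        parts = parts[1:]
    if parts and parts[0].startswith(":"):
        parts = parts[1:]
    if len(parts) < 2 or parts[0] != "005":
        return {}
    tokens = {}
    # parts[1] is the client's nickname; the tokens follow it.
    for param in parts[2:]:
        if param.startswith(":"):
            break  # trailing human-readable text, not a token
        if not param:
            continue
        name, _, value = param.partition("=")
        tokens[name] = value
    return tokens


class ClientState:
    """Hypothetical holder for per-connection encoding settings."""

    def __init__(self, outgoing_encoding="latin-1"):
        self.outgoing_encoding = outgoing_encoding
        self.utf8only = False

    def handle_line(self, line):
        # Switch to UTF-8 automatically, with no user intervention.
        if "UTF8ONLY" in isupport_tokens(line):
            self.utf8only = True
            self.outgoing_encoding = "utf-8"

    def encode_outgoing(self, text):
        return text.encode(self.outgoing_encoding)
```

Once `handle_line` has seen the token, every later call to `encode_outgoing` produces UTF-8 regardless of the encoding the user had configured.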
INVALID_UTF8 standard replies code
This is a code that can be used with the standard replies specification. When sent with the FAIL command, it indicates that the client's message was rejected because it contained invalid UTF-8 data. When sent with the WARN command, it indicates that the message was modified but still accepted.
Client: PRIVMSG #ircv3 :<non-utf-8 message>
Server: FAIL PRIVMSG INVALID_UTF8 :Message rejected, your IRC software MUST use UTF-8 encoding on this network
Client: USER u s e :<non-utf8 realname>
Server: FAIL USER INVALID_UTF8 :Message rejected, your IRC software MUST use UTF-8 encoding on this network
Client: PRIVMSG #ircv3 :<non-utf-8 message>
Server: WARN PRIVMSG INVALID_UTF8 :Your message was not correctly encoded as UTF-8 and had to be modified
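The exchanges above could be produced by server-side logic along the following lines. This is only a sketch under stated assumptions: the function name `check_utf8`, the `strict` flag, and the choice of repairing invalid bytes with U+FFFD are hypothetical, since this specification leaves server handling implementation-defined.

```python
FAIL_TEXT = ("Message rejected, your IRC software MUST use UTF-8 "
             "encoding on this network")
WARN_TEXT = ("Your message was not correctly encoded as UTF-8 "
             "and had to be modified")


def check_utf8(command, payload, strict=True):
    """Validate the raw bytes of a message parameter.

    Returns a pair (text, reply):
      - text is the string to relay, or None if the message is rejected;
      - reply is the standard reply line to send back, or None if the
        payload was already valid UTF-8.
    """
    try:
        return payload.decode("utf-8"), None
    except UnicodeDecodeError:
        pass
    if strict:
        # Reject outright: nothing is relayed to other clients.
        return None, f"FAIL {command} INVALID_UTF8 :{FAIL_TEXT}"
    # Assumed repair policy: replace each invalid sequence with U+FFFD,
    # so that what is relayed is valid UTF-8.
    repaired = payload.decode("utf-8", errors="replace")
    return repaired, f"WARN {command} INVALID_UTF8 :{WARN_TEXT}"
```

With `strict=True` the sketch mirrors the FAIL examples, and with `strict=False` it mirrors the WARN example. In both cases no non-UTF-8 data reaches other clients.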
This section is non-normative.
Implementations must ensure that if they truncate messages to meet a length limit, they do not do so in the middle of a UTF-8-encoded codepoint.
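One possible way to truncate safely is to back up from the byte limit until the cut no longer falls inside a multi-byte sequence. The sketch below assumes the input is already valid UTF-8; the function name `truncate_utf8` is hypothetical. It preserves whole codepoints only and makes no attempt to keep grapheme clusters (such as a base letter plus combining marks) together.

```python
def truncate_utf8(data, limit):
    """Cut valid UTF-8 bytes to at most `limit` bytes on a codepoint boundary."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if len(data) <= limit:
        return data
    end = limit
    # data[end] is the first byte that would be dropped. Continuation
    # bytes have the form 0b10xxxxxx; if the first dropped byte is one,
    # the cut is in the middle of a codepoint, so move it back.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```

For example, truncating the six bytes of "héllo" to two bytes drops the partial "é" and keeps only "h", so the result may be shorter than the limit but is always valid UTF-8.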
Software supporting UTF8ONLY: Ergo, UnrealIRCd, AdiIRC, Halloy, HexChat, KVIrc, mIRC, Srain, WeeChat, soju (as Server), soju (as Client), Limnoria, Matrix2051