This is a draft document explaining how to write locale files for GNU libc. It does not go into detail, but refers to the relevant specifications instead. It does, however, point out some of the pitfalls and tries to document current practice.
Locale names consist of three parts: the language code, the country/region code, and an optional modifier, in the format language_REGION@modifier. The language code is taken from ISO 639. The two-letter code is preferred, but a three-letter code is accepted if no two-letter code is available. The country/region code is taken from ISO 3166. If the language or region in question is missing from the ISO standard, the standard must be updated before the locale will be included in glibc. If the ISO 639 maintainers cannot be convinced that your language exists (and thus needs a language code), the glibc maintainers will refuse to add the locale. In addition, the glibc maintainers seem to refuse "artificial languages" like Esperanto and Lojban, even if they have an ISO 639 code.
Little is known about the requirements for naming modifiers. The following modifiers are currently in use: abegede, cyrillic, euro and saaho. This may indicate that lower-case letters are preferred in modifier names.
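The naming rules above can be checked mechanically. The bash function below is a minimal sketch, not part of glibc: parse_locale_name is a hypothetical helper, and it assumes the language code is two or three lower-case letters, the region two upper-case letters, and the modifier lower-case letters only. It checks only the shape of the name; it does not look the codes up in ISO 639 or ISO 3166.

```shell
# Split a locale name of the form language_REGION@modifier into its parts.
# Hypothetical helper. Assumptions: language = 2-3 lower-case letters (ISO 639),
# region = 2 upper-case letters (ISO 3166), modifier = lower-case letters.
# Only the shape is checked; the codes are not looked up in the ISO lists.
parse_locale_name() {
  local re='^([a-z]{2,3})_([A-Z]{2})(@([a-z]+))?$'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[4] is empty when there is no @modifier
    printf 'language=%s region=%s modifier=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[4]}"
  else
    printf 'not a language_REGION[@modifier] name: %s\n' "$1" >&2
    return 1
  fi
}
```

For example, parse_locale_name de_DE@euro prints language=de region=DE modifier=euro, and a three-letter code such as gez_ER@abegede is accepted as well.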
It is recommended to follow RFC 3066 when selecting locale names.
To make it easier to compare locales with each other, I recommend using the same order for the categories in all locales. Any order would do, so I picked the order used in most locales and recommend this order:
One should avoid cut-and-paste where possible, and instead use the copy statement to include categories from locales with identical content.
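When comparing the category order of two locale sources, or checking which categories are already shared through copy, a short listing helps. The following is a minimal sketch of a hypothetical bash/awk helper (category_summary is not an existing tool). It prints each category in the order it appears and whether its first statement is copy. It assumes that a category starts with a line containing only LC_NAME, ends with END LC_NAME, and that comment lines begin with % or #.

```shell
# Print each category of a locale source in file order, marking whether its
# first statement is a copy of another locale. Hypothetical helper.
# Assumes: categories open with a line "LC_XXX" and close with "END LC_XXX";
# comment lines start with % or #.
category_summary() {
  awk '
    # a line holding only LC_XXX opens a category
    /^LC_[A-Z]+[ \t]*$/ { cur = $1; seen = 0; next }
    # END LC_XXX closes it; report categories that had no statements
    /^END[ \t]+LC_/ { if (cur != "" && !seen) print cur, "empty"; cur = ""; next }
    # the first non-blank, non-comment line decides copy vs own content
    cur != "" && !seen && NF > 0 {
      c = substr($1, 1, 1)
      if (c == "%" || c == "#") next
      if ($1 == "copy") print cur, "copy", $2
      else print cur, "own"
      seen = 1
    }
  ' "$1"
}
```

To compare the category order of two locale sources, something like diff <(category_summary xx_XX | cut -d' ' -f1) <(category_summary yy_YY | cut -d' ' -f1) should work in bash, where xx_XX and yy_YY stand for the two source files.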
The category entries in LC_IDENTIFICATION refer to the standard followed when writing each category. These references should be enclosed in quotes and should not use the <Uxxxx> notation. They should normally look something like this:
category "i18n:1997";LC_IDENTIFICATION
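The rule for category entries is easy to check with grep. This is a minimal sketch with a hypothetical name, check_category_lines, assuming each entry sits on a single line starting with category and that a plain quoted reference such as "i18n:1997" is what is wanted. It prints the offending lines with their line numbers and returns non-zero if any were found.

```shell
# Report category entries that are not of the form
#   category "reference";LC_XXX
# e.g. unquoted references or references written in <Uxxxx> notation.
# Hypothetical helper; assumes each entry is on one line starting with "category".
check_category_lines() {
  local bad
  # number all category lines, then keep only those NOT matching the good form
  bad=$(grep -n '^category' "$1" |
        grep -Ev '^[0-9]+:category[[:space:]]+"[^"<>]+";LC_[A-Z]+[[:space:]]*$')
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}
```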
The yesexpr and noexpr entries should have the form ^[yY<extra>] and ^[nN<extra>], without the digits 0 and 1 and without a trailing ".*". This keeps the expressions in the same form as those used in the C/POSIX locale (^[yY] and ^[nN]).
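A check for the recommended yesexpr/noexpr form can be sketched in bash. check_answer_expr is a hypothetical helper; it assumes the expression has already been decoded from <Uxxxx> notation into plain characters, and it treats "extra" as any further characters inside the brackets other than 0, 1 and ].

```shell
# Succeed if $1 has the form ^[<letters><extra>] where <letters> is $2
# (yY for yesexpr, nN for noexpr), with no 0 or 1 in the brackets and
# nothing (such as ".*") after the closing bracket. Hypothetical helper.
check_answer_expr() {
  # e.g. for $2=yY the regex is  ^\^\[yY[^]01]*]$
  local re='^\^\['"$2"'[^]01]*]$'
  [[ $1 =~ $re ]]
}
```

With this sketch, check_answer_expr '^[yYjJ]' yY succeeds, while '^[yY].*' and '^[+1yY]' are rejected.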
To test a new locale on a test machine, do the following:
Example: generate a new de_DE@euro locale using the ISO-8859-15 charset and save it as 'de_DE':
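# install the locale source where localedef looks for it on a typical installation
cp de_DE@euro /usr/share/i18n/locales/de_DE@euro
# compile it with the ISO-8859-15 charmap; -c writes the output even if there are warnings
localedef -i de_DE@euro -c -f ISO-8859-15 de_DE
# run a program under the new locale to see the result
LANG=de_DE date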
I've made a small tool, check-locale, that can detect a few common mistakes in locale files.