=Standardization=
When two or more entities interact, they need common conventions. Car drivers must abide by traffic rules to prevent accidents. People need shared conventions of language and gesture to communicate. Likewise, software needs standards and protocols to interoperate seamlessly. In software engineering terms, the contracts between parts of a program need to be established before implementation. Such contracts matter most for systems developed by a large group of individual developers from different backgrounds, and they are essential for cross-platform interoperability.
Standards provide such contracts for all computing systems in the world. Software developers need to conform to such conventions to prevent miscommunication. Therefore, standardization should be the very first step for any kind of software development, including localization.
To start localization, it is a good idea to study related standards and use them throughout the project. Nowadays, many international standards and specifications have been developed to cover the languages of the world. If these do not fit the project’s needs, one may consider participating in standardization activities. Important sources are:
ISO/IEC JTC1 (International Organization for Standardization and International Electrotechnical Commission Joint Technical Committee 1): A joint technical committee for international standards for information technology. There are many subcommittees (SC) for different categories, under which working groups (WG) are formed to work on subcategories of standards. For example, ISO/IEC JTC1/SC2/WG2 is the working group for Universal Coded Character Set (UCS). The standardization process, however, proceeds in a closed manner. If the national standard body is an ISO/IEC member, it can propose the requirements for the project. Otherwise, one may need to approach individual committees. They may ask for participation as a specialist. Information for JTC1/SC2 (coded character sets) is published at anubis.dkuug.dk/JTC1/SC2. Information for JTC1/SC22 (programming languages, their environments and system software interfaces) is at anubis.dkuug.dk/JTC1/SC22.
Unicode Consortium: A non-profit organization working on a universal character set. It is closely related to ISO/IEC JTC1 subcommittees. Its Web site is at www.unicode.org, where channels of contribution are provided.
Free Standards Group: A non-profit organization dedicated to accelerating the use of FOSS by developing and promoting standards. Its Web site is at www.freestandards.org. It is open to participation. There are a number of work groups under its umbrella, including OpenI18N for internationalization (www.openi18n.org).
Note, however, that some issues such as national keyboard maps and input/output methods are not covered by the standards mentioned above. The national standards body should define these standards, or unify existing solutions used by different vendors, so that users can benefit from the consistency.
=Unicode=
Characters are the most fundamental units for representing text data in any language. In mathematical terms, a character set is the set of all characters used in a language. In ICT terms, each character in the set must be stored as bytes according to some convention, called an encoding. The sender and the receiver of data must agree on this convention for the information to remain intact and exact.
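The need for sender and receiver to agree on an encoding can be illustrated with a minimal Python sketch. The sample string and the choice of UTF-8, TIS-620 and Latin-1 are assumptions made for illustration only; they are not taken from the text above.

```python
# Sample text (the Thai words for "Thai language"), written with escapes
# so that the source file itself stays ASCII.
text = "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22"

utf8_bytes = text.encode("utf-8")    # three bytes per Thai character
tis_bytes = text.encode("tis_620")   # one byte per Thai character

# A receiver that uses the sender's encoding recovers the text exactly.
same_convention = utf8_bytes.decode("utf-8")

# A receiver that assumes another encoding gets different text, with no error.
wrong_convention = utf8_bytes.decode("latin_1")

print(len(text), len(utf8_bytes), len(tis_bytes))
print(same_convention == text, wrong_convention == text)
```

The same seven characters occupy a different number of bytes under each encoding, and decoding with the wrong convention fails silently rather than loudly, which is why the convention has to be agreed in advance.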
In the 1970s, the character set used by most programs consisted of the letters of the English alphabet, decimal digits and some punctuation marks. The most widely used encoding was 7-bit ASCII (American Standard Code for Information Interchange), which can represent up to 128 characters, just sufficient for English. When the need to use non-English languages on computers arose, other encodings were defined. Codepages were devised as extensions to ASCII: they add a second block of 128 characters above the ASCII range, making an 8-bit code table in total. Vendors defined several codepages for decorative special characters and for accented Latin letters. Some non-European scripts, such as Hebrew and Thai, were added by the same strategy. National standards for character encoding were defined as well.
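As a small illustration of the codepage idea, the following Python sketch decodes one and the same byte under three 8-bit encodings. The byte value 0xE0 and the three codecs are choices made for this example, not something the text prescribes.

```python
import unicodedata

# One byte from the upper half of the 8-bit table.
byte = b"\xe0"

# The same byte means a different character in each 8-bit code table.
decoded = {codec: byte.decode(codec)
           for codec in ("latin_1", "iso8859_8", "tis_620")}

for codec, ch in decoded.items():
    print(codec, f"U+{ord(ch):04X}", unicodedata.name(ch))

# The lower half stays plain ASCII in all three.
ascii_compatible = all(b"A".decode(codec) == "A" for codec in decoded)
print(ascii_compatible)
```

The lower 128 positions are shared, so English text survives any of these codepages, while everything in the upper half depends on knowing which codepage was used.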
The traditional encoding systems were not suitable for Asian languages that have large character sets and particular complexities. For example, encoding the Han characters used by Chinese, Japanese and Korean (CJK), whose total number is still not determined, is much more complicated. Covering all of them would require a large number of codepages. Moreover, compatibility with existing single-byte encodings is another significant challenge. The result was a number of multi-byte encodings for CJK.
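The sketch below, in Python, shows what "multi-byte" means in practice: one Han character is encoded under four legacy CJK encodings. The character and the list of codecs are assumptions chosen for the example.

```python
# One Han character that exists in all four legacy character sets used here.
han = "\u4e2d"  # the character for "middle"

legacy_codecs = ("gb2312", "big5", "shift_jis", "euc_kr")

# Each legacy encoding spends two bytes on it, but not the same two bytes.
encoded = {codec: han.encode(codec) for codec in legacy_codecs}
for codec, data in encoded.items():
    print(codec, data.hex(), len(data))

# ASCII text keeps its single-byte form in every one of them.
ascii_kept = all("A".encode(codec) == b"A" for codec in legacy_codecs)
print(ascii_kept)
```

Because the byte values differ from one encoding to the next, a program has to know which national encoding a file uses before it can interpret a single character.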
However, having many encoding standards to support is a problem for software developers. A group of vendors therefore agreed to work together to define a single character set that covers the characters of all languages of the world, so that developers have a single point of reference and users have a single encoding. The Unicode Consortium was founded for this purpose, and the major languages of the world were added to its code table. Later on, ISO and IEC formed JTC1/SC2/WG2 to standardize the code table, which is published as ISO/IEC 10646. The Unicode Consortium is also a member of the working group, along with the standards bodies of ISO member countries. Unicode and ISO/IEC 10646 are kept synchronized, so their code tables are the same. Unicode additionally provides implementation guidelines covering character properties, rendering, editing, string collation and so on.
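To make the "single code table plus character properties" idea concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe` and the sample characters are assumptions for illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point, name, general category, UTF-8 bytes in hex)."""
    return (f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),
            ch.encode("utf-8").hex())

# Latin, accented Latin, Thai, Hebrew and Han characters in one table.
for ch in ("A", "\u00e9", "\u0e01", "\u05d0", "\u4e2d"):
    print(describe(ch))
```

Characters from unrelated scripts share one code space and one encoding, and the properties attached to each code point are what the implementation guidelines build on.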
Nowadays, many applications have moved to Unicode and have benefited from the clear definitions for supporting new languages. Users of Unicode are able to exchange information in their own languages, especially through the Internet, without compatibility issues.
=Fonts=
Once the character set and encoding of a script are defined, the first step to enabling it on a system is to display it. Rendering text on the screen requires some resource to describe the shapes of the characters, i.e., the fonts, and some process to render the character images as per script conventions. The process is called the output method. This section will try to cover important aspects of these requirements.
==Characters and Glyphs==
A font is a set of glyphs for a character set. A glyph is an appearance form of a character or a sequence of characters. It is quite important to distinguish the concepts of characters and glyphs. For some scripts, a character can have more than one variation, depending on the context. In that case, the font may contain more than one glyph for each of those characters, so that the text renderer can dynamically pick the appropriate one. On the other hand, the concept of ligatures, such as “ff” in English text, also allows some sequence of characters to be drawn together. This introduces another kind of mapping of multiple characters to a single glyph.
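Both kinds of mapping described above happen to be visible in Unicode's compatibility "presentation forms", which record glyph-like variants as code points of their own. The Python sketch below uses them purely as an illustration; real fonts identify glyphs by internal glyph IDs rather than by these code points, and the helper name `presentation_forms` is an assumption.

```python
import unicodedata

def presentation_forms(base, candidates):
    """Map each candidate whose compatibility decomposition is `base` to its tag."""
    target = " ".join(f"{ord(ch):04X}" for ch in base)
    forms = {}
    for ch in candidates:
        tag, _, mapping = unicodedata.decomposition(ch).partition(" ")
        if mapping == target:
            forms[ch] = tag
    return forms

# One character, several context-dependent shapes: the Arabic letter BEH.
beh_forms = presentation_forms("\u0628", map(chr, range(0xFE8F, 0xFE93)))

# Several characters, one shape: the "ff" ligature.
ff_forms = presentation_forms("ff", ["\ufb00"])

print(sorted(beh_forms.values()))
print(ff_forms)
```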
==Bitmap and Vector Fonts==
In principle, there are two methods of describing glyphs in fonts: bitmaps and vectors. Bitmap fonts describe glyph shapes by plotting pixels directly onto a two-dimensional grid of fixed size, while vector fonts describe the outlines of the glyphs with line and curve drawing instructions. In other words, a bitmap font is designed for one particular size, while a vector font can be rendered at any size. Glyphs rendered from bitmap fonts lose quality when they are scaled up, while glyphs from vector fonts do not. However, vector fonts often render poorly at small sizes on low-resolution devices, such as computer screens, because too few pixels are available to fit the curves. In this case, bitmap fonts may be more precise.
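The difference can be sketched with a toy Python example. A filled disc stands in for a vector outline; both function names are hypothetical, and no real font format is involved.

```python
def rasterize_disc(size):
    """Rasterize a filled disc (a stand-in vector outline) on a size x size grid."""
    rows = []
    for y in range(size):
        row = ""
        for x in range(size):
            # Sample the outline at the centre of each pixel.
            cx, cy = (x + 0.5) / size, (y + 0.5) / size
            inside = (cx - 0.5) ** 2 + (cy - 0.5) ** 2 <= 0.25
            row += "#" if inside else "."
        rows.append(row)
    return rows

def scale_bitmap(rows, factor):
    """Enlarge a bitmap by repeating every pixel `factor` times in each direction."""
    return ["".join(c * factor for c in row) for row in rows for _ in range(factor)]

small = rasterize_disc(4)          # the "bitmap font" designed for size 4
upscaled = scale_bitmap(small, 4)  # bitmap stretched to size 16: blocky
direct = rasterize_disc(16)        # outline rendered at size 16: rounder

print("\n".join(upscaled))
print()
print("\n".join(direct))
```

The stretched bitmap keeps the coarse steps of the 4 x 4 design, whereas the outline is sampled afresh at the larger size.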
Nevertheless, the quality problem at low resolution has been addressed by font technology. For example:
Hinting: additional guideline information stored in the fonts, which rasterizers use to fit the curves in a way that preserves the proper glyph shape.
Anti-aliasing: the capability of the rasterizer to simulate partially covered pixels through an illusion of human perception, such as grayscale and coloured subpixels, giving the impression of “smooth curves.”
These can improve the quality of vector fonts at small sizes. Moreover, the need for bitmap fonts in modern desktops is gradually diminishing.
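Grayscale anti-aliasing can be approximated by supersampling: each pixel is assigned the fraction of its area that the outline covers. The Python sketch below is a toy under that assumption, with a disc standing in for a glyph outline and a hypothetical function name; actual rasterizers compute coverage in more refined ways.

```python
def coverage(size, samples=4):
    """Return per-pixel coverage (0.0 to 1.0) of a filled disc on a size x size grid."""
    grid = []
    for y in range(size):
        row = []
        for x in range(size):
            hits = 0
            # Probe a samples x samples lattice of points inside the pixel.
            for sy in range(samples):
                for sx in range(samples):
                    px = (x + (sx + 0.5) / samples) / size
                    py = (y + (sy + 0.5) / samples) / size
                    if (px - 0.5) ** 2 + (py - 0.5) ** 2 <= 0.25:
                        hits += 1
            row.append(hits / samples ** 2)
        grid.append(row)
    return grid

for row in coverage(4):
    print(" ".join(f"{value:.2f}" for value in row))
```

Interior pixels come out fully covered, while edge pixels receive intermediate gray levels instead of a hard on/off decision.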
==Font Formats==
Currently, the X Window system for GNU/Linux desktop supports many font formats.
BDF Fonts
BDF (Bitmap Distribution Format) is a bitmap font format of the X Consortium for exchanging fonts in a form that is both human-readable and machine-readable. Its content is plain text.
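Because BDF is plain text, a glyph entry can be read with a few lines of code. The Python sketch below parses a single hand-written glyph record; the sample glyph and the function name `parse_bdf_glyph` are made up for illustration, and a complete BDF file also carries font-level header lines that are not handled here.

```python
# A hand-written glyph record, not taken from any real font.
SAMPLE_GLYPH = """\
STARTCHAR A
ENCODING 65
SWIDTH 500 0
DWIDTH 8 0
BBX 8 8 0 0
BITMAP
18
24
42
42
7E
42
42
00
ENDCHAR
"""

def parse_bdf_glyph(text):
    """Return (encoding, rows), where rows draws the bitmap with '#' and '.'."""
    encoding, width, rows, in_bitmap = None, 0, [], False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        keyword = fields[0]
        if keyword == "ENCODING":
            encoding = int(fields[1])
        elif keyword == "BBX":
            width = int(fields[1])       # first BBX field is the glyph width
        elif keyword == "BITMAP":
            in_bitmap = True
        elif keyword == "ENDCHAR":
            in_bitmap = False
        elif in_bitmap:
            # Each bitmap row is hexadecimal, padded to whole bytes.
            bits = bin(int(keyword, 16))[2:].zfill(len(keyword) * 4)
            rows.append("".join("#" if b == "1" else "." for b in bits[:width]))
    return encoding, rows

encoding, rows = parse_bdf_glyph(SAMPLE_GLYPH)
print(encoding)
print("\n".join(rows))
```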
PCF Fonts
PCF (Portable Compiled Format) is just the compiled form of the BDF format. It is binary and thus, only machine-readable. The utility that compiles BDF into PCF is bdftopcf. Although BDF fonts can be directly installed into the X Window system, they are usually compiled for better performance.
Type 1 Fonts
Type 1 is a vector font standard devised by Adobe and supported by its PostScript standard. It is well supported on most UNIX and GNU/Linux systems through the X Window system and Ghostscript, which makes it the recommended format for traditional UNIX printing.
TrueType Fonts
TrueType is a vector font standard developed by Apple, and is also used in Microsoft Windows. Its popularity has grown along with the growth of Windows. XFree86 supports TrueType fonts with the help of the FreeType library, and Ghostscript supports TrueType as well. TrueType has thus become another potential choice for fonts on GNU/Linux desktops.
OpenType Fonts
Recently, Adobe and Microsoft have agreed to create a new font standard that covers both Type 1 and TrueType technologies with some enhancements to cover the requirements of different scripts in the world. The result is OpenType.
An OpenType font can describe glyph outlines with either Type 1 or TrueType splines. In addition, information for relative glyph positioning (namely, GPOS table) has been added for combining marks to base characters or to other marks, as well as some glyph substitution rules (namely, GSUB table), so that it is flexible enough to draw characters of various languages.
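The roles of the two tables can be sketched with a toy shaper in Python. Everything here is hypothetical: the glyph names, the dictionary-based "tables" and the offset values are invented for illustration, whereas real GSUB and GPOS tables are binary structures with a much richer set of lookup types.

```python
# Hypothetical substitution rule: two "f" glyphs become one ligature glyph.
GSUB = {("f", "f"): "f_f"}

# Hypothetical positioning rule: shift a combining acute accent upward.
GPOS = {"acutecomb": (0, 120)}

def shape(glyphs):
    """Apply substitutions, then attach an (x, y) offset to every glyph."""
    substituted = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in GSUB:
            substituted.append(GSUB[pair])   # many glyphs to one
            i += 2
        else:
            substituted.append(glyphs[i])
            i += 1
    return [(glyph, GPOS.get(glyph, (0, 0))) for glyph in substituted]

print(shape(["o", "f", "f", "e", "acutecomb"]))
```

The point of the design is that the rules live in data carried by the font, so the same small engine can serve scripts with very different needs.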
=Output Methods=
An output method is a procedure for drawing text on output devices. It converts text strings into sequences of properly positioned glyphs from the given fonts. For simple cases like English, the character-to-glyph mapping may be straightforward. For other scripts, output methods are more complicated: some scripts have combining marks, some are written in directions other than left-to-right, some have glyph variations for a single character, some require character reordering, and so on.
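Some of these complications are recorded as Unicode character properties, which an output method can consult. The Python sketch below lists a few of them with the standard `unicodedata` module; the helper name and the sample strings (a Thai syllable and a Hebrew word) are assumptions for illustration.

```python
import unicodedata

def rendering_hints(text):
    """Return (code point, category, bidi class, combining class) per character."""
    return [(f"U+{ord(ch):04X}",
             unicodedata.category(ch),
             unicodedata.bidirectional(ch),
             unicodedata.combining(ch))
            for ch in text]

# Thai: a base consonant followed by an above vowel and a tone mark.
thai = "\u0e01\u0e34\u0e48"
# Hebrew: letters whose bidi class marks them as right-to-left.
hebrew = "\u05e9\u05dc\u05d5\u05dd"

for sample in (thai, hebrew):
    for hint in rendering_hints(sample):
        print(hint)
```

Category `Mn` (non-spacing mark) tells the renderer that a character must be stacked on its base rather than advanced past it, and bidi class `R` tells it that the run is laid out right-to-left.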
With traditional font technologies, the information for handling complex scripts is not stored in the fonts. So the output methods bear the burden. But with OpenType fonts, where all of the rules are stored, the output methods just need the capability to read and apply the rules.
Output methods are defined separately by each implementation. X Window has the X Output Method (XOM). GTK+ uses a separate module called Pango. Qt implements its output method in some of its classes. Modern rendering engines are now capable of using OpenType fonts, so there are two ways of drawing text in output method implementations. If you are using TrueType or Type 1 fonts and your script has complications beyond those of Latin-based languages, you need to provide an output method that knows how to process and typeset the characters of your script. Otherwise, you may use OpenType fonts whose OpenType tables describe the rules for glyph substitution and positioning.
=Input Methods=
There are many factors in the design and implementation of input methods. The greater the mismatch between the size of the character set and the capability of the input device, the more complicated the input method becomes. For example, inputting English characters with a 104-key keyboard is straightforward (mostly one-to-one, that is, one key stroke produces one character), while inputting English with a mobile phone keypad requires some more steps. For languages with huge character sets, such as CJK, character input is very complicated, even with PC keyboards.
Therefore, analysis and design are important stages of input method creation. The first step is to list all the characters (not glyphs) needed for input, including digits and punctuation marks. The next step is to decide whether they can be matched one-to-one with the available keys, or whether some composing mechanism (as for European accents) or conversion mechanism (as for Romaji-based CJK input) is needed, in which multiple key strokes are required to input some characters.
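A composing mechanism can be sketched as a very small state machine in Python. The dead-key table and the function name `compose` are hypothetical and do not correspond to any real keyboard map or input method framework; the sketch only shows how two key strokes can yield one character.

```python
import unicodedata

# Hypothetical dead keys: each one selects a combining accent.
DEAD_KEYS = {"'": "\u0301", "`": "\u0300", "^": "\u0302"}

def compose(keystrokes):
    """Turn a key stroke sequence into text, merging dead key + letter pairs."""
    output = []
    pending = None                      # accent waiting for its base letter
    for key in keystrokes:
        if pending is None and key in DEAD_KEYS:
            pending = DEAD_KEYS[key]    # remember the accent, emit nothing yet
            continue
        if pending is not None:
            # NFC normalization picks the precomposed character if one exists.
            output.append(unicodedata.normalize("NFC", key + pending))
            pending = None
        else:
            output.append(key)
    return "".join(output)

print(compose("caf'e"))
print(compose("h^otel"))
```

Keys without a pending accent pass through unchanged, which is the one-to-one case; the dead keys add the multi-stroke case on top of it.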
Once the input scheme for the script is decided, the keyboard layout can be designed. A good keyboard layout helps users by putting the most frequently used characters in the home row, and the rest in the upper and lower rows. If the script has no concept of upper and lower case (which is the case for most non-Latin scripts), rare characters may be put in the shift positions.
Then, there are two major steps to implement the input method. First, a map of the keyboard layout is created. This is usually an easy step, as there are existing keyboard maps to refer to. Then, if necessary, the second step is to write the input method based on the keyboard map. In general, this means writing an input method module to plug into the system framework.
=Locales=
Locale is a term introduced by the concept of internationalization (I18N), in which generic frameworks are made so that the software can adjust its behaviour to the requirements of different native languages, cultural conventions and coded character sets, without modification or re-compilation.
Within such frameworks, locales are defined for describing particular cultures. Users can configure their systems to pick up their locales. The programs will load the corresponding predefined locale definition to accomplish internationalized functions. Therefore, to make internationalized software support a new language or culture, one must create a locale definition and fill up the required information, and things will work without having to touch the software code.
According to POSIX (1), the behaviour of a number of C library functions, such as those handling date and time formats, string collation, and numerical and monetary formats, is locale-dependent. ISO/IEC 14652 adds more features to the POSIX locale specification and defines new categories for paper size, measurement units, address and telephone formats, and personal names. The GNU C library implements all of these categories, so cultural conventions can be described through it.
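Python's standard `locale` module wraps the C library's locale functions, so it can serve as a quick way to observe locale-dependent behaviour. In the sketch below the helper name `show_conventions` is an assumption, and so is the locale name `th_TH.UTF-8`, which may not be installed on a given system; only the `C` locale is always available.

```python
import locale

def show_conventions(name):
    """Switch to locale `name` and report a few numeric conventions, or None."""
    try:
        locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        return None                     # locale definition not installed
    conv = locale.localeconv()
    return {
        "decimal_point": conv["decimal_point"],
        "thousands_sep": conv["thousands_sep"],
        "number": locale.format_string("%.2f", 1234567.891, grouping=True),
    }

print(show_conventions("C"))
print(show_conventions("th_TH.UTF-8"))  # None if this locale is missing

# Restore the default so later code is unaffected.
locale.setlocale(locale.LC_ALL, "C")
```

The program text is identical for every locale; only the locale definition loaded at run time changes the result.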
Locale definitions are discussed in detail in Appendix B (pages 41-42).
=Translation=
Translating messages in programs, including menus, dialog boxes, button labels, error messages, and so on, ensures that local users, not familiar with English, can use the software. This task can be accomplished only after the input methods, output methods and fonts are done – or the translated messages will become useless.
There are many message translation frameworks available, but the general concepts are the same. Messages are extracted into a working file to be translated and compiled into a hash table. When the program executes, it loads the appropriate translation data as per locale. Then, messages are quickly looked up for the translation to be used in the user interface.
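The lookup step can be sketched in Python on top of the standard `gettext` module. The in-memory `CATALOGS` dictionary, the class `DictTranslations` and the Thai strings are assumptions for illustration; a real project would compile translated message files and load them from disk rather than hard-code a table.

```python
import gettext

# Hypothetical compiled catalogue: one hash table of messages per language.
CATALOGS = {"th": {"Open": "เปิด", "Save": "บันทึก"}}

class DictTranslations(gettext.NullTranslations):
    """Translations object backed by a plain dictionary."""

    def __init__(self, table):
        super().__init__()
        self._table = table

    def gettext(self, message):
        # Fall back to the original message when no translation exists.
        return self._table.get(message, message)

def load_translations(language):
    """Pick the catalogue for `language`, or an empty one if there is none."""
    return DictTranslations(CATALOGS.get(language, {}))

_ = load_translations("th").gettext
print(_("Open"), _("Save"), _("Quit"))

# The standard loader behaves the same way when no catalogue file is found.
missing = gettext.translation("demo", localedir="/nonexistent",
                              languages=["th"], fallback=True)
print(missing.gettext("Open"))
```

Untranslated messages fall through to the original English, so a partially translated program remains usable.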
Translation is a labour-intensive task. It takes time to translate a huge number of messages, which is why it is always done by a group of people. When forming a team, make sure that all members use consistent terminology in all parts of the programs. Therefore it is vital to work together in a forum through close discussion and to build the glossary database from the decisions made collectively. Sometimes the translator needs to run the program to see the context surrounding the message, in order to find a proper translation. At other times the translator needs to investigate the source code to locate conditional messages, such as error messages. Translating each message individually in a literal manner, without running the program, can often result in incomprehensible outputs.
Like other FOSS development activities, translation is a long-term commitment. New messages are usually introduced in every new version. Even if all messages have been translated in the current version, it is necessary to check for new messages before the next release. There is usually a string freeze period before a version is released, during which no new strings are allowed in the code base, so that translators have adequate time to finish their work. Technical aspects of the message translation process are discussed at the end of Appendix B (page 45).
=GNU/Linux Desktop Structure=
Before planning to enable a language on the GNU/Linux desktop, a clear understanding of its overall structure is required. The GNU/Linux desktop is composed of layers of subsystems working on top of one another. Every layer has its own locale-dependent operations. Therefore, to enable a language completely, it is necessary to work in all layers. The layers, from the bottom up, are as follows (see Figure 1):
1. The C Library. C is the programming language of the lowest level for developing GNU/Linux applications. Other languages rely on the C library to make calls to the operating system kernel.
2. The X Window. In most UNIX systems, the graphical environment is provided by the X Window system. It is a client-server system, where X clients make requests to X server and receive events from it across the network connection (or through a local inter-process communication channel) based on X protocol. A library called X Library (Xlib) encapsulates this protocol by a set of application programming interfaces (API), so that X clients can do everything in terms of function calls. Due to its liberal license terms, which allow even commercial redistributions, there have been several versions of X Window in the UNIX market. For GNU/Linux, XFree86 is the major code base, although new releases of some major distributions are now migrating to the newly forked X.org released in April 2004. All forks differ mainly in X server implementation and some extensions. But the X protocol and Xlib function calls are still standardized.
3. Toolkits. Writing programs using the low-level Xlib can be tedious as well as a source of inconsistent GUI when all applications draw menus and buttons by their own preferences. Some libraries are developed as a middle layer to help reduce both problems. In X terminology, these libraries are called toolkits. And the GUI components they provide, such as buttons, text entries, etc., are called widgets. Many historic toolkits have been developed in the past, either by the X Consortium itself like the X Toolkit and Athena widget set (Xaw), or by vendors like XView from Sun, Motif from Open Group, etc. In the FOSS realm, the toolkits most widely adopted are GTK+ (The GIMP Toolkit)(2) and Qt(3).
4. Desktop Environments. Toolkits help developers create a consistent look-and-feel among a set of programs. But to make a complete desktop, applications need to interoperate more closely to form a convenient workplace. The concept of the desktop environment was invented to provide common conventions, resource sharing and communication among applications. The first desktop environment created on UNIX platforms was CDE (Common Desktop Environment) by Open Group, based on its Motif toolkit, but it is proprietary. The first FOSS desktop environment for GNU/Linux is KDE (K Desktop Environment)(4), based on TrollTech’s Qt toolkit. However, some developers objected to the licensing conditions of Qt at that time, so a second desktop environment was created, called GNOME (GNU Network Object Model Environment)(5), based on GTK+. Nowadays, although the licensing issue of Qt has been resolved, GNOME continues to grow and get more support from vendors and the community. KDE and GNOME have thus become the desktops most widely used on GNU/Linux and other FOSS operating systems such as FreeBSD.
Each component is internationalized, allowing local implementation for different locales:
1. GNU C Library: Internationalized according to POSIX and ISO/IEC 14652.
2. XFree86 (and X Window in general): Internationalization in this layer includes X locale (XLC), describing font sets and character code conversion; X Input Method (XIM) for the text input process, in which X Keyboard Extension (XKB) is used to describe the keyboard map; and X Output Method (XOM) for text rendering. XOM was implemented too late, when both GTK+ and Qt had already handled rendering with their own solutions. It is therefore questionable whether XOM is still needed.
3. GTK+: For GTK+ 2, internationalization frameworks have been defined in a modular way. It has its own input method framework called GTK+ IM, where input method modules can be dynamically plugged in as per user command. Text rendering in GTK+ 2 is handled by a separate general-purpose text layout engine called Pango. Pango can be used for any application that needs to render multilingual texts and not just for GTK.
4. Qt: Internationalization in Qt 3 is done in a minimal way. It relies solely on XIM for all text inputs, and handles text rendering with QComplexText C++ class, which relies completely on Unicode data for character properties from Unicode.org.
For the desktop environment layer, namely, GNOME and KDE, there is no additional internationalization apart from what is provided by GTK+ and Qt.
--------------------------------------------------------------------------------
1 POSIX is the acronym for Portable Operating System Interface.
2 GTK+, ‘GTK+ – The GIMP Toolkit’; available from www.gtk.org.
3 TrollTech, ‘TrollTech – The Creator of Qt – The multi-platform C++ GUI/API’; available from www.trolltech.com.
4 KDE, ‘KDE Homepage – Conquer your Desktop!’; available from www.kde.org.
5 GNOME, ‘GNOME: The Free Software Desktop Project’; available from www.gnome.org.
(Article source: 洛基开放文化实验室)