
News archive


Worldly Windows: Extend The Global Reach Of Your Applications With Unicode 5.0

Most major software vendors provide a number of localized or translated products, as well as products that are capable of handling non-Latin writing systems. But there's still a long way to go. Ultimately, the goal of software internationalization is to allow the greatest number of people to communicate with each other, independent of language, writing system, or location. Simply put, people want to communicate with others, on their terms, in their own languages.
As members of the Unicode Consortium, Microsoft and other companies help the industry move closer to that goal of comprehensive language and regional support. In this article we will discuss the role Microsoft plays in this particular standards effort and, more importantly, describe the new features of Unicode 5.0 and how they are implemented in Windows Vista™.
Why Unicode Matters
Microsoft has actively participated in the Unicode Consortium for a number of reasons. First of all, consistency, stability, and interoperability of data, independent of writing system, location, and platform, are some of the key tenets of global software development. The industry needs internationalization standards, especially for encoding, that work consistently regardless of the platform they run on. These standards must also be representative of the key linguistic and cultural stakeholders they support. The Unicode Consortium develops standards with ongoing input from key stakeholders: industry, government bodies, language authorities, and related standards bodies. The existence of a single worldwide system to store, transmit, and manipulate the world's data streamlines and considerably accelerates the development and deployment of international software.
The benefits Unicode delivered to Microsoft became clear when adopting a single-source codebase that supports Unicode reduced development time. For Windows® 2000, preparing the localized versions took months after the English-language product shipped. For Windows XP, this dropped to weeks.
The Unicode Consortium is also a crucial player in ensuring that the languages and cultures of the world can be represented in technology, most notably through its cooperation with the Script Encoding Initiative (see unicode.org/pending/about-sei.html). This initiative is supported by Microsoft and other software vendors via their various emerging market programs. The results from Microsoft have been the Language Interface Packs (LIPs) and Enabling Language Kits (ELKs) programs, as well as developer tools such as the Microsoft® Locale Builder and the Microsoft Keyboard Layout Creator. Also see the sidebar "ELKs and LIPs," as well as Figures 1 and 2, for a glimpse of Unicode in action.
What's New in Unicode 5.0
Like earlier major versions, the Unicode Standard, Version 5.0, has been released as a book (Addison-Wesley, 2006) with associated data files. The book has been updated substantially and contains pages of new and clarified text, new and improved tables and illustrations, and more. For the first time, the Unicode Standard Annexes, which contain information about topics such as the Bidirectional Algorithm, identifiers, line-breaking properties, and text boundaries, are printed in the book. The annexes have been carefully edited and rewritten to provide improved guidance to developers and implementers.
The conformance clauses and definitions in the book have been renumbered in Version 5.0, eliminating the confusion of numbers such as C12b and D30e used in Version 4.0. (Appendix D includes a handy table showing the numbers used in earlier versions.)
Unicode 5.0 has more explicit guidelines for rendering Indic scripts. For CJK characters, the book now contains the IICore radical-stroke index, focusing on the subset of roughly 10,000 CJK characters most important for the East Asian markets. This new index makes it much easier to look up important characters in the standard; the full CJK radical-stroke index is available online for looking up rare characters. Character properties have been extended and improved.
Information about security considerations and reliability, topics of great importance to implementers, has been vastly extended. Version 5.0 is synchronized with coordinated updates to other important standards, including the Unicode Collation Algorithm, the Common Locale Data Repository, and the International Standard ISO/IEC 10646. (See the sidebar "Phishing and Unicode Security.")
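As a rough illustration of the kind of security concern involved, the following Python sketch flags strings that mix scripts, a common ingredient of spoofed names. It is not the algorithm used by the standard or by Windows: the function name scripts_in is hypothetical, and it approximates a character's script by the first word of its Unicode character name, which is only a crude proxy.

```python
import unicodedata

def scripts_in(text):
    """Return a rough set of scripts used by the letters in text.

    Assumption: the first word of a character's Unicode name
    (for example LATIN or CYRILLIC) stands in for its script.
    """
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found

genuine = "paypal"
spoofed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

print(sorted(scripts_in(genuine)))
print(sorted(scripts_in(spoofed)))
print(genuine == spoofed)  # the strings look alike but are different data
```

A string whose letters come from more than one script is not necessarily malicious, so a check like this would at most mark a name for closer inspection.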
Version 5.0 adds new characters for Cyrillic, Greek, Hebrew, Kannada, Latin, mathematics, phonetic extensions, and symbols. It also adds several minority and historical scripts, such as Tifinagh and Phoenician. The Tifinagh script is used by more than 20 million people in Morocco and elsewhere who speak varieties of languages commonly called Berber or Amazigh. (Incidentally, the word "Tifinagh" means the Phoenician letters.) Phoenician is the script that sailed the ancient Mediterranean, and it is the forerunner of Arabic, Greek, Hebrew, and other scripts in modern use. The book, The Unicode Standard, Version 5.0, is a treasure trove of intriguing details about scripts, as well as useful implementation information. It is an indispensable reference. Figures 3 and 4 show a few more character sets.
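To see what character data for these scripts looks like in practice, the short Python sketch below queries the standard library's unicodedata module for one Tifinagh and one Phoenician letter. The two code points are chosen here purely for illustration, and the results depend on the Unicode version bundled with the Python runtime, not on Windows.

```python
import unicodedata

# One sample letter per script (illustrative choices, not from the article).
samples = {
    "Tifinagh": "\u2d30",
    "Phoenician": "\U00010900",
}

for script, ch in samples.items():
    print(
        script,
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<no name in this runtime's data>"),
        unicodedata.category(ch),       # general category, e.g. Lo = other letter
        unicodedata.bidirectional(ch),  # bidi class, e.g. L or R
    )
```

The bidirectional class matters because Phoenician, like Hebrew and Arabic, is written from right to left.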
Why the Change?
In prior versions of Windows, Microsoft products lagged a few versions behind Unicode. For example, Windows XP shipped data drawn from several Unicode versions (anywhere from Unicode 1.1 to 3.1), so that each feature leveraging the data, whether collation, character properties, line breaking, the bidirectional algorithm, or UTF-8 conversion, had the Unicode version it needed. This mixed approach, while expedient for a shortened development cycle, led to various problems, including:
Inadequate algorithmic support for operations such as UTF-8 conversions
Insufficient coverage for languages that later turned out to have a business need for support
Lack of good coverage of security-sensitive issues clarified in later versions
Such problems became more apparent when Microsoft started shipping the ELKs in order to support LIPs beginning with Windows XP Service Pack 2 (SP2). Several of the languages on the list of potential ELKs and LIPs had to be removed from consideration because core support for Unicode properties, casing, and collation did not exist for languages such as Mongolian and Yi. Furthermore, Microsoft learned the hard way that trying to limit the supported ranges of Unicode to those considered strategic business cases for the company often proved ineffective, since yesterday's weak business cases often become tomorrow's strategic interests.
In addition, updating to the most recent version of the Unicode Standard comprehensively across Windows and the Microsoft .NET Framework will help fill some gaps in terms of expected international support in Microsoft products, in the standard itself, and with related linguistic community concerns.
The customer requirement for Unicode 5.0 support on Windows Vista spanned the core services that ISVs could not extend themselves, such as casing, property, and character type values; default collation weights; and normalization support. (We did not try to ship font and rendering support of every character in Unicode 5.0, as that would be an incredibly time-consuming and difficult task, especially given the difference in targets between Unicode and Windows. The goal was to support languages as well as they could be supported with regard to the aforementioned core operations, while leaving other extensible details like typography and keyboard support to font authors and tools like the Microsoft Keyboard Layout Creator.)
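The core operations named above (casing, properties, and character type values) can be sketched with Python's standard library. This is only an analogy for the kind of data a platform must ship. It does not show the Windows APIs, and the helper name describe is hypothetical.

```python
import unicodedata

# Which Unicode version this Python runtime's character data comes from.
print("Unicode data version:", unicodedata.unidata_version)

def describe(ch):
    """Return (general category, uppercase form, lowercase form) for one character."""
    return (unicodedata.category(ch), ch.upper(), ch.lower())

for ch in ["a", "\u0436", "\u0663", "\u4e2d"]:
    # Latin a, Cyrillic zhe, Arabic-Indic digit three, a CJK ideograph
    print(f"U+{ord(ch):04X}", describe(ch))

# Character type values also cover digits outside ASCII.
print(unicodedata.digit("\u0663"))
```

Because an application cannot easily replace these tables itself, an outdated version leaves newer characters without a category or case mapping.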
Given that version support in Windows and the .NET Framework lay somewhere between Unicode 1.1 and Unicode 3.1, describing everything that the move to Unicode 5.0 updated for Microsoft is quite complex. Support for new languages and scripts is perhaps the easiest update to understand. However, the new version of the standard contains many other must-have features that customers request. Specifically on Windows, the following Unicode-related data has been updated and these character additions have been included:
Enhanced security features in Unicode.
Updates for normalization and internationalized domain names.
Latest version support for simple casing and other common operations.
Support in collation beyond the simple code point ordering for Extension B that was added to Windows XP, needed for both Japanese (JIS X 0213) and Hong Kong (HKSCS) because of the many new characters. This particular item is not a feature of Unicode as much as a feature of collation efforts by Microsoft; it now follows the example of the Unicode Collation Algorithm and adds the characters in the latest version of Unicode to its default table.
Support for mathematics in Unicode.
Stronger model for scripts in South Asia and Southeast Asia, improved due to implementers' input in recent years.
A great deal of work on the conformance model for the standard that makes Unicode as a whole more stable and consistent due to better validation of the data and properties provided by the standard.
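Two items in the list above, normalization and internationalized domain names, are easy to demonstrate in a few lines. The sketch below uses Python's standard library as a stand-in. The domain name is a made-up example, and Python's built-in idna codec follows the older IDNA 2003 rules, which is an assumption of this example and not a statement about what Windows Vista implements.

```python
import unicodedata

# The same visible text can be stored as different code point sequences.
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)                                # raw comparison
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # after normalizing

# An internationalized domain name is converted to an ASCII form for DNS.
ascii_form = "b\u00fccher.example".encode("idna")
print(ascii_form)
print(ascii_form.decode("idna"))
```

Normalizing both sides before comparing is what lets two users who typed the same word with different keyboards get a match.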
Because extensibility of language and culture support, as well as consistent Unicode implementation, were important customer requirements for Windows Vista, it was crucial to develop an internationalization standards strategy and implementation plan that provided integration of the most recent version of the Unicode Standard across the platform.
While Microsoft is one of the early adopters of Unicode 5.0, other industry players are also already using this release or plan to upgrade in the near future. Adoption of this latest version of the standard benefits a wide spectrum of people, including developers, ISVs, and end users. All of the new Unicode features beneficial to Microsoft products apply just as significantly to others in the industry. Some obvious technical benefits include more software that leverages compatible UTF-8 conversions and more browsers that add anti-phishing capabilities.
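As one small illustration of why compatible UTF-8 conversion matters, the Python sketch below shows a strict decoder rejecting two kinds of ill-formed input. The helper name decode_strict is hypothetical, and the example relies on Python's own decoder, so it says nothing about how any particular Windows component behaves.

```python
def decode_strict(data):
    """Decode UTF-8 bytes, returning None when the input is ill-formed."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

well_formed = "/".encode("utf-8")   # the single byte 0x2F
overlong = b"\xc0\xaf"              # non-shortest form a lax decoder could read as "/"
lone_surrogate = b"\xed\xa0\x80"    # UTF-8-style encoding of surrogate U+D800

for label, data in [
    ("well-formed", well_formed),
    ("overlong", overlong),
    ("surrogate", lone_surrogate),
]:
    print(label, decode_strict(data))
```

If one component accepted the overlong form as a slash while another checked only for the byte 0x2F, a path or URL filter could be bypassed, which is why consistent strict conversion is a security benefit.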

Conclusion
For Microsoft, the business case for upgrading to Unicode 5.0 was clear and compelling. Other organizations interested in supporting the Unicode effort should consider the many benefits of joining the Unicode Consortium and being actively involved in important projects. Upgrading to Unicode 5.0 provides an even more comprehensive set of characters for languages, but perhaps even more significantly, it makes for a safer and more functional Internet for everyone around the world, including those who never venture far beyond the world of ASCII.

2007-03-27

 
