yidabu 2007-5-26 17:19
UTF-8 / GB2312 encoding conversion tool (D language version)
Original article: http://www.d-programming-language-china.org (2007-05-20)

Converting UTF-8 to GB2312 and GB2312 to UTF-8

D source files are saved as UTF-8, and D's char[] strings are meant to hold UTF-8. The text a D program has to process, however, may be GB2312 or some other encoding. Such text therefore has to be converted to UTF-8 before it is processed, and converted back to its original encoding when it is written out.

Phobos' std.windows.charset contains two functions that can serve as a starting point for GB2312/UTF-8 conversion: fromMBSz and toMBSz. They are designed mainly for calling the Windows API, though, so they work with NUL-terminated char* strings. We adapted them to work with ordinary char[] arrays instead:

```d
// Same imports as std.windows.charset in D1 Phobos.
private import std.c.windows.windows; // WideCharToMultiByte, MultiByteToWideChar, GetLastError
private import std.windows.syserror;  // sysErrorString
private import std.utf;               // toUTF16z, toUTF8

/******************************************
 * Converts the UTF-8 string s into an MB string in a Windows
 * 8-bit character set. Unlike toMBSz, the result is a char[]
 * without a trailing NUL.
 *
 * Params:
 *  s = UTF-8 string to convert.
 *  codePage = the number of the target code page, or
 *      0 - ANSI,
 *      1 - OEM,
 *      2 - Mac
 *
 * Authors:
 *  yaneurao, Walter Bright, Stewart Gordon
 *  D language forum http://www.d-programming-language-china.org
 */
char[] toMBS(char[] s, uint codePage = 0)
{
    // Only need to do this if any chars have the high bit set
    foreach (char c; s)
    {
        if (c >= 0x80)
        {
            char[] result;
            int readLen;
            wchar* ws = std.utf.toUTF16z(s);

            // First call: query the required size (including the NUL).
            result.length = WideCharToMultiByte(codePage, 0, ws, -1, null, 0,
                                                null, null);
            if (result.length)
            {
                // Second call: convert into the buffer.
                readLen = WideCharToMultiByte(codePage, 0, ws, -1, result.ptr,
                                              result.length, null, null);
            }
            if (!readLen || readLen != result.length)
            {
                throw new Exception("Couldn't convert string: " ~
                                    sysErrorString(GetLastError()));
            }
            // Strip the trailing \0. Only this line and the final
            // "return s;" differ from toMBSz.
            return result[0..$-1];
        }
    }
    return s; // string is ASCII, no conversion necessary
}

/**********************************************
 * Converts the MB string s from a Windows 8-bit character set
 * into a UTF-8 char array. Unlike fromMBSz, s is a char[] with an
 * explicit length and need not be NUL-terminated (e.g. file contents).
 *
 * Params:
 *  s = MB string to convert.
 *  codePage = the number of the source code page, or
 *      0 - ANSI,
 *      1 - OEM,
 *      2 - Mac
 *
 * Authors: Stewart Gordon, Walter Bright
 *  D language forum http://www.d-programming-language-china.org
 */
char[] fromMBS(char[] s, int codePage = 0)
{
    foreach (char c; s)
    {
        if (c >= 0x80)
        {
            wchar[] result;
            int readLen;

            // Pass s.length instead of -1: no terminating NUL is read,
            // and the output contains no trailing NUL either.
            result.length = MultiByteToWideChar(codePage, 0, s.ptr, s.length,
                                                null, 0);
            if (result.length)
            {
                readLen = MultiByteToWideChar(codePage, 0, s.ptr, s.length,
                                              result.ptr, result.length);
            }
            if (!readLen || readLen != result.length)
            {
                throw new Exception("Couldn't convert string: " ~
                                    sysErrorString(GetLastError()));
            }
            return std.utf.toUTF8(result);
        }
    }
    return s; // string is ASCII, no conversion necessary
}
```

Usage example:

```d
// (requires "import std.file;" at module level)
// GB2312 -> UTF-8. Code page 936 is Windows' Simplified Chinese code page
// (GBK, a superset of GB2312). srcPath is the path of the source file.
char[] srcContent = fromMBS(cast(char[])std.file.read(srcPath), 936);

// UTF-8 -> GB2312. temContent is the UTF-8 text to be written back.
void[] v = toMBS(temContent, 936);
```

Conversions between UTF-8, UTF-16 and UTF-32 (all encodings of Unicode) are already provided by the D standard library in std.utf (toUTF8, toUTF16, toUTF32), so they are not covered here.

Getting the UTF-8 encoding of specific characters

The following function returns the UTF-8 bytes of the characters in s as a string of \xNN escapes:

```d
import std.string; // format

char[] getPattern(char[] s) // by qiezi
{
    char[] result;
    // foreach with char visits UTF-8 code units, i.e. one byte at a time
    foreach (char c; s)
        result ~= std.string.format("\\x%02x", c); // \x escapes need two hex digits
    return result;
}
```

Reference: The Basics of UTF-8, http://www.codeguru.com/Cpp/misc/misc/multi-lingualsupport/article.php/c10451/

(Last updated: 2007-05-26)
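The post assumes the input is known to be GB2312. When that is not known in advance, a common heuristic is to check whether the bytes already form valid UTF-8 and to convert only when they do not. A minimal sketch for current D2 Phobos (the code in this post targets D1); `looksLikeUtf8` is a hypothetical helper name, and the check is only a heuristic, because some GB2312 byte sequences also happen to be valid UTF-8.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `data` is well-formed UTF-8.
/// Heuristic only: some GB2312 byte sequences are also valid UTF-8.
bool looksLikeUtf8(const(ubyte)[] data)
{
    try
    {
        // validate throws UTFException at the first malformed sequence
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

For example, the GB2312 bytes D6 D0 CE C4 ("中文") are rejected, because D6 announces a two-byte UTF-8 sequence but D0 is not a continuation byte.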
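Both toMBS and fromMBS skip the Windows API call when no byte has its high bit set. This is safe because 7-bit ASCII text is encoded identically in UTF-8 and in code page 936. The same check written for D2 might look like this; `needsConversion` is a hypothetical name.

```d
import std.algorithm.searching : any;
import std.string : representation;

/// Hypothetical helper mirroring the loops in toMBS/fromMBS:
/// returns true if any byte is >= 0x80, i.e. the text is not pure ASCII.
bool needsConversion(const(char)[] s)
{
    // representation views the chars as raw ubytes, so no UTF decoding happens
    return s.representation.any!(b => b >= 0x80);
}
```

Going through `representation` matters here: calling `any` directly on a char array would decode it into dchars rather than inspect individual bytes.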
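The post leaves the std.utf conversions without an example. A minimal sketch for current D2 Phobos, where toUTF16 returns a wstring and toUTF32 a dstring. The helper `codeUnitCounts` is hypothetical and only shows how the lengths of the three forms differ.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

/// Hypothetical helper for illustration: converts UTF-8 text with std.utf
/// and returns its length in UTF-8, UTF-16 and UTF-32 code units.
size_t[3] codeUnitCounts(string s8)
{
    wstring s16 = toUTF16(s8); // UTF-8 -> UTF-16
    dstring s32 = toUTF32(s8); // UTF-8 -> UTF-32: one element per code point
    size_t[3] counts = [s8.length, s16.length, s32.length];
    return counts;
}
```

For "中文" this gives 6, 2 and 2 (each of these characters takes three UTF-8 bytes but one UTF-16 unit). A character outside the Basic Multilingual Plane such as U+1D11E gives 4, 2 and 1, because UTF-16 needs a surrogate pair for it. toUTF8 converts either form back losslessly.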
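For current D2 Phobos, where string literals are immutable(char)[], qiezi's getPattern could be sketched as follows. This is a port of the idea, not the original code. The resulting escapes can be pasted straight into a D string literal, which reproduces the original character.

```d
import std.format : format;

/// D2 sketch of getPattern: returns the UTF-8 code units of s as \xNN escapes.
string getPattern(const(char)[] s)
{
    string result;
    foreach (char c; s)                             // char: iterate bytes, no decoding
        result ~= format("\\x%02x", cast(ubyte) c); // always two hex digits per byte
    return result;
}
```

For example, getPattern("中") yields `\xe4\xb8\xad`, and the D literal "\xe4\xb8\xad" compares equal to "中".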