JCharset - Java Charset package
Download
Download the latest release: JCharset 1.3 (includes source code, 65K)
What is the Java Charset package?
The Java Charset package is an open-source implementation of character sets that were missing from the standard Java platform.
How do I use the Java Charset package?
The Java Charset package is written in pure Java and requires no special installation. Just add the "jcharset.jar" file to your classpath, or place it in any of the usual extension directories. The JVM will then recognize the supported character sets automatically, and they will be available anywhere character sets are used in the Java platform.
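A quick way to confirm that the JVM has picked up the jar is to ask java.nio.charset.Charset whether the names listed on this page resolve. The sketch below is a hypothetical helper (CharsetCheck is not part of JCharset); only Charset.isSupported, Charset.forName, name() and aliases() are standard JDK calls. Depending on the JRE version, some of these names may also be provided by the JRE itself, so a positive result does not by itself prove the jar is on the classpath.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: reports whether charset names resolve in the running JVM. */
public class CharsetCheck {
    // A few of the names listed on this page.
    static final String[] NAMES = {"UTF-7", "SCGSM", "hp-roman8", "ISO-8859-8-I", "KOI8-U"};

    /** Returns true if the JVM can resolve the name; false if unknown or syntactically illegal. */
    public static boolean available(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) { // includes IllegalCharsetNameException
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : NAMES) {
            if (available(name)) {
                Charset cs = Charset.forName(name);
                // Print the canonical name and the aliases the provider registered.
                System.out.println(name + " -> " + cs.name() + " " + cs.aliases());
            } else {
                System.out.println(name + " -> not found");
            }
        }
    }
}
```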
For example, java.lang.String's constructor and its getBytes() method both have overloaded versions that accept a charset name as an argument.
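The sketch below exercises both overloads, String.getBytes(String) and the String(byte[], String) constructor. "UTF-7" only resolves when jcharset.jar (or another provider of that charset) is on the classpath, so main checks first instead of letting the call throw UnsupportedEncodingException. StringCharsetDemo and roundTrip are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class StringCharsetDemo {
    /** Encodes text with the named charset, then decodes the bytes back with the same charset. */
    public static String roundTrip(String text, String charsetName)
            throws UnsupportedEncodingException {
        byte[] encoded = text.getBytes(charsetName);   // String.getBytes(String charsetName)
        return new String(encoded, charsetName);       // String(byte[] bytes, String charsetName)
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "Hi Mom -\u263A-!";
        if (Charset.isSupported("UTF-7")) {
            byte[] utf7 = text.getBytes("UTF-7");
            // UTF-7 output is pure 7-bit ASCII, so it can be printed as US-ASCII.
            System.out.println(new String(utf7, "US-ASCII"));
            System.out.println(roundTrip(text, "UTF-7").equals(text));
        } else {
            System.out.println("UTF-7 not found; is jcharset.jar on the classpath?");
        }
    }
}
```

The exact UTF-7 bytes are not shown here, since RFC 2152 leaves some encoder choices open.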
Note: Some web/mail containers run each application in its own class-loading context. In this case, check the container documentation for where and how to configure the classpath, for example WEB-INF/lib, shared/lib or jre/lib/ext. You may need to restart the server for the changes to take effect. However, if you use Sun's JRE, the package will work only if you put it in the jre/lib/ext extension directory or in the container's own classpath. This is due to a bug in Sun's JRE implementation (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4619777). Voting for the bug may help get it fixed sooner, so please do.
Which charsets are supported?
"UTF-7" (a.k.a. "UTF7", "UNICODE-1-1-UTF-7", "csUnicode11UTF7", "UNICODE-2-0-UTF-7")
The 7-bit Unicode character encoding defined in RFC 2152. The O-set characters are encoded as a shift sequence. Both O-set flavors (direct and shifted) are decoded.
"UTF-7-OPTIONAL" (a.k.a. "UTF-7O", "UTF7O", "UTF-7-O")
The 7-bit Unicode character encoding defined in RFC 2152. The O-set characters are directly encoded. Both O-set flavors (direct and shifted) are decoded.
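The only difference between the "UTF-7" and "UTF-7-OPTIONAL" encoders is how O-set characters are written. The sketch below is a simplified RFC 2152 encoder written for illustration, not JCharset's code: Set D and whitespace are always written directly, "+" becomes "+-", and everything else is collected into a run of UTF-16 code units written as "+" followed by base64 without padding. The directO flag chooses between the two flavors. As a simplifying assumption it always closes a run with "-", which RFC 2152 permits but does not require, so its bytes may differ from JCharset's output while still being valid UTF-7.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Utf7Sketch {
    // RFC 2152 Set D: always encoded directly.
    private static final String SET_D =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?";
    // RFC 2152 Set O: optionally direct characters.
    private static final String SET_O = "!\"#$%&*;<=>@[]^_`{|}";
    // Space, tab, CR and LF are also written directly.
    private static final String WHITESPACE = " \t\r\n";

    /** Encodes text as UTF-7; directO=true writes O-set characters directly (the "optional" flavor). */
    public static String encode(String text, boolean directO) {
        StringBuilder out = new StringBuilder();
        StringBuilder pending = new StringBuilder(); // characters waiting for a shifted run
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean direct = SET_D.indexOf(c) >= 0 || WHITESPACE.indexOf(c) >= 0
                || (directO && SET_O.indexOf(c) >= 0);
            if (direct) {
                flush(pending, out);
                out.append(c);
            } else if (c == '+') {
                flush(pending, out);
                out.append("+-");                    // a literal plus sign
            } else {
                pending.append(c);
            }
        }
        flush(pending, out);
        return out.toString();
    }

    /** Writes pending chars as '+', unpadded base64 of their UTF-16BE bytes, and a closing '-'. */
    private static void flush(StringBuilder pending, StringBuilder out) {
        if (pending.length() == 0) return;
        byte[] utf16 = pending.toString().getBytes(StandardCharsets.UTF_16BE);
        out.append('+')
           .append(Base64.getEncoder().withoutPadding().encodeToString(utf16))
           .append('-');
        pending.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(encode("Hi Mom -\u263A-!", false)); // prints Hi Mom -+Jjo--+ACE-
        System.out.println(encode("Hi Mom -\u263A-!", true));  // prints Hi Mom -+Jjo--!
    }
}
```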
"SCGSM" (a.k.a. "GSM-default-alphabet", "GSM_0338", "GSM_DEFAULT", "GSM7", "GSM-7BIT")
The GSM default charset as specified in GSM 03.38, used in SMPP for encoding SMS text messages.
Additional flavors of the GSM charset are "CCGSM", "SCPGSM" and "CCPGSM":
The CC prefix signifies that the Latin capital letter C with cedilla is mapped, the SC prefix signifies that the Latin small letter c with cedilla is mapped, and the P signifies the packed form (8 characters packed in 7 bytes), as specified in GSM 03.38. See the javadocs for details.
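The packed form fits 8 characters into 7 bytes because every GSM character code is a 7-bit septet. A minimal sketch of that bit packing, assuming the characters have already been mapped to septet values (for plain letters, digits and space the GSM code equals the ASCII code), could look like the following. GsmPacking is a hypothetical class written for illustration, not JCharset's implementation. Septets are packed least significant bit first, and unpacking needs the septet count because the last octet's spare bits are ambiguous.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class GsmPacking {
    /** Packs 7-bit septets into octets, least significant bits first. */
    public static byte[] pack(byte[] septets) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int acc = 0;  // bit accumulator
        int bits = 0; // number of valid bits in acc
        for (byte s : septets) {
            acc |= (s & 0x7F) << bits;
            bits += 7;
            while (bits >= 8) {           // emit every complete octet
                out.write(acc & 0xFF);
                acc >>>= 8;
                bits -= 8;
            }
        }
        if (bits > 0) out.write(acc & 0xFF); // leftover bits, zero-padded
        return out.toByteArray();
    }

    /** Unpacks the first count septets from packed octets. */
    public static byte[] unpack(byte[] octets, int count) {
        byte[] septets = new byte[count];
        int acc = 0, bits = 0, n = 0;
        for (byte o : octets) {
            if (n == count) break;
            acc |= (o & 0xFF) << bits;
            bits += 8;
            while (bits >= 7 && n < count) { // extract every complete septet
                septets[n++] = (byte) (acc & 0x7F);
                acc >>>= 7;
                bits -= 7;
            }
        }
        return septets;
    }

    public static void main(String[] args) {
        // Lowercase letters have the same codes in GSM 03.38 and ASCII.
        byte[] packed = pack("hellohello".getBytes(StandardCharsets.US_ASCII));
        StringBuilder hex = new StringBuilder();
        for (byte b : packed) hex.append(String.format("%02X ", b & 0xFF));
        System.out.println(hex.toString().trim()); // prints E8 32 9B FD 46 97 D9 EC 37
    }
}
```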
"hp-roman8" (a.k.a. "roman8", "r8", "csHPRoman8", "X-roman8")
The HP Roman-8 charset, as provided in RFC 1345.
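HP Roman-8 is a single-byte charset, so it can be described entirely by a 256-entry byte-to-character table (the release notes mention a ByteLookupCharset class for this kind of charset). The sketch below shows, under that assumption, how a table-driven charset can be built on the standard java.nio.charset classes (Charset, CharsetDecoder, CharsetEncoder). TableCharset is a hypothetical name and not JCharset's ByteLookupCharset API, and the demo table is made up; it is not the real Roman-8 mapping. For the JVM to find such a charset by name, it would also have to be exposed through a java.nio.charset.spi.CharsetProvider registered in META-INF/services.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical table-driven single-byte charset (not JCharset's ByteLookupCharset). */
public class TableCharset extends Charset {
    public static final char UNMAPPED = '\uFFFD'; // marks bytes with no character

    private final char[] byteToChar; // 256 entries, indexed by unsigned byte value
    private final Map<Character, Byte> charToByte = new HashMap<>();

    public TableCharset(String name, String[] aliases, char[] table) {
        super(name, aliases);
        if (table.length != 256) throw new IllegalArgumentException("table must have 256 entries");
        byteToChar = table.clone();
        for (int b = 0; b < 256; b++) {
            if (byteToChar[b] != UNMAPPED) charToByte.putIfAbsent(byteToChar[b], (byte) b);
        }
    }

    @Override
    public boolean contains(Charset cs) {
        return cs.equals(this);
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    char c = byteToChar[in.get(in.position()) & 0xFF]; // peek without consuming
                    if (c == UNMAPPED) return CoderResult.unmappableForLength(1);
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    in.position(in.position() + 1);
                    out.put(c);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override
    public CharsetEncoder newEncoder() {
        // The default replacement byte '?' must decode through this table,
        // otherwise the CharsetEncoder constructor rejects it.
        return new CharsetEncoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                while (in.hasRemaining()) {
                    Byte b = charToByte.get(in.get(in.position())); // peek without consuming
                    if (b == null) return CoderResult.unmappableForLength(1);
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    in.position(in.position() + 1);
                    out.put(b.byteValue());
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    public static void main(String[] args) {
        // Demo table: ASCII in the lower half, one made-up mapping in the upper half.
        char[] table = new char[256];
        for (int i = 0; i < 256; i++) table[i] = i < 128 ? (char) i : UNMAPPED;
        table[0x80] = '\u00E9';
        Charset demo = new TableCharset("X-DEMO-TABLE", new String[0], table);
        byte[] bytes = "caf\u00E9".getBytes(demo);
        System.out.println(bytes.length);                                // prints 4
        System.out.println(new String(bytes, demo).equals("caf\u00E9")); // prints true
    }
}
```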
"ISO-8859-8-BIDI" (a.k.a. "csISO88598I", "ISO-8859-8-I", "ISO_8859-8-I",
"csISO88598E", "ISO-8859-8-E", "ISO_8859-8-E")
The ISO 8859-8 charset implementation exists in the standard JRE. However, it lacks the i/e aliases, which specify whether bidirectionality is implicit or explicit. The character conversions themselves are similar. This charset complements the standard one.
"ISO-8859-6-BIDI" (a.k.a. "csISO88596I", "ISO-8859-6-I", "ISO_8859-6-I",
"csISO88596E", "ISO-8859-6-E", "ISO_8859-6-E")
The ISO 8859-6 charset implementation exists in the standard JRE. However, it lacks the i/e aliases, which specify whether bidirectionality is implicit or explicit. The character conversions themselves are similar. This charset complements the standard one.
"KOI8-U" (a.k.a. "KOI8-RU")
The KOI8-U Ukrainian charset, as defined in RFC 2319.
What's New?
In version 1.3:
- Added X-roman8 as an hp-roman8 alias.
- Added the generic EscapedByteLookupCharset to simplify implementation of single-escape-byte charsets.
- Created two flavors of the GSM charset: CCGSMCharset (mapping the Latin capital letter C with cedilla) and SCGSMCharset (mapping the Latin small letter c with cedilla). See javadocs for details.
- Added support for Packed GSM charset, with the two flavors as well.
- Renamed the canonical charset names of the new GSM family to make the flavor choice explicit.
In version 1.2.1:
- Fixed a combined JavaMail-JCharset bug that could cause an infinite loop on some inputs.
- Updated the ISO-8859-8-i/e mapping for the MACRON character. The incorrect mapping in the JDK's implementation of ISO-8859-8 was fixed in JDK 1.5 (see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4760496). We now detect the running JDK version and use the correct mapping on JDK 1.5 or higher, so we remain consistent with the running JDK's own ISO-8859-8 implementation.
In version 1.2:
- Added KOI8-U charset.
In version 1.1:
- Added ByteLookupCharset class to simplify implementation of single byte charsets.
- Added GSM-default-alphabet charset (used in SMPP).
- Added hp-roman8 charset.
- Added ISO-8859-8-i/e charset.
- Added ISO-8859-6-i/e charset.
In version 1.0:
- This is the first release of the Java Charset package.
License
The JCharset Package is provided under the GNU General Public License (GPL). For non-GPL commercial licensing, please contact the author.
Donate
Please help support this project by making a donation. These donations are not meant to make the author rich, but to try and offset the costs of creating and maintaining the project. Any amount will help!
Contact
You can contact the author via e-mail at:
Please write in to report bugs, problems, suggestions, ideas, questions, answers and source code queries, and especially just to let me know you've found the JCharset Package useful. Getting feedback will encourage me to continue development and add some advanced features I have in mind...
For updates and additional information, you can always visit the website at: