Ticket #321 (closed defect: fixed)

Opened 12 years ago

Last modified 12 years ago

Windows: SpeechSubmission app for German - umlauts not displaying properly

Reported by: kmaclean Owned by: kmaclean
Priority: major Milestone: SpeechSubmission 0.1.8
Component: SpeechSubmission Version: SpeechSubmission0.1.2
Keywords: Cc:

Description

message from Robin:

By the way when I look at the German read page there are a lot of strange symbols in sentences. This happens when there should be letters like ä ü etc. I don't know what causes this, but I am working on a fairly standard Windows installation, so I am probably not alone.

Actually Dutch uses letters like that as well. Though not as often as German and I have avoided them until now in the prompts, because in our dictionary there are not so many of them (might be because of similar reasons).

Change History

comment:1 Changed 12 years ago by kmaclean

  • Summary changed from German read page there are a lot of Windows: German read page has strange symbols in sentences to German read page there are a lot of Windows: SpeechSubmission app for German umlauts not displaying properly

This does not occur on Linux: umlauts display correctly on Fedora Core 6.

Recorded a test and the prompts that are included in the zip file display properly. So it must be in the JLabel code used to display the prompt text on the screen:

// 		############ Prompt1 ####################################        
        JPanel prompt1Panel = new JPanel(); 
        prompt1Panel.setLayout(new FlowLayout(FlowLayout.RIGHT)); 
        JPanel prompt1InnerPanel = new JPanel();
        prompt1InnerPanel.setBorder(BorderFactory.createLineBorder (voxforgeColour, 1));
        prompt1InnerPanel.add(new JLabel(this.prompt1));
        prompt1Panel.add(prompt1InnerPanel);
        play1 = addButton(playButton, prompt1Panel, false);
        play1.setSize(10,10);
        capt1 = addButton(recordButton, prompt1Panel, true);
        prompts.add(prompt1Panel);

comment:3 Changed 12 years ago by kmaclean

A workaround might be to hard-code the prompts in the source rather than read them from a file.

comment:4 Changed 12 years ago by kmaclean

from jGuru:

The default encoding used by locale/encoding sensitive API in the Java libraries is determined by the System property "file.encoding". This system property is initialized by the JVM startup code after querying the underlying native operating system. For example on my English USA NT box it is initialized to:

Cp1252

It is generally recommended that you do not modify it. However if you know what you are doing you could override the system property either on the command line using the -

java -Dfile.encoding=...

syntax or programmatically at startup.

Here is the reference URL for supported encodings -

You may also find Sun's online tutorial helpful:
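To illustrate the jGuru note, here is a minimal sketch of what a mismatched default encoding does to an umlaut. The class and method names are hypothetical, and ISO-8859-1 is used as a stand-in for Cp1252 because every JVM is guaranteed to support it; for the two bytes of a UTF-8 "ä", both charsets decode to the same two characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the SpeechSubmission app.
class DefaultEncodingDemo {

    // Encodes the text as UTF-8 bytes, then decodes those bytes with a
    // different charset, the way a reader using the wrong default would.
    static String decodeUtf8BytesAs(String text, Charset wrongCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrongCharset);
    }

    public static void main(String[] args) {
        // What this JVM picked up from the operating system at startup.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // "\u00e4" is a-umlaut; one character in, two wrong characters out.
        String garbled = decodeUtf8BytesAs("\u00e4", StandardCharsets.ISO_8859_1);
        System.out.println("characters after wrong decode: " + garbled.length());
    }
}
```

The single character ä turns into the two characters Ã and ¤, which matches the "strange symbols" reported on Windows.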

comment:5 Changed 12 years ago by kmaclean

  • Summary changed from German read page there are a lot of Windows: SpeechSubmission app for German umlauts not displaying properly to Windows: SpeechSubmission app for German umlauts not displaying properly

comment:6 Changed 12 years ago by kmaclean

  • Summary changed from Windows: SpeechSubmission app for German umlauts not displaying properly to Windows: SpeechSubmission app for German - umlauts not displaying properly

comment:7 Changed 12 years ago by kmaclean

Attesoro (http://attesoro.org/) - A free, open source, translation editor for Java programs.

Java programs that support internationalization (i18n) usually use resource bundles and keep their translatable Strings in properties files. (See Sun's Java Tutorial on i18n.)

comment:8 Changed 12 years ago by kmaclean

see post from Ralf:

But there is one small thing: on my computer, the German "Sonderzeichen" (ä, ö, ü, ß) aren't displayed correctly. I am using Windows XP professional (English language). It would be better if those graphemes would be displayed correctly by the German speech submission application.

comment:9 Changed 12 years ago by kmaclean

When processing files from the SpeechSubmission app in Perl, the contents of the README file need to be converted to UTF-8, but no change is required for the Prompts or License files.

Code snippet from WebGUIForum.pm:

sub getSubmissionContents {

[...]

	#!!!!!!
	#$content = join(" ",@readme);	
	my $tempContent = join(" ",@readme); # for some reason, README has a different character encoding than Prompts and License???
	$content = encode("utf8", $tempContent);
	# !!!!!!

So it seems the label text within the SpeechSubmission app is encoded in some Java default encoding, whereas the prompt and readme are in UTF-8?

comment:10 Changed 12 years ago by kmaclean

comment:11 Changed 12 years ago by kmaclean

Sun I18n page talks about properties text file:

Properties Files

    A properties file stores information about the characteristics of a program or environment. A properties file is in plain-text format. You can create the file with just about any text editor.

    In the example the properties files store the translatable text of the messages to be displayed. Before the program was internationalized, the English version of this text was hardcoded in the System.out.println statements. The default properties file, which is called MessagesBundle.properties, contains the following lines:

        greetings = Hello
        farewell = Goodbye
        inquiry = How are you?

    Now that the messages are in a properties file, they can be translated into various languages. No changes to the source code are required. The French translator has created a properties file called MessagesBundle_fr_FR.properties, which contains these lines:

        greetings = Bonjour.
        farewell = Au revoir.
        inquiry = Comment allez-vous?
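One caveat if the prompts were moved into properties files: Properties.load(InputStream) assumes ISO-8859-1, so a UTF-8 file with umlauts would be garbled in the same way. A possible sketch, assuming Java 6 or later (where Properties.load(Reader) is available), is to wrap the stream in a reader with an explicit charset. The class name, method name and German strings below are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of the SpeechSubmission app.
class Utf8PropertiesDemo {

    // Loads a properties file whose bytes are UTF-8 rather than ISO-8859-1.
    static Properties loadUtf8(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            Properties props = new Properties();
            props.load(reader);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a MessagesBundle_de_DE.properties file saved as UTF-8.
        String fileContents = "greetings = Gr\u00fc\u00df Gott\nfarewell = Tsch\u00fcss\n";
        InputStream in = new ByteArrayInputStream(fileContents.getBytes(StandardCharsets.UTF_8));

        Properties props = loadUtf8(in);
        System.out.println(props.getProperty("greetings"));
        System.out.println(props.getProperty("farewell"));
    }
}
```

Whether the console shows the umlauts correctly still depends on the terminal encoding, but the String values held in memory are correct.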

comment:12 Changed 12 years ago by kmaclean

comment:13 Changed 12 years ago by kmaclean

In prompts.java, the character set can be specified on the InputStreamReader:

    /**
     * Creates an InputStreamReader that uses the given charset. </p>
     *
     * @param  in       An InputStream
     * @param  cs       A charset
     *
     * @since 1.4
     * @spec JSR-51
     */
    public InputStreamReader(InputStream in, Charset cs) {
        super(in);
	if (cs == null)
	    throw new NullPointerException("charset");
	sd = StreamDecoder.forInputStreamReader(in, this, cs);
    }
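A minimal sketch of how that constructor could be used to read a prompts file. The class name, method name and sample prompt text are hypothetical; only the InputStreamReader(InputStream, Charset) call comes from the JDK source quoted above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the SpeechSubmission app.
class PromptReaderDemo {

    // Reads all lines from the stream, decoding bytes with the given charset
    // instead of the platform default.
    static List<String> readLines(InputStream in, Charset cs) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 prompts file bundled with the application.
        byte[] promptFile = "Gr\u00f6\u00dfe\n\u00dcbung\n".getBytes(StandardCharsets.UTF_8);

        List<String> prompts = readLines(new ByteArrayInputStream(promptFile), StandardCharsets.UTF_8);
        System.out.println("prompts read: " + prompts.size());
    }
}
```

Passing the charset explicitly means the result is the same on Windows (Cp1252 default) and Linux (typically a UTF-8 default).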

comment:14 Changed 12 years ago by kmaclean

The README file in SpeechSubmissionApp is created with FileWriter, whose constructors assume the default character encoding:

/**
 * Convenience class for writing character files.  The constructors of this
 * class assume that the default character encoding and the default byte-buffer
 * size are acceptable.  To specify these values yourself, construct an
 * OutputStreamWriter on a FileOutputStream.
 *
[...]

public class FileWriter extends OutputStreamWriter {

    /**
     * Constructs a FileWriter object given a file name.
     *
     * @param fileName  String The system-dependent filename.
     * @throws IOException  if the named file exists but is a directory rather
     *                  than a regular file, does not exist but cannot be
     *                  created, or cannot be opened for any other reason
     */
    public FileWriter(String fileName) throws IOException {
	super(new FileOutputStream(fileName));
    }
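As the Javadoc above suggests, the encoding can be chosen by constructing an OutputStreamWriter on a FileOutputStream instead of using FileWriter. A rough sketch, with a hypothetical class name, method name and sample text:

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the SpeechSubmission app.
class Utf8WriterDemo {

    // Writes the text to the target stream as UTF-8 and closes the stream,
    // regardless of the platform default encoding.
    static void writeUtf8(OutputStream target, String text) {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(target, StandardCharsets.UTF_8))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the README written by the app.
        File readme = File.createTempFile("readme", ".txt");
        readme.deleteOnExit();

        writeUtf8(new FileOutputStream(readme), "Gr\u00f6\u00dfe\n");
        System.out.println("bytes written: " + readme.length());
    }
}
```

With FileWriter on a Cp1252 system, each umlaut would be written as a single byte; with the explicit UTF-8 writer it is always written as two bytes, so downstream tools can rely on one encoding.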

comment:15 Changed 12 years ago by kmaclean

  • Status changed from new to closed
  • Resolution set to fixed

Prompts files are now opened using UTF-8:

private String [] getPromptTextFile(String File, int numberOfPrompts) {
    String [] words= new String [numberOfPrompts];
	try {
	    InputStream is = getClass().getResourceAsStream(File); 
	    // !!!!!!
	    //InputStreamReader isr = new InputStreamReader(is);
	    InputStreamReader isr = new InputStreamReader(is,"UTF-8" );
		System.err.println(System.getProperty("line.separator") + "PromptList Character Encoding:" + isr.getEncoding());  // doesn't work ????
	    // !!!!!!

All text files created by the speech submission app are now written with UTF-8 encoding:

			//############ ReadMe file#################################### 
			try {
				// !!!!!!
				// BufferedWriter out_readme = new BufferedWriter(new FileWriter(readmeFile));
				BufferedWriter out_readme = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(readmeFile),"UTF-8"));
				// !!!!!!
			//############ License Notice File ####################################    
			try {
				Calendar cal = Calendar.getInstance();
				//int year = cal.get(Calendar.YEAR);
				// !!!!!!
				//BufferedWriter out_licenseNoticeFile = new BufferedWriter(new FileWriter(licenseNoticeFile));
				BufferedWriter out_licenseNoticeFile = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(licenseNoticeFile),"UTF-8"));
				// !!!!!!
			//############ license file ####################################    
			try {
				// !!!!!!
				//BufferedWriter out_licenseFile = new BufferedWriter(new FileWriter(licenseFile));
				BufferedWriter out_licenseFile = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(licenseFile),"UTF-8"));
				// !!!!!!

This prevents Java from decoding the files it reads, or encoding the files it writes, with the user's platform default encoding.
