CentOS 4.8: Perl CGI-based applications and UTF-8

Issues related to software problems
Zoro
Posts: 29
Joined: 2007/12/15 11:05:30
Location: USA

Postby Zoro » 2010/03/27 07:13:15

Dear friends,

We have a new CentOS 4.8 server and have moved all our data and applications to it, including the legacy but top-notch Perl applications that we use daily.

The previous server used charset=ISO-8859-1 and displayed English text correctly. However, our needs today require us to change the server character set to UTF-8 so that our applications can handle several languages on posted web pages and forum HTML messages, since our communities are located in many countries with different languages. Here is an example of the benefits of UTF-8: http://www.columbia.edu/kermit/utf8.html

We have added the following two lines to the configuration file /etc/httpd/conf/vhosts/site1:

SetEnv LANG en_US.UTF-8
AddDefaultCharset utf-8


...and all web pages have these lines within them:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Title</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">


PROBLEM: Some characters do not display properly on pages generated by the CGI applications. For example, a Perl CGI script presents a web form with the fields Title, Subject, and Message and, once submitted, creates an HTML page. That page does not display some characters correctly, such as the curly quotes “ and ” or the long dash —. Text in foreign languages does not display correctly either. For reference, these CGI scripts write their information into text files and HTML pages, not MySQL. MySQL is in use for another database project.
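Symptoms like these often appear when UTF-8 bytes are treated as ISO-8859-1 somewhere along the way and then encoded a second time. The following is only a minimal sketch of that effect, using the core Encode module; the sample string and variable names are made up for illustration and are not taken from our scripts.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for: left curly quote, "quote", right curly quote,
# space, long dash, " dash" (as a browser would submit them).
my $bytes = "\xE2\x80\x9Cquote\xE2\x80\x9D \xE2\x80\x94 dash";

# Wrong: read the bytes as ISO-8859-1 characters, then encode to UTF-8 again.
# Every byte above 0x7F becomes two bytes, which shows up as garbage on the page.
my $double = encode('UTF-8', decode('ISO-8859-1', $bytes));

# Right: decode once on input, work with characters, encode once on output.
my $chars = decode('UTF-8', $bytes);
my $out   = encode('UTF-8', $chars);

printf "input bytes:    %d\n", length $bytes;
printf "characters:     %d\n", length $chars;
printf "double-encoded: %d bytes\n", length $double;
printf "round trip ok:  %s\n", ($out eq $bytes ? 'yes' : 'no');
```

If the double-encoded length is larger than the input length, the data has been encoded twice somewhere between the form and the generated page.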

After researching various websites for a way to get Perl CGI programs to work correctly with UTF-8, we have not yet found one. Perhaps we need to add additional Perl statements, but we are unsure which are effective and efficient. This link was helpful but did not resolve our issues:
http://sites.google.com/site/kbinstuff/ ... ,mod_perla

I appreciate any help from the community.

Zoro

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Postby pschaff » 2010/03/27 13:27:09

OT for your question, which I can't answer, but is there a reason you are using the old CentOS 4.x for a new server install rather than the much newer, more widely used, and better supported CentOS 5? Have you tested your legacy apps on 5.x?

Postby Zoro » 2010/03/27 19:33:56

Thank you for your message and question. The server migration has been a long journey; for the background, please refer to this message posted in these forums: Cobalt Migration Utility (CMU) on CentOS 5.1

In summary, these were our reasons for the selection of CentOS 4.x:

· The CentOS-BlueQuartz-Nuonce 4.8 CD correctly partitioned and formatted hard drives and installed the operating system with the correct paths, matching the previous Cobalt RaQ3i configurations, ideal for legacy application concerns
· Legacy Perl CGI applications work perfectly (except for UTF-8, hence the reason for this message) and are very cozy within CentOS 4.x... we love Perl
· Software RAID-1 works superbly. For example: turn off the server, swap in a new hard drive to replace a failed drive (or a working drive, for backup purposes), and start the server. The new drive is then formatted and prepared for data in the RAID-1 configuration automatically and in a short time, while the server website is online
· Our volunteers were familiar with the BlueQuartz administrative features, but BlueQuartz has not been ported to CentOS 5.x
· Lastly, I am from the old school and believe in using heavily tested, stable environments without the bells and whistles to host important information.

I hope this is helpful for your request and look forward to resolving the last remaining issue: Perl CGI and UTF-8 working together.

Postby Zoro » 2010/04/08 00:24:45

Dear friends,

Although the problem of getting Perl CGI and UTF-8 to work together persists in our CGI applications, I would like to add further notes to this thread in the hope of helping others who encounter these problems in the future. Since the initial message, I have added the following code to the web pages and CGI applications. I hope to find a formula that can display most of the world's languages on the web pages of systems running CentOS with UTF-8 as the server's default character set. Any help will be highly appreciated. Thank you.

Web Forms:

...
   <FORM ACTION="$selfurl" METHOD="post" acceptcharset="utf-8" accept-charset="utf-8">
...
(yes, both accept statements for browser compatibility)


In CGI Programs:

...
#$cgi->charset( "utf-8" );
...
use utf8;   # to state that the script itself is in utf8
...
use as_utf8;  # which is the code posted at PerlMonks: http://www.perlmonks.org/?node_id=651574
...
use Encode;
...
require Encode;
require CGI;
...
binmode STDIN, ":encoding(utf8)";  # will interfere with file uploads
binmode STDOUT, ":encoding(utf8)";
...
print("content-type: text/html; charset=utf-8\n\n");
...
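As the comment on `binmode STDIN` notes, putting an encoding layer on STDIN interferes with file uploads, because uploaded files are binary data. One possible alternative, offered only as a sketch and not as something we have verified on this server, is to leave STDIN raw and decode each submitted form value explicitly. The helper `decode_form` below is hypothetical; it parses a URL-encoded form body by hand so that the example needs nothing beyond the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: parse an application/x-www-form-urlencoded string
# into a hash of decoded (character) values. Repeated keys keep the last value.
sub decode_form {
    my ($raw) = @_;
    my %field;
    for my $pair (split /[&;]/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                              # '+' means a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # percent-escapes -> raw bytes
            $_ = decode('UTF-8', $_);             # raw bytes -> Perl characters
        }
        $field{$name} = $value;
    }
    return \%field;
}

# Example body as a browser might send it: Title in curly quotes,
# Message containing a long dash.
my $form = decode_form('Title=%E2%80%9CHi%E2%80%9D&Message=a+%E2%80%94+b');

binmode STDOUT, ':encoding(UTF-8)';   # encode characters once, on output
print "$_: $form->{$_}\n" for sort keys %$form;
```

With this approach only the text fields are decoded, so any binary upload data read from STDIN is left untouched.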


Helpful Reference Links:

Perl Programming/Unicode UTF-8
Unicode-processing issues in Perl and how to cope with it
perl, UNICODE/utf8, CGI.pm, apache, mod_perl and MySQL
UTF8, MySQL, Perl and PHP
PerlDoc UTF8
PerlUniFaq
W3C - Character Model for the World Wide Web 1.0: Fundamentals

Postby Zoro » 2011/09/14 04:21:30

Dear friends,

After much testing, these are the recommendations for getting Perl CGI and UTF-8 to work together properly.
I hope these additional notes help others during their important migration to UTF-8.


In the server configuration file...

AddDefaultCharset Off
AddCharset utf-8 .html .shtml


On the web page...

<meta http-equiv="content-type" content="text/html; charset=utf-8">


On the web form...

...
<FORM ACTION="$selfurl" METHOD="post" acceptcharset="utf-8" accept-charset="utf-8">
...
(yes, both accept statements for browser compatibility)


In CGI programs...

...
print("content-type: text/html; charset=utf-8\n\n");
...
use utf8; # to state that the script itself is in utf8
...
use Encode;
...
require Encode;
require CGI;
...
binmode STDIN, ":encoding(utf8)"; # will interfere with file uploads
binmode STDOUT, ":encoding(utf8)";
...
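The statements above cover STDIN and STDOUT only. Since these CGI scripts also write text files and HTML pages, the file handles probably need the same treatment. The following is a minimal sketch under that assumption; the message text and the temporary file are placeholders for illustration, not part of our applications.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Placeholder message, held as Perl characters (already decoded from the form):
# curly quotes, German umlauts, a long dash and two Chinese characters.
my $message = "\x{201C}Gr\x{FC}\x{DF}e\x{201D} \x{2014} \x{4F60}\x{597D}";

# A temporary file stands in for the generated .html page.
my ($tmp_fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
close $tmp_fh;

# Writing: the layer encodes characters to UTF-8 bytes.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} "<p>$message</p>\n";
close $out or die "Cannot close $path: $!";

# Reading back: the same layer decodes the bytes to characters again.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
my $line = <$in>;
close $in;
chomp $line;

print "round trip ok\n" if $line eq "<p>$message</p>";
```

Opening every data file with an explicit layer keeps the rule simple: decode on the way in, encode on the way out, and never mix the two.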


Helpful Reference Links:

Perl Programming/Unicode UTF-8
PerlDoc UTF8
UTF8, MySQL, Perl and PHP
perl, UNICODE/utf8, CGI.pm, apache, mod_perl and MySQL