Subject: catools (caget/caput/camonitor) and locale settings from the environment
From: Goetz Pfeiffer <[email protected]>
To: EPICS tech-talk <[email protected]>
Date: Wed, 5 Nov 2014 10:25:59 +0100
Hello everybody,

When using catools with strings that contain non-ASCII characters, these
characters are always printed or read as octal escape sequences, no matter
what the locale settings are.
Note: In the following text, all command examples and console outputs are
indented by two characters and preceded by a double colon (::). This
convention is taken from the reStructuredText format
(http://docutils.sourceforge.net/rst.html).
In the following example we want to use a character from the ISO-8859-1
character set. Why not simply use Unicode (UTF-8)? The reason is that display
managers like DM2K and EDM do not support Unicode. If we want to display
non-ASCII characters in string fields of records with these display managers,
we must use a character set like ISO-8859-1 (also known as Latin-1).
Here is an example on a Linux host with a UTF-8 locale. First we write the
degree character '°' in ISO-8859-1 encoding to the EGU field of a record::

  > echo "°" | iconv -f UTF-8 -t ISO_8859-1 | xargs caput U49ID8R:AmsTempT1.EGU
When we now read the value::

  > caget U49ID8R:AmsTempT1.EGU

we get::

  U49ID8R:AmsTempT1.EGU \260

The '°' character is printed as the octal escape "\260". This is okay, since
a host system configured for UTF-8 could not display an ISO-8859-1 character
anyway.
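To make the encoding difference concrete: the degree sign is the single byte 0xB0 (octal 260) in ISO-8859-1, but the two bytes 0xC2 0xB0 in UTF-8, so a lone 0xB0 is not a valid character on a UTF-8 terminal. The following is a minimal sketch of the conversion, assuming the input really is ISO-8859-1; the function name latin1_to_utf8 is made up for this illustration and is not part of EPICS base.

```c
#include <stddef.h>

/* Hypothetical helper (not from EPICS base): convert len bytes of
 * ISO-8859-1 text to UTF-8. dst must hold at least 2*len + 1 bytes.
 * Returns the number of bytes written, excluding the terminator. */
static size_t latin1_to_utf8(char *dst, const unsigned char *src, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII range is identical in both encodings */
            dst[out++] = (char)c;
        } else {
            /* 0x80..0xFF map to code points U+0080..U+00FF,
             * which UTF-8 encodes as two bytes */
            dst[out++] = (char)(0xC0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        }
    }
    dst[out] = '\0';
    return out;
}
```

Feeding the single byte 0xB0 through this function yields the byte pair 0xC2 0xB0, which is what a UTF-8 terminal needs in order to show '°'.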
This is our locale::

  > locale
  LANG=en_US.UTF-8
  LC_CTYPE="en_US.UTF-8"
  LC_NUMERIC="en_US.UTF-8"
  LC_TIME="en_US.UTF-8"
  LC_COLLATE="en_US.UTF-8"
  LC_MONETARY="en_US.UTF-8"
  LC_MESSAGES="en_US.UTF-8"
  LC_PAPER="en_US.UTF-8"
  LC_NAME="en_US.UTF-8"
  LC_ADDRESS="en_US.UTF-8"
  LC_TELEPHONE="en_US.UTF-8"
  LC_MEASUREMENT="en_US.UTF-8"
  LC_IDENTIFICATION="en_US.UTF-8"
  LC_ALL=
Now we change the locale to ISO-8859-1::

  > export LC_ALL=de_DE.iso88591
  > locale
  LANG=en_US.UTF-8
  LC_CTYPE="de_DE.iso88591"
  LC_NUMERIC="de_DE.iso88591"
  LC_TIME="de_DE.iso88591"
  LC_COLLATE="de_DE.iso88591"
  LC_MONETARY="de_DE.iso88591"
  LC_MESSAGES="de_DE.iso88591"
  LC_PAPER="de_DE.iso88591"
  LC_NAME="de_DE.iso88591"
  LC_ADDRESS="de_DE.iso88591"
  LC_TELEPHONE="de_DE.iso88591"
  LC_MEASUREMENT="de_DE.iso88591"
  LC_IDENTIFICATION="de_DE.iso88591"
  LC_ALL=de_DE.iso88591
Now we call caget again::

  > caget U49ID8R:AmsTempT1.EGU
  U49ID8R:AmsTempT1.EGU \260
The character is still printed as an octal escape, although our locale
settings (LC_ALL) define it as a printable character. caget uses the function
epicsStrnEscapedFromRaw() from libCom in EPICS base to convert a string to a
printable form. This function calls isprint() to determine which characters
are printable. Because caget never calls setlocale(), it runs in the default
"C" locale, so the locale settings from the environment are ignored.
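The effect can be reproduced outside of EPICS with a few lines of C. The sketch below is a simplified stand-in, not the libCom implementation: escape_nonprintable is a made-up name, and unlike a full escaping routine it does not treat the backslash or control characters such as newline specially. It only shows how an isprint() test decides between copying a byte and emitting a three-digit octal escape.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the EPICS function): copy len bytes from src
 * to dst, replacing every byte that isprint() rejects by a backslash
 * followed by three octal digits. Output is truncated at a whole
 * character if dst is too small. Returns the number of characters
 * written, excluding the terminator. */
static size_t escape_nonprintable(char *dst, size_t dstlen,
                                  const char *src, size_t len)
{
    size_t out = 0;
    if (dstlen == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        /* isprint() needs a value in the unsigned char range */
        unsigned char c = (unsigned char)src[i];
        if (isprint(c)) {
            if (out + 1 >= dstlen)
                break;              /* no room for char + terminator */
            dst[out++] = (char)c;
        } else {
            if (out + 4 >= dstlen)
                break;              /* no room for escape + terminator */
            snprintf(dst + out, dstlen - out, "\\%03o", (unsigned)c);
            out += 4;
        }
    }
    dst[out] = '\0';
    return out;
}
```

In a program that has not called setlocale(), isprint() follows the "C" locale, so the byte 0xB0 is escaped as \260 regardless of LC_ALL in the environment.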
Using locale settings from the environment in C is simple. The C program
must have this include::

  #include <locale.h>

And it has to call setlocale() like this::

  setlocale(LC_ALL, "");
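One point worth keeping in mind: setlocale(LC_ALL, "") switches every locale category, including LC_NUMERIC, so with a de_DE locale printf() would also start printing a decimal comma. If only the printable-character test is of interest, a narrower variant is to set LC_CTYPE alone. The sketch below shows that variant; it is an alternative for illustration, not what the patch in this mail does, and the two function names are made up.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: report whether byte c counts as printable
 * under the locale currently in effect. */
static int byte_is_printable(unsigned char c)
{
    return isprint(c) != 0;
}

/* Hypothetical helper: adopt only the character-classification part
 * (LC_CTYPE) of the locale named by the environment. Returns the name
 * of the locale now in effect, or NULL if the request could not be
 * honored, e.g. because the named locale is not installed. */
static const char *use_env_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}
```

Calling use_env_ctype() once at program start, before any isprint() test, would be enough. With LC_ALL=de_DE.iso88591 in the environment and that locale installed, byte_is_printable(0xB0) should then report true, while number formatting stays in the "C" locale.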
Here is, as an example, my patch of caget.c in EPICS base:
---------------------------------
--- caget.c.old 2014-11-05 09:31:48.010589013 +0100
+++ caget.c 2014-11-05 09:43:28.611042679 +0100
@@ -28,6 +28,7 @@
 #include <stdio.h>
 #include <string.h>
+#include <locale.h>
 #include <epicsStdlib.h>
 #include <epicsString.h>
@@ -59,6 +60,10 @@
 " -w <sec>: Wait time, specifies CA timeout, default is %f second(s)\n"
 " -c: Asynchronous get (use ca_get_callback and wait for completion)\n"
 " -p <prio>: CA priority (0-%u, default 0=lowest)\n"
+ "Locale:\n"
+ " -L: use locale according to environment variables in order to\n"
+ " determine what characters are printable. Non printable characters\n"
+ " are shown as 3 digit octal numbers preceded by a backslash\n"
 "Format options:\n"
 " Default output format is \"name value\"\n"
 " -t: Terse mode - print only value, without name\n"
@@ -389,11 +394,14 @@
 LINE_BUFFER(stdout); /* Configure stdout buffering */
- while ((opt = getopt(argc, argv, ":taicnhsSe:f:g:l:#:d:0:w:p:F:")) != -1) {
+ while ((opt = getopt(argc, argv, ":taicnhLsSe:f:g:l:#:d:0:w:p:F:")) != -1) {
 switch (opt) {
 case 'h': /* Print usage */
 usage();
 return 0;
+ case 'L': /* use environment locale settings */
+ setlocale(LC_ALL, "");
+ break;
 case 't': /* Terse output mode */
 complainIfNotPlainAndSet(&format, terse);
 break;
---------------------------------
With these changes, the new option "-L" causes caget to use the locale
settings from the environment. Here is an example of how to use it::

  > export LC_ALL=de_DE.iso88591
  > caget -L U49ID8R:AmsTempT1.EGU
  U49ID8R:AmsTempT1.EGU °

If the encoding of the terminal emulator (xterm, konsole etc.) is also set to
ISO-8859-1 (Latin-1), the "°" character is now displayed correctly.
Maybe we could add support for locale settings from the environment to all
catools programs and possibly the IOC shell. I would propose an option "-L"
that enables this feature. What is your opinion?
Greetings,
Goetz Pfeiffer