I need to replace special characters with their equivalent hexadecimal Unicode values on a Linux or Unix-like operating system. How do I list or look up the Unicode code points for given characters?
You need to use the unum program, which is written in Perl. From the man page:
It is a command line utility which allows you to convert decimal, octal, hexadecimal, and binary numbers; Unicode character and block names; and HTML/XHTML character entity names into one another. It can be used as an on-line special character reference for Web authors. This program is written in portable Perl and allows you to look up Unicode and HTML characters by name or number, and to interconvert numbers in decimal, hexadecimal, and octal bases.
Use the unum program to insert special characters into a document or a text field. This is useful for characters that are not available on your keyboard.
Download and install the unum program
Create a directory for the program, then download the archive with the wget command:
$ [ ! -d ~/bin/perl ] && mkdir -p ~/bin/perl
$ cd ~/bin/perl
$ wget http://www.fourmilab.ch/webtools/unum/download/unum.tar.gz
Untar unum.tar.gz using the tar command:
$ tar xvf unum.tar.gz
Use the ln command to create a symbolic link, then add the directory to your PATH:
$ ln -s unum.pl unum
$ export PATH=$PATH:$HOME/bin:$HOME/bin/perl
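The export above only affects the current shell session, and running it again appends the same directories a second time. A minimal sketch of a guard follows; the function name path_add is hypothetical, and placing it in a startup file such as ~/.profile is an assumption, since the steps above do not say where to persist the setting.

```shell
# Append a directory to PATH only if it is not already listed.
# path_add is a hypothetical helper name, not part of unum.
path_add() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$PATH:$1" ;;        # otherwise append it
    esac
}

path_add "$HOME/bin"
path_add "$HOME/bin/perl"
export PATH
```

Because the check wraps PATH in colons, it matches whole entries only, so a directory such as $HOME/bin does not count as present merely because $HOME/bin/perl is.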
How do I use the unum program?
The syntax is:
unum arg
unum query
unum character
unum a
unum 9
Please note that all name queries are case-insensitive and accept regular expressions. Be sure to quote regular expressions if they contain characters with meaning to the shell.
To look up the Unicode details for the character ‘d’, run:
$ unum d
   Octal  Decimal      Hex        HTML    Character   Unicode
    0144      100     0x64      &#100;    "d"         LATIN SMALL LETTER D
To look up each character in the string ‘abc’ (non-digit characters), enter:
$ unum abc
   Octal  Decimal      Hex        HTML    Character   Unicode
    0141       97     0x61       &#97;    "a"         LATIN SMALL LETTER A
    0142       98     0x62       &#98;    "b"         LATIN SMALL LETTER B
    0143       99     0x63       &#99;    "c"         LATIN SMALL LETTER C
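The numeric columns in the output above can be cross-checked without unum. This is only a sketch: it relies on the POSIX printf rule that an argument starting with a single quote is converted to the numeric value of the character that follows. The function name char_codes is hypothetical, and ASCII input is assumed, because results for non-ASCII characters depend on the shell and locale.

```shell
# Print the octal, decimal, and hex codes of a single ASCII character.
# char_codes is a hypothetical helper, not part of unum.
char_codes() {
    # "'$1" (leading single quote) makes printf use the character's code
    printf '0%o %d 0x%X\n' "'$1" "'$1" "'$1"
}

char_codes d    # prints: 0144 100 0x64
```

Unlike unum, this does not report the Unicode character name or the HTML entity; it only confirms the numbers.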
## arg ##            ## Description ##
147                  Decimal number
0371                 Octal number
0xfa75               Hexadecimal number (letters may be A-F or a-f)
0b11010011           Binary number
'&#8747;&#x3c0;'     One or more XHTML numeric entities (hex or decimal)
xyz                  The characters xyz (non-digit)
c=7Y                 The characters 7Y (any Unicode characters)
b=cherokee           List Unicode blocks containing "CHEROKEE"
h=alpha              List XHTML entities containing "alpha"
n=aggravation        Unicode characters with "AGGRAVATION" in the name
n=^greek.*rho        Unicode characters beginning with "GREEK" and containing "RHO"
l=gothic             List all characters in matching Unicode blocks
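A few of the argument forms from the table, as they might be run in sequence. This sketch assumes unum is already on your PATH; the wrapper function run_unum_examples is hypothetical and exists only to fail cleanly when unum is missing. The regular expression is quoted so the shell does not interpret ^ or *.

```shell
# Run several unum queries taken from the argument table.
# run_unum_examples is a hypothetical wrapper, not part of unum.
run_unum_examples() {
    if ! command -v unum >/dev/null 2>&1; then
        echo "unum not found on PATH" >&2
        return 1
    fi
    unum 147                # decimal number
    unum 0xfa75             # hexadecimal number
    unum 'b=cherokee'       # Unicode blocks containing "CHEROKEE"
    unum 'n=^greek.*rho'    # quoted regular expression on character names
}
```

Call run_unum_examples after installation to confirm that both numeric and name queries work.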
A note about GUI programs
You can use the gucharmap GUI tool, which lets you browse all the available Unicode characters and categories for the installed fonts and examine their detailed properties. You can start this app from the Applications menu:
Applications menu ▸ Choose Accessories ▸ Character Map
Or, execute the following command:
$ gucharmap
To display detailed information about a character, perform the following steps:
- Select a character set from the Script or Unicode Block list box. Example: Basic Latin
- Select a character from the Character Table tabbed section. Example: @
- Click on the Character Details tabbed section.
A note for KDE users
On the KDE desktop, use the KCharSelect utility. KCharSelect is a tool for selecting special characters from all installed fonts and copying them to the clipboard.
A note for Mac OS X users
On Mac OS X, use the Character Viewer application.
Check out related media
- unum home page.