Maori Language Characters?

Howdy Folks,

Anyone know the octal codes for Māori language characters, i.e. a vowel with a line over the top…

Thanks!!

-jose

Hi @jocabo,

Are you using GMT or PyGMT?

An overview of the octal codes for characters supported by GMT or PyGMT can be found in 11. Chart of Octal Codes for Characters — GMT 6.5.0 documentation or Supported Encodings and Non-ASCII Characters — PyGMT. However, I cannot find the desired letter in these tables. Reading Māori language - Wikipedia, it seems there is no octal code for this letter. But maybe there is an ISO/IEC 8859 code (see Supported Encodings and Non-ASCII Characters — PyGMT).
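If you want to check this without reading code tables, here is a minimal Python sketch (Python because PyGMT comes up in this thread). It asks Python's built-in codecs which ISO-8859 parts can encode all ten Māori macron vowels. It only checks the parts GMT accepts for PS_CHAR_ENCODING (1-11 and 13-16). The helper name `iso8859_parts_with` is hypothetical. Also, an encoding that contains a character does not guarantee that a given PostScript font has the glyph for it.

```python
# Minimal sketch: which ISO-8859 parts can encode every Maori macron vowel?
# Uses only Python's built-in codecs; says nothing about font glyph coverage.
MACRON_VOWELS = "ĀāĒēĪīŌōŪū"

# ISO-8859 parts GMT accepts for PS_CHAR_ENCODING (there is no ISO-8859-12).
GMT_ISO8859_PARTS = [n for n in range(1, 17) if n != 12]


def iso8859_parts_with(chars: str) -> list[int]:
    """Return the ISO-8859 part numbers whose codec can encode all of `chars`."""
    found = []
    for n in GMT_ISO8859_PARTS:
        try:
            chars.encode(f"iso8859_{n}")
        except UnicodeEncodeError:
            continue  # at least one character is missing from this part
        found.append(n)
    return found


print(iso8859_parts_with(MACRON_VOWELS))
```

ISO-8859-13 should appear in the output. ISO-8859-1 (Latin-1) should not, which fits with the letter being absent from the default octal chart.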

Another option would be to try creating the letter with @!, which combines two characters (please see text — GMT 6.5.0 documentation): @!\255a (the octal code \255 gives a high dash).

In PyGMT it’s possible to use the letter directly:

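```python
import pygmt

size = 1

fig = pygmt.Figure()
# A small square frame centred on (0, 0), just to hold the text
fig.basemap(region=[-size, size] * 2, projection=f"X{size * 2}c", frame=0)

# Non-ASCII characters such as "ā" can be passed directly; PyGMT handles the encoding
fig.text(x=0, y=0, text="ā")

fig.show()
```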


Hi @jocabo ,

What you want are macrons over the vowels. These are not available in the character sets that come with GMT.

You can combine characters as @yvonnefroehlich suggested. Another option might be to insert LaTeX (see 13. Using LaTeX Expressions in GMT — GMT 6.5.0 documentation), since LaTeX can typeset macrons.

Tongan (like many other Pacific languages) has similar issues. Many, many years ago I provided the TO language configuration for GMT, and from memory I ran into this then. It would be nice if GMT supported Unicode, but I imagine that would take a lot of work to implement.


It surely would be nice. But the biggest problem, I guess, would be convincing Adobe to add Unicode support to PostScript.


Thanks for the responses everyone…! (@timhume @Joaquim @yvonnefroehlich)

I’m not on PyGMT; I’m just an old dinosaur using old-school GMT who lives in perpetual fear of every update…!

Anyway, @yvonnefroehlich 's suggestion for the @! trick with \255 worked!

I was making a map for a paper my wife is co-authoring in an indigenous studies journal, so it was important to get this detail right …

Thanks again, y’all are awesome!

-jose

:slight_smile:

Still doing tsunamis?


tsunamis and other coastal hazards all the way… (to my grave…)

Revisiting this for something I needed, I found the character compositing feature is not perfect, but still passable. I wrote a shell function to convert selected Unicode characters in a string (specifically colons, macrons over vowels and the fakauʻa) to GMT octal codes.

```bash
#
# Convert special characters in a string to GMT octal codes.
#
function gmtstr(){
	echo "${1}" | sed -e 's/:/\\072/g' \
		-e 's/ʻ/\\140/g' \
		-e 's/ā/@!\\257a/g' \
		-e 's/Ā/@!\\257A/g' \
		-e 's/ō/@!\\257o/g' \
		-e 's/Ō/@!\\257O/g' \
		-e 's/ē/@!\\257e/g' \
		-e 's/Ē/@!\\257E/g' \
		-e 's/ū/@!\\257u/g' \
		-e 's/Ū/@!\\257U/g' \
		-e 's/ī/@!i\\257/g' \
		-e 's/Ī/@!\\257I/g'
}
```

This can be extended if you need other characters that can be built by compositing. Wherever you have a string containing these Unicode characters, you can do things like this:

```bash
$(gmtstr "Mālō e lelei")
```

Here is a map I’m working on that demonstrates this (the rest of the map’s code is not ready for prime-time viewing). Some problems are obvious straight away: the combined macrons and upper-case vowels overlap each other. But it’s better than nothing.

Also, the width of a combined character is set by the width of the second character in the combination. That’s why it’s important to specify @!\257U instead of @!U\257: the letters tend to be wider than the macron. The exception is ī, which is specified with @!i\257 because the lower-case i is so narrow. Using a monospaced font such as Courier, Courier-Bold or Courier-Oblique might also get around this width problem.
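The compositing rule just described (macron first, except for the narrow lower-case i) can also be sketched in Python, for instance to prepare labels for PyGMT. This is a hypothetical port of the shell function above and is not part of GMT or PyGMT. It assumes the default ISOLatin1+ encoding, where \257 is the macron used in that function. It uses `str.translate` in place of a chain of sed substitutions.

```python
# Hypothetical Python port of the compositing gmtstr function above.
# Assumes GMT's default ISOLatin1+ encoding, where \257 is the macron.
MACRON = "\\257"

# Macron vowels and the plain base letters they are composited from.
BASES = {"ā": "a", "Ā": "A", "ē": "e", "Ē": "E", "ī": "i", "Ī": "I",
         "ō": "o", "Ō": "O", "ū": "u", "Ū": "U"}


def composite(base: str) -> str:
    """Return a GMT @! code that puts a macron over `base`.

    GMT takes the width from the second character, so the wider glyph goes
    second: the macron first for most vowels, but the narrow lower-case i first.
    """
    if base == "i":
        return "@!" + base + MACRON
    return "@!" + MACRON + base


def gmtstr_composite(text: str) -> str:
    """Replace colons, the fakaua (U+02BB) and macron vowels with GMT codes."""
    table = {":": "\\072", "\u02bb": "\\140"}
    table.update({ch: composite(b) for ch, b in BASES.items()})
    return text.translate(str.maketrans(table))


print(gmtstr_composite("Mālō e lelei"))
```

To support another composable character, add one more entry to `BASES` (or to the small table inside `gmtstr_composite`).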

I tried to get the LaTeX feature for strings working, but I’m not sure it recognises the LaTeX macron commands (\={a} etc.).

```bash
gmt basemap -R-200/200/0/2 -JX15c -BS -Bxaf+l"@[\textsf{\=A\=a\=O\=o\=E\=e\=U\=u\=I\=i} @[ AaOoEeUuIi" -png quick
```

Thanks @mkononets. The secret is the \textsf{...} around the \=a etc. I’m guessing this knocks LaTeX out of math mode into normal text mode, because \=a doesn’t appear to work in LaTeX’s math mode (the math-mode macron is \bar{a}).

If I don’t use \textsf{...}, I get this:

But with \textsf{...} around it I get this:

Another funny thing: according to 13.2. Technical Details in 13. Using LaTeX Expressions in GMT, LaTeX typesetting is used for the whole string as long as it contains an @[...@[ math-mode section, even one holding only a dummy math string.

So you can type a UTF-8 string directly as long as @[ @[ is included (I used just a space as the dummy string to enable LaTeX Unicode typesetting):

```bash
gmt basemap -R-200/200/0/2 -JX15c -BS -Bxaf+l"@[ @[ĀāŌōĒēŪūĪī AaOoEeUuIi" -png quick
```

NB Īī vs \=I\=i


That’s potentially really useful. However, I’d be a bit cautious. LaTeX didn’t use to support Unicode (that’s why engines like LuaLaTeX and XeLaTeX are used). So I wonder whether it works if your system only has “original” LaTeX.

The above doesn’t work for me, but MiKTeX has been very picky here on Windows.

Absolutely, I was way too optimistic. With the simple LaTeX preamble GMT uses to enable math mode, it seems impossible to typeset text in alphabets other than Latin. I tried Cyrillic characters and they just get skipped.

But it helps a bit, even if it’s limited to accented Latin characters and only works on Linux. It beats looking through the character code tables and typing those annoying escaped numeric codes.

Here’s my modified bash function. It converts a string containing selected Unicode characters into a LaTeX format that GMT can use:

```bash
#
# Convert special characters in a string to LaTeX format.
#
function gmtstr(){
	echo "@[\textsf{${1}}@[" | sed -e 's/ʻ/\\lq\\hspace{0pt}/g' \
		-e 's/ā/\\=a/g' \
		-e 's/Ā/\\=A/g' \
		-e 's/ō/\\=o/g' \
		-e 's/Ō/\\=O/g' \
		-e 's/ē/\\=e/g' \
		-e 's/Ē/\\=E/g' \
		-e 's/ū/\\=u/g' \
		-e 's/Ū/\\=U/g' \
		-e 's/ī/\\=i/g' \
		-e 's/Ī/\\=I/g'
}
```

This produces nicer-looking results than GMT’s character combining codes. Here’s a snippet of my shell script showing how the function is used (it’s only needed on the H line):

```bash
	gmt legend -DjBC+w100%+jTC+o0/1 -F+pthicker,black+r6p <<- END
	G	0.2c
	H	14,Helvetica,black	$(gmtstr "${ship}: ${route_name}")
	G	1c
	I	/Users/tim/ecmwf/tms.png	1.5c	RB
	N	2
	D	thin,black
	V	thin,black
	L	8,Helvetica,black	L	Departure Time\072 ${dep_str}
	L	8,Helvetica,black	L	Arrival Time\072 ${arr_str}
	D	thin,black
	L	8,Helvetica,black	L	ECMWF Cycle\072 ${cycle_str}
	D	thin,black
	V	thin,black
	N	1
	G	0.2c
	L	8,Helvetica,black	C	Wind speed (knots) and direction (red barbs), and wave height (m) and direction (yellow arrows)
	END
```
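Here is a variation on the LaTeX bash function, sketched in Python with a hypothetical name. Instead of listing each vowel in sed, it decomposes every character with Unicode NFD normalisation. Any letter followed by U+0304 COMBINING MACRON becomes LaTeX's text-mode \= accent, so the sketch also covers macron letters that the sed list leaves out. Like the bash version, it assumes GMT hands the @[...@[ content to LaTeX and that \textsf{...} keeps it out of math mode.

```python
import unicodedata

COMBINING_MACRON = "\u0304"
FAKAUA = "\u02bb"  # MODIFIER LETTER TURNED COMMA


def gmtstr_latex(text: str) -> str:
    r"""Wrap `text` as @[\textsf{...}@[ for GMT's LaTeX mode.

    Each character is decomposed (NFD); a base letter followed by a combining
    macron becomes \=<letter>. The fakaua becomes \lq\hspace{0pt}, as in the
    bash function. All other characters pass through unchanged.
    """
    out = []
    for ch in text:
        if ch == FAKAUA:
            out.append(r"\lq\hspace{0pt}")
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] == COMBINING_MACRON:
            out.append(r"\=" + decomposed[0])
        else:
            out.append(ch)
    return r"@[\textsf{" + "".join(out) + "}@["


print(gmtstr_latex("Mālō e lelei"))
```

Like the bash version, this does not escape LaTeX special characters such as % or &, so those would still need care.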

NB \=i vs ī vs i, at least on my system:

```bash
gmt basemap -R-200/200/0/2 -JX15c -BS -Bxaf+l"@[\textsf{\=I\=i}@[ Īī Ii" -png quick
```

Yeah, I noticed that too. This is getting into detail that needs proper Unicode support. Still, at least the macrons don’t intersect the capital vowels when the LaTeX method is used.

You can see from my shell function that I implement the fakauʻa (ʻ) as a left quote, but this is not technically correct either. It should really be U+02BB (MODIFIER LETTER TURNED COMMA), which is a letter rather than a punctuation symbol.
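To see the distinction concretely, Python's `unicodedata` module reports the code point, general category and Unicode name of each candidate character. The helper `describe` and the list of candidates are illustrative only.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, general category, Unicode name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch)


# The fakaua as U+02BB, a left single quote, and the ASCII backtick (\140).
for ch in ["\u02bb", "\u2018", "`"]:
    print(*describe(ch))
```

U+02BB is reported as category Lm (modifier letter). U+2018 LEFT SINGLE QUOTATION MARK is Pi (initial punctuation), and the backtick U+0060 GRAVE ACCENT is Sk (modifier symbol).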

I thought I’d cracked this matter of Pacific languages once and for all, but sadly I was wrong. Here is what I did anyway, in case anyone is interested.

GMT supports the character sets Standard, Standard+, ISOLatin1, ISOLatin1+ and ISO-8859-x, where x is 1-11 or 13-16: https://docs.generic-mapping-tools.org/6.5/gmt.conf.html#postscript-parameters

Vowels with macrons are in ISO-8859-13 (originally designed for Baltic languages). So here’s what to do with gmt legend (or whatever module):

```bash
gmt legend --PS_CHAR_ENCODING=ISO-8859-13 ...
```

And then refer to the special characters like this (a bash function to convert Unicode characters to ISO-8859-13 octal codes):

```bash
function gmtstr(){
	echo "${1}" | sed -e 's/ʻ/\\140/g' \
		-e 's/:/\\072/g' \
		-e 's/ā/\\342/g' \
		-e 's/Ā/\\302/g' \
		-e 's/ō/\\364/g' \
		-e 's/Ō/\\324/g' \
		-e 's/ē/\\347/g' \
		-e 's/Ē/\\307/g' \
		-e 's/ū/\\373/g' \
		-e 's/Ū/\\333/g' \
		-e 's/ī/\\356/g' \
		-e 's/Ī/\\316/g'
}
```
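You don't have to look up each octal code by hand. Python's built-in `iso8859_13` codec can derive them. Here is a minimal sketch with hypothetical helper names. It should reproduce the ten vowel codes in the bash function above (for example ā → \342). It only produces the codes; whether the font actually has the glyphs is a separate question.

```python
MACRON_VOWELS = "ĀāĒēĪīŌōŪū"


def octal_escape(ch: str, encoding: str = "iso8859_13") -> str:
    """Return a GMT octal escape such as \\342 for a single character."""
    (code,) = ch.encode(encoding)  # raises UnicodeEncodeError if ch is absent
    return "\\" + format(code, "03o")


def gmtstr_iso8859_13(text: str) -> str:
    """Replace the fakaua, colons and macron vowels with ISO-8859-13 octal codes."""
    table = {"\u02bb": "\\140", ":": "\\072"}
    table.update({ch: octal_escape(ch) for ch in MACRON_VOWELS})
    return text.translate(str.maketrans(table))


print(gmtstr_iso8859_13("Mālō e lelei"))
```

The output is meant for a module run with --PS_CHAR_ENCODING=ISO-8859-13, as in the bash version.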

All well and good. But it seems the fonts don’t include glyphs for letters with macrons on them; at least Helvetica doesn’t. So when I set PS_CHAR_ENCODING to ISO-8859-13 and specify the character with octal code \347, I end up with an “e”, not an “ē”.

I did a bit of searching. Back in the days when Adobe supplied the PostScript fonts, they had something called Adobe CE fonts, where I think CE stands for Central European. Evidently these Adobe CE fonts had macrons, but the standard ones didn’t. Things are easier now with Unicode; if anyone talks about the “good old days”, at least in this matter they’re wrong.

It seems the LaTeX way is the best one can do in GMT at the moment.

What is the character encoding in the Windows command-line terminal nowadays?

Probably don’t know the answer to that. When I do

```
C:\v>chcp
Active code page: 437
```

which, according to https://superuser.com/questions/1170656/windows-10-terminal-encoding, is IBM437 (OEM United States).

But the Windows terminal does support Unicode (UTF-16? UTF-32?), because the Julia REPL can print any Unicode character and it runs in the Windows terminal.
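Here is one way to see why code page 437 is a problem for these labels. Python's built-in `cp437` codec can be asked whether a string fits in that code page at all. This is a minimal sketch with a hypothetical helper, and it says nothing about how GMT itself reads its arguments on Windows.

```python
def representable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


label = "Mālō e lelei"
print(representable(label, "cp437"))  # False: code page 437 has no macron vowels
print(representable(label, "utf-8"))  # True
```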