Maori Language Characters?

jocabo · June 15, 2025, 2:33am

Howdy Folks,

Anyone know the octal codes for Maori language characters, i.e. vowels with a line over the top…

Thanks!!

-jose

yvonnefroehlich · June 15, 2025, 2:41pm

Are you using GMT or PyGMT?

An overview of the octal codes for characters supported by GMT or PyGMT can be found in 11. Chart of Octal Codes for Characters — GMT 6.5.0 documentation or Supported Encodings and Non-ASCII Characters — PyGMT. However, I cannot find the desired letter in these tables. Reading Māori language - Wikipedia, it seems there is no octal code for this letter. But maybe there is an ISO/IEC 8859 code; see Supported Encodings and Non-ASCII Characters — PyGMT.

Another option would be to create the letter using @!, which combines two characters (please see text — GMT 6.5.0 documentation): @!\225a (the octal code \255 gives a high dash).

In PyGMT it’s possible to use the letter directly:

import pygmt

size = 1

fig = pygmt.Figure()
fig.basemap(region=[-size, size] * 2, projection=f"X{size * 2}c", frame=0)

fig.text(x=0, y=0, text="ā")

fig.show()

timhume · June 15, 2025, 4:01pm

Hi @jocabo ,

What you are wanting are macrons over the vowels. These are not available in the character sets that come with GMT.

You can combine characters as @yvonnefroehlich suggested. Another option might be to insert LaTeX: 13. Using LaTeX Expressions in GMT — GMT 6.5.0 documentation - you can do macrons with LaTeX.

Tongan (and many Pacific languages) have similar issues. Many, many years ago I provided the TO language configuration for GMT, and from memory ran into this then. It would be nice if GMT supported Unicode, but I imagine that would require a lot of work to implement.

Joaquim · June 15, 2025, 4:55pm

It surely would be nice. But the biggest problem, I guess, would be convincing Adobe to add Unicode support to PostScript.

jocabo · June 15, 2025, 10:05pm

Thanks for the responses everyone…! (@timhume @Joaquim @yvonnefroehlich)

I’m not on PyGMT, I am just an old dinosaur using oldschool GMT who lives in perpetual fear of every update…!

Anyway, @yvonnefroehlich 's suggestion for the @! trick with \255 worked!

I was making a map for a paper my wife is co-authoring in an indigenous studies journal, so it was important to get this detail right …

Thanks again, y’all are awesome!

-jose

Joaquim · June 15, 2025, 10:07pm

Still doing Tsunamis?

jocabo · June 15, 2025, 10:22pm

tsunamis and other coastal hazards all the way… (to my grave…)

timhume · June 25, 2025, 9:34am

Upon revisiting this for something I needed, I found the compositing character feature is not perfect, but still passable. I wrote a shell function to convert selected Unicode characters (specifically colons, macrons over vowels and the fakauʻa) in a string to GMT octal codes.

#
# Convert special characters in a string to GMT octal codes.
#
function gmtstr(){
	echo "${1}" | sed -e 's/:/\\072/g' \
		-e 's/ʻ/\\140/g' \
		-e 's/ā/@!\\257a/g' \
		-e 's/Ā/@!\\257A/g' \
		-e 's/ō/@!\\257o/g' \
		-e 's/Ō/@!\\257O/g' \
		-e 's/ē/@!\\257e/g' \
		-e 's/Ē/@!\\257E/g' \
		-e 's/ū/@!\\257u/g' \
		-e 's/Ū/@!\\257U/g' \
		-e 's/ī/@!i\\257/g' \
		-e 's/Ī/@!\\257I/g'
}

This can be expanded if one needs other characters which can be built by compositing. Wherever one has a string with these Unicode characters in it, one can do things like this:

$(gmtstr "Mālō e lelei")

Here is a map I’m working on which demonstrates this (the rest of the map’s code is not ready for prime time viewing). One can immediately see some problems. The combined macron and upper case vowels overlap each other. But it’s better than nothing.

Also, the width of a combined character is determined by the width of the second character in the combination - which is why it’s important to specify @!\257U instead of @!U\257, because the letters tend to be wider than the macron. The exception is ī which is specified with @!i\257 because the lower case i is so narrow. Though this width problem might also be overcome if one used a monospaced font such as Courier, Courier-Bold or Courier-Oblique.
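For anyone who prefers Python to sed, the ordering rule described above could be sketched roughly like this. It is only an illustration, not part of GMT or PyGMT: the function name `gmt_composite` is made up, it handles only the ten macron vowels (not the colon or the fakauʻa), and it assumes \257 is the macron in the active PS_CHAR_ENCODING, as in the shell function.

```python
import unicodedata

# Assumption: octal \257 is the macron in the active PS_CHAR_ENCODING,
# as used in the shell function above.
MACRON = "\\257"

# Narrow letters are written first so that the wider macron, being the
# second character, sets the width of the combined glyph.
NARROW = {"i"}

# Macron vowels and their plain base letters.
PLAIN = {"ā": "a", "Ā": "A", "ē": "e", "Ē": "E", "ī": "i",
         "Ī": "I", "ō": "o", "Ō": "O", "ū": "u", "Ū": "U"}


def gmt_composite(text: str) -> str:
    """Replace macron vowels with GMT @! composite-character escapes."""
    out = []
    # NFC makes sure "a" + combining macron arrives as the single letter "ā".
    for ch in unicodedata.normalize("NFC", text):
        base = PLAIN.get(ch)
        if base is None:
            out.append(ch)                    # leave everything else untouched
        elif base in NARROW:
            out.append(f"@!{base}{MACRON}")   # e.g. @!i\257
        else:
            out.append(f"@!{MACRON}{base}")   # e.g. @!\257U
    return "".join(out)


print(gmt_composite("Mālō e lelei"))  # M@!\257al@!\257o e lelei
```

The result can be passed to a GMT module in the same way as the output of the shell function.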

I tried to get the LaTeX feature for strings working, but I’m not sure it recognises the LaTeX macron functions (\={a} etc.)

mkononets · June 25, 2025, 5:34pm

gmt basemap -R-200/200/0/2 -JX15c -BS -Bxaf+l"@[\textsf{\=A\=a\=O\=o\=E\=e\=U\=u\=I\=i} @[ AaOoEeUuIi" -png quick

timhume · June 25, 2025, 9:30pm

Thanks @mkononets . The secret is the \textsf{...} around the \=a etc. I’m guessing this knocks LaTeX out of math mode into normal text mode. \=a appears to not work in LaTeX’s maths mode.

If I don’t use \textsf{...}, I get this:

But with \textsf{...} around it I get this:

mkononets · June 25, 2025, 10:09pm

Another funny thing: from 13. Using LaTeX Expressions in GMT, 13.2. Technical Details, it appears that LaTeX typesetting is used whenever the string contains @[...@[ math mode, even with a dummy math-mode string.

So one can type a UTF-8 string as long as @[ @[ is included (I used just a space to enable LaTeX Unicode typesetting):

gmt basemap -R-200/200/0/2 -JX15c -BS -Bxaf+l"@[ @[ĀāŌōĒēŪūĪī AaOoEeUuIi" -png quick

NB Īī vs \=I\=i

timhume · June 25, 2025, 10:56pm

That’s potentially really useful. However, I’d be a bit cautious. LaTeX didn’t use to support Unicode (that’s why things like lualatex and xelatex are used). So I wonder if it works if one’s system only has “original” LaTeX?

Joaquim · June 25, 2025, 11:05pm

The above doesn’t work for me, but MiKTeX has been very picky here on Windows.

mkononets · June 26, 2025, 10:38am

Absolutely, I was way too optimistic. With the simple LaTeX preamble that gmt uses to enable LaTeX math mode, it seems impossible to typeset text in alphabets other than Latin. I tried Cyrillic characters and those just get skipped.

But it helps a bit even if it’s limited to accented Latin characters and even if it works only on Linux. It’s better than looking through the character code tables and typing those annoying escaped numeric character codes.

timhume · June 26, 2025, 10:48am

Here’s my modified bash function. What it does is convert a string containing selected Unicode characters into GMT usable LaTeX format:

#
# Convert special characters in a string to LaTeX format.
#
function gmtstr(){
	echo "@[\textsf{${1}}@[" | sed -e 's/ʻ/\\lq\\hspace{0pt}/g' \
		-e 's/ā/\\=a/g' \
		-e 's/Ā/\\=A/g' \
		-e 's/ō/\\=o/g' \
		-e 's/Ō/\\=O/g' \
		-e 's/ē/\\=e/g' \
		-e 's/Ē/\\=E/g' \
		-e 's/ū/\\=u/g' \
		-e 's/Ū/\\=U/g' \
		-e 's/ī/\\=i/g' \
		-e 's/Ī/\\=I/g'
}

This produces nicer looking results than GMT’s character combining codes. Here’s a snippet of my shell script showing how the function is used (it’s only needed on the H line):

	gmt legend -DjBC+w100%+jTC+o0/1 -F+pthicker,black+r6p <<- END
	G	0.2c
	H	14,Helvetica,black	$(gmtstr "${ship}: ${route_name}")
	G	1c
	I	/Users/tim/ecmwf/tms.png	1.5c	RB
	N	2
	D	thin,black
	V	thin,black
	L	8,Helvetica,black	L	Departure Time\072 ${dep_str}
	L	8,Helvetica,black	L	Arrival Time\072 ${arr_str}
	D	thin,black
	L	8,Helvetica,black	L	ECMWF Cycle\072 ${cycle_str}
	D	thin,black
	V	thin,black
	N	1
	G	0.2c
	L	8,Helvetica,black	C	Wind speed (knots) and direction (red barbs), and wave height (m) and direction (yellow arrows)
	END
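The same Unicode-to-LaTeX conversion could also be sketched in Python, using Unicode decomposition to find the macron vowels instead of listing each one. This is only an illustration under stated assumptions: the name `gmt_latex` is made up, the fakauʻa is mapped to \lq\hspace{0pt} as in the shell function, and LaTeX special characters such as % or & in the input are not escaped.

```python
import unicodedata

COMBINING_MACRON = "\u0304"
FAKAUA = "\u02bb"  # modifier letter turned comma (ʻ)


def gmt_latex(text: str) -> str:
    """Wrap text in a GMT LaTeX text-mode block and spell out macron letters."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        # A macron letter decomposes into its base letter plus U+0304.
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[1] == COMBINING_MACRON:
            out.append("\\=" + parts[0])       # ā becomes \=a
        elif ch == FAKAUA:
            out.append("\\lq\\hspace{0pt}")    # same stand-in as the shell function
        else:
            out.append(ch)
    return "@[\\textsf{" + "".join(out) + "}@["


print(gmt_latex("Mālō e lelei"))  # @[\textsf{M\=al\=o e lelei}@[
```

As with the shell version, this still depends on a working LaTeX installation that GMT can find.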

mkononets · June 26, 2025, 11:19am

NB \=i vs ī vs i, at least on my system:

gmt basemap -R-200/200/0/2 -JX15c -BS -Bxaf+l"@[\textsf{\=I\=i}@[ Īī Ii" -png quick

timhume · June 26, 2025, 11:28am

Yeah, I noticed that too. This is getting into detail that needs Unicode support. Still, at least the macrons don’t intersect with the capital vowels when the LaTeX method is used.

You can see from my shell function that I implement the fakauʻa (ʻ) as a left quote, but this is not technically correct either: it should really be U+02BB, which is a letter rather than a punctuation symbol.

timhume · June 27, 2025, 8:17am

I thought I’d cracked this matter of Pacific languages once and for all. But sadly I was wrong. Here is what I did though, in case anyone is interested.

GMT supports the character sets Standard, Standard+, ISOLatin1, ISOLatin1+ and ISO-8859-x, where x is 1-11 or 13-16: https://docs.generic-mapping-tools.org/6.5/gmt.conf.html#postscript-parameters

Vowels with macrons are in ISO-8859-13 (originally designed for Baltic languages). So here’s what to do:

gmt legend (or whatever module) --PS_CHAR_ENCODING=ISO-8859-13 ...

And then refer to the special characters like this (a bash function to convert Unicode characters to ISO-8859-13 octal codes):

function gmtstr(){
	echo "${1}" | sed -e 's/ʻ/\\140/g' \
		-e 's/:/\\072/g' \
		-e 's/ā/\\342/g' \
		-e 's/Ā/\\302/g' \
		-e 's/ō/\\364/g' \
		-e 's/Ō/\\324/g' \
		-e 's/ē/\\347/g' \
		-e 's/Ē/\\307/g' \
		-e 's/ū/\\373/g' \
		-e 's/Ū/\\333/g' \
		-e 's/ī/\\356/g' \
		-e 's/Ī/\\316/g'
}

All well and good. But it would seem the fonts don’t include glyphs for letters with macrons on them. At least Helvetica doesn’t. So when I set the PS_CHAR_ENCODING to ISO-8859-13 and specify the character with octal code \347 I end up with an “e”, not an “ē”.

I did a bit of searching, and back in the days when Adobe supplied the postscript fonts, they had something called Adobe CE fonts, where I think CE stands for Central Europe. And evidently these Adobe CE fonts had macrons, but the standard ones didn’t. Things are easier now with Unicode - if anyone talks about the “good old days”, at least in this matter they’re wrong.

It seems the LaTeX way is the best one can do in GMT at the moment.
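As a side note, the octal codes in the ISO-8859-13 function above can be cross-checked with Python’s built-in iso8859_13 codec. The sketch below only confirms the byte values; it does nothing about the missing glyphs in the fonts. The helper name `octal_escapes` is made up for this example, and characters that are not in the encoding (such as U+02BB) raise a UnicodeEncodeError.

```python
def octal_escapes(text: str, encoding: str = "iso8859_13") -> str:
    """Replace non-ASCII characters with backslash-octal codes for the encoding."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        else:
            # ISO-8859-x are single-byte encodings, so exactly one byte results.
            (byte,) = ch.encode(encoding)
            out.append(f"\\{byte:03o}")
    return "".join(out)


for ch in "āĀōŌēĒūŪīĪ":
    print(ch, octal_escapes(ch))

print(octal_escapes("Mālō e lelei"))  # M\342l\364 e lelei
```

The ten codes printed by the loop match the ones in the sed expressions.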

mkononets · June 28, 2025, 4:45pm

What is the character encoding in windows command-line terminal nowadays?

Joaquim · June 28, 2025, 5:07pm

I probably don’t know the answer to that. When I run

C:\v>chcp
Active code page: 437

which according to https://superuser.com/questions/1170656/windows-10-terminal-encoding is IBM437 OEM United States

But the Windows terminal supports Unicode (UTF-16, UTF-32?) because the Julia REPL can print any Unicode character and it uses the Windows terminal.