How can PyGMT display Chinese properly on Windows?

leeyupeng · December 2, 2021, 12:44pm

Dear all,
Because PyGMT runs very slowly in WSL1 (this question), I decided to move to PyGMT on Windows. My PyGMT programs now run much faster, but I have hit a new problem. I want to display Chinese characters on my plot, and I have successfully configured the Chinese fonts for Ghostscript by referring to this link. I can now add Chinese text to my plot via a pre-written text file (ANSI encoding) containing Chinese characters (black characters on the plot), but I can’t add Chinese directly to the plot from a Python script (red garbled characters).

Here is the text file.

# text.txt
# x y text
1 1 中文测试

Here is my Python script.

import numpy as np
import pygmt

fig = pygmt.Figure()
pygmt.config(PS_CHAR_ENCODING="Standard+")

fig.basemap(region=[0, 2, 0, 4], projection="X4c/6c", frame=["xaf", "yaf"])
fig.text(textfiles="text.txt", font="13p,40,black")
fig.text(text="中文测试", x=1, y=2, font="13p,40,red")
fig.text(text="English Test", x=1, y=3, font="13p,40,green")

fig.savefig("cn-test.png",dpi=400)

I also tried saving the Python script with ANSI encoding, but I get the following error when I run it.

SyntaxError: Non-UTF-8 code starting with '\xd6' in file E:\python\pygmt\cn\cn-test.py on line 9, but no encoding declared; see https://python.org/dev/peps/pep-0263/ for details
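
The '\xd6' byte in that error can be reproduced with a small sketch. It assumes that "ANSI" on a Chinese-locale Windows system means the GBK code page (an assumption, not something stated in the thread), and it only illustrates why Python 3, which expects UTF-8 source files by default, rejects the file.

```python
# Compare how the same text is stored under the two encodings.
text = "中文测试"

# Assumption: an "ANSI" save on a Chinese-locale Windows system uses GBK.
gbk_bytes = text.encode("gbk")
# Python 3 reads source files as UTF-8 unless an encoding is declared.
utf8_bytes = text.encode("utf-8")

print("GBK   :", gbk_bytes)
print("UTF-8 :", utf8_bytes)

# The first GBK byte is the one named in the SyntaxError.
print("first GBK byte:", hex(gbk_bytes[0]))

# The GBK bytes are not valid UTF-8, which is the mismatch Python reports.
try:
    gbk_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print("GBK bytes decode as UTF-8:", decodes_as_utf8)
```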

On Linux, I can add Chinese directly to my plot from Python, with no need to save a text file or the Python script in ANSI encoding. Can anyone give me some advice on how to add Chinese to the plot directly from a Python script on Windows?
Thanks in advance.

maxrjones · December 2, 2021, 3:57pm

Does specifying that it is a raw string make any difference?

fig.text(text=r"中文测试", x=1,y=2,font="13p,40,red")

leeyupeng · December 3, 2021, 2:41am

I’ve already tried this, but it didn’t work.

Andreas · December 4, 2021, 9:48pm

Why do you use ANSI? It does not have Chinese characters.

From https://www.gaijin.at/en/infos/ascii-ansi-character-table:

ASCII (American Standard Code for Information Interchange) is a 7-bit character set that contains characters from 0 to 127.

The generic term ANSI (American National Standards Institute) is used for 8-bit character sets. These character sets contain the unchanged ASCII character set. In addition, they contain further characters from 128 to 255, which differ in the various ANSI character sets. There are character sets for western special characters and umlauts, and for Arabic, Greek or Cyrillic characters.

Should you not be using Unicode?

Shot in the dark, but still:

Does Python use UTF-8? Have you tested setting it to UTF-8 explicitly?
In Unicode, most Asian characters take two or more bytes, so maybe try UTF-16 as well.
I guess if one part of your pipeline does not support the needed characters, you will get strange results (theory).
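
The byte counts mentioned above can be checked with a short sketch. The helper name byte_sizes is hypothetical and exists only for this illustration; the encodings are standard Python codecs.

```python
def byte_sizes(ch, encodings=("utf-8", "utf-16-le", "gbk")):
    """Return how many bytes one character occupies in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in encodings}


# An ASCII letter next to a Chinese character.
for ch in ("A", "中"):
    print(ch, byte_sizes(ch))
```

Whichever encoding is used, every stage of the pipeline (editor, Python, GMT, Ghostscript) has to agree on it.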

I found this video interesting: https://www.youtube.com/watch?v=ONf1x7pOZNg.

leeyupeng · December 6, 2021, 6:56am

@Andreas Thanks for your reply. I’ve found out what the problem is.
The information on the webpage I referred to may be outdated. Here are the PSL_custom_fonts.txt contents it recommends, which only support ANSI or GB2312 encoding.

STSong-Light--GB-EUC-H  0.700    1
STFangsong-Light--GB-EUC-H  0.700    1
STHeiti-Regular--GB-EUC-H   0.700   1
STKaiti-Regular--GB-EUC-H   0.700   1
STSong-Light--GB-EUC-V  0.700    1
STFangsong-Light--GB-EUC-V  0.700    1
STHeiti-Regular--GB-EUC-V   0.700   1
STKaiti-Regular--GB-EUC-V   0.700   1

I modified the contents to something like this:

STSong-Light-UniGB-UTF8-H  0.700    1
STFangsong-Light-UniGB-UTF8-H  0.700    1
STHeiti-Regular-UniGB-UTF8-H   0.700   1
STKaiti-Regular-UniGB-UTF8-H   0.700   1
STSong-Light-UniGB-UTF8-V  0.700    1
STFangsong-Light-UniGB-UTF8-V  0.700    1
STHeiti-Regular-UniGB-UTF8-V   0.700   1
STKaiti-Regular-UniGB-UTF8-V   0.700   1
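
Since the modified entries follow a regular pattern, they can also be generated rather than typed by hand. This is only a sketch: the function name custom_font_lines is hypothetical, the second and third columns are copied verbatim from the list above, and the output goes to a temporary directory instead of the real PSL_custom_fonts.txt, whose location depends on your GMT setup.

```python
import tempfile
from pathlib import Path

# Font names and CMap suffix taken from the modified list above.
FONTS = ["STSong-Light", "STFangsong-Light", "STHeiti-Regular", "STKaiti-Regular"]
CMAP = "UniGB-UTF8"


def custom_font_lines(fonts=FONTS, cmap=CMAP, height="0.700", flag="1"):
    """Build one entry per font for horizontal (H) and vertical (V) writing."""
    lines = []
    for direction in ("H", "V"):
        for font in fonts:
            lines.append(f"{font}-{cmap}-{direction}  {height}    {flag}")
    return lines


# Write to a scratch file; copy it over your own PSL_custom_fonts.txt by hand.
out_path = Path(tempfile.mkdtemp()) / "PSL_custom_fonts.txt"
out_path.write_text("\n".join(custom_font_lines()) + "\n", encoding="ascii")
print(out_path.read_text(encoding="ascii"))
```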

With this change, Ghostscript handles UTF-8 encoding, and the Python code works perfectly!
I also looked into how to add more non-default fonts to Ghostscript, including English and Chinese fonts. Here’s my final product.

Andreas · December 6, 2021, 7:21am

That’s great, leeyupeng!

Isn’t this interesting for the GMT devs?! Supporting UTF ‘out-of-the-box’ would make life easier for many people, I think…?

leeyupeng · December 6, 2021, 8:23am

Yeah, you are right. But font configuration for GMT/PyGMT is more closely related to Ghostscript than to GMT/PyGMT itself.