Write eastern language Persian/Farsi/Arabic

saeedsltm · January 16, 2021, 3:42pm

Dear @uleysky, many thanks for your consideration. OK, I will try to make a list of all the important characters and ligatures, and also prepare a more complex sentence for tomorrow.

saeedsltm · January 17, 2021, 8:03am

@uleysky, I’ve selected all the required Persian letters (157 Unicode code points) plus digits (10 Unicode code points). Please check selected.dat. I’ve also checked ligatures.py, and I think we don’t need any of the ligatures since they only beautify the text. So the total number of Unicode code points is 167, which is well below 256. The data.csv file now contains a complex string including Persian letters and digits as well as English letters.
data.dat (157 Bytes)
selected.dat (3.7 KB)

uleysky · January 17, 2021, 12:51pm

I am using the following Python script:

#!/usr/bin/python
import arabic_reshaper
from bidi.algorithm import get_display

reshaped_text = arabic_reshaper.reshape(u'اولین نقشه با برنامه GMT که فونت پارسی را به درستی نمایش می دهد. نفشه شماره ۱ و ۲ و ۵')
bidi_text = get_display(reshaped_text)
print(bidi_text.encode('raw_unicode_escape'))

but something seems to have gone wrong. The text after “GMT” appears before it and vice versa.
Here it is: ex31.pdf (72.3 KB)
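
For readers unfamiliar with the last line of the script: Python's built-in `raw_unicode_escape` codec keeps Latin-1 characters as they are and writes every other character as a `\uXXXX` escape, which makes it easy to see which code points the reshaper produced. A minimal, standard-library-only sketch (the sample string is only an illustration, not the text used in the thread):

```python
# Inspect the code points in a string by encoding it with the
# built-in 'raw_unicode_escape' codec (standard library only).
sample = "GMT \u0648"  # Latin letters, a space, and ARABIC LETTER WAW

escaped = sample.encode("raw_unicode_escape")
print(escaped)

# Round trip: decoding with the same codec restores the original string.
restored = escaped.decode("raw_unicode_escape")
print(restored == sample)
```

Because the escapes name exact code points, they can be compared one by one with the glyphs that end up in the PDF.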

saeedsltm · January 17, 2021, 1:15pm

In the PDF everything is OK except for one character (the character "و" is replaced by "م" in the word "فونت"). Please use the following to be sure all characters are read correctly.

from pandas import read_csv
import arabic_reshaper
from bidi.algorithm import get_display

db = read_csv("data.dat", names=["lon","lat","text","v"])

reshaped_text = arabic_reshaper.reshape(db.text.values[0])
bidi_text = get_display(reshaped_text)
print(bidi_text.encode('raw_unicode_escape'))
#print(bidi_text)

uleysky · January 17, 2021, 1:38pm

Same. Perhaps a bug in the arabic_reshaper?

saeedsltm · January 17, 2021, 1:41pm

But when we use the print function (inside the Python code), the output is correct! I added a print in the last line to check, and it was correct. Strange!

uleysky · January 17, 2021, 1:47pm

My mistake: I mapped code point U+FEEE to the same glyph as U+FEE3. Here is the correct result: ex31.pdf (48.1 KB)
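
The two code points can be looked up with Python's standard `unicodedata` module, which shows why mapping them to one glyph swapped the letters. This is only an illustrative check; the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(cp):
    """Return the Unicode name of a code point and the base letter it folds to."""
    ch = chr(cp)
    # NFKC normalization folds a presentation form back to its base letter.
    base = unicodedata.normalize("NFKC", ch)
    return unicodedata.name(ch), f"U+{ord(base):04X}"

for cp in (0xFEEE, 0xFEE3):
    name, base = describe(cp)
    print(f"U+{cp:04X}: {name} (base letter {base})")
```

U+FEEE is the final form of waw (و) and U+FEE3 is the initial form of meem (م), which matches the substitution seen in the first PDF.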

saeedsltm · January 17, 2021, 1:48pm

Besides Persian, arabic_reshaper also supports Urdu, Pashto and Arabic, and there may be several candidates for some identical-looking characters. That’s why I separated out the Persian Unicode code points in the selected.dat file.

saeedsltm · January 17, 2021, 1:50pm

Yes, now all Persian characters are correct, but the word “GMT” is replaced by squares!

uleysky · January 17, 2021, 1:54pm

This demonstrates another problem: there are no Latin characters in the Arabic font (NotoSansArabic-Regular), so you have to switch the font (to NotoSans-Regular). It’s good that at least this can be done within GMT (@%NotoSans-Regular%GMT@%%). ex31.pdf (72.4 KB)
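
The font switch could be applied automatically along these lines. This is a rough sketch, not code from the thread: the helper name `wrap_latin_runs` is hypothetical, it assumes the text contains no GMT escape sequences yet, and it ignores any bidirectional reordering.

```python
import re

def wrap_latin_runs(text, latin_font="NotoSans-Regular"):
    """Wrap each run of ASCII letters in GMT's @%font% ... @%% font switch."""
    def switch(match):
        # @%font% selects the font; @%% returns to the previous font.
        return f"@%{latin_font}%{match.group(0)}@%%"
    return re.sub(r"[A-Za-z]+", switch, text)

print(wrap_latin_runs("\u0628\u0627 GMT"))
```

Only the Latin run is wrapped, so the Persian text keeps using the Arabic font.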

saeedsltm · January 17, 2021, 1:59pm

I think this problem is solvable if we use a font which has both Persian and English character sets, right?

uleysky · January 17, 2021, 2:04pm

The problem is solved anyway, font switching works fine as you can see.

Okay, now I have a rough idea of what needs to be done. I’ll start writing code to automatically generate the character sequence for GMT from a Unicode string. For now this is done manually, and you have seen the resulting errors. I don’t know if I will have time during the work week, but I plan to continue on the weekend.

saeedsltm · January 17, 2021, 2:06pm

Perfect.
Please let me know if I can do anything. Thanks again for the valuable time you’re spending.

uleysky · January 22, 2021, 6:08am

Here are some results: Farsi.zip (629.2 KB)

Encoding file, farsi.txt: contains the Unicode code points for the language's characters. Four forms of 32 letters and some additional characters, taken from Wikipedia (Persian alphabet). Quite convenient when the language has 32 letters :)
Font encoding vector generator, two files: queryfont.cpp and mktable. Creates font-specific PostScript code for GMT. Also creates PSL_custom_fonts.txt.
Translator of Arabic Unicode into a format suitable for GMT. Two parts: a generator of sed commands, gensed, and a run-time translator, ar.py. This is a bunch of crap code; I hope you will rewrite it properly in Python yourself. Embedding it in a script is also a pain.
Test page generator, test/testtable. Creates a table where you can see how the font matches the encoding: which characters are there and which are missing.
Your test example, test.sh and data.csv. Please note that you have to switch the font in data.csv, since your font does not contain Latin letters. GMT escape sequences are a problem for bidirectional text :(
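
The general idea of an encoding table plus a translator can be illustrated roughly as follows. This is a minimal sketch and not the code in Farsi.zip: the function names `build_slot_table` and `to_octal_escapes` and the choice of starting slot are hypothetical, and it assumes the font encoding vector places the matching glyph in each assigned byte slot. It gives every selected code point a byte value and rewrites a string as the octal escapes that GMT accepts in text strings.

```python
def build_slot_table(code_points, first_slot):
    """Assign each distinct code point one byte slot, starting at first_slot."""
    unique = sorted(set(code_points))
    if first_slot + len(unique) > 256:
        raise ValueError("too many code points for a single-byte encoding")
    return {cp: first_slot + i for i, cp in enumerate(unique)}

def to_octal_escapes(text, table):
    """Replace every mapped character with a three-digit octal escape."""
    out = []
    for ch in text:
        slot = table.get(ord(ch))
        # Unmapped characters (e.g. plain ASCII) pass through unchanged.
        out.append(ch if slot is None else "\\%03o" % slot)
    return "".join(out)

# Hypothetical example: two presentation forms placed at slots 128 and 129.
table = build_slot_table([0xFEEE, 0xFEE3], first_slot=128)
print(to_octal_escapes("\uFEE3a\uFEEE", table))
```

A single-byte table like this is why the total number of required code points has to stay below 256.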

Everything seems to work, but we may need to add more symbols to farsi.txt.

We will also need to create a localization file to draw ticks on the axes correctly.

saeedsltm · January 22, 2021, 8:22am

Dear @uleysky, thanks again for the code you’ve provided. It seems to work perfectly, but I have a problem when running test.sh: compiling queryfont fails. The following is the terminal output.

(base) saeed@saeed-P453UJ:~/Downloads/Compressed/Farsi$ ./test.sh
/usr/bin/ld: /tmp/ccEJYVAs.o: in function `main':
queryfont.cpp:(.text+0xac): undefined reference to `FT_Init_FreeType'
/usr/bin/ld: queryfont.cpp:(.text+0x119): undefined reference to `FT_New_Face'
/usr/bin/ld: queryfont.cpp:(.text+0x18a): undefined reference to `FT_Select_Charmap'
/usr/bin/ld: queryfont.cpp:(.text+0x1e4): undefined reference to `FT_Get_First_Char'
/usr/bin/ld: queryfont.cpp:(.text+0x21a): undefined reference to `FT_Get_Glyph_Name'
/usr/bin/ld: queryfont.cpp:(.text+0x233): undefined reference to `FT_Face_GetVariantsOfChar'
/usr/bin/ld: queryfont.cpp:(.text+0x38e): undefined reference to `FT_Get_Next_Char'
/usr/bin/ld: queryfont.cpp:(.text+0x3a9): undefined reference to `FT_Done_Face'
/usr/bin/ld: queryfont.cpp:(.text+0x3b8): undefined reference to `FT_Done_FreeType'
collect2: error: ld returned 1 exit status
./mktable: line 79: ./queryfont: No such file or directory
./mktable: line 79: ./queryfont: No such file or directory
./test.sh: line 63: gawk: command not found
./test.sh: line 18: gawk: command not found
./test.sh: line 25: gawk: command not found
psconvert [ERROR]: The file /home/saeed/.gmt/sessions/gmt6.14377/gmt_0.ps- has no BoundingBox in the first 20 lines or last 256 bytes. Use -A option.
rm: cannot remove ‘queryfont’: No such file or directory

Any idea what is going wrong on my Ubuntu 20.04?

uleysky · January 22, 2021, 8:31am

apt install libfreetype-dev, possibly?

saeedsltm · January 22, 2021, 8:56am

I ran sudo apt-get install libfreetype6 libfreetype6-dev libfreetype-dev, but it is still not working.

uleysky · January 22, 2021, 9:19am

Indeed, it does not work in Ubuntu. The solution is as simple as it is idiotic (as is often the case in Ubuntu): instead of
g++ -o queryfont `pkg-config --cflags --libs freetype2` queryfont.cpp
write
g++ queryfont.cpp -o queryfont `pkg-config --cflags --libs freetype2`
In Gentoo either variant works.
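
A likely explanation (an assumption, not confirmed in this thread) is that Ubuntu's toolchain links with `--as-needed` by default, so a library named before the source file that uses it is dropped. When the compile command is assembled from a script, the safe ordering can be sketched as below; the helper name `compile_command` and the sample flags are hypothetical, and the real flags come from pkg-config on each system.

```python
import shlex

def compile_command(source, output, pkg_config_flags):
    """Return a g++ argument list with the source first and the libraries last."""
    return ["g++", source, "-o", output, *shlex.split(pkg_config_flags)]

# Sample flags of the kind pkg-config prints (assumed; system-dependent).
flags = "-I/usr/include/freetype2 -lfreetype"
print(" ".join(compile_command("queryfont.cpp", "queryfont", flags)))
```

Keeping the libraries at the end of the argument list works with either linker default.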

saeedsltm · January 22, 2021, 11:04am

Yes, that is working now. I will do some tests and then inform you about the results.

saeedsltm · January 22, 2021, 7:56pm

Dear @uleysky, I’ve tested the code with several fonts and a variety of text strings:
1- Everything works well now, especially when using a font that supports both languages (Farsi and English characters together). To see this in action, please see the font IRANSans.ttf: there is no need to specify the font inside the text string, and it also solves the tick annotation issue.
2- There are only minor issues with some special characters such as (? % …): although they are in the farsi.txt file, they are shown in Unicode form. Please see results.zip (111.9 KB)