Patched version of Mozilla Firefox to improve rendering of complex-layout and right-to-left languages
Email Stephen Blackheath¹
- Updated 26 Jan 2006
This page contains a modified version of Mozilla Firefox that works better with languages
that require complex layout, especially right-to-left languages. These changes
should make Firefox render text correctly in the following languages, especially in "justify" mode (please tell me your test results!):
Arabic²
Bangla/Bengali
Devanagari
Farsi
Gujarati
Gurmukhi
Hangul
Hebrew²
Kannada²
Khmer
Malayalam
Oriya
Sinhala
Syriac
Tamil
Telugu
Thai
Tibetan
Urdu
¹ This link allows you to email me, but you must first request my email address through a web form.
This is an anti-spam measure - sorry for the inconvenience.
² I have done some testing with this language.
- NEWS/NOTE!:
- 27 Nov 2006 - I am not maintaining this page or this patch, but you can find out the latest
on the new Thebes text renderer at [Bug 333659] Move nsTextFrame over to Thebes APIs
Frequently asked question: When will these changes be integrated into the Mozilla build?
ANSWER: The short answer is that equivalent fixes should arrive around the beginning of 2007. In the meantime, the
patches on this page will have to do.
The patch on this page itself is extremely unlikely to be integrated by
Mozilla. The reason is that Mozilla and Firefox are switching from their existing graphics rendering
layer to a new implementation, and the patches here apply to the old
implementation (used in existing Firefox versions).
The new implementation is called Thebes, which uses the new Cairo
cross-platform graphics rendering library. Cairo is a community project.
I will be contributing to Mozilla in this work. It will be my job to make sure the
complex layout rendering works correctly. I will use this page to co-ordinate
testing, and you will be able to help me test.
Take a look at my Thebes work (in progress) here.
Linux note: These changes include the patch to be found on the TA Linux site. Unlike the TA Linux patch, this patch turns Pango on by default
(you can override this by setting the environment variable MOZ_ENABLE_PANGO=0). Note that Fedora Linux
ships a version of Firefox with Pango support (switch it on with 'export MOZ_ENABLE_PANGO=1'), but the version on this page works better for complex-layout languages.
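As a minimal sketch of how the MOZ_ENABLE_PANGO setting could be applied to a single launch instead of being exported for the whole session, the Python helper below builds the environment for a child process. Only the variable name and its 0/1 values come from the note above; the function names and the `firefox` command name are assumptions made for illustration.

```python
import os
import subprocess


def pango_env(enable, base=None):
    """Return a copy of the environment with MOZ_ENABLE_PANGO set to "1" or "0"."""
    env = dict(os.environ if base is None else base)
    env["MOZ_ENABLE_PANGO"] = "1" if enable else "0"
    return env


def launch_firefox(enable_pango, command="firefox"):
    """Start the browser with Pango switched on or off for this launch only.

    `command` is a hypothetical executable name; adjust it to your install.
    """
    return subprocess.Popen([command], env=pango_env(enable_pango))
```

Calling `launch_firefox(False)` would start the browser with Pango disabled without touching the environment of the calling shell.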
Hebrew note: Here are some screenshots of Hebrew rendered by this version. See Mechon Mamre for your Biblical texts and
help on getting your (Windows) system set up for Hebrew.
See Dov Grobgeld's page
(author of the Pango Hebrew module) for Linux help.
Here are the files:
| Microsoft Windows |
| firefox-1.0.4.en-US.rtl-v3.win32.installer.exe |
Modified version of Firefox for Windows (with version 3 patch). |
| firefox-1.0.6.en-US.rtl-v4.8.win32.installer.exe |
Modified version of Firefox for Windows (with version 4.8 patch).
Note: This version doesn't work properly on Windows 98!
|
| firefox-1.0.6.en-US.rtl-v4.9.win32.installer.exe |
Modified version of Firefox for Windows (with version 4.9 patch).
Fixes the Windows 98 breakage - no other changes. Note: Arabic is broken in this version. |
| firefox-1.0.6.en-US.rtl-v4.10.win32.installer.exe |
** NEWEST! (stable) ** Modified version of Firefox for Windows (with version 4.10 patch). Note: 4.10 patch is functionally equivalent to 4.12 patch on Windows.
|
| firefox-1.6a1.en-US.rtl-v4.12.win32.installer.exe |
** NEWEST! (unstable) ** Modified version of DeerPark for Windows (with version 4.12 patch).
Note that this is an unstable version of Firefox, built from the very latest CVS. Use the 1.0.6 version if you want a stable one.
Install issues:
- The installer sometimes fails, saying it can't find the "quality feedback agent". Cancel and
re-start the installation, and it will work the second time.
- Ensure you have the correct fonts installed. You may have to tell Firefox
explicitly to use your chosen font through the "Fonts and Colors" Preference.
- Be aware that there are still issues on Windows 98 - see Bug #8 below.
- You have to enable your desired language in the Regional Options under the
Control Panel, otherwise Windows will not render your text properly.
|
msvcp71.dll msvcr71.dll |
- You may see an error that MSVCR71.DLL is missing, or that some DLL fails to load. If you do, then
download these two files, and put them into C:\WINNT\SYSTEM32 (on Windows 2000) or C:\WINDOWS (on Windows 98).
I have not yet figured out how to automate this process.
|
| Gentoo Linux |
Instructions (Gentoo experts might know a better way):
- Install the dependencies for Mozilla Firefox (This does not install Mozilla Firefox
itself):
emerge --onlydeps mozilla-firefox
- Copy the downloaded mozilla-firefox-???.ebuild file
to the following directory, overwriting an
existing file of the same name:
/usr/portage/www-client/mozilla-firefox/
- Make it re-calculate file digests so it treats the new ebuild file as valid:
cd /usr/portage/www-client/mozilla-firefox
ebuild mozilla-firefox-???.ebuild digest
- Build and install patched version:
cd /usr/portage/www-client/mozilla-firefox
ebuild mozilla-firefox-???.ebuild merge
- (Alternatively, the ebuild .. merge step can be replaced by three steps: compile, install, then qmerge.)
As for the Pango problems in bug #21:
- Bug #21 is fixed in pango-1.10.0, which you can install with this command:
cd /usr/portage/x11-libs/pango
ebuild pango-1.10.0.ebuild merge
|
| mozilla-firefox-1.0.6-r6.ebuild |
Gentoo ebuild for version 4.11 patch |
| mozilla-firefox-1.0.6-r6.ebuild |
Gentoo ebuild for version 4.12 patch |
If you want it pre-built on another platform, let me know and I'll see what I can do.
To build on Linux, add --enable-pango to the 'configure' command.
Bugs fixed:
- #1 Pango support added on Unix
See above.
- #7 Gaps at certain sizes
At some sizes (pressing ctrl-+ or ctrl-- to check), I see a gap after
small letters, such as in Gen 2,4's little hey. Not to be compared to the
big gap after the first beyt in Gen 1,1 in the before version of Moz, of
course.
RE-TEST Probably fixed in v3 with #11 and #12 - please re-test.
- #11 Windows font rendering draws in reverse when 'justify' is used
On Windows, text is rendered in reverse when 'justify' is used.
The culprit is the version of nsRenderingContextWin::DrawString that takes
aFontID as an argument.
Bug #12 was introduced to compensate for it, so on Windows the text was reversed twice:
it looked bad, but was at least going in the right direction.
Since bug #11 wasn't present on Linux, the compensation in #12 caused
the text to come out in reverse on Linux. Fixed in v3 and v4.6.
- #12 PaintTextSlowly implemented wrong
I fixed bug #2 the wrong way: PaintTextSlowly in nsTextFrame.cpp
SHOULD have been used in the 'justify' case, but it had several things
wrong with it. RenderString called a version of
nsRenderingContext::DrawString that rendered in reverse on Windows.
Fixed in v3 patch. Fixed in CVS for DeerPark.
- #21 Linux, all patch versions: Rendering of vowels occasionally screws up. (screenshot)
The TanakhML page in one tab has the strange ability to make the M-M page in
another tab move all the diacritical marks to be placed *between* letters.
Changing font doesn't make it come right, but changing font *size* does.
Re-displaying the TanakhML tab makes it screw up again.
http://www.whatsup.org.il/ is an even
more reliable page to cause this bug to occur. The effect is seen on the Mechon-Mamre
text.
Resolution: It turns out this is a bug in Pango. This
rough patch fixes it. My bug report:
Bug 313781: Hebrew vowels rendered wrong because shaper font cache gets polluted
This bug is fixed in pango-1.10.0
- #24 nsTextFrame::RenderString() inefficient and measures incorrectly in many cases
Symptom: In justified text, Hebrew text looks like a splattered mess.
This bug is why only SBL Hebrew works with Firefox-1.x+v3 patch. Backporting
this fix to Firefox-1.x would make it work with more fonts.
Resolution: The DeerPark-4.6 patch above contains an implementation that fixes it.
This could be backported to Firefox-1.x. The patch also adds a new, optionally
implemented "GetCharacterSpacing()" function to the 'gfx' implementations,
which does the job properly. Where GetCharacterSpacing() is not implemented, I have
fixed the measuring in RenderString() so that it no longer assumes a string's width is
the sum of the widths of its individual characters (an assumption that does not
hold for Hebrew with vowels).
- #25 Windows, Firefox 1.0.6+4.8 patch: Arabic text selection isn't quite right.
Justified Arabic text jumps around all over the place, and it does not
go all the way to the right, indicating that spaces are supposed to be getting
added but are not. Try Omnibus test case at
Bug 297074 - Make nsRenderingContext::GetWidth optionally return an array of glyph widths
Justified Arabic works fine on Windows with DeerPark + 4.9 patch.
Justified Arabic works badly on Linux/Pango with DeerPark + 4.9 patch. (It is
as if the spacing is not being added in the middle of the text.)
Justified Arabic works even worse on Firefox 1.x + 4.9 patch on all platforms.
Resolution: Fixed in v4.10 patch
- #27 Firefox 1.0.x+4.10 patch on Linux: Justified Devanagari, Tamil, Gurmukhi wrong on Omnibus test case
These three languages have spaces added between letters, not just between words, which doesn't look right.
Bug does not occur on Windows.
Resolution: Fixed in v4.11 patch
- #30 Linux: The starting position of lines is sometimes wrong in RTL
text, shifted to the left.
I installed the Firefox version for Sarge
(firefox-1.0.4 4.11) from your site a few days ago.
The starting position of lines is sometimes wrong in RTL
text, shifted to the left. Sometimes it happens at the
position of a blank character. The shift can be up to
several letters.
The most annoying thing is that when you type (in some
kind of dialog box) the position of the cursor
sometimes does not match the position of the text that is
actually written. So it is impossible to write the
text correctly.
It is similar to the shift of marked text relative to the
actual mouse pointer position - a shift to the left.
However, it happens not only when marking text but also
when editing it.
I found it when I was trying to write/edit
messages in Hebrew on the forum at
http://whatsup.co.il/
More detail on this bug here.
Note: This bug exists ONLY in my patched versions of Firefox 1.0.x on Linux.
Resolution: I have now fixed this in patch v4.12, though not perfectly (there now
seems to be an error of 1 or 2 pixels, similar to the bug #14 wobble). This bug does not
exist in DeerPark (and the 1 or 2 pixel error is also not there). I don't want to get into
back-porting DeerPark bug fixes to Firefox 1.0.x, therefore this fix will have to suffice
for Firefox 1.0.x.
- #33 Kannada rendering badly in Debian Linux 1.0.6+v4.12 patch
See this collection of files. The snapshot shows what the text should
look like, rendering correctly in gedit. The txt and html files contain the same text. The html text renders
badly on Linux.
Resolution: It seems this is a problem with the default Debian font, which renders
Kannada badly. To solve the problem, do this in Firefox 1.0.x:
- Make sure the Kedage font is installed.
- Go to Edit / Preferences / General / Fonts & Colours.
- Select 'Western' under 'Fonts for:' since Kannada is not listed.
- Select Kedage under 'Serif' and 'Sans-serif'.
- Check 'Always use my [X] Fonts'.
- Click OK. It should now work.
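As a rough illustration of the measuring problem described under bug #24 above, the Python sketch below compares two ways of measuring a pointed Hebrew word: adding up the widths of characters measured one at a time, and measuring per cluster (a base letter together with its marks). The pixel values are made-up numbers and the function names are hypothetical; this is a toy model of the faulty assumption, not the code in nsTextFrame.cpp or the GetCharacterSpacing() implementation.

```python
import unicodedata

BASE_ADVANCE = 10   # made-up advance width of a base letter, in pixels
ISOLATED_MARK = 6   # made-up width a mark gets when measured on its own


def is_mark(ch):
    """True for combining marks such as Hebrew vowel points."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def naive_width(text):
    """Sum of widths of characters measured one at a time (the faulty assumption)."""
    return sum(ISOLATED_MARK if is_mark(ch) else BASE_ADVANCE for ch in text)


def clusters(text):
    """Split text into base characters, each followed by its combining marks."""
    out = []
    for ch in text:
        if is_mark(ch) and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def cluster_width(text):
    """Width when marks stack on their base and add no advance of their own."""
    return BASE_ADVANCE * len(clusters(text))


# Six letters carrying five vowel points and dots between them.
WORD = "\u05d1\u05bc\u05b0\u05e8\u05b5\u05d0\u05e9\u05c1\u05b4\u05d9\u05ea"
```

In this model `naive_width(WORD)` is 90 while `cluster_width(WORD)` is 60. A justifier that trusts the first figure would place each letter too far along the line, which is one way to end up with the splattered look described in the bug.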
Outstanding bugs:
- #4 Does not work on Mac
Firefox doesn't work properly for Hebrew on Macintosh, but I think Safari might do.
I would consider fixing it, as I have a Mac, but please email me to let me know there is demand.
There is a Pango port for Macintosh
which could be used.
Someone tells me: "IIRC, the reason that Firefox doesn't do proper Hebrew diacritics on the Mac is that it isn't using ATSUI.
See this bug here:
Bug 121540 - Use ATSUI for text rendering on Mac OS X
You might also be interested in the related bug:
Bug 157967 - Make Gecko interoperate better with advanced typography systems such as ATSUI, Uniscribe, Pango & STSF
Both bugs together are blocking most Hebrew Mac issues- once they are
solved, Firefox with Hebrew on the Mac will be MUCH more usable."
I'll have a go at integrating one of the patches to be found there.
Note: My Bug #24 fix goes a long way towards fixing Bug 157967.
- #5 Printing is no good
Big gaps in print preview output, and it is also reversed on M-M text.
Printing Hebrew is broken in DeerPark, but the big gaps are not there any more.
- #8 Missing niqqud (vowels) on Windows98SE (Hebrew version)
I see no "niqqud" at all in the
cantillated Bible; that is, the cantillation marks are there, but only
dagesh of the vowel signs appears. Is this a quirk of my Win98SE, or do
you see what I see? [Works OK on Windows 2000.] (Screenshot)
19 Aug 2005: I have discovered that niqqud DOES appear on Win98SE-Hebrew
on the TanakhML page, but only when the Arial or Times New Roman font is selected explicitly.
So, it seems it has something to do with the handling of fonts. It should be possible
to fix this bug somehow.
- #10 Linux: Automatic selection of Pango
Pango needs to be turned on or off depending on the
language, since some languages render better with Pango and some
with Mozilla's default renderer. We can't leave Pango on all the time, because it is slow on Latin
text and this would annoy 95% of users. This could be done easily enough, because
I have worked out how to ask Pango which shaping engine it would use for a given
language.
- #13 Text selection sometimes selects diacritics and sometimes not
Selection of a character really ought always to select that
character and all its marks. At the moment, marks and letters are treated as
separate, and they appear and disappear at the ends of selected ranges.
This problem is considerably worse on Arabic, because it draws the letter in
the middle of the word like a final one when you select up to the middle of a
word. (See Test Pages below for an example.)
- #14 Justified text wobbles
On Deer Park (and to some extent on Firefox 1.x) justified Hebrew text wobbles slightly to the left and right when you
select it. On Linux this applies to Latin text also, but on Windows,
Latin text does not wobble. For Latin on Linux, this bug existed in Deer Park before my patch.
The effect is similar to bug #13 but it happens for a different reason.
- #15 Ezra SIL does not work on Linux
Linux doesn't seem to like it. You get squares with hex numbers in them
instead of characters. SBL-Hebrew font works nicely on Linux, though.
- #16 Windows rendering imperfections
These problems occur equally on Internet Explorer and Firefox. Not sure
what to do about these. Fixing by tweaking in Firefox would
be the wrong way to do it. Perhaps it would be bearable if the user could
switch that functionality off.
Genesis 1:1 (cantillated version) - qamets and accent mark on top of each other under aleph of last word.
Genesis 1:5 - final kaph + sheva, the sheva should be in the middle of the kaph, not below it.
- #20 Windows Deer Park+4.6: On the TanakhML page, dageshes occupy a character position of their own
Dageshes appear in the character position after the character they are supposed to be inside.
Holams do the same. Known to happen with the SBL Hebrew font.
This bug seems to happen only when the SBL Hebrew font is selected explicitly, that is,
when you set the browser to override the page's font.
- #22 Linux Deer Park+4.6: Text selection area not quite tall enough to include
vowels.
The selection area is not quite tall enough to cover the vowels on some
fonts (SBL Hebrew is one example). OK on Windows.
- #23 Linux only: Selecting justified Hebrew text is dreadfully sluggish.
(Low priority.)
In fact, the whole thing is slow on Linux. I am pretty sure Pango (which
is slow anyway) is being asked to repeat the same task over and over. There
is much room for improvement in speed.
- #26 DeerPark+Firefox 1.0.x+4.12 patch on Windows: Justified Arabic text in Omnibus test case
doesn't fill the line
If you squash the window horizontally so the Arabic text wraps to the next line, it is
as if the spacing for justify has not been added. Does not occur on Linux.
"The Cow (justified)" test case works beautifully on this version.
- #31 DeerPark+4.12 patch on Windows: Overlapping characters and wrong text selection.
Does not occur on Linux Firefox 1.0.6+4.12 patch. May be the same as bug #26.
I'm currently using the binary you created and placed on the web site (Windows 1.6a1), and found another page that causes problems. It's from a real web site (an online text database), but I edited the page to the minimum which would illustrate the problem. Note the overlapping characters, as well as problems in text selection (when selecting text, the selection actually takes place to the right of the actual cursor location). I'm afraid I don't know enough about the html to tell if they're just doing something terrible and wrong, but in any event, it's out there. I can't give a direct URL for the original page, since it comes out of a database (http://cal1.cn.huc.edu), but if it would help, I can tell you how to navigate there.
Bug #31 test case
- #32 DeerPark+v4.12 patch on Windows: Hindi text breaks up into pieces when selected
Stephen: Visit http://tdil.mit.gov.in/hindi_site/ach-mat.htm and hopefully with
the patch applied the page will look okay. Ensure your computer system has
complex text rendering support for Indic scripts and that you have a Devanagari
font installed.
Now attempt to select some text. You'll notice that as you select the text, it
breaks up into pieces. Now look at
http://www.bbc.co.uk/hindi/sport/story/2005/10/051007_sania_qf.shtml and try to
select the text. As the text is NOT justified, selecting the text does not
break it up into pieces. This is the desired form of selecting text
(technically the best option would be selecting syllables, but that's the least
of our worries in terms of Indic support on Firefox).
- #34 Bad Telugu rendering in Windows XP
Details here.
- #35 Bad Devanagari rendering in Windows XP with Mozilla Firefox 2.0
Details here.
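To make the behaviour requested under bug #13 above more concrete, here is a small Python sketch that widens a selection so it never separates a letter from its marks. It treats a cluster as one base character plus the combining marks that follow it, which is a simplification; the function names are hypothetical, and the Arabic problem of a letter changing shape at the selection edge is not addressed here.

```python
import unicodedata


def is_mark(ch):
    """True for combining marks (vowel points, dagesh, cantillation and so on)."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def snap_selection(text, start, end):
    """Widen the half-open range [start, end) to whole letter-plus-marks clusters."""
    if start >= end:
        return start, end  # empty selection: nothing to widen
    # If the selection starts on a mark, back up to the letter that carries it.
    while 0 < start < len(text) and is_mark(text[start]):
        start -= 1
    # If marks follow the last selected character, take them in as well.
    while end < len(text) and is_mark(text[end]):
        end += 1
    return start, end


# bet + dagesh + sheva, then resh + tsere
SAMPLE = "\u05d1\u05bc\u05b0\u05e8\u05b5"
```

With this sample, selecting only the first code point, `snap_selection(SAMPLE, 0, 1)`, grows to `(0, 3)` so the dagesh and sheva travel with the bet instead of appearing and disappearing at the edge of the selection.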
Bugs I won't fix:
- #6 Firefox ignores CSS font
Firefox pretty much ignores the CSS definition of the font. Really this
should only happen if the user has specifically requested it. There may be
some reason why this is so. Investigate and fix if it really is a bug.
NOT A BUG - In my testing on Windows, Firefox uses the CSS font if
"Always use my / Fonts" is not checked.
- #9 Looks very bad on Windows98SE (US English version)
Firefox output is not nice on Windows98SE - Internet Explorer is
better but not perfect. Likely the same cause as #8. (Screenshot)
- I did some experimentation, and I found that Windows 98SE has no apparent
native support for RTL, while Firefox's practice of reversing the text instead
does not make Windows render the text properly. The MSDN documentation states
that this support is present in the Hebrew version only. However, strangely, Internet Explorer
works (sort of). Unfortunately I have no way to find out HOW it works.
- I am told the Hebrew version of Windows 98SE is popular in Israel.
I know that Windows 98 is not so popular outside Israel.
So, I will concentrate on the Hebrew version of Windows 98.
People outside Israel will have to use newer versions of Windows. WON'T FIX
- #17 Unicode no-break space not working with justification
When using the Unicode no-break space (U+00A0) or zero-width no-break space (U+FEFF) instead of a space (U+0020), the justification algorithm gives surprising results: the right-hand word is moved to the right margin of the page!
Example on this page (Genesis 3): http://tanakhml2.alacartejava.net/cocoon/tanakhml/d11.php2xml?sfr=1&prq=3&psq=1&lvl=99
I have done some testing on this, and it seems to be the fancy HTML between the
words that is upsetting Firefox, not the characters mentioned. - 20 Jul 2005
This bug is fixed in DeerPark, and I never found the cause, so I will not
attempt to fix this on Firefox 1.x - WON'T FIX
- #18 On tanakhml project page, font for 'Western' is used for Hebrew
On the link for bug #17, if you try to change the font for 'Hebrew', it has no effect.
Setting the font for 'Western' to your font with cantillation marks (such as SBL Hebrew)
works.
It seems this behaviour is switched by <html lang="he"> at the top of
the document, which is not used in the tanakhml page. I will call this a
deficiency of the web page for now, though an argument for this being a Firefox
bug could be made. Anyone want to investigate further? WON'T FIX
- #19 Text is underlined when mouse is over body text of articles on http://www.hebrewtoday.com/
I am informed that this is a bug in the website.
- #28 Firefox 1.0.x+4.10 patch all systems: Title sometimes appears in Arabic, sometimes latin-character garbage
Compare http://www.iraq4all.dk/ (garbage) against
http://www.iraq4allnews.dk/
Thank you David Roth: "This seems to be a problem with the web page, not with Firefox. The <title> on that
page is not encoded as Unicode, but rather as ISO-8859-6 (Arabic) or something like it.
At least, when I look at the source, that portion is garbage until I change the encoding
to ISO-8859-6; then every character but one comes out looking like Arabic."
- #29 Scrollbars/tabs should be on the left instead of right for Hebrew/Arabic.
This is outside the scope of my work for now.
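The kind of encoding mismatch described under bug #28 above can be reproduced in a few lines of Python. This sketch assumes the title bytes are ISO-8859-6 and that they are being read as Latin-1; the second codec is a guess made for illustration, since the report does not say which encoding the browser actually applied. The function names and the sample word are likewise hypothetical.

```python
def mojibake(text, actual="iso-8859-6", assumed="latin-1"):
    """Show how `text` looks when bytes written in `actual` are read as `assumed`."""
    return text.encode(actual).decode(assumed)


def repair(garbled, actual="iso-8859-6", assumed="latin-1"):
    """Undo the damage: recover the original bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(actual)


# An Arabic word used as a stand-in for the page title (not the real title).
TITLE = "\u0627\u0644\u0639\u0631\u0627\u0642"

garbled = mojibake(TITLE)   # Latin-looking garbage
restored = repair(garbled)  # the Arabic text again
```

Here `garbled` consists entirely of accented Latin letters, which matches the "latin-character garbage" symptom, and `restored` equals the original word. The fix belongs in the page itself, either by declaring the encoding it really uses or by encoding the title consistently with the rest of the document.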
Test pages:
Hebrew: Here is the Mechon-Mamre text Brashit (the page in my screenshots).
Hebrew: Here is my mish-mash Hebrew test page.
Arabic: The Cow (right-aligned).
Arabic: The Cow (justified)
Arabic: Iraq4allnews.dk news item - this shows bug #13 quite well.
Omnibus test case at Bug 297074
Hugo's Arabic test page
Links:
Context-dependent and Directional Text - Complex-Text Languages - a really good article on how the various complex-layout languages are complexly laid out.
Hebrew: The TanakhML project - biblical texts
Hebrew: Mechon-Mamre - biblical texts
Hebrew: Hebrew Today - news for Hebrew learners
Arabic: One Ummah Network Qur'an
Indic: Newspapers in Indic languages - I cannot make the Hindi ones work at all (font? encoding?)
Multiple languages: Wikipedia
Firefox bugzilla bugs:
Bug 60546 - [BiDi] Unicode Hebrew/Yiddish Diacritics do not correctly align in some fonts.
Bug 297074 - Make nsRenderingContext::GetWidth optionally return an array of glyph widths
Bug 157967 - Make Gecko interoperate better with advanced typography systems such as ATSUI, Uniscribe, Pango & STSF
Bug 121540 - Use ATSUI for text rendering on Mac OS X (Macintosh)
Bug 240914 - Unicode combining characters (e.g. Devanagari, Tamil, Malayalam ) ruined with text-align: justify
Pango bugzilla bugs:
Bug 313781: Hebrew vowels rendered wrong because shaper font cache gets polluted (My bug report for bug #21 - CLOSED because it's fixed in pango-1.10.0)
This page is brought to you from Waikanae, New Zealand. Haere mai!