import { FunctionComponent, useEffect } from 'react';
import { Container } from '../../pieces/container';
import { Image } from '../../pieces/image';

import { BlogEntryHeader } from '../blog-entry-header';
import { BlogEntryParagraphBox } from '../blog-entry-paragraph-box';

import headerIcon from './icon.png';
import arabicDebugginImage from './arabic-debugging-leaderboard.png';
import hyundaiRaceResultsImage from './hyundai-race-results-arabic.png';
import facebookLoginImage from './facebook-login-language-layouts.png';
import { WebLink } from '../../pieces/web-link';

export const ArabicTextRendering: FunctionComponent = () => {
  useEffect(() => {
    document.title = "Arabic | Joe Gosselin";
  }, []);

  return (
    <div>
      <Container>
        <BlogEntryHeader
          iconPath={headerIcon}
          title="Arabic"
          description="Updating a game engine to support rendering text for right-to-left languages like Arabic."
          dateString="2/27/2018"
        />
      </Container>

      <Container>
        <BlogEntryParagraphBox>
          Let's start with English and what we already know. English is read left-to-right (LTR), and when the data is read from a text file the characters are in that same order. The English alphabet has 26 lowercase and 26 uppercase characters. There are contractions like "don't" from "do not", but contractions are optional.
        </BlogEntryParagraphBox>
      </Container>

      <Container>
        <BlogEntryParagraphBox>
          Arabic is read right-to-left (RTL), but the order of the data in memory doesn't change. Arabic has many more characters than English. These characters are referred to as "glyphs". The glyph that is displayed depends on the glyphs around it. This is called joining.
        </BlogEntryParagraphBox>
      </Container>

      <Container className="container">
        <span>Arabic joining glyphs come in four forms: isolated, initial, medial, and final.</span>
          <ul>
            <li>Isolated: displays when not joining on either side.</li>
            <li>Initial: displays when joining to the left but not the right.</li>
            <li>Medial: displays when joining on both sides.</li>
            <li>Final: displays when joining to the right but not the left.</li>
          </ul>
      </Container>

      <Container>
        <BlogEntryParagraphBox>
          Joining reminds me of English contractions, but joining is not optional. Another difference is that the Unicode values in the text file are not the values of the joining glyphs. They are the values of what I like to call the root glyphs. We start with the root glyph and determine which joining glyph to display based on the root glyphs surrounding it. The Arabic Shaping link below contains data about which glyphs join and in what fashion.
        </BlogEntryParagraphBox>
      </Container>
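      {/*
Editor's sketch (not the engine's code): one way to pick a joining form from the
root glyphs around a character. It assumes the joining types used in
ArabicShaping.txt (R, L, D, C, U, T) and a tiny hand-picked table; the names
JOINING_TYPES, neighbourType and joiningForms are hypothetical.

```typescript
type JoiningType = 'R' | 'L' | 'D' | 'C' | 'U' | 'T';
type Form = 'isolated' | 'initial' | 'medial' | 'final';

// Tiny excerpt for illustration; a real table comes from ArabicShaping.txt.
const JOINING_TYPES: Record<number, JoiningType> = {
  0x0627: 'R', // ALEF (right-joining)
  0x0628: 'D', // BEH (dual-joining)
  0x062f: 'R', // DAL
  0x0640: 'C', // TATWEEL (join-causing)
  0x0643: 'D', // KAF
  0x0644: 'D', // LAM
  0x0645: 'D', // MEEM
  0x064e: 'T', // FATHA (combining mark, treated as transparent)
};

// Anything not in the table is treated as non-joining.
function joiningType(cp: number): JoiningType {
  return JOINING_TYPES[cp] ?? 'U';
}

// Nearest non-transparent neighbour in memory order, or 'U' at the string edge.
function neighbourType(cps: number[], start: number, step: 1 | -1): JoiningType {
  for (let i = start + step; i >= 0 && i < cps.length; i += step) {
    const t = joiningType(cps[i]);
    if (t !== 'T') return t;
  }
  return 'U';
}

function joiningForms(text: string): Form[] {
  const cps = Array.from(text, (ch) => ch.codePointAt(0)!);
  return cps.map((cp, i) => {
    const self = joiningType(cp);
    const prev = neighbourType(cps, i, -1);
    const next = neighbourType(cps, i, 1);
    // Joins to the preceding character, which is drawn on its right.
    const joinsRight =
      (self === 'D' || self === 'R') && (prev === 'D' || prev === 'L' || prev === 'C');
    // Joins to the following character, which is drawn on its left.
    const joinsLeft =
      (self === 'D' || self === 'L') && (next === 'D' || next === 'R' || next === 'C');
    if (joinsRight && joinsLeft) return 'medial';
    if (joinsLeft) return 'initial';
    if (joinsRight) return 'final';
    return 'isolated';
  });
}
```

For KAF LAM ALEF MEEM this yields initial, medial, final, isolated: ALEF only
joins to the right, so the MEEM after it has nothing to join to. Transparent
and join-causing characters are themselves reported as isolated in this sketch.
      */}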

      <Container centerText>
        <Image
          className="blog-entry-image"
          src={arabicDebugginImage}
          caption="I knew that verifying the results would be a challenge. Without a native speaker available, the next best thing was to copy paragraphs from random Wikipedia pages and check that they rendered identically. Once I believed it was working as expected, we sent some screenshots to our translation service to double-check the results."
        />
      </Container>

      <Container centerText>
        <Image
          className="blog-entry-image"
          src={hyundaiRaceResultsImage}
          caption="This is an early screenshot that was captured while trying to correct the column headers."
        />
      </Container>

      <Container>
        <BlogEntryParagraphBox>
          The Arabic Shaping file contains everything you need to determine joining, but I recommend processing it ahead of time to create a lookup that your program or game can use for better performance. This is exactly what I did. I wrote a small C# utility that opened the ArabicShaping.txt file and generated an INI file for faster lookup at runtime.
        </BlogEntryParagraphBox>
      </Container>
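      {/*
Editor's sketch: the original utility was written in C# and is not shown here.
This TypeScript version only illustrates the same idea, assuming the
"code point; name; joining type; joining group" line layout of
ArabicShaping.txt. The function names and the [Joining] section name are
hypothetical, not the format the engine actually used.

```typescript
type JoiningType = 'R' | 'L' | 'D' | 'C' | 'U' | 'T';

function isJoiningType(s: string): s is JoiningType {
  return s.length === 1 && 'RLDCUT'.includes(s);
}

// Build a code point -> joining type table from the text of ArabicShaping.txt.
function parseArabicShaping(fileText: string): Map<number, JoiningType> {
  const table = new Map<number, JoiningType>();
  for (const rawLine of fileText.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // drop comments and blank lines
    if (line === '') continue;
    const fields = line.split(';').map((f) => f.trim());
    if (fields.length < 4) continue;
    const codePoint = parseInt(fields[0], 16);
    const type = fields[2];
    if (Number.isNaN(codePoint) || !isJoiningType(type)) continue;
    table.set(codePoint, type);
  }
  return table;
}

// Serialize the table as simple KEY=VALUE lines under one INI section.
function toIni(table: Map<number, JoiningType>): string {
  const lines = ['[Joining]'];
  table.forEach((type, codePoint) => {
    const key = codePoint.toString(16).toUpperCase().padStart(4, '0');
    lines.push(`${key}=${type}`);
  });
  return lines.join('\n');
}
```

Running the parser once offline and shipping only the compact output means the
game never has to parse the full Unicode data file at startup.
      */}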

      <Container>
        <BlogEntryParagraphBox>
          At this point things were looking good. The next problem to tackle was mixing this RTL language with characters from an LTR language. It would look funny if an advertisement for Hyundai in Arabic wrote "Hyundai" as "iadnuyH". Since the string "Hyundai" doesn't change its order in memory, we have to compensate in how we render it. We also wouldn't want to simply reorder the characters, because that would invalidate any kerning info. My solution was to measure how much space the string was going to occupy, offset our rendering cursor's X-position by that amount, and reverse our rendering direction from RTL to LTR.
        </BlogEntryParagraphBox>
      </Container>
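      {/*
Editor's sketch of the measure, offset, and reverse steps described above. It
assumes x grows to the right, that each glyph is positioned by its left edge,
and that the caller supplies an advance callback returning a glyph's advance
width including kerning against the next character. placeLtrRun, PlacedGlyph
and the callback are hypothetical names, not the engine's API.

```typescript
interface PlacedGlyph {
  char: string;
  x: number; // left edge of the glyph
}

type Advance = (ch: string, next: string | undefined) => number;

// Lay out one LTR run inside an RTL line whose cursor moves toward smaller x.
function placeLtrRun(
  run: string,
  cursorX: number,
  advance: Advance,
): { glyphs: PlacedGlyph[]; cursorX: number } {
  const chars = Array.from(run);

  // 1. Measure the whole run in memory order so kerning pairs stay valid.
  let width = 0;
  chars.forEach((ch, i) => {
    width += advance(ch, chars[i + 1]);
  });

  // 2. Offset the cursor to where the run's left edge will be.
  const leftEdge = cursorX - width;

  // 3. Render left-to-right from that edge.
  const glyphs: PlacedGlyph[] = [];
  let x = leftEdge;
  chars.forEach((ch, i) => {
    glyphs.push({ char: ch, x });
    x += advance(ch, chars[i + 1]);
  });

  // 4. RTL rendering resumes from the run's left edge.
  return { glyphs, cursorX: leftEdge };
}
```

With a fixed advance of 10 and the cursor at x = 100, the run "abc" is placed
at x = 70, 80 and 90, and the RTL cursor continues from x = 70.
      */}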

      <Container>
        <BlogEntryParagraphBox>
          What we just covered was one depth of reordering. But what if we encounter text like this, where the capitalized letters are Arabic and the lowercase letters are English?
          <br /><br />
          <span style={{ display: 'inline-block', width: '100%', textAlign: 'center', fontStyle: 'italic'}}>Mustafa said, "I HEARD KATIE SAY 'Don't write JACK WAS HERE on the desk' YESTERDAY."</span>
        </BlogEntryParagraphBox>
      </Container>

      <Container>
        <BlogEntryParagraphBox>
          There is more than one depth there, which means more than one instance of reordering. The Unicode Bidirectional Algorithm describes supporting up to 125 levels of reordering. Could you imagine trying to read something that requires you to jump back and forth within the same sentence 20 times to understand the message? What about 100 times? The situation quickly becomes comical beyond just a few levels. I was updating a game engine to render Arabic text, so I made the call that supporting more than one depth of reordering was unnecessary for our needs. My thinking was that we could always refactor the code later to support more levels of reordering, but it wasn't worth the time at that moment.
        </BlogEntryParagraphBox>
      </Container>
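      {/*
Editor's sketch of what a single depth of reordering could look like: split
the line into direction runs, then lay out each LTR run as a unit inside the
RTL line. This is a deliberate simplification and not the Unicode
Bidirectional Algorithm. It assumes only the basic Arabic block is RTL, only
ASCII letters are LTR, and everything else is neutral. strongDir and splitRuns
are hypothetical names.

```typescript
type Dir = 'ltr' | 'rtl';

interface Run {
  dir: Dir;
  text: string;
}

// Direction of a strongly directional character, or undefined for neutrals.
function strongDir(ch: string): Dir | undefined {
  const cp = ch.codePointAt(0)!;
  if (cp >= 0x0600 && cp <= 0x06ff) return 'rtl'; // basic Arabic block only
  if (/[A-Za-z]/.test(ch)) return 'ltr';
  return undefined;
}

function splitRuns(text: string, base: Dir = 'rtl'): Run[] {
  const chars = Array.from(text);
  const dirs = chars.map(strongDir);

  // A neutral takes the direction of its neighbours when they agree,
  // otherwise it falls back to the base direction of the line.
  const resolved: Dir[] = dirs.map((d, i) => {
    if (d) return d;
    let prev: Dir | undefined;
    for (let j = i - 1; j >= 0 && !prev; j--) prev = dirs[j];
    let next: Dir | undefined;
    for (let j = i + 1; j < dirs.length && !next; j++) next = dirs[j];
    return prev && prev === next ? prev : base;
  });

  // Merge adjacent characters that share a direction into one run.
  const runs: Run[] = [];
  chars.forEach((ch, i) => {
    const last = runs[runs.length - 1];
    if (last && last.dir === resolved[i]) last.text += ch;
    else runs.push({ dir: resolved[i], text: ch });
  });
  return runs;
}
```

The space inside "ab cd" stays with the LTR run because both of its neighbours
are LTR, while spaces between Arabic and Latin text fall back to the RTL base.
      */}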
      
      <Container centerText>
        <Image
          className="blog-entry-image"
          src={facebookLoginImage}
          caption="Rendering a language from right-to-left is not the whole picture. You may need to support an alternate layout."
        />
      </Container>

      <Container className="container">
        More info:
          <ul>
            <li>
              Unicode Bidirectional Algorithm&nbsp;
              <WebLink
                linkText="Link"
                url="https://unicode.org/reports/tr9/"
              />
            </li>
            <li>
              Unicode Utilities: UnicodeSet for Arabic&nbsp;
              <WebLink
                linkText="Link"
                url="https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AArabic%3A%5D&g="
              />
            </li>
            <li>
              Arabic Shaping&nbsp;
              <WebLink
                linkText="Link"
                url="https://www.unicode.org/Public/UNIDATA/ArabicShaping.txt"
              />
            </li>
            <li>
              A Rosetta Stone for a lost language - (Start at 6:40)&nbsp;
              <WebLink
                linkText="Link"
                url="https://www.ted.com/talks/rajesh_rao_computing_a_rosetta_stone_for_the_indus_script#t-385850"
              />
            </li>
          </ul>
      </Container>
    </div>
  );
};
