Sakhr is the pioneer in Arabic natural language processing (NLP) technologies, with 28+ years of research and a corpus of billions of words. This gives Sakhr considerable proprietary advantages and patents in understanding the Arabic language, its construction and its complexity. Sakhr NLP produces outstanding accuracy in its language systems for:
- Machine Translation (MT)
- Speech Recognition (ASR)
- Speech Synthesis or Text-to-Speech (TTS)
- Optical Character Recognition (OCR)
- Search & Text Mining
Sakhr is the only MENA company to hold multiple US patents for Arabic computational linguistics. Through strategic partnerships with leading research institutions, Sakhr is able to advance the state of the art and create innovative technologies, such as its bi-directional Arabic-English translator for mobile devices.
Complexities of the Arabic language
Beyond the traditional challenges of natural language processing for English, there are unique complexities for the Arabic language. Sakhr NLP has overcome these complexities in terms of processing and translation:
Lack of diacritical marks
Unlike many other languages, Arabic text is usually written without short vowels, which are indicated by diacritical marks placed above or below the letters. Diacritics are rarely included in modern printed or electronic Arabic text. Therefore, understanding and translating Arabic text requires restoring the missing diacritics to determine which words are to be translated (MT), spoken (TTS) or indexed (Search). A combination of 3 words in Arabic can present 200+ alternatives with different meanings.
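To make the ambiguity concrete, here is a minimal Python sketch. It is not Sakhr's method; it only shows how three differently vowelled words collapse to the same undiacritized string, and how per-word alternatives multiply across a phrase. The function names and the count of six candidates per word are illustrative assumptions.

```python
import math
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which include the Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Three fully diacritized words, written as escapes so the code points are explicit.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # kutub, "books"
KUTIBA = "\u0643\u064F\u062A\u0650\u0628\u064E"  # kutiba, "it was written"

# All three reduce to the same three bare letters once the marks are removed.
bare_forms = {strip_diacritics(word) for word in (KATABA, KUTUB, KUTIBA)}


def count_readings(candidates_per_word):
    """Number of combined readings if every word's candidates are independent."""
    return math.prod(candidates_per_word)


# Hypothetical figure: six candidate readings for each of three words.
total_readings = count_readings([6, 6, 6])
```

With the assumed six candidates per word, three words already yield 216 combined readings, which is the order of magnitude described above. A real system would prune these candidates using context rather than enumerate them.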
Free word order
Arabic is a free word order language, in which sentence constituents can be reordered without changing the meaning. This results in more ambiguity at the syntactic level and requires more complex analysis than, for example, English. It also compounds semantic ambiguity, since each morphological analysis can have more than one meaning.
Scarce punctuation
Although Arabic has punctuation marks, they are rarely used in everyday written text. As a result, paragraph-long sentences are common and must be automatically segmented into sentences before analysis.
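The segmentation problem can be sketched in a few lines of Python. This is only a naive baseline under stated assumptions, not a description of Sakhr's segmenter: it splits on whatever sentence-final punctuation is present (including the Arabic question mark, U+061F) and flags segments that are still too long. The function names and the 40-token threshold are hypothetical.

```python
import re

# Split after ".", "!", "?" or the Arabic question mark (U+061F) followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def split_on_punctuation(text: str) -> list:
    """Naive splitter: only works where the writer actually used punctuation."""
    return [segment for segment in SENTENCE_END.split(text.strip()) if segment]


def needs_further_segmentation(segment: str, max_tokens: int = 40) -> bool:
    """Flag segments that are still long enough to need a smarter segmenter."""
    return len(segment.split()) > max_tokens


def long_segments(text: str, max_tokens: int = 40) -> list:
    """Return the segments that punctuation alone could not break up."""
    return [
        segment
        for segment in split_on_punctuation(text)
        if needs_further_segmentation(segment, max_tokens)
    ]
```

On text with scarce punctuation, most of the input ends up in `long_segments`, which is why punctuation-based splitting alone is not enough and further linguistic analysis is needed.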
Other issues that add to the complexity of the Arabic language include the right-to-left direction of the text, inflectional writing, cursive writing, presence of extra non-significant characters, and more.
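Two of these issues can be illustrated with Python's standard `unicodedata` module. The sketch below assumes that the "extra non-significant characters" include the tatweel (kashida, U+0640), an elongation character that carries no meaning; the source does not name specific characters, so this is an assumption. The helper names are hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"  # elongation character used for justification; assumed non-significant


def remove_tatweel(text: str) -> str:
    """Drop tatweel so stretched and unstretched spellings compare equal."""
    return text.replace(TATWEEL, "")


def is_rtl(text: str) -> bool:
    """Guess base direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left or Arabic letter
            return True
        if bidi == "L":  # left-to-right letter
            return False
    return False  # no strong character found; default to left-to-right
```

Normalizing such characters before indexing or analysis keeps visually stretched words from being treated as distinct tokens, and detecting direction up front helps when Arabic and Latin text are mixed.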