Arabic poses several challenges for Natural Language Processing (NLP), largely because it has a very rich and sophisticated morphological system. Opensooq will cover some of these challenges and how to solve them with Solr, and will also present the challenges that were handled in Opensooq’s use case.
Opensooq shared its experience of tackling Arabic with Solr. Arabic poses many challenges for retrieval. Dealing with social media, such as microblogs, and informal web content, such as forums and discussions, introduces further complications. These complications relate to the use of dialects, text decorations, abbreviations, and a Romanized version of Arabic commonly referred to as Arabizi. The retrieval of Arabic content in other modalities, such as speech and printed text, is affected by the orthographic and phonological properties of Arabic.
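To make the "text decorations" problem concrete, here is a minimal Python sketch of the kind of normalization informal Arabic text often needs before indexing. It is an illustration only, not Opensooq's actual pipeline. It assumes that "text decorations" covers the tatweel (kashida) stretching character, optional diacritics, and letters repeated for emphasis. The function name `normalize_informal_arabic` is hypothetical. In Solr itself, the built-in `ArabicNormalizationFilterFactory` handles part of this, such as diacritics and tatweel, at analysis time.

```python
import re

# Assumption: "text decorations" means diacritics, tatweel, and elongated letters.
# Arabic diacritics (tashkeel), U+064B..U+0652, plus superscript alef U+0670.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Tatweel (kashida), a purely decorative stretching character.
_TATWEEL = "\u0640"
# An Arabic letter repeated three or more times in a row, as in emphatic spellings.
_ELONGATION = re.compile(r"([\u0621-\u064A])\1{2,}")


def normalize_informal_arabic(text: str) -> str:
    """Strip common decorations from informal Arabic text (hypothetical helper)."""
    text = _DIACRITICS.sub("", text)     # drop optional vowel marks
    text = text.replace(_TATWEEL, "")    # drop decorative stretching
    text = _ELONGATION.sub(r"\1", text)  # collapse runs of 3+ identical letters
    return text
```

Applying the same function to both the indexed text and the query keeps decorated and plain spellings of a word matching each other. Dialects and Arabizi are not addressed by this sketch and need separate handling.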