After exploring OCR a bit more and ruminating on potential projects for the Arab world, I found two ideas particularly intriguing. The first would mirror the digital project Linguistic Landscapes of Beirut, but in the Emirates. Because the Emirates is so multinational, linguistic data could cast light on the nature of its multinationalism. That data could be drawn either from ambient recordings of conversations (which may be illegal) or from available texts (signs, newspapers, etc.). For comparison, the much older nations of France and the United States debate which model describes them: assimilationist societies in which all immigrants conform to the national ideal, "melting pots" in which immigrants assimilate to some extent while also influencing their new society, or "salad bowls" in which distinct cultures persist side by side. In the Emirates, the persistence of native languages in daily conversation or leisure reading (newspapers) would reflect the identities of the foreigners living there. Such a project may be particularly appealing because it can be crowd-sourced, both for the raw data (photos) and for the analysis (determining the language in each photo). Depending on the results, or possibly even regardless of them, the largest problem would likely be legal, as the Emirates and other Gulf nations don't tend to enjoy other people analyzing their identity or the identities of people within their countries (no offense to them for this decision, of course). Another challenge would be choosing the language used for publicity and for the data-submission interface. Using only English or Arabic for the interface may prevent those who live mostly in their native language and culture from discovering the project and participating, thereby biasing the results.
An alternative idea would be to create a corpus of unknown or hard-to-access Arabic literature. In other countries, the US for example, foreign texts are often not readily available to begin with, and Arabic novels are particularly difficult to find. Consequently, a project to digitize Arabic texts, as is being done for early English literature through TypeWright, could facilitate the acquisition, analysis, and ultimately translation of Arabic texts for other scholars and eventually, inshallah, for a non-Arabic-speaking public.