A General Architecture to Enhance Wiki Systems with Natural Language Processing Techniques
|Title||A General Architecture to Enhance Wiki Systems with Natural Language Processing Techniques|
|Year of Publication||2012|
|Degree||M.Sc. Software Engineering|
Wikis are web-based software applications that allow users to collaboratively create and edit web page content through a web browser, using a simplified syntax. The ease of use and “open” philosophy of wikis have brought them to the attention of organizations and online communities, leading to their widespread adoption as a simple and “quick” means of collaborative knowledge management. However, these characteristics of wiki systems can act as a double-edged sword: when wiki content is not properly structured, it can turn into a “tangle of links”, making navigation, organization and content retrieval difficult for end users. Since wiki content is mostly written in unstructured natural language, we believe that existing state-of-the-art techniques from the Natural Language Processing (NLP) and Semantic Computing domains can help mitigate these common problems and improve the user experience by introducing new features. The challenge, however, is to find a way to integrate novel semantic analysis algorithms into the multitude of existing wiki systems without modifying their engines. In this research work, we present a general architecture that allows wiki systems to benefit from NLP services made available through the Semantic Assistants framework, a service-oriented architecture for brokering NLP pipelines as web services. Our main contributions in this thesis include an analysis of wiki engines, the development of collaboration patterns between wikis and NLP, and the design of a cohesive integration architecture. As a concrete application, we deployed our integration to MediaWiki, the powerful wiki engine behind Wikipedia, to demonstrate its practicability.
Finally, we evaluate the usability and efficiency of our integration through a number of user studies we performed in real-world projects from various domains, including cultural heritage data management, software requirements engineering, and biomedical literature curation.