Visual Text Analytics Approach Based Text Categorization


Finding information in textual documents and categorizing those documents are well-established tasks that operate at the heart of any search engine. However, such tasks are hidden from users and run in the background, which does not help experts and analysts in a given field to explore hidden knowledge. In this investigation, we propose two visual analytics methods that can be used for either text categorization or text analytics. In particular, we propose two graphical approaches based on the TF-IDF weighting technique, in which the nodes contain the word frequencies, while the edges represent the word successions and their frequencies. The two methods are evaluated and compared with a baseline system on an in-house corpus, the ANTSIX corpus, which contains several Arabic texts collected from different discussion forums related to six topics. The experimental results are promising, with accuracy reaching about 98%.
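To make the graph representation described above more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a text is already tokenized into a list of words, that a "word succession" means two adjacent tokens, and that TF-IDF is computed in its basic form (relative term frequency multiplied by the logarithm of the inverse document frequency). The function names `build_word_graph` and `tf_idf` are hypothetical, and the abstract does not specify which TF-IDF variant or which graph construction details the two proposed methods actually use.

```python
import math
from collections import Counter


def build_word_graph(tokens):
    """Build a simple word graph from a list of tokens (illustrative only).

    Nodes map each word to its frequency in the text.
    Edges map each ordered pair of adjacent words (a word succession)
    to the number of times that succession occurs.
    """
    nodes = Counter(tokens)
    edges = Counter(zip(tokens, tokens[1:]))
    return nodes, edges


def tf_idf(docs):
    """Compute basic TF-IDF weights for a list of tokenized documents.

    Assumption: tf = count / document length, idf = log(N / df).
    Returns one {word: weight} dictionary per document.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Usage on a toy token sequence (placeholder tokens, not corpus data)
nodes, edges = build_word_graph("a b a b c".split())
print(nodes)   # word frequencies stored on the nodes
print(edges)   # succession frequencies stored on the edges
```

In this sketch a word that appears in every document receives a TF-IDF weight of zero, so only words that discriminate between documents would contribute to a category-level graph.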

3ème journées du Laboratoire de Communication parlées et traitement de signal, USTHB