Introduction

It has long been known that a bag-of-words document representation can achieve good results in text categorization. Attempts with more advanced approaches using proper nouns, complex nominals and word senses have not resulted in any conclusive improvements.

In my thesis, I examine the potential benefits of using features taken from the output of syntactic and semantic parsers to enhance a bag-of-words representation. These new representations are then evaluated against the bag-of-words baseline on Reuters' large corpus of journalistic articles, using an SVM classifier.