Abstract:
Utterance segmentation is an important task in conversation analysis. It consists mainly of dividing discussions into functional segments that are then labelled with the corresponding dialogue acts. In this paper, we propose a novel discriminative method based on Conditional Random Fields to automatically extract utterance boundaries in Arabic political debates taken from Aljazeera broadcasts. Despite the complexity of the corpus, which includes long utterances expressing conflicting opinions and arguments, the learning results are very encouraging, with an F-score of 97.3%. Dialogue act (DA) recognition consists mainly of two subtasks: segmentation and annotation. These two steps may be carried out separately, with segmentation followed by annotation, or simultaneously in one joint step. In our work, we assume that correct segmentation boundaries lead to better annotation results. As a consequence, a degradation of performance due to imperfect segmentation boundaries is to be expected. We therefore adopt a sequential approach that separates the two subtasks of the dialogue act recognition framework.
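To make the task formulation concrete, the following is a minimal sketch of how utterance segmentation can be cast as token-level sequence labelling and how a boundary F-score can be computed. It is an illustration under stated assumptions, not the method or evaluation protocol used in this work: the label names `I` and `E`, the helper functions `to_boundary_labels` and `boundary_prf`, and the placeholder tokens are all hypothetical, and the CRF model itself is not shown.

```python
from typing import List, Set, Tuple


def to_boundary_labels(utterances: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten segmented utterances into tokens and per-token labels.

    Hypothetical label scheme: "E" marks the last token of an utterance
    (a boundary follows it); "I" marks every other token.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for utterance in utterances:
        for i, token in enumerate(utterance):
            tokens.append(token)
            labels.append("E" if i == len(utterance) - 1 else "I")
    return tokens, labels


def boundary_positions(labels: List[str]) -> Set[int]:
    """Return the indices of tokens labelled as utterance-final."""
    return {i for i, label in enumerate(labels) if label == "E"}


def boundary_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-score over boundary positions only."""
    gold_b = boundary_positions(gold)
    pred_b = boundary_positions(pred)
    true_pos = len(gold_b & pred_b)
    precision = true_pos / len(pred_b) if pred_b else 0.0
    recall = true_pos / len(gold_b) if gold_b else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Placeholder tokens stand in for a manually segmented debate turn.
gold_utterances = [["w1", "w2", "w3"], ["w4", "w5"]]
tokens, gold_labels = to_boundary_labels(gold_utterances)

# A hypothetical system output with one spurious boundary after "w2".
predicted_labels = ["I", "E", "E", "I", "E"]
precision, recall, f_score = boundary_prf(gold_labels, predicted_labels)
```

In a sequential pipeline of the kind described above, a trained sequence labeller would produce `predicted_labels`, and the resulting segments would then be passed to a separate dialogue act annotation step.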