Abstract:
Online Social Media (OSM) platforms hold a huge volume of data that many organizations mine for valuable information across numerous domains. These platforms let users share ideas, thoughts and news as concise content in text, image or video format. One of the Artificial Intelligence fields that has developed around OSM is sentiment analysis, which identifies the emotion a user expresses towards a given topic. Building a sentiment analysis platform that detects the sentiment of a given comment requires developing Machine Learning and Deep Learning algorithms and collecting a large amount of training data. In this paper, we propose a dataset that can be used to train Machine Learning algorithms to identify emotions such as happiness, anger, neutrality and sadness. The dataset targets a specific language, the Tunisian dialect. We also provide the complete pipeline used to clean and preprocess the collected data before it is fed to Machine Learning models.