Abstract:
This article offers a practical reflection on the use of machine learning (ML) across disciplines, based on the author’s experiences in both the natural and social sciences. It highlights the opportunities ML provides for uncovering patterns and generating insights, particularly with small or moderately sized datasets. Emphasis is placed on fundamental principles, including careful data preparation, balancing model complexity with dataset size, and rigorous evaluation on unseen data to verify generalization. Key challenges, such as data leakage, insufficient sample sizes, misuse of default models, and the misconception of ML as an automatic solution, are illustrated with examples. The article also demonstrates how interpretability techniques, including SHAP, can enhance understanding of model decisions. Overall, it aims to guide researchers in effectively leveraging ML in scientific investigations while avoiding common pitfalls and unrealistic expectations.
