Application of Large Language Models to Data Extraction from Text with an Unknown Structure

Abstract:

We present the results of applying Large Language Models (LLMs) to extract meteorological data from weather forecasts provided in a variety of formats. The formats range from short textual notes describing the forecast in natural language to condensed tabular and numerical representations. All data sources are real meteorological systems, and the data samples used in the paper are real weather forecasts. To simplify further processing, we require each LLM to generate its response in a given XML format. We show that the models tested in the paper succeed, with varying degrees of effectiveness, in extracting basic data from the forecasts and encoding it into an XML structure. Finally, we identify the main types of errors encountered in the transformation process.