
LLMs Are Not a Cure-All: Practical Test for Music Classification Based on Metadata
The question was whether current large language models (LLMs) such as GPT-4 or DeepSeek can automatically and reliably classify music tracks, specifically salsa songs, as “Salsa Cubana” or “Salsa Línea” based on title, artist, lyrics, and metadata. It was known in advance that the available information (metadata, genre tags, lyrics) is incomplete and partly inconsistent. The test was explicitly designed to determine the practical limits of today’s LLMs in this context. ...








