Polish has unexpectedly emerged as a powerhouse language for artificial intelligence. A new benchmark developed by researchers at the University of Maryland, Microsoft, and UMass Amherst shows that when AI models are pushed into long-context tasks, Polish outperforms 25 other languages, including English and Chinese. The test, called ONERULER, evaluated how well major systems from OpenAI, Google, Meta, Qwen, and DeepSeek could retrieve and synthesize information across documents stretching up to 128,000 tokens.
The results flip long-held assumptions about linguistic dominance in machine learning. English and Chinese may saturate global training data, but abundant data does not guarantee deeper comprehension. Under heavy context loads, models handled Polish with an average accuracy of 88 percent, while English fell to sixth place and Chinese landed near the bottom. Slavic and Romance languages consistently scored well, hinting that inflected grammar, Latin or Cyrillic scripts, and more regular syntactic patterns may help models track meaning across long passages.
That advantage becomes even clearer in demanding “needle-in-a-haystack” tasks, where systems must surface a single buried detail from a book-length text. Polish not only held its lead but widened it, suggesting that the structure of a language can shape how effectively a model encodes relationships within sprawling inputs. Meanwhile, low-resource languages such as Swahili and Sesotho struggled, and models showed particular difficulty with Chinese, revealing how tokenization and writing systems influence model behavior.
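The mechanics of such a probe are simple to sketch. The snippet below is a minimal, illustrative version of a needle-in-a-haystack test, not the ONERULER benchmark itself: the `build_haystack` and `run_probe` helpers, the burial depths, and the `ask_model` placeholder are assumptions made for illustration, and real benchmarks also vary needle types, languages, and scoring.

```python
# Minimal sketch of a needle-in-a-haystack probe. Illustrative only; ONERULER's
# actual prompts, needles, and scoring differ. `ask_model` stands in for
# whatever model API is under test.

def build_haystack(filler_paragraphs: list[str], needle: str, depth: float) -> str:
    """Bury a single 'needle' sentence at a relative depth (0.0 = start, 1.0 = end)."""
    position = int(len(filler_paragraphs) * depth)
    paragraphs = filler_paragraphs[:position] + [needle] + filler_paragraphs[position:]
    return "\n\n".join(paragraphs)

def run_probe(ask_model, filler_paragraphs, needle, question, expected,
              depths=(0.1, 0.5, 0.9)) -> float:
    """Ask the model to retrieve the buried detail at several depths; return accuracy."""
    hits = 0
    for depth in depths:
        context = build_haystack(filler_paragraphs, needle, depth)
        prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
        answer = ask_model(prompt)  # placeholder call to the system being evaluated
        hits += int(expected.lower() in answer.lower())
    return hits / len(depths)

if __name__ == "__main__":
    # Toy run: a fake 'model' that always returns the needle scores 1.0.
    filler = [f"Filler paragraph number {i}." for i in range(2000)]
    needle = "The special magic number mentioned in this report is 7481."
    score = run_probe(lambda prompt: "7481", filler, needle,
                      "What is the special magic number mentioned in the report?", "7481")
    print(f"retrieval accuracy: {score:.2f}")
```

Varying the burial depth matters because long-context models often retrieve details near the beginning or end of a prompt more reliably than those buried in the middle.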
The findings arrive at a moment when Poland is investing heavily in national AI efforts, including its own large language model, PLLuM. The study underscores a broader lesson for the field: multilingual diversity is not just a cultural goal but a technical one, and languages with smaller global footprints may hold surprising advantages for the next generation of AI.
Source: 10.48550/arXiv.2503.01996