ETH Zurich and EPFL to release an LLM developed on public infrastructure

655 points by @andy99 | July 11th, 2025 at 6:45pm

@isusmelj

July 11th, 2025 at 8:30pm

I hope they do well. AFAIK they’re training or finetuning an older LLaMA model, so performance might lag behind SOTA. But what really matters is that ETH and EPFL get hands-on experience training at scale. From what I’ve heard, the new AI cluster still has teething problems. A lot of people underestimate how tough it is to train models at this scale, especially on your own infra.

Disclaimer: I’m Swiss and studied at ETH. We’ve got the brainpower, but not much large-scale training experience yet. And IMHO, a lot of the “magic” in LLMs is infrastructure-driven.

@k__

July 11th, 2025 at 7:32pm

"respecting web crawling opt-outs during data acquisition produces virtually no performance degradation"

Great to read that!

@defraudbah

July 12th, 2025 at 8:09am

ETH Zurich is doing so many amazing things that I want to go study there. Unbelievable how many great people come out of that university.

@bee_rider

July 11th, 2025 at 8:16pm

Is this setting the bar for dataset transparency? It seems like a significant step forward. Assuming it works out, that is.

They missed an opportunity though. They should have called their machine the AIps (AI Petaflops Supercomputer).

@WeirderScience

July 11th, 2025 at 8:05pm

The open training data is a huge differentiator. Is this the first truly open dataset of this scale? Prior efforts like The Pile were valuable, but had limitations. Curious to see how reproducible the training is.

@sschueller

July 12th, 2025 at 6:42pm

Yet, Switzerland was placed in the second tier[1] of countries when it comes to access to the top AI chips, below those with unlimited access.

[1] https://www.bluewin.ch/en/news/usa-restricts-swiss-access-to...

[2] https://chplusplus.org/u-s-export-controls-on-ai-chips/

@amelius

July 11th, 2025 at 10:11pm

Yeah, that's what "democratizing AI" means.

@oytis

July 11th, 2025 at 8:08pm

The press release talks a lot about how it was done, but very little about how capabilities compare to other open models.

@kisamoto

July 12th, 2025 at 7:22pm

Any info on context length or comparable performance? Press release is unfortunately lacking on technical details.

Also, I'm curious: was there any reason to put out such a PR without actually releasing the model (due this summer)? What's the delay? Or rather, what was the motivation for the PR?

@seydor

July 12th, 2025 at 5:38am

I wonder if multilingual LLMs are better or worse compared to a single-language model.

@hubraumhugo

July 11th, 2025 at 8:51pm

Pretty proud to see this at the top of HN as a Swiss (and I know many are lurking here!). These two universities produce world-class founders, researchers, and engineers. Yet, we always stay in the shadow of the US. With our top-tier public infrastructure, education, and political stability (+ neutrality), we have a unique opportunity to build something exceptional in the open LLM space.

@rkrisztian

July 13th, 2025 at 1:08am

I'm disappointed. 8B is too small for GPUs with 16 GB of VRAM (still common in affordable PCs), which could easily run most 13B to 16B models, depending on the quantization.
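As a rough back-of-the-envelope sketch of that arithmetic (my own numbers, not from the thread; it counts weight memory only and ignores KV cache and runtime overhead, which typically add another 1–2+ GB):

```python
def approx_weight_vram_gb(n_params_billions: float, bits_per_param: float) -> float:
    """Approximate VRAM needed just for the model weights, in decimal GB.

    Ignores KV cache, activations, and framework overhead, so treat the
    result as a lower bound.
    """
    total_bytes = n_params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# 8B at fp16 (16 bits/param) vs. 4-bit quantization:
print(approx_weight_vram_gb(8, 16))   # ~16 GB: barely fits a 16 GB card, no headroom
print(approx_weight_vram_gb(8, 4))    # ~4 GB: lots of room to spare
print(approx_weight_vram_gb(14, 4))   # ~7 GB: even a 13-16B model fits once quantized
```

Which is roughly why a 16 GB card feels underused by an 8B model: quantized, it leaves most of the VRAM idle, while a quantized 13B–16B model would still fit comfortably.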

@wood_spirit

July 11th, 2025 at 8:11pm

The article says

“ Open LLMs are increasingly viewed as credible alternatives to commercial systems, most of which are developed behind closed doors in the United States or China”

It is obvious that the companies producing big LLMs today have an incentive to enshittify them: chasing subscriptions while also doing product-placement ads, etc. Worse, some already have political biases they promote.

It would be wonderful if a partnership between academia and government in Europe could build public-good search and AI that endeavour to serve the user over the company.

@Tepix

July 12th, 2025 at 5:39pm

How does it compare to Teuken and EuroLLM?

@Bengalilol

July 11th, 2025 at 7:49pm

Looking forward to putting it to the test.

@mukeshyadavnitt

July 12th, 2025 at 7:11am

nice

@westurner

July 12th, 2025 at 2:00am

Use case for science and code LLMs: Superhydrodynamic gravity (SQR / SQG)

LLMs do seem to favor general relativity but probably would've favored classical mechanics at the time given the training corpora.

Not-yet unified: Quantum gravity, QFT, "A unified model must: " https://news.ycombinator.com/item?id=44289148

Will be interested to see how this model responds to currently unresolvable issues in physics. Is it an open or a closed world mentality and/or a conditioned disclaimer which encourages progress?

What are the current benchmarks?

From https://news.ycombinator.com/item?id=42899805 re: "Large Language Models for Mathematicians" (2023) :

> Benchmarks for math and physics LLMs: FrontierMath, TheoremQA, Multi SWE-bench: https://news.ycombinator.com/item?id=42097683

Multi-SWE-bench: A Multi-Lingual and Multi-Modal GitHub Issue Resolving Benchmark: https://multi-swe-bench.github.io/

Add'l LLM benchmarks and awesome lists: https://news.ycombinator.com/item?id=44485226

Microsoft has a new datacenter that you don't have to keep adding water to, which spares the aquifers.

How do we use this LLM to solve the energy and sustainability problems all LLMs exacerbate? Solutions for the Global Goals, hopefully.

@nektro

July 12th, 2025 at 12:34am

gross use of public infrastructure

@greenavocado

July 11th, 2025 at 8:00pm

Why would you announce this without a release? Be honest.

@contrarian1234

July 12th, 2025 at 7:23am

This seems like the equivalent of a university designing an ICE car...

What does anyone get out of this when we already have open-weight models?

Are they going to do very innovative AI research that companies wouldn't dare try or fund? Seems unlikely...

Is it a moonshot project so huge that no single company could fund it? Not that either.

If it's just a bit of fun to train the next generation of LLM researchers, then you might as well make a small-scale toy instead of using up a supercomputer center.