[English] Cambridge State of AI Report: Text-to-Image Generation Sets Off a New Storm (114 pages)

English research report  November 7, 2022 06:34  Administrator

Google trained its (pre-trained) LLM PaLM on an additional 118GB dataset of scientific papers from arXiv and web pages using LaTeX and MathJax. Using chain-of-thought prompting (including intermediate reasoning steps in prompts rather than the final answer only) and other techniques like majority voting, the resulting model, Minerva, improves the SOTA on most datasets by at least double-digit percentage points. Minerva uses only a language model and doesn't explicitly encode formal mathematics. This makes it more flexible, but it can only be evaluated automatically on its final answer rather than on its whole reasoning, which may account for some score inflation.
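
Majority voting over chain-of-thought samples can be sketched as follows. This is a minimal illustration, not Minerva's actual pipeline: the `Final answer:` marker, the helper names, and the sample completions are all hypothetical. It assumes several reasoning chains have already been sampled for the same question, and it keeps whichever final answer appears most often.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical convention: each sampled completion ends its reasoning
# with a line of the form "Final answer: <value>".
ANSWER_PATTERN = re.compile(r"Final answer:\s*(.+)")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' marker, or None."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1].strip() if matches else None


def majority_vote(completions: List[str]) -> Optional[str]:
    """Pick the most frequent final answer across sampled reasoning chains."""
    answers = [extract_final_answer(c) for c in completions]
    answers = [a for a in answers if a is not None]
    if not answers:
        return None
    # Ties are broken in favour of the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


# Illustrative samples for the question "What is 3 * 4 + 5?"
samples = [
    "3 * 4 = 12, then 12 + 5 = 17. Final answer: 17",
    "3 + 4 = 7, then 7 + 5 = 12. Final answer: 12",
    "Multiply first to get 12, then add 5. Final answer: 17",
]
print(majority_vote(samples))
```

Only the extracted final answers are compared here; the intermediate steps are never checked. That is the same limitation noted above for automatic evaluation.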

In contrast, OpenAI built a (transformer-based) theorem prover in the Lean formal environment. Different versions of their model were able to solve a number of problems from AMC12 (26), AIME (6) and IMO (2), listed in increasing order of difficulty.

Only 66% of machine learning benchmarks have received more than 3 results at different time points, and many are solved or saturated soon after their release. BIG-bench (Beyond the Imitation Game), a new benchmark designed by 444 authors across 132 institutions, aims to challenge current and future language models.

DeepMind revisited LM scaling laws and found that current LMs are significantly undertrained: they're not trained on enough data given their large size. They trained Chinchilla, a 4x smaller version of their Gopher model, on 4.6x more data, and found that Chinchilla outperforms Gopher and other large models on BIG-bench.
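
As a rough check on the Chinchilla versus Gopher comparison, the sketch below estimates relative training compute from the two ratios quoted in the text (4x fewer parameters, 4.6x more data). It assumes the common rule of thumb that training compute scales as roughly 6 x parameters x tokens. That approximation and the function names are assumptions made for illustration and are not taken from the report.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the C ~ 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * n_params * n_tokens


def compute_ratio(param_scale: float, token_scale: float) -> float:
    """Compute of a rescaled model relative to a baseline model.

    The baseline size cancels out, so unit values are used for it.
    """
    baseline = training_flops(1.0, 1.0)
    rescaled = training_flops(param_scale, token_scale)
    return rescaled / baseline


# Ratios quoted in the text: 4x smaller model, 4.6x more training data.
ratio = compute_ratio(param_scale=1 / 4, token_scale=4.6)
print(f"Chinchilla / Gopher training compute: {ratio:.2f}x")
```

Under this approximation the two models land within about 15% of each other in training compute, which suggests the gain comes mainly from how the budget is split between model size and data rather than from a much larger budget.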