Leave it to the BIS to once again bring some sanity to GenAI hype.
The BIS found a pattern of LLM failure that makes it clear LLMs fall short of the “rigour and clarity” required for use in central banking!
The same applies to retail banking wherever tasks require reasoning!
👉TAKEAWAYS
When posed with a logical puzzle that demands reasoning about the knowledge of others and about counterfactuals, large language models (LLMs) display a distinctive and revealing pattern of failure.
The LLM performs flawlessly when presented with the original wording of the puzzle available on the internet but performs poorly when incidental details are changed, suggestive of a lack of true understanding of the underlying logic.
Our findings do not detract from the considerable progress in central bank applications of machine learning to data management, macro analysis and regulation/supervision. However, they suggest that caution should be exercised in deploying LLMs in contexts that demand rigorous reasoning in economic analysis.
At its heart, LLMs hallucinate because they are simply trained to predict a “statistically plausible” continuation of the input (hence why their outputs superficially sound quite convincing). But what is most statistically plausible at a linguistic level is not necessarily factually correct, especially if it involves computation or logical reasoning of some sort.
Martin Luk, Man Group
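Luk’s point is easy to see in code. Here is a minimal sketch of next-token prediction, assuming the Hugging Face transformers library and the small GPT-2 checkpoint (both my choices for illustration, not what the BIS tested): the model ranks continuations purely by statistical plausibility, and nothing in the loop checks facts or logic.

```python
# Minimal sketch of next-token prediction (illustrative; GPT-2 is an
# assumption here, chosen only because it is small and public).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Cheryl's birthday is in"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for every possible next token

# The five most "statistically plausible" continuations; plausibility,
# not truth, is the only thing being ranked here.
top = torch.topk(logits, 5)
for token_id, score in zip(top.indices, top.values):
    print(repr(tokenizer.decode([int(token_id)])), float(score))
```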
👊STRAIGHT TALK👊
The BIS once again steps into the fray to deflate hype!
You have to give them credit for taking on the hype mongers by showing how easily ChatGPT screws up even a basic puzzle!
The results of the LLM attempting Cheryl’s birthday puzzle (see the PDF or Wikipedia) lay bare LLMs’ shortcomings.
The LLM solved the puzzle flawlessly when it was presented in its standard form. Once the names and dates were changed, it failed to find the solution and gave some amusing answers!
When the puzzle’s dates were changed, GPT-4 answered that the month was June, as in the original puzzle, even though June was no longer among the options! Hallucinations!
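For contrast, here is what mechanical rigour looks like. Below is a minimal Python sketch (my own illustration, not the BIS’s code) that solves Cheryl’s birthday puzzle by plain elimination; swap in any names or dates and it still finds the right answer.

```python
# Brute-force solver for Cheryl's birthday puzzle. The ten candidate
# dates below are from the original puzzle; changing them changes the
# answer, not the logic.
DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]

def month_of(d): return d[0]
def day_of(d): return d[1]

def unique(dates, key):
    """Dates whose key (month or day) appears exactly once in the set."""
    counts = {}
    for d in dates:
        counts[key(d)] = counts.get(key(d), 0) + 1
    return [d for d in dates if counts[key(d)] == 1]

# Albert (told the month): "I don't know, and I know Bernard doesn't
# either." So his month contains no day that is unique overall.
unique_days = {day_of(d) for d in unique(DATES, day_of)}
ruled_out_months = {month_of(d) for d in DATES if day_of(d) in unique_days}
after1 = [d for d in DATES if month_of(d) not in ruled_out_months]

# Bernard (told the day): "Now I know." So his day is unique among the
# remaining candidates.
after2 = unique(after1, day_of)

# Albert: "Then I know too." So his month is unique among what's left.
after3 = unique(after2, month_of)

print(after3)  # [('July', 16)]
```

A few lines of deterministic filtering do what the LLM couldn’t: follow the logic instead of the wording.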
This doesn’t mean that LLMs aren’t useful, just not in applications that require reasoning ability.
This means that I wouldn’t expect them to be used for making credit determinations at your bank anytime soon.
The BIS makes it clear that LLMs don’t think:
“The evidence so far is that the current generation of LLMs falls short of the rigour and clarity in reasoning required for the high-stakes analyses needed for central banking applications.”
“The main limitation of LLMs derives from their exclusive reliance on language as the medium of knowledge, without the tacit knowledge that goes beyond language. … they lack the non-linguistic, shared understanding of the world that can only be acquired through active engagement with the real world.”
“These limitations come to the fore when reasoning using counterfactuals. Statements of the form: “p is false, but if it were true, then q would also be true.” Such statements draw on a web of beliefs that draw on tacit knowledge, including that acquired through interactions with the physical world.”
Subscribing is free!
The button says pledge, but Substack adds that, not me.
Don’t be afraid to click!