Most of us have at some point dealt with data of different kinds in a spreadsheet, from expense reports to medical records; this is what is called tabular data. Since the emergence of data science, a variety of tools have been developed to analyze this information and learn from it. What was once done by hand is nowadays carried out by methods based on artificial intelligence and machine learning, with much higher predictive accuracy.
Although they perform better, many current methods still struggle with characteristics intrinsic to tabular datasets, such as heterogeneous data and missing values. To overcome these challenges, a team of researchers led by our PI Frank Hutter, including our doctoral researcher Lennart Purucker, developed a model called the Tabular Prior-data Fitted Network (TabPFN).
As with professional athletes, what sets TabPFN apart is the sheer amount of training it undergoes. The transformer-based neural network is trained on millions of synthetic (i.e., artificially generated) tabular datasets, and the results speak for themselves: TabPFN outperformed all previous methods by a wide margin on datasets with up to 10,000 samples, while requiring substantially less training time. The results have now been published in Nature. You can read the full paper here.
For more information, please see the press release from the University of Freiburg here (German language only).
We congratulate our researchers on this outstanding work and look forward to future applications of TabPFN across different fields!
Institute of Medical Biometry and Statistics,
Faculty of Medicine and Medical Center –
University of Freiburg