The Commercial Application Of Synthetic Data
Ethan Mollick, Associate Professor at The Wharton School, said, “Today’s AI is the worst AI you will ever use.”
Forethought's latest research compared paired studies drawing on two data sources: Human data collected via an online survey panel, and Synthetic data generated from the knowledge base of OpenAI's GPT-4o. A primary objective was to evaluate whether GPT-4o could replicate consumer choice data accurately and reliably through Synthetic data generation.
We found that GPT-4o could mimic the broad consumer patterns found in Human data, as reflected in a high correlation between the rank-ordering of attribute preferences in Synthetic and Human data. Where the overall pattern is the focus, as in relatively superficial exploratory analysis such as cross-tabulations, Forethought expects LLMs to become popular and, indeed, a default research tool.
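To make "correlation in the rank-ordering of preferences" concrete, the sketch below computes a Spearman rank correlation between two sets of attribute importance scores. It is an illustration only: the attribute names and all scores are hypothetical values invented for this example, not Forethought's results, and Spearman's coefficient is assumed here as one common way to measure rank agreement rather than the measure confirmed to have been used in the study.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical attribute importance scores (made up for illustration).
attributes = ["price", "brand", "quality", "packaging", "availability"]
human = [0.34, 0.22, 0.27, 0.07, 0.10]
synthetic = [0.28, 0.29, 0.27, 0.07, 0.09]

rho = spearman(human, synthetic)
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near 1 means the two sources order the attributes in much the same way. It says nothing about whether the spread of individual responses matches, so a high rank correlation is compatible with Synthetic data that is far less variable than Human data.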
However, detailed analysis revealed that Synthetic data struggled to capture the nuanced behavioural insight essential for accurate business decision-making. Synthetic data lacked variability, which made it unable to yield meaningful market segments, so it was not yet suitable for managerial decisions. Ongoing advancements in LLMs and the application of Retrieval Augmented Generation (RAG) point to uncharted territory in future GenAI capabilities. Forethought will continue to investigate these advancements, exploring new ways to integrate GenAI-driven market research into hybrid applications. Pioneering this new frontier is BrandComms.AI™.