On Sept 15, a group from Harvard Business School released an SSRN preprint reporting on a study that used 758 consultants from Boston Consulting Group as randomized test subjects to answer the question: Will GPT-4 reshape work? The resulting report offers hard evidence for the business case, both at your firm and at your clients. Consultants using GPT-4 were significantly more productive and produced significantly higher-quality results. One of the paper’s lead authors wrote up a good summary of the paper for his Substack. Click through to see some clear graphs.
Here are some key excerpts from the summary:
The Study Setup
To test the true impact of AI on knowledge work, we took hundreds of consultants and randomized whether they were allowed to use AI. We gave those who were allowed to use AI access to GPT-4, the same model everyone in 169 countries can access.
We then asked consultants to do a wide variety of work for a fictional shoe company, … selected to accurately represent what consultants do:
creative tasks (“Propose at least 10 ideas for a new shoe targeting an underserved market or sport.”),
analytical tasks (“Segment the footwear industry market based on users.”),
writing and marketing tasks (“Draft a press release marketing copy for your product.”), and
persuasiveness tasks (“Pen an inspirational memo to employees detailing why your product would outshine competitors.”).
Big Impact Results
Consultants using AI finished 12.2% more tasks on average, completed tasks 25.1% more quickly, and produced 40% higher quality results than those without. Those are some very big impacts.
Human Skill Leveller
We also found something else interesting: … AI works as a skill leveler.
The consultants who scored the worst when we assessed them at the start of the experiment had the biggest jump in their performance, 43%.
The top consultants still got a boost, but less of one.
I do not think enough people are considering what it means when a technology raises all workers to the top tiers of performance. … skill levelling is going to have a big impact.
Humans Fall Asleep at the Wheel
[T]here is more … we identified a task that used the blind spots of AI to ensure it would give a wrong, but convincing, answer to a problem that humans would be able to solve.
Indeed, human consultants got the problem right 84% of the time without AI help, but when consultants used the AI, they did worse – only getting it right 60-70% of the time.
When the AI is very good, humans have no reason to work hard and pay attention. They let the AI take over, instead of using it as a tool. … ‘falling asleep at the wheel’ can hurt human learning, skill development, and productivity.