Piers Fawkes’ Post

Piers Fawkes

AI Strategy, Services & Agents

One of the challenges we're coming up against with our AI projects is a human's ability/skills/speed to judge the output. For example, if we processed 150 sixty-minute SXSW talks with an AI agent that also built an in-depth written trends report about the event - who should or could review it for accuracy?

Today, we build human review into many of the AI processes our Broadmind team builds for clients. There are some good reasons to do so: (a) most outputs are designed for humans, so humans intuitively know what framing works best, and (b) years of creating this work 'by brain' helps judge the quality of thought from the machine. Other reasons we're asked to add a human are that people want to learn about AI to skill up, or that they've been told to be worried about the quality of AI output.

It's very easy for a human to judge the work of an AI agent that runs a single-stream process. When an assistant provides its analysis of 1 SXSW talk, it's pretty straightforward for a human to scan it and check whether it's spot on. The same could be true for a collection of 5-6 similar-topic talks.

When you ask a machine to analyze 150 talks, you're not doing that in 1 prompt. You're probably running 150 analysis processes first. Then you run pattern recognition across those results. Then you might ask it to check its work. Then you ask what the sections of a trend report would be if it were to write one about the talks. Then you ask it to summarize those sections and explain which talks relate to which sections. Then you ask it to write those sections by referring to the content of the identified talks. Then you run pattern recognition across that content to start building higher-level takeaways. Then you ask it to compare its output with a previous report, and so on. (That process is not unlike the one a team of humans would follow. But it takes about 15 minutes for the machine to do it if I've built an AI agent that has automated the flow.)

So... the machine provides a detailed analysis of SXSW in minutes and your boss wants to share that with the world? Which human has the ability, experience and knowledge to judge whether that work is right?

This is a problem we've run into on a project with SAP's Enterprise Architects community. We built a process with AI agents that created a multi-chapter book from the 50 talks and decks. When we asked experts to review each chapter, they got stuck - the cross-topic analysis the machine had made for each chapter went beyond any one expert's area of expertise.

This does make me think about human skilling. We're worried #AI is going to take our jobs - but maybe some of our jobs will shift from makers to quality controllers. Possibly, we will need to provide humans with both the output and summaries of the input and analysis that the #genAI agent made along the way.

What else could we do to keep humans involved to create better outputs that we can trust? (Thoughts please)
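For anyone picturing what that automated flow could look like in code, here's a rough sketch - not our actual pipeline, and `call_llm` is just a placeholder for whatever model or agent API you use. The point is that every stage writes its intermediate artifact to disk, so a reviewer can trace any claim in the final report back to specific talks.

```python
# Rough sketch of a staged flow like the one described above.
# `call_llm` is a placeholder, not a real API; stage names are illustrative.
import json
from pathlib import Path


def call_llm(prompt: str) -> str:
    """Placeholder for a real model/agent call."""
    raise NotImplementedError


def analyze_talks(transcripts: dict[str, str], out_dir: Path) -> dict[str, str]:
    """Stage 1: one analysis per talk, each saved so a human can spot-check it."""
    analyses = {}
    for talk_id, text in transcripts.items():
        analysis = call_llm(f"Summarize the key arguments and themes of this talk:\n{text}")
        (out_dir / f"{talk_id}.analysis.txt").write_text(analysis)
        analyses[talk_id] = analysis
    return analyses


def find_patterns(analyses: dict[str, str], out_dir: Path) -> str:
    """Stage 2: pattern recognition across all per-talk analyses, citing talk IDs."""
    joined = "\n\n".join(f"[{talk_id}]\n{a}" for talk_id, a in analyses.items())
    patterns = call_llm(f"Identify recurring themes across these analyses, citing talk IDs:\n{joined}")
    (out_dir / "patterns.txt").write_text(patterns)
    return patterns


def draft_report(patterns: str, analyses: dict[str, str], out_dir: Path) -> str:
    """Stage 3: propose sections, map talks to sections, then write the sections."""
    outline = call_llm(f"Propose trend-report sections for these themes, listing supporting talk IDs:\n{patterns}")
    (out_dir / "outline.txt").write_text(outline)
    report = call_llm(f"Write each section using only the cited talks:\n{outline}\n\n{json.dumps(analyses)}")
    (out_dir / "report_draft.txt").write_text(report)
    return report
```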


What I’ve come across is that people using AI that has saved them 20 hours of work are not willing to go through 20 minutes of checking. It seems that once the machine has been through it, we simply expect perfection. I really think getting quality outputs from AI has to be tied to stepped processes that you can validate at each stage.
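Roughly what I mean by stepped validation, as a sketch (the gate below is illustrative, not an existing tool): after each stage, a human sees a small random sample of that stage's outputs and signs off before the next stage runs - a few minutes of checking per step instead of reviewing a finished report cold.

```python
# Illustrative "human gate" between pipeline stages: sample a few outputs,
# show them to a reviewer, and stop the run unless they approve.
import random


def human_gate(stage_name: str, outputs: dict[str, str], sample_size: int = 3) -> None:
    """Show a random sample of a stage's outputs and require explicit sign-off."""
    for key in random.sample(list(outputs), min(sample_size, len(outputs))):
        print(f"--- {stage_name}: {key} ---\n{outputs[key][:1000]}\n")
    if input(f"Approve {stage_name}? [y/N] ").strip().lower() != "y":
        raise RuntimeError(f"{stage_name} rejected - fix before the next stage runs.")
```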

James Colistra

Founder @ Wonderfish | Ex-Forbes | Hot Sauce Ambassador | 2x w3 Award Winner | Retired Tee-Ball Assistant Coach

4w

Interesting, maybe you have multiple experts each review the portions of the content that align with their specific areas of expertise? Like a cross-functional team of SMEs performing QA. Also, in the long term, can the human feedback be used to further train the AI, creating better output in the future?

Neil Redding

Near Futurist | Keynote Speaker | Spatial Computing | AI | Convergent Commerce | The Ecosystem Paradigm | Founder and CEO, Redding Futures | Advisor | Investor | Global

3w

In the case of SXSW talks, each has an author/presenter who could review and validate the #AI output related to their talk, given sufficient motivation. Maybe sufficient motivation would be being quoted and promoted correctly, or avoiding being misquoted or misrepresented. The motivation is key though; it probably doesn’t scale without the ability to delegate it…
