The Pragmatic Engineer

AI tools for software engineers, but without the hype – with Simon Willison (co-creator of Django)

Ways to use LLMs efficiently, as a software engineer, common misconceptions about them, and tips/hacks to better interact with GenAI tools. The first episode of The Pragmatic Engineer Podcast

The first episode of The Pragmatic Engineer Podcast is out. Expect similar episodes every other Wednesday. You can add the podcast to your favorite podcast player and have future episodes downloaded automatically.

Listen now on Apple, Spotify, and YouTube.

Brought to you by:

Codeium: Join the 700K+ developers using the IT-approved AI-powered code assistant.

TLDR: Keep up with tech in 5 minutes

On the first episode of the Pragmatic Engineer Podcast, I am joined by Simon Willison.

Simon is one of the best-known software engineers experimenting with LLMs to boost his own productivity: he’s been doing this for more than three years, blogging about it in the open.

Simon is the creator of Datasette, an open-source tool for exploring and publishing data. He works full-time developing open-source tools for data journalism, centered on Datasette and SQLite. Previously, he was an engineering director at Eventbrite, joining through the acquisition of Lanyrd, a Y Combinator startup he co-founded in 2010. Simon is also a co-creator of the Django Web Framework. He has been blogging about web development since the early 2000s.

In today’s conversation, we dive deep into the realm of Gen AI and talk about the following: 

  • Simon’s initial experiments with LLMs and coding tools

  • Why fine-tuning is generally a waste of time—and when it’s not

  • RAG: an overview

  • Interacting with GPT’s voice mode

  • Simon’s day-to-day LLM stack

  • Common misconceptions about LLMs and ethical gray areas 

  • How Simon’s productivity has increased and his generally optimistic view on these tools

  • Tips, tricks, and hacks for interacting with GenAI tools

  • And more!

I hope you enjoy this episode.

In this episode, we cover:

(02:15) Welcome

(05:28) Simon’s ‘scary’ experience with ChatGPT

(10:58) Simon’s initial experiments with LLMs and coding tools

(12:21) The languages that LLMs excel at

(14:50) Getting started with LLMs: understand the theory first, or just play around?

(16:35) Fine-tuning: what it is, and why it’s mostly a waste of time

(18:03) Where fine-tuning works

(18:31) RAG: an explanation

(21:34) The expense of running testing on AI

(23:15) Simon’s current AI stack 

(29:55) Common misconceptions about using LLM tools

(30:09) Simon’s stack – continued 

(32:51) Learnings from running local models

(33:56) The impact of Firebug and the introduction of open-source 

(39:42) How Simon’s productivity has increased using LLM tools

(41:55) Why most people should limit themselves to 3-4 programming languages

(45:18) Addressing ethical issues and resistance to using generative AI

(49:11) Are LLMs plateauing? Is AGI overhyped?

(55:45) Coding vs. professional coding, looking ahead

(57:27) The importance of systems thinking for software engineers 

(1:01:00) Simon’s advice for experienced engineers

(1:06:29) Rapid-fire questions

Some takeaways:

  • If you are not using LLMs in your software engineering workflow, you are falling behind. So use them! Simon outlined a bunch of reasons that hold many devs back from using these tools, like ethical concerns or energy concerns. But LLM tools are here to stay, and those who use them become more productive.

  • It takes a ton of effort to learn how to use these tools efficiently. As Simon puts it: “You have to put in so much effort to learn, to explore and experiment and learn how to use it. And there's no guidance.” In related research we did in The Pragmatic Engineer about AI tools, with about 200 software engineers responding, we saw similar evidence. Those who had not used AI tools for 6 months were more likely to be negative in their perception of them. In fact, very common feedback from engineers not using these tools was: “I used it a few times, but it didn’t live up to my expectations, so I’m not using it anymore.”

  • Use local models to learn more about LLMs. Running local models has two big benefits:

    • You figure out how to run these models! It’s less complicated than one would think, thanks to tools like HuggingFace. Go and play around with them, and try out a smaller local model.

    • You learn a LOT more about how LLMs work, thanks to local models being less capable, so they feel less “magic”. As Simon said: “I think it's really useful to have a model hallucinate at you early, because it helps you get a better mental model of what it can do. And the local models hallucinate wildly.”

Where to find Simon Willison:

• X: https://x.com/simonw

• LinkedIn: https://www.linkedin.com/in/simonwillison/

• Website: https://simonwillison.net/

• Mastodon: https://fedi.simonwillison.net/@simon

Referenced:

• Simon’s LLM project: https://github.com/simonw/llm

• Jeremy Howard’s fast.ai: https://www.fast.ai/

• jq programming language: https://en.wikipedia.org/wiki/Jq_(programming_language)

• Datasette: https://datasette.io/

• GPT Code Interpreter: https://platform.openai.com/docs/assistants/tools/code-interpreter

• OpenAI Playground: https://platform.openai.com/playground/chat

• Advent of Code: https://adventofcode.com/

• Rust programming language: https://www.rust-lang.org/

• Applied AI Software Engineering: RAG: https://newsletter.pragmaticengineer.com/p/rag

• Claude: https://claude.ai/

• Claude 3.5 Sonnet: https://www.anthropic.com/news/claude-3-5-sonnet

• ChatGPT can now see, hear, and speak: https://openai.com/index/chatgpt-can-now-see-hear-and-speak/

• GitHub Copilot: https://github.com/features/copilot

• What are Artifacts and how do I use them?: https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them

• Large Language Models on the command line: https://simonwillison.net/2024/Jun/17/cli-language-models/

• Llama: https://www.llama.com/

• MLC chat on the app store: https://apps.apple.com/us/app/mlc-chat/id6448482937

• Firebug: https://en.wikipedia.org/wiki/Firebug_(software)#

• NPM: https://www.npmjs.com/

• Django: https://www.djangoproject.com/

• SourceForge: https://sourceforge.net/

• CPAN: https://www.cpan.org/

• OOP: https://en.wikipedia.org/wiki/Object-oriented_programming

• Prolog: https://en.wikipedia.org/wiki/Prolog

• SML: https://en.wikipedia.org/wiki/Standard_ML

• Stable Diffusion: https://stability.ai/

• Chain of thought prompting: https://www.promptingguide.ai/techniques/cot

• Cognition AI: https://www.cognition.ai/

• In the Race to Artificial General Intelligence, Where’s the Finish Line?: https://www.scientificamerican.com/article/what-does-artificial-general-intelligence-actually-mean/

• Black swan theory: https://en.wikipedia.org/wiki/Black_swan_theory

• Copilot workspace: https://githubnext.com/projects/copilot-workspace

• Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems: https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321

• Bluesky Global: https://www.blueskyglobal.org/

• The Atrocity Archives (Laundry Files #1): https://www.amazon.com/Atrocity-Archives-Laundry-Files/dp/0441013651

• Rivers of London: https://www.amazon.com/Rivers-London-Ben-Aaronovitch/dp/1625676158/

• Vanilla JavaScript: http://vanilla-js.com/

• jQuery: https://jquery.com/

• Fly.io: https://fly.io/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.

Discussion about this episode

One of the most useful ways we found to use it in our company is to build an internal chatbot that knows all our documentation:

https://medium.com/taranis-ag/building-an-internal-chatbot-that-knows-all-your-docs-a-step-by-step-guide-d910dfb26041

Super useful, and saves tons of time!


We know that LLMs don't really understand words, they just do stochastic calculations on tokens which are representations of words and phrases, so essentially it's a math calculator - it knows the rules of the English language without actually knowing what language even is. So I assume it "learns" the rules of a programming language the same way. I can imagine how it learns the rules and patterns of non-mathematical constructs like class definitions, object inheritance, type hinting, basic data structures, method signatures - but since it is so terrible at classical math and logic, how on earth can it "understand" the workings of loops, mutable variables, complex data structures, algorithms, date-time calculations, and any real math that comprises typical application programming? Does it choke on incrementing values or a loop with increment because of its inability to count?


You probably saw this, but a recap of how LLMs work, from the ChatGPT team: https://blog.pragmaticengineer.com/how-does-chatgpt-work

I have heard LLMs can be adapted pretty well for maths use cases (several teams are working in this area), for formal reasoning. But your point about LLMs being a poor choice for maths calculations stands; here is a hilarious example of wasting tons of compute on a simple enough floating point calculation:

https://www.linkedin.com/posts/luiscaires_dont-expect-gen-ai-on-its-own-to-be-a-technology-activity-7243540643971534848-98CF/?utm_source=share&utm_medium=member_desktop

Clearly, the weak areas of LLMs need to be complemented with more sensible approaches!
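To make the token point from the comment above concrete, here is a toy sketch of greedy subword tokenization. The vocabulary is made up purely for illustration (it is not any real model's tokenizer; real LLMs use byte-pair encodings learned from data), but the effect is the same: a number like 12345 is not five digits to the model, just a few opaque symbols, which is part of why counting and incrementing are hard for it.

```python
# Toy greedy longest-match tokenizer over a made-up vocabulary.
# Real LLM tokenizers (byte-pair encoding) are learned from data,
# but they split numbers into similarly arbitrary chunks.
VOCAB = {"12", "34", "1", "2", "3", "4", "5", "+", "=", " "}

def tokenize(text: str) -> list[str]:
    """Segment text by greedily matching the longest vocabulary entry."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"untokenizable character: {text[i]!r}")
    return tokens

print(tokenize("12345"))  # ['12', '34', '5'] -- not five digits!
```

This is also why pairing an LLM with a code-execution tool works so well for arithmetic: the model only has to produce the expression, and an actual interpreter does the math.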


This is awesome. Too many AI tools are advertised as better than they are for clickbait. I appreciate the honest recommendations here


This was exactly what I needed to hear today.

https://newsletter.pragmaticengineer.com/p/ai-tools-for-software-engineers-simon-willison?utm_source=publication-search

It's such a great article!


> It takes a ton of effort to learn how to use these tools efficiently. As Simon puts it: “You have to put in so much effort to learn, to explore and experiment and learn how to use it. And there's no guidance.”

I'm slightly surprised that there don't seem to be more training courses popping up that promise to teach people these skills.

That seems like it should be a reliable way to make money from time invested in learning how to use LLMs as coding assistants.


The odd thing about software engineers embracing LLMs for writing code is that, sure, anything that can save you time and effort is cool and gets my vote, BUT you can't ignore the fact that these machines are designed to replace you. We can call it an "augment", "assistant" or "weird intern" all we like, but its inventors have a goal of making us the assistant and it the master. Are we training our own replacement?


Jim: the stated goal of the language COBOL was to be able to replace expensive programmers with business people who could all write programs to do what they want. That was around 1960. Turns out that introducing COBOL meant it created... COBOL engineers.

Looking back through the history of computing, there have always been (and always will be) pushes to "replace" expensive devs so people can create programs without them. The latest such push, a few years ago, was "no code" approaches (press buttons to build apps). But UI-defined programs, things like Microsoft Access and to some extent Excel, have all done this.

In an interesting twist, none of these things reduced demand for developers (people who can write machine-interpretable code that translates business expectations). Though you can argue tools have replaced entire disciplines, like webmasters in the 90s.

It's hard to predict how things evolve: and there are many lofty promises made by vendors. This episode was about what works, today, and how.


I hope you're right Gergely, but I personally don't think generative AI can be compared to any previous technological innovation; it changes the game too much. So I do think there is a real risk that it could eliminate far more jobs than it creates. In the short term it seems likely that some companies will reduce developer headcount, and some senior devs will remain to assist the machine and act as QA. Not the same job, but at least they'll still have one, I suppose.


That's already what we are doing, cajoling the machine so that it outputs what we want. :-)

But for now, the code is the interface. If tomorrow GenAI becomes the interface, it won't change much: instead of writing the code, we'll configure and fine-tune the AI.

What's funny is that for the LLMs to work you need to feed them code. They can't generate the code without having as input a HUGE mass of code. Basically, you need to know how to code, and feed it to them, so that they can code for you. And even if the models are already trained, they are static, and still need to be constantly updated with new code, which is very expensive to do. As said in the podcast, they are good with very few widespread languages, but as soon as you start doing something that is not on their list, they are lost and need to be retrained with original content.

Also, if a cost reduction happens then it will allow for more production. Some small companies, and startups, will be able to create their products without spending millions which will actually increase the demand for developers.

Many different perspectives, for sure, it will be interesting to see what comes next. As of today, I tried many tools, and most of them are useful for only 10% of my tasks, and as soon as they hallucinate I actually waste more time than writing the code by myself.


Yes, AI code generation really is very limited, despite the hype. I noticed OpenAI has a new feature where you can port code from one language to another, but in many cases that makes no sense. E.g. you can't port an SPA built in React to Python without losing all of the front-end interactivity, because Python is of course a server-side language. In terms of how these tools will affect the future of software development, my 2 main concerns are: 1. It will be an excuse to replace lots of developers even though it is not actually up to the task (there will be massive fallout down the road). 2. The job of software development will become a low-paid, meaningless task of just babysitting the AI and tidying up its mistakes. Compare this to what is happening to writers, who are being turned into mere AI editors: https://www.bbc.com/future/article/20240612-the-people-making-ai-sound-more-human


The trend of people using AI to "take on projects in unfamiliar languages and/or frameworks" worries me. The example here was Go, and there's a reason my first Go book recommendation is "100 Go Mistakes and How to Avoid Them". It highlights a hundred ways in which code will cause problems in production despite looking good to senior engineers who are not Go experts. It actually takes studying the language to write real production code in it at a senior level.

Not sure if another LLM can be trained to catch all such mistakes. It feels somewhat dystopian to me, being schooled by an AI.

One possible consequence: If we keep going like this, we'll soon be hiring folks based on various certifications, to prove they are actually able to deliver quality code.


Solid first episode! I strongly agree with the advice to experiment with different language models to build some intuition for the general strengths and weaknesses, notably hallucinations, of LLMs.

To this end, I'd recommend playing with https://lmarena.ai/ as a means of experimenting with different LLMs. Each query is answered by two different, randomly chosen LLMs, and you, the user, rate which answer is better. The models are initially anonymous, and their identity is only revealed after voting. It is fascinating to regularly see small, largely unknown models match the performance of frontier LLMs, although they can also fail spectacularly on other queries.


Hey there,

I noticed that you didn't mention Cursor! It's the best IDE for AI codegen; we've recommended it to all our readers.

I just subbed to your Substack, good stuff. I went ahead and recommended you to our readers, whom I teach about AI code generation. I hope it helps more people find your stuff.

Have a good weekend!
