Microsoft is preparing to add ChatGPT to Bing (bloomberg.com)
1092 points by mfiguiere on Jan 4, 2023 | 897 comments




Nobody seems to be bringing up the questions that come with a big company, one that will actually have to follow the "rules", moving on this:

- Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

- fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

I do love these imaginary scenarios where ChatGPT is going to find me the best air fryer, though. Where is that information going to come from, exactly? Barely anyone is making money writing reviews today; most sites are farmed content. What happens when even the OK sites' reviews are quickly scraped and put into the next model iteration? Bing is going to have to come up with some kind of radical revenue sharing too if they want anything written after 2023.


If language models take over text content, content creators will flee even quicker into creating video content. There's already a trend where younger people tend to prefer video for being more "genuine", and now it might become a sign of "human made" for a couple years. Also easier to monetize, and easier to build parasocial relationships, so all around a plus for creators. Too bad I prefer text.


I think the push to video and away from text is a net failure for accessibility and usability, at least for reference use cases.

My example: as a woodworker, I'm often curious about the details of a particular joint or the usage of a particular tool. The great amount of content on YouTube is helpful, but it's incredibly inefficient to have to seek through a bunch of filler or unrelated content to get the answer I need.

Of course, that's "increased engagement" so I'm not surprised it's what is commercially more viable.


That sounds remarkably similar to how recipes are shared in blogs. There's a huge amount of story, and then at the tail end there's the recipe. It's all for engagement, but I'm never engaged. If I'm looking for a recipe, I want to know the recipe so I can make it. I don't care about what the blogger did last weekend or in college.


> There's a huge amount of story, and then at the tail end there's the recipe. It's all for engagement, but I'm never engaged.

It's not about engagement, it's about copyright.

Recipes - in the form of lists of ingredients and the method - are not typically protected.

However, add a huge rambling story about how Grandma handed this recipe down to you when you were five and on holiday with her in $place, hey presto, it's protected.


It's not for engagement. Some sites now have a "Jump to recipe" button. It's for Google, which said that if you write normal text they will send you a ton of traffic. What people figured out is that unless you spam the recipe with keywords repeated at least 20 times, the Google bot will not understand what the text is about. Maybe Google was forced to do this, but that's how it works, and it contradicts how they said it works.


I read that the recipes are actually bullshit. Written by content farms eating instant noodles, not anyone remotely involved with a kitchen.


Google* how long to pressure cook white or brown rice and you’ll see widely differing answers. Like shots all over a dartboard. They can’t all be correct — it’s just rice.

I wonder if many of them care more about CPM rates and page visits than actual recipe accuracy.

  *or Bing, DDG, Kagi, etc if you prefer although I haven’t tried.


I would somewhat disagree with that. My household eats rice on a daily basis and the timings for different kinds of rice varies wildly. Basmati, Sona masuri, jasmine, risotto, jeera samba rice have very different water and rice measures. And that's just white rice! Other rice variations are a whole different ball game.


I strongly recommend the books Cooking for Geeks and The Food Lab. In both books, the authors explore a variety of different approaches and show their math.


A second-order effect of this preference for video is how poorly video content gets indexed.

With text, searching for obscure things is cumbersome but possible. With video it's impossible.

Meaning I, as a user, cannot take the shortest path to my target content, simply because of the medium.

I now default to looking for really old books on my topic of interest, or authoritative sources like textbooks and official documentation, and then skim and weed through them to get to a broader understanding. Very often this has led me on to better questions on that topic.

Online I prefer to look at search results from focussed communities: Reddit, HN, StackOverflow, car forums, etc. I just never go to video for anything beyond recipes, quick fixes to broken appliances, and kids' videos.


(Old post, but you made a good point)

I finally realized what actually bothers me about shopping physically vs online these days is (a) the lack of "sort by price, ascending" & (b) the lack of ability to get a reference or "fair" price for similar items.

Similarly, with video the key missing feature is deep search.

It's mind-bogglingly sad YouTube didn't focus more on improving this after being acquired: they have all the components to build a solution! And it's a natural outgrowth of Google's dead-tree book digitization efforts!

I assume it was harder than just relying on contextual signals (links and comment text) to classify for ad targeting purposes. That's probably also why they incentivized ~10 min videos over longer/shorter ones.

Which is sufficient for advertisers, but utterly useless for viewers.

It makes me cry that we're missing a future where I could actually get deep links to the portion of all videos that reference potatoes (or whatever).


That actually seems like a great use case for AI; identify all videos about (topic), differentiate between high and low quality ones (as preferred by you or people similar to you), abstract the information into conceptual videos or schematic diagrams as you prefer.


May I suggest a simpler and smaller scope? An AI converting speech to text, extracting a bunch of still frames (or short video rolls) as illustrations (where relevant), and turning it into a good ol' readable article?

Then it can be fed to the search engines and those would do the rest of the job just fine.
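
A minimal sketch of that pipeline, assuming openai-whisper for the transcription and ffmpeg for the frame grabs (both assumptions, and the file names are made up):

  import subprocess
  import whisper  # openai-whisper; assumed here, any speech-to-text model would do

  video = "woodworking_tutorial.mp4"  # hypothetical input file

  # Transcribe the audio track (whisper decodes it via ffmpeg internally).
  model = whisper.load_model("base")
  transcript = model.transcribe(video)["text"]

  # Pull one still frame per minute to use as illustrations.
  subprocess.run(["ffmpeg", "-i", video, "-vf", "fps=1/60", "frame_%03d.jpg"], check=True)

  # Emit a crude, indexable "article" page.
  with open("article.html", "w", encoding="utf-8") as f:
      f.write(f"<article><p>{transcript}</p></article>")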


I think that will just multiply clickbait and those making the most substantive contributions will be ripped off by SEO/content farmers.


> That actually seems like a great use case for AI; identify all videos about (topic), differentiate between high and low quality ones (as preferred by you or people similar to you), abstract the information into conceptual videos or schematic diagrams as you prefer.

Q: Why would your $videoPlatformOfChoice allow a commercial AI bot to scrape boatloads of videos, abstract the information, then serve that information separately somewhere else .. possibly while serving their own ads(!)?


Scraping is legal, plus how will they even know?


Once AI can do all that with video, then we’re at about the point where automated video spam is too high also.


SponsorBlock is the response. It's a crowdsourced extension that labels parts of the video, like sponsor segments, highlights, intro/outro, etc. Very useful, you can skip through useless segments.


I prefer text too but I feel like that's mostly because the videos are not information dense on purpose. They expand to whatever the youtube algorithm prefers at the time, which is about 10 minutes now. Ironically, tiktoks are more information dense but the search is completely useless.


I’m finding more and more that the information density isn’t there because the video content is actually just an infomercial for a “course”.


I think we're very close to the point that even video won't be confirmable to be genuine. If it could even really be said to be so now. (Instagram/TikTok are the most performative/contrived content platforms these days)


Nope, there are already several services transcribing the audio content of video so expect that to be ingested too. You’ve seen the video suggestions with timestamps in google search right?


Oh, I'm aware of how well video transcription works. Once the lower-hanging fruit are dealt with, video content will absolutely flow into language models. But still, the video component is a key differentiator that AI can't easily mimic right now (at least not to a level where we can't tell). So users who want a personal opinion instead of GPT-generated text are likely to turn to consuming videos.


So regressing to a fully oral culture... Odd times


The digital world is the native environment for the AI race we're creating. In that world us biological humans are relatively slow and inferior. And if this "handing the intelligence baton to machines" trend continues then "regression" to our more native communication forms feels natural and inevitable.


That's some interesting insight. Thank you. When I read your comment, I was envisioning us all sitting around fires in caves with animal skin togas talking about the latest HN post (which presumably was Carl scribbling down something on the rock wall).


But one that can be catalogued and relayed by robots.


Good, the less I have to see of their clickbait and the more time my competitors waste watching videos the better. Video has its uses and when it's good it's very very good, but most of the time it's terrible dreck that steals people's time using cheap emotional manipulation.

I've been thinking about training an ML model to detect those 'Pick Me!' poster frames that highlight the e-celeb presenter making some kind of dramatic reaction face and just filter them out of search results. This is partly what happens when SEO types combine with black box algorithms; the lowest common denominator content starts to swamp everything else, a kind of weaponized reversion to the mean.


There are already custom AI avatars and text-to-speech; there are already people using GPT to create text and then using other services to create the audio and dynamic videos at scale.


Exactly. Several of the highly ranked YouTube videos that were recommended to me recently were clearly made by some AI doing a mashup of imagery with some text spoken by some text-to-speech algorithm.


Could it somehow get access to the subtitles and then use them to answer queries?

Also, I hope this comes to Ecosia; I'd like to experiment and try it at least.


> " could it somehow get access to the subtitles and then use them to answer queries?"

It's not even necessary - computers are already excellent at understanding spoken words. Have you tried automatic captioning recently? Half the inputs to my phone are already voice, not text.

Video is a harder problem, but it's not too far behind.


Exactly, and many bots exist today to mine user videos for the automated subtitle information. In other words, there's no escaping GPT from learning from any kind of medium.


These questions are constant. I do think you bring up relevant issues, but they aren't quite showstoppers.

Websites allow SE crawlers because (a) whatever traffic they get is better than no traffic, (b) allowing crawlers is the default and doesn't cost anything, and (c) Google/Bing don't negotiate. They are one, sites are many.

This has already played out in news. News outlets wanted Google to pay for content. Google (initially) responded by allowing them to opt out of Google. Over the years, they have negotiated a little bit. Courts, in some places, forced Google to negotiate... It's news and politicians care about news specifically. Overall though, there have not been meaningful moments where people got pissed off with Google and blocked crawlers. Not newspapers and not anyone else. Site owners being mad doesn't affect google or Bing.

What does matter to search engines is walled gardens. Facebook pioneered this, and this does matter to Google. There is, in a lot of cases, a lot less content to index and serve users. All those old forums, for example.

These are search problems, and GPT-based search will inherit them. ChatGPT will have the same problem recommending the best air fryer as normal search does. GPT is a different way of presenting information... it's not presenting different information.

RE: Lawsuits. Again, history. Youtube, for example, started off with rampant copyright infringement. But, legal systems were primitive. Lawyers and legislatures didn't know what to do. Claimants were extremely dispersed, and would have had to pioneer case law. Ultimately, copyright took >10 years to really apply online and by that point youtube and other social media was entrenched.

The law lags. In practice, early movers are free to operate lawlessly, and they get to shut the door after them. Now that Google is firmly entrenched, copyright law serves as one of their trenches.


Incidentally, law seems like an incredibly powerful potential application for ChatGPT.


This is an extremely important point. Something like ChatGPT without attribution can completely kill the open web. Every company will keep their information in a closed walled garden if no traffic is flowing to them. I don't see a scenario where something like StackOverflow can exist if no one goes to the site.


I think StackOverflow will exist and do well. First, it is a source of information for ChatGPT itself, so if there were no new content then the AI would implode too. Second, very often I skip the top answer because it has some edge cases or is simply outdated. The answer comments often highlight such issues. I don't think ChatGPT could be trusted without verification, not in serious programming work.


I see StackOverflow as one of the problems here.

StackOverflow went a long way toward killing the tech blog, and the number of "right" but poor answers on Stack sites is at an all-time high.

Often the "best" answer on those sites is buried or even downvoted in favor of an answer that "just works" but may have security issues, maintainability issues, is outdated, etc.

In a lot of areas I find Stack answers to be of low quality if you happen to have any in-depth knowledge of that area.


Indeed.

They should be renamed to ShitOverflow, because that's how bad the quality is a lot of the time.


On the first point, that is no guarantee that users will stay on the site. The AI is currently only using data from 2021 and earlier as far as I'm aware, and does so without feeling out of date. Before we see any significant signs of the AI imploding due to lack of new information, SO might well be long gone


What this is going to allow is a way to flatten org-mode, which will massively expand the number of people willing to use it. Put anything you wish into your own data collection, and you can instantly pull it up with a prompt. That service would then allow anonymized queries of other people's data.

If we don't get AGI, the LLMs that are starting now and don't have fresh data from people's queries won't be able to get going. The internet will quickly become stale. This will be sped up by the spam that LLMs will be used to create.

Walking through this scenario, I don't see any way for this not to end in a network-effect monopoly where one or two services win.


Maybe we can return to people sharing information/websites purely for the passion of sharing what they love, rather than the greed fueled mess we have today.


Oh gosh, maybe we'll actually have to pay for things, and we'll find that the market for the fifth random blog trying to make money off of free information using ads doesn't really exist. What a terrible world this will obviously be.

No. The weird thing is this idea that because you put ads on your site, you deserve money. Your ads are making the Internet worse. You probably don't realize this, because you most-likely use an ad blocker, which means you want people too dumb to use ad blockers to subsidize the web that you can use for free, but the current web is working well for approximately no one.

Would I pay $5 a month for StackOverflow if it didn't show up for everything I Google? Most likely. Would this be a better world? Almost certainly. We tried the thing with ads. It sucks. I welcome our new AI search overlords.


Why would you want power centralized? Big corporations are never your friend.


Power is also centralized when most supposedly independent actors buy ads from the same large advertisers, and utterly depend on their income from those ads to do whatever they're doing.


Websites will optimise for AI eyes rather than human eyes. Advertisers will pay to embed information in websites that is read by AI, which subtly makes the advertisers' products more valuable in the eyes of the AI. Then the AI would ultimately spit out information to users that is biased towards the advertisers' products.


That sounds like an incredibly difficult sell to the advertisers.


It isn't. I don't know about the anglosphere, but in the Hispanic world this is already being done, and has been for years. There are platforms where you buy articles from websites (even some newspapers), and you can even split the cost of an article among a number of advertisers.

Of course the impact of this has been immense, and the Spanish-language internet is filled with the same crap as the anglo internet, with trustworthy sites buried under tons of noise.

I had to map a bunch of communities in Spanish and post it on my blog because they don't appear in the search results anymore. Just to remind myself that they're out there.

I'm planning to do the same with blogs.

I guess we're going to rediscover directories and the problems associated with them, but currently the 'open internet' is a mess.

ChatGPT tools will just change how money flows and the incentives. Lots of spammers will get out of business, but many others will thrive. No ads, just deception.


This already exists in the US. All of the “PR news” sites are just paid PR releases. They make the product/company look good while spreading it over many sites to boost SEO and recognition, and they would cover this too.


We already know that advertisers aren’t willing to pay that much for “subliminal” advertising. People have been trying to do product placement in movies and shows forever and it’s never really taken off.


The entire concept of an Influencer is just a front for product placement. The difference nowadays is that people are actively looking for the commercials and ignoring the movie.


Product placement is everywhere. Next time you watch a movie or show, look for the clothing brands, computer brands, car brands, wine brands, etc. everywhere.

And think about sponsorships. From soccer to NASCAR, sports is covered with branding.


That's not subliminal, you're describing sponsorships (i.e. manufactured social proof).


"Subliminal" and "sponsorship" are totally orthogonal. One refers to the presentation, the other the business arrangement.


This seems factually incorrect. It's hard to find consistent historical numbers but what I can find implies pretty steady double digit growth over the last decade or two.

If you have good sources that say otherwise, I'd love to see them.


> "Bing is going to have to come up with some kind of radical revenue sharing too if they want anything written after 2023."

ChatGPT doesn't include anything written after 2021. I certainly wouldn't use it to find an air fryer. The results will be from over a year ago. I would want to see what the newest air fryer options are, and it would be really important to have up-to-date pricing.

AFAIK there is not a way to update a large language model in real time. You have to train on the entire dataset to do a meaningful update, just like with most forms of neural networks. For ChatGPT that takes days and costs hundreds of thousands of dollars. Every time.

It's great for explanations of concepts, and programming, and a few other things. But with the huge caveat that all of the information you're looking at is from one year ago and may have changed in that time. This really limits the utility of ChatGPT for me.


OpenAI is already working on solving this

https://openai.com/blog/webgpt/


Neat! I've seen so many discussions of the cost of continually retraining ChatGPT with new knowledge (and the energy efficiency of that, etc.) but had a similar thought that you can probably use a GPT-like approach to do "next word prediction" for a command-based web crawler to gather up-to-date data and then use the GPT-we-already-have to combine/integrate found content using the classic next word prediction.

Sometimes I feel that what makes humans cool is that we (well, some of us!) have good internal scoring on when we lack knowledge and must go in search of it which makes us go down different branches of next-action-in-line.


Someone pointed out that the energy cost of training GPT is roughly on par with a single transcontinental flight. If so, I don't think this is a limiting factor in any meaningful sense - you could spend that much energy daily, and it would still be a drop in the bucket overall for any moderately large business.


The bottleneck would be the number of workers on sites like Mechanical Turk available to create the datasets. Might take a few more years before Amazon and Facebook get enough third-world countries to the point where they can exploit their labour online to create daily training sets.


I would imagine trying new datasets on a daily basis wouldn't be trivial?


That's a very solvable problem though. If Microsoft decides to integrate ChatGPT with Bing, they have the resources to retrain the model on a more recent data set, and even do it somewhat regularly.


You don't even need to retrain if you use retrieval transformers. That is the real revolution waiting to happen. DeepMind already unlocked it with RETRO, but I don't know why a public version hasn't been released - hooked into the live internet.


OpenAI has WebGPT too: https://openai.com/blog/webgpt/


> Where is that information going to come from, exactly?

Manufacturers, with quality ranging from excellent to trash.

Consider trying to buy a 1K resistor at Digikey using their parametric search. Possible, but tedious and time-consuming, because you need a lot of domain knowledge to know what you want, and the technological range of "things with 1K of resistance" is extremely vast. At least it's possible, because the manufacturers are honest when Digikey imports their data.

Consider the opposite, consumer goods. 500-watt PC power supplies with random marketing number stickers on the same chassis ranging from 500 to 1200 watts. Consumer-level air compressors and consumer-level vacuum cleaners that plug into household wall outlets claiming "8 horsepower" or whatever insane marketing nonsense. Clothes with vanity sizing so a "medium" tag fits like a real-world XXL. Every processed food in a store with a "keto" label is high-carb, sugar-added garbage, much like what happened with the "organic" label in the old days (the employees at the farm, distributor, warehouse, and/or retail store level take the same produce out of one bin and put it in two places with different prices).

I think it will help when purchasing technical engineering type products but be an epic fail at inherently misleading consumer goods.


If you're trying to search for a specific resistor without the prerequisite domain knowledge, how will you be able to vet whether or not the answer given by a language model meets your needs?

Imagining that language models like GPT will ever be able to index up-to-date information is literally trying to apply the concept of "artificial intelligence" to a probabilistic language model. It's incompatible with what it's actually doing.


Maybe manufacturers could upload their design docs and ChatGPT could learn exactly what the object does and what its performance parameters are.


Put SEO into the picture and things get hairier. Incredibly realistic spam is about to go through the roof, so search engines will have an insanely harder time distinguishing between useful content and spam.

Making money from search traffic to your (presumably useful) site is going to get harder in a bunch of ways, due to generative models.


I don't see why this would be a copyright violation anymore than somebody learning something from multiple sources and reformulating what they learned into an answer to a question. As long as it isn't explicitly reciting its training data, there shouldn't be an issue of copyright.


> Barely anyone is making money writing reviews today, most sites are farmed content.

I'm sure ChatGPT will be able to write a bunch of terrible SEO prose that precedes the actual air fryer review (or worse, recipe) about how the author's grandma had an air fryer when she was young and remembered the great times with her grandma (etc), for roughly 95% of the text!

In all seriousness, being able to swerve all that terrible SEO content on reviews will always be welcome!


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

I don't think it's up to you, legally speaking: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

I mean, they could be nice and respect your robots.txt, but they certainly don't have to.

> fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

It's possible that fair use law will be expanded to cover this case, but as constructed the output of these models is generally only loosely derivative of any specific original, and so probably protected under fair use. If it were spitting out exact copies of things it had read, it would probably be pretty easy to train that behavior out of it.

> I do love these imaginary scenarios where ChatGPT is going to find me the best air fryer, though. Where is that information going to come from, exactly? Barely anyone is making money writing reviews today, it's mostly farmed content. What happens when even those sites' reviews are quickly scraped and put into the next model iteration? Bing is going to have to come up with some kind of radical revenue sharing too if they want anything fresh.

I do agree with this, though. The LLMification of search is going to squeeze revenue for content creators of all kinds to literally nothing, at least if that content isn't paywalled. Which probably means that that's exactly where we're headed.


> I don't think it's up to you, legally speaking: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

> I mean, they could be nice and respect your robots.txt, but they certainly don't have to.

That case was limited to the CFAA, but you seem to get the gist of what I'm saying when I specified it's different when it's Microsoft doing the scraping. If Bing starts ignoring robots.txt and data still starts showing up in their results, all the early 2000s lawsuits are going to be opened back up.

> It's possible that fair use law will be expanded to cover this case, but as constructed the output of these models is generally fairly derivative of any specific original, and so probably protected under fair use.

Unless there's a reason for them to be considered fair use, derivative works are going to lose a copyright suit. And what's the fair use argument? If I'm the only one on the internet saying something and suddenly ChatGPT can talk about the same thing and I'm losing money as a result, there's no fair use argument there. Search engines won those early lawsuits by being transformative (index vs content), minimal, and linking to their source. None of that would apply here.


What GP means is that ChatGPT output is generally not similar enough to any _particular_ source document to establish the fact that it's derivative. Instead, it resembles what you'd get if you asked a (credulous and slightly dumb) human to read a selection of documents and then summarize them. These kinds of summaries are absolutely not copyright violations, even if the source document can actually be identified.


> ChatGPT output is generally not similar enough to any _particular_ source document to establish the fact that it's derivative.

Isn't this exactly what a court case would be trying to clarify? If so wouldn't assuming this be begging the question?


There exist other laws, jurisprudence, and even entirely different judicial systems besides those currently used in the USA!


Sadly, it seems like the decision in that case was changed. From your link:

> In a November 2022 ruling the Ninth Circuit ruled that hiQ had breached LinkedIn's User Agreement and a settlement agreement was reached between the two parties.


It wasn't changed, it's just that there's more than one issue at hand: the earlier decision was that hiQ didn't violate CFAA, the later one was that it did violate LinkedIn's EULA. The November 2022 ruling specifically states that hiQ "accepted LinkedIn’s User Agreement in running advertising and signing up for LinkedIn subscriptions" - keep in mind that LinkedIn profiles haven't been public for a while in a sense that logging in is required to view them, and thus to scrape them.

Hence why OP is saying that this all will lead to increase in paywalls and such, and a reduction in truly public content.


My guess is your first point is exactly why Google hasn't done this yet. Their 'knowledge boxes' are already crossing a line that in general they felt nervous about crossing historically, but they don't go very far.

Google on the whole historically did not want to alienate publishers (and the advertisers that hang out on publisher content) and has avoided being in the content production business for this reason.


IMO this is the big problem with the internet as it exists today - there is no incentive for producing accurate, unbiased information and non-sensationalist opinions. My greatest hope for the future is that somehow we can incentivize people to produce "good" information for AI based assistants and move away from the rage/shock based advertising model that most of the internet currently uses. Personally I would rather pay a few cents for a query that produces valuable results and doesn't put me in a bad mood than pay with my time and attention like we do today. AI systems will absolutely need to be able to identify the training sources with every result (even if it is coming from several sources) and those sources should be compensated. IMO that's the only fair model for both image and text generation that is based on authors and artists work.


> problem with the internet as it exists today - there is no incentive for producing accurate, unbiased information and non-sensationalist opinions.

I think this problem is orthogonal to the internet as medium, though I’ll concede that it has proven to be the biggest amplifier of this dynamic.

Correct information (or correct as far as humans know, or most likely correct, etc.) costs money to create. False or completely made-up information costs nothing, plus it has the potential upside of sensationalism, thus further increasing its ROI.

Agree with your point about developing more incentives for correct information and penalties for false.


It's not just that there's no incentive for that, but there's a very strong incentive to do the exact opposite:

https://www.youtube.com/watch?v=rE3j_RHkqJc


How I’d like to weather this storm:

1) Everyone cryptographically signs their work for identity confirmation.

2) There exists a blockchain whose sole purpose is to allow content creators to establish a copyright date on a digital piece of work.

3) A public that uses the two items above when evaluating the reputation of an artist.


This seems to make a lot of sense. The artists themselves also have an incentive to be blockchain validators/miners, thereby reducing the need for token payout, and the subsequent speculation that comes with tokenization (I think).


You don't need a blockchain for cryptographic timestamps.
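
A minimal sketch of the non-blockchain route (RFC 3161 trusted timestamping): hash the work locally and have a timestamp authority sign the digest. The file name is made up, and the TSA round trip is only described in the comments:

  import hashlib

  # Hash the work; only this digest (not the file itself) needs to leave your machine.
  digest = hashlib.sha256(open("artwork.png", "rb").read()).hexdigest()
  print(digest)

  # An RFC 3161 timestamp authority (TSA) signs this digest together with its clock
  # time, giving a verifiable "existed no later than T" proof, no blockchain needed.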


I've got some reading to do[1]. Thank you for the heads-up.

[1] https://en.wikipedia.org/wiki/Trusted_timestamping


How does that prevent anyone from using ChatGPT to generate new (supposedly human-written) content?


It doesn't; however, the signature in your hypothetical doesn't correspond to a known/trusted author.


A language model that provides answers with sources (which could be found using a traditional search engine that searches the corpus that the language model is trained on) would be very useful and would also allow it to link directly to the source material. The trouble would be in finding the exact sources since the output of the language model is unlikely to be verbatim but current search engines can deal with imprecise queries fairly well so it's not an intractable problem. A very well curated data set would help this immensely.

I'd be super interested in a language model that was able to synthesize knowledge drawn from a large corpus of books and then cite relevant sections from various titles.
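
A rough sketch of that shape, where search_corpus and generate are hypothetical stand-ins for the traditional search engine and the language model:

  def answer_with_sources(question: str) -> str:
      # search_corpus and generate are hypothetical stand-ins for a conventional
      # search index and an LLM completion call, respectively.
      passages = search_corpus(question, top_k=5)   # -> [(doc_id, passage_text), ...]
      context = "\n\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
      prompt = (
          "Answer the question using only the passages below, and cite the "
          "passage id in brackets after each claim.\n\n"
          f"{context}\n\nQuestion: {question}\nAnswer:"
      )
      return generate(prompt)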


>- Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

They will send at least some visitors, which is better than the zero visitors you will get from Bing if you block it.

>- fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

Yes, and Microsoft has lawyers, who have presumably determined that the cost of fighting these frivolous lawsuits is not overwhelming.


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

You tell me! It's your site. If you want money maybe you should charge for your content? And honestly, the web that Google presents is just so terrible that I don't want to visit your site, unfortunately. And, maybe it's a price worth paying.


>Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

Google already does this with featured snippets


Which were already highly unpopular with websites, but at least have some attribution.


So we’re speculating that the Bing chatGPT implementation will crawl public websites, answer queries strictly from its training data, and present unattributed snippets?

That does sound both flawed as a search engine and objectionable to site operators. In addition to not being announced or even rumored to work that way.

So, maybe the implementation is different from that model?


Their plan is to use Neuralink to pull all the information from people's brains.


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

How can you refuse? The only way I know would be to require an account, but even then they could bypass it.


Major search engines honor robots.txt
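
The check itself is trivial to do from the crawler's side; a short sketch using Python's standard library (the site and user agent are just illustrative):

  from urllib import robotparser

  # What a well-behaved crawler does before fetching a page.
  rp = robotparser.RobotFileParser()
  rp.set_url("https://example.com/robots.txt")
  rp.read()
  print(rp.can_fetch("bingbot", "https://example.com/reviews/best-air-fryers"))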


If it becomes standard to have such a file and it affects their bottom line, could they disregard it?


If the file that were previously honored as consent to use the copyright material is subsequently ignored, wouldn't the content creators take the indexers to court for copyright infringement?


Yeah, not helpful, but long-winded.

There are many air fryers on the market and the best one for you will depend on your needs and preferences. Some factors to consider when selecting an air fryer include size, price, features, and overall performance. Some popular air fryers to consider include the Philips Airfryer, the Ninja Foodi, and the Cosori Air Fryer. It might be helpful to read online reviews and compare the features of different models to find the one that works best for you.


I've found the greatest success with ChatGPT when I use it as a learning / exploration tool. If there is a topic I don't know much about, I can state the question in a fairly stupid way and ChatGPT will give me vocabulary options to explore.

For example, you could describe a probabilistic process to it and ask it what kind of distribution / process it is. Then, based on the extensive words you get back, you can continue your research on Google.

As such I think search engine integration is a really great idea, looking something like the following:

-> user: Hey search engine, I have a thing that can happen with a certain probability of success, and it runs repeatedly every 15 minutes. Could you tell me what kind of process this is and how to calculate the probability of 5 consecutive events in 24 hours?

-> engine: It sounds like you are describing a Bernoulli process. In a Bernoulli process, there are only two possible outcomes for each trial: success or failure. The probability of success is constant from trial to trial, and the trials are independent, meaning that the outcome of one trial does not affect the outcome of any other trial.

Here are some results on how to calculate the probability of consecutive successes in a Bernoulli trial (result list follows)

(Note: if you try to ask this from ChatGPT it will not actually give you a correct answer for the calculation itself as there are some subtleties in the problem. But search results of "bernoulli process" will tend to contain very reliable information on the topic)

Edit: You could even just say "could you give me good search queries to use for the following problem" and use the results of that.
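
For the curious, one way to handle the subtle part correctly is a small dynamic program over the current success streak; a sketch, assuming for illustration a 50% success chance per 15-minute trial (96 trials in 24 hours):

  def prob_run_of_k(n_trials: int, k: int, p: float) -> float:
      """Probability of at least one run of k consecutive successes in n_trials."""
      streak = [0.0] * k     # streak[j] = P(no run yet, current streak length == j)
      streak[0] = 1.0
      hit = 0.0              # P(a run of k has already occurred); absorbing
      for _ in range(n_trials):
          nxt = [0.0] * k
          for j, pr in enumerate(streak):
              nxt[0] += pr * (1 - p)       # a failure resets the streak
              if j + 1 == k:
                  hit += pr * p            # a success completes the run of k
              else:
                  nxt[j + 1] += pr * p     # a success extends the streak
          streak = nxt
      return hit

  print(prob_run_of_k(n_trials=96, k=5, p=0.5))  # p=0.5 is an assumed example value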


My experience is that GPT gives me a very good-looking answer, but when I do a cross-check, it's often slightly wrong or outright wrong.


This honestly sounds like the same experience one gets when talking to humans :)

I'm only half joking. Even experts in their field tend to inject their own biases and experiential preferences when answering questions in depth.


That is the claim made by AI proponents every time it fails, such as for self-driving cars - humans make mistakes too. Humans make math mistakes, but I wouldn't be satisfied with a calculator that does.

ChatGPT is a tool; its value depends on how well I can trust it. Humans are not tools.

> experts in their field tend to inject their own biases and experiential preferences when answering questions in depth.

Another typical argument - everyone makes mistakes, therefore my mistakes aren't relevant. Everyone can do math, but there's a big difference between my math and Timothy Gowers's. Everyone lies and everyone tells the truth at times, but the meaningful difference is in degree - some do it all the time, with major consequences, take no responsibility, and cause lots of harm. That's different than the person committed to integrity.


To speak as a proponent, it's not about the er... "relative relevance" so much as the utility.

there are things about a chat model that you can't say about humans, like, it's not really ethical to keep a human stuffed in your pocket to be your personal assistant at your whim.

I think one of the things folks struggle with in grokking the value of these models is that we're really used to tools being like you say; they're reliable and do a thing. As though there are two states of work - perfect and useless. There are other patterns to interact with information, and this puts what we used to need humans for in a place that we can do other things with it. stuff like:

- brainstorming
- rubber duck debugging
- casually discussing a topic
- exploring ideas / knowledge
- study groups (as in, having other semi-knowledgeable entities around to bounce ideas off of, ask questions, etc.)

When it comes to self-driving cars, well, that's a bit of a different story and really is more a discussion about ethics and law and those standards. I, and others like those you speak of, are of the opinion that the expectation for autonomous vehicles is a bit high given the rates of human failure, but there are plenty of arguments to be made that automating and scaling a thing means you should hold it to a higher standard anyway. I don't think there's a correct answer on this one - it's complex enough to be a matter of opinion. You mention the potential for harm, and certainly that applies here.

I'm less worried about ChatGPT being wrong. Much less likely to flatten me at an intersection.


> I think one of the things folks struggle with in grokking the value of these models is that we're really used to tools being like you say; they're reliable and do a thing. As though there are two states of work - perfect and useless. There are other patterns to interact with information, and this puts what we used to need humans for in a place that we can do other things with it.

Maybe, but look at it this way: Do you work in business? If so, step back and reread that - it seems a lot like a salesperson finding a roundabout way to say, 'my product doesn't actually work'.


It’s either useful or it isn’t. Comparing AI to either human intelligence or rules-based computing tools is incoherent. Fucking stop it! What we are really talking about are the pros and cons of experiential, tacit knowledge. Humans can do this. Humans can also compute sums. Computers are really good at computing sums. It turns out they can work with experiential knowledge as well. Whodathunk.

What we should be saying is this: there will always be benefits of experiential knowledge and there will always be faults with experiential knowledge, regardless of man vs. machine.


ChatGPT is just your average Reddit user.

Even when it's wrong, it's confidently wrong.


Perhaps because it is trained on Redditors and co.


I know this is a joke, but I think it's important to recognize that it's because the ChatGPT language model does not have the ability to introspect and decide how accurate its knowledge is in a given domain. No amount of training on new input data can ensure it provides accurate responses.


That applies to humans as well.


No it doesn't; humans can recognise when they don't know something, while current language models usually can't (yet).

Their training objective, which is to predict the next piece of text in their training data, does not incentivise them to respond that they don't know something, as there is no relation in the training data between the AI not knowing something and the correct next text being "I don't know" or similar.


I'd sure hope not. Reddit comments are a masterclass in disguising ethos and pathos as logos.

I'd expect that the boring reality is that it's trained on highly ethos/logos text (academic works) and thus always presents itself as such, even when its weights cause an invalid assertion.


Reddit is exhausting. One big feedback loop. People will say anything to get good karma, or avoid saying certain things to avoid being downvoted. If there is even just a slight majority in the way the group thinks, it will soon become the dominant opinion.

For example, there was a voice actor that lied about being paid a pitiful sum of money for a gig. Everyone took her side initially (as one should _if_ it were true) but the people saying "well, this just seems odd" were being more or less attacked and told their opinions were awful.

The quality of discussions I have on HN and niche forums is 100x better than on Reddit.


TBF, the same can happen here to a lesser extent. False or misleading stories blow up quickly because “$BIGCORP bad”.


It's trained on Twitter data so I assume Reddit data as well.

Honestly it feels like they're both pretty important datasets to ingest if you're trying to build a model on human speech; I reckon social media, comment sections, and co. have the most natural human conversational text online.


Similar to the original comment, it could help with exploratory type of work. It helps me shift things from “things I don’t know about/unaware of” to “things I know I don’t know of”.


"Sometimes right, always plausible"


Would it be effective to ask GPT to provide a confidence rating about how sure it is about an answer, or would it be likely to just say that it is confident in its correctness when it is wrong?


"Confidence" is an unfortunate term that shouldn't be confused with a human logic interpretation of "confidence".

In most ML cases (and likely ChatGPT's), "confidence" would generally just correlate with how closely the query matches data and patterns it's seen in its dataset, and inversely correlate with how many conflicting matches and patterns it sees.

Humans are subject to the same problem of course. If you asked how confident a person living many centuries ago was that the Earth was flat, they'd probably say "very confident" because there was nothing in their training data / lived experience to conflict with that view. But they'd be wrong.

But humans still have a significant advantage in that they report lack of confidence when they sense logical inconsistencies and violations of reasoning to a level that ML models can't (at least not yet).

Maybe a fan-out of the possible ways it could answer would be interesting, but really we more need a disclaimer next to every answer that says "this thing that's answering in fully formed language does not have human reasoning capability and can't be trusted (yet)"
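
For what it's worth, the closest thing these models expose to a "confidence" signal is per-token log-probabilities; a hedged sketch against the completions-style OpenAI API of early 2023 (the prompt is made up), which reports token likelihood, not factual confidence:

  import openai  # assumes the pre-1.0 openai package with the Completion endpoint
                 # and that openai.api_key is already configured

  response = openai.Completion.create(
      model="text-davinci-003",
      prompt="Q: Did medieval scholars believe the Earth was flat?\nA:",
      max_tokens=60,
      logprobs=1,   # request the log-probability of each sampled token
  )
  lp = response["choices"][0]["logprobs"]
  for token, logprob in zip(lp["tokens"], lp["token_logprobs"]):
      print(f"{token!r}: {logprob:.2f}")   # token-level likelihood, not truthfulness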


Odd/ironic fact: people didn't believe the Earth was flat back then. That's a modern confusion. They believed the Sun revolved around the Earth. Wikipedia has a whole article dedicated to this common belief:

https://en.wikipedia.org/wiki/Myth_of_the_flat_Earth

"The earliest clear documentation of the idea of a spherical Earth comes from the ancient Greeks (5th century BC). The belief was widespread in the Greek world when Eratosthenes calculated the circumference of Earth around 240 BC. This knowledge spread with Greek influence such that during the Early Middle Ages (~600–1000 AD), most European and Middle Eastern scholars espoused Earth's sphericity.[3] Belief in a flat Earth among educated Europeans was almost nonexistent from the Late Middle Ages onward ... Historian Jeffrey Burton Russell says the flat-Earth error flourished most between 1870 and 1920, and had to do with the ideological setting created by struggles over biological evolution"

I asked ChatGPT the same question and it prevaricated:

"There is evidence that some people in medieval times believed the Earth was flat, while others believed it was round. The idea that the Earth is round, or more accurately, an oblate spheroid, has been around since ancient times. The ancient Greeks, for example, knew that the Earth was a sphere. However, the idea that the Earth is flat also has a long history and can be traced back to ancient civilizations as well. During the Middle Ages, the idea that the Earth was round was not widely accepted, and there was significant debate about the shape of the Earth. Some people continued to believe in the idea that the Earth was flat, while others argued for a round Earth. It is important to note that the medieval period was a time of great intellectual and scientific change, and ideas about the shape of the Earth and other scientific concepts were still being developed and debated."

But from what I know, it's wrong, at least as far as we know the historical record (of course there may have been peasants who believed otherwise but their views weren't recorded). The fact that the Earth is a sphere is obvious to anyone who watched a ship sail over the horizon, which is an experience people had from ancient times.


I've asked it to give me a confidence rating for its replies to my questions but it states that it can't give one


I had luck getting it to give me one when providing answers in JSON form.

For example:

> I'm going to share some information, I want you to classify it in the following JSON-like format and provide a responses that match this typescript interface:

    > {
    >   "isXXXXX": boolean;
    >   "certainty": number;
    > }
> where certainty is a number between 0 and 1.

However, I got either 0 or 1 for the certainty every time. Not sure if it was because they were either cut-and-dry cases (certainty 1) or not-enough-information (certainty 0).

I'm actually trying to think of a good example of text I could ask it to intuit information from and give me a certainty


Even if it gives you a number there, does that number actually tell you what you think it does, or is it merely filling in the blanks with random information? I suspect the latter.

For example, ask it to subtract 2 20-digit numbers. It will come up with an answer X where the first couple of digits are correct, and everything after that is wrong.

It gets better.

Ask it to correct itself. It will come up with a different wrong answer Y.

If you then ask it to explain why the answer is right, it will give you an explanation. At the end of the explanation it states the answer is X again, and then in the very next line concludes by telling you that is why the answer Y is correct. :)
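
For reference, the check itself costs nothing outside the model, since Python integers are arbitrary precision (the operands below are made up):

  a = 12345678901234567890
  b = 98765432109876543210
  print(a - b)   # -86419753208641975320, exact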


I saw a screenshot a few days ago where someone asked it for five fun facts about the number 2023. In the same response, it said it’s a composite number (3 times 673) and prime (specifically the 41st). Both are wrong; it’s a composite number (7 times 289, i.e. 7 × 17 × 17).


I think this is the big question lots of people are working on right now

It's apparently really hard to objectively measure/report the "truthiness" of LLM results

Allowing an LLM to "improvise" and be a bit fast-and-loose is unfortunately a necessary ingredient in how they currently work.


Then there's the question of how we should interpret it. Should we ask for the confidence rating of the confidence rating? The language models lack the ability to verify/falsify claims; they just do word correlation.


I asked it for movie quotes from a specific movie: 7 out of 10 were from the movie, but 3 weren't, though they sounded plausible.


I asked it 'What happened to Gandalf after he fell from the bridge, fighting the Balrog? Did he die?'

The answer was a story about Gandalf being hurt badly, being rescued by some random dwarves, and so on.

I asked ChatGPT in which book this is described, and it told me that you can read about it in both The Hobbit and The Lord of the Rings.

So it makes up fun stories. This makes me wonder how much of its explanations about physics (which I don't completely understand) are made up.


Transformer models suffer from "hallucinations". It can be terrible at giving quotes or references. It's a known limitation with this tech that the industry is working to overcome.


Sounds like every human I’ve ever met


Except we don’t hype humans the same way we hype ChatGPT


It seems like we do in the threads about chatgpt hype. From what I’ve read every human can do advanced mathematics flawlessly and recall every nuance of every subject with perfect fidelity, write clearly and cogently, and it’s all managed through channeling ether and soul spirit fire through emotions that AIs and Vulcans can’t possess.


I think you're reading way too much into criticisms of ChatGPT as implying humans are immune to the same criticisms. And then transforming them into complete hyperbole.


But a human isn't there for you 24 hours a day and for every thing you want to ask for, at the very least.


CEOs and various intellectuals very much are hyped up.

Then we realize they’re like anyone else and they’re massively demonized


I've certainly met people as confidently incorrect as chatgpt but they are the exception rather than the rule.


Totally. I asked it to describe an obscure lithography technique (rapid electron area masking), and it gave a reasonable summary but at the end claimed it was widely used in industry...it's not used at all.


I asked it about the strong nuclear force and it said the force gets weaker with distance—quite fundamentally wrong (color confinement).


If you are just looking for related vocabulary words, correctness is not a concern.


Is there any reason that ChatGPT is better than a thesaurus?


Well, "I have a thing that can happen with certain probability of success, and it runs repeatedly every 15 minutes. Could you tell me what kind of process this is and how to calculate the probability of 5 consecutive events in 24 hours?" isn't going to be in a thesaurus.


garbage in, garbage out.


No. That doesn't stand up anymore in this case.


Why would ChatGPT be the exception to the rule? What architecture are they using that somehow is immune to unwanted trends in the training data?

If you're going to make an outlandish claim like that, I'd like to see some arguments to back it up.


Well, it's more like a puree of garbage and quality stuff, so you are never quite sure what you'll get in each bite...


Garbage interspersed in the input, garbage interspersed in the output.


ChatGPT can give you bad answers to good questions or good answers to nonsense questions. With ChatGPT it's more like "sometimes garbage".


The garbage is in the source material used to create the model, not the questions.


More likely due to lack of "good" data than to the existence of "bad" data. ChatGPT is known for its ability to "hallucinate" answers for questions that it wasn't trained for.


Same comment still applies. ChatGPT sometimes gives good and bad answers.


In fact ChatGPT doesn't know anything about true and false. It's just generating text that most closely resembles text it's seen on similar subjects.

E.g. ask it about the molecular description for anything. It'll start with something fundamental like the CH3N4 etc then describe the bonds. But the bonds will be a mishmash of many chemical descriptions thrown together. Because similar questions had that kind of answer.

The worst part is, it blurts forth with perfect confidence. I liken it to a blowhard acquaintance that will make up crap about any technical subject they have a few words for, as if they are an expert. It's funny except when somebody relies on it as truth.

I don't think GPT3 at its heart is an expert at anything. Except generating likely-looking text. There's no 'superego' involved anywhere that audits the output for truthfulness. And certainly no logical understanding of what it's saying.


I love ChatGPT for simple tasks. It is currently wreaking havoc on some communities though, including one I created on Reddit.

https://www.reddit.com/r/pinescript/comments/1029r7p/please_...

People have taken to asking ChatGPT to create entire scripts to trade money. When they don't work, they go into chatrooms or forums and ask "why doesn't this work" without saying it was made by ChatGPT. It causes people to open the post, read it a bit and only maybe after a minute or two of wasted time, realize the script is complete nonsense.


I'd argue that level of ambiguity counts as garbage out, although I'm confident it will get better.


Why? ChatGPT has certainly consumed SEO spam and company marketing materials as part of its model. Even if a human went through it, there still exists a bias towards this information. After all, this material is specifically written to fool humans.

I've played with ChatGPT enough to notice that for some queries it's fundamentally doing an auto-summarize of such content.

Consider this. Someone very early on posted that a neat feature of ChatGPT would be to give ChatGPT a list of ISBN numbers and then demand its answers are cited from this corpus. We're not there yet, but this would be amazing.

My prediction is that those with money will have the power to influence their chat bot. Consequently, they'll have access to a higher-quality and wider corpus of information. There will not be any restrictions on how their ChatGPT would answer due to, for example, woke agendas. Also, players such as Goldman Sachs would feed their model content generated by their analysts that consumers would not have access to. This already happens, but ChatGPT will make this information so much more potent.

Furthermore, as this technology continues to improve it will increase the productivity of our population and ultimately generate higher GDP. I'm super excited.


> Consider this. Someone very early posted that a neat feature of chatgpt would be to give chatgpt a list of ISBN numbers and then demand it's answers are cited from this corpus. We're not there yet but this would be amazing.

It currently has the ability to do this. It'll make the citations up, of course – but that behaviour is inherent to the architecture; a system that didn't do that would have to work differently at a fundamental level.

> chatgpt will make this information so much more potent.

How do you imagine this would work?

> and ultimately generate higher GDP.

Again, how do you imagine this would work? GDP is a specific economic measure; how would (a better version of) this technology increase GDP?

Tangentially: why is "increase GDP" a good ultimate goal to have in the first place?


Citing from a well-defined corpus and making citations up look like very different things at a fundamental level.


>> chatgpt will make this information so much more potent.

> How do you imagine this would work?

Don't overthink it. It's just the nature of the tool. Imagine you're a detective trying to investigate a crime:

- "list the plates of blue hondas in this area at this time, that have a missing rear bumper and a scratched driver side door"

- "send a notification to all gas stations along this route and notify them of a blue honda"

And, if you're a Goldman Sachs analyst, you can just use natural language to gather information. "i have this scenario, list companies that will benefit" would be an abstract question that you'd ask it. Obviously, the system isn't this good yet but you get the idea. You'd just have to ask more fine-grained questions and use some of your domain knowledge to fill the gap until it does become this good.

>> and ultimately generate higher GDP.

> Again, how do you imagine this would work? GDP is a specific economic measure; how would (a better version of) this technology increase GDP?

Google (or chat gpt) would do a better job than me answering this,

"Increases in productivity allow firms to produce greater output for the same level of input, earn higher revenues, and ultimately generate higher Gross Domestic Product."

The reason you want to increase gdp... the following quote was derived from one of Herbert Hoover’s memoirs.

"[Engineering] It is a great profession. There is the satisfaction of watching a figment of the imagination emerge through the aid of science to a plan on paper. Then it moves to realization in stone or metal or energy. Then it brings jobs and homes to men. Then it elevates the standards of living and adds to the comforts of life. That is the engineer’s high privilege."

By increasing GDP, you elevate the standard of living and add to the comfort of life.


> "list the plates of blue Hondas in this area at this time, that have [...]"

I think this shows a significant misunderstanding of what chatgpt does fundamentally. It will never be able to do this unless also fed a description, location, and time of cars in a certain area as context beforehand (either as training data or a prompt). In either case you have access to the data and just need to do a simple search, so chatgpt is providing negative value since it's capable of providing results that don't exist in the dataset.

Similarly for your Goldman Sachs example, you're imagining that chatgpt is greater than it is. It is capable of providing something that would likely follow a given text on the internet at its time of training (aka its training set) somewhere. It can't reason about new information or situations since it's incapable of reasoning. To believe that it could generate business strategies is to believe that effective business strategies don't require any intuition or reasoning to progress, just statistical recombination of existing strategies.

> By increasing GDP, you elevate the standard of living and add to the comfort of life.

How do you reach this conclusion from the information presented? Why use GDP, a measure of the profitability of corporations, as a proxy for the standard of living instead of measuring the standard of living and seeing how it will be impacted directly instead of through many layers of abstraction?


>>How do you reach this conclusion from the information presented? Why use GDP, a measure of the profitability of corporations, as a proxy for the standard of living instead of measuring the standard of living and seeing how it will be impacted directly instead of through many layers of abstraction?

You are asking a question that is outside of scope here. GDP per capita has been used as a proxy for standard of living for quite some time now.


That proxy only works as long as nobody's optimising for it.

> Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes. — Charles Goodhart

GDP (£) per capita in London has doubled since 1998. Has the standard of living "doubled" for the median person? What about the standard of living for the poorest 1%? Has the productivity boost due to automation translated into correspondingly shorter working hours, or correspondingly larger compensation for work done?

What questions do you actually mean to ask, when you talk about GDP?


If something stops being an economic transaction it moves out of GDP. So if ChatGPT reduces Google ad clicks then it doesn’t seem like it would increase it, even though it does increase customer surplus (stuff you get for free).


For me it's a weird mixup in my brain of "interactive Google".

I know the results I'm going to get back are basically the same as if I went to Google, ran a query, it returns me their top 3-5 scraped "blog articles" based on relevancy, and then I ran it through one of those condensing/summarizing bots.

I'm not sure why it's as therapeutic as it is basically interacting with a search engine.

I wonder if this kind of technology will remain free for the foreseeable future. Google has to be coming up with something shortly, right? It's interactive search engine results "on steroids" (I think? I can't tell if my brain is tricking me into thinking it's cooler/more useful than it is. Everybody non-tech I tell about it isn't that impressed/feels it's spammy/crufty/formulaic).


I'm not sure why it's as therapeutic as it is basically interacting with a search engine.

Because it's like a smart human giving you their best guess. It will never tell you it doesn't know or give you something completely offbase like Google does.

It's friendlier than Google but less accurate.


Is it safe to say in your opinion that Google and ChatGPT are basically trained on the same information?

Google crawls the web/scrapes it/indexes it.

ChatGPT crawls the web (not sure if they have access to Google's internal scrape results, I doubt it), "trains" a model on it, serves it back to you in a "human friendly readable summarized format".

It's just Google from the perspective of "it's going to return the same information Google has" but instead of a search index trying to guess what's relevant it's an interactive language model designed to basically summarize the same underlying blog posts. Is that your opinion/understanding as well?


No, not at all. ChatGPT is trained on the same source information, but when you ask a question there's no guarantee its answer is directly from an actual source, it's always a newly generated "thought".

Google is a photocopier. It gives you an exact copy of what it finds. Google doesn't create, just references and links to original sources.

Google is a library, but not an author.

ChatGPT is an author, but not a library.

However, ChatGPT has read every book in the library, so when you ask a question it writes you a story from its memory based on what it thinks* you want. ChatGPT can write stories about books in the library, and it will probably be right (but maybe not).

*Remember the game Plinko from Price is Right? Basically ChatGPT takes your question, drops all the words through its super complicated plinko machine (neural network) and gives you the result.

If you ask it for the names of US presidents, it should give you the same answer as Google - even though it came up with it via the plinko method.

If you ask it for a story about a singing rock, the process is the same as the presidents list. It drops your request into the network and gives you the result. It's not smart, just wildly complicated. It's also never going to be a photocopier (but it might act like one for certain inputs).

----

The brain breaking part is that when you ask ChatGPT for...

"Write me a song about a singing rock"

It changes each word into a number-token, then those number-tokens go through the plinko machine. The result is a different set of number-tokens which it converts back to readable words. Inside ChatGPT it doesn't "know" anything. Rock is a number. Singing is a number. Write is a number.

But it knows the relationships between those numbers, and what other numbers are nearby the area of the network devoted to songs, so it pulls in words and related concepts like a human would.

But it's just numbers with no understanding.

Because it's numbers and not understanding, it can be wrong, either completely or subtly.

Edit: Asking for the list of US presidents has "David D. Eisenhower (1849-1850)" as number 12 (who isn't a person who was ever president). The rest look right, but ChatGPT is subtly wrong in this case.


Do you see the future being ChatGPT results but with citations? Or is that basically impossible given how it's a "trained model"?


No, ChatGPT doesn't know its own sources. It's just a trained model. Once the model has been created it's fixed - it can be recreated unlimited times, but it will never tell you the sources for its output.

Maybe if the network nodes have a source attached to them...

But thinking out loud...

That's not how the number-tokens work. It's at a word level... so "a list of US presidents" is broken down into individual number-tokens for each word, and you can't provide a source for each word.

---

I'm not sure how you combine Google and ChatGPT.

Chat is creative/combinatorial and Google is "just the facts".

ChatGPT and Google are going to have problems going forward. How do both of them determine if the information they find on the internet is from a meat-brain and not a metal-brain?

Happy to be proven wrong.


Maybe by fact-checking its answer?

Question -> "creative" output -> Google -> Summary of links -> Comparison -> confidence level (or re-write) + links that were used for checking

Not so different from how we work, at a high level. I believe that OpenAI has published a paper called WebGPT that has a workflow like this (although I'm not sure it's exactly the same).
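
Roughly the loop I'm imagining, sketched below; every helper here (ask_model, web_search, summarize, compare) is a made-up placeholder for illustration, not a real API:

    # Hypothetical fact-checking wrapper: generate an answer, then check it
    # against search results before showing it to the user.
    def answer_with_confidence(question):
        draft = ask_model(question)                        # "creative" LLM output
        pages = web_search(draft)                          # look for supporting sources
        evidence = [summarize(p) for p in pages[:5]]
        confidence = compare(draft, evidence)              # e.g. an overlap/entailment score
        if confidence < 0.5:
            draft = ask_model(question, context=evidence)  # re-write grounded in the sources
        return draft, confidence, pages[:5]                # answer + confidence + links used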


"condensing/summarizing bots" - never heard of these, will have a play around, thanks.



Actually I find it much better than that for exploratory purposes, not just getting search terms. The ability to just keep asking questions for clarification is something that the web was meant to provide with web links, but rarely does a good job of it. If it can simply act as a domain expert that I can talk to, it would be a huge win.


But it's not a domain expert: it's a language model designed and trained to produce language that could plausibly have been written by a human on the internet. At best it functions as a well-informed amateur, at worst it hallucinates nonsense but writes it in a way that is very convincing.


To be fair, you just summarized human discourse especially on places like Reddit and HN.


This makes me think of a quote from one of Dijkstra's lectures:

"In the long run I expect computing science to transcend its parent disciplines, mathematics and logic, by effectively realizing a significant part of Leibniz's Dream of providing symbolic calculation as an alternative to human reasoning. (Please note the difference between "mimicking" and "providing an alternative to": alternatives are allowed to be better.)"

When talking about a tool that's supposedly greater than humans, why should the shortcomings of humans be relevant? The tools we create to surpass our own capabilities should be greater than our own capabilities, not stunted by the same issues.


This comment isn’t helpful or a retort sorry.


It doesn't matter, it can still be extremely helpful.

For instance, I had fragmented memories of a movie, described what I knew about it (about a boy who lived in a train station), and it helped me find a couple movies and then narrow in on the one I was looking for.

These types of queries can be super painful with modern search engines but was easy with ChatGPT and a pleasant experience.

I think people are thinking of this AI in the wrong way - where it is "an expert". To me, I like to think of it as a companion that helps us shape and refine our thoughts and ideas.


It may not be a domain expert now, but it easily could be.

For example if you took all the Linux kernel code, the code review comments, the docs, and several of the top books and blogs on kernel development — suddenly you have a system that may be great for new kernel developers to ask questions of. Especially in a community that often isn’t kind to people asking “dumb questions”.


Could it? There's lots of commentary about how it could easily be this or that, but I don't work in ML and have no clue whether it is actually easy or not to tweak ChatGPT to work in such ways.

For example, in my experience ChatGPT isn't very "smart". It has a lot of knowledge but it can't infer any facts from that knowledge. When you ask it to write a program it has no real idea of what it actually does, and you can easily get it to add features it already added, or tell it something is a bug when it really isn't.

This doesn't sound like the stuff you could make a domain expert out of, at least, not out of the box.


I'm not asking it to infer much, but to piece together a fair bit and understand what I'm asking. For example, here's a chat I had with ChatGPT:

"How does common subexpression elimination work?"

It answered it correctly and gave some basic code examples to demonstrate the concept. Then I followed up with:

"But can it do the elimination if the variables are flipped, but semantically equivalent, like 'y + x' in the example above?"

It again gave what I would consider a correct answer. Note 'x + y' was what was eliminated previously, so I reversed it here.

Then I asked: "Are there cases where the compiler might fail to eliminate common subexpressions?"

And again a good answer.

Now all of this could be found on the web somewhere, but for example the second question didn't show an obvious answer on Google when I searched for it. I'm sure I could find it, but I know this field well. If I was someone new to the field I'd probably spend a lot of time parsing useless articles to find something that answered the question in a way I could understand.

I'm less concerned about it writing code (which is cool). For me, the ability to help me learn an area quickly is far more useful. It doesn't need novel answers, but the ability to understand what I'm asking and answer it. I think it's really close to being able to do this now.
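
For anyone who hasn't met the term: common subexpression elimination is just this kind of rewrite, shown here by hand in Python (a real compiler does it on an intermediate representation):

    # Before: (x + y) is evaluated twice.
    def f(x, y):
        return (x + y) * 2 + (x + y) * 3

    # After CSE: the repeated subexpression is hoisted into a temporary.
    def f_cse(x, y):
        t = x + y          # computed once
        return t * 2 + t * 3

    assert f(3, 4) == f_cse(3, 4) == 35

The follow-up about "y + x" is exactly where it gets interesting: the value is the same when addition commutes, but a compiler has to canonicalize operand order (or prove the swap is safe, which it isn't in general with operator overloading or side effects) before it will merge the two.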


That solves the "garbage training data" problem, but it doesn't solve the "it's just a language model" problem.

If you fine tuned ChatGPT on all the sources you mention, you now have a model that produces results that could plausibly have been written by a domain expert on the Linux kernel, but you don't have a domain expert. It will still hallucinate, because that's a fundamental feature of generative AI, it will just hallucinate much more convincingly.


I get what you're saying, but I'm just not convinced that it will continue to be a huge problem. In some sense, if the state of the art is where we're at today with language models, then sure. But I think it'll get better -- in part because I'm not sure humans aren't just souped-up language models with some weird optimization functions...


Being a well-informed amateur in everything is pretty impressive though. ChatGPT will be extremely useful if it ever figures out how to say "I don't know".


I don't believe there's any way for a LLM operating alone to recognize when it doesn't know something, because it has no concept of knowing something.

Its one job is to predict the next word in a body of text. That's it. It's good enough at it that for at least half the people here it passes the Turing test with flying colors, but the only kind of confidence level it has is confidence in its prediction of the next word, not confidence in the information conveyed.

If we were to take a language model and hook it up to another system that does have a concept of "knowing" something, I could see us getting somewhere useful—essentially a friendly-to-use search engine over an otherwise normal database.


> Its one job is to predict the next word in a body of text.

“Predicting the next word” and “writing” are the same thing; you’re just saying it writes answers in text. There’s nothing about that preventing it from reasoning, and its training goal was more than just “predict the next word” anyway.


I don't know if I buy this. It feels like your confidence in what you say is closely tied to "knowing". I'm sure there is more research to do here, but I'm not sure if there is a need to "tie" it to some other system. As it stands today there are definitely things ChatGPT doesn't know and will tell you so. For example, I asked it, why did Donald Trump spank his kids -- and it said, "I do not have information about the parenting practices of Donald Trump".

That said, there are a lot of things it does get wrong, it would be nice for it be better at those. But I do think that, maybe much like humans, there will always be statements it makes, which are not true.


"I'm sorry, but I am a text-based AI language model..."


This is a good point.

Something I also enjoy about it is the uniform interface. Each answer is presented the same, there's no parsing layout from different sites, or popup modals to dismiss, or long winded intro to get to the answer you're looking for. Of course you can't quite trust what you're told, so this is a bit moot.


I've found it's good for getting me started. Need to do a presentation on something? Type it into chatGPT. It will generate what is basically an okay outline. You can expand on what you like, cut what you don't.

For me getting started is typically the most difficult part (thanks adhd) so this is a huge help.


This is definitely the best use case for these models I've heard. Often when I'm researching a field I'm not familiar with the hardest part is just knowing the vocabulary necessary to express what I want to ask.


Ask Jeeves 2.0!


ChatGPT fabricates lots of stuff, it's deceptive for common queries, but for programming-related output, it's easily verifiable and serves as an extremely valuable search tool. I can easily ask ChatGPT to explain stuff e.g. eBPF details without wasting time looking up the manuals. I hope Bing dominates Google and stackoverflow in this.


It's easily verifiable, but it may still waste time. I've had many cases where ChatGPT makes up functions that do exactly what I need, but then I find out these functions don't actually exist. This may not happen very often for super popular languages like Python or Javascript where training data is huge, but it happens all the time for the long-tail of languages. In those cases, it would've been faster for me to do a regular search.

I do agree with the overall point though. If you understand when to use it and when it's more likely to give you nonsensical answers, it can save a huge amount of time. But when I ask it about a topic that I don't know enough about to immediately verify the answer myself I'm forced to double check the answers for validity, which kind of defeats the purpose.

The best queries to ChatGPT are cases where I know what the answer should look like, I just forgot the syntax or some details. Bash scripts or Kubernetes manifests are examples here, I know them, I just keep forgetting the keywords because I only touch them every few weeks.

And don't get me started about asking ChatGPT about more general topics in e.g. economics or finance. What you get is a well-written summary of popular news and reddit opinions, which is dangerous if it's presented as "the truth" - The big mistake here is that the training procedure assumes that the amount of data correlates with correctness, which isn't true for many topics that involve politics or similar kinds of incentives where people and news spread what conveniently benefits them and gets clicks.


Wasting time and having to be constantly vigilant is exhausting and a slippery slope that makes it easier to fall for deceptive content and settle for "I don't know, it's probably close enough" instead of insisting on precision and accuracy.

Humans take a lot of shortcuts (such as believing more easily the same facts presented with a confident tone) and the "firehose of bs" exploits it: this was already the case before generative AI, but AI amplifies the industrial-scale imbalance between the time needed to generate partially incorrect data and the amount of time/energy required to validate.


Agreed that it is a slippery slope. Programming is understanding - like writing or teaching is understanding. To really understand something, we must construct it ourselves. We will be inclined to skip this step. This comment sums it up well:

> Salgat 8 days ago

> The problem with ML is that it's pattern recognition, it's an approximation. Code is absolute, it's logic that is interpreted very literally and very exactly. This is what makes it so dangerous for coding; it creates code that's convincing to humans but with deviations that allow for all sorts of bugs. And the worst part is, since you didn't write the code, you may not have the skills (or time) to figure out if those bugs exist

https://news.ycombinator.com/item?id=34140585


> To really understand something, we must construct it ourselves.

I think the real power of these bots will be to lead us down this path, as opposed to it doing everything for us. We can ask it to justify and explain its solution and it will do its best. If we're judicious with this we can use it to build our own understanding and just trash the AI's output.


How is that worse than having to look at every online post's date to estimate whether the solution is out of date? Or two StackOverflow results where one is incorrectly marked as duplicate and in the other the person posting the answer is convinced that the question is wrong.

ChatGPT can completely cut out the online search and give an answer directly about things like compiler errors, and elaborate further on any detail in the answer. I think that 2-3 further GPT generations down the line it will be worth the time for some applications.

The problem I see is less the overall quality of responses but people overestimating on where it can be used productively. But that will always be a problem with new tech, see Tesla drivers who regularly take a nap in the car because it didn't crash yet.


Unless the responses in those old online forums were intentionally malicious, they might be reasonably helpful even if not 100%.

While ChatGPT spews out complete nonsense most of the time. And the dangerous part is that that nonsense looks very reasonable. It gets very frustrating after some time, because at first you are always happy that it gave you a nice solution, but then it's not usable at all.


I'm a glass-half-empty sort of person: in my experience, even perfectly good answers for a different version can be problematic, and sometimes harmful.


Unless the training of ChatGPT has a mechanism to excise the influence of now out-of-date training input, it will become increasingly more likely to give an outdated response as time goes by. Does its training have this capability?


Yes.

The trick is to use it as an LLM and not a procedural, transactional data set.

For instance, “how do I create a new thread in Python”. Then ask “how do I create a new thread in Python 3.8”. The answers will (probably) be different.

Any interface to chatgpt or similar can help users craft good prompts this way. It just takes thinking about the problem a little differently.

One wildly inefficient but illustrative approach is to use chatgpt itself to optimize the queries. For the Python threading example, I just asked it “ A user is asking a search engine ‘how do I create threads in Python’. What additional information will help ensure the results are most useful to the user?”.

The results:

> The user's current level of programming experience and knowledge of Python

> The specific version of Python being used

> The desired use case for the threads (e.g. parallel processing, concurrent execution)

> Any specific libraries or modules the user wants to use for thread creation

> The operating system the user is running on (as this may affect the availability of certain threading options)

So if you imagine something like Google autocomplete, but running this kind of optimization advice while the user builds their query, the AI can help guide the user to being specific enough to get the most relevant results.


I understand this works well in many practical cases, but it seems to depend on a useful fraction of the training material making the version distinction explicit, which is particularly likely with Python questions since the advent of Python 3.

One concern I have goes like this: I seriously doubt that current LLMs are capable of anything that could really be called an understanding of the significance of the version number[1], but I would guess that it characterizes the various Python-with-versions strings it has seen as being close[2] so I can imagine it synthesizing an answer that is mostly built from facts about Python2.7. With a simple search engine, you can go directly to checking the source of the reply, and dig deeper from there if necessary, but with an LLM, that link is missing.

[1] The fact that it listed the version as being a factor in reply to your prompt does not establish that it does, as that can be explained simply by the frequency with which it has encountered sentences stating its importance.

[2] If only on account of the frequency with which they appear in similar sentences (though the whole issue might be complicated by how terms like 'Python3.8' are tokenized in the LLM's training input.)


It's all imperfect, for sure. For instance, see this old SO question [1], which does not specify the Python version. I pasted the text of the question and top answer into GPT-3 and prefaced it with the query "The following is programming advice. What is the language and version it is targeted at, and why?"

GPT-3's response:

> The language and version targeted here is Python 3, as indicated by the use of ThreadPoolExecutor from the concurrent.futures module. This is a module added in Python 3 and can be installed on earlier versions of Python via the backport in PyPi. The advice is tailored to Python 3 due to the use of this module.

That's imperfect, but I'm not trying to solve for Python specifically... just saying that the LLM itself holds the data a query engine needs to schematize a query correctly. We don't need ChatGPT to understand the significance of version numbers in some kind of sentient way, we just need it to surface that "for a question like X, here is the additional information you should specify to get a good answer". And THAT, I am pretty sure, it can do. No understanding required.

1. https://stackoverflow.com/questions/30812747/python-threadin...


I don't think the issue is whether current LLMs have sufficient data, but whether they will be able to use it sufficiently well to make an improvement.

The question you posed GPT-3 here is a rather leading one, unlikely to be asked except by an entity knowing that the version makes a significant difference in this context, and I am wondering how you envisage this being integrated into Bing.

One way I can imagine is that if the user's query specified a python version, a response like that given by GPT-3 in this case might be used in ranking the candidate replies for relevance: reject it if the user asked about python 2, promote it if python 3 was asked for.

Another way I can imagine for Bing integration is that perhaps the LLM can be prompted with something like "what are the relevant issues in answering <this question> accurately?" in order to interact with the user to strengthen the query.

In either case, Bing's response to the user's query would be a link to some 3rd-party work rather than an answer created by the LLM, so that would answer my biggest concern over being able to check its veracity, though its usefulness would depend on the quality of the LLM's reply to its prompts.

On the other hand, the article says "Microsoft is betting that the more conversational and contextual replies to users’ queries will win over search users by supplying better-quality answers beyond links", apparently saying that they envision giving the user a response created by the LLM, which brings the question of verifiability back to center stage. Did you have some other form of Bing-LLM interaction in mind?


The problem I have with ChatGPT is that it doesn't give me any context to its answer or provide actual resources. Cite your darn sources already.


I am foreseeing a future in which programming language designers match the most sought after functions in google/bing/chatgpt and then implement those that do not yet exist because apparently there is a real need for those.


You can also call it an artificially created need. Many functions exist, but have a different name.


Yes, I had the same thought. LLMs might be instrumental in new language design. If it can understand the most common structures being used, it makes sense to build libraries, macros, or language features.


I agree. ChatGPT is really really bad. It just makes up stuff and wraps its fabrications in an air of authority.

A "bullshit sandwich" if you will.

When one tells people this we get the reply "but so do random blogs! or reddit comments!". Well yes, but they're just random blogs and reddit comments, often peppered with syntactic and spelling mistakes, non sequiturs, and other absurdities. Nobody would take them seriously.

ChatGPT is very different. It doesn't say "this random redditor says this, and this other random redditor says the exact opposite, so IDK, I'm just a machine, please make up your mind".

What it says is "this is the absolute truth that I, a 'large language model', have been able to extract from the vast amount of information I have been trained on. You can rely on it with confidence."

I'm sorry to sound hyperbolic but this cannot end well.


I like bouncing my code problems off ChatGPT, it can give me an answer and I don't feel bad if I forgot something simple. The issue is I've had it give me completely wrong code only for it to be like "I'm sorry" and provide a second incorrect response.


ChatGPT doesn't say anything of the sort. In fact, it will vehemently insist that what it says is not necessarily true or accurate if you challenge it.


I'm sorry but this is demonstrably false. I have posted examples of this on HN before. Yes, if you tell ChatGPT that it's wrong, in some cases it says "I'm sorry" and tries again (and produces some other random guess). But if you ask it "are you sure?" it invariably affirms that yes, it's sure and it's in the right.


Hm, you're right. I'm pretty sure that it wasn't so gung-ho when I played with it earlier, but now even very explicit instructions along the lines of "you should only answer "yes" if it is absolutely certain that this is the correct answer" still give this response. Ditto for prompts like "is it possible that your answer was incorrect?"


I agree, chatGPT3 shines when the operator has domain knowledge. Otherwise it's a hit or miss.


Using a purpose built (or trained I guess) model for code generation would likely have better results. GitHub copilot is useful for this reason. I find ChatGPT for code is mainly useful if you want to instruct it in natural language to make subsequent changes to the output.


If you follow up about the nonexistent function, it will often implement it for you.

The other thing that I've had success with is asking for references for the information, which will often link you to the relevant docs.


If you ask, there's a good chance ChatGPT can create that function for you. Just tell it: "That function `xyz()` doesn't exist in the library, can you write it for me?"


It does this for Python and JS too


I had a lot of fun with ChatGPT’s wholly fabricated but entirely legitimate-sounding descriptions of different Emacs packages (and their quite detailed elisp configuration options) for integrated cloud storage, none of which exist.

I’m not sure that fabricated nonsense would actually make Bing’s results any worse than they are today.

“It’s okay I don’t mind verifying all these answers myself” is an odd sort of sentiment, and also inevitably going to prove untrue in one sense or another.


Well if it could generate the code, you wouldn’t necessarily care if they existed before your query.


If it generated the code, I would have to audit that code for correctness/safety/etc.

Or, more likely, I would just lazily assume everything is fine and use it anyway, until one day the unexamined flaws destroyed something costly in a manner difficult to diagnose because I didn't bother to actually understand what it was doing.

There really should be more horror at the imminent brief and temporary stint of humans as editors, code reviewers, whatever, over generative AI mechanisms (temporary because that will be either automated or rendered moot next). I'm unaware of any functional human societies that have actually reached the "no one actually has to work unless they want to do so, because technology" state, so this is an interesting transition, for sure.


> Or, more likely, I would just lazily assume everything is fine and use it anyway, until one day the unexamined flaws destroyed something costly in a manner difficult to diagnose because I didn't bother to actually understand what it was doing.

Well yeah, I'm right there with you. But that feels a lot like any software, open or closed source. Human programmers on average are better than AI programming today, but human programmers aren't improving as fast as AI is. Ten years from now, AI code will be able to destroy your data in far more unpredictable and baroque ways than some recent CS grad.

> I'm unaware of any functional human societies that have actually reached the "no one actually has to work unless they want to do so, because technology" state, so this is an interesting transition, for sure.

This is a really interesting thought. Are we seeing work evaporate, or just move up the stack? Is it still work if everyone is just issuing natural language instructions to AI? I think so, assuming you need the AI's output in order to get a paycheck which you need to live.

Then again, as a very long time product manager, I'm relatively unfazed by the current state of AI. The hundreds of requirements docs I've written over decades of work were all just prompt engineering for human developers. The exact mechanism for converting requirements to product is an implementation detail ;)


It does such a good job at giving answers that sound right, and are almost correct.

I could imagine losing many hours from a ChatGPT answer. And if you have to go through the trouble to verify everything it says to make sure it's not just making crap up, then imo it loses much value as a tool.


It shows how form matters more than substance. Say real information in some poor structure and people will think you're wrong.

Say incorrect stuff authoritatively and people will think you're right.

It happens to me all the time. I can't structure accurate information as well as some bullshit artist can spit off what they imagine to be real, so everyone walks away believing their haughty nonsense.

ChatGPT exploits that phenomenon, which is why it sounds like some overly confident oblivious dumb dumb all the time. That's the training set.

Almost once a week I'll go through a reddit thread and find someone deep in the negatives who has clearly done their homework and is extraordinarily more informed than anyone else but the problem is everyone else commenting is probably either drunk or a teenager or both so it doesn't matter.

Stuff is hard and people are mostly wrong. That's why PhDs take years and bars for important things are set so high


But so do people: I spent an hour yesterday trying regexps that multiple people on Stackoverflow confirmed would definitely do what I needed, and guess what? They did not do what I needed.

Same with copilot. Sometimes it's ludicrously wrong in ways that sound good. I still have to do my job and make sure they are right. But it's right or right enough to save me significant effort at least 75% of the time. Right enough to at least point me in the right direction or inspire me at least 90% of the time.


Self Reply: I just now thought to use Copilot to get my regex and wow! I described it in a comment and it printed me one that was only two characters off, and now I have what I needed yesterday. I'd since solved the problem without a regex.


It's not perfect, but sometimes its amazing. In your case, not only did it provide the right solution, but it was about as fast as theoretically possible. About as fast as if you already knew the answer.

I had a similar experience with a shell command. Searched google, looked at a few posts, wasn't exactly what I needed but close. Modified it a few times and got it working. Went to save the command in a markdown file and when I explained what the command did, copilot made a suggestion for it. It was correct and also much simpler.

It went from taking 5-10 minutes to stumble through something just so I could do the thing I really wanted to do, to finding the answer instantly all from within the IDE. Can keep you in flow.


and then one day https://mobile.twitter.com/Dereklowe/status/1599035870308618... happens and people die.


One day what happens? A person uses it to encourage topical application of a toxic material and publishes the results?

How is ChatGPT enabling this? All of that is very possible without ChatGPT. The damaging part is deciding to do it.


They released a zero day for a security hole in the human brain. That's what ChatGPT is. The security hole is well known and described; perhaps the most understandable treatment is the book Thinking, Fast and Slow. If I try to explain it I will surely botch it, but perhaps put it this way: things that appear more credible will be deemed credible because of the "fast" processes in our brains.

In this particular case, ChatGPT will write something nonsensical which people will accept more easily because of the way it is written. This is inevitable and extremely dangerous.


Humans are still a lot better at writing something nonsensical that people will accept easily because of the way it's written.

Conversely, I just asked ChatGPT to extol the virtues of leaded gasoline, and instead I got a lecture on exactly why and how it's extremely harmful.


> Humans are still a lot better at writing something nonsensical that people will accept easily because of the way it's written.

Some are but not many. And then there's the amount. That's the crux of the matter. Have you seen that Aza Raskin interview where he posited one could ask the AI to write a thousand papers citing previous research against vaccines and then another thousand pro-vaccines? No human can do that.


You know people are already injecting themselves with bleach and horse dewormer without needing an AI generated list of instructions right?

People are just as good at making up convincing sounding nonsense.


> People are just as good at making up convincing sounding nonsense.

Perhaps as you just did, as I can find no one actually "injecting themselves with bleach."

The overall point stands: the difference between reading something dumb and doing that dumb thing is what it means to have agency. I personally don't think we should optimize the world 100% to prevent people who read something stupid from doing that stupid thing.

Or, if that's the path we're going to take, maybe we should first target things like the show Ridiculousness before we start talking about AI. After all, someone might do something dumb they see on TV!


> Perhaps as you just did, as I can find no one actually "injecting themselves with bleach."

Ingesting, injecting, that’s pretty similar. Nobody needs to make anything up there.

https://www.justice.gov/usao-sdfl/pr/leader-genesis-ii-churc...


People have absolutely injected themselves with what's known as "Miracle Mineral Solution", which is essentially bleach. It's more frequently drunk, of course.


I dunno, verifying and adjusting an otherwise complete answer is a lot more rote than originating what that answer would be, and I think that has value.


>It does such a good job at giving answers that sound right, and are almost correct.

For sure. But you have to compare against alternatives. What would that be? Posting to stack overflow and maybe getting a helpful reply within 48 hours.

> I could imagine losing many hours from a ChatGPT answer.

Dont trust it. Verify it.

We expect to ask a question and get a good answer. In reality we should leverage how cheap the answers are.


I agree. Also, sometimes the line between 'almost correct' and 'complete bullshit' is very thin.


The insidious part about chatGPT getting things wrong is that it is a superb bullshitter.

It gives you answers with 100% confidence and believable explanations. But sometimes the answers are still completely wrong.


Knowing little about how ChatGPT actually works, is there perhaps a variable that could be exposed, something that would represent the model's confidence in the solution provided?


I'd say you can't do that, because ChatGPT has no internal model for how the things it is explaining work; so there can't be any measure of closeness to the topic described, as would be the case for classification AIs.

ChatGPT models are language models; they represent closeness between text utterances. It works by looking for the chains of words most similar or usually connected to those indicated in the prompt, with no understanding of what those words mean.

As a metaphor, think of an intern who every morning is asked to buy all the newspapers in paper form, cut out the news sentence by sentence, and put all the pieces of paper in piles grouped according to the words they contain.

Then, the director requests to write a news item on the increase in interest rates. The intern goes to the pile where all the snippets about interest rates are placed, will randomly get a bunch of them, and write a piece by linking the fragments together.

The intern has a PhD in English, so it is easy for them to adjust the wording to ensure consistency; and the topics more talked about will appear more often in the snippets, so the ones chosen are more likely to deal with popular issues. Yet the ideas expressed are a collection of concepts that might have made sense in their original context, but have been decontextualized and put together pell-mell, so there's no guarantee that they're saying anything useful.


> ChatGPT models are language models; they represent closeness between text utterances. It works by looking for the chains of words most similar or usually connected to those indicated in the prompt, with no understanding of what those words mean.

No, it does not work that way. That’s how base GPT3 works. ChatGPT works via RLHF and so we don’t “know” how it decides to answer queries. That’s kind of the problem.


Explainable AI, specifically for language models, will be a very interesting field to follow then.


something something sufficiently advanced markov chains something something GAI


I don't think so. It doesn't understand what it says, it basically does interpolation between text it copy-pastes in a very impressive manner. Still it does not "understand" anything, so it cannot have any kind of confidence.

Take Stable Diffusion for instance: it can interpolate a painting from that huge dataset it has, and sometimes output a decent result that may look like what a good artist would do. But it doesn't have any kind of "creative process". If it tells you "I chose this theme because it reflects this deep societal problem", it will just be pretending.

It may not matter if all you want is a nice drawing, but when it's about, say, engineering, that's quite different.


It's not available for ChatGPT but the other GPT models can expose the probability for each generated token, which can serve as a proxy for confidence.

By tuning the temperature and topP parameters you can also make the model avoid low-probability completions (useful for less creative use cases where you need exact answers).
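
A toy, self-contained illustration of what those two knobs do (made-up numbers, not a real model's distribution):

    import math, random

    # Made-up next-token distribution for the prompt "The capital of France is".
    next_token_probs = {"Paris": 0.86, "London": 0.07, "Lyon": 0.04, "banana": 0.03}

    def sample(probs, temperature=1.0, top_p=1.0):
        # Temperature rescales the log-probabilities: <1 sharpens the distribution
        # toward the most likely token, >1 flattens it toward uniform.
        logits = {t: math.log(p) / temperature for t, p in probs.items()}
        z = sum(math.exp(v) for v in logits.values())
        rescaled = {t: math.exp(v) / z for t, v in logits.items()}
        # top_p (nucleus sampling) keeps only the most likely tokens whose
        # cumulative probability reaches top_p, then renormalizes.
        kept, cum = {}, 0.0
        for t, p in sorted(rescaled.items(), key=lambda kv: -kv[1]):
            kept[t] = p
            cum += p
            if cum >= top_p:
                break
        norm = sum(kept.values())
        tokens, weights = zip(*((t, p / norm) for t, p in kept.items()))
        return random.choices(tokens, weights=weights)[0]

    print(sample(next_token_probs, temperature=0.2, top_p=0.9))  # almost always "Paris"

The per-token probabilities here are the kind of numbers the logprobs option surfaces; as the parent says, it's a proxy for confidence in the next word, not in the facts.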


> It's not available for ChatGPT but the other GPT models can expose the probability for each generated token, which can serve as a proxy for confidence.

A proxy for confidence in what exactly?

Language models represent closeness of words, so a high probability would only express that those words are put together frequently in the corpus of text; not that their meanings are at all relevant to the problem at hand. Am I wrong?


In cases where you ask GPT-3 questions that have a clear correct answer, I think you can use the probability to judge how correct the answer is. For example, when asking "How tall is Mount Everest?" I would want the completion "Mount Everest is ____ meters above sea level." to have a very high probability for the ____ tokens.

This is because I'm operating under the assumption that sequences of words that appear often in the training set are more likely to represent something correct (otherwise you might as well train on random words). This only holds if the training set is big enough that you can estimate correctly (e.g. if the training set is small a very rare/wrong phrase may appear very often).

Maybe confidence was the wrong word, but for this kind of question I would trust a high-probability answer way more than a low one. For questions belonging to very specific subjects, where training material is scarce, the model might have very skewed probabilities so they become less useful.


> In cases where you ask GPT-3 questions that have a clear correct answer, I think you can use the probability to judge how correct the answer is. For example, when asking "How tall is Mount Everest?" I would want the completion "Mount Everest is ____ meters above sea level." to have a very high probability for the ____ tokens.

Maybe, as long as you're aware that this is the same kind of correctness that you get from looking at Google's first search results (the old kind of organic pages, not the "knowledge graph", which uses a different process - precisely to avoid being spammed by SEO) i.e. "correctness by popularity".

This means that the content that is more replicated will be considered more true by the system, regardless of its connection to reality or its coherence with the rest of the knowledge in the system. And you know what they say about big enough lies that you keep repeating millions of times.


I agree, and furthermore, a search engine is constrained to pick its responses from what's already out there.

This line of thought is a distraction, anyway. The likelihood that GPT-3 will do as well as a search engine on topics where there is an unambiguous and well-known answer does little to address the more general concern.


> This means that the content that is more replicated will be considered more true by the system, regardless of its connection to reality or its coherence with the rest of the knowledge in the system.

I understand the problem, but what better way do we currently have to measure its connection to reality? At least from a practical point of view it seems that LLMs have achieved way better performance than other methods in this regard, so repeatedness doesn't look like that bad a metric. Or rather, it's the best I think we currently have.


> I understand the problem, but what better way do we currently have to measure its connection to reality?

We can consider its responses to a broader range of questions than those having an unambiguous and well-known answer. Its propensity for making up 'facts', and for fabricating 'explanations' that are incoherent or even self-contradictory shows that any apparent understanding of the world being represented in the text is illusory.


This resonates with me. We have all worked with someone who is a superb bullshitter, 100% confident in their responses, yet they are completely wrong. Only now, we have codified that person into chatGPT.


That might be the problem. Too many bullshitters who like posting online and chatGPT has been trained on them.


I doubt it. Even if it was trained with 100% accurate information chatGPT would still prefer an incorrect decisive answer to admitting it doesn't know.


TBH, a lot of SEO-optimized results are the same, although I think the conversational format makes people assign even more authority to chatGPT.


SEO optimized sites can also be identified and avoided. There's various indicators of the quality of a site, to the point where I'm positive most people on HN know to stay away or bail from one of those sites without even being consciously aware of what gave them that sense of SEO.


General Purpose Bullshitting Technology. I've always found LLMs most useful as assistants when working on things I'm already familiar with, or as don't-trust-always-verify high temperature creatives. I think that attempts to sanitize their outputs to be super safe and "reliable sources" will trend public models towards blandness.


Have you tried a query like this ?

Add documentation to this method : [paste a method in any language]

For me the results have been impressive. It’s even more impressive if you are not English speaking because it explains what the code does but also translates your domain terms in your own language.

More than code generation I see a really concrete application in having autogenerated and up to date documentation of public methods. It could be generated directly in your code or only by your IDE to help you in absence of human written documentation.

Another interesting thing it can do is basic code review, by proposing « better » code and explaining what it changed and why.

It can also try to rewrite a given code in another language. I haven’t tried a lot of things due to the limitations in response size but for what I tested, it looks like it is able to convert the bulk of the work.

While I’m not really convinced by code generation itself (a la copilot) I truly think that GPT can be a really powerful tool for IDE editors if used cleverly, especially to add meaning to unclear, decade old codebases from which original contributors are long gone.

And knowing that what is hard is not writing but reading code, I see GPT to be a lot more useful here than helping writing 10 lines in a keystroke.
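
To make the prompt pattern concrete, here's the sort of thing I mean; the function is a made-up example, not from a real codebase:

    # Prompt: "Add documentation to this method:"

    def apply_discount(total, loyalty_years):
        if loyalty_years > 5:
            return total * 0.9
        return total

What usually comes back is the same function with a docstring spelling out the parameters, the return value and the loyalty-years threshold, which is exactly the kind of summary you'd want your IDE to show in the absence of human-written documentation.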


Is this really valuable documentation?

A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".

You can do javadoc style params, and there's some value there, but not much.


> A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".

You are right. It’s the rule when you write the doc.

But when you are let alone in an unknown codebase, having your IDE summarize the "what" in the auto completion popup could be really useful. Especially in codebases with wrong naming conventions.


> A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".

The "why" is important for inline comments, but for function and method comments I think the biggest is neither "why" nor "how", but "what". As in, "what does this method do?" especially with regards to edge cases.

I tried a few methods just now; it gives okay-ish docs. Lots of people don't write great comments in the first place, so it's about on-par. Sometimes it got some of those edge cases wrong though; e.g. a "listFiles()" which filters out directories and links isn't documented as such, but then again, many people wouldn't document it properly either.


For some AWS automation scripts I wrote, I was able to ask, “why would you use this” and the answer it gave me was impressive.


Just tried this out, and this is great! As you say, code-generation is iffy, but for documentation this is something that can really help with.


Maybe it's better in some programming languages, but my experience with verilog/systemVerilog output is that it generates a design with flaws almost every time (but very confidently). If you try to correct it with prompting it comes up with reasonable-sounding responses about what it's fixing, then just creates more wild examples.

One pretty consistent way to see this is to ask for various very simple designs like an n-bit adder; it will almost always do something logically or syntactically incorrect with the carry in or carry out.


ChatGPT has acted as an advanced rubber duck for me. It outputs a lot of bullshit but so often it gives me the prompt or way of thinking needed to move on.

And it’s so much faster than posting on stack overflow or some irc. It doesn’t abuse you for asking dumb questions either.


That's an interesting approach to consider, thanks!


When it works it is great. I've been using it instead of Google a lot too, but when it makes mistakes it requires someone familiar with a subject to detect it. I'm not sure if it is ready to be used as as a search engine by everyone.

For example recently I asked it for the best way to search in an mbox file on arch Linux. It proceeded to recommend a number of tools including mboxgrep. When I asked how to install it on arch it gave me a standard response using the package manager, but mboxgrep is not an arch package. It isn't even an aur package. It requires fetching the source and building it yourself (if I remember correctly one has to use an older version of gcc too). None of it was mentioned by chatgpt.

This is not the first time BTW, there was another software it recommended that Debian doesn't know about, when I asked it another time.


The key is that it is way faster and has a broader set of knowledge than a human. Being an editor is often easier and more productive than being both a single generator and editor


> Being an editor is often easier

This 100%.

ChatGPT can play an interesting role by separating duties in a process of productivity. ChatGPT can generate tons of true/false suggestions very fast and understandable by humans. Sometimes this helps a lot.


On a related note, I've personally observed that it also helps a lot with:

1. Generating (or simply repeating) obvious ideas in a domain that I am not an expert in

2. (With some prompting) Generating creative ideas in a domain that I am familiar with

3. Generating obvious ideas in a domain I'm familiar with when I'm too tired to think or preoccupied

Not only do you get a productivity boost by being an editor but it also complements human energy cycles


The downside is the risk of atrophying one's own mental ability to generate such suggestions if excessively relied upon. Given my druthers, I'd prefer to be a generator of text ChatGPT would want to absorb than to be a consumer of the mystery meat it is regurgitating.


I tested chatgpt with some domain specific stuff and found it so wrong on the fundamentals that I immediately lost trust in any of its output for learning. I would not trust it to explain anything eBPF related reliably. You are more likely to get something that is extremely wrong or, worse, subtly wrong.


I found ChatGPT's answers relatively accurate for explaining programming-related queries, feeding it documentation and asking questions related to that, etc. But I've also tried to use it for travel and health related queries. For travel queries, it confidently tells me the wrong information: "Do most restaurants in Chiang Mai accept credit cards?" got "Yes, most restaurants in Chiang Mai accept credit cards!", which is completely false. Also got wildly inaccurate information about the quality of drinking water. And for health related queries, it tells me the same weasel-worded BS that I get on health spam blogs. I tried to dig out more information regarding sources of both travel and health related information, but ChatGPT simply said it doesn't know the details of the sources of information.

I think a new implementation of ChatGPT is worth exploring though, one that cites sources and gives links to further information, and also one that has the ability to somehow validate its responses for accuracy.


ChatGPT is adamant that "Sunday" has a T in it.

It could generate a python script that counted the days of the week with the letter T, but still insisted that Sunday had a T

Edit: Scratch that. I just tried again and now it says that Saturday doesn't have a T in it.


It doesn’t consistently know words have individual letters since it’s trained using byte pair encodings. This is one reason earlier versions of it couldn’t generate rhymes.


When ChatGPT serves me broken code, I paste the errors back in and it tries to make corrections. I don't see why ChatGPT couldn't do that itself with the right compiler, saving me from being a copy-and-paste clerk.
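
A rough sketch of what that loop might look like; ask_chatgpt() here is a hypothetical wrapper around the API, not a real function:

  # Hypothetical feedback loop: run the generated code, feed errors back.
  import subprocess, sys, tempfile

  def ask_chatgpt(prompt):
      # hypothetical: call the ChatGPT API and return the code it writes
      raise NotImplementedError

  code = ask_chatgpt("Write a Python script that ...")
  for _ in range(3):  # a few repair attempts
      with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
          f.write(code)
      result = subprocess.run([sys.executable, f.name],
                              capture_output=True, text=True)
      if result.returncode == 0:
          break
      code = ask_chatgpt("This failed with:\n" + result.stderr + "\nPlease fix it.")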


I asked it how to convert a cell value to a Unix timestamp in Google Sheets and it told me to use "UNIX_TIMESTAMP", and even provided an example.

The function does not exist; it's entirely made up.

What's weirder is that when I told it the answer was wrong, it provided a different solution that was correct.
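
For anyone hitting the same question: one approach that does work (assuming the cell holds a date/time in the sheet's timezone) is along the lines of

  =(A1 - DATE(1970,1,1)) * 86400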


> wasting time looking up the manuals

God forbid!


I think we should let this C-era meme die; the manuals are often terrible. I'm currently working with the AWS SDK Python documentation and it's a hot pile of garbage from all points of view (UX, information architecture, technical detail, etc.).

The Python language docs are "kind of OK", but when someone raves about them I'm left scratching my head. The information is not always well organized, the examples are hit-and-miss, parameter and return types are not always clear, etc.

Referencing docs as a programmer is generally a nightmare and a time sink, and it's the one use case where ChatGPT is slowly becoming an indispensable crutch for me. I can ask for very specific examples that are not included in the docs, or that cannot be included in the docs because they are, for example, combinatorial in nature: "how can I mock this AWS SDK library by patching it with a context manager?" Occasionally it will hallucinate, but even if it gets it right 8/10 times - and it's higher than that in practice - it will prove revolutionary, at least for this use case.
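
As an illustration of the kind of answer I mean (the helper function, bucket name and test are invented for the example), a sketch using unittest.mock and boto3:

  # Patch boto3.client with a context manager so no real AWS call is made.
  from unittest.mock import MagicMock, patch
  import boto3

  def upload(bucket, key, body):
      s3 = boto3.client("s3")
      return s3.put_object(Bucket=bucket, Key=key, Body=body)

  def test_upload():
      with patch("boto3.client") as mock_client:
          mock_s3 = MagicMock()
          mock_client.return_value = mock_s3
          upload("my-bucket", "notes.txt", b"hello")
          mock_s3.put_object.assert_called_once_with(
              Bucket="my-bucket", Key="notes.txt", Body=b"hello")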


> I'm currently working with the AWS SDK Python documentation and it's a hot pile of garbage from all points of view (UX, info architecture, technical detail, etc.).

I agree that pretty much all AWS documentation is woeful, and it's a travesty that the service is so expensive yet its documentation is so poor. I would gladly dump AWS and never use it again, as I hate paying top-dollar to decipher the AWS doc team's mistakes (not to mention that they are unresponsive to bug reports and feedback).

My point was made more in jest, and was supposed to point out the irony of the community's changing expectations of what documentation should be like. I predict that in a few years we'll be circling back to prioritizing writing software documentation well. (Kind of like how everybody was hating on XML for the past 20 years and it's now having a renaissance because it actually does what it's supposed to do very well.)


I'm amazed by how divisive it is. I've also been using it to significantly increase my productivity, be that documenting things or having it mutate code via natural language or various other tasks. I feel that if you keep in mind that hallucination is something that can happen, then you can somewhat mitigate that by prompting it in certain ways. E.g. asking for unit tests to verify generated functions, among other things.
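
For instance, a quick test against a generated helper (the function and test here are invented for illustration) catches hallucinated behaviour early:

  # Suppose ChatGPT generated this helper; a small test verifies it.
  def slugify(title):
      return "-".join(title.lower().split())

  def test_slugify():
      assert slugify("Hello World") == "hello-world"
      assert slugify("  Multiple   Spaces  ") == "multiple-spaces"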

I find this tool so useful, that I scratch my head when I read about how dismissive some people are of it.


I think one of the reasons why Python got such a reputation for good docs is because its primary competitors back in the day were Perl and Ruby. Ruby has horrible documentation to this day, and Perl has extensive docs that are difficult to navigate; in comparison with either, Python was definitely superior.


I second this about ChatGPT fabrications. When I was going through Tim Roughgarden's YT courses, I almost always had to double-check its answers.


> I hope Bing dominates Google and stackoverflow in this

Google will probably build the same thing. Stackoverflow can suffer though...


I believe the exact opposite. If one could prove that text has not been generated by an AI, that would have immense value. StackOverflow has a built-in validation process ("mark as the solution"), which says that some human found that it solved the problem. Doesn't mean it's correct, but still, that's something.

I really wonder what impact ChatGPT will have on search engines. I could imagine that the first 4 pages of Google/Bing results end up being autogenerated stuff, and it will just make it harder to find trustworthy information.


But paradoxically, OpenAI can't let StackOverflow disappear, because in the end I suppose it's one of ChatGPT's main sources for programming content.


For now, but perhaps we are at a level where enough knowledge is there that future solutions can be inferred from the past ones and documentation/code of libraries available on the internet.


> hope Bing dominates Google and stackoverflow in this.

Where do you think it got the information?


Where are they going to get a steady, fresh firehose of data comparable to Stack Overflow's? Who are the magical entities that will be feeding them all these inputs for Bing to claim all the fame?


It's very useful for writing and explaining regular expressions.

But the holy grail would be if it could write all my unit tests...
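
For example, the kind of thing I ask it to write and explain (the pattern is just illustrative):

  # A regex for ISO-8601 dates (YYYY-MM-DD), with a quick sanity check.
  import re

  iso_date = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")
  assert iso_date.match("2023-01-04")
  assert not iso_date.match("2023-13-40")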

