Microsoft is preparing to add ChatGPT to Bing (bloomberg.com)
1092 points by mfiguiere on Jan 4, 2023 | 897 comments




Nobody seems to be bringing up the questions that come with a big company, one that will actually have to follow the "rules", moving on this:

- Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

- fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

I do love these imaginary scenarios where ChatGPT is going to find me the best air fryer, though. Where is that information going to come from, exactly? Barely anyone is making money writing reviews today; most sites are farmed content. What happens when even the OK sites' reviews are quickly scraped and put into the next model iteration? Bing is going to have to come up with some kind of radical revenue sharing too if they want anything written after 2023.


If language models take over text content, content creators will flee even quicker into creating video content. There's already a trend where younger people tend to prefer video for being more "genuine", and now it might become a sign of "human made" for a couple years. Also easier to monetize, and easier to build parasocial relationships, so all around a plus for creators. Too bad I prefer text.


I think the push to video and away from text is a net failure for accessibility and usability, at least for reference use cases.

My example: as a woodworker, I'm often curious about the details of a particular joint or the usage of a particular tool. The great amount of content on YouTube is helpful, but it's incredibly inefficient to have to seek through a bunch of filler or unrelated content to get the answer I need.

Of course, that's "increased engagement" so I'm not surprised it's what is commercially more viable.


That sounds remarkably similar to how recipes are shared in blogs. There's a huge amount of story, and then at the tail end there's the recipe. It's all for engagement, but I'm never engaged. If I'm looking for a recipe, I want to know the recipe so I can make it. I don't care about what the blogger did last weekend or in college.


> There's a huge amount of story, and then at the tail end there's the recipe. It's all for engagement, but I'm never engaged.

It's not about engagement, it's about copyright.

Recipes - in the form of lists of ingredients and the method - are not typically protected.

However, add a huge rambling story about how Grandma handed this recipe down to you when you were five and on holiday with her in $place, hey presto, it's protected.


It's not for engagement. Some sites now have a "Jump to recipe" button. It's for Google, which said that if you write normal text they will send you a ton of traffic. What people figured out is that unless you spam the recipe with keywords repeated at least 20 times, the Google bot will not understand what the text is about. Maybe Google was forced to do this, but that's how it works, and it contradicts how they said it works.


I read that the recipes are actually bullshit. Written by content farms eating instant noodles, not anyone remotely involved with a kitchen.


Google* how long to pressure cook white or brown rice and you’ll see widely differing answers. Like shots all over a dartboard. They can’t all be correct — it’s just rice.

I wonder if many of them care more about CPM rates and page visits than actual recipe accuracy.

  *or Bing, DDG, Kagi, etc if you prefer although I haven’t tried.


I would somewhat disagree with that. My household eats rice on a daily basis and the timings for different kinds of rice varies wildly. Basmati, Sona masuri, jasmine, risotto, jeera samba rice have very different water and rice measures. And that's just white rice! Other rice variations are a whole different ball game.


I strongly recommend the books Cooking for Geeks and The Food Lab. In both books, the authors explore a variety of different approaches and show their math.


A second-order effect of this preference for video is how poorly video content gets indexed.

With text, searching for obscure things is cumbersome but possible. With video it's impossible.

Meaning I, as a user, cannot take the shortest path to my target content, simply because of the medium.

I now default to looking for really old books on my topic of interest, or authoritative sources like textbooks and official documentation, and then skim and weed through them to get to a broader understanding. Very often this has led me on to better questions on that topic.

Online I prefer to look at search results from focussed communities: Reddit, HN, StackOverflow, car forums, etc. I just never go to video for anything beyond recipes, quick fixes to broken appliances, and kids' videos.


(Old post, but you made a good point)

I finally realized what actually bothers me about shopping physically vs online these days is (a) the lack of "sort by price, ascending" & (b) the lack of ability to get a reference or "fair" price for similar items.

Similarly, with video the key missing feature is deep search.

It's mind-bogglingly sad YouTube didn't focus more on improving this after being acquired: they have all the components to build a solution! And it's a natural outgrowth of Google's dead-tree book digitization efforts!

I assume it was harder than just relying on contextual signals (links and comment text) to classify for ad targeting purposes. That's probably also why they incentivized ~10 min videos over longer/shorter ones.

Which is sufficient for advertisers, but utterly useless for viewers.

It makes me cry that we're missing a future where I could actually get deep links to the portion of all videos that reference potatoes (or whatever).


That actually seems like a great use case for AI; identify all videos about (topic), differentiate between high and low quality ones (as preferred by you or people similar to you), abstract the information into conceptual videos or schematic diagrams as you prefer.


May I suggest a simpler and smaller scope? An AI converting speech to text, extracting a bunch of still frames (or short video rolls) as illustrations (where relevant), and turning it into a good ol' readable article?

Then it can be fed to the search engines and those would do the rest of the job just fine.
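
A minimal sketch of that pipeline, assuming openai-whisper for the transcription and ffmpeg for the frame grabs (both assumptions, and the file names are made up):

  import subprocess
  import whisper  # openai-whisper; assumed here, any speech-to-text model would do

  video = "woodworking_tutorial.mp4"  # hypothetical input file

  # Transcribe the audio track (whisper decodes it via ffmpeg internally).
  model = whisper.load_model("base")
  transcript = model.transcribe(video)["text"]

  # Pull one still frame per minute to use as illustrations.
  subprocess.run(["ffmpeg", "-i", video, "-vf", "fps=1/60", "frame_%03d.jpg"], check=True)

  # Emit a crude, indexable "article" page.
  with open("article.html", "w", encoding="utf-8") as f:
      f.write(f"<article><p>{transcript}</p></article>")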


I think that will just multiply clickbait and those making the most substantive contributions will be ripped off by SEO/content farmers.


> That actually seems like a great use case for AI; identify all videos about (topic), differentiate between high and low quality ones (as preferred by you or people similar to you), abstract the information into conceptual videos or schematic diagrams as you prefer.

Q: Why would your $videoPlatformOfChoice allow a commercial AI bot to scrape boatloads of videos, abstract the information, then serve that information separately somewhere else .. possibly while serving their own ads(!)?


Scraping is legal, plus how will they even know?


Once AI can do all that with video, then we’re at about the point where automated video spam is too high also.


SponsorBlock is the response. It's a crowdsourced extension that labels parts of the video, like sponsor segments, highlights, intro/outro, etc. Very useful, you can skip through useless segments.


I prefer text too but I feel like that's mostly because the videos are not information dense on purpose. They expand to whatever the youtube algorithm prefers at the time, which is about 10 minutes now. Ironically, tiktoks are more information dense but the search is completely useless.


I’m finding more and more that the information density isn’t there because the video content is actually just an infomercial for a “course”.


I think we're very close to the point that even video won't be confirmable to be genuine. If it could even really be said to be so now. (Instagram/TikTok are the most performative/contrived content platforms these days)


Nope, there are already several services transcribing the audio content of video so expect that to be ingested too. You’ve seen the video suggestions with timestamps in google search right?


Oh, I'm aware of how well video transcription works. Once the lower-hanging fruit are dealt with, video content will absolutely flow into language models. But still, the video component is a key differentiator that AI can't easily mimic right now (at least not to a level where we can't tell). So users who want a personal opinion instead of GPT-generated text are likely to turn to consuming videos.


So regressing to a fully oral culture... Odd times


The digital world is the native environment for the AI race we're creating. In that world us biological humans are relatively slow and inferior. And if this "handing the intelligence baton to machines" trend continues then "regression" to our more native communication forms feels natural and inevitable.


That's some interesting insight. Thank you. When I read your comment, I was envisioning us all sitting around fires in caves with animal skin togas talking about the latest HN post (which presumably was Carl scribbling down something on the rock wall).


But one that can be catalogued and relayed by robots.


Good, the less I have to see of their clickbait and the more time my competitors waste watching videos the better. Video has its uses and when it's good it's very very good, but most of the time it's terrible dreck that steals people's time using cheap emotional manipulation.

I've been thinking about training an ML model to detect those 'Pick Me!' poster frames that highlight the e-celeb presenter making some kind of dramatic reaction face and just filter them out of search results. This is partly what happens when SEO types combine with black box algorithms; the lowest common denominator content starts to swamp everything else, a kind of weaponized reversion to the mean.


There are already custom AI avatars and text-to-speech; there are already people using GPT to create text and then using other services to create the audio and dynamic videos at scale.


Exactly. Several of the highly ranked YouTube videos that were recommended to me recently were clearly made by some AI doing a mashup of imagery with some text spoken by some text-to-speech algorithm.


Could it somehow get access to the subtitles and then use them to answer queries?

Also, I hope this comes to Ecosia; I'd like to experiment and try it at least.


> " could it somehow get access to the subtitles and then use them to answer queries?"

It's not even necessary - computers are already excellent at understanding spoken words. Have you tried automatic captioning recently? Half the inputs to my phone are already voice, not text.

Video is a harder problem, but it's not too far behind.


Exactly, and many bots exist today to mine user videos for the automated subtitle information. In other words, there's no escaping GPT from learning from any kind of medium.


These questions are constant. I do think you bring up relevant issues, but they aren't quite showstoppers.

Websites allow SE crawlers because (a) whatever traffic they get is better than no traffic, (b) allowing crawlers is the default and doesn't cost anything, and (c) Google/Bing don't negotiate. They are one, sites are many.

This has already played out in news. News outlets wanted Google to pay for content. Google (initially) responded by allowing them to opt out of Google. Over the years, they have negotiated a little bit. Courts, in some places, forced Google to negotiate... It's news and politicians care about news specifically. Overall though, there have not been meaningful moments where people got pissed off with Google and blocked crawlers. Not newspapers and not anyone else. Site owners being mad doesn't affect google or Bing.

What does matter to search engines is walled gardens. Facebook pioneered this, and this does matter to Google. There is, in a lot of cases, a lot less content to index and serve users. All those old forums, for example.

These are search problems, and GPT-based search will inherit them. ChatGPT will have the same problem recommending the best air fryer as normal search does. GPT is a different way of presenting information... it's not presenting different information.

RE: Lawsuits. Again, history. Youtube, for example, started off with rampant copyright infringement. But, legal systems were primitive. Lawyers and legislatures didn't know what to do. Claimants were extremely dispersed, and would have had to pioneer case law. Ultimately, copyright took >10 years to really apply online and by that point youtube and other social media was entrenched.

The law lags. In practice, early movers are free to operate lawlessly, and they get to shut the door after them. Now that Google is firmly entrenched, copyright law serves as one of their trenches.


Incidentally, law seems like an incredibly powerful potential application for ChatGPT.


This is an extremely important point. Something like ChatGPT without attribution can completely kill the open web. Every company will keep their information in a closed walled garden if no traffic is flowing to them. I don't see a scenario where something like StackOverflow can exist if no one goes to the site.


I think StackOverflow will exist and do well. First, it is a source of information for ChatGPT itself, so if there were no new content then the AI would implode too. Second, very often I skip the top answer because it has some edge cases or is simply outdated. The answer comments often highlight such issues. I don't think ChatGPT could be trusted without verification, not in serious programming work.


I see StackOverflow as one of the problems here.

StackOverflow went a long way toward killing the tech blog, and the number of "right" but poor answers on Stack sites is at an all-time high.

Often the "best" answer on those sites is buried or even downvoted in favor of an answer that "just works" but may have security issues, maintainability issues, is outdated, etc.

In a lot of areas I find Stack answers to be of low quality if you happen to have any in-depth knowledge of that area.


Indeed.

They should be renamed to ShitOverflow, because that's how bad the quality is a lot of the time.


On the first point, that is no guarantee that users will stay on the site. The AI is currently only using data from 2021 and earlier as far as I'm aware, and does so without feeling out of date. Before we see any significant signs of the AI imploding due to lack of new information, SO might well be long gone


What this is going to allow is a way to flatten org-mode, which will massively expand the number of people willing to use it. Put anything you wish into your own data collection, and you can instantly pull it up with a prompt. That service would then allow anonymized queries of other people's data.

If we don't get AGI, the LLMs that are starting now and don't have fresh data from people's queries won't be able to get going. The internet will quickly become stale. This will be sped up by the spam that LLMs will be used to create.

Walking through this scenario, I don't see any way for this not to end in a network-effect monopoly where one or two services win.


Maybe we can return to people sharing information/websites purely for the passion of sharing what they love, rather than the greed fueled mess we have today.


Oh gosh, maybe we'll actually have to pay for things, and we'll find that the market for the fifth random blog trying to make money off of free information using ads doesn't really exist. What a terrible world this will obviously be.

No. The weird thing is this idea that because you put ads on your site, you deserve money. Your ads are making the Internet worse. You probably don't realize this, because you most-likely use an ad blocker, which means you want people too dumb to use ad blockers to subsidize the web that you can use for free, but the current web is working well for approximately no one.

Would I pay $5 a month for StackOverflow if it didn't show up for everything I Google? Most likely. Would this be a better world? Almost certainly. We tried the thing with ads. It sucks. I welcome our new AI search overlords.


Why would you want power centralized? Big corporations are never your friend.


Power is also centralized when most supposedly independent actors buy ads from the same large advertisers, and utterly depend on their income from those ads to do whatever they're doing.


Websites will optimise for AI eyes rather than human eyes. Advertisers will pay to embed information in websites that is read by AI, which subtly makes the advertisers' products more valuable in the eyes of the AI. Then the AI would ultimately spit out information to users that is biased towards the advertisers' products.


That sounds like an incredibly difficult sell to the advertisers.


It isn't. I don't know about the anglosphere, but in the Hispanic world this is already being done, and has been for years. There are platforms where you buy articles from websites (even some newspapers), and you can even split the cost of an article among a number of advertisers.

Of course the impact of this has been immense, and the Spanish-language internet is filled with the same crap as the anglo internet, with trustworthy sites buried under tons of noise.

I had to map a bunch of communities in Spanish and post it on my blog because they don't appear in the search results anymore. Just to remind myself that they're out there.

I'm planning to do the same with blogs.

I guess we're going to rediscover directories and the problems associated with them, but currently the 'open internet' is a mess.

ChatGPT tools will just change how money flows and the incentives. Lots of spammers will get out of business, but many others will thrive. No ads, just deception.


This already exists in the US. All of the “PR news” sites are just paid PR releases. They make the product/company look good while spreading it over many sites to boost SEO and recognition, and they would cover this too.


We already know that advertisers aren’t willing to pay that much for “subliminal” advertising. People have been trying to do product placement in movies and shows forever and it’s never really taken off.


The entire concept of an Influencer is just a front for product placement. The difference nowadays is that people are actively looking for the commercials and ignoring the movie.


Product placement is everywhere. Next time you watch a movie or show, look for the clothing brands, computer brands, car brands, wine brands, etc. everywhere.

And think about sponsorships. From soccer to NASCAR, sports is covered with branding.


That's not subliminal, you're describing sponsorships (i.e. manufactured social proof).


"Subliminal" and "sponsorship" are totally orthogonal. One refers to the presentation, the other the business arrangement.


This seems factually incorrect. It's hard to find consistent historical numbers but what I can find implies pretty steady double digit growth over the last decade or two.

If you have good sources that say otherwise, I'd love to see them.


> "Bing is going to have to come up with some kind of radical revenue sharing too if they want anything written after 2023."

ChatGPT doesn't include anything written after 2021. I certainly wouldn't use it to find an air fryer. The results will be from over a year ago. I would want to see what the newest air fryer options are, and it would be really important to have up-to-date pricing.

AFAIK there is not a way to update a large language model in real time. You have to train on the entire dataset to do a meaningful update, just like with most forms of neural networks. For ChatGPT that takes days and costs hundreds of thousands of dollars. Every time.

It's great for explanations of concepts, and programming, and a few other things. But with the huge caveat that all of the information you're looking at is from one year ago and may have changed in that time. This really limits the utility of ChatGPT for me.


OpenAI is already working on solving this

https://openai.com/blog/webgpt/


Neat! I've seen so many discussions of the cost of continually retraining ChatGPT with new knowledge (and the energy efficiency of that, etc.) but had a similar thought that you can probably use a GPT-like approach to do "next word prediction" for a command-based web crawler to gather up-to-date data and then use the GPT-we-already-have to combine/integrate found content using the classic next word prediction.

Sometimes I feel that what makes humans cool is that we (well, some of us!) have good internal scoring on when we lack knowledge and must go in search of it which makes us go down different branches of next-action-in-line.


Someone pointed out that the energy cost of training GPT is roughly on par with a single transcontinental flight. If so, I don't think this is a limiting factor in any meaningful sense - you could spend that much energy daily, and it would still be a drop in the bucket overall for any moderately large business.


The bottleneck would be the number of workers on sites like Mechanical Turk available to create the datasets. Might take a few more years before Amazon and Facebook get enough third-world countries to the point where they can exploit their labour online to create daily training sets.


I would imagine trying new datasets on a daily basis wouldn't be trivial?


That's a very solvable problem though. If Microsoft decides to integrate ChatGPT with Bing, they have the resources to retrain the model on a more recent data set, and even do it somewhat regularly.


You don't even need to retrain if you use retrieval transformers. That is the real revolution waiting to happen. DeepMind already unlocked it with RETRO, but I don't know why a public version hasn't been released - hooked into the live internet.


OpenAI has WebGPT too: https://openai.com/blog/webgpt/


> Where is that information going to come from, exactly?

Manufacturers, with quality ranging from excellent to trash.

Consider trying to buy a 1K resistor at Digikey using their parametric search. Possible, but tedious and time-consuming, because you need a lot of domain knowledge to know what you want, and the technological range of "things with 1K of resistance" is extremely vast. At least it's possible, because the manufacturers are honest when Digikey imports their data.

Consider the opposite, consumer goods. 500-watt PC power supplies with random marketing number stickers on the same chassis ranging from 500 to 1200 watts. Consumer-level air compressors and consumer-level vacuum cleaners that plug into household wall outlets claiming "8 horsepower" or whatever insane marketing nonsense. Clothes with vanity sizing so a "medium" tag fits like a real-world XXL. Every processed food in a store with a "keto" label is high-carb, sugar-added garbage, much like what happened with the "organic" label in the old days (the employees at the farm, distributor, warehouse, and/or retail store level take the same produce out of one bin and put it in two places with different prices).

I think it will help when purchasing technical engineering type products but be an epic fail at inherently misleading consumer goods.


If you're trying to search for a specific resistor without the prerequisite domain knowledge, how will you be able to vet whether or not the answer given by a language model meets your needs?

Imagining that language models like GPT will ever be able to index up-to-date information is literally trying to apply the concept of "artificial intelligence" to a probabilistic language model. It's incompatible with what it's actually doing.


Maybe manufacturers could upload their design docs and ChatGPT could learn exactly what the object does and what its performance parameters are.


Put SEO into the picture and things get hairier. Incredibly realistic spam is about to go through the roof, so search engines will have an insanely harder time distinguishing between useful content and spam.

Making money from search traffic to your (presumably useful) site is going to get harder in a bunch of ways, due to generative models.


I don't see why this would be a copyright violation anymore than somebody learning something from multiple sources and reformulating what they learned into an answer to a question. As long as it isn't explicitly reciting its training data, there shouldn't be an issue of copyright.


> Barely anyone is making money writing reviews today, most sites are farmed content.

I'm sure ChatGPT will be able to write a bunch of terrible SEO prose that precedes the actual air fryer review (or worse, recipe) about how the author's grandma had an air fryer when she was young and remembered the great times with her grandma (etc), for roughly 95% of the text!

In all seriousness, being able to swerve all that terrible SEO content on reviews will always be welcome!


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

I don't think it's up to you, legally speaking: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

I mean, they could be nice and respect your robots.txt, but they certainly don't have to.

> fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

It's possible that fair use law will be expanded to cover this case, but as constructed the output of these models is generally only loosely derivative of any specific original, and so probably protected under fair use. If it were spitting out exact copies of things it had read, it would probably be pretty easy to train that behavior out of it.

> I do love these imaginary scenarios where ChatGPT is going to find me the best air fryer, though. Where is that information going to come from, exactly? Barely anyone is making money writing reviews today, it's mostly farmed content. What happens when even those sites' reviews are quickly scraped and put into the next model iteration? Bing is going to have to come up with some kind of radical revenue sharing too if they want anything fresh.

I do agree with this, though. The LLMification of search is going to squeeze revenue for content creators of all kinds to literally nothing, at least if that content isn't paywalled. Which probably means that that's exactly where we're headed.


> I don't think it's up to you, legally speaking: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

> I mean, they could be nice and respect your robots.txt, but they certainly don't have to.

That case was limited to the CFAA, but you seem to get the gist of what I'm saying when I specified it's different when it's Microsoft doing the scraping. If Bing starts ignoring robots.txt and data still starts showing up in their results, all the early 2000s lawsuits are going to be opened back up.

> It's possible that fair use law will be expanded to cover this case, but as constructed the output of these models is generally fairly derivative of any specific original, and so probably protected under fair use.

Unless there's a reason for them to be considered fair use, derivative works are going to lose a copyright suit. And what's the fair use argument? If I'm the only one on the internet saying something and suddenly ChatGPT can talk about the same thing and I'm losing money as a result, there's no fair use argument there. Search engines won those early lawsuits by being transformative (index vs content), minimal, and linking to their source. None of that would apply here.


What GP means is that ChatGPT output is generally not similar enough to any _particular_ source document to establish the fact that it's derivative. Instead, it resembles what you'd get if you asked a (credulous and slightly dumb) human to read a selection of documents and then summarize them. These kinds of summaries are absolutely not copyright violations, even if the source document can actually be identified.


> ChatGPT output is generally not similar enough to any _particular_ source document to establish the fact that it's derivative.

Isn't this exactly what a court case would be trying to clarify? If so wouldn't assuming this be begging the question?


There exist other laws, jurisprudence, and even entirely different judicial systems besides those currently used in the USA!


Sadly, it seems like the decision in that case was changed. From your link:

> In a November 2022 ruling the Ninth Circuit ruled that hiQ had breached LinkedIn's User Agreement and a settlement agreement was reached between the two parties.


It wasn't changed, it's just that there's more than one issue at hand: the earlier decision was that hiQ didn't violate CFAA, the later one was that it did violate LinkedIn's EULA. The November 2022 ruling specifically states that hiQ "accepted LinkedIn’s User Agreement in running advertising and signing up for LinkedIn subscriptions" - keep in mind that LinkedIn profiles haven't been public for a while in a sense that logging in is required to view them, and thus to scrape them.

Hence why OP is saying that this all will lead to increase in paywalls and such, and a reduction in truly public content.


My guess is your first point is exactly why Google hasn't done this yet. Their 'knowledge boxes' are already crossing a line that in general they felt nervous about crossing historically, but they don't go very far.

Google on the whole historically did not want to alienate publishers (and the advertisers that hang out on publisher content) and has avoided being in the content production business for this reason.


IMO this is the big problem with the internet as it exists today - there is no incentive for producing accurate, unbiased information and non-sensationalist opinions. My greatest hope for the future is that somehow we can incentivize people to produce "good" information for AI based assistants and move away from the rage/shock based advertising model that most of the internet currently uses. Personally I would rather pay a few cents for a query that produces valuable results and doesn't put me in a bad mood than pay with my time and attention like we do today. AI systems will absolutely need to be able to identify the training sources with every result (even if it is coming from several sources) and those sources should be compensated. IMO that's the only fair model for both image and text generation that is based on authors and artists work.


> problem with the internet as it exists today - there is no incentive for producing accurate, unbiased information and non-sensationalist opinions.

I think this problem is orthogonal to the internet as medium, though I’ll concede that it has proven to be the biggest amplifier of this dynamic.

Correct information (or correct as far as humans know, or most likely correct, etc.) costs money to create. False or completely made-up information costs nothing, plus it has the potential upside of sensationalism, thus further increasing its ROI.

Agree with your point about developing more incentives for correct information and penalties for false.


It's not just that there's no incentive for that, but there's a very strong incentive to do the exact opposite:

https://www.youtube.com/watch?v=rE3j_RHkqJc


How I’d like to weather this storm:

1) Everyone cryptographically signs their work for identity confirmation.

2) There exists a blockchain whose sole purpose is to allow content creators to establish a copyright date on a digital piece of work.

3) A public that uses the two items above when evaluating the reputation of an artist.


This seems to make a lot of sense. The artists themselves also have an incentive to be blockchain validators/miners, thereby reducing the need for token payout, and the subsequent speculation that comes with tokenization (I think).


You don't need a blockchain for cryptographic timestamps.
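
A minimal sketch of the non-blockchain route (RFC 3161 trusted timestamping): hash the work locally and have a timestamp authority sign the digest. The file name is made up, and the TSA round trip is only described in the comments:

  import hashlib

  # Hash the work; only this digest (not the file itself) needs to leave your machine.
  digest = hashlib.sha256(open("artwork.png", "rb").read()).hexdigest()
  print(digest)

  # An RFC 3161 timestamp authority (TSA) signs this digest together with its clock
  # time, giving a verifiable "existed no later than T" proof, no blockchain needed.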


I've got some reading to do[1]. Thank you for the heads-up.

[1] https://en.wikipedia.org/wiki/Trusted_timestamping


How does that prevent anyone from using ChatGPT to generate new (supposedly human-written) content?


It doesn't; however, the signature in your hypothetical doesn't correspond to a known/trusted author.


A language model that provides answers with sources (which could be found using a traditional search engine that searches the corpus that the language model is trained on) would be very useful and would also allow it to link directly to the source material. The trouble would be in finding the exact sources since the output of the language model is unlikely to be verbatim but current search engines can deal with imprecise queries fairly well so it's not an intractable problem. A very well curated data set would help this immensely.

I'd be super interested in a language model that was able to synthesize knowledge drawn from a large corpus of books and then cite relevant sections from various titles.
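
A rough sketch of that shape, where search_corpus and generate are hypothetical stand-ins for the traditional search engine and the language model:

  def answer_with_sources(question: str) -> str:
      # search_corpus and generate are hypothetical stand-ins for a conventional
      # search index and an LLM completion call, respectively.
      passages = search_corpus(question, top_k=5)   # -> [(doc_id, passage_text), ...]
      context = "\n\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
      prompt = (
          "Answer the question using only the passages below, and cite the "
          "passage id in brackets after each claim.\n\n"
          f"{context}\n\nQuestion: {question}\nAnswer:"
      )
      return generate(prompt)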


>- Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

They will send at least some visitors, which is better than the zero visitors you will get from Bing if you block it.

>- fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

Yes, and Microsoft has lawyers, who have presumably determined that the cost of fighting these frivolous lawsuits is not overwhelming.


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

You tell me! It's your site. If you want money maybe you should charge for your content? And honestly, the web that Google presents is just so terrible that I don't want to visit your site, unfortunately. And, maybe it's a price worth paying.


>Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

Google already does this with featured snippets


Which were already highly unpopular with websites, but at least have some attribution.


So we’re speculating that the Bing chatGPT implementation will crawl public websites, answer queries strictly from its training data, and present unattributed snippets?

That does sound both flawed as a search engine and objectionable to site operators. In addition to not being announced or even rumored to work that way.

So, maybe the implementation is different from that model?


Their plan is to use Neuralink to pull all the information from people's brains.


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

How can you refuse? The only way I know would be to require an account, but even then they could bypass it.


Major search engines honor robots.txt
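
The check itself is trivial to do from the crawler's side; a short sketch using Python's standard library (the site and user agent are just illustrative):

  from urllib import robotparser

  # What a well-behaved crawler does before fetching a page.
  rp = robotparser.RobotFileParser()
  rp.set_url("https://example.com/robots.txt")
  rp.read()
  print(rp.can_fetch("bingbot", "https://example.com/reviews/best-air-fryers"))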


If it becomes standard to have such a file and it affects their bottom line, could they disregard it?


If the file that were previously honored as consent to use the copyright material is subsequently ignored, wouldn't the content creators take the indexers to court for copyright infringement?


Yeah, not helpful, but long-winded.

There are many air fryers on the market and the best one for you will depend on your needs and preferences. Some factors to consider when selecting an air fryer include size, price, features, and overall performance. Some popular air fryers to consider include the Philips Airfryer, the Ninja Foodi, and the Cosori Air Fryer. It might be helpful to read online reviews and compare the features of different models to find the one that works best for you.


I've found the greatest success with ChatGPT when I use it as a learning / exploration tool. If there is a topic I don't know much about, I can state the question in a fairly stupid way and ChatGPT will give me vocabulary options to explore.

For example, you could describe a probabilistic process to it and ask it what kind of distribution / process it is. Then, based on the extensive words you get back, you can continue your research on Google.

As such I think search engine integration is a really great idea, looking something like the following:

-> user: Hey search engine, I have a thing that can happen with a certain probability of success, and it runs repeatedly every 15 minutes. Could you tell me what kind of process this is and how to calculate the probability of 5 consecutive events in 24 hours?

-> engine: It sounds like you are describing a Bernoulli process. In a Bernoulli process, there are only two possible outcomes for each trial: success or failure. The probability of success is constant from trial to trial, and the trials are independent, meaning that the outcome of one trial does not affect the outcome of any other trial.

Here are some results on how to calculate the probability of consecutive successes in a Bernoulli trial (result list follows)

(Note: if you try to ask this from ChatGPT it will not actually give you a correct answer for the calculation itself as there are some subtleties in the problem. But search results of "bernoulli process" will tend to contain very reliable information on the topic)

Edit: You could even just say "could you give me good search queries to use for the following problem" and use the results of that.
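
For the curious, one way to handle the subtle part correctly is a small dynamic program over the current success streak; a sketch, assuming for illustration a 50% success chance per 15-minute trial (96 trials in 24 hours):

  def prob_run_of_k(n_trials: int, k: int, p: float) -> float:
      """Probability of at least one run of k consecutive successes in n_trials."""
      streak = [0.0] * k     # streak[j] = P(no run yet, current streak length == j)
      streak[0] = 1.0
      hit = 0.0              # P(a run of k has already occurred); absorbing
      for _ in range(n_trials):
          nxt = [0.0] * k
          for j, pr in enumerate(streak):
              nxt[0] += pr * (1 - p)       # a failure resets the streak
              if j + 1 == k:
                  hit += pr * p            # a success completes the run of k
              else:
                  nxt[j + 1] += pr * p     # a success extends the streak
          streak = nxt
      return hit

  print(prob_run_of_k(n_trials=96, k=5, p=0.5))  # p=0.5 is an assumed example value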


My experience is that GPT gives me a very good-looking answer, but when I do a cross-check, it's often slightly wrong or outright wrong.


This honestly sounds like the same experience one gets when talking to humans :)

I'm only half joking. Even experts in their field tend to inject their own biases and experiential preferences when answering questions in depth.


That is the claim made by AI proponents every time it fails, such as for self-driving cars - humans make mistakes too. Humans make math mistakes, but I wouldn't be satisfied with a calculator that does.

ChatGPT is a tool; its value depends on how well I can trust it. Humans are not tools.

> experts in their field tend to inject their own biases and experiential preferences when answering questions in depth.

Another typical argument - everyone makes mistakes, therefore my mistakes aren't relevant. Everyone can do math, but there's a big difference between my math and Timothy Gowers's. Everyone lies and everyone tells the truth at times, but the meaningful difference is in degree - some do it all the time, with major consequences, take no responsibility, and cause lots of harm. That's different than the person committed to integrity.


To speak as a proponent, it's not about the er... "relative relevance" so much as the utility.

there are things about a chat model that you can't say about humans, like, it's not really ethical to keep a human stuffed in your pocket to be your personal assistant at your whim.

I think one of the things folks struggle with in grokking the value of these models is that we're really used to tools being like you say; they're reliable and do a thing. As though there are two states of work - perfect and useless. There are other patterns to interact with information, and this puts what we used to need humans for in a place that we can do other things with it. stuff like:

- brainstorming
- rubber duck debugging
- casually discussing a topic
- exploring ideas / knowledge
- study groups (as in, having other semi-knowledgeable entities around to bounce ideas off of, ask questions, etc.)

When it comes to self-driving cars, well, that's a bit of a different story and really is more a discussion about ethics and law and those standards. I, and others like those you speak of, are of the opinion that the expectation for autonomous vehicles is a bit high given the rates of human failure, but there are plenty of arguments to be made that automating and scaling a thing means you should hold it to a higher standard anyway. I don't think there's a correct answer on this one - it's complex enough to be a matter of opinion. You mention the potential for harm, and certainly that applies here.

I'm less worried about ChatGPT being wrong. Much less likely to flatten me at an intersection.


> I think one of the things folks struggle with in grokking the value of these models is that we're really used to tools being like you say; they're reliable and do a thing. As though there are two states of work - perfect and useless. There are other patterns to interact with information, and this puts what we used to need humans for in a place that we can do other things with it.

Maybe, but look at it this way: Do you work in business? If so, step back and reread that - it seems a lot like a salesperson finding a roundabout way to say, 'my product doesn't actually work'.


It’s either useful or it isn’t. Comparing AI to either human intelligence or rules-based computing tools is incoherent. Fucking stop it! What we are really talking about are the pros and cons of experiential, tacit knowledge. Humans can do this. Humans can also compute sums. Computers are really good at computing sums. It turns out they can work with experiential knowledge as well. Whodathunk.

What we should be saying is this: there will always be benefits of experiential knowledge and there will always be faults with experiential knowledge, regardless of man vs. machine.


ChatGPT is just your average Reddit user.

Even when it's wrong, it's confidently wrong.


Perhaps because it is trained on Redditors and co.


I know this is a joke, but I think it's important to recognize that it's because the ChatGPT language model does not have the ability to introspect and decide how accurate its knowledge is in a given domain. No amount of training on new input data can ensure it provides accurate responses.


That applies to humans as well.


No it doesn't; humans can recognise when they don't know something, while current language models usually can't (yet).

Their training objective, which is to predict the next piece of text in their training data, does not incentivise them to respond that they don't know something, as there is no relation in the training data between the AI not knowing something and the correct next text being "I don't know" or similar.


I'd sure hope not. Reddit comments are a masterclass in disguising ethos and pathos as logos.

I'd expect that the boring reality is that it's trained on highly ethos/logos text (academic works) and thus always presents itself as such, even when its weights cause an invalid assertion.


Reddit is exhausting. One big feedback loop. People will say anything to get good karma, or avoid saying certain things to avoid being downvoted. If there is even just a slight majority in the way the group thinks, it will soon become the dominant opinion.

For example, there was a voice actor that lied about being paid a pitiful sum of money for a gig. Everyone took her side initially (as one should _if_ it were true) but the people saying "well, this just seems odd" were being more or less attacked and told their opinions were awful.

The quality of discussions I have on HN and niche forums is 100x better than on Reddit.


TBF, the same can happen here to a lesser extent. False or misleading stories blow up quickly because “$BIGCORP bad”.


It's trained on Twitter data so I assume Reddit data as well.

Honestly it feels like they're both pretty important datasets to ingest if you're trying to build a model on human speech; I reckon social media, comment sections, and co. have the most natural human conversational text online.


Similar to the original comment, it could help with exploratory type of work. It helps me shift things from “things I don’t know about/unaware of” to “things I know I don’t know of”.


"Sometimes right, always plausible"


Would it be effective to ask GPT to provide a confidence rating about how sure it is about an answer, or would it be likely to just say that it is confident in its correctness when it is wrong?


"Confidence" is an unfortunate term that shouldn't be confused with a human logic interpretation of "confidence".

In most ML cases (and likely ChatGPT's), "confidence" would generally just correlate with how closely the query matches data and patterns it's seen in its dataset, and inversely correlate with how many conflicting matches and patterns it sees.

Humans are subject to the same problem of course. If you asked how confident a person living many centuries ago was that the Earth was flat, they'd probably say "very confident" because there was nothing in their training data / lived experience to conflict with that view. But they'd be wrong.

But humans still have a significant advantage in that they report lack of confidence when they sense logical inconsistencies and violations of reasoning to a level that ML models can't (at least not yet).

Maybe a fan-out of the possible ways it could answer would be interesting, but really we more need a disclaimer next to every answer that says "this thing that's answering in fully formed language does not have human reasoning capability and can't be trusted (yet)"
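
For what it's worth, the closest thing these models expose to a "confidence" signal is per-token log-probabilities; a hedged sketch against the completions-style OpenAI API of early 2023 (the prompt is made up), which reports token likelihood, not factual confidence:

  import openai  # assumes the pre-1.0 openai package with the Completion endpoint
                 # and that openai.api_key is already configured

  response = openai.Completion.create(
      model="text-davinci-003",
      prompt="Q: Did medieval scholars believe the Earth was flat?\nA:",
      max_tokens=60,
      logprobs=1,   # request the log-probability of each sampled token
  )
  lp = response["choices"][0]["logprobs"]
  for token, logprob in zip(lp["tokens"], lp["token_logprobs"]):
      print(f"{token!r}: {logprob:.2f}")   # token-level likelihood, not truthfulness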


Odd/ironic fact: people didn't believe the Earth was flat back then. That's a modern confusion. They believed the Sun revolved around the Earth. Wikipedia has a whole article dedicated to this common belief:

https://en.wikipedia.org/wiki/Myth_of_the_flat_Earth

"The earliest clear documentation of the idea of a spherical Earth comes from the ancient Greeks (5th century BC). The belief was widespread in the Greek world when Eratosthenes calculated the circumference of Earth around 240 BC. This knowledge spread with Greek influence such that during the Early Middle Ages (~600–1000 AD), most European and Middle Eastern scholars espoused Earth's sphericity.[3] Belief in a flat Earth among educated Europeans was almost nonexistent from the Late Middle Ages onward ... Historian Jeffrey Burton Russell says the flat-Earth error flourished most between 1870 and 1920, and had to do with the ideological setting created by struggles over biological evolution"

I asked ChatGPT the same question and it prevaricated:

"There is evidence that some people in medieval times believed the Earth was flat, while others believed it was round. The idea that the Earth is round, or more accurately, an oblate spheroid, has been around since ancient times. The ancient Greeks, for example, knew that the Earth was a sphere. However, the idea that the Earth is flat also has a long history and can be traced back to ancient civilizations as well. During the Middle Ages, the idea that the Earth was round was not widely accepted, and there was significant debate about the shape of the Earth. Some people continued to believe in the idea that the Earth was flat, while others argued for a round Earth. It is important to note that the medieval period was a time of great intellectual and scientific change, and ideas about the shape of the Earth and other scientific concepts were still being developed and debated."

But from what I know, it's wrong, at least as far as we know the historical record (of course there may have been peasants who believed otherwise but their views weren't recorded). The fact that the Earth is a sphere is obvious to anyone who watched a ship sail over the horizon, which is an experience people had from ancient times.


I've asked it to give me a confidence rating for its replies to my questions but it states that it can't give one


I had luck getting it to give me one when providing answers in JSON form.

For example:

> I'm going to share some information, I want you to classify it in the following JSON-like format and provide a responses that match this typescript interface:

    > {
    >   "isXXXXX": boolean;
    >   "certainty": number;
    > }
> where certainty is a number between 0 and 1.

However, I got either 0 or 1 for the certainty every time. Not sure if it was because they were either cut-and-dry cases (certainty 1) or not-enough-information (certainty 0).

I'm actually trying to think of a good example of text I could ask it to intuit information from and give me a certainty


Even if it gives you a number there, does that number actually tell you what you think it does, or is it merely filling in the blanks with random information? I suspect the latter.

For example, ask it to subtract 2 20-digit numbers. It will come up with an answer X where the first couple of digits are correct, and everything after that is wrong.

It gets better.

Ask it to correct itself. It will come up with a different wrong answer Y.

If you then ask it to explain why the answer is right, it will give you an explanation. At the end of the explanation it states the answer is X again, and then in the very next line concludes by telling you that is why the answer Y is correct. :)
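
For reference, the check itself costs nothing outside the model, since Python integers are arbitrary precision (the operands below are made up):

  a = 12345678901234567890
  b = 98765432109876543210
  print(a - b)   # -86419753208641975320, exact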


I saw a screenshot a few days ago where someone asked it for five fun facts about the number 2023. In the same response, it said it’s a composite number (3 times 673) and prime (specifically the 41st). Both are wrong; it’s a composite number (7 times 289, i.e. 7 × 17 × 17).


I think this is the big question lots of people are working on right now

It's apparently really hard to objectively measure/report the "truthiness" of LLM results

Allowing an LLM to "improvise" and be a bit fast-and-loose is unfortunately a necessary ingredient in how they currently work.


Then there's the question of how we should interpret it. Should we ask for the confidence rating of the confidence rating? The language models lack the ability to verify/falsify claims; they just do word correlation.


I asked it for movie quotes from a specific movie: 7 out of 10 were from the movie, but 3 weren't, though they sounded plausible.


I asked it 'What happened to Gandalf after he fell from the bridge, fighting the Balrog? Did he die?'

The answer was a story about Gandalf being hurt badly, being rescued by some random dwarves, and so on.

I asked ChatGPT in which book this is described, and it told me that you can read about it in both The Hobbit and The Lord of the Rings.

So it makes up fun stories. This makes me wonder how much of its explanations about physics (which I don't completely understand) are made up.


Transformer models suffer from "hallucinations". It can be terrible at giving quotes or references. It's a known limitation with this tech that the industry is working to overcome.


Sounds like every human I’ve ever met


Except we don’t hype humans the same way we hype ChatGPT


It seems like we do in the threads about chatgpt hype. From what I’ve read every human can do advanced mathematics flawlessly and recall every nuance of every subject with perfect fidelity, write clearly and cogently, and it’s all managed through channeling ether and soul spirit fire through emotions that AIs and Vulcans can’t possess.


I think you're reading way too much into criticisms of ChatGPT as implying humans are immune to the same criticisms. And then transforming them into complete hyperbole.


But a human isn't there for you 24 hours a day and for every thing you want to ask for, at the very least.


CEOs and various intellectuals very much are hyped up.

Then we realize they’re like anyone else and they’re massively demonized


I've certainly met people as confidently incorrect as chatgpt but they are the exception rather than the rule.


Totally. I asked it to describe an obscure lithography technique (rapid electron area masking), and it gave a reasonable summary but at the end claimed it was widely used in industry...it's not used at all.


I asked it about the strong nuclear force and it said the force gets weaker with distance—quite fundamentally wrong (color confinement).


If you are just looking for related vocabulary words, correctness is not a concern.


Is there any reason that ChatGPT is better than a thesaurus?


Well, "I have a thing that can happen with certain probability of success, and it runs repeatedly every 15 minutes. Could you tell me what kind of process this is and how to calculate the probability of 5 consecutive events in 24 hours?" isn't going to be in a thesaurus.


garbage in, garbage out.


No. That doesn't stand up anymore in this case.


Why would ChatGPT be the exception to the rule? What architecture are they using that somehow is immune to unwanted trends in the training data?

If you're going to make an outlandish claim like that, I'd like to see some arguments to back it up.


Well, it's more like a puree of garbage and quality stuff, so you are never quite sure what you'll get in each bite...


Garbage interspersed in the input, garbage interspersed in the output.


ChatGPT can give you bad answers to good questions or good answers to nonsense questions. With ChatGPT it's more like "sometimes garbage".


The garbage is in the source material used to create the model, not the questions.


More likely due to lack of "good" data than to the existence of "bad" data. ChatGPT is known for its ability to "hallucinate" answers for questions that it wasn't trained for.


Same comment still applies. ChatGPT sometimes gives good and bad answers.


In fact ChatGPT doesn't know anything about true and false. It's just generating text that most closely resembles text it's seen on similar subjects.

E.g. ask it about the molecular description for anything. It'll start with something fundamental like the CH3N4 etc then describe the bonds. But the bonds will be a mishmash of many chemical descriptions thrown together. Because similar questions had that kind of answer.

The worst part is, it blurts forth with perfect confidence. I liken it to a blowhard acquaintance that will make up crap about any technical subject they have a few words for, as if they are an expert. It's funny except when somebody relies on it as truth.

I don't think GPT3 at its heart is an expert at anything. Except generating likely-looking text. There's no 'superego' involved anywhere that audits the output for truthfulness. And certainly no logical understanding of what it's saying.


I love ChatGPT for simple tasks. It is currently wreaking havoc on some communities though, including one I created on Reddit.

https://www.reddit.com/r/pinescript/comments/1029r7p/please_...

People have taken to asking ChatGPT to create entire scripts to trade money. When they don't work, they go into chatrooms or forums and ask "why doesn't this work" without saying it was made by ChatGPT. It causes people to open the post, read it a bit and only maybe after a minute or two of wasted time, realize the script is complete nonsense.


I'd argue that level of ambiguity counts as garbage out, although I'm confident it will get better.


Why? ChatGPT has certainly consumed SEO spam and company marketing materials as part of its model. Even if a human went through it, there still exists a bias towards this information. After all, this material is specifically written to fool humans.

I've played with ChatGPT enough to notice that for some queries it's fundamentally doing an auto-summarize of such content.

Consider this. Someone very early on posted that a neat feature of ChatGPT would be to give ChatGPT a list of ISBN numbers and then demand its answers are cited from this corpus. We're not there yet, but this would be amazing.

My prediction is that those with money will have the power to influence their chat bot. Consequently, they'll have access to a higher-quality and wider corpus of information. There will not be any restrictions on how their ChatGPT would answer due to, for example, woke agendas. Also, players such as Goldman Sachs would feed their model content generated by their analysts that consumers would not have access to. This already happens, but ChatGPT will make this information so much more potent.

Furthermore, as this technology continues to improve it will increase the productivity of our population and ultimately generate higher GDP. I'm super excited.


> Consider this. Someone very early posted that a neat feature of chatgpt would be to give chatgpt a list of ISBN numbers and then demand it's answers are cited from this corpus. We're not there yet but this would be amazing.

It currently has the ability to do this. It'll make the citations up, of course – but that behaviour is inherent to the architecture; a system that didn't do that would have to work differently at a fundamental level.

> chatgpt will make this information so much more potent.

How do you imagine this would work?

> and ultimately generate higher GDP.

Again, how do you imagine this would work? GDP is a specific economic measure; how would (a better version of) this technology increase GDP?

Tangentially: why is "increase GDP" a good ultimate goal to have in the first place?


Citing from a well-defined corpus and making citations up look like very different things at a fundamental level.


>> chatgpt will make this information so much more potent.

> How do you imagine this would work?

Don't overthink it. It's just the nature of the tool. Imagine you're a detective trying to investigate a crime:

- "list the plates of blue hondas in this area at this time, that have a missing rear bumper and a scratched driver side door"

- "send a notification to all gas stations along this route and notify them of a blue honda"

And, if you're a Goldman Sachs analyst, you can just use natural language to gather information. "i have this scenario, list companies that will benefit" would be an abstract question that you'd ask it. Obviously, the system isn't this good yet but you get the idea. You'd just have to ask more fine-grained questions and use some of your domain knowledge to fill the gap until it does become this good.

>> and ultimately generate higher GDP.

> Again, how do you imagine this would work? GDP is a specific economic measure; how would (a better version of) this technology increase GDP?

Google (or chat gpt) would do a better job than me answering this,

"Increases in productivity allow firms to produce greater output for the same level of input, earn higher revenues, and ultimately generate higher Gross Domestic Product."

The reason you want to increase gdp... the following quote was derived from one of Herbert Hoover’s memoirs.

"[Engineering] It is a great profession. There is the satisfaction of watching a figment of the imagination emerge through the aid of science to a plan on paper. Then it moves to realization in stone or metal or energy. Then it brings jobs and homes to men. Then it elevates the standards of living and adds to the comforts of life. That is the engineer’s high privilege."

By increasing GDP, you elevate the standard of living and add to the comfort of life.


> "list the plates of blue Hondas in this area at this time, that have [...]"

I think this shows a significant misunderstanding of what chatgpt does fundamentally. It will never be able to do this unless also fed a description, location, and time of cars in a certain area as context beforehand (either as training data or a prompt). In either case you have access to the data and just need to do a simple search, so chatgpt is providing negative value since it's capable of providing results that don't exist in the dataset.

Similarly for your Goldman Sachs example, you're imagining that chatgpt is greater than it is. It is capable of providing something that would likely follow a given text on the internet at its time of training (aka its training set) somewhere. It can't reason about new information or situations since it's incapable of reasoning. To believe that it could generate business strategies is to believe that effective business strategies don't require any intuition or reasoning to progress, just statistical recombination of existing strategies.

> By increasing GDP, you elevate the standard of living and add to the comfort of life.

How do you reach this conclusion from the information presented? Why use GDP, a measure of the profitability of corporations, as a proxy for the standard of living instead of measuring the standard of living and seeing how it will be impacted directly instead of through many layers of abstraction?


>>How do you reach this conclusion from the information presented? Why use GDP, a measure of the profitability of corporations, as a proxy for the standard of living instead of measuring the standard of living and seeing how it will be impacted directly instead of through many layers of abstraction?

You are asking a question that is outside of scope here. GDP per capita has been used as a proxy for standard of living for quite some time now.


That proxy only works as long as nobody's optimising for it.

> Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes. — Charles Goodhart

GDP (£) per capita in London has doubled since 1998. Has the standard of living "doubled" for the median person? What about the standard of living for the poorest 1%? Has the productivity boost due to automation translated into correspondingly shorter working hours, or correspondingly larger compensation for work done?

What questions do you actually mean to ask, when you talk about GDP?


If something stops being an economic transaction it moves out of GDP. So if ChatGPT reduces Google ad clicks then it doesn’t seem like it would increase it, even though it does increase customer surplus (stuff you get for free).


For me it's a weird mixup in my brain of "interactive Google".

I know the results I'm going to get back are basically the same as if I went to Google, ran a query, it returns me their top 3-5 scraped "blog articles" based on relevancy, and then I ran it through one of those condensing/summarizing bots.

I'm not sure why it's as therapeutic as it is basically interacting with a search engine.

I wonder if this kind of technology will remain free for the foreseeable future. Google has to be coming up with something shortly, right? It's interactive search engine results "on steroids" (I think? I can't tell if my brain is tricking me into thinking it's cooler/more useful than it is. Everybody non-tech I tell about it isn't that impressed/feels it's spammy/crufty/formulaic).


I'm not sure why it's as therapeutic as it is basically interacting with a search engine.

Because it's like a smart human giving you their best guess. It will never tell you it doesn't know or give you something completely offbase like Google does.

It's friendlier than Google but less accurate.


Is it safe to say in your opinion that Google and ChatGPT are basically trained on the same information?

Google crawls the web/scrapes it/indexes it.

ChatGPT crawls the web (not sure if they have access to Google's internal scrape results, I doubt it), "trains" a model on it, serves it back to you in a "human friendly readable summarized format".

It's just Google from the perspective of "it's going to return the same information Google has" but instead of a search index trying to guess what's relevant it's an interactive language model designed to basically summarize the same underlying blog posts. Is that your opinion/understanding as well?


No, not at all. ChatGPT is trained on the same source information, but when you ask a question there's no guarantee its answer is directly from an actual source, it's always a newly generated "thought".

Google is a photocopier. It gives you an exact copy of what it finds. Google doesn't create, just references and links to original sources.

Google is a library, but not an author.

ChatGPT is an author, but not a library.

However, ChatGPT has read every book in the library, so when you ask a question it writes you a story from its memory based on what it thinks* you want. ChatGPT can write stories about books in the library, and it will probably be right (but maybe not).

*Remember the game Plinko from Price is Right? Basically ChatGPT takes your question, drops all the words through its super complicated plinko machine (neural network) and gives you the result.

If you ask it for the names of US presidents, it should give you the same answer as Google - even though it came up with it via the plinko method.

If you ask it for a story about a singing rock, the process is the same as the presidents list. It drops your request into the network and gives you the result. It's not smart, just wildly complicated. It's also never going to be a photocopier (but it might act like one for certain inputs).

----

The brain breaking part is that when you ask ChatGPT for...

"Write me a song about a singing rock"

It changes each word into a number-token, then those number-tokens go through the plinko machine. The result is a different set of number-tokens which it converts back to readable words. Inside ChatGPT it doesn't "know" anything. Rock is a number. Singing is a number. Write is a number.

But it knows the relationships between those numbers, and what other numbers are nearby the area of the network devoted to songs, so it pulls in words and related concepts like a human would.

But it's just numbers with no understanding.

Because it's numbers and not understanding, it can be wrong, either completely or subtly.

Edit: Asking for the list of US presidents has "David D. Eisenhower (1849-1850)" as number 12 (who isn't a person who was ever president). The rest look right, but ChatGPT is subtly wrong in this case.


Do you see the future being ChatGPT results but with citations? Or is that basically impossible given how it's a "trained model"?


No, ChatGPT doesn't know its own sources. It's just a trained model. Once the model has been created it's fixed - it can be recreated unlimited times, but it will never tell you the sources for its output.

Maybe if the network nodes have a source attached to them...

But thinking out loud...

That's not how the number-tokens work. It's at a word level... so "a list of US presidents" is broken down into individual number-tokens for each word, and you can't provide a source for each word.

---

I'm not sure how you combine Google and ChatGPT.

Chat is creative/combinatorial and Google is "just the facts".

ChatGPT and Google are going to have problems going forward. How do both of them determine if the information they find on the internet is from a meat-brain and not a metal-brain?

Happy to be proven wrong.


Maybe by fact-checking its answer?

Question -> "creative" output -> Google -> Summary of links -> Comparison -> confidence level (or re-write) + links that were used for checking

Not so different from how we work, at a high level. I believe that OpenAI has published a paper called WebGPT that has a workflow like this (although I'm not sure it's exactly the same).
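
Roughly the loop I'm imagining, sketched below; every helper here (ask_model, web_search, summarize, compare) is a made-up placeholder for illustration, not a real API:

    # Hypothetical fact-checking wrapper: generate an answer, then check it
    # against search results before showing it to the user.
    def answer_with_confidence(question):
        draft = ask_model(question)                        # "creative" LLM output
        pages = web_search(draft)                          # look for supporting sources
        evidence = [summarize(p) for p in pages[:5]]
        confidence = compare(draft, evidence)              # e.g. an overlap/entailment score
        if confidence < 0.5:
            draft = ask_model(question, context=evidence)  # re-write grounded in the sources
        return draft, confidence, pages[:5]                # answer + confidence + links used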


"condensing/summarizing bots" - never heard of these, will have a play around, thanks.



Actually I find it much better than that for exploratory purposes, not just getting search terms. The ability to just keep asking questions for clarification is something that the web was meant to provide with web links, but rarely does a good job of it. If it can simply act as a domain expert that I can talk to, it would be a huge win.


But it's not a domain expert: it's a language model designed and trained to produce language that could plausibly have been written by a human on the internet. At best it functions as a well-informed amateur, at worst it hallucinates nonsense but writes it in a way that is very convincing.


To be fair, you just summarized human discourse especially on places like Reddit and HN.


This makes me think of a quote from one of Dijkstra's lectures:

"In the long run I expect computing science to transcend its parent disciplines, mathematics and logic, by effectively realizing a significant part of Leibniz's Dream of providing symbolic calculation as an alternative to human reasoning. (Please note the difference between "mimicking" and "providing an alternative to": alternatives are allowed to be better.)"

When talking about a tool that's supposedly greater than humans, why should the shortcomings of humans be relevant? The tools we create to surpass our own capabilities should be greater than our own capabilities, not stunted by the same issues.


This comment isn’t helpful or a retort sorry.


It doesn't matter, it can still be extremely helpful.

For instance, I had fragmented memories of a movie, described what I knew about it (about a boy who lived in a train station), and it helped me find a couple movies and then narrow in on the one I was looking for.

These types of queries can be super painful with modern search engines but was easy with ChatGPT and a pleasant experience.

I think people are thinking of this AI in the wrong way - where it is "an expert". To me, I like to think of it as a companion that helps us shape and refine our thoughts and ideas.


It may not be a domain expert now, but it easily could be.

For example if you took all the Linux kernel code, the code review comments, the docs, and several of the top books and blogs on kernel development — suddenly you have a system that may be great for new kernel developers to ask questions of. Especially in a community that often isn’t kind to people asking “dumb questions”.


Could it? There's lots of commentary about how it could easily be this or that, but I don't work in ML and have no clue whether it is actually easy or not to tweak ChatGPT to work in such ways.

For example, in my experience ChatGPT isn't very "smart". It has a lot of knowledge but it can't infer any facts from that knowledge. When you ask it to write a program it has no real idea of what it actually does, and you can easily get it to add features it already added, or tell it something is a bug when it really isn't.

This doesn't sound like the stuff you could make a domain expert out of, at least, not out of the box.


I'm not asking it to infer much, but to piece together a fair bit and understand what I'm asking. For example, here's a chat I had with ChatGPT:

"How does common subexpression elimination work?"

It answered it correctly and gave some basic code examples to demonstrate the concept. Then I followed up with:

"But can it do the elimination if the variables are flipped, but semantically equivalent, like 'y + x' in the example above?"

It again gave what I would consider a correct answer. Note 'x + y' was what was eliminated previously, so I reversed it here.

Then I asked: "Are there cases where the compiler might fail to eliminate common subexpressions?"

And again a good answer.

Now all of this could be found on the web somewhere, but for example the second question didn't show an obvious answer on Google when I searched for it. I'm sure I could find it, but I know this field well. If I was someone new to the field I'd probably spend a lot of time parsing useless articles to find something that answered the question in a way I could understand.

I'm less concerned about it writing code (which is cool). For me, the ability to help me learn an area quickly is far more useful. It doesn't need novel answers, but the ability to understand what I'm asking and answer it. I think it's really close to being able to do this now.
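
For anyone who hasn't met the term: common subexpression elimination is just this kind of rewrite, shown here by hand in Python (a real compiler does it on an intermediate representation):

    # Before: (x + y) is evaluated twice.
    def f(x, y):
        return (x + y) * 2 + (x + y) * 3

    # After CSE: the repeated subexpression is hoisted into a temporary.
    def f_cse(x, y):
        t = x + y          # computed once
        return t * 2 + t * 3

    assert f(3, 4) == f_cse(3, 4) == 35

The follow-up about "y + x" is exactly where it gets interesting: the value is the same when addition commutes, but a compiler has to canonicalize operand order (or prove the swap is safe, which it isn't in general with operator overloading or side effects) before it will merge the two.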


That solves the "garbage training data" problem, but it doesn't solve the "it's just a language model" problem.

If you fine tuned ChatGPT on all the sources you mention, you now have a model that produces results that could plausibly have been written by a domain expert on the Linux kernel, but you don't have a domain expert. It will still hallucinate, because that's a fundamental feature of generative AI, it will just hallucinate much more convincingly.


I get what you're saying, but I'm just not convinced that it will continue to be a huge problem. In some sense, if the state of the art is where we're at today with language models, then sure. But I think it'll get better -- in part because I'm not sure humans aren't just souped-up language models with some weird optimization functions...


Being a well-informed amateur in everything is pretty impressive though. ChatGPT will be extremely useful if it ever figures out how to say "I don't know".


I don't believe there's any way for a LLM operating alone to recognize when it doesn't know something, because it has no concept of knowing something.

Its one job is to predict the next word in a body of text. That's it. It's good enough at it that for at least half the people here it passes the Turing test with flying colors, but the only kind of confidence level it has is confidence in its prediction of the next word, not confidence in the information conveyed.

If we were to take a language model and hook it up to another system that does have a concept of "knowing" something, I could see us getting somewhere useful—essentially a friendly-to-use search engine over an otherwise normal database.


> Its one job is to predict the next word in a body of text.

“Predicting the next word” and “writing” are the same thing; you’re just saying it writes answers in text. There’s nothing about that preventing it from reasoning, and its training goal was more than just “predict the next word” anyway.


I don't know if I buy this. It feels like your confidence in what you say is closely tied to "knowing". I'm sure there is more research to do here, but I'm not sure if there is a need to "tie" it to some other system. As it stands today there are definitely things ChatGPT doesn't know and will tell you so. For example, I asked it, why did Donald Trump spank his kids -- and it said, "I do not have information about the parenting practices of Donald Trump".

That said, there are a lot of things it does get wrong, it would be nice for it be better at those. But I do think that, maybe much like humans, there will always be statements it makes, which are not true.


"I'm sorry, but I am a text-based AI language model..."


This is a good point.

Something I also enjoy about it is the uniform interface. Each answer is presented the same, there's no parsing layout from different sites, or popup modals to dismiss, or long winded intro to get to the answer you're looking for. Of course you can't quite trust what you're told, so this is a bit moot.


I've found it's good for getting me started. Need to do a presentation on something? Type it into chatGPT. It will generate what is basically an okay outline. You can expand on what you like, cut what you don't.

For me getting started is typically the most difficult part (thanks adhd) so this is a huge help.


This is definitely the best use case for these models I've heard. Often when I'm researching a field I'm not familiar with the hardest part is just knowing the vocabulary necessary to express what I want to ask.


Ask Jeeves 2.0!


ChatGPT fabricates lots of stuff, it's deceptive for common queries, but for programming-related output, it's easily verifiable and serves as an extremely valuable search tool. I can easily ask ChatGPT to explain stuff e.g. eBPF details without wasting time looking up the manuals. I hope Bing dominates Google and stackoverflow in this.


It's easily verifiable, but it may still waste time. I've had many cases where ChatGPT makes up functions that do exactly what I need, but then I find out these functions don't actually exist. This may not happen very often for super popular languages like Python or Javascript where training data is huge, but it happens all the time for the long-tail of languages. In those cases, it would've been faster for me to do a regular search.

I do agree with the overall point though. If you understand when to use it and when it's more likely to give you nonsensical answers, it can save a huge amount of time. But when I ask it about a topic that I don't know enough about to immediately verify the answer myself I'm forced to double check the answers for validity, which kind of defeats the purpose.

The best queries to ChatGPT are cases where I know what the answer should look like, I just forgot the syntax or some details. Bash scripts or Kubernetes manifests are examples here, I know them, I just keep forgetting the keywords because I only touch them every few weeks.

And don't get me started about asking ChatGPT about more general topics in e.g. economics or finance. What you get is a well-written summary of popular news and reddit opinions, which is dangerous if it's presented as "the truth" - The big mistake here is that the training procedure assumes that the amount of data correlates with correctness, which isn't true for many topics that involve politics or similar kinds of incentives where people and news spread what conveniently benefits them and gets clicks.


Wasting time and having to be constantly vigilant is exhausting and a slippery slope that makes it easier to fall for deceptive content and settle for "I don't know, it's probably close enough" instead of insisting on precision and accuracy.

Humans take a lot of shortcuts (such as believing more easily the same facts presented with a confident tone) and the "firehose of bs" exploits it: this was already the case before generative AI, but AI amplifies the industrial-scale imbalance between the time needed to generate partially incorrect data and the amount of time/energy required to validate.


Agreed that it is a slippery slope. Programming is understanding - like writing or teaching is understanding. To really understand something, we must construct it ourselves. We will be inclined to skip this step. This comment sums it up well:

> Salgat 8 days ago

> The problem with ML is that it's pattern recognition, it's an approximation. Code is absolute, it's logic that is interpreted very literally and very exactly. This is what makes it so dangerous for coding; it creates code that's convincing to humans but with deviations that allow for all sorts of bugs. And the worst part is, since you didn't write the code, you may not have the skills (or time) to figure out if those bugs exist

https://news.ycombinator.com/item?id=34140585


> To really understand something, we must construct it ourselves.

I think the real power of these bots will be to lead us down this path, as opposed to it doing everything for us. We can ask it to justify and explain its solution and it will do its best. If we're judicious with this we can use it to build our own understanding and just trash the AI's output.


How is that worse than having to look at every online post's date to estimate whether the solution is out of date? Or two StackOverflow results where one is incorrectly marked as duplicate and in the other the person posting the answer is convinced that the question is wrong.

ChatGPT can completely cut out the online search and give an answer directly about things like compiler errors, and elaborate further on any detail in the answer. I think that 2-3 further GPT generations down the line it will be worth the time for some applications.

The problem I see is less the overall quality of responses but people overestimating on where it can be used productively. But that will always be a problem with new tech, see Tesla drivers who regularly take a nap in the car because it didn't crash yet.


Unless the responses in those old online forums were intentionally malicious, they might be reasonably helpful even if not 100%.

While ChatGPT spews out complete nonsense most of the time. And the dangerous part is that that nonsense looks very reasonable. It gets very frustrating after some time, because at first you are always happy that it gave you a nice solution, but then it's not usable at all.


I'm a glass-half-empty sort of person: in my experience, even perfectly good answers for a different version can be problematic, and sometimes harmful.


Unless the training of ChatGPT has a mechanism to excise the influence of now out-of-date training input, it will become increasingly more likely to give an outdated response as time goes by. Does its training have this capability?


Yes.

The trick is to use it as an LLM and not a procedural, transactional data set.

For instance, “how do I create a new thread in Python”. Then ask “how do I create a new thread in Python 3.8”. The answers will (probably) be different.

Any interface to chatgpt or similar can help users craft good prompts this way. It just takes thinking about the problem a little differently.

One wildly inefficient but illustrative approach is to use chatgpt itself to optimize the queries. For the Python threading example, I just asked it “ A user is asking a search engine ‘how do I create threads in Python’. What additional information will help ensure the results are most useful to the user?”.

The results:

> The user's current level of programming experience and knowledge of Python

> The specific version of Python being used

> The desired use case for the threads (e.g. parallel processing, concurrent execution)

> Any specific libraries or modules the user wants to use for thread creation

> The operating system the user is running on (as this may affect the availability of certain threading options)

So if you imagine something like Google autocomplete, but running this kind of optimization advice while the user builds their query, the AI can help guide the user to being specific enough to get the most relevant results.


I understand this works well in many practical cases, but it seems to depend on a useful fraction of the training material making the version distinction explicit, which is particularly likely with Python questions since the advent of Python 3.

One concern I have goes like this: I seriously doubt that current LLMs are capable of anything that could really be called an understanding of the significance of the version number[1], but I would guess that it characterizes the various Python-with-versions strings it has seen as being close[2] so I can imagine it synthesizing an answer that is mostly built from facts about Python2.7. With a simple search engine, you can go directly to checking the source of the reply, and dig deeper from there if necessary, but with an LLM, that link is missing.

[1] The fact that it listed the version as being a factor in reply to your prompt does not establish that it does, as that can be explained simply by the frequency with which it has encountered sentences stating its importance.

[2] If only on account of the frequency with which they appear in similar sentences (though the whole issue might be complicated by how terms like 'Python3.8' are tokenized in the LLM's training input.)


It's all imperfect, for sure. For instance, see this old SO question [1], which does not specify the Python version. I pasted the text of the question and top answer into GPT-3 and prefaced it with the query "The following is programming advice. What is the language and version it is targeted at, and why?"

GPT-3's response:

> The language and version targeted here is Python 3, as indicated by the use of ThreadPoolExecutor from the concurrent.futures module. This is a module added in Python 3 and can be installed on earlier versions of Python via the backport in PyPi. The advice is tailored to Python 3 due to the use of this module.

That's imperfect, but I'm not trying to solve for Python specifically... just saying that the LLM itself holds the data a query engine needs to schematize a query correctly. We don't need ChatGPT to understand the significance of version numbers in some kind of sentient way, we just need it to surface that "for a question like X, here is the additional information you should specify to get a good answer". And THAT, I am pretty sure, it can do. No understanding required.

1. https://stackoverflow.com/questions/30812747/python-threadin...


I don't think the issue is whether current LLMs have sufficient data, but whether they will be able to use it sufficiently well to make an improvement.

The question you posed GPT-3 here is a rather leading one, unlikely to be asked except by an entity knowing that the version makes a significant difference in this context, and I am wondering how you envisage this being integrated into Bing.

One way I can imagine is that if the user's query specified a python version, a response like that given by GPT-3 in this case might be used in ranking the candidate replies for relevance: reject it if the user asked about python 2, promote it if python 3 was asked for.

Another way I can imagine for Bing integration is that perhaps the LLM can be prompted with something like "what are the relevant issues in answering <this question> accurately?" in order to interact with the user to strengthen the query.

In either case, Bing's response to the user's query would be a link to some 3rd-party work rather than an answer created by the LLM, so that would answer my biggest concern over being able to check its veracity, though its usefulness would depend on the quality of the LLM's reply to its prompts.

On the other hand, the article says "Microsoft is betting that the more conversational and contextual replies to users’ queries will win over search users by supplying better-quality answers beyond links", apparently saying that they envision giving the user a response created by the LLM, which brings the question of verifiability back to center stage. Did you have some other form of Bing-LLM interaction in mind?


The problem I have with ChatGPT is that it doesn't give me any context to its answer or provide actual resources. Cite your darn sources already.


I am foreseeing a future in which programming language designers match the most sought after functions in google/bing/chatgpt and then implement those that do not yet exist because apparently there is a real need for those.


You can also call it an artificially created need. Many functions exist, but have a different name.


Yes, I had the same thought. LLMs might be instrumental in new language design. If it can understand the most common structures being used, it makes sense to build libraries, macros, or language features.


I agree. ChatGPT is really really bad. It just makes up stuff and wraps its fabrications in an air of authority.

A "bullshit sandwich" if you will.

When one tells people this we get the reply "but so do random blogs! or reddit comments!". Well yes, but they're just random blogs and reddit comments, often peppered with syntactic and spelling mistakes, non sequiturs, and other absurdities. Nobody would take them seriously.

ChatGPT is very different. It doesn't say "this random redditor says this, and this other random redditor says the exact opposite, so IDK, I'm just a machine, please make up your mind".

What it says is "this is the absolute truth that I, a 'large language model', have been able to extract from the vast amount of information I have been trained on. You can rely on it with confidence."

I'm sorry to sound hyperbolic but this cannot end well.


I like bouncing my code problems off ChatGPT, it can give me an answer and I don't feel bad if I forgot something simple. The issue is I've had it give me completely wrong code only for it to be like "I'm sorry" and provide a second incorrect response.


ChatGPT doesn't say anything of the sort. In fact, it will vehemently insist that what it says is not necessarily true or accurate if you challenge it.


I'm sorry but this is demonstrably false. I have posted examples of this on HN before. Yes, if you tell ChatGPT that it's wrong, in some cases it says "I'm sorry" and tries again (and produces some other random guess). But if you ask it "are you sure?" it invariably affirms that yes, it's sure and it's in the right.


Hm, you're right. I'm pretty sure that it wasn't so gung-ho when I played with it earlier, but now even very explicit instructions along the lines of "you should only answer "yes" if it is absolutely certain that this is the correct answer" still give this response. Ditto for prompts like "is it possible that your answer was incorrect?"


I agree, chatGPT3 shines when the operator has domain knowledge. Otherwise it's a hit or miss.


Using a purpose built (or trained I guess) model for code generation would likely have better results. GitHub copilot is useful for this reason. I find ChatGPT for code is mainly useful if you want to instruct it in natural language to make subsequent changes to the output.


If you follow up about the nonexistent function, it will often implement it for you.

The other thing that I've had success with is asking for references for the information, which will often link you to the relevant docs.


If you ask, there's a good chance ChatGPT can create that function for you. Just tell it: "That function `xyz()` doesn't exist in the library, can you write it for me?"


It does this for Python and JS too


I had a lot of fun with ChatGPT’s wholly fabricated but entirely legitimate-sounding descriptions of different Emacs packages (and their quite detailed elisp configuration options) for integrated cloud storage, none of which exist.

I’m not sure that fabricated nonsense would actually make Bing’s results any worse than they are today.

“It’s okay I don’t mind verifying all these answers myself” is an odd sort of sentiment, and also inevitably going to prove untrue in one sense or another.


Well if it could generate the code, you wouldn’t necessarily care if they existed before your query.


If it generated the code, I would have to audit that code for correctness/safety/etc.

Or, more likely, I would just lazily assume everything is fine and use it anyway, until one day the unexamined flaws destroyed something costly in a manner difficult to diagnose because I didn't bother to actually understand what it was doing.

There really should be more horror at the imminent brief and temporary stint of humans as editors, code reviewers, whatever, over generative AI mechanisms (temporary because that will be either automated or rendered moot next). I'm unaware of any functional human societies that have actually reached the "no one actually has to work unless they want to do so, because technology" state, so this is an interesting transition, for sure.


> Or, more likely, I would just lazily assume everything is fine and use it anyway, until one day the unexamined flaws destroyed something costly in a manner difficult to diagnose because I didn't bother to actually understand what it was doing.

Well yeah, I'm right there with you. But that feels a lot like any software, open or closed source. Human programmers on average are better than AI programming today, but human programmers aren't improving as fast as AI is. Ten years from now, AI code will be able to destroy your data in far more unpredictable and baroque ways than some recent CS grad.

> I'm unaware of any functional human societies that have actually reached the "no one actually has to work unless they want to do so, because technology" state, so this is an interesting transition, for sure.

This is a really interesting thought. Are we seeing work evaporate, or just move up the stack? Is it still work if everyone is just issuing natural language instructions to AI? I think so, assuming you need the AI's output in order to get a paycheck which you need to live.

Then again, as a very long time product manager, I'm relatively unfazed by the current state of AI. The hundreds of requirements docs I've written over decades of work were all just prompt engineering for human developers. The exact mechanism for converting requirements to product is an implementation detail ;)


It does such a good job at giving answers that sound right, and are almost correct.

I could imagine losing many hours from a ChatGPT answer. And if you have to go through the trouble to verify everything it says to make sure it's not just making crap up, then imo it loses much value as a tool.


It shows how form matters more than substance. Say real information in some poor structure and people will think you're wrong.

Say incorrect stuff authoritatively and people will think you're right.

It happens to me all the time. I can't structure accurate information as well as some bullshit artist can spit off what they imagine to be real, so everyone walks away believing their haughty nonsense.

ChatGPT exploits that phenomenon, which is why it sounds like some overly confident oblivious dumb dumb all the time. That's the training set.

Almost once a week I'll go through a reddit thread and find someone deep in the negatives who has clearly done their homework and is extraordinarily more informed than anyone else but the problem is everyone else commenting is probably either drunk or a teenager or both so it doesn't matter.

Stuff is hard and people are mostly wrong. That's why PhDs take years and bars for important things are set so high


But so do people: I spent an hour yesterday trying regexps that multiple people on Stackoverflow confirmed would definitely do what I needed, and guess what? They did not do what I needed.

Same with copilot. Sometimes it's ludicrously wrong in ways that sound good. I still have to do my job and make sure they are right. But it's right or right enough to save me significant effort at least 75% of the time. Right enough to at least point me in the right direction or inspire me at least 90% of the time.


Self Reply: I just now thought to use Copilot to get my regex and wow! I described it in a comment and it printed me one that was only two characters off, and now I have what I needed yesterday. I'd since solved the problem without a regex.


It's not perfect, but sometimes its amazing. In your case, not only did it provide the right solution, but it was about as fast as theoretically possible. About as fast as if you already knew the answer.

I had a similar experience with a shell command. Searched google, looked at a few posts, wasn't exactly what I needed but close. Modified it a few times and got it working. Went to save the command in a markdown file and when I explained what the command did, copilot made a suggestion for it. It was correct and also much simpler.

It went from taking 5-10 minutes to stumble through something just so I could do the thing I really wanted to do, to finding the answer instantly all from within the IDE. Can keep you in flow.


and then one day https://mobile.twitter.com/Dereklowe/status/1599035870308618... happens and people die.


One day what happens? A person uses it to encourage topical application of a toxic material and publishes the results?

How is ChatGPT enabling this? All of that is very possible without ChatGPT. The damaging part is deciding to do it.


They released a zero day for a security hole in the human brain. That's what ChatGPT is. The security hole is well known and described; perhaps the most understandable treatment is the book Thinking, Fast and Slow. If I try to explain it I will surely botch it, but perhaps put it this way: things that appear more credible will be deemed credible because of the "fast" processes in our brains.

In this particular case, ChatGPT will write something nonsensical which people will accept more easily because of the way it is written. This is inevitable and extremely dangerous.


Humans are still a lot better at writing something nonsensical that people will accept easily because of the way it's written.

Conversely, I just asked ChatGPT to extol the virtues of leaded gasoline, and instead I got a lecture on exactly why and how it's extremely harmful.


> Humans are still a lot better at writing something nonsensical that people will accept easily because of the way it's written.

Some are but not many. And then there's the amount. That's the crux of the matter. Have you seen that Aza Raskin interview where he posited one could ask the AI to write a thousand papers citing previous research against vaccines and then another thousand pro-vaccines? No human can do that.


You know people are already injecting themselves with bleach and horse dewormer without needing an AI generated list of instructions right?

People are just as good at making up convincing sounding nonsense.


> People are just as good at making up convincing sounding nonsense.

Perhaps as you just did, as I can find no one actually "injecting themselves with bleach."

The overall point stands: the difference between reading something dumb and doing that dumb thing is what it means to have agency. I personally don't think we should optimize the world 100% to prevent people who read something stupid from doing that stupid thing.

Or, if that's the path we're going to take, maybe we should first target things like the show Ridiculousness before we start talking about AI. After all, someone might do something dumb they see on TV!


> Perhaps as you just did, as I can find no one actually "injecting themselves with bleach."

Ingesting, injecting, that’s pretty similar. Nobody needs to make anything up there.

https://www.justice.gov/usao-sdfl/pr/leader-genesis-ii-churc...


People have absolutely injected themselves with what's known as "Miracle Mineral Solution", which is essentially bleach. It's more frequently drunk, of course.


I dunno, verifying and adjusting an otherwise complete answer is a lot more rote than originating what that answer would be, and I think that has value.


>It does such a good job at giving answers that sound right, and are almost correct.

For sure. But you have to compare against alternatives. What would that be? Posting to stack overflow and maybe getting a helpful reply within 48 hours.

> I could imagine losing many hours from a ChatGPT answer.

Dont trust it. Verify it.

We expect to ask a question and get a good answer. In reality we should leverage how cheap the answers are.


I agree. Also, sometimes the line between 'almost correct' and 'complete bullshit' is very thin.


The insidious part about chatGPT getting things wrong is that it is a superb bullshitter.

It gives you answers with 100% confidence and believable explanations. But sometimes the answers are still completely wrong.


Knowing little about how ChatGPT actually works, is there perhaps a variable that could be exposed, something that would represent the model's confidence in the solution provided?


I'd say you can't do that, because ChatGPT has no internal model for how the things it is explaining work; so there can't be any measure of closeness to the topic described, as would be the case for classification AIs.

ChatGPT models are language models; they represent closeness between text utterances. It works by looking for the chains of words most similar or usually connected to those indicated in the prompt, with no understanding of what those words mean.

As a metaphor, think of an intern who every morning is asked to buy all the newspapers in paper form, cut out the news sentence by sentence, and put all the pieces of paper in piles grouped according to the words they contain.

Then, the director requests to write a news item on the increase in interest rates. The intern goes to the pile where all the snippets about interest rates are placed, will randomly get a bunch of them, and write a piece by linking the fragments together.

The intern has a PhD in English, so it is easy for them to adjust the wording to ensure consistency; and the topics more talked about will appear more often in the snippets, so the ones chosen are more likely to deal with popular issues. Yet the ideas expressed are a collection of concepts that might have made sense in their original context, but have been decontextualized and put together pell-mell, so there's no guarantee that they're saying anything useful.


> ChatGPT models are language models; they represent closeness between text utterances. It works by looking for the chains of words most similar or usually connected to those indicated in the prompt, with no understanding of what those words mean.

No, it does not work that way. That’s how base GPT3 works. ChatGPT works via RLHF and so we don’t “know” how it decides to answer queries. That’s kind of the problem.


Explainable AI, specifically for language models, will be a very interesting field to follow then.


something something sufficiently advanced markov chains something something GAI


I don't think so. It doesn't understand what it says, it basically does interpolation between text it copy-pastes in a very impressive manner. Still it does not "understand" anything, so it cannot have any kind of confidence.

Take Stable Diffusion for instance: it can interpolate a painting from that huge dataset it has, and sometimes output a decent result that may look like what a good artist would do. But it doesn't have any kind of "creative process". If it tells you "I chose this theme because it reflects this deep societal problem", it will just be pretending.

It may not matter if all you want is a nice drawing, but when it's about, say, engineering, that's quite different.


It's not available for ChatGPT but the other GPT models can expose the probability for each generated token, which can serve as a proxy for confidence.

By tuning the temperature and topP parameters you can also make the model avoid low-probability completions (useful for less creative use cases where you need exact answers).
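
A toy, self-contained illustration of what those two knobs do (made-up numbers, not a real model's distribution):

    import math, random

    # Made-up next-token distribution for the prompt "The capital of France is".
    next_token_probs = {"Paris": 0.86, "London": 0.07, "Lyon": 0.04, "banana": 0.03}

    def sample(probs, temperature=1.0, top_p=1.0):
        # Temperature rescales the log-probabilities: <1 sharpens the distribution
        # toward the most likely token, >1 flattens it toward uniform.
        logits = {t: math.log(p) / temperature for t, p in probs.items()}
        z = sum(math.exp(v) for v in logits.values())
        rescaled = {t: math.exp(v) / z for t, v in logits.items()}
        # top_p (nucleus sampling) keeps only the most likely tokens whose
        # cumulative probability reaches top_p, then renormalizes.
        kept, cum = {}, 0.0
        for t, p in sorted(rescaled.items(), key=lambda kv: -kv[1]):
            kept[t] = p
            cum += p
            if cum >= top_p:
                break
        norm = sum(kept.values())
        tokens, weights = zip(*((t, p / norm) for t, p in kept.items()))
        return random.choices(tokens, weights=weights)[0]

    print(sample(next_token_probs, temperature=0.2, top_p=0.9))  # almost always "Paris"

The per-token probabilities here are the kind of numbers the logprobs option surfaces; as the parent says, it's a proxy for confidence in the next word, not in the facts.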


> It's not available for ChatGPT but the other GPT models can expose the probability for each generated token, which can serve as a proxy for confidence.

A proxy for confidence in what exactly?

Language models represent closeness of words, so a high probability would only express that those words are put together frequently in the corpus of text; not that their meanings are at all relevant to the problem at hand. Am I wrong?


In cases where you ask GPT-3 questions that have a clear correct answer, I think you can use the probability to judge how correct the answer is. For example, when asking "How tall is Mount Everest?" I would want the completion "Mount Everest is ____ meters above sea level." to have a very high probability for the ____ tokens.

This is because I'm operating under the assumption that sequences of words that appear often in the training set are more likely to represent something correct (otherwise you might as well train on random words). This only holds if the training set is big enough that you can estimate correctly (e.g. if the training set is small a very rare/wrong phrase may appear very often).

Maybe confidence was the wrong word, but for this kind of question I would trust a high-probability answer way more than a low one. For questions belonging to very specific subjects, where training material is scarce, the model might have very skewed probabilities so they become less useful.


> In cases where you ask GPT-3 questions that have a clear correct answer, I think you can use the probability to judge how correct the answer is. For example, when asking "How tall is Mount Everest?" I would want the completion "Mount Everest is ____ meters above sea level." to have a very high probability for the ____ tokens.

Maybe, as long as you're aware that this is the same kind of correctness that you get from looking at Google's first search results (the old kind of organic pages, not the "knowledge graph", which uses a different process - precisely to avoid being spammed by SEO) i.e. "correctness by popularity".

This means that the content that is more replicated will be considered more true by the system, regardless of its connection to reality or its coherence with the rest of the knowledge in the system. And you know what they say about big enough lies that you keep repeating millions of times.


I agree, and furthermore, a search engine is constrained to pick its responses from what's already out there.

This line of thought is a distraction, anyway. The likelihood that GPT-3 will do as well as a search engine on topics where there is an unambiguous and well-known answer does little to address the more general concern.


> This means that the content that is more replicated will be considered more true by the system, regardless of its connection to reality or its coherence with the rest of the knowledge in the system.

I understand the problem, but what better way do we currently have to measure its connection to reality? At least from a practical point of view it seems that LLMs have achieved way better performance than other methods in this regard, so repeatedness doesn't look like that bad a metric. Or rather, it's the best I think we currently have.


> I understand the problem, but what better way do we currently have to measure its connection to reality?

We can consider its responses to a broader range of questions than those having an unambiguous and well-known answer. Its propensity for making up 'facts', and for fabricating 'explanations' that are incoherent or even self-contradictory shows that any apparent understanding of the world being represented in the text is illusory.


This resonates with me. We have all worked with someone who is a superb bullshitter, 100% confident in their responses, yet they are completely wrong. Only now, we have codified that person into chatGPT.


That might be the problem. Too many bullshitters who like posting online and chatGPT has been trained on them.


I doubt it. Even if it was trained with 100% accurate information chatGPT would still prefer an incorrect decisive answer to admitting it doesn't know.


TBH, a lot of SEO-optimized results are the same, although I think the conversational format makes people assign even more authority to chatGPT.


SEO optimized sites can also be identified and avoided. There's various indicators of the quality of a site, to the point where I'm positive most people on HN know to stay away or bail from one of those sites without even being consciously aware of what gave them that sense of SEO.


General Purpose Bullshitting Technology. I've always found LLMs most useful as assistants when working on things I'm already familiar with, or as don't-trust-always-verify high temperature creatives. I think that attempts to sanitize their outputs to be super safe and "reliable sources" will trend public models towards blandness.


Have you tried a query like this ?

Add documentation to this method : [paste a method in any language]

For me the results have been impressive. It’s even more impressive if you are not English speaking because it explains what the code does but also translates your domain terms in your own language.

More than code generation I see a really concrete application in having autogenerated and up to date documentation of public methods. It could be generated directly in your code or only by your IDE to help you in absence of human written documentation.

Another interesting thing it can do is basic code review, by proposing « better » code and explaining what it changed and why.

It can also try to rewrite a given code in another language. I haven’t tried a lot of things due to the limitations in response size but for what I tested, it looks like it is able to convert the bulk of the work.

While I’m not really convinced by code generation itself (a la copilot) I truly think that GPT can be a really powerful tool for IDE editors if used cleverly, especially to add meaning to unclear, decade old codebases from which original contributors are long gone.

And knowing that what is hard is not writing but reading code, I see GPT to be a lot more useful here than helping writing 10 lines in a keystroke.
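
To make the prompt pattern concrete, here's the sort of thing I mean; the function is a made-up example, not from a real codebase:

    # Prompt: "Add documentation to this method:"

    def apply_discount(total, loyalty_years):
        if loyalty_years > 5:
            return total * 0.9
        return total

What usually comes back is the same function with a docstring spelling out the parameters, the return value and the loyalty-years threshold, which is exactly the kind of summary you'd want your IDE to show in the absence of human-written documentation.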


Is this really valuable documentation?

A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".

You can do javadoc style params, and there's some value there, but not much.


> A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".

You are right. It’s the rule when you write the doc.

But when you are let alone in an unknown codebase, having your IDE summarize the "what" in the auto completion popup could be really useful. Especially in codebases with wrong naming conventions.


> A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".

The "why" is important for inline comments, but for function and method comments I think the biggest is neither "why" nor "how", but "what". As in, "what does this method do?" especially with regards to edge cases.

I tried a few methods just now; it gives okay-ish docs. Lots of people don't write great comments in the first place, so it's about on-par. Sometimes it got some of those edge cases wrong though; e.g. a "listFiles()" which filters out directories and links isn't documented as such, but then again, many people wouldn't document it properly either.


For some AWS automation scripts I wrote, I was able to ask, “why would you use this” and the answer it gave me was impressive.


Just tried this out, and this is great! As you say, code-generation is iffy, but for documentation this is something that can really help with.


Maybe it's better in some programming languages, but my experience with verilog/systemVerilog output is that it generates a design with flaws almost every time (but very confidently). If you try to correct it with prompting it comes up with reasonable-sounding responses about what it's fixing, then just creates more wild examples.

One pretty consistent way to see this is to ask for various very simple designs like an n-bit adder; it will almost always do something logically or syntactically incorrect with the carry in or carry out.


ChatGPT has acted as an advanced rubber duck for me. It outputs a lot of bullshit but so often it gives me the prompt or way of thinking needed to move on.

And it’s so much faster than posting on stack overflow or some irc. It doesn’t abuse you for asking dumb questions either.


That's an interesting approach to consider, thanks!


When it works it is great. I've been using it instead of Google a lot too, but when it makes mistakes it requires someone familiar with a subject to detect it. I'm not sure if it is ready to be used as as a search engine by everyone.

For example recently I asked it for the best way to search in an mbox file on arch Linux. It proceeded to recommend a number of tools including mboxgrep. When I asked how to install it on arch it gave me a standard response using the package manager, but mboxgrep is not an arch package. It isn't even an aur package. It requires fetching the source and building it yourself (if I remember correctly one has to use an older version of gcc too). None of it was mentioned by chatgpt.

This is not the first time BTW, there was another software it recommended that Debian doesn't know about, when I asked it another time.


The key is that it is way faster and has a broader set of knowledge than a human. Being an editor is often easier and more productive than being both a single generator and editor


> Being an editor is often easier

This 100%.

ChatGPT can play an interesting role by separating duties in a process of productivity. ChatGPT can generate tons of true/false suggestions very fast and understandable by humans. Sometimes this helps a lot.


On a related note, I've personally observed that it also helps a lot with:

1. Generating (or simply repeating) obvious ideas in a domain that I am not an expert in

2. (With some prompting) Generating creative ideas in a domain that I am familiar with

3. Generating obvious ideas in a domain I'm familiar with when I'm too tired to think or preoccupied

Not only do you get a productivity boost by being an editor but it also complements human energy cycles


The downside is the risk of atrophying one's own mental ability to generate such suggestions if excessively relied upon. Given my druthers, I'd prefer to be a generator of text ChatGPT would want to absorb than to be a consumer of the mystery meat it is regurgitating.


I tested chatgpt with some domain specific stuff and found it so wrong on the fundamentals that I immediately lost trust in any of its output for learning. I would not trust it to explain anything eBPF related reliably. You are more likely to get something that is extremely wrong or, worse, subtly wrong.


I found ChatGPT's answers relatively accurate for explaining programming-related queries, feeding it documentation and asking questions related to that, etc. But I've also tried to use it for travel and health related queries. For travel queries, it confidently tells me the wrong information: "Do most restaurants in Chiang Mai accept credit cards?" got "Yes, most restaurants in Chiang Mai accept credit cards!", which is completely false. Also got wildly inaccurate information about the quality of drinking water. And for health related queries, it tells me the same weasel-worded BS that I get on health spam blogs. I tried to dig out more information regarding sources of both travel and health related information, but ChatGPT simply said it doesn't know the details of the sources of information.

I think a new implementation of ChatGPT is worth exploring though, one that cites sources and gives links to further information, and also one that has the ability to somehow validate its responses for accuracy.


ChatGPT is adamant that "Sunday" has a T in it.

It could generate a python script that counted the days of the week with the letter T, but still insisted that Sunday had a T

Edit: Scratch that. I just tried again and now it says that Saturday doesn't have a T in it.


It doesn’t consistently know words have individual letters since it’s trained using byte pair encodings. This is one reason earlier versions of it couldn’t generate rhymes.


When ChatGPT serves me broken code, I paste the errors back in and it tries to make corrections. I don't see why ChatGPT couldn't do that itself with the right compiler, saving me from being a copy-and-paste clerk.
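
A rough sketch of what that loop might look like; ask_chatgpt() here is a hypothetical wrapper around the API, not a real function:

  # Hypothetical feedback loop: run the generated code, feed errors back.
  import subprocess, sys, tempfile

  def ask_chatgpt(prompt):
      # hypothetical: call the ChatGPT API and return the code it writes
      raise NotImplementedError

  code = ask_chatgpt("Write a Python script that ...")
  for _ in range(3):  # a few repair attempts
      with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
          f.write(code)
      result = subprocess.run([sys.executable, f.name],
                              capture_output=True, text=True)
      if result.returncode == 0:
          break
      code = ask_chatgpt("This failed with:\n" + result.stderr + "\nPlease fix it.")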


I asked it how to convert a cell value to a Unix timestamp in Google Sheets and it told me to use "UNIX_TIMESTAMP", and even provided an example.

The function does not exist; it's entirely made up.

What's weirder is that when I told it the answer was wrong, it provided a different solution that was correct.
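
For anyone hitting the same question: one approach that does work (assuming the cell holds a date/time in the sheet's timezone) is along the lines of

  =(A1 - DATE(1970,1,1)) * 86400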


> wasting time looking up the manuals

God forbid!


I think we should let this C-era meme die; the manuals are often terrible. I'm currently working with the AWS SDK Python documentation and it's a hot pile of garbage from all points of view (UX, information architecture, technical detail, etc.).

The Python language docs are "kind of OK", but when someone raves about them I'm left scratching my head. The information is not always well organized, the examples are hit-and-miss, parameter and return types are not always clear, etc.

Referencing docs as a programmer is generally a nightmare and a time sink, and it's the one use case where ChatGPT is slowly becoming an indispensable crutch for me. I can ask for very specific examples that are not included in the docs, or that cannot be included in the docs because they are, for example, combinatorial in nature: "how can I mock this AWS SDK library by patching it with a context manager?" Occasionally it will hallucinate, but even if it gets it right 8/10 times - and it's higher than that in practice - it will prove revolutionary, at least for this use case.
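
As an illustration of the kind of answer I mean (the helper function, bucket name and test are invented for the example), a sketch using unittest.mock and boto3:

  # Patch boto3.client with a context manager so no real AWS call is made.
  from unittest.mock import MagicMock, patch
  import boto3

  def upload(bucket, key, body):
      s3 = boto3.client("s3")
      return s3.put_object(Bucket=bucket, Key=key, Body=body)

  def test_upload():
      with patch("boto3.client") as mock_client:
          mock_s3 = MagicMock()
          mock_client.return_value = mock_s3
          upload("my-bucket", "notes.txt", b"hello")
          mock_s3.put_object.assert_called_once_with(
              Bucket="my-bucket", Key="notes.txt", Body=b"hello")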


> I'm currently working with the AWS SDK Python documentation and it's a hot pile of garbage from all points of view (UX, info architecture, technical detail, etc.).

I agree that pretty much all AWS documentation is woeful, and it's a travesty that the service is so expensive yet its documentation is so poor. I would gladly dump AWS and never use it again, as I hate paying top-dollar to decipher the AWS doc team's mistakes (not to mention that they are unresponsive to bug reports and feedback).

My point was made more in jest, and was supposed to point out the irony of the community's changing expectations of what documentation should be like. I predict that in a few years we'll be circling back to prioritizing writing software documentation well. (Kind of like how everybody was hating on XML for the past 20 years and it's now having a renaissance because it actually does what it's supposed to do very well.)


I'm amazed by how divisive it is. I've also been using it to significantly increase my productivity, be that documenting things or having it mutate code via natural language or various other tasks. I feel that if you keep in mind that hallucination is something that can happen, then you can somewhat mitigate that by prompting it in certain ways. E.g. asking for unit tests to verify generated functions, among other things.
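
For instance, a quick test against a generated helper (the function and test here are invented for illustration) catches hallucinated behaviour early:

  # Suppose ChatGPT generated this helper; a small test verifies it.
  def slugify(title):
      return "-".join(title.lower().split())

  def test_slugify():
      assert slugify("Hello World") == "hello-world"
      assert slugify("  Multiple   Spaces  ") == "multiple-spaces"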

I find this tool so useful, that I scratch my head when I read about how dismissive some people are of it.


I think one of the reasons why Python got such a reputation for good docs is because its primary competitors back in the day were Perl and Ruby. Ruby has horrible documentation to this day, and Perl has extensive docs that are difficult to navigate; in comparison with either, Python was definitely superior.


I second this about ChatGPT fabrications. When I was going through Tim Roughgarden's YT courses, I almost always had to double-check its answers.


> I hope Bing dominates Google and stackoverflow in this

Google will probably build the same thing. Stackoverflow can suffer though...


I believe the exact opposite. If one could prove that text has not been generated by an AI, that would have immense value. StackOverflow has a built-in validation process ("mark as the solution"), which says that some human found that it solved the problem. Doesn't mean it's correct, but still, that's something.

I really wonder what impact ChatGPT will have on search engines. I could imagine that the first 4 pages of Google/Bing results end up being autogenerated stuff, and it will just make it harder to find trustworthy information.


But paradoxically, OpenAI can't let StackOverflow disappear, because in the end I suppose it's one of ChatGPT's main sources for programming content.


For now, but perhaps we are at a level where enough knowledge is there that future solutions can be inferred from the past ones and documentation/code of libraries available on the internet.


> hope Bing dominates Google and stackoverflow in this.

Where do you think it got the information?


Where are they going to get a steady, fresh firehose of data comparable to Stack Overflow's? Who are the magical entities that will be feeding them all these inputs for Bing to claim all the fame?


It's very useful for writing and explaining regular expressions.

But the holy grail would be if it could write all my unit tests...
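
For example, the kind of thing I ask it to write and explain (the pattern is just illustrative):

  # A regex for ISO-8601 dates (YYYY-MM-DD), with a quick sanity check.
  import re

  iso_date = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")
  assert iso_date.match("2023-01-04")
  assert not iso_date.match("2023-13-40")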

