andrewstuart 7 minutes ago

ChatGPT seems to be really good at analysis but continues to be bad at coding.

It constantly loses existing code. That alone makes it not worth using. I can’t be bothered constantly working to identify and prevent losing existing features.

It is however very powerful for doing debugging and diagnostics with throwaway code.

It’s also useful to analyse stuff with ChatGPT and give that to Gemini or Claude to do the coding.

cwyers an hour ago

This is Microsoft subsidizing Claude inference costs -- if you look at how they charge models against your allotment, Gemini, GPT-5 and Claude 4 Sonnet all cost the same, despite Claude 4 Sonnet being more expensive than the other two. Not really sure I understand the economics here, especially since there's not really a clear winner between GPT-5 and Claude 4 Sonnet for coding (if anything I think GPT-5 puts up a better showing).

  • drewbitt 12 minutes ago

    > I think GPT-5 puts up a better showing

    Would the more casual Copilot audience be OK with gpt-5-high - the model that many say is better than Sonnet - taking significantly longer to respond? Potentially minutes longer. A faster model can make sense as a default.

  • martinald 20 minutes ago

    "Sonnet being more expensive than the other two" -> you mean based on public pricing? Microsoft will not be paying retail prices for this.

  • mattalex 35 minutes ago

    It might be that they pay less for anthropic depending how many tokens are generated by each model: total cost is token cost times number of tokens. I haven't checked gpt5, but it is not impossible that price wise they might be very comparable if you account for reasoning tokens used.
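
    A minimal sketch of the arithmetic in this comment (total cost is token price times token count). Every price and token count below is a made-up placeholder, not a figure from this thread or from any vendor's price list; the point is only that a model with a higher per-token price can cost the same per request if it emits fewer (reasoning plus answer) tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelUsage:
    """Per-request usage for one model. All values are hypothetical."""

    name: str
    usd_per_million_output_tokens: float  # assumed price, not a real quote
    output_tokens: int  # answer tokens plus any billed reasoning tokens

    def cost_usd(self) -> float:
        # total cost = price per token * number of tokens
        return self.usd_per_million_output_tokens * self.output_tokens / 1_000_000


# Placeholder numbers chosen only to illustrate the break-even case.
cheaper_but_verbose = ModelUsage("model_a", 10.0, 30_000)
pricier_but_terse = ModelUsage("model_b", 15.0, 20_000)

for usage in (cheaper_but_verbose, pricier_but_terse):
    print(f"{usage.name}: ${usage.cost_usd():.2f} per request")
```

    With these placeholder inputs both requests cost the same, even though model_b's per-token price is 50% higher.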

kerpal 3 hours ago

Claude/Anthropic is more focused on productivity (Coding, Spreadsheets, Reports). ChatGPT seems more focused on general-purpose LLM (Research, Cooking, Writing, Image Generation).

Makes sense that MS would partner with Anthropic since their tool-use for productivity (Claude Code) seems superior. I personally rarely code with ChatGPT, almost strictly Claude.

  • dmurray 2 hours ago

    Some people might be surprised that MS would pick the product with the best technological fit rather than the one they already have a deep business and financial relationship with.

    Surely Microsoft's expertise these days is in cross-selling passable but not best-in-class products to enterprises who already pay for Microsoft products.

    It says something about how they view the AI coding market, or perhaps the level of the gap between Anthropic and OpenAI here, that they've gone the other way.

    • dijit an hour ago

      They are right to be surprised.

      Why is Azure popular? Not on its own merits, it's because there is a pre-existing relationship with Microsoft.

      Why is Teams the most widely used chat tool? Certainly not because it's good.. it is, again, pre-existing business relationships.

      Seems odd for a company that survives (perhaps even thrives) on these kinds of intertwined business reasons to, themselves, understand that they should go for merit instead.

      • mikestorrent 36 minutes ago

        Yep. Similarly, Microsoft Entra... if you want Office, you're getting it anyway. Might as well use it for SSO, right? And here's your free Teams license... how can you justify paying for Slack when we've a perfectly good chat client at home?

        • dijit 7 minutes ago

          I tried for a while to get Entra working with an external identity provider (Google Workspace).

          The other way around worked (Google could use Entra) but it was basically impossible to backend Entra from Google. Weird.

    • thewebguyd 28 minutes ago

      > It says something about how they view the AI coding market

      I think Microsoft views models as a commodity and they'd rather lean into their strengths as a tool maker, so this is Microsoft putting themselves into a position to make tools around/for any AI/LLM model, not just ones they have a partnership with.

      Honestly I think this sort of agnosticism around AI will work out well for them.

  • pnathan 2 hours ago

    I've been happy with Anthropic models. I also have been using the Google models more, with decent results. As a rule of thumb, the Copilot/OpenAI models don't seem to be as good; I can't explain exactly why.

    Overall, I think Google has a better breadth of knowledge encoded, but Anthropic gets work done better.

  • mnky9800n an hour ago

    I like Perplexity's deep research model, which is based on DeepSeek, I think. I use that for most kinds of writing, discussion, research, etc., where I need some kind of feedback. Claude seems to go crazy sometimes when you ask it to do the same task, whereas for coding, Claude Code is obviously better than everything else under the sun.

  • _fat_santa 2 hours ago

    This has been largely my experience as well. Claude does way better with coding while ChatGPT does better with general questions.

    • bobbylarrybobby an hour ago

      The new gpt-codex-* models are giving Claude Code a serious run for its money IMO. If OpenAI can figure out the Codex CLI UI (better permissions, more back and forth before executing) then I think they will have the better agentic coder.

      • kerpal an hour ago

        Tried Codex, but so far it feels a lot slower than Claude Code. Perhaps because I'm on the basic plan?

  • airstrike an hour ago

    Are there any open models that compete with Claude in its tool use capabilities for complex tasks?

    Feels like an area where we could use more competition...

  • m_mueller 3 hours ago

    GPT-5 is pretty decent nowadays, but Claude 4 Sonnet is superior in most cases. GPT beats it on cost and usable context window when something quite complex comes up that needs to be planned top-down.

    • CharlieIsAHero 2 hours ago

      What do you mean by usable context window? Sonnet 4 is 1M and gpt5 is 400k. Are you saying the context window on sonnet is useless?

      • m_mueller 2 hours ago

        I never implied it's useless. I don't have scientific data to back this up either, this is just my personal "feeling" from a couple hundred hours I've spent working with these models this year: GPT-5 seems a bit better at top-down architectural work, while Sonnet is better at the detail coding level. In terms of usable context window, again from personal experience so far, to me GPT-5 has somewhat of an edge.

        • 613style 2 hours ago

          Agreed. My experience is GPT5 is significantly better at large-scale planning & architecture (at least for the kind of stuff I care about which is strongly typed functional systems), and then Sonnet is much better at executing the plan. GPT5 is also better at code reviews and finding subtle mistakes if you prompt it well enough, but not totally reliable. Claude Code fills its context window and re-compacts often enough that I have to plan around it, so I'm surprised it's larger than GPT's.

    • boredtofears 2 hours ago

      What I find interesting is how much opinions vary on this. Open a different thread and people will seem to have consensus on GPT or Gemini being superior.

      Even the benchmarks don’t seem all that helpful.

      • TuxSH an hour ago

        Well, last I checked Claude's webchat UI doesn't have LaTeX rendering for output messages which is extremely annoying.

        On the other hand, I wish ChatGPT had GitHub integration in Projects, not just in Codex.

        I've also had Claude Sonnet 4.0 Thinking spew forth incorrect answers many times for complex problems involving some math (sometimes with an inability to write a formal proof), whereas ChatGPT 5 Thinking gives me correct answers with a formal proof.

      • kissgyorgy 2 hours ago

        I think it depends on the domain. For example, GPT-5 is better for frontend, React code, but struggles with niche things like Nix. Claude's UI designs are not as pretty as GPT-5's.

        • omneity 2 hours ago

          This is also pretty subjective. I’m a power user of both and tend to prefer Claude’s UI about 70-80% of the time.

          I often would use Claude to do a “make it pretty” pass after implementation with GPT-5. I find Claude’s spatial and visual understanding when dealing with frontend to be better.

          I am sure others will have the exact opposite experience.

        • boredtofears an hour ago

          This is what I mean - even opinions on domain are wildly different. I've seen people say Claude's React is best.

glimshe 3 hours ago

Anthropic doesn't allow me to use my phone number across my personal and business logins. I simply can't use Claude where I need it, even if I'm willing to pay. I don't understand why they add so much friction when everyone else just allows me to do work.

  • electric_muse 2 hours ago

    This whole “real phone number is your access code to every service” trend is really frustrating.

    I had the same experience recently with:

    - Ticketmaster
    - Docusign
    - Vercel

    Probably a handful more I forgot.

    I believe the main reason is that it prevents fraud.

    But I see a deeper motive that phone numbers are more friction to change and therefore our “real” numbers become hard-to-change identity codes that can easily be used to pull tons of info about you.

    You give them that number and they immediately can look up your name, addresses, age, and tons of other mined info that was connected to you. Probably credit score, household income, etc.

    Phone numbers have tons of “metadata” you provide without really knowing it. Like how the Exif data in a photo may reveal a lot about your location and device.

    • derekdahmer 2 hours ago

      As someone who implemented phone verification at a company I worked for, it’s 100% for preventing spam signups intending to abuse free tiers. API companies can get huge volumes of fake signups from “multiplexers” who get around free tier limits by spreading their requests across multiple accounts.

      • AlexandrB an hour ago

        This makes sense for free tiers of products, but if you provide CC info for a paid tier, you shouldn't also have to provide a phone number. One or the other.

        • moduspol 44 minutes ago

          I think people can use stolen / one-time use / prepaid / limited purchase size credit cards fairly easily, too. And you might not find out until after they've racked up a non-trivial amount of costs.

      • anonym29 an hour ago

        Because SMS verification is so cheap (under a dollar per one-time validation, under $10/mo for ongoing validation), this approach really only makes sense for ultra-low-value services, where e.g. $0.50 per account costs more than the service itself is worth.

        Because of this low value dynamic, there are many techniques that can be used to add "cost" to abusive users while being much less infringing upon user privacy: rate limiting, behavioral analysis, proof-of-work systems, IP restrictions, etc.

        Using privacy-invasive methods to solve problems that could be easily addressed through simple privacy-respecting technical controls suggests unstated ulterior motives around data collection.

        If your service is worth less than $0.50 per account, why are you collecting such invasive data for something so trivial?

        If your service is worth more than $0.50 per account, SMS verification won't stop motivated abusers, so you're using the wrong tool.

        If Reddit, Wikipedia, and early Twitter could handle abuse without phone numbers, why can't you?
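
        Of the privacy-respecting controls listed in this comment, rate limiting is the simplest to illustrate. Below is a minimal token-bucket sketch, assuming one bucket per account or IP address; the class name, parameters, and limits are illustrative choices, not anything described in the thread. Time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token-bucket limiter: holds up to `capacity` tokens, refilled steadily."""

    capacity: float            # maximum burst size (illustrative value below)
    refill_per_second: float   # sustained request rate allowed
    tokens: float = field(init=False)
    last_refill: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        # Start full so a new client can make an initial burst.
        self.tokens = self.capacity

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request fits the budget."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical limit: bursts of 2 requests, 1 request per second sustained.
bucket = TokenBucket(capacity=2, refill_per_second=1)
decisions = [bucket.allow(now=0.0), bucket.allow(now=0.0),
             bucket.allow(now=0.0), bucket.allow(now=1.0)]
print(decisions)
```

        A limiter like this raises the cost of spreading requests across many accounts only when it is keyed on something harder to multiply than the account itself, which is the trade-off the comment is pointing at.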

    • anonym29 an hour ago

      Mandatory phone number registration does not and never has prevented fraud.

      Plenty of free VOIP services exist, including SMS reception.

      Even when the free service providers are manually blocklisted, one-time validations can be defeated with private numbers on real networks / providers for under a dollar per validation, and repeated ongoing validations can be performed with rented private numbers on real networks / providers for under ten dollars per month.

      The rent-an-SMS services that enable this are accessible through a web interface that allows connections from Tor, VPNs, etc. There is no guarantee that the telecom provider's location records for the IMEI tied to that phone number are anywhere close to the end user's real geographic location, so this isn't even helpful for law enforcement purposes where they can subpoena telecom provider records.

      This "phone number required" practice exists for one primary reason: for businesses to track non-fraudulent users, data mine their non-fraudulent users, and monetize the correlated personal information of non-fraudulent users without true informed consent (almost nobody reads ToS's, but many would object to these invasive practices if given a dialogue box that let them accept or decline the privacy infringements but still allowed the user to use the business' service either way).

      Sometimes, they are also used for a secondary reason: to allow the business to cheap out on developer costs by cutting corners on proper, secure MFA validation. No need to implement modern, secure passkeys or RFC-compliant TOTP MFA, FIDO2, U2F when you can just put your users in harm's way by pretending that SMS is a secure channel, rather than easily compromised by even common criminals with SS7 attacks, which are not relegated to nation-state actors like they once were.
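
      For context on how little code the TOTP alternative mentioned above needs, here is a minimal sketch of RFC 6238 code generation (built on the RFC 4226 HOTP truncation) using only the Python standard library. It assumes HMAC-SHA1 and a 30-second step, the RFC defaults; the function names are illustrative, and a real deployment would also need secret provisioning, clock-drift windows, and replay protection, none of which are shown.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low nibble of the last byte picks the window
    binary = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(binary % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with the counter derived from the current time step."""
    return hotp(secret, unix_time // step, digits)


def verify(secret: bytes, submitted: str, unix_time: int) -> bool:
    """Constant-time comparison against the code for the current step only."""
    return hmac.compare_digest(totp(secret, unix_time), submitted)


# Test vector from RFC 6238 Appendix B (SHA-1, 8 digits, T = 59 seconds).
rfc_secret = b"12345678901234567890"
print(totp(rfc_secret, 59, digits=8))  # 94287082
```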

      • slipnslider 43 minutes ago

        >never has prevented fraud.

        Interesting, I've heard otherwise but it was anecdotes. Do you have any data on that?

        > to track non-fraudulent users

        You listed a large number of ways to fake the phone number which is why you believe it doesn't prevent fraud. What is to stop a non-fraudulent user from doing the same thing to prevent the tracking by the company?

  • giancarlostoro 2 hours ago

    Sounds like you want a Google Voice number or a similar service, but now you're spending money on someone else's awful software, and some places will flag your number if it's Google Voice and outright refuse to let you in.

    • rs186 2 hours ago

      ...Like Claude. They don't allow you to use Google Voice numbers for verification.

      • giancarlostoro 2 hours ago

        I want a "burner" number, but I'm not sure what the best option is, do I buy a crappy phone at Walmart and use that number? What's the bare bottom of the barrel cost for a phone with no mobile data, only SMS?

  • raldi 3 hours ago

    When was the last time you tried?

    • glimshe 2 hours ago

      A month or so ago.

  • dathinab 2 hours ago

    wait what do they need a phone number for???

    • criddell 11 minutes ago

      I think they do that to make it more difficult for one person to open multiple accounts. You can still do it, you just need another phone.

paxys 3 hours ago

Claude was the gold standard for coding but I have had a lot of success with GPT-5. Nowadays I pretty much always default to GPT-5.

  • jbm 12 minutes ago

    Yeah, likewise. Claude has been going downhill recently for me, while Codex works great. I nearly cancelled my ChatGPT membership until they started providing codex, and now I'm considering if I want to use Pro again.

    It's weird though because ChatGPT itself is not particularly better than it was before. Bringing down costs per token probably means they can do more reasoning before coding than Claude does.

  • bwat49 3 hours ago

    yeah I've been getting better results with codex (gpt5) vs claude

YmiYugy an hour ago

Personally I found gpt-5 to be a bit better than sonnet-4, at least in Cursor. Claude is still more reliable and competent at tool calling, but I found gpt-5 to be better in token efficiency and a lot better at instruction following.

  • kranke155 an hour ago

    This seems to change every 6 months? Just my impression.

alberth 21 minutes ago

Embarrassingly dumb question ...

Is Claude Code just a plug-in for an existing editor, or is it the entire editor itself?

piker 3 hours ago

In some ways it makes sense to pave the way for Claude to protect the brand of VS Code. On the other hand, it’s a bit of a head-scratcher since it seems like VS Code was built as a loss-leader to sell Microsoft cloud products. Perhaps enterprise ChatGPT, Copilot and GitHub can make up the difference even if the community tier favors Claude.

Edit: maybe Cursor forced this and Microsoft is taking its choice to open license VS Code on the chin. Will be interesting to see the strategy with Visual Studio going forward.

dawnerd 2 hours ago

Auto feels like a way for them to slightly push people towards paid models more. If they really favored Anthropic, Claude would be the included free model, right?

  • mynameisvlad an hour ago

    On their roadmap (it's in the linked blog post https://code.visualstudio.com/blogs/2025/09/15/autoModelSele...):

    > Let users on a free plan take advantage of the latest models through auto

    It also describes how the auto selector works in more detail:

    > When using auto model selection, VS Code uses a variable model multiplier based on the automatically selected model. If you are a paid user, auto applies a 10% request discount. For example, if auto selects Sonnet 4, it will be counted as 0.9x of a premium request; if auto selects GPT-5-mini, this counts as 0x because the model is included for paid users. You can see which model and model multiplier are used by hovering over the chat response.
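
    A small sketch of the discount arithmetic in the quoted passage, for paid users only. The 10% discount, the 0.9x result for Sonnet 4, and the 0x result for GPT-5-mini come from the quote; the base multiplier of 1.0 for Sonnet 4 is inferred from that 0.9x figure, and the function and dictionary names are hypothetical, not part of VS Code or Copilot.

```python
AUTO_DISCOUNT = 0.10  # "auto applies a 10% request discount" (from the quoted post)

# Base premium-request multipliers. 1.0 for Sonnet 4 is inferred from the quoted
# 0.9x figure; 0.0 for GPT-5-mini reflects "included for paid users".
BASE_MULTIPLIER = {
    "Sonnet 4": 1.0,
    "GPT-5-mini": 0.0,
}


def auto_multiplier(model: str) -> float:
    """Premium-request multiplier charged to a paid user when auto picks `model`."""
    return BASE_MULTIPLIER[model] * (1 - AUTO_DISCOUNT)


def premium_requests_used(selected_models: list[str]) -> float:
    """Total premium requests consumed by a sequence of auto-selected models."""
    return sum(auto_multiplier(model) for model in selected_models)


for model in BASE_MULTIPLIER:
    print(f"{model}: {auto_multiplier(model):.1f}x")

# Hypothetical session: auto picked Sonnet 4 twice and GPT-5-mini three times.
session = ["Sonnet 4", "GPT-5-mini", "Sonnet 4", "GPT-5-mini", "GPT-5-mini"]
print(f"session total: {premium_requests_used(session):.1f} premium requests")
```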

verdverm 3 hours ago

Anthropic didn't make the cut in our evaluation (data usage concerns). They have also been the shadiest of the companies.

They lost me when they expired my money and then tried to take more without asking

bartalama 2 hours ago

Claude Sonnet 4 is the best for generating code for me so far, albeit needing some investment in instruction files and prompt files when using GitHub Copilot.

dev1ycan an hour ago

I use DeepSeek on the daily. The other day I tried to use Claude and was surprised when, after like 5 messages (with a decent amount of code, though), I got "limited".

daft_pink 3 hours ago

This needs that archive link that bypasses the paywall. I had to read it on my Apple News+ subscription to avoid the paywall.

gigel82 3 hours ago

I don't think it's Microsoft that favors it. It's likely customers. Claude wipes the floor with all the GPTs in GitHub Copilot (in my experience).