• 4 Posts
  • 129 Comments
Joined 7 months ago
Cake day: May 11th, 2024



  • I think it’s more likely a compound sigmoid (don’t Google that). LLMs are composed of distinct technologies working together; as the scaling of one reaches its inflection point, we’ve pivoted implementations to get back on track. Notably, context windows are no longer an issue. The most recent pivot came just this week, allowing for a huge jump in performance, and more promising stepping stones are coming into view. Is the exponential curve just a series of sigmoids stacked too close together to tell apart? In any case, the article’s correct - just adding more compute to the exact same implementation hasn’t sustained exponential scaling.
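    To make the stacked-sigmoids idea concrete, here’s a toy numpy sketch (every midpoint, height, and rate is invented purely for illustration): a sum of S-curves with staggered midpoints and doubling heights tracks an exponential through the mid-range, then falls away once the last curve saturates.

```python
import numpy as np

def sigmoid(x, midpoint, height):
    """One logistic S-curve: slow start, rapid growth, then saturation."""
    return height / (1.0 + np.exp(-(x - midpoint)))

x = np.linspace(0, 10, 11)

# Stack S-curves with staggered midpoints and doubling heights
# (all parameters made up for illustration).
stacked = sum(sigmoid(x, m, 2.0 ** i) for i, m in enumerate([1, 3, 5, 7, 9]))

# An exponential with the matching growth rate: ln(2) per 2 units of x.
exponential = np.exp(np.log(2) / 2 * x)

# In the mid-range (x ~ 3-8) the ratio stays near 1 (within ~10% here),
# i.e. the stack looks exponential until the final sigmoid saturates.
for xi, s, e in zip(x, stacked, exponential):
    print(f"x={xi:4.1f}  stacked={s:7.2f}  exp={e:7.2f}  ratio={s / e:4.2f}")
```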

  • Hackworth@lemmy.world to Technology@lemmy.world · *Permanently Deleted*
    +4 · edited · 2 months ago
    Yeah, using image recognition on a screenshot of the desktop and directing a mouse around the screen with coordinates is definitely an intermediate implementation. Open Interpreter, Shell-GPT, LLM-Shell, and DemandGen make a little more sense to me for anything that can currently be done from a CLI, but I’ve never actually tested ’em.
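    For anyone who hasn’t seen the screenshot-driven approach, here’s a minimal sketch of the loop. pyautogui is a real library; `ask_model_for_action` and its action format are hypothetical stand-ins for whatever vision-model API you’d actually call.

```python
import io

import pyautogui  # real library: screenshots + mouse control

def ask_model_for_action(screenshot_png: bytes, goal: str) -> dict:
    """Hypothetical placeholder for a vision-model call. Imagine it
    returns {"type": "click", "x": 512, "y": 384} or {"type": "done"};
    the real request/response format depends on the provider."""
    raise NotImplementedError

def gui_agent_loop(goal: str, max_steps: int = 20) -> None:
    """Screenshot -> model picks coordinates -> click -> repeat."""
    for _ in range(max_steps):
        image = pyautogui.screenshot()   # capture the desktop
        buf = io.BytesIO()
        image.save(buf, format="PNG")    # encode the frame for the model
        action = ask_model_for_action(buf.getvalue(), goal)
        if action["type"] == "done":
            return
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])  # blind coordinate click
```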


  • Hackworth@lemmy.world to Technology@lemmy.world · *Permanently Deleted*
    +8 / −1 · edited · 2 months ago

    I was watching users test this out and am generally impressed. At one point, Claude tried to open Firefox, but it wasn’t responding, so it killed the process from the console and relaunched it. A small thing, but not something I would have expected it to overcome this early. It’s clearly not ready for prime time (per their repeated warnings), but I’m happy to see these capabilities finally making it into a foundation model’s API. It’ll be interesting to see how much remains of GUIs (or high-level programming languages, for that matter) if/when AI can reliably translate natural language into hardware behavior.
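    That Firefox recovery amounts to two shell commands; here’s a hand-written Python equivalent (Linux-style commands assumed, purely illustrative of what console access buys an agent):

```python
import subprocess
import time

# Roughly the recovery described above, written out by hand: kill the
# hung browser from the shell and relaunch it. Illustrative only.
def restart_firefox() -> None:
    subprocess.run(["pkill", "-f", "firefox"], check=False)  # ignore "no match"
    time.sleep(1)                  # let the OS release the profile lock
    subprocess.Popen(["firefox"])  # relaunch without blocking the caller
```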