• Neshura
      164 days ago

      Pretty much. AI (LLMs specifically) is just a fancy statistical model, which means that when it ingests data with no reasoning behind it (think of the many AI hallucinations our brains catch and filter out), the garbage corrupts the entire training process. The problem is that AI can no longer distinguish AI-generated text from human text, so it ingests more and more “garbage”, which leads to worse results. There’s a reason progress on AI models has almost completely stalled compared to when this craze first started: the companies have an increasingly hard time actually improving the models, because there is more and more garbage in the training data.
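
      A minimal toy sketch of that feedback loop, with a Gaussian standing in for the model (purely illustrative; real LLM training is nothing this simple): fit a distribution to some data, sample the next generation of “training data” from the fit, and repeat.

      ```python
      # Toy illustration of "model collapse": repeatedly fit a model to its
      # own output. A Gaussian stands in for the LLM here; this is a
      # statistical analogy, not how LLM training actually works.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 50                            # samples per "generation" of training data
      data = rng.normal(0.0, 1.0, n)    # generation 0: real, human-made data

      for gen in range(1, 201):
          mu, sigma = data.mean(), data.std()  # "train" on the current data
          data = rng.normal(mu, sigma, n)      # next generation: purely synthetic
          if gen % 50 == 0:
              print(f"gen {gen:3d}: mean={mu:+.3f}, std={sigma:.3f}")

      # On most runs the std drifts away from 1.0 (typically toward 0):
      # the fitted model gradually loses the tails of the original data.
      ```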

      • oce 🐆
        4 days ago

        There’s actually a lot of human intervention in the mix: data labelers for the source data, domain experts who correct answers after a first layer of training, and layers of prompts to improve common answers. Without those domain experts, the LLM would never produce the nice-looking answers we are getting. I think human intervention is going to increase to counter the AI pollution in the data sources, but at some point that may no longer be economically viable.
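
        Roughly, the human-made pieces behind those steps look something like the sketch below; every name and example here is invented for illustration, not any real pipeline or dataset schema.

        ```python
        # Sketch of the human-made artifacts behind those steps. Field names
        # and contents are made up for illustration, not a real schema.

        # 1) Labeled source data / supervised fine-tuning: domain experts write
        #    or correct reference answers, and the model learns to imitate them.
        sft_example = {
            "prompt": "Why does the sky appear blue?",
            "expert_answer": "Shorter wavelengths of sunlight scatter more ...",
        }

        # 2) Preference labels: humans rank competing model answers; a reward
        #    model trained on these rankings then steers further tuning (RLHF).
        preference_example = {
            "prompt": "Why does the sky appear blue?",
            "chosen": "Rayleigh scattering favours shorter wavelengths ...",
            "rejected": "Because the ocean reflects onto it.",
        }

        # 3) Prompt layers: fixed instructions prepended to every conversation
        #    to shape the tone and structure of common answers.
        system_prompt = "You are a helpful assistant. Be concise and accurate."
        ```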

        This is a nice deep dive into the different steps behind today’s LLMs: https://youtube.com/watch?v=7xTGNNLPyMI