• FarceOfWill@infosec.pub
    7 hours ago

    I don’t see how you can write the law so that it allows training AI on copyrighted data without also making it possible to train a special LLM on a single GitHub repo instead of the entire universe, and essentially treat the model as a full compression of the source.

    • Grimy@lemmy.world
      21 minutes ago

      The outputs are still bound by copyright law. Tracing over an artwork pixel by pixel doesn’t make the result immune to copyright, and maliciously overtraining generative AI so that it acts like a database and copies outright shouldn’t either.

      If you have a carbon copy of someone’s GitHub repo, it doesn’t matter that you generated it; it’s still a copy. Code is a difficult example, though, since I’m not entirely sure where the line is for one repo to count as different from another when both accomplish the same task.

      I always imagined businesses just grabbed GPL software and told their employees to rewrite it, but differently. Most things I dig into seem to stem from one or two algorithms from a paper, and the rest is fluff.