Both OpenAI and Google turned to transcribing YouTube videos to further train their AI models, which may violate creators’ copyrights, The New York Times reports. The report details how the two tech giants, along with Meta, cut corners to access as much data as possible to train their AI models.
According to the report, OpenAI used Whisper, a speech recognition tool, to transcribe more than one million hours of YouTube videos. It then fed the transcripts into GPT-4, the powerful AI system that powers the latest version of the ChatGPT chatbot. Google, which owns YouTube, also transcribed YouTube videos to train its AI models.
The transcription of videos by both companies may infringe on creators’ copyrights to their videos. Other uses of creator content to train AI have prompted copyright and licensing lawsuits.
OpenAI’s use of YouTube videos may also violate Google’s rules, which prohibit the use of its videos for “independent” applications and the use of “automated means (such as robots, botnets or scrapers)” to access its videos.
Matt Bryant, a spokesperson for Google, told The New York Times that the company was unaware of any such use by OpenAI. But the report alleges that people at Google knew about OpenAI’s unauthorized use of YouTube videos and neglected to take action because Google was doing the same thing. Google also told the paper that it only trains its AI on videos from creators who’ve agreed for their content to be used in this way.
In July 2023, Google changed its terms of service to allow the use of public online material, such as Google Docs and Google Maps restaurant reviews, to further train its AI models.