I recently came across a book, “Who Owns This Sentence? A History of Copyrights & Wrongs”. I will put a non-affiliate link to the book later.
I want to run a small learning experiment of my own here. Here are the details of the experiment.
I have been quite intrigued by the term “copyright” ever since ChatGPT came into the picture, along with all the questions and lawsuits against OpenAI over its use of copyrighted materials.
I have some thoughts on what “copyright” is, and I’m putting them into this issue of the newsletter to once again invite more perspectives from you, my subscribers.
Next, I will let the above-mentioned book influence me, by reading it of course. :)
In a later issue of this newsletter, I will write down how my perspective and thoughts have changed after reading it.
Interested? Small plug here! Consider sharing this issue with your community! :)
OK, here goes!
History
To most laypeople looking at the need for copyright, it is clear that significant effort goes into writing an article or painting a picture. Most creators want to be compensated: if others are “using” the article or picture for commercial gain, the creators will want a share of the revenue. This part is widely understood.
Let us unpack this further. There are a few perspectives we can take.
What Constitutes Copyright?
Let’s discuss what the copyright of a book, article or image covers, and what we are really protecting with copyright, shall we? :)
Let’s look at a book first. A book is made up mostly of words, but saying it is made up of words is too simplistic a view. What every book or article wants to do is convey a few ideas: a main topic, with several sub-topics surrounding it. To convey that main idea, the words need a certain arrangement. Consider the following:
“The Earth rotates around the Sun.”
Vs
“The Sun rotates around the Earth.”
Both statements use the same words, but rearranging them has changed the whole meaning.
So a book conveys an idea through two components: the words, and the arrangement (order) of those words. However, if you think about it from first principles, copyright at its most basic level protects neither of them, but rather the topic at hand.
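For readers who like to see this concretely, here is a small Python sketch (my own illustration, not taken from the book) that separates what the two sentences share from what differs between them. It treats the words of a sentence as an order-insensitive collection (a multiset) and the arrangement as the ordered sequence of those words. The helper name `words` and the simple lowercase, whitespace-based splitting are assumptions made purely for illustration.

```python
from collections import Counter

# The two example sentences from the text above.
a = "The Earth rotates around the Sun."
b = "The Sun rotates around the Earth."


def words(sentence: str) -> list[str]:
    """Split a sentence into lowercase words, dropping the trailing full stop."""
    return sentence.rstrip(".").lower().split()


# The words themselves: which words appear and how often, ignoring order.
same_elements = Counter(words(a)) == Counter(words(b))

# The arrangement: the order in which those words appear.
same_arrangement = words(a) == words(b)

print(same_elements)     # True
print(same_arrangement)  # False
```

Both sentences contain exactly the same words, so the order-insensitive comparison says they match, while the ordered comparison does not. That gap is precisely where the change in meaning lives.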
Carrying this over to other media: an image is made up of the RGB values of each pixel, plus the arrangement of those pixels on the canvas, and music is made up of musical notes, plus the arrangement of those notes. Since music is linear, there is also the tempo, which I feel is part of the arrangement of the notes. I will pause my discussion of images and music here, though, because I want to focus on book/article copyright for now, and I am not yet familiar enough with images and music to say what copyright is really protecting in those media. I’ll come back to them later perhaps. :)
The point I want to drive at is that a copyrighted work has two components: the media elements, and the arrangement of those elements.
Protecting the Topic
If we say the copyright of a book exists to prevent “abuse” or “misuse” of the topic it conveys, we enter a very grey area, because how do we decide what counts as abuse, misuse or even fair use?
And before a court can even determine whether there has been abuse or misuse, the author needs to prove that the topic has been “copied”, which is another difficult obstacle to clear, especially since most countries follow the principle of “innocent until proven guilty”. The onus is very much on authors to prove that the topic was copied and then abused or misused. From what I see, it is an uphill task!
Concluding Remarks
The point I want to drive at is this: copyright protects the idea behind the book or article, not the words or the arrangement of the words. Suing LLM trainers for copyright infringement is an uphill task, because it will be difficult to prove that a book’s topic resides inside an LLM. LLMs will instead generate an amalgamation of similar topics. Getting an LLM to output the precise topic would take tremendous prompt-engineering effort, where “tremendous” could mean anything from days to months or even years, from what I see.
I’ll pause the discussion here and start the book once I finish the one I am currently reading, “Mathematical Intelligence”.
Please let me know your thoughts on copyright. I would especially love to hear some perspectives on copyright for images and music. Thanks in advance! :)
Consider supporting my work! You can make a “book” donation and drop me some wisdom! :)
Book: Who Owns This Sentence?
Great topic to kick off and unpack in this AI-fuelled knowledge economy. I’ll state my position plainly before I make my arguments: I’m not a fan of copyright, or even of IP protection. Ideas are cheap, and the person “exploiting” an idea has put in significant effort to think about its utility and addressable market. I could similarly argue that an “idea maker” can benefit from someone monetising their idea, through fame and more contract work. Should the “idea maker” then compensate the “idea monetiser”?
Consider also the case where a person with a photographic memory reads as many books as he or she can. This person is then compensated as the go-to fount of information. Should the book authors be compensated?