2 Comments
BlueSilverWave

There's a genre of "A.I." criticism that is shaped basically like: "of course it can pass an econ exam, the econ exam is in the training data". And we always eventually find the overwhelming majority of the exam in question in the training data. My own negative biases aside, at best it feels like an interpolator between different points in "text space".

Given this, the biggest item I haven't been able to come around on re: generative A.I. is the copyright problem. If we accept the concept of the "generative A.I." as "predicting the next phrase" (I imagine interpolating between points [training data] on an n-dimensional "text-space" graph) based on training data, all of that (largely copyrighted) corpus is encoded in the model like a really shitty zip-file. I think even in relatively copyleft worldviews, this is a big problem.

This is not unique to code, however: we see the same question with people. I read a lot. My memory is hazy sometimes, but I remember the gist of a lot of things. Does that make my writing inherently copyright-infringing? I'd say no, unless I am particularly egregious (there is thankfully significant precedent to rely on), but a computer can regurgitate text more or less verbatim. And it can imitate style!

But, then again: "Friends, Romans, Countrymen, lend me your eyes, I come to bury A.I., not to praise it."

N.B. I believe I read that Bloomberg is using a narrowly trained model on financial reports and stocks to improve its financial performance. Since that is all either internal data or public information, it seems to get around my concern.

TinySpark

Well, does a person own 'a style'?

Moreover, if we can say 'of course it can pass an econ exam, the econ exam is in the training data' (which I don't think is true when we talk about Bryan Caplan's exams), then the same also applies *to econ students*, because the only way econ students can pass econ exams is that they've studied for them!

In any event, I am calm about the copyright issue. I feel it can be more or less easily resolved in a range of ways. (And, to be honest, I don't think a computer that reads people's writings, remembers them, and learns from them is copyright infringement.)
