Discussion about this post

BlueSilverWave:

There's a genre of "A.I." criticism that is shaped basically like: "of course it can pass an econ exam, the econ exam is in the training data". And we always eventually find the overwhelming majority of the exam in question in the training data. My own negative biases aside, at best it feels like an interpolator between different points in "text space".

Given this, the biggest item I haven't been able to come around on re: generative A.I. is the copyright problem. If we accept "generative A.I." as "predicting the next phrase" based on training data (I imagine it interpolating between points [the training data] on an n-dimensional "text-space" graph), then all of that (largely copyrighted) corpus is encoded in the model like a really shitty zip-file. I think even in relatively copyleft worldviews, this is a big problem.

This is not unique to code, however: we see the same question with people. I read a lot. My memory is hazy sometimes, but I remember the gist of a lot of things. Does that make my writing inherently copyright-infringing? I'd say no, unless I am particularly egregious (there is thankfully significant precedent to rely on). But a computer can regurgitate text more or less verbatim. And it can imitate style!
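The "shitty zip-file" intuition above can be sketched with a toy next-word predictor (a deliberately crude stand-in for a real language model, not how any production system works): trained on a tiny corpus and decoded greedily, it reproduces its training text verbatim.

```python
from collections import defaultdict

def train_bigram(text):
    """Count, for each word, how often each word follows it."""
    words = text.split()
    table = defaultdict(lambda: defaultdict(int))
    for a, b in zip(words, words[1:]):
        table[a][b] += 1
    return table

def generate(table, start, n=10):
    """Greedy decoding: always emit the most frequent next word."""
    out = [start]
    for _ in range(n):
        successors = table.get(out[-1])
        if not successors:
            break  # no observed continuation; stop
        out.append(max(successors, key=successors.get))
    return " ".join(out)

# Toy "training corpus" — with so little data, greedy decoding
# can only regurgitate the memorized sequence.
corpus = "the quick brown fox jumps over the lazy dog"
model = train_bigram(corpus)
print(generate(model, "quick", n=4))  # → quick brown fox jumps over
```

The point of the sketch: when the model has seen a phrase only once, "predicting the next word" and "quoting the training data" are the same operation. Real models are vastly larger and usually generalize, but the verbatim-regurgitation failure mode the comment describes is the same mechanism.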

But, then again: "Friends, Romans, Countrymen, lend me your eyes, I come to bury A.I., not to praise it."

N.B. I believe I read that Bloomberg is using a model trained narrowly on financial reports and stock data to improve its financial products. Since that corpus is all either internal data or public information, it seems to get around my concern.

