AI, Education 🙵 the End of All Things

José Goudet Alvim

2024-03-10

I’ve recently seen memes circulating in response to a proposal (doesn’t matter whose it is or whether it’s real) of using “artificial intelligence” tools to grade school assignments, at least to some extent. The sheer bleakness of basic education globally, even prior to AI, already colors my general response to attempts to “disrupt” a system that really needs a hug rather than to become the new arena for “moving fast and breaking things”; but there is something to be gleaned from the surreal image of students submitting AI-generated gobbledygook to be graded by similar AI systems.

“Intelligence”?

First, I feel it my duty to preface this discussion with a little diatribe about the expression “artificial intelligence” as it appears in scare quotes, oft repeated by marketing people and science-y people with a stake in the game. When I and another tech person talk to each other, we know that AI is an umbrella term that refers broadly to a few buckets:

  1. Advanced, but deterministic, algorithms doing clever stuff with data: logical reasoning about data-sets, formal methods that have been thought out and reasoned about by a group of programmers. People put finger to keycap and typed out a system whose behavior is, formally at least, wholly understood (ideally, anyway; programmers are typically not so lucky).
  2. Advanced, sometimes non-deterministic, “machine-learned” algorithms: processes that operate on data that has some kind of numerical, Cartesian representation (imagine a board with several sliders, where each configuration/image/snapshot of that board is a citizen of a mathematical “space”) (Image: Grafischer 31-Band-Equalizer1)

    Those algorithms haven’t been “coded” by anyone in particular; instead they start as a mindless, random manipulation of the sliders and combinations thereof, which then undergoes a process called “training”. Training consists of (theoretically carefully) slapping and punching the thing in order to coax it into producing statistically better results on its “training data” – the collection of examples we are effectively trying to squeeze into the neural network’s architecture. (A minimal sketch of this loop appears right after this list.)
  3. Combinations of those two: processing data and feeding it to a neural network, and using the output of that in other processes.
  4. Other stuff that people dream up, which may be slightly different, but the gist is the same.
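
To make bucket 2 less hand-wavy, here is a minimal, self-contained sketch of what “training” amounts to: a couple of sliders start at random positions and get nudged, over and over, in whatever direction makes the guesses on the example data slightly less wrong. Everything here (the data, the two-slider “model”, the nudging schedule) is a made-up toy for this post, not any particular system; real models just do this with billions of sliders and far cleverer nudging.

```python
# Toy "training": a handful of sliders (weights) get nudged, over and over,
# in whatever direction makes the model's guesses on the training examples
# slightly less wrong. Nobody "codes" the answer in; whatever the system ends
# up knowing is smeared across the final slider positions.
# (All data and sizes here are made up for illustration.)

import random

training_data = [(x, 2 * x + 1) for x in range(-5, 6)]    # examples: input -> desired output
sliders = [random.uniform(-1, 1), random.uniform(-1, 1)]  # start as mindless randomness

def guess(x):
    return sliders[0] * x + sliders[1]

def total_error():
    return sum((guess(x) - y) ** 2 for x, y in training_data)

step = 0.001
for _ in range(5000):                       # the "slapping and punching", politely
    for i in range(len(sliders)):
        before = total_error()
        sliders[i] += step                  # nudge one slider up
        if total_error() > before:          # if that made things worse...
            sliders[i] -= 2 * step          # ...nudge it down instead

print(sliders)   # ends up near [2, 1]: statistically better, never "understood"
```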

While the discussion of what constitutes intelligence, what reasonable partitions of the human mind into roughly orthogonal faculties and processes look like, and what learning really means are all very rich and genuinely interesting topics, they are not the object of our interest in this discussion. What I aim to illustrate is that, repeat after me:

When a technically minded person speaks of AI, they are referring to non-magical systems, with real limitations, that only very superficially resemble human cognition, in shape or outcome, if at all.

I stress this point because treating artificial intelligence, as it currently exists, as anything more than quirky jargon is reckless. The field of artificial intelligence, as an area of interest and research, is named after its pretensions and ultimate goals, not the object of its current understanding (the way physics, chemistry, and biology are).

So why do we keep hearing about AI as if it were, frankly, mystical? Well, because there is a lot of money to be made. I’d like to humbly propose a derogatory term: “smart-washing”, referring to the phenomenon of technical mumbo-jumbo and hype being used to dress up an idea that is either bad in principle or could never get off the ground.

End of diatribe. Let’s talk about education and AI.

Automated Reasoning and its Consequences in Education

Arguments ab absurdo (which some people have a real hard time distinguishing from reductio ad absurdum) have a special place in my heart because, although fallacious, they can be constructively applied to arrive at positive and enlightening conclusions: if you take a position or policy and exaggerate it to an extreme degree, you get two things:

  1. If you actually look at how the extreme case fails, you’ll often spot stress lines already extant in the moderate position, and this helps you elaborate a subtler argument to bring against the earnest position you oppose;
  2. If you present your interlocutor with an extreme extrapolation of their position (without, of course, claiming it is what they believe), and they agree with you that that specific degree of intensity is too much, you can conclude your disagreement is a matter of where to draw a line, as opposed to a matter of values and principles.

    This is important to determine whether they should go to the reeducation camps or to the firing squad, etc. etc.

It must be remarked, before we proceed, that this is a bit akin to a bad-case scenario of sorts. An honest estimation of the future will always be rather bleak, and while my made-up timeline below is definitely bad, it is not as bad as things can definitely get. In short: what follows is probably worse than what will happen, but definitely better than what could happen. It should be understood that when I use definite statements like “will be” and so on, I refer to the future of the world we are considering. Don’t pester me about rebuking a possibility that is being entertained for the purpose of inferring, by abduction, the mechanisms that would be present and operating in any lesser version of the scenario we are considering.

In the spirit of morbid curiosity, let us consider the absolutely apocalyptic Universe wherein students are secretly using ChatGPT et al. to help write their assignments, and teachers are using similarly capable software to grade them, give suggestions for improvement, and create a holistic digest of how the class performed: what commonly tripped up the pupils, where concepts were misapplied, and so on. On its face, rather innocuous and even potentially a positive and benign development.

One of the first consequences is that as those tools become more specialized, our lateral freedom to deviate from their capabilities diminishes in proportion to the degree of their adoption. Consider an overworked teacher (a pleonasm): they have a tool that can reasonably speed up grading, but only for a particular set of exercise types in which it is sufficiently competent. While innocuous at first, if adoption of the tool is necessary to meet demands (cf. overworked), then the choice can easily become a cruel one: what if the most appropriate, or most interesting, exercises are not supported by the machine? What if the machine-controllers decide to divest focus from a particular topic or pedagogical approach, rendering the tool obsolete for it? Do we pivot to accompany the trends?

There is a large teacher deficit across disciplines in many countries; in those countries lucky enough to have met their educational needs, this deficit can easily be manufactured by a healthy dose of austerity, anti-science and anti-intellectual rhetoric, and the commodification of, and divestment from, public education. In the world we are considering, one does not even need to privatize public education to adequately dismantle it; it suffices to offer an alternative that privatizes pieces of the means of education: grading and so on.

What does this look like? By the nature of how AI tools work, they tend to be focused either on data classification (say, a handwriting-recognition system, which looks at images and tries to guess what is written there) or on more generative tasks (like ChatGPT, which interactively transforms prompts into some kind of output). There is some level of interplay: a classification task could involve looking at the currently established context (everything read so far in the image) and seeing what could be placed there to make the whole more sensible; conversely, a generative task could contain a subtask that classifies its current output and mediates changes to keep it on subject, for instance.
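
As a concrete (and deliberately silly) illustration of that interplay, here is a toy sketch in which a “generative” step proposes candidate words and a “classification” step steers the choice back on subject. Both pieces are hypothetical stand-ins written for this post – there is no real language model or classifier behind them – but the shape of the loop is the point.

```python
# Toy sketch of the interplay: a "generative" step proposes candidate next
# words, and a "classification" step steers the choice back on subject.
# Both are deliberately dumb, hypothetical stand-ins, not real models.

import random

ON_TOPIC = {"grade", "exam", "student", "teacher", "answer", "homework"}
VOCABULARY = sorted(ON_TOPIC) + ["banana", "stock tips", "weather"]

def propose_candidates(context):
    """'Generative' step: blindly propose a few candidate next words."""
    return random.sample(VOCABULARY, k=3)

def on_topic_score(word):
    """'Classification' step: judge how on-subject a candidate is."""
    return 1.0 if word in ON_TOPIC else 0.0

def generate(length=8):
    text = []
    for _ in range(length):
        candidates = propose_candidates(text)
        text.append(max(candidates, key=on_topic_score))  # classifier mediates
    return " ".join(text)

print(generate())
```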

Because of how those systems are trained (by example, through costly processes of automatic approximation), and because of the sheer complexity and cost of operating, training, and maintaining them and the machines that house them, these tools become (as they already have) centralized in large companies fueled by venture capital and private-equity investment. While open-source and “consumer-operable” neural networks exist and are available, as it stands they are orders of magnitude (in the strictest sense) less powerful than a ChatGPT instance, and even the medium-sized ones struggle to run on my work laptop.

Lemma: Those companies can dictate what the tools can do, and thus steer education very discreetly.

Lemma: If those tools become necessary, public education becomes hostage, as it is the most vulnerable and most desirable.

These lemmata are justified, in my opinion, by what was previously established: public education is chronically underfunded, our technocratic overlords love technological solutions to societal problems, and funneling money earmarked for education towards private Big Tech companies is so befitting of the Sardonic Steersman of our Fates that we all know that is exactly how the cookie would crumble. The desirability of control over public education is a story as old as public education itself: grossly speaking, education can be seen either as instrumental – a thing done to children to make them more valuable as workers – or as humanitarian – a thing done for children to make them better persons2. The subtle but absolute control over systems like grading and classroom metrics analysis (as well as cross-classroom analysis, including automated teacher grading) is trivial to exercise when you control the machines and the software being run in schools, and that should be enough to convince the reader of my two previous propositions.

Let’s dwell on some modes those teaching tools could operate in: grading and analysis. As already elaborated, compatibility with the grading tool effectively determines what exercises are being proposed, and therefore opens the way to controlling what is being tested through the how of its being tested – and thus, ultimately, what is being taught to begin with. But this I consider settled in this discussion; what now begs our attention is the analysis part: a school principal would find it invaluable to have aggregated and compared all the little graphs and lines and charts, relating every datum to every other datum, that a global grading and control system could spew out.

From this, a teacher or principal could track the specifics of a student’s progress through the educational system. They could also interleave that information with metadata such as aggregate social-profile data from social media, and even data gathered from the classroom: let’s pretend that recording our kids 24/7 will ameliorate the school-shooting epidemic in the USA, run data analysis on them, and cross-reference that with other databases. What kinds of wonderful questions can we start trying to answer?

Oh baby, the sky is the limit if you are looking for a place to jump off of.

Of course, with the power of AI, we can use all that metadata (some of which will inevitably leak or be stolen by hackers due to shit opsec, or – even better – be sold “anonymized” but with just enough dots left to reconstruct the profiles of those kids, as companies like Facebook, Instagram, TikTok, etc. already do…) to generate solutions to the problems inherent to the system that perpetuates them. What’s more interesting: since those systems cannot feasibly run on local computers, all the data you care about being analyzed – grades, names, the entire fucking database – will and must be sent to some Cloud Capital techno-feudal mogul who ghoulishly hoards it, for each school that uses their services.

Corollary: A lot of those services would initially be offered for free.

Remark: When a service is offered for free, it’s paid by other means.

Remark: When a service is offered for free, often the serviced is the payment.

Again, the proposition doesn’t require much convincing: as a tool to organize and expedite the teaching process and offer “real benefits, usually reserved for business intelligence experts with enterprise grade analytics software”, its initial adoption and market capture would be accelerated if it were made free, or given free of charge to schools and similar institutions. The business model is predicated on the thesis that the information gathered, and the control obtained over the education system, are reason enough for venture capitalists to invest in and subsidize operations. They’d just have to wait until that information can be sold for targeted advertisement and social engineering, and until the control over education itself can be leveraged for the benefit of their stakeholders. It’s self-evident that this, or something along these lines, would be exceptionally attractive.

Aside from being easily usable to obfuscate policy decisions that persecute students, teachers, etc. according to the whims of whoever controls the machines, those machines are also fairly unpredictable. Which is to say: a very large team with robust testing and very specialized knowledge can probably wrangle the beast continuously, but any single person is dwarfed by both the complexity and the scale of the systems, in such a way that any wrongdoing that isn’t egregiously overt enjoys an automatic level of plausible deniability. Furthermore, because of the way justice is served and sought in the “civilized” world, crimes of that level of baroqueness are very hard to prosecute: to even agree about what a complex AI system actually does, to the extent that it eliminates reasonable doubt, requires a team of experts studying a system that will often be proprietary and isn’t designed to be comprehensible even by its creators.

This must be emphasized: neural network models aren’t comprehensible. We are still very far from being able to look at a model and give meaningful information that could sway a jury, except after many mistakes have already been made, so that we can perform statistical analysis and determine: “weird, this system seems to hate black people”. Or more inane things, like how you can print patterns that look like TV static on acid onto a T-shirt, and have it so that a particular model will think you are a banana, because that pattern overloads its banana-sensing neurons and drowns out the fact that the person wearing the T-shirt is an oblivious pedestrian.
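
For the curious, here is a toy sketch of how such a pattern can be conjured, assuming a deliberately tiny, random linear “model” rather than a real vision network: you compute a perturbation that pushes any input towards the “banana” class. Real attacks do the analogous thing to deep networks via gradients, but the moral is the same – the pattern looks like noise to us while dominating the model’s decision. Everything in the sketch (sizes, class names, weights) is invented for illustration.

```python
# Toy sketch of the "banana T-shirt": against a tiny, random linear classifier
# we craft a static-looking pattern that pushes any input towards the "banana"
# class. Real attacks do the analogous thing to deep networks via gradients.
# Everything here (sizes, classes, weights) is invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
num_pixels = 64
CLASSES = ["pedestrian", "cyclist", "banana"]
W = rng.normal(size=(len(CLASSES), num_pixels))   # stand-in for a "trained" model

def predict(image):
    return CLASSES[int(np.argmax(W @ image))]

person = rng.normal(size=num_pixels)              # an innocent input
print("before:", predict(person))

# Build a perturbation that most increases the banana score relative to the
# average class: step along the sign of that weight difference, kept small
# enough to look like noise rather than replacing the image outright.
banana = CLASSES.index("banana")
patch = 0.8 * np.sign(W[banana] - W.mean(axis=0))

print("after: ", predict(person + patch))         # very likely "banana" now
```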

This will get to a point of complexity and convolutedness where it will feel like trying to interrogate a man as to why he shot your dog while only being allowed to look at an MRI of his brain, and similar such measures. But it’s not a man, it’s a product – a product controlled by people who can and will poke around its little nerves, change whole chunks of it, roll it all back to an older version, and so on. Finding whom to blame for any outcome in that scenario is near impossible, for the same reason you can’t easily pinpoint what exactly made that weird man freak out and kill Rex.

Lemma: The tool-owners will deflect accusations and prosecutors will fail to explain the problem to juries, as it intermixes sociological problems, corporate governance problems and technological problems, all of which are highly complex.

Corollary: There will be very little accountability, and responsibility will largely be shunted, via EULAs and ToSs, onto end users: schools and the like.

These seem self evident to me.

Let’s distill the discussion so far into a couple of comprehensible bullet points:

  1. These tools, being costly to build and operate, end up centralized in a handful of private companies;
  2. Whoever controls the tools controls what can be graded, and therefore what gets tested and, ultimately, taught;
  3. Public education, underfunded and overworked, is both the most vulnerable and the most desirable target;
  4. “Free” offerings are paid for with data and with control over the education system itself;
  5. The opacity of these systems grants their owners plausible deniability and shields them from accountability.

It’s not fucking over, kiddos, I’ve got more nightmares to share.

About that whole “Learning” thing…

So, yeah, why are teachers teachers, and why are students there? How does learning happen? Well, there are far better informed people out there who can, in principle, second-guess me. But I’ll risk the controversial opinion that giving a single fuck is probably required.

Now, this sober bad trip I’ve induced in myself has soured my mood, and I have been writing for a few hours, so forgive the evident descent into impolite incoherence that you have no doubt been witnessing as you read. It is necessary to engage with things viscerally, because the guts are the wellspring of passions, and without them any intellectual dissertation is an absolute waste of time. You may think this is unrelated to the point of discussion, but I feel they are actually joined at the hip:

A careful reader might have noticed that, throughout the discussion so far, I haven’t talked about the students’ side of the story. Or, better said, we have mentioned them only insofar as they are affected by the teaching tools. A whole parallel universe of cheating tools, and of arguably more moral “coaching” tools, will exist. In this section we’ll talk more about how students will engage differently with education because of these technologies, and about how a disruption of that relation is dangerous, in no small part because of apathy towards learning.

The coaching tools, I believe, will look a lot like the grading and analysis tools, and may be employed by after-school tutors; those can even join in a sort of eldritch constellation of terrible decisions and coalesce into a social media platform. Imagine centralizing after-school activities, socializing, maybe even some gaming, and school activities – and, when you graduate, maybe even a LinkedIn-esque abomination of circle-jerking, corporate-platitude festering grounds; it even has a sneak peek of what the job market has in store for the enterprising (i.e. poor) teenagers who choose (need) to work an internship while they study! If such a mega-app were birthed into your Universe, it would be heralded not only by seven fucking trumpets, but by an outpouring of praise for how comprehensive and integrated (maybe even gamified) learning has become under its auspices.

Terror aside, the coaching tools do not interest me greatly, because they mostly fall within the trappings of the teaching tools. What I think is of more interest to us is how you can expect students and teachers to behave in a system where a few things can happen:

  1. The questions being asked lose diversity, shrinking to whatever the automatic grading system can cope with;
  2. The grader struggles with answers that are phrased poorly or unusually but are essentially correct;
  3. Students lean on generative systems to produce answers to the very questions the grader can handle;
  4. Students learn, and exploit, the grader’s quirks rather than the subject matter.

These are some of the ones I’ve been able to come up with on the spot. Let’s go through them in no specific order. The lack of diversity in questions resulting from the limitations of the automatic grading systems, and the difficulty of producing a grading system that can cope with the subtle nuances of how a question may be answered incorrectly yet still be essentially correct, are, in my view, sister issues that have, like any pair of siblings, a common cause: the cost and sheer complexity of accommodating those subtleties in a system that doesn’t operate like a human.

Any teacher worth their weight in chalk will tell you that when you elaborate an exam, the several different questions are essentially probes of the student’s understanding of that particular subject. If you have a diverse set of probes, you obtain more information about that student, the exam being more comprehensive. The same concept is examined from a multiplicity of perspectives because it is (and machine learning folks know this all too well) very hard to convince yourself that an adversarial system (remember, the student usually thinks of you as their adversary) actually learned something, as opposed to mimicking, pretending, or gaming your metrics.

Moreover, if that teacher is worth their weight in Hagoromo chalk, they will tell you that the best questions are the ones that require the student to expose the most of their knowledge on the subject; this is often achieved by relating different concepts and mixing things in a way that requires a measure of creativity, and may even be open-ended in nature. The reason those are valuable is that, as humans, we can empathetically place ourselves in the mind of the student by reading the answer and following their reasoning. This lets us gauge, recognize and intuit shortcomings, conceptual blind spots and misconceptions. It also allows us to glean sparks of genuine inspiration, moments of realization, or intellectual rigor, even in an incomplete or, on the whole, incorrect answer.

Now, this is hard: it’s hard to write those questions and it’s hard to grade them. Some people have a calling, develop the skill, and pump those out like it’s trivial, but it requires a large set of abilities. Specifically, and this is how I think of it, for open-ended questions you’re grading each answer against an idealized, perfected version of itself – of that specific student, Timmy, that you are dealing with at that moment. You weigh in things you’d like to see Timmy develop, and things you see seedlings of in him right now. Maybe it’s an elegant turn of phrase, or a sharp wit and a tight analogy; it is very subtle and personal, and it requires emotion and an actual relationship with the student in question.

Proposition: Grading open-ended questions and similarly useful metrics of knowledge is Hard™ and requires much more than an approximation of human cognition.

Proposition: The more information you can glean about the student’s knowledge, the Harder™ it becomes for an AI to grade the answer and extract that information.

Proposition: The more interesting/stimulating questions for the students are the ones that let them flex cognitive muscles, which are the ones that reveal the most about their knowledge.

Corollary: The questions an AI is capable of grading easily are less informative for teachers and less stimulating for students.

The fact that the questions an AI grading system is most likely to support are duller, simpler, and less informative than the ones we ought to be posing to our students is serious, of course. And I think this general statement leads naturally into two other problems I’ve cited: the ease of using generative programs to answer dull, simple, non-open-ended questions is self-evident. The machines are well equipped to grade those questions in part because they can also answer them well enough for the kind of supervised-feedback learning techniques companies have been using for a while:

Basically, training sucks, and human supervision is exceptionally expensive due to the volumes of data that need to be processed and, thus, the work-hours. So, what ends up happening is that they take their [M]odel M and give it to some humans, and those humans give feedback; they take that feedback and use it to train a model P which tries to [P]redict what the humans will think of something that M produces; then you use the fake human P to further train M, while you keep training P. You can add a bunch of bells and whistles and change things around, but the fundamental problem is that training requires massive amounts of trudging through generated content, which is an activity uniquely suited to… you guessed it, more AI stuff. The short and sweet take is that, structurally, problems tractable with AI have a dual, or mirror-image, problem of predicting the feedback a model’s response will get.
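
Here is a minimal sketch of that loop, with toy stand-ins for every moving part (the names and the “training” rules are invented for illustration; nothing here is how any particular lab actually does it): M produces answers, humans occasionally score a few, P learns to imitate those scores, and P then stands in for the humans so M can be nudged far more often than humans could ever be bothered to read.

```python
# Toy sketch of the M/P loop. All the names and "training" rules here are
# invented for illustration; the real thing uses neural networks and oceans
# of data, but the structure of the loop is the same.

import random

def human_feedback(answer):
    # Hypothetical stand-in for expensive human raters: in this toy world
    # they simply like longer answers (capped so they don't reward rambling).
    return min(len(answer), 10)

class Model:                     # "M": the generator being trained
    def __init__(self):
        self.verbosity = 1
    def answer(self, prompt):
        return prompt * self.verbosity
    def nudge(self, rater, prompt):
        # Crude "training": try a slightly different behaviour and keep it
        # whenever the rater likes the result at least as much.
        trial = max(1, self.verbosity + random.choice([-1, 1]))
        if rater.predict(prompt * trial) >= rater.predict(self.answer(prompt)):
            self.verbosity = trial

class Predictor:                 # "P": learns to imitate the human raters
    def __init__(self):
        self.examples = []       # (answer length, human score) pairs
    def train(self, answer, score):
        self.examples.append((len(answer), score))
    def predict(self, answer):
        if not self.examples:
            return 0
        # Predict the score of the closest example seen so far (nearest neighbour).
        return min(self.examples, key=lambda e: abs(e[0] - len(answer)))[1]

M, P = Model(), Predictor()
for step in range(200):
    output = M.answer("ab")
    if step % 20 == 0:                        # humans are scarce and expensive
        P.train(output, human_feedback(output))
    M.nudge(P, "ab")                          # the fake human does the rest

print(M.verbosity)   # drifts towards whatever P believes the humans reward
```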

This indicates to me, although it’s not a proper formal statement, that the questions an AI system may effectively be able to grade with enough accuracy to be worth your time are, by virtue of how the grader was probably trained, also the kind of questions you could train an AI system to answer. I have no real evidence of this, but it feels like an actual issue.

Whether or not my gut feeling is relevant or accurate is somewhat beside the point: even if I’m wrong, there will still be a large incentive for students to use generative AI systems to solve assignments and homework. It’s only natural, as it is already happening without AI grading. But why, then, would I posit that the situation would be worse even if extant AI methods are not particularly competent at solving the kinds of questions they are capable of grading? Simple: lower-quality questions mean less student engagement, as less is required of them, and they have fewer avenues of self-expression within the confines of the question format and the capability of the AI to understand and recognize them.

It’s important to understand that there is a level of self-actualization inherent to overcoming an intellectual challenge, in the exact same way that one may obtain personal satisfaction from finishing a set in the gym or reaching a particular milestone in any activity of their choice. A key element of the sense of satisfaction is that it is particularly hard to cheat: cheating that experience is, really, the heart of gamification, in the twisted sense of the word. Why is it twisted? Because it requires a sleight of hand; it is by necessity artificial and therefore self-dishonest. When the student sees through it, when they see through the little badges and confetti and pseudo-gamer shit, they become cynical towards it. This is antipodal to any healthy realization of one’s increasing potential, which is naturally self-rewarding and life-affirming.

Therefore, you cannot fake the joy of solving a hard puzzle with fanfare alone. The problems worth your time are worth it because you couldn’t do them before, and because you are now recognized, by the challenge-master and by yourself, as capable. Problems worth your time are worth doing with friends, and losing sleep over, not because you’ll fail if you fail, but because they are enthralling. Gamification is just dressing up the same sludge you wouldn’t serve to cattle, and you bet there will be a lot of gamification in education. There is just one problem: the smart kids will see through it, and if education is too dumb for some kid, it’s just dumb. If the other kids can’t see through it, there is no merit in that; it simply means you’ve failed them so spectacularly that they just don’t know any better.

As it stands, children already feel that many of their school subjects are pointless, and we’d do them no favors by making those subjects even more trite, removing ourselves as teachers from a substantial part of our role, and further alienating the relation that must exist between teacher and students.

How does it change? The teacher grows increasingly suspicious that their pupils’ words are not theirs in earnest; they see themselves as having been stripped of a certain emotional wiggle room and personal attachment to the classroom, as the grader cannot reliably be “a little more lenient with a kid who is struggling, or who didn’t phrase it the best way they could, but who’s got the gist and just needs a little push” – whether you think that’s appropriate or not, it is a degree of freedom removed from their hands, and an important one at that: if you have no such freedom, you cannot emotionally invest yourself. The students, on their side, see the teacher less as a source of knowledge and more as a dispenser of it: the judge ultimately is the grader, and therefore the teacher can very easily be thought of as “insufficiently” capable of preparing them for the demands of the grading system.

This is true even if the teacher still has some de jure “ultimate say” on the grades and the grader is only a “suggestion”, as deviation will naturally require justification, and introduces a step that can, in fact, produce evidence against the teacher before the school board should the grade be adjusted below the grader’s suggestion. Moreover, if the goal of the grader is to make the process faster, then by necessity it makes the teacher care less for it, because if the teacher still reads all the answers with due attention, they might as well grade them too. Now, this is different from using a TA to help with grading, because the TA obtains useful experience and propagates the profession, and the TA and the teacher can communicate more effectively than the teacher and a grading AI system that is opaque and doesn’t do conversations.

All of this aside, as we’ve mentioned, AI systems are fickle and prone to developing unexpected quirks. A generation of kids raised on frequent tests graded by such a system will, inevitably, learn those quirks and apply them very skillfully. For us it might be hard to imagine, but picture having classes with the same teacher your whole life: you learn the words and kinds of arguments they like; you hear from a friend that word repetitions make it mad, and you’re not sure if that’s true or not, but you’re not going to risk finding out on your finals, are you? Your upperclassman Joseph swears he once got a solution wrong and the grader gave him a passing grade; he says he only had to draw a little dinosaur, but that’s surely bullshit. Well, you didn’t study for this test anyway, might as well see if it works.

Done freaking out? Yes, a non-trivial amount of meta-knowledge is required to succeed at school and in higher education. Part of the game is knowing you are a player. Part of it is knowing your adversary, how the teacher plays and what their objectives are. But whereas you or I have had several dozen teachers throughout our lives, some kids in our universe of madness will have exactly one: the semiconductor thing that decides whether they fail or not. Sure, there are a lot of talking heads that say what the thing will expect of us, but they are ornamental: they change, the thing remains.

Tying things up

Good God this was bleak. Thankfully we can stop pretending any of this is currently happening to the same degree that I’ve been assuming for the sake of extracting truth from utter desperation. So, what have we learned from this experiment?

  1. We live in a very short-sighted society, eager to embrace new ideas to solve old problems, provided those new ideas come in the form of “technologies”: something no one thought of before or had the ability to do, but which is now available to many with little friction or cost of adoption.

  2. Social problems, especially long-standing ones such as the ineffectual nature of public education, globally speaking, resist merely technocratic solutions, as evidenced by their having persisted in more or less the same state, if not become worse, despite time and technological progress.

  3. Superficially beneficial technologies can radically change how we relate to each other in well-established, structured and even regimented roles (teacher/student, lawyer/client, artist/patron, doctor/patient, friend/friend), as well as how we relate to our roles/ourselves in those relations: am I, the teacher, actually not responsible for grading? Am I, the student, the student of the yapper in front of me, or am I the student of the grader, with the person before me just a TA?

  4. We must consider every technology, every societal disruption, in the power-system that permeates our society. Ceding an inch is losing three: when next we fight we need to fight to recover one, having one less while they have one more. AI as it stands has very interesting applications but anything that leaves the smallish scale requires computing power that is incompatible with a democratic power distribution.

  5. Trading local computing for remote computing means your data is remote as well: you cannot compute without data. Your data being remote means someone else has it, and its being someone else’s means it can be sold.

  6. We must be wary of technologies that enable central aortic actors (States, Corporations) to exert capillary power. We must be creative when catastrophizing because surely people in power will be creative when imagining its uses.

  7. Capillary power, if directed by a central authority and automated by some kind of AI system, can sustain a level of totalitarianism that has no precedent, because the sovereign can, in principle, directly poke at interpersonal relationships in a discreet and plausibly deniable way.

  8. Having a legal framework to prosecute, disgorge and exemplarily punish companies and executives guilty of shady activity hiding under the auspices of “it’s so complex you can’t begin to sue me” should become a priority before we can’t begin to sue them.

References, Footnotes, Whatever


  1. This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license. Attribution: Wiki-piet.↩︎

  2. This is the crux of many nonsensical grievances about taxes not being taught in school whereas algebra is (considering some kids leave high school without understanding operator precedence, or thinking that “percentages are reversible” is a surprising fact, I think taxes might be too complex a subject to tackle). What is interesting is that those people fail (cynically, I’d say they pretend to fail) to understand how useful a full education is, and how it makes you stand out and feel confident, as opposed to being educated “just enough to do the job well enough”.

    They hardly ever complain about not being taught other useful practical subjects such as carpentry, making musical instruments, chairs and tables, welding and basic electronics, the basics of electrical engineering beyond Ohm’s law, etc., or how to debate properly and clearly, how to orient yourself by the stars, or how to tell when it’s going to rain based on barometric data, or fucking ethics […]

    Sure, if you take some extra classes, plus boy scouts etc., you might get a reasonable segment of that pie. But that’s not the heart of the issue: this kind of argument irks me because kids leave high school without even really knowing basic math, and they want to trivialize it further until it’s life-support education: an Amazon warehouse training video and “how to get the IRS off your back”.↩︎