
Max. Classroom Capacity: Robowriting—Can AI Write My Paper?

Loren J. Naidoo, California State University, Northridge

“Imagine how much easier your graduate school term papers would have been if, instead of having to spend 10 hours in the library poring over articles like a bespectacled mole person, you could have simply asked your laptop or phone to write it for you.” —Jarvis (2021a)

One year ago I wrote a Max. Classroom Capacity column on Robograding in which I described my foray into using artificial intelligence (AI)-powered software to grade short-answer exam responses. I encourage you to read the full column, but in case you don’t want to, I concluded that (a) robograding can be helpful in some circumstances, (b) it cannot at this point fully replace a human grader, and (c) using robograding to assign student grades raises ethical issues, including whether students have a right to know when AI has been used to grade them.

I’d like to discuss the other side of the equation: Can AI be used to write papers? A colleague of mine showed me an AI application named “Jarvis” that writes natural language, and it blew my mind! Although Jarvis is generally considered to be the industry leader, many similar apps have proliferated in the past year. Most of these apps have features designed specifically for digital marketers and bloggers, their primary target audience. None of them, as far as I could tell, had features designed specifically to help students with academic writing. Yet. There seems to be no technological barrier to such a feature being developed, and clearly there is a market for this service. Do a web search for “I-O psychology paper assignment,” and you will probably see ads for (presumably human-driven) paper-writing services. So, it seems likely that student papers written by AI will soon appear on your desk, if they haven’t already. The prospect of robots (I’m using that term loosely) writing or helping to write students’ papers raises a number of practical and philosophical issues for higher education and I-O psychology. But before we get into all of that, let’s discuss how these apps work and what they do.1 Caveat: I’m not an expert on AI, nor have I been able to familiarize myself with every function of the apps.

Maybe you’ve seen ads for Grammarly? Grammarly (and others, e.g., Writelab, ProWritingAid, Ginger) uses AI to analyze text and provide grammatical, spelling, and general writing feedback. I haven’t used Grammarly, but I imagine it as a souped-up version of MS Word’s grammar check: The user generates content, and Grammarly provides feedback. I don’t imagine most of us would consider grammar checkers problematic from an academic integrity standpoint. Arguably, such software helps students better express their own ideas, though it may also make their writing less idiosyncratic and creative. Grammar checkers seem like a good way for students to improve their (noncreative) writing, particularly when other resources like on-campus writing centers and in-depth instructor feedback are unavailable. Grammarly (et al.) does not appear to generate any content by itself, which seems like an important differentiator compared to the apps I discuss next.

Recently a new class of robowriting applications has appeared that uses GPT-3 AI to generate text that is ostensibly “realistic,” that is, indistinguishable from natural human language. GPT-3 was developed by OpenAI, a research company based in San Francisco that was founded as a nonprofit. The developers trained GPT-3’s 175-billion-parameter model on a dataset of nearly one trillion words of Internet content, up to October of 2019, to predict the probability of the next word given a provided text (Brown et al., 2020). Then they made it available to third-party developers! I found at least a dozen apps for natural-language processing based on this technology. I reviewed two for this column. The first, Rytr, can generate text in multiple languages. The user selects a use case (e.g., emails, business idea pitches, blog sections), provides a title and description, chooses a tone (e.g., convincing, thoughtful, critical, humorous), and can provide some starting text too.2,3 It has a built-in plagiarism checker that reports on exact matches with external sources. Similar to Grammarly, Rytr can rewrite user-generated content to make it shorter or clearer.
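To make that predict-the-next-word mechanism a little more concrete, here is a minimal sketch of autoregressive text generation. To be clear, this is nothing like GPT-3’s actual 175-billion-parameter transformer; it is a toy bigram model over a made-up corpus, included only to illustrate the sample-a-word-and-repeat loop.

```python
import random
from collections import Counter, defaultdict

# Toy illustration of GPT-3's core loop: estimate the probability of the
# next word given the text so far, sample a word, append it, and repeat.
# GPT-3 does this with a huge transformer trained on much of the Internet;
# this sketch uses bigram counts over an invented three-sentence corpus.

corpus = ("implicit bias can affect hiring decisions . "
          "implicit bias is often unconscious . "
          "bias can lead to discrimination .").split()

# Count how often each word follows each other word in the corpus.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(prompt_word, length=10):
    """Autoregressively sample words, one at a time."""
    words = [prompt_word]
    for _ in range(length):
        counts = next_word_counts.get(words[-1])
        if not counts:  # no observed continuation; stop early
            break
        candidates, weights = zip(*counts.items())
        words.append(random.choices(candidates, weights)[0])
    return " ".join(words)

print(generate("implicit"))
# e.g., "implicit bias can lead to discrimination . implicit bias is often"
```

The output is fluent-looking recombination of the training text, which is a helpful intuition to hold onto when evaluating the Rytr and Jarvis examples that follow.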

Rytr has several use cases that may be valuable to I-O psychology students. First, it can generate a job description. Here’s what Rytr generated for the job of organizational psychologist:

The organizational psychologist is responsible for conducting psychological evaluations and providing consultation to individuals, groups, and organizations. They also design and implement social programs for organizational change or improvement.

Responsibilities:

  • Conducts psychological assessments of individual, group, or organizational needs
  • Design and implement social programs for organizational change or improvement
  • Develops interventions to improve employee morale and productivity
  • Provides consultation on a wide range of topics, including employee relations, harassment prevention, diversity training, conflict management, performance management, leadership development, team building— (Rytr, 2021a)4

This output looks like it could be a student’s answer to an assignment or exam question on this topic. Why wouldn’t a student just use the O*NET as a resource for this? The O*NET website won’t write a job description for you. Copying and pasting material from the O*NET without citation would be considered plagiarism, whereas this text generated by Rytr is not plagiarized, at least, not in the sense of being taken from a single other source. This raises the question: Where exactly did it come from? The best answer would seem to be that it was extrapolated from the large parts of the Internet pre-10/19 upon which the AI was trained. Rytr can also write custom interview questions based on an inputted interviewee bio and a description of the interview context—never mind that everything we know about employment interviews suggests that this is a terrible idea!

It’s more difficult to shoehorn an academic paper assignment into Rytr, but I tried, using a topic from the area of personnel selection. I used the blog section writing use case, chose a thoughtful tone, and inputted the topic of the impact of implicit bias on hiring decisions with the following keywords: racism, sexism, heterosexism, ageism, ADA, legal, adverse impact. Here’s what Rytr wrote:

Implicit bias is one of the most difficult biases to address. It is also one of the most insidious types of bias, because it’s often unconscious. It can be hard to spot implicit bias in hiring decisions, but it can have a significant impact on who gets hired and who doesn’t. If employers are not aware or actively looking for implicit biases, they may unintentionally discriminate against qualified job seekers by overlooking them or offering them less desirable jobs.

Implicit bias can be defined as the attitudes or stereotypes that affect our understanding, actions and decisions in an unconscious manner. Implicit bias leads to discrimination which is an adverse impact on those who are not considered or included as a subject.— (Rytr, 2021b)

We probably all agree that this isn’t very impressive, but ask yourself: If a student submitted this as part of a paper assignment, would you immediately think that AI wrote it? Regardless, there are some other problems. First, the definition in the second paragraph is plagiarized. I only know this because I did a Google search and found multiple sources for it. Rytr does not seem to proactively flag content that is plagiarized, but when I ran Rytr’s plagiarism checker on this sentence, it came up as a 100% match with multiple sources (e.g., Saluja & Bryant, 2021). If I hadn’t checked, I wouldn’t have known. Second, there are some logical problems with the text: The first paragraph says implicit bias is often unconscious, whereas the second paragraph defines it as unconscious by definition; the text asserts that implicit bias always leads to discrimination; and so on. No evidence is cited in support of any of the assertions.
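As an aside, here is a rough sketch of the kind of exact-match test a plagiarism checker might run. I don’t know how Rytr’s checker actually works internally, so treat the n-gram approach and all of the names and text below as illustrative assumptions rather than a description of the product.

```python
# Hypothetical exact-match check in the spirit of a plagiarism detector:
# slide a window of n consecutive words over the submission and flag any
# window that appears verbatim in a known source. Real checkers search the
# web at scale; this tiny source "corpus" and the naive, punctuation-free
# tokenization are stand-ins for illustration only.

def ngrams(text, n=8):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def flag_exact_matches(submission, sources, n=8):
    """Return sources sharing at least one verbatim n-word run, with counts."""
    sub_grams = ngrams(submission, n)
    return {name: len(sub_grams & ngrams(text, n))
            for name, text in sources.items()
            if sub_grams & ngrams(text, n)}

sources = {
    "Saluja & Bryant (2021)":
        "implicit bias can be defined as the attitudes or stereotypes that "
        "affect our understanding actions and decisions in an unconscious manner",
}
submission = ("Implicit bias can be defined as the attitudes or stereotypes "
              "that affect our understanding actions and decisions in an "
              "unconscious manner")
print(flag_exact_matches(submission, sources))
# {'Saluja & Bryant (2021)': 11} -> every 8-word window matches the source
```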

At this point I was curious about how well Rytr would do in writing the rest of this column. I used a blog section writing use case with a critical tone in response to the topic I inputted, “How AI writer may impact academic integrity and student paper grading,” with section keywords “AI writer, plagiarism, grading, academic integrity, university, quality”:

AI writing tools may threaten academic integrity by facilitating plagiarism and grade inflation. Some academics are uncomfortable with using AI for grading because it may result in grade inflation. The same can be said for plagiarism because it will enable students to copy-paste content from the internet without being detected as a plagiarist. AI writers are becoming more and more popular in the academic world. Though not perfect, they can provide students with a preliminary assessment of their paper. That way students can adjust their work before submitting it to the professor. AI writing tools are also being used for plagiarism detection to help professors find any duplicated content in student papers.

AI writers may contribute to plagiarized papers. The AI writing tool will assign a grade to the paper based on the information provided by the student. This grade may be similar to what a human grader would give, but it could also be different. This is because the AI writer does not need to care about academic integrity because it is only grading the paper based on its content. In a recent study, it was found that AI writers could detect plagiarism with a very high accuracy rate. This means that they have the potential to improve academic integrity and student paper grading.— (Rytr, 2021c)

To my eye, in isolation each sentence looks like it could have been written by an undergraduate student (perhaps not a very well-prepared one), but many of the statements fall apart under minimal scrutiny (e.g., why should AI lead to grade inflation?). At the paragraph level, the text lacks coherence. The AI has written about multiple topics at once, conflating AI graders with AI writers, and, as before, no evidence is provided to support statements, even those that would seem obviously based on evidence (e.g., “In a recent study”). It’s unclear which study this refers to, and there is no way to identify Rytr’s sources. Is the statement “AI writers are becoming more and more popular in the academic world” a reflection of an argument that someone made somewhere on the Internet, or has the AI identified that phrase as typical for this position in a blog post, regardless of the veracity of the content? There’s no obvious way to know other than finding a source for that sentence, but I strongly suspect it’s the latter. It’s important to remember that the AI doesn’t actually know or understand anything; it’s just recognizing patterns and extrapolating text. Of course, I have done nothing to edit the content above. This is not how the app is meant to be used. A real user would likely take this text as a starting point, modify it, structure it, add new content, and then have the AI re-edit and/or generate more text, and so on in an iterative, collaborative process.

Maybe the issues I encountered with Rytr are specific to that app. Let’s compare Rytr’s output with output from Jarvis, a competing GPT-3-based AI writer. I used the same inputs as those for Rytr, but I also included the first paragraph of my column to see whether it could mimic my style (OK, it’s not an apples-to-apples comparison with Rytr now, but the output is more interesting):  

Possible AI writer implications include a faster grading process for instructors, a way to generate more academic papers within a shorter period of time or a potentially increased use case for plagiarism. It would also be interesting to see how AI writers compare with other online article writing services such as echeat, ghostwriter, and essay mill among others.

Other possible AI writer applications could include automatically generating lab reports, test answers, or even entire exams based on the course material that is available electronically.

Jarvis generally does a pretty good job at generating text that sounds like it was written by a human being (although the last sentence in the passage above is one of its few errors). However, Jarvis’s responses are not a good example of a serious academic paper that a student would submit for a grade.

In saying all this, AI writer programs are a significant development in advanced programming and a possible solution to the following problems: a need for human writers who can generate higher quality content more quickly or a need to automate repetitive tasks such as grading.

Finally, it is important to remember that AI writer has limitations and cannot solve every problem facing academics today. For instance, AI writer cannot always know what makes something funny or how to effectively portray emotions in writing—skills which are not easily broken down into rules or logic algorithms. Even so, I am excited by the growing field of advanced programming techniques being used to solve a wide range of problems and challenges. In a future column, I will discuss AI writer limitations and other use cases that AI writers might be able to tackle.— (Jarvis, 2021b)

Wow. There’s a lot to digest here. I enjoyed hearing Jarvis throwing shade on its own writing skills. I think the points it made about humor and emotions were interesting, but how odd to say it was excited (an emotion) in the same sentence. It’s unclear why it identified the sentence about AI writer applications as unrealistic; it was THAT sentence which I thought was, ironically, unrealistic! Jarvis seems to have a self-esteem problem—why sell yourself so short by saying that you can’t write a serious academic paper, Jarvis? I find it hard to disagree with you. Or wait, do I have a self-esteem problem? How much of this output came from Jarvis mimicking the text I provided? Strangely, there’s something about Jarvis that makes me react to it as, well, a being, I guess—not a person, but something that is more than nothing, if that makes sense. I imagine my irrational reaction is a bit like when players of the game Go described the moves made by a self-taught masterful AI program named AlphaGo as alien or from another dimension (Chan, 2017).

Anyway, despite some shortcomings, Jarvis came up with some interesting ideas that probably would provide value to a student who had a paper topic but wasn’t sure what to write. Jarvis generated more text in which it identified AI writing programs and essay writing services I didn’t know about, which prompted me to look them up and gave me some useful material. However, several paper-writing services that purport to use human “expert writers” were mischaracterized as AI writers.

As with Rytr, no sources were cited, but with subsequent experimenting, I found that when provided with text that models APA-style writing (e.g., if you feed it the first few paragraphs of a research paper), Jarvis will output statements followed by citations. For example, I pasted into Jarvis part of a manuscript I’m writing on burnout in executive leaders, and here’s a sample of what Jarvis wrote:

For example, a study of executive leadership teams from global 500 companies found that executive leaders reported significant signs of emotional exhaustion and cynicism (Grumanzz, 2013).5 The negative impact on executive leaders’ burnout has been clearly documented. A study by Emmonszz et al. (2004)6 suggests that executive leaders experiencing high levels of stress and depressive symptoms experience poorer team-level performance outcomes such as profitability and customer satisfaction.—(Jarvis, 2021c)

This looks great! I didn’t recognize these references (I added the zs—see footnote #5), and I got very excited. Even if I have to rewrite everything else, if Jarvis can identify research articles relevant to the claims it makes, this by itself would provide huge value. The only problem is that I can’t find any article by an author named Grumanzz in 2013 that could support the claim. Same for Emmonszz et al. (2004). Same for every other citation I checked until I got tired of checking them! Troublingly, the author names Jarvis cited are often associated with research in the area, so they look plausible. This limitation made more sense to me when, somewhat hilariously, after generating enough intro section material, Jarvis started writing methods and results sections for me, all completely fabricated, of course! I couldn’t get Jarvis to write a reference section, but I’m certain it would be full of nonsense too. Jarvis can write a research paper that will look reasonable but will fall apart under even a modest amount of scrutiny (students and graders beware!). These are very important limitations. However, in fairness, I can’t claim that I was exhaustive in my experimenting—try it yourself, with these and other GPT-3-based apps. Finally, the technology will continue to improve and maybe this issue will be solved with the next generation.
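If you would like to spot-check a paper for fabricated citations of this kind, a crude first pass is easy to automate. The sketch below, using made-up example text (including my z-padded author names), simply flags parenthetical citations whose author or year never appears in the reference list; it obviously won’t catch a fabricated source that also appears in an equally fabricated reference list, so it is a triage step, not a verdict.

```python
import re

# Hypothetical first-pass screen for fabricated citations: extract
# parenthetical (Author, year) citations from the body of a paper and flag
# any whose author or year never appears in the reference list. The regex
# and example text are invented for illustration and will miss many real
# citation formats (multiple authors, narrative citations, etc.).

CITATION = re.compile(r"\(([A-Z][A-Za-z'-]+)(?: et al\.)?, (\d{4})\)")

def unmatched_citations(body, references):
    """Return (author, year) pairs cited in the body but absent from refs."""
    return [(author, year)
            for author, year in CITATION.findall(body)
            if author not in references or year not in references]

body = ("Executive leaders report significant exhaustion (Grumanzz, 2013). "
        "Leader stress predicts team outcomes (Emmonszz et al., 2004).")
references = "Naidoo, L. J. (2021). Robograding revisited. TIP, 59(2)."
print(unmatched_citations(body, references))
# [('Grumanzz', '2013'), ('Emmonszz', '2004')] -> citations to verify by hand
```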

What does all of this mean for I-O teachers and practitioners? 

At this point, I imagine that GPT-3-based AI writers like Jarvis could help students who need to write a paper on a topic about which they know very little and aren’t particularly motivated to learn. Collaborating with AI to write a paper, hopefully learning something along the way, is probably a better outcome than not writing the paper at all. Or maybe having AI generate a paper as a starting point, to get past the intimidating blank-page inertia, is useful even if the student has to completely rewrite it later. Moreover, these apps may work better for personal, opinion-based, blog-like writing, such as that often required in discussion boards, diaries, and similar assignments. If, as an instructor, you grade papers based only on superficial features, counterfeit AI-authored papers will score highly. As a result, it seems even more important for us to assign and assess written assignments with the goal of building in our students the skills in which I-O psychologists excel: using an understanding of theory and evidence to formulate arguments while appropriately citing the research literature, all things that my experiences suggest are difficult for AI writers to do, at least at this point.

One important question is, How should we think of AI writers from an academic integrity standpoint? If a student uses an AI writer to generate text, which they copy and paste into an assignment, then, arguably, the student should cite the AI writer as the source of that content. It’s an odd kind of citation because the source cannot be checked. You may have noticed that I cited all of the AI-generated text I used. However, if you re-enter the details that I provided into the same AI writer, you will get different output; that is by design. Compounding matters, unless the author cites it as such, there is no apparent way for the reader/grader to identify with certainty whether text has been generated by AI, or by which AI. That is also by design. Regardless, AI used in this way would seem to be an outside source, and therefore, it should be cited rather than passed off as the author’s own work. Things get much fuzzier if the product is a collaboration between the AI and the student. Perhaps in this circumstance it makes sense for students to list any AI used as a collaborator on the project, although it’s not clear whether this admission would bias graders against the student or whether the student has received an unfair advantage relative to students who cannot afford or otherwise don’t have access to AI. My institution’s academic integrity policy precludes allowing “others to conduct research or prepare any work for them without advance authorization from the instructor,” including using commercial term-paper companies. Should AI be considered such an “other”? Should instructors proscribe using AI writers? In doing so, would we deprive our students of a tool that might make paper assignments easier to start, generate intrinsic interest in the paper topic, and produce a better final product? Is this a tool that we should start preparing our students to use, given that it may be how most writing will be done in the future?

My view is that instructors and students should start experimenting with AI writers and thinking about how they should be used. We should then consider changing our academic integrity policies to clarify whether students may use AI and what kinds of AI can and cannot be used in which contexts. Given that AI-generated content generally is indistinguishable from human-generated content, and therefore difficult to identify, proscribing the use of AI on assignments is unlikely to completely prevent its use. Human paper-writing services still exist despite running afoul of academic integrity policies. Moreover, like all tools, AI writers are not inherently good or bad, and powerful tools rarely disappear simply because people have concerns about how they may be used. These are weighty issues that cannot fully be fleshed out in this (not-so-short-anymore) short column.

What does the future hold?

GPT-3 relied on a massive body of work written by humans for its training. As AI writers become more popular and AI-written content secretly proliferates online, proportionately less novel material will be available to train on. It seems to me that there is a potential garbage-in, garbage-out, dumbing-down, or at least regression-to-the-mean effect: AI writers, efficiently and at scale, iterate myriad variations of the same content from their training databases, much of it inaccurate in the ways the examples presented above were inaccurate, and subsequent generations of AI are then trained on this ever-expanding corpus of regurgitated material that nobody knows is regurgitated. Of course, I’m not an expert on AI, so maybe this is needless handwringing. The technology is still new, and perhaps the next generation of AI will be so much better that this concern looks ridiculous in a few years.7

It’s possible to imagine a not-too-distant future in which a much more proficient AI is running or helping to run every aspect of the academic research process: reading the literature, designing studies, collecting data (or maybe asking humans to, where necessary), and writing up and publishing the results for other AIs to read. In this future, there may be little need to teach students to write a research paper. You can also imagine a less distant future in which much of the mundane writing in I-O psychology practice, including job descriptions, survey items, quarterly reports, and presentations, becomes partly or fully automated. I think we need to learn to use AI, understand what it can and cannot do, and identify where our students can add value as a way to, in the words of Joseph Aoun, “robot-proof” them.

Let’s end with two halves of a joke to bridge the human–AI divide. I wrote the first half: “What do you get when you cross a chicken with AI?”

Jarvis (2021d): “The most amazing papers in the world!”8

As always, dear readers, please email me your thoughts, reactions, and feedback. I’d love to hear from you. Loren.Naidoo@csun.edu.

 

Notes

1 Please note that I am not in any way endorsing any of the products in this column, nor do I have a relationship (financial or otherwise) with any of these companies other than using their free trials to research this column.

2 https://rytr.me/blog/resources

3 https://rytr.me/resources

4 The APA style guide does not seem to include a convention for citing AI sources. Interestingly, MLA does, as of 2019. Anyway, I did my best with this citation. More on citing later.

5 I added two zs to the author’s name so as to prevent anyone from misattributing this citation to the real scholar with this last name.

6 Again, two extra zs.

7 Apparently whole novels have already been written entirely by AI, to some critical acclaim.

8 This was Jarvis’s first punch line, but once I started, I couldn’t stop. Other notable zingers included “A four-year degree,” “A college student that will write your essay for you,” “artificial insemination,” and “A paper so well-written it will make you cluck with delight.” Jarvis completely failed to generate the most obvious dad-joke punchline: “artificial chickintelligence.”

References

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165. https://arxiv.org/pdf/2005.14165.pdf

Chan, D. (2017, October 20). The AI that has nothing to learn from humans. The Atlantic. https://www.theatlantic.com/technology/archive/2017/10/alphago-zero-the-ai-that-taught-itself-go/543450/

Jarvis. (2021a, October 20). Response to prompt: Title = “Robowriting,” Description = “This is an article about how AI can help university students write papers, and the implications of this technology on academic integrity, instructor pedagogy, and student outcomes,” Tone = “Academic.” Jarvis, https://www.jarvis.ai/

Jarvis. (2021b, October 21). Response to prompt: Title = “Robowriting,” Description = “How ai writer may impact academic integrity and student paper grading,” Tone = “Critical,” Content = several paragraphs of non-academic text about robowriting. Jarvis, https://www.jarvis.ai/

Jarvis. (2021c, October 22). Response to prompt: Title = “Burnout in executive leaders,” Description = “A research paper about the burnout in executive leaders in work organizations,” Tone = “Academic,” and Content = several pages of text describing research on burnout and leadership. Jarvis, https://www.jarvis.ai/

Jarvis. (2021d, October 21). Response to prompt: Title = “Joke,” Description = “Write a joke about students using AI to write papers,” Tone = “Humorous,” and Content = “What do you get when you cross a chicken with AI?” Jarvis, https://www.jarvis.ai/

Rytr. (2021a, October 20). Job description written in response to prompt: Title = “Organizational psychologist.” Rytr, https://rytr.me/

Rytr. (2021b, October 20). Blog section written in response to prompt: Title = “The impact of implicit bias on hiring decisions,” Tone = “Thoughtful,” Keywords = “racism, sexism, heterosexism, ageism, ADA, legal, adverse impact.” Rytr, https://rytr.me/

Rytr. (2021c, October 20). Blog section written in response to prompt: Title = “How ai writer may impact academic integrity and student paper grading,” Tone = “Critical,” Keywords = “ai writer, plagiarism, grading, academic integrity, university, quality.” Rytr, https://rytr.me/

Saluja, B., & Bryant, Z. (2021). How implicit bias contributes to racial disparities in maternal morbidity and mortality in the United States. Journal of Women’s Health, 30(2), 270–273. https://doi.org/10.1089/jwh.2020.8874
