For even a casual observer of Donald Trump’s electoral career, it’s clear that the former, and future, president has centered his political project on opposition to immigration. He has, again and again, made inflammatory statements about immigrants, many of which collapse under scrutiny.
In an election year in which immigration was a critical issue for voters and popular anti-immigrant sentiment grew, our question became: How can we understand Trump’s immigration rhetoric in its full scope and significance, and how might we similarly interrogate Vice President Kamala Harris’ language?
The Marshall Project set out to tackle this question ahead of the 2024 election. Focusing on immigration, an area of public discourse rife with falsehoods and an explicit cornerstone of Trump’s campaign, we decided to take a bird’s-eye view of each candidate’s comments on immigration over decades of public life available in Factba.se, a public database of presidential candidate statements.
Our goal was to process hundreds of thousands of lines of transcript text to pull out 1) how many of those statements were about immigration, 2) how many of those immigration statements were repetitions of the same idea and 3) how many of those repeated ideas were false.
Processing large amounts of information is a critical challenge in journalism. Until recently, a processing job of this scale would have been abandoned as impossible: No reporter can realistically read and categorize 10 million words; we roughly estimated it would take the average reader around 700 hours. Enter natural language processing.
NLP is the use of computers to understand, process and generate text. Techniques like topic modeling, classification and clustering are long-established in computer science, and have recently become more accessible in less technical fields through increasing computational resources and improved interfaces. These methods can vastly improve reporters’ ability to find and process the information they’re looking for.
We used NLP techniques to evaluate the scale and nature of Trump’s and Harris’ immigration rhetoric, which were starkly different due to their approaches to immigration and the differing lengths of their candidacies. After scraping over 350,000 lines of text from almost 4,000 Factba.se transcripts dating from 1976 to the end of September 2024, we filtered and grouped the statements into those made by each of the candidates, and used a binary classifier to identify over 12,000 of them that were about immigration.
From there, we used a clustering algorithm to create groups of similar claims. Reporters manually reviewed the results, combining some clusters and splitting up others, refining them into a final set of major claims about immigration. We tailored a binary classifier for each and ran it on the entire corpus, which gave us, for each claim, a set of statements likely to be making that claim. Finally, reporters manually reviewed each set of statements, confirming which ones were examples of that particular claim.
The result was a set of 13 claims rigorously checked by reporters, and a lower bound for the number of times Trump has made some variation of each one. That allowed us to show he has repeated some of the claims at least 500 times.
For example, Trump has referred to unauthorized immigrants as criminals at least 575 times, as snakes that bite at least 35 times, as coming from prisons, jails and mental institutions at least 560 times and as causing crime in sanctuary cities at least 185 times. He has described the construction of a wall on America’s southern border as essential to public safety at least 675 times, and has argued at least 50 times that mass deportations are acceptable because President Dwight Eisenhower did it. We found all of these claims to be either entirely false or, at the very least, highly misleading.
In this project, we used basic, trusted NLP methods to pull meaningful findings out of a mountain of text. And you can do it, too. By arming yourselves with NLP techniques to simplify large datasets into subsets that are more manageable for human review, and by using them to set lower bounds (such as “at least 50 times”), reporters like you can become more efficient without sacrificing accuracy.
We collaborated with Robert Flagg, a data scientist and father of Anna Flagg, one of the project’s reporters. He designed and developed code for the analysis with The Marshall Project, and provided expert guidance on NLP to reporters.
Here are some more details about how we did it:
Scraping
We needed the raw data, so our first step was to scrape speech transcripts for the candidates from Factba.se. We obtained permission from Factba.se before scraping.
Using Python and the Beautiful Soup and Selenium libraries, we pulled down a list of both candidates’ speeches, interviews and other available transcripts from the Factba.se search results page, along with the URLs of individual transcripts, which we also then scraped. Factba.se provides the transcripts conveniently broken up into small segments of text, usually one or two sentences, labeled by speaker. We counted each of those snippets of speech as one statement.
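To make the parsing step concrete, here is a minimal sketch in the spirit of what we did. The HTML below is hypothetical (Factba.se’s real markup differs, and Selenium supplied the rendered page source), but the Beautiful Soup pattern of turning speaker-labeled snippets into one record per statement is the same.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names and attributes here are invented
# for illustration and do not match Factba.se's actual HTML.
html = """
<div class="transcript">
  <div class="statement" data-speaker="Donald Trump" data-date="2024-09-10">
    We have to secure the border.
  </div>
  <div class="statement" data-speaker="Moderator" data-date="2024-09-10">
    Let's move to the next question.
  </div>
</div>
"""

def parse_statements(page_source):
    """Split a transcript page into one record per speaker-labeled snippet."""
    soup = BeautifulSoup(page_source, "html.parser")
    records = []
    for node in soup.select("div.statement"):
        records.append({
            "speaker": node["data-speaker"],
            "date": node["data-date"],
            "text": node.get_text(strip=True),
        })
    return records

statements = parse_statements(html)
```

Each record keeps the speaker and event metadata alongside the text, which is what lets later steps filter to statements made by a particular candidate.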
After scraping, the result was a dataset of public statements by the candidates, interviewers and other participants in the speaking events, as well as the date, location and other pieces of metadata about each event.
Classifying statements about immigration
Next, we needed to pull out all the statements related to immigration. We decided to use a binary classification model, a method of categorizing data into one of two groups, because the nature of our problem was to label each statement as one of two things: about immigration, or not about immigration.
This type of classifier works by learning patterns from an initial “training set” of labeled data, which it can then apply to labeling new data. So we needed some labeled data, and a significant amount of it. But we didn’t want to spend weeks having humans label sample material. Instead, we ran a subset of the data through the large language models (LLMs) GPT-4o mini and Claude 3.5 Haiku, which we prompted to label each statement as either about immigration or not.
To improve the accuracy of the LLM responses, we used Clue and Reasoning Prompting, a technique that requires the LLM to reason step by step by first generating a list of helpful clues, and then articulating a diagnostic reasoning process before making a judgment about whether the passage is or is not about immigration.
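A Clue and Reasoning prompt can be sketched roughly like this. The wording below is illustrative only, not the prompt we actually sent to GPT-4o mini or Claude 3.5 Haiku; the structure (clues first, then reasoning, then a verdict) is the point.

```python
# Illustrative Clue and Reasoning Prompting (CARP) template. The exact
# phrasing we used in production differed.
CARP_TEMPLATE = """You are labeling statements from political transcripts.

Statement: "{statement}"

Step 1. Clues: List words, phrases or context in the statement that hint
at whether it concerns immigration (for example, mentions of borders,
visas, deportation or asylum).

Step 2. Reasoning: Based on the clues, reason step by step about whether
the statement is actually about immigration.

Step 3. Verdict: Answer with exactly one word, IMMIGRATION or OTHER."""

def build_prompt(statement):
    """Fill the template with one transcript statement."""
    return CARP_TEMPLATE.format(statement=statement)

prompt = build_prompt("We are going to finish the wall on the southern border.")
```

Forcing the model to commit to a single-word verdict at the end makes the response easy to parse into a clean training label.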
Using the resulting labeled data as an initial training set, we fine-tuned a RoBERTa binary classifier, a state-of-the-art classification model. We ran the model on the remaining unlabeled data. When the model expressed low confidence in its answer, reporters manually reviewed and provided labels, added the resulting labeled data to the training set and trained the model again. We repeated this cycle several times to improve the model’s performance, a technique called active learning.
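The active-learning loop can be sketched as follows. This toy version swaps RoBERTa for a TF-IDF-plus-logistic-regression classifier from scikit-learn so it runs in seconds, and a keyword rule stands in for the human reviewer; the loop structure (train, score the unlabeled pool, route the least confident statement to a person, retrain) is the part that carries over.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny invented corpora for illustration; 1 = about immigration.
labeled_texts = [
    "we must secure the border now",
    "deportations will begin on day one",
    "our tax plan cuts rates for families",
    "the economy added jobs last quarter",
]
labels = [1, 1, 0, 0]
unlabeled = [
    "asylum seekers crossed the border",
    "interest rates remain high",
    "the wall will be finished",
    "schools need more funding",
]

vectorizer = TfidfVectorizer()
clf = LogisticRegression()

for _ in range(2):  # a couple of active-learning rounds
    X = vectorizer.fit_transform(labeled_texts)
    clf.fit(X, labels)
    proba = clf.predict_proba(vectorizer.transform(unlabeled))[:, 1]
    # Send the single least confident statement to a "human reviewer"
    idx = int(np.argmin(np.abs(proba - 0.5)))
    text = unlabeled.pop(idx)
    human_label = 1 if ("border" in text or "wall" in text) else 0
    labeled_texts.append(text)
    labels.append(human_label)
```

Each round grows the training set with exactly the examples the model found hardest, which is why active learning spends scarce human attention where it helps most.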
Clustering to identify major themes in immigration rhetoric
We hypothesized that many of the statements were repetitions of the same idea. So we needed a way to group together statements that were similar in meaning.
We turned to a common deep-learning tool called a transformer, which works by representing input data as high-dimensional vectors. Transformers were introduced in “Attention Is All You Need,” a seminal paper by Google researchers that became a key building block in the field. Here are some more details about transformers.
In our case, our input data was the statements. We used a sentence transformer to embed the statements in high dimensions, and the UMAP dimension reduction technique to create a simplified representation of each statement. We then clustered these into groups of related statements using DBSCAN.
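The embed, reduce and cluster steps fit together roughly like this. To keep the sketch dependency-light, TF-IDF stands in for the sentence transformer and scikit-learn’s TruncatedSVD stands in for UMAP; in practice the sentence-transformers and umap-learn libraries drop in at those same two points.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import DBSCAN

# Invented statements: two repeated ideas, three phrasings each.
statements = [
    "build the border wall",
    "finish the border wall",
    "complete the border wall",
    "cut middle class taxes",
    "lower middle class taxes",
    "reduce middle class taxes",
]

# Step 1: embed each statement as a vector (stand-in for a
# sentence-transformer embedding).
vectors = TfidfVectorizer().fit_transform(statements)

# Step 2: reduce to a low-dimensional representation (stand-in for UMAP).
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(vectors)

# Step 3: group nearby points into clusters of similar statements.
cluster_labels = DBSCAN(eps=0.3, min_samples=2,
                        metric="cosine").fit_predict(reduced)
```

DBSCAN is a useful choice here because it does not require fixing the number of clusters in advance and can leave one-off statements unassigned as noise.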
Human review
The goal of this review was to explore the universe of candidate statements about immigration, and to report out the major themes we observed and how often they were repeated. Our findings needed to be 100% reported by humans. All our language processing was to get to the stage where reporters could step in with their expertise.
Reporters read highlighted statements from each cluster. To aid this review, we again used an LLM, prompting it for a summary of each cluster based on its 10 most relevant statements as defined by the model’s reported level of confidence. We paired this information with WizMap, a tool used to visualize high-dimensional embeddings, which reporters used to see and explore the immigration statements.
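A hypothetical helper for that summary step might look like the following: pick a cluster’s highest-confidence statements and fold them into a prompt for the LLM. The function name and prompt wording are illustrative, not our production code.

```python
# Illustrative helper: build a cluster-summary prompt from statements
# paired with the model's confidence scores.
def summary_prompt(statements_with_scores, n=10):
    """Keep the n highest-confidence statements and format a prompt."""
    top = sorted(statements_with_scores, key=lambda s: s[1], reverse=True)[:n]
    bullets = "\n".join(f"- {text}" for text, _ in top)
    return ("Summarize, in one sentence, the claim these statements "
            "have in common:\n" + bullets)

prompt = summary_prompt([
    ("They are emptying their prisons into our country.", 0.97),
    ("Other countries send their prisoners here.", 0.91),
    ("Jails and mental institutions are being emptied out.", 0.88),
])
```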
Reporters combined some clusters and split apart others. The computer-aided work made this process far more efficient, quickly surfacing themes and patterns from an otherwise overwhelming amount of text.
Counting statements for each claim
Our final set of immigration claims in hand, we again trained binary classifiers, looking for statements that matched each claim. We used a process similar to the one before, fine-tuning each classifier with a set of statements labeled by an LLM and improved by human review.
Reporters then manually reviewed the statements returned by the classifier, sometimes amounting to hundreds of statements or more for a single claim. Any statement deemed not to strictly match the claim was thrown out. These false positives were more common for some claims than others, sometimes numbering in the hundreds.
The result was a comprehensive list of major repeated claims about immigration, drawn from the candidates’ catalogs of immigration-related statements. For each of the Trump claims that we fact-checked, we had a set of up to hundreds of instances, all confirmed by human reporters.
For example, the model surfaced a pattern in Trump’s speeches of citing a handful of isolated, tragic cases to allege that undocumented immigrants are killing Americans en masse. Reporters read all the statements sorted into that category, throwing out any false positives, and found that Trump had made this claim more than 235 times.
Reporting with natural language processing
In this project we used classifiers, LLMs and clustering to narrow a large dataset of text, using human reporters at strategic points to guide the process, and at the end producing a fully human-reported set of results.
We hope this work can be a useful reference for how reporting projects can use computers for something they’re good at (processing a lot of text) and humans for something they’re good at (providing nuanced editorial judgment).
Right now, reporters have a great opportunity to use trusted NLP methods as a powerful tool to both augment and speed up their work. By blending computer-aided techniques with traditional journalism, we’re in a better position than ever before to tackle reporting problems that involve vast amounts of information, without sacrificing accuracy.