Monday, January 26, 2026
Law And Order News
How We Recategorized The Justice Department’s Death In Custody Data


Under the terms of the Death in Custody Reporting Act, the Justice Department is required to collect information about everyone who dies in prisons and jails across the United States. The intention behind this effort is that, through the aggregation of details about people who die in the custody of thousands of law enforcement agencies, the resulting dataset could inform future life-saving policy changes.

Information about these deaths is usually not made public at a level of granularity that would provide insight into individual cases. However, due to what was likely a configuration error on a Department of Justice website, we obtained unprecedented access to the full, deanonymized dataset.

Through an analysis conducted earlier this year, we were able to show serious deficiencies within the dataset that fundamentally call into question its ability to provide accurate insights into trends in in-custody mortality.

This analysis is intended to rectify one of those problems in what, despite its myriad issues, is still likely the single best source for understanding in-custody death in the United States.

Each death listed in the dataset contains a “Manner of Death” field that displays a broad label of why someone died (like “Natural Causes” or “Suicide”), chosen from a list of eight pre-selected categories. Deaths also contain a “Brief Circumstances” field, which holds a free-text description of the death, ranging from a single word to several paragraphs.

We discovered frequent mismatches between what was described in an entry’s “Brief Circumstances” and the “Manner of Death” that would logically follow from that description. For example, several “Brief Circumstances” describing capital punishment were not marked as executions.

Our goal with this project was to use a feedback loop of advanced artificial intelligence applications and manual, human labeling to assign an accurate “Manner of Death” to each death in the dataset based on what is present in the “Brief Circumstances” field. The resulting analysis provides both a clearer view of high-level trends in how people are dying in custody and an assessment of how frequently the “Manner of Death” field in the dataset did not align with the other information provided.

What is the data?

In November 2024, a page on the website of the Bureau of Justice Assistance, the office within the Justice Department managing the in-custody deaths data collection, displayed several tables showing high-level summarizations of the data, such as counts of deaths by location type and manner.

While the tables did not display this information at the individual level, it was possible to click through a series of menus to make the visualization tool display the full, unredacted dataset. We downloaded that dataset on Nov. 20, 2024.

Justice Department officials did not respond when we asked if this exposure was intentional. However, shortly after we downloaded the data, the website was reconfigured to make subsequent downloads of the full dataset impossible.

The data we downloaded contained information about 25,393 deaths that occurred in prisons, jails, community corrections programs, and while law enforcement officers were making arrests, stretching from Oct. 1, 2019 through Sept. 30, 2023.

An initial analysis of the dataset that we published this year showed several systemic problems.

Over 680 people who we know died in law enforcement custody, based on information collected by advocacy groups or through media reports, were missing. For comparison, we reviewed a list of 1,847 known deaths, largely concentrated in Louisiana, Alabama and South Carolina.

When deaths were listed, the descriptions were often inadequate. A random sample of roughly 1,000 deaths found that, in over 75% of cases, the “Brief Circumstances” did not meet the Bureau of Justice Assistance’s own standard for completeness.

For more information about how The Marshall Project acquired the dataset and what it contains, please refer to our earlier post on the data analysis. The columns used for this analysis were “Manner of Death” and “Brief Circumstances.”

Analysis

Selecting records

The dataset we downloaded from the Bureau of Justice Assistance website included not only people who died in prisons and jails, but also people who died while being arrested by police officers or sheriff’s deputies, as well as people who died while participating in community corrections programs, such as in a halfway house.

Since we wanted our analysis to focus solely on prisons and jails, we omitted the 3,716 deaths that were labeled as occurring during arrest, in community corrections, or where the location was unknown. That left us with 21,675 entries.

Broad categorization clustering

We started by clustering on the “Brief Circumstances” column, grouping text strings by their similarity to one another in order to reveal general trends in the dataset.

Embeddings turn clauses or sentences into meaningful vector representations in semantic space (i.e., two clauses with similar meanings should sit in roughly the same area of the vector space). Two “Brief Circumstances” with roughly similar meanings, like “heart attack” and “cardiac arrest,” should be close in distance in the vector space. We used OpenAI’s “text-embedding-3-large” embedding model to convert every entry in the “Brief Circumstances” column into a vector with a length of 3,072.

OpenAI’s model does not train on the data we entered into its system.

Next, we clustered these vectors in multi-dimensional space to understand the data’s shape and roughly how many clusters we should create. For this task, we used the Uniform Manifold Approximation and Projection (UMAP) technique to reduce the dimensionality of our space and then the HDBSCAN clustering algorithm to label each row in the dataset with a cluster assigned by the algorithm. To evaluate the distance between any two vectors, which is necessary for clustering, we used HDBSCAN’s cosine similarity metric.

We chose HDBSCAN over a k-means clustering algorithm because we were not initially sure how many clusters were ideal for the dataset, and k-means requires that number to be defined up front. HDBSCAN is able to determine an ideal number of clusters on its own, based on the overall shape and distribution of the vectors within the dataset.

The algorithm identified roughly 20 to 30 clusters, depending on the parameter selections for the minimum cluster size and the number of neighbors expected. We then ran TF-IDF on all the “Brief Circumstances” for each cluster in order to label each cluster with a potential title. Some examples of topic labels were “Cardiopulmonary Arrest / Cardiovascular Disease / Failure” and “Fentanyl Toxicity / Fentanyl Intoxication / Fentanyl.”
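
The labeling step can be illustrated with a minimal, dependency-free sketch. It is not the code used for the analysis: it assumes each cluster's entries are pooled into one "document," uses a plain TF-IDF formula, and runs on made-up example entries. The function name `top_terms_per_cluster` is hypothetical.

```python
import math
import re
from collections import Counter


def top_terms_per_cluster(texts_by_cluster, n_terms=3):
    """Return the n_terms highest-scoring TF-IDF terms for each cluster.

    Assumption: all texts in a cluster are pooled into one "document", so a
    term scores highly when it is frequent in one cluster and rare in others.
    """
    # Term counts per cluster, using lowercased alphabetic tokens.
    counts = {
        cluster: Counter(re.findall(r"[a-z]+", " ".join(texts).lower()))
        for cluster, texts in texts_by_cluster.items()
    }
    n_clusters = len(counts)
    # Document frequency: the number of clusters in which each term appears.
    doc_freq = Counter(term for c in counts.values() for term in c)

    labels = {}
    for cluster, term_counts in counts.items():
        total = sum(term_counts.values())
        scores = {
            term: (count / total) * math.log(n_clusters / doc_freq[term])
            for term, count in term_counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels[cluster] = ranked[:n_terms]
    return labels


# Hypothetical entries for illustration, not taken from the dataset.
example = {
    0: ["cardiac arrest", "cardiac arrest due to heart disease", "heart failure"],
    1: ["fentanyl toxicity", "fentanyl intoxication", "acute fentanyl toxicity"],
}
labels = top_terms_per_cluster(example)
print(labels)
```

Joining the top terms with slashes would give candidate titles in the same style as the topic labels quoted above. A term that appears in every cluster scores zero under this formula, which is what keeps generic words out of the titles.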

We manually reviewed the deaths assigned to each of the clusters, reading through “Brief Circumstances” to determine whether each death belonged in the cluster to which HDBSCAN had assigned it.

Once we had manually gone through each cluster, we froze the entries that the human reviewer had decided belonged in each cluster and then re-clustered the unfrozen entries in order to get a better understanding of the possible titles for each cluster. We also tried to remove from the clustering cycles as many entries as possible that indicated unknown causes of death or that were still pending autopsy results before a cause-of-death determination could be made.

This step was made more difficult by the multiple misspellings of the word “unknown.”
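
One way to catch such misspellings is fuzzy string matching. The sketch below is only an illustration of the idea, not the method used in the analysis: the helper `looks_unknown`, the 0.85 cutoff and the sample strings are all assumptions. It relies on `difflib.SequenceMatcher` from the Python standard library.

```python
import difflib
import re


def looks_unknown(text, cutoff=0.85):
    """Return True if any word in text closely resembles 'unknown'.

    The cutoff is an assumed value, chosen so that common transpositions
    and dropped letters match while the word 'known' does not.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return any(
        difflib.SequenceMatcher(None, word, "unknown").ratio() >= cutoff
        for word in words
    )


# Hypothetical entries for illustration.
print(looks_unknown("Cause of death unkown"))        # misspelling is matched
print(looks_unknown("cause known: cardiac arrest"))  # 'known' is not matched
```

Any threshold of this kind trades missed misspellings against false matches, so borderline hits would still need a human check.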

While each round helped our understanding of the clusters, we were still left with a rather broad category of “-1”, representing HDBSCAN’s reject pile of unclassified entries, which were not used in assigning cluster labels.

This process resulted in a list of named clusters into which we could then sort the deaths in the dataset. We mapped each of the clusters we had created onto the original “Manner of Death” categories defined by the Justice Department.

Zero-shot classification

Next, we used the same OpenAI embedding model to do “zero-shot classification,” which forced each of the “Brief Circumstances” into one of the clusters we defined. We did this by calculating the cosine similarity between the vector of a given “Brief Circumstance” and the vector for each of the clusters, then selecting the cluster with the highest score.
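
The selection rule itself is simple enough to sketch in a few lines. This is a toy illustration rather than the code used for the analysis: the three-dimensional vectors stand in for the 3,072-dimensional embeddings, the cluster names are invented, and how each cluster's vector was produced is not specified here, so the sketch simply takes those vectors as given.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(entry_vector, cluster_vectors):
    """Return (label, score) for the cluster most similar to the entry."""
    scores = {
        label: cosine_similarity(entry_vector, vector)
        for label, vector in cluster_vectors.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores[best_label]


# Toy vectors and hypothetical cluster names, for illustration only.
cluster_vectors = {
    "Cardiac": [0.9, 0.1, 0.0],
    "Overdose": [0.1, 0.9, 0.1],
    "Suicide": [0.0, 0.2, 0.9],
}
label, score = classify([0.8, 0.2, 0.1], cluster_vectors)
print(label, round(score, 3))
```

Because every entry is assigned to its nearest cluster no matter how weak the match is, the score needs to be kept alongside the label so that poor matches can be reviewed.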

This process generated a spreadsheet containing each “Brief Circumstance,” its predicted cluster, the cosine similarity score between the entry and its assigned cluster, its corrected manner of death, and several flags for human review.

We added a flag calling for a human to review an entry if:

The cosine similarity score was low (less than 0.3), indicating a weak match.

The gap between the cluster with the highest similarity score and the cluster with the second-highest similarity score was under 0.02, indicating that either of those two clusters could potentially be a good match for the description in the brief circumstances.

There were no overlapping words between the predicted label and the brief circumstances.
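
The three rules can be written as a small function. The thresholds 0.3 and 0.02 come from the text; everything else in this sketch is an assumption made for illustration, including the function names, the word-splitting rule and the choice to treat any single flag as enough to request review.

```python
import re

LOW_SCORE = 0.3    # threshold stated in the text
NARROW_GAP = 0.02  # threshold stated in the text


def review_flags(best_score, second_score, predicted_label, brief_circumstances):
    """Compute the three review flags for one entry."""

    def words(text):
        # Assumed tokenization: lowercase alphabetic words.
        return set(re.findall(r"[a-z]+", text.lower()))

    return {
        "low_score": best_score < LOW_SCORE,
        "narrow_gap": (best_score - second_score) < NARROW_GAP,
        "no_overlap": not (words(predicted_label) & words(brief_circumstances)),
    }


def needs_review(flags):
    """Assumption: any raised flag sends the entry to a human reviewer."""
    return any(flags.values())


# Hypothetical entries for illustration.
weak = review_flags(0.25, 0.24, "Fentanyl Toxicity", "found unresponsive in cell")
strong = review_flags(0.62, 0.41, "Cardiac Arrest", "Cardiac arrest at hospital")
print(needs_review(weak), needs_review(strong))
```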

A low similarity score and a lack of overlapping words, for example, were a sign to a human reviewer that the categorization might not be accurate.

Deaths in the dataset that were flagged in this manner received a human review for potential reclassification.

Entries with insufficient descriptions, like “Gunshot wound to the chest,” which could conceivably be placed in multiple categories (such as homicide, use of force by law enforcement or suicide), were left with their original “Manner of Death” characterization.

Counting

Once we felt confident that all entries were classified correctly, we used the Python library pandas to group entries, count the number in each cluster and update the “Manner of Death” category as necessary.
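
The group-and-count step, along with the share of entries whose category changed, can be sketched as follows. To stay free of dependencies, this sketch swaps pandas for `collections.Counter` from the standard library, so it is a stand-in for the same tally rather than the pandas code itself. The rows and their category pairings are hypothetical.

```python
from collections import Counter

# Hypothetical rows: (original manner of death, corrected manner of death).
rows = [
    ("Natural Causes", "Natural Causes"),
    ("Natural Causes", "Execution"),
    ("Natural Causes", "Suicide"),
    ("Suicide", "Suicide"),
]

# Number of deaths in each corrected category.
corrected_counts = Counter(corrected for _, corrected in rows)

# Share of entries whose original label did not match the corrected one.
changed = sum(1 for original, corrected in rows if original != corrected)
mismatch_rate = changed / len(rows)

print(dict(corrected_counts), mismatch_rate)
```

In pandas, the equivalent tally would typically be a `value_counts()` call on the corrected column.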

Limitations

There are several limitations to our analysis that could potentially affect how accurately our results reflect the scope of how people are dying in America’s prisons and jails:

Our prior reporting on the issue revealed that there are many deaths missing from this dataset. We were not able to determine if these missing deaths were distributed randomly across the ways in which people died, because we did not have a more complete dataset of in-custody deaths for comparison. If deaths of a certain type were systematically underreported, our results could be skewed.

Our classifications are dependent on the quality of the “Brief Circumstances” entries, which vary widely in the level of detail from one description to the next. In addition, some could be intentionally written to obscure details about the death that could be embarrassing or incriminating for the agency holding the incarcerated person at the time of their death.

Around 8,600 rows indicated the cause of death was unknown due to a pending autopsy or toxicology report, and the information was never subsequently updated. If certain manners of death typically took longer than others to determine, our results could be undercounting those types of deaths.

California, which has the second-largest number of incarcerated people of any state in the country, does not list “Brief Circumstances” for any deaths, due to state privacy rules. We classified all of those deaths as unknown in their “Manner of Death” because it is impossible for us, or anyone using the federal data, to assess the credibility of determinations based on information submitted by the state.

How to work with us

We have decided not to publicly release the complete list of names, due to privacy considerations for the families of incarcerated individuals. However, if you are a journalist or researcher interested in reporting on, or researching, deaths in custody using this dataset, please fill out this form.

If you’re interested in learning more about reporting on in-custody deaths, check out our guide for journalists, published as part of The Marshall Project’s Investigate This series.

Acknowledgements

Thanks to Jeff Kao at Bloomberg for advising on the technical aspects of the clustering analysis.



Copyright © 2024 Law And Order News.
Law And Order News is not responsible for the content of external sites.
