‘Vibe Coding’ my way into a RAG pipeline

LLMs and RAG

For most people who are up-to-date in tech, large language models (LLMs) are nothing new. In fact, they are downright pervasive. One of the largest challenges with LLMs today is that they are very good at coming up with plausible-sounding answers, but less good at telling us why or where they got that answer from. It’s often good practice to cast a critical eye at LLM output and determine whether or not it is real or just a hallucination. These concerns have led to suggestions that we can augment a user’s prompt with a set of reference documents that can be used to verify the output of the LLM, or to improve the prompt itself. For example, imagine we wrote a prompt asking an LLM about parking restrictions in New York State:

Q: Is it permitted to park between roadways on a highway according to New York State law?

A: According to New York State law, parking between roadways on a highway is not permitted, except where official signs indicate otherwise. Specifically, Section 1200 of the New York Vehicle and Traffic Law states that parking is prohibited in areas such as medians, between divided roadways, and other restricted zones unless otherwise posted.

versus:

Q: Is it permitted to park between roadways on a highway according to New York State law? Review the following rules from the New York Vehicle and Traffic Law. Provide direct quotes to support your answer: [full text of Vehicle & Traffic Law Section 1202]

A: No, it is not permitted according to New York Vehicle and Traffic Law. The law explicitly states:

“No person shall stop, stand or park a vehicle… in the area between roadways of a divided highway, including crossovers, except in an emergency.”

This is found under Section 1(i) of the provided rules. The only exception is in the case of an emergency.

This is a bit of a contrived example, but the general idea is that we can improve the LLM’s performance by providing relevant external information alongside the standard instructions. Including this extra information helps to guard against hallucinations, and also gives the user more guidance on why the LLM came to its specific answer. In the example above, the LLM is technically correct in the first answer that parking is not permitted, but I think it invents a rule about official signs allowing otherwise. This is probably speculation on the part of the LLM. In the prompt containing the full text of the relevant set of rules, we obtain a shorter, cleaner response with the precise rule relevant to the question, including the citation to the exact section and subsection.

Retrieval-augmented generation

The example I’ve described above is essentially RAG, or retrieval-augmented generation. RAG is another one of those buzzwords that enters the conversation every time “AI” comes up. I wrote this blog post, in part, to demystify it for my readers (all 3 of you) and myself. When you boil down the workflow, RAG is essentially an additional step of retrieving relevant information from a database or document store before generating a response. Instead of relying solely on a model’s pre-trained knowledge, the RAG step pulls in context-specific data relevant to the question. This approach is particularly useful when dealing with rapidly changing information or domain-specific knowledge.
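To make that workflow concrete, here is a minimal sketch of the retrieve-then-augment step. Everything in it is an assumption for illustration: `retrieve`, `augment_prompt`, and the tiny `documents` store are hypothetical names, and the keyword lookup is only a placeholder for a real search over a document store.

```python
def retrieve(question: str, documents: dict[str, str]) -> list[str]:
    """Hypothetical retriever: return documents whose topic word appears in the question."""
    lowered = question.lower()
    return [text for topic, text in documents.items() if topic.lower() in lowered]


def augment_prompt(question: str, references: list[str]) -> str:
    """Append retrieved reference text to the question before it is sent to an LLM."""
    if not references:
        # Nothing retrieved: fall back to the plain question
        return question
    joined = "\n\n".join(references)
    return (
        f"{question}\n\n"
        "Review the following reference material. "
        "Provide direct quotes to support your answer:\n\n"
        f"{joined}"
    )


# Placeholder document store (the bracketed text stands in for real rule text)
documents = {
    "parking": "[text of the relevant parking rules]",
    "speeding": "[text of the relevant speeding rules]",
}
question = "Is parking between roadways on a highway permitted?"
prompt = augment_prompt(question, retrieve(question, documents))
print(prompt)
```

The generation step is unchanged: the LLM simply receives `prompt` instead of the bare question.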

Coding Out a RAG Pipeline

In a recent blog post I walked through a step-by-step process of how to set up an A/B testing process for prompt refinement. I relied on data from a recent DrivenData competition that used youth suicide narrative reports from the CDC’s National Violent Death Reporting System. I was pretty happy with the workflow I built out, but couldn’t help but feel that I could improve it somehow. The NVDRS has a nearly 300-page coding manual with detailed instructions for each variable type. For example, section 5.5.9 describes the rules of how to code a death that is the result of a gang-related crime:

5.5.9 Gang-related: CME/LE_GangRelated
Definition
Definitions for gang-related homicide can vary by law enforcement agency or CME and tend to capture
deaths that are classified as gang-motivated (i.e., the motive of the incident was gang-related) or had
suspected involvement of a gang member (i.e., a gang member was a suspect or victim in the incident).
This variable captures both types of gang-related deaths reported by agencies.

Response Options
- 0 No, Not available, Unknown
- 1 Yes, gang motivated
- 2 Yes, suspected gang member involvement
- 3 Yes, gang-related not otherwise specified
- 4 Organized crime including motorcycle gangs, mafia, and drug cartels

There are nearly a hundred such rules in this manual. However, it doesn’t really make sense to try and pass the entire manual into our prompt. Even if the context window could handle the full size, when you are paying for tokens it makes sense to minimize the prompt to only the necessary bits. So what if we just wanted to augment each prompt with only the references that are relevant for the narrative being coded? This is where RAG comes in!

Envisioning a RAG pipeline

The way I envisioned this working was to process the RAG step separately by first indexing the relevant sub-sections from section 5 of the NVDRS coding manual, which contains all information about incident circumstances. I would extract the subsection chunks and then index them in a vector database for retrieval at the time of prompt creation. My prompt creator class already adds the headers, instructions, and questions to the final prompt, so all we need to add is the additional step of tacking on the relevant rules from the vector database (see below):

flowchart LR
    %% Improved node styling
    classDef input fill:#c4e3f3,stroke:#5bc0de,stroke-width:2px,color:#31708f
    classDef process fill:#d9edf7,stroke:#5bc0de,stroke-width:2px,color:#31708f
    classDef database fill:#dff0d8,stroke:#5cb85c,stroke-width:2px,color:#3c763d
    classDef output fill:#fcf8e3,stroke:#f0ad4e,stroke-width:2px,color:#8a6d3b
    
    %% Main components with better descriptions
    A["NVDRS Manual<br/>(Source Document)"] -->|"Reference material"| B
    B["RAG Model<br/>(Retrieval System)"] --> D
    C["Narrative Text<br/>(Case Information)"] -->|"Contains: '...victim felt depressed..'"| D
    C --> E
    
    %% Database and outputs
    D[("Vector Database<br/>(Knowledge Store)")] -->|"Retrieved: '5.3.4 Current depressed mood:'"| F
    E["Prompt Creator<br/>(Question Generator)"] -->|"Generates: Q1, Q2, Q3"| F
    
    %% Final output
    F["Final Prompt<br/>(For LLM Processing)"]
    
    %% Apply styles
    class A,C input
    class B,E process
    class D database
    class F output

In my mind, I figured I could come up with a quick and dirty solution by using regex to hit on key words in each narrative, and then use a semantic similarity model (like SentenceTransformers) to retrieve the top \(n\) rules from the vector database. For example, a narrative might have a section stating:

“Victim had been feeling depressed and sad in the days leading up to the incident”

Using regex we can grab the relevant words around our matched word (here, depressed), encode them, and then retrieve rules from the vector database. In the last step we append these to our prompt before executing it.

There’s just one problem - I’ve never done this before.

Vibe-Coding

What is “vibe coding”? One of my favorite definitions comes from OpenAI co-founder Andrej Karpathy:

“There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists”

In short, it represents a programmer’s full surrender to the LLM, and taking what it gives back on good faith. When problems arrive, you just dig deeper and let the LLM guide you even further down the rabbit hole, trusting the process. I think the term is very funny, but there is a bit of truth to this. “Vibe-coding” is sort of what I used to do early in grad school when I was trying to get some esoteric model running in R with virtually no background knowledge. To me, vibe-coding harkens back to the days of panicked copy-and-paste from a variety of Stack Overflow posts.

With this in mind, I believe in sharing my work. Here’s the full conversation I used to set up the RAG framework. I had enough of an idea of what I wanted, but wanted to speed up the process of generating the code required to get the document chunking and indexing working.

Testing the RAG Process

So what did all that get us? Well, with the help of Claude we got a set of four functions that¹:

Extract the relevant pages from the coding manual.
Chunk up the pages into subsections based on headers.
Encode these chunks using a SentenceTransformers model.
Save the embedded chunks and the section indices in a vector database.

as well as two others:

A function to query and retrieve results from the vector database.
A function to append the results into a prompt-friendly text object.
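The chunking step is the one that depends most on the document's layout, so here is a rough sketch of how splitting on subsection headers could look. This is not the code in src/rag: `chunk_by_headers` is a hypothetical name, and the header pattern (a three-part section number such as 5.3.4 at the start of a line) is my assumption based on the manual excerpts shown in this post.

```python
import re

# Assumed header shape: "5.3.4 Current depressed mood: CME/LE_DepressedMood"
HEADER = re.compile(r"^(\d+\.\d+\.\d+)\s+\S.*$", re.MULTILINE)


def chunk_by_headers(text: str) -> list[dict]:
    """Split manual text into one chunk per numbered subsection header."""
    matches = list(HEADER.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # A chunk runs from its header to the start of the next header (or end of text)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({"section": match.group(1), "text": text[match.start():end].strip()})
    return chunks


sample = (
    "5.3.4 Current depressed mood: CME/LE_DepressedMood\n"
    "Definition:\n"
    "Victim was perceived by self or others to be depressed at the time of the injury.\n"
    "5.7.6 Left a suicide note: CME/LE_SuicideNote\n"
    "Definition:\n"
    "Victim left a suicide note (or other recorded communication).\n"
)
chunks = chunk_by_headers(sample)
print([chunk["section"] for chunk in chunks])
```

Any text before the first header is dropped in this sketch, which is fine for a page range that starts on a section boundary but worth checking otherwise.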

I took the LLM-generated code and made some adjustments (maybe 10-15% or less) and then put them into their own .py file under my src folder. I then created a separate file index_rules.py to perform all the steps and locally store the vector database in a cache folder:

"Code to index rules from the NVDRS and store as vector store in cache"

from pypdf import PdfReader
from src.rag import (
    extract_pages,
    chunk_by_subsections_with_codes,
    encode_chunks,
    create_vector_store,
)

# import the full nvdrs coding manual
# we only need a subset of pages on circumstances
# page 74 - 149
page_min = 74
page_max = 148
cache_dir = "cache/"

reader = PdfReader("reference/nvdrsCodingManual.pdf")

# extract pages, chunk subsections, then store in cache

pages_circumstances = extract_pages(reader, page_min, page_max)
section_circumstances = chunk_by_subsections_with_codes(pages_circumstances)
section_embeddings = encode_chunks(section_circumstances)
index, stored_chunks = create_vector_store(section_embeddings, cache_dir)

With that done, the other adjustment I needed to make was to add the ability to query the vector database and return relevant coding rules based on matching key words in the narrative. What I did was set up a dict containing key words for each major question, and a query term to append to the retrieved text substring. So, for example, given a narrative like this:

“Victim was at home and complained about feeling sad and depressed. Victim had been treated for ADHD and bipolar disorder and had reportedly not been taking his medications in the days preceding”

We would use a regex pattern to match 30 characters on either side of a matching keyword in the dict of keywords (one selected here below):

keyterms = {
    "DepressedMood": {
        "Terms": [
            "depressed",
            "depressive symptoms",
            "sad",
            "unhappy",
            "low mood",
            "feeling down",
            "persistent sadness",
            "major depression",
            "melancholy",
            "hopeless",
            "despair",
            "gloomy",
            "emotional distress",
            "tearful",
            "loss of interest",
            "worthlessness",
            "self-loathing",
        ],
        "Query": "Coding rules for DepressedMood",
    },
    # ... entries for the other variables omitted
}

We then just loop through the dict of keywords and collect all the hits.
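Here is a minimal sketch of that keyword loop, assuming a plain `re` search with a 30-character window on either side of each hit. The function names `find_keyword_context` and `collect_hits` are hypothetical, and the trimmed-down `example_keyterms` dict (including the SuicideNote terms) is made up for the example; the real code may merge or de-duplicate overlapping hits differently.

```python
import re


def find_keyword_context(narrative: str, terms: list[str], window: int = 30) -> list[str]:
    """Return the text surrounding each keyword hit, `window` characters on either side."""
    # Longest terms first so multi-word phrases win over their shorter prefixes
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(narrative):
        start = max(match.start() - window, 0)
        end = min(match.end() + window, len(narrative))
        snippets.append(narrative[start:end].strip())
    return snippets


def collect_hits(narrative: str, keyterms: dict) -> dict:
    """Loop over every variable's terms and keep only the variables with at least one hit."""
    hits = {}
    for variable, spec in keyterms.items():
        snippets = find_keyword_context(narrative, spec["Terms"])
        if snippets:
            hits[variable] = snippets
    return hits


# Trimmed-down, illustrative version of the keyterms dict
example_keyterms = {
    "DepressedMood": {"Terms": ["depressed", "sad"], "Query": "Coding rules for DepressedMood"},
    "SuicideNote": {"Terms": ["suicide note", "note"], "Query": "Coding rules for SuicideNote"},
}
example_narrative = (
    "Victim was at home and complained about feeling sad and depressed. "
    "Victim had told his partner that he was thinking about taking his own life."
)
hits = collect_hits(example_narrative, example_keyterms)
print(hits)
```

In this sketch "sad" and "depressed" each produce their own snippet even though the two windows overlap.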

To illustrate: passing this example narrative into the search_vector_database function searches for all regex hits, encodes the matching narrative text, and then queries it against the vector database. The results from the vector database search are then passed into another function that prepares them for insertion into the prompt. The create_prompt_rules function adds a header for the coding rules section, and organizes the rules in order of section header. The code below shows a successful retrieval for the DepressedMood variable:

test_narrative = "Victim was at home and complained about feeling sad and depressed. Victim had told his partner that he was thinking about taking his own life."

val, matched_variables = search_vector_database(test_narrative, 1, "cache/rules_index.faiss", "cache/rule_chunks.pkl")
PROMPT_RULES = create_prompt_rules(val, matched_variables)

print(PROMPT_RULES)


If present, use the following rules to guide your coding of variables. Closely follow these instructions:
    - Apply ONLY the rules relevant to the question
    - If a rule is not relevant to the question, disregard it entirely
    - Do NOT try and apply rules to questions where they are not closely relevant


## RULES FOR DepressedMood:
Evidence found: "and complained about feeling sad and depressed. Victim had tol"

RULE 1 [Section 5.3.4]:
5.3.4 Current depressed mood: CME/LE_DepressedMood 
 
Definition:  
Victim was perceived by self or others to be depressed at the time of the injury. 
 
Response Options: 
0 No, Not Available, Unknown 
1 Yes 
 
Discussion: 
Only code this variable when the victim had a depressed mood at the time of injury. There does NOT 
need to be a clinical diagnosis, and there does not need to be any indication that the depression directly 
contributed to the death. Other words that can trigger coding this variable besides “depressed” are sad, 
despondent, down, blue, low, unhappy, etc. Words that should not trigger coding this variable are 
agitated, angry, mad, anxious, overwrought, etc. 
 
 If the victim has a known clinical history of depression but had no depressive symptoms at the time 
of the incident, this variable should NOT be selected. 
 Depressed mood should not be inferred by the coder based on the circumstances (e.g., because the 
person reports a bankruptcy); rather it must be noted in the record. 
 
Manner of Death: All manners.
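To show the idea behind the top-n lookup without any dependencies, here is a toy sketch of similarity-based retrieval. It is a deliberate stand-in and not what the pipeline runs: instead of SentenceTransformers embeddings stored in a vector index, it uses word counts and cosine similarity, and the names `toy_embed`, `cosine_similarity`, and `top_n_rules` are hypothetical. The rule texts are shortened paraphrases of the manual sections quoted in this post.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts instead of a dense sentence vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_n_rules(query: str, chunks: list[dict], n: int = 1) -> list[dict]:
    """Rank every rule chunk against the query and keep the n most similar."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk["text"])),
        reverse=True,
    )
    return ranked[:n]


rules = [
    {"section": "5.3.4", "text": "Current depressed mood: victim was perceived by self or others to be depressed."},
    {"section": "5.7.6", "text": "Left a suicide note: note can be written or electronic."},
]
evidence = "and complained about feeling sad and depressed. Victim had tol"
# Assumed query shape: the variable's query term followed by the matched narrative text
best = top_n_rules("Coding rules for DepressedMood " + evidence, rules, n=1)
print(best[0]["section"])
```

A dense sentence encoder should handle paraphrases ("feeling blue", "despondent") that simple word overlap misses, which is the main reason to prefer real embeddings here.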

Adding it All Together

Now that I had the LLM stuff mostly incorporated, all I needed to do was append this new RAG workflow to my old LLM class. I added an extra parameter named include_rag that triggers the RAG process and appends the retrieved rules to the prompt if specified by the user:

def standard_prompt_caching(
        self,
        header: str | list = None,
        narrative: str | list = None,
        body: str | list = None,
        example_output: str | list = None,
        footer: str | list = None,
        include_rag: bool | list = False,
        **kwargs
    ) -> list:
        """Create multiple standard prompts based on all combinations of list elements.
        This puts the narrative at the end to support OpenAI prompt caching.
        """

        # Optionally retrieve the coding rules that match this narrative
        if include_rag:
            val, matched_variables = search_vector_database(
                narrative,
                2,
                "cache/rules_index.faiss",
                "cache/rule_chunks.pkl",
            )
            rag = create_prompt_rules(val, matched_variables)
            params = [body, example_output, rag, footer, header, narrative]
        else:
            params = [body, example_output, footer, header, narrative]

        # Ensure all inputs are lists for consistent iteration
        param_lists = [
            [item] if not isinstance(item, list) else item for item in params
        ]
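The excerpt above stops before the combination step, so here is one way the "all combinations" part could work, sketched with `itertools.product`. This is an assumption about the approach rather than the actual method body: `combine_prompt_parts` is a hypothetical helper, and skipping `None` parts and joining with blank lines are my choices for the example.

```python
from itertools import product


def combine_prompt_parts(*parts) -> list[str]:
    """Build one prompt per combination of the supplied parts, keeping their order."""
    # Wrap single values in a list and drop parts that were not provided
    param_lists = [
        part if isinstance(part, list) else [part] for part in parts if part is not None
    ]
    return ["\n\n".join(combo) for combo in product(*param_lists)]


# Two body variants, one example block, no footer, and the narrative last
prompts = combine_prompt_parts(["BODY A", "BODY B"], "EXAMPLE", None, "NARRATIVE")
print(len(prompts))
```

Keeping the narrative as the last part means every prompt for a given version shares the same long prefix, which is what prompt caching relies on.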

The final result looks like this, which is structurally almost identical to the non-RAG version I did before. In fact, the only real change is adding include_rag = True to the LLM class parameters. This is all the code that processes the queries and passes them on to the OpenAI API:


import pandas as pd
import json
from datetime import datetime

from openai import OpenAI
from src.prompts import HEADER1, BODY2, EXAMPLE_OUTPUT2
from src.prompt_creation import Prompt

client = OpenAI()
run_date = datetime.now().strftime("%Y-%m-%d")

# set up prompt
prompt_creator = Prompt()
ROLE = """You are a mental health expert reviewing law enforcement narratives of youth suicide incidents. 
Your task is to label variables relating to the incident. Closely review the following instructions. Read 
the provided narrative and then add labels corresponding to the variables into the described JSON format. 
Do NOT deviate from the instructions. If coding rules are present you may use them to guide your analysis. 
Do NOT rely solely on the rules.
"""

# load suicide narratives and labels
narratives = pd.read_csv("data/train_narratives_sample_200.csv")
labels = pd.read_csv("data/train_labels_sample_200.csv")


# Execute 1 version of prompt with RAG
json_list = []

for _, single_narrative in narratives.iterrows():

    # grab the unique id and text
    id = single_narrative["uid"]
    txt = single_narrative["NarrativeLE"] + single_narrative["NarrativeCME"]

    prompt_input = {
        "header": HEADER1,
        "narrative": txt,
        "body": BODY2,
        "example_output": EXAMPLE_OUTPUT2,
        "footer": None,
        "include_rag": True
    }

    # create a prompt, pass in the text narrative
    prompt_versions = prompt_creator.standard_prompt_caching(**prompt_input)

    version_num = 0
    for prompt in prompt_versions:
        # now append to list
        json_list.append(
            {
                "custom_id": f"{id}_{version_num}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",
                    "messages": [
                        {"role": "system", "content": ROLE},
                        {"role": "user", "content": prompt},
                    ],
                    "max_tokens": 500,
                    "response_format": { "type": "json_object" },
                },
            }
        )
        version_num += 1


with open(f"json/output_{run_date}.jsonl", "w") as outfile:
    for entry in json_list:
        json.dump(entry, outfile)
        outfile.write("\n")

# upload batch to openai (the context manager closes the file handle afterwards)
with open(f"json/output_{run_date}.jsonl", "rb") as infile:
    batch_input_file = client.files.create(file=infile, purpose="batch")

batch_input_file_id = batch_input_file.id
client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "Batch Testing 1 prompt x 200 examples with caching and RAG"},
)
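Once the batch finishes, the results come back as another JSONL file. Here is a small sketch of reading one result line back into labels. I am assuming each line carries the `custom_id` alongside the chat completion under `response.body`, which is my understanding of the Batch API output layout; `parse_batch_line` is a hypothetical helper and `example_line` is a hand-written record, not real output.

```python
import json


def parse_batch_line(line: str) -> tuple[str, int, dict]:
    """Split custom_id back into (uid, version) and decode the model's JSON labels."""
    record = json.loads(line)
    # custom_id was built as f"{id}_{version_num}", so split on the last underscore
    uid, version = record["custom_id"].rsplit("_", 1)
    content = record["response"]["body"]["choices"][0]["message"]["content"]
    return uid, int(version), json.loads(content)


# Hand-written example record shaped like a batch output line
example_line = json.dumps(
    {
        "custom_id": "abcd_0",
        "response": {
            "status_code": 200,
            "body": {
                "choices": [
                    {"message": {"role": "assistant", "content": '{"DepressedMood": 1, "SuicideNote": 0}'}}
                ]
            },
        },
        "error": None,
    }
)
uid, version, predicted = parse_batch_line(example_line)
print(uid, version, predicted)
```

From there the predicted labels can be joined to the training labels on `uid` to score each prompt version.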

Reviewing the output

Finally, we can review the input and output from this. Here is the full un-edited prompt with the RAG element added. The RAG section is near the end of the prompt, just above the full narrative text:

[Warning: Depictions of Self-Harm]

Full RAG Input Prompt


INSTRUCTIONS:
    Closely follow these instructions:
        - For each variable below return a 0 for 'no' and a 1 for 'yes' unless otherwise stated.
        - If more than two answers are available, return ONE of the numbered values.
        - Rely ONLY on information available in the narrative. Do NOT extrapolate.
        - Return a properly formatted json object where the keys are the variables and the values are the numeric labels.
        - Do NOT return anything other than the label. Do NOT include any discussion or commentary.

VARIABLES:   
    DepressedMood: The person was perceived to be depressed at the time
    - 1: Specific signs of depression were noted in the narrative (e.g., sad, withdrawn, hopeless)
    - 0: No mention of depressive symptoms

    MentalIllnessTreatmentCurrnt: Currently in treatment for a mental health or substance abuse problem
    - 1: The person was undergoing treatment for a mental health issue at the time of the incident
    - 0: No indication of current treatment

    HistoryMentalIllnessTreatmnt: History of ever being treated for a mental health or substance abuse problem
    - 1: Documentation of previous treatment for mental health or substance abuse is noted in the narrative
    - 0: No prior treatment mentioned

    SuicideAttemptHistory: History of attempting suicide previously
    - 1: A past suicide attempt was mentioned
    - 0: No documented prior attempts

    SuicideThoughtHistory: History of suicidal thoughts or plans
    - 1: Person's prior thoughts or plans of suicide were noted in the narrative
    - 0: No history of suicidal ideation mentioned

    SubstanceAbuseProblem: The person struggled with a substance abuse problem.
    - 1: Person was noted to have issues with alcohol or drug abuse
    - 0: No indication of substance abuse problems

    MentalHealthProblem: The person had a mental health condition at the time
    - 1: Person was noted to have a mental health condition at the time of the event
    - 0: No documented mental health condition

    DiagnosisAnxiety: The person had a medical diagnosis of anxiety
    - 1: Person was currently, or previously, been diagnosed or treated for anxiety
    - 0: No documented diagnosis or treatment for anxiety

    DiagnosisDepressionDysthymia: The person had a medical diagnosis of depression
    - 1: Person was currently, or previously, been diagnosed or treated for depression or dysthymia
    - 0: No documented diagnosis or treatment for depression

    DiagnosisBipolar: The person had a medical diagnosis of bipolar
    - 1: Person was currently, or previously, been diagnosed or treated for bipolar disorder
    - 0: No documented diagnosis or treatment for bipolar disorder

    DiagnosisAdhd: The person had a medical diagnosis of ADHD
    - 1: Person was documented as having been diagnosed or treated for ADHD
    - 0: Person had no documentation of treatment or diagnosis for ADHD

    IntimatePartnerProblem: Problems with a current or former intimate partner appear to have contributed
    - 1: Person's relationship issues with a spouse, partner, or ex-partner were mentioned as contributing factors
    - 0: No relationship issues were mentioned

    FamilyRelationship: Relationship problems with a family member (other than an intimate partner) appear to have contributed
    - 1: Person's conflicts with parents, siblings, children, or other family members contributed
    - 0: No family relationship problems mentioned

    Argument: An argument or conflict appears to have contributed
    - 1: A dispute, disagreement, or verbal altercation was mentioned as a contributing factor
    - 0: No argument or conflict mentioned

    SchoolProblem: Problems at or related to school appear to have contributed
    - 1: Issues such as academic struggles, bullying, or school disciplinary actions were noted
    - 0: No school-related problems mentioned

    RecentCriminalLegalProblem: Criminal legal problem(s) appear to have contributed
    - 1: The person was facing legal troubles such as arrest, charges, or sentencing
    - 0: No criminal legal issues mentioned

    SuicideNote: The person left a suicide note
    - 1: A written, digital, or verbal message was documented as a suicide note
    - 0: No mention of a suicide note

    SuicideIntentDisclosed: The person disclosed their thoughts and/or plans to die by suicide to someone else within the last month
    - 1: Suicide intent was communicated to another person within the last month
    - 0: No disclosure of intent mentioned

    DisclosedToIntimatePartner: Intent was disclosed to a previous or current intimate partner
    - 1: The person told a spouse or romantic partner about suicidal thoughts/plans
    - 0: No disclosure to an intimate partner

    DisclosedToOtherFamilyMember: Intent was disclosed to another family member
    - 1: The person told a parent, sibling, child, or other relative about suicidal thoughts/plans
    - 0: No disclosure to a family member

    DisclosedToFriend: Intent was disclosed to a friend
    - 1: The person told a friend about suicidal thoughts/plans
    - 0: No disclosure to a friend

    InjuryLocationType: The type of place where the suicide took place.
        - 1: House, apartment
        - 2: Motor vehicle (excluding school bus and public transportation)
        - 3: Natural area (e.g., field, river, beaches, woods)
        - 4: Park, playground, public use area
        - 5: Street/road, sidewalk, alley
        - 6: Other

    WeaponType1: Type of weapon used 
        - 1: Blunt instrument
        - 2: Drowning
        - 3: Fall
        - 4: Fire or burns
        - 5: Firearm
        - 6: Hanging, strangulation, suffocation
        - 7: Motor vehicle including buses, motorcycles
        - 8: Other transport vehicle, eg, trains, planes, boats
        - 9: Poisoning
        - 10: Sharp instrument
        - 11: Other (e.g. taser, electrocution, nail gun)
        - 12: Unknown


Here is an example narrative and expected output:

EXAMPLE NARRATIVE 1:
The V was a XX XX XX XX who died of an intentional mixed drug (fentanyl, sertraline, and amphetamine) intoxication. The V had been court ordered to admit to a addiction recovery center, and he was admitted two days ago.  He was last seen alive yesterday during room checks. He was in his room with two others. The V was found this morning unresponsive and CPR was instituted. EMS arrived and confirmed death. The V had a significant past medical history of anxiety, depression, cleft palate repair, PTSD and asthma. He reportedly had been very depressed lately and had expressed suicidal ideations including that he was going to "take lots of pills." Per grandmother and mother, the V was known to take Percocet and Adderall.

EXAMPLE OUTPUT 1:
{
    "DepressedMood": 1,
    "MentalIllnessTreatmentCurrnt": 1,
    "HistoryMentalIllnessTreatmnt": 1,
    "SuicideAttemptHistory": 0,
    "SuicideThoughtHistory": 1,
    "SubstanceAbuseProblem": 1,
    "MentalHealthProblem": 1,
    "DiagnosisAnxiety": 1,
    "DiagnosisDepressionDysthymia": 1,
    "DiagnosisBipolar": 0,
    "DiagnosisAdhd": 0,
    "IntimatePartnerProblem": 0,
    "FamilyRelationship": 0,
    "Argument": 0,
    "SchoolProblem": 0,
    "RecentCriminalLegalProblem": 1,
    "SuicideNote": 0,
    "SuicideIntentDisclosed": 1,
    "DisclosedToIntimatePartner": 0,
    "DisclosedToOtherFamilyMember": 0,
    "DisclosedToFriend": 0,
    "InjuryLocationType": 6,
    "WeaponType1": 9
}

Here is another example narrative and expected output:

EXAMPLE NARRATIVE 2:
Victim XX died of a self-intentional gunshot wound to the head, resulting in an exit wound, with a .357 caliber revolver at the victim's place of residence. The victim's ex-wife had attempted to contact the victim and had told dispatchers the victim was depressed and had thoughts of suicide. The victim left a suicide for his parents and ex-wife. Per the ex-wife, he was extremely depressed and the last text she received from him was telling her goodbye. The victim had never gotten over their divorce and struggled with depression and alcohol abuse. EMS was present and confirmed the victim deceased.

EXAMPLE OUTPUT 2:
{
    "DepressedMood": 1,
    "MentalIllnessTreatmentCurrnt": 0,
    "HistoryMentalIllnessTreatmnt": 0,
    "SuicideAttemptHistory": 1,
    "SuicideThoughtHistory": 0,
    "SubstanceAbuseProblem": 1,
    "MentalHealthProblem": 0,
    "DiagnosisAnxiety": 0,
    "DiagnosisDepressionDysthymia": 0,
    "DiagnosisBipolar": 0,
    "DiagnosisAdhd": 0,
    "IntimatePartnerProblem": 1,
    "FamilyRelationship": 0,
    "Argument": 0,
    "SchoolProblem": 0,
    "RecentCriminalLegalProblem": 0,
    "SuicideNote": 1,
    "SuicideIntentDisclosed": 1,
    "DisclosedToIntimatePartner": 1,
    "DisclosedToOtherFamilyMember": 0,
    "DisclosedToFriend": 0,
    "InjuryLocationType": 1,
    "WeaponType1": 5
}

Here is another example narrative and expected output:

EXAMPLE NARRATIVE 3:
This is the death of a XX XX (V). LE was dispatched at 0806 hours in reference to the V who was shot in the head. Upon arrival, the V was in an apartment with 2 other women present and a firearm. The V had a gunshot wound to the right side of the head and no exit was noted. Medics arrived and took over life saving efforts. Per the V's fiance the V suffered form anxiety and depression. The V's mood had been changing all day as he was approaching the 1 year marker of his high school friend dying by suicide. The V had been drinking throughout the day and at 2000 hours hours the V went to the park. The V was not allowed to have firearms. Prior to shooting himself the V said "you don't think I'll do it." The V was transported to the hospital where he died. The firearm used was .22 caliber and a note was found that appeared to have been written by a child that read "I love mom and dad." There were several gummy bears rolled up in a plastic bag with #9 written on it and a small zip lock bag with pink powder inside. There were several bottles of alcohol, THC oil and cigarette butts in the bathroom. Last year the V was committed for treatment of mental disorder.

EXAMPLE OUTPUT 3:
{
    "DepressedMood": 0,
    "MentalIllnessTreatmentCurrnt": 0,
    "HistoryMentalIllnessTreatmnt": 1,
    "SuicideAttemptHistory": 0,
    "SuicideThoughtHistory": 1,
    "SubstanceAbuseProblem": 1,
    "MentalHealthProblem": 1,
    "DiagnosisAnxiety": 1,
    "DiagnosisDepressionDysthymia": 1,
    "DiagnosisBipolar": 0,
    "DiagnosisAdhd": 0,
    "IntimatePartnerProblem": 0,
    "FamilyRelationship": 0,
    "Argument": 0,
    "SchoolProblem": 0,
    "RecentCriminalLegalProblem": 0,
    "SuicideNote": 1,
    "SuicideIntentDisclosed": 1,
    "DisclosedToIntimatePartner": 1,
    "DisclosedToOtherFamilyMember": 0,
    "DisclosedToFriend": 0,
    "InjuryLocationType": 1,
    "WeaponType1": 5 
}



If present, use the following rules to guide your coding of variables. Closely follow these instructions:
    - Apply ONLY the rules relevant to the question
    - If a rule is not relevant to the question, disregard it entirely
    - Do NOT try and apply rules to questions where they are not closely relevant


## RULES FOR SuicideNote:
Evidence found: "scovered, it appeared to be a suicide note. The v was located hanging fr", "home hanging from a tree.  A suicide note was found in the home.  The m"

RULE 1 [Section 5.7.6]:
5.7.6 Left a suicide note: CME/LE_SuicideNote 
 
Definition: 
Victim left a suicide note (or other recorded communication). Note can be written or electronic. 
 
Response Options: 
0 No, Not Available, Unknown 
1 Yes 
 
Discussion: 
 A will or folder of financial papers near the victim does not constitute a suicide note. 
 
Section 5. Circumstances  131 | Page  
   
 If the record states the person left a note, you can infer it was a suicide note in the absence of 
information indicating that the note had some other purpose. 
 A suicide note can be any essentially durable message; it does not have to be on a piece of paper. 
Emails, text messages, voice mail, or writing on any object (such as a wall or table) all qualify. Phone 
calls do not qualify, as they are not considered durable (even 911 calls, because the decedent may 
or may not have known they were being recorded). 
 A suicide note does not have to communicate that the person wants or intends to die. Notes that 
are written to warn others at the scene of the fatal injury that conditions may be hazardous or 
disturbing may be included (e.g., “Call 911, do not enter garage” or “Carbon monoxide – do not 
enter”).  
 A text or electronic message sent right before the suicide occurred should be labeled a suicide note, 
if there was no time between the sending/receipt of the message and the suicide. If there was time 
to intervene, this should be coded as “disclosed suicidal thought or intent.” If the contents of the 
suicide note are included or described in your source documents, summarizing this information in 
the narrative is helpful to provide further context for the incident. If you choose to do so, do not 
include verbatim text and do not include identifying information, but a summary of what was said. 
This information may be helpful in noting themes related to the suicide that go beyond standard 
NVDRS circumstances and are one of the only means of accessing the decedents’ thoughts about 
their suicide more directly. 
 
Manner of Death: Suicide and undetermined deaths. 
 
 


RULE 2 [Section 5.7.4]:
5.7.4 Recent Disclosed Suicidal Thoughts or Intent to Die by Suicide: 
CME/LE_SuicideIntentDisclosed 
 
Definition: Victim disclosed to another person their thoughts and/or plans to die by suicide within the 
last month. Disclosure of suicidal thoughts or plan can be verbal, written or electronic. 
 
Response Options: 
0 No, Not Available, Unknown 
1 Yes 
 
Discussion:  
This variable helps identifies suicides for which opportunities to intervene and prevent the death may 
have been present near the time of the suicide. It is also useful for exploring the association between 
stated intent and actual death. 
 
 Code as “Yes” if the victim had disclosed suicidal thoughts or plans to another person recently or 
within the last month, whether explicitly (e.g., “I have been thinking about suicide lately” or “I plan 
to go to my cabin with my gun and never come back”) or indirectly (e.g., “I know how to put a 
permanent end to this pain”). Include in the narrative any available details about whom the intent 
was disclosed to, how long before the death the intent was disclosed, and what was said during the 
disclosure. 
 Code as “Yes” if there was opportunity to intervene between the time the person disclosed intent 
and the injury event. 
o Do not code this variable if the victim disclosed the intention to kill him or herself only at the 
moment of the suicide (i.e., when there was no opportunity to intervene to stop the 
suicide). For instance, sending an email or text message right before the victim shot herself. 
This would be considered a suicide note. 
 Do not endorse this variable if the victim had talked about suicide sometime in the distant past but 
had not recently disclosed a current intent to die by suicide to anyone. This would be coded as 
“History of disclosed suicidal thoughts/plans”. 
 A separate suicide attempt by the victim within a month of the suicide should be coded as “Yes.” In 
this case, “History of suicide attempts” should also be coded, “Yes.” 
 The timing of when the victim disclosed the suicidal intent may be unclear (e.g., recently or some 
time ago) or not mentioned. Please use the following rules to code these cases: 
o Code “Yes” if the narrative states the victim “just” or “recently” told someone about his 
suicidal intent. 
o If the record indicates disclosure of intent, but is unclear about the timeframe (i.e., does not 
mention it all), code as “Yes.” 
o LE or C/ME reports may be unclear about timing of the disclosure. If the record indicates 
disclosure of intent in the past, but states that there was no disclosure for the current 
incident, do not code, instead use the “History of suicidal thoughts/plan/attempts.” 
o If the victim disclosed suicidal intent “a long time ago,” “more than a month ago,” or in the 
“past,” code as “No” and code “History of disclosed suicidal thought/plans/actions.” 
 
Manner of Death: Suicide and undetermined deaths.  
 
 




## RULES FOR DiagnosisAnxiety:
Evidence found: "ffered from depression, ADHD, PTSD, school problems and legal pr", "an undiagnosed and untreated PTSD.  The decedent had witnessed"

RULE 1 [Section 5.3.3]:
5.3.3 Mental Health Diagnosis Variables 
5.3.3.1. Mental health diagnosis 1: CME/LE_MentalHealthDiagnosis1 
5.3.3.2. Mental health diagnosis 2: CME/LE_MentalHealthDiagnosis2 
5.3.3.3. Other mental health diagnosis: CME/LE_MentalHealthDiagnosisOther 
 
Definitions: 
 CME/LE_MentalHealthDiagnosis1/2: Type of mental illness diagnosis 
 CME/LE_MentalHealthDiagnosisOther: Other type of mental illness 
 
Response Options: 
 CME/LE_MentalHealthDiagnosis1/2  
1 Depression/dysthymia 
2 Bipolar disorder 
3 Schizophrenia 
4 Anxiety disorder 
5 Post-traumatic stress disorder 
6 Attention Deficit/Hyperactivity Disorder (ADHD) 
7 Eating disorder 
8 Obsessive- compulsive disorder 
9 Autism Spectrum (including Asperger’s Syndrome) 
10 Fetal Alcohol Syndrome 
11 Down Syndrome 
12 Dementia (e.g., Alzheimer’s disease, Lewy Body Dementia) 
66 Other (specify in diagnosis text), including personality disorders, etc. 
88 Not applicable 
99 Unknown 
 
 CME/LE_MentalHealthDiagnosisOther  
Text 
 
Discussion:  
This variable indicates the nature of the victim’s mental health problem (the diagnosis), if available. 
 
 Code up to two diagnoses and then write in additional diagnoses (i.e., three or more diagnoses) in 
the “MentalHealthDiagnosisOther” field. When using the “MentalHealthDiagnosisOther” field, 
please separate diagnoses with a comma (e.g., antisocial personality disorder, narcissistic 
personality disorder). 
 For cases in which the victim was noted as being treated for a mental health problem, but the actual 
diagnosis is not documented, code “MentalHealthDiagnosis1” as “Unknown.” 
 
 If a diagnosis is not on the code list, code “Other” and record the diagnosis in the text field, 
“MentalHealthDiagnosisOther.” 
 Do not attempt to infer a diagnosis based on reading the symptoms. 
 While it is acceptable to endorse “Mental health problem” based on the victim’s prescription for a 
psychiatric medication, please do not infer or code a specific mental health diagnosis based on the 
medication. 
 Please note that bipolar disorder may be referred to as “manic depression” or similar terms (e.g., 
“manic depressive”) in source documents. While these are outdated terms, please code these cases 
as “2 – Bipolar Disorder.” 
 Obsessive compulsive disorder may be referred to as “OCD” in source documents. Please code these 
cases as “8 – Obsessive Compulsive Disorder.” 
 Post-traumatic stress disorder may be referred to as “PTSD” in source documents. Please code these 
cases as “5 – Post-traumatic Stress Disorder.”  
 
Manner of Death: All manners.  
 
 


RULE 2 [Section 5.3.1]:
5.3.1 Current diagnosed mental health problem: CME/LE_MentalHealthProblem 
 
Definition:  
Current mental health problem 
 
Response Options: 
0 No, Not Available, Unknown 
1 Yes 
 
Discussion: 
Code a victim as “Yes” for “C/ME/LE_MentalHealthProblem” if he or she has been identified as currently 
having a mental health problem. There does not need to be any indication that the mental health 
condition directly contributed to the death. 
 
 Mental health problems include those disorders and syndromes listed in the Diagnostic and 
Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) with the exception of alcohol and other 
substance dependence (as these are captured in separate variables). 
 Examples of disorders qualifying as mental health problems include diagnoses such as major 
depression, schizophrenia, and generalized anxiety disorder, as well as neurodevelopmental 
disorders (such as intellectual disability, autism, attention-deficit /hyperactivity disorder), eating 
disorders, personality disorders, and organic mental disorders (such as Alzheimer’s and other 
dementias). 
 Also indicate “Yes” if it is mentioned in the source document that the victim was being treated for a 
mental health problem, even if the nature of the problem is unclear (e.g., “was being treated for 
various psychiatric problems”). 
 It is acceptable to endorse this variable on the basis of past treatment of a mental health problem, 
unless it is specifically noted that the past problem has been resolved. However, do not code this 
circumstance based only on a positive toxicology test for psychiatric medications (such as 
antidepressants). There must also be some indication that the victim was actually being treated for a 
mental health condition, such as a current prescription, the report of a family member, etc. 
  
Also code: At least one Mental Health Diagnosis variable should also be coded if this is coded. If the type 
of mental health diagnosis is unknown, please code “Type of first mental illness diagnosed” as “99 –
Unknown.”  
 
Manner of Death: All manners. 
  
 




## RULES FOR DiagnosisAdhd:
Evidence found: "he suffered from depression, ADHD, PTSD, school problems and le"

RULE 1 [Section 5.3.3]:
5.3.3 Mental Health Diagnosis Variables 
5.3.3.1. Mental health diagnosis 1: CME/LE_MentalHealthDiagnosis1 
5.3.3.2. Mental health diagnosis 2: CME/LE_MentalHealthDiagnosis2 
5.3.3.3. Other mental health diagnosis: CME/LE_MentalHealthDiagnosisOther 
 
Definitions: 
 CME/LE_MentalHealthDiagnosis1/2: Type of mental illness diagnosis 
 CME/LE_MentalHealthDiagnosisOther: Other type of mental illness 
 
Response Options: 
 CME/LE_MentalHealthDiagnosis1/2  
1 Depression/dysthymia 
2 Bipolar disorder 
3 Schizophrenia 
4 Anxiety disorder 
5 Post-traumatic stress disorder 
6 Attention Deficit/Hyperactivity Disorder (ADHD) 
7 Eating disorder 
8 Obsessive- compulsive disorder 
9 Autism Spectrum (including Asperger’s Syndrome) 
10 Fetal Alcohol Syndrome 
11 Down Syndrome 
12 Dementia (e.g., Alzheimer’s disease, Lewy Body Dementia) 
66 Other (specify in diagnosis text), including personality disorders, etc. 
88 Not applicable 
99 Unknown 
 
 CME/LE_MentalHealthDiagnosisOther  
Text 
 
Discussion:  
This variable indicates the nature of the victim’s mental health problem (the diagnosis), if available. 
 
 Code up to two diagnoses and then write in additional diagnoses (i.e., three or more diagnoses) in 
the “MentalHealthDiagnosisOther” field. When using the “MentalHealthDiagnosisOther” field, 
please separate diagnoses with a comma (e.g., antisocial personality disorder, narcissistic 
personality disorder). 
 For cases in which the victim was noted as being treated for a mental health problem, but the actual 
diagnosis is not documented, code “MentalHealthDiagnosis1” as “Unknown.” 
 
 If a diagnosis is not on the code list, code “Other” and record the diagnosis in the text field, 
“MentalHealthDiagnosisOther.” 
 Do not attempt to infer a diagnosis based on reading the symptoms. 
 While it is acceptable to endorse “Mental health problem” based on the victim’s prescription for a 
psychiatric medication, please do not infer or code a specific mental health diagnosis based on the 
medication. 
 Please note that bipolar disorder may be referred to as “manic depression” or similar terms (e.g., 
“manic depressive”) in source documents. While these are outdated terms, please code these cases 
as “2 – Bipolar Disorder.” 
 Obsessive compulsive disorder may be referred to as “OCD” in source documents. Please code these 
cases as “8 – Obsessive Compulsive Disorder.” 
 Post-traumatic stress disorder may be referred to as “PTSD” in source documents. Please code these 
cases as “5 – Post-traumatic Stress Disorder.”  
 
Manner of Death: All manners.  
 
 


RULE 2 [Section 5.1.2]:
5.1.2 Coding Mental Health, Alcohol and Substance Abuse Problems 
Abstractors should code circumstances related to mental health, alcohol or substance abuse problems, 
or other addictions as “Yes” if any of these problems are indicated in investigation reports. A direct link 
to the death is not required. These circumstances are coded for all victims. 
 
 




## RULES FOR DepressedMood:
Evidence found: "h his girlfriend and was very depressed the evening before, the V's r"

RULE 1 [Section 5.3.4]:
5.3.4 Current depressed mood: CME/LE_DepressedMood 
 
Definition:  
Victim was perceived by self or others to be depressed at the time of the injury. 
 
Response Options: 
0 No, Not Available, Unknown 
1 Yes 
 
Discussion: 
Only code this variable when the victim had a depressed mood at the time of injury. There does NOT 
need to be a clinical diagnosis, and there does not need to be any indication that the depression directly 
contributed to the death. Other words that can trigger coding this variable besides “depressed” are sad, 
despondent, down, blue, low, unhappy, etc. Words that should not trigger coding this variable are 
agitated, angry, mad, anxious, overwrought, etc. 
 
 If the victim has a known clinical history of depression but had no depressive symptoms at the time 
of the incident, this variable should NOT be selected. 
 Depressed mood should not be inferred by the coder based on the circumstances (e.g., because the 
person reports a bankruptcy); rather it must be noted in the record. 
 
Manner of Death: All manners.  
 
 


RULE 2 [Section 5.7.1]:
5.7.1 History of suicidal thoughts or plans: CME/LE_SuicideThoughtHistory 
 
Definition:  
Victim had a history of suicidal thoughts or plans. Disclosure of suicidal thoughts or plan can be verbal, 
written or electronic. 
 
Response Options: 
0 No, Not Available, Unknown 
1 Yes 
 
Discussion:  
Use this code for victims who have at any time in their life expressed suicidal thoughts or plans. The 
victim may or may not have disclosed suicidal thoughts and/or plans close to the time of the suicide. 
 
 Suicidal ideation can be expressed directly (e.g., “I am thinking of killing myself”) or indirectly (e.g., “I 
don’t know if I want to go on living”). 
 When the timing is unclear (e.g., timing not mentioned) or if the suicidal thoughts were described as 
occurring in the “past,” “a few years ago,” or “just,” history of suicidal thoughts should be “Yes.” 
 Also code “Recently disclosed suicidal thoughts/plans” in addition to this item, if the victim disclosed 
suicidal thoughts and/or plans close to the time (within one month) of the suicide.”   
 Code “No” for previous suicide attempts if the victim did not disclose suicidal thoughts or plans. 
Previous suicide attempts should be coded as “History of Suicide Attempts.” 
 
Note: This variable was added in August 2013.  
 
Manner of Death: Suicide and undetermined deaths.  
 
 



Carefully read the following law enforcement narrative:
The V, an XX XX, was discovered by police hanging in a wooded area, 911 was called. EMS arrived and declared the V dead at the scene. The V had recently broken up with his girlfriend and was very depressed the evening before, the V's roommate was concerned the V was suicidal. The ex-girlfriend stated she was speaking with the V on the phone and he made suicidal statements to her, while talking to the V the ex-girlfriend heard a loud noise in the background and the phone went dead, she then contacted 911. Police were contacted and they made a welfare check and was unable to reach the V at his residence. While speaking to the V's roommate the police started to search on a wooded path behind his residence. A handwritten note was discovered, it appeared to be a suicide note. The v was located hanging from one of the trees just off path.The V, a XX XX, died from asphyxia due to hanging.  The decedent was found during a welfare check requested by his ex-girlfriend.  The ex-girlfriend reported he had called her and told her "he was going to kill himself" and then she heard a loud noise.  She arrived at his home and his gun was missing so she assumed he shot himself.  He was found in the woods behind his home hanging from a tree.  A suicide note was found in the home.  The mother reported he suffered from depression, ADHD, PTSD, school problems and legal problems.  The mother reported an undiagnosed and untreated PTSD.  The decedent had witnessed his friend shot in front of him.  Since then he had "gone down the wrong path".  He struggled with alcohol and drugs.  The mother reported he was trying to turn his life around after the birth of his child, but was denied visitation rights due to his legal troubles.

And the output:

JSON Output

{
    "DepressedMood": 1,
    "MentalIllnessTreatmentCurrnt": 0,
    "HistoryMentalIllnessTreatmnt": 1,
    "SuicideAttemptHistory": 0,
    "SuicideThoughtHistory": 1,
    "SubstanceAbuseProblem": 1,
    "MentalHealthProblem": 1,
    "DiagnosisAnxiety": 0,
    "DiagnosisDepressionDysthymia": 1,
    "DiagnosisBipolar": 0,
    "DiagnosisAdhd": 1,
    "IntimatePartnerProblem": 1,
    "FamilyRelationship": 0,
    "Argument": 0,
    "SchoolProblem": 0,
    "RecentCriminalLegalProblem": 1,
    "SuicideNote": 1,
    "SuicideIntentDisclosed": 1,
    "DisclosedToIntimatePartner": 1,
    "DisclosedToOtherFamilyMember": 0,
    "DisclosedToFriend": 0,
    "InjuryLocationType": 3,
    "WeaponType1": 6
}
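
Since the reply is meant to be raw JSON, a small check before it goes any further in the workflow can catch malformed or out-of-range values. The snippet below is only a sketch: the function name `parse_coded_output` and the split between binary fields and coded fields (`CODED_FIELDS`) are assumptions made for illustration, based on the values in the output above, and are not taken from the repo.

```python
import json

# Assumption for illustration: these fields hold category codes,
# and every other field in the reply is a 0/1 flag.
CODED_FIELDS = {"InjuryLocationType", "WeaponType1"}


def parse_coded_output(raw):
    """Parse the model's reply and check that each value looks like a valid code."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    problems = []
    for name, value in data.items():
        # bool is a subclass of int in Python, so reject true/false explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name}: expected an integer, got {value!r}")
        elif name not in CODED_FIELDS and value not in (0, 1):
            problems.append(f"{name}: expected 0 or 1, got {value}")
    if problems:
        raise ValueError("; ".join(problems))
    return data


# A shortened reply in the same shape as the output above
reply = '{"DepressedMood": 1, "SuicideNote": 1, "InjuryLocationType": 3, "WeaponType1": 6}'
coded = parse_coded_output(reply)
print(coded["WeaponType1"])
```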

I didn’t ask for any reasoning or citations for the rules provided. Here I just wanted the raw JSON output - but it would be trivial to add another section to the prompt asking for a specific citation whenever one of the provided rules was applied. Looking at the retrieval part, I am actually quite surprised at how well it pulled the relevant rules based on keyword hits.
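
To make the keyword-hit idea concrete, here is a minimal sketch of how a narrative could be scanned for trigger terms and how the hits could be laid out in the `## RULES FOR ...` / `Evidence found:` format seen in the prompt. It is not the code from the repo: the `KEYWORDS` map, the function names, the 30-character snippet window, and the `rules_by_variable` lookup are all hypothetical stand-ins, and the real pipeline may rank rules differently (for example with the embedding index).

```python
import re

# Hypothetical keyword map: variable name -> trigger terms.
KEYWORDS = {
    "DiagnosisAdhd": ["ADHD"],
    "DepressedMood": ["depressed", "despondent"],
    "SuicideNote": ["note"],
}


def find_evidence(narrative, keywords=KEYWORDS, window=30):
    """Return {variable: [snippet, ...]} for every whole-word keyword hit."""
    hits = {}
    for variable, terms in keywords.items():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(narrative):
                # Keep a little context on either side of the hit
                start = max(0, match.start() - window)
                end = min(len(narrative), match.end() + window)
                hits.setdefault(variable, []).append(narrative[start:end])
    return hits


def build_rules_block(hits, rules_by_variable):
    """Lay out the evidence and the matching rule text for each variable."""
    sections = []
    for variable, snippets in hits.items():
        evidence = ", ".join(f'"{s}"' for s in snippets)
        lines = [f"## RULES FOR {variable}:", f"Evidence found: {evidence}", ""]
        rules = rules_by_variable.get(variable, [])
        for i, (section_id, text) in enumerate(rules, start=1):
            lines += [f"RULE {i} [Section {section_id}]:", text, ""]
        sections.append("\n".join(lines))
    return "\n".join(sections)


narrative = "The mother reported he suffered from depression, ADHD, PTSD, school problems."
# Placeholder rule text; in practice this would be the retrieved manual section
rules = {"DiagnosisAdhd": [("5.3.3", "5.3.3 Mental Health Diagnosis Variables")]}
print(build_rules_block(find_evidence(narrative), rules))
```

Matching on whole words keeps "depressed" from firing on "depression", which matters here because the two map to different variables.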

My Take

To be honest, I was surprised that most of the Claude-generated code worked as well as it did. I had to make few substantial changes. The code to do the chunking, embedding, and indexing took maybe under 30 minutes for me to read through, edit slightly, and execute. Adding these functions into my existing workflow took under an hour - so maybe 90 minutes total from prompt to working RAG proof-of-concept. Honestly, totally wild. I could have figured this out on my own, but this was like half a day of work, compared to the week it would have taken me to do it from scratch.

Here’s the rub - I think “vibe coding” can be helpful to jump-start a project from nothing to a workable proof-of-concept. I do NOT think it is a good idea to rely 100% on AI-generated code without knowing what it is actually doing. Personally, I think it’s a good idea to get a working POC and then pick apart all of the functions and steps the LLM generated to understand what’s happening. Blind coding can certainly be a “vibe” but I don’t think it is sustainable for real-world production-level code.

Footnotes

If you are curious about the code, you can look at my prompt-testing repo under my blog posts, which contains the full set of classes and functions.