To Share Weights from Neural Network Training Is Dangerous
by Jessica Rose at Brownstone Institute
Some organizations and researchers are sharing neural network weights, particularly through the open-weight model movement. These include Meta's Llama series, Mistral's models, and DeepSeek's open-weight releases, which claim to democratize access to powerful AI. But doing so not only raises security concerns; it may pose an existential threat.
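To make concrete just how low the barrier to entry is, here is a minimal sketch of how anyone with the Hugging Face transformers library installed could pull down an open-weight model and look at its raw weight tensors. The specific model name below is only an illustrative example of an open-weight release, not a recommendation, and the snippet assumes the model is not access-gated.

```python
# Rough sketch (assumes the Hugging Face `transformers` library is installed
# and the chosen model is not access-gated): open weights are just a download away.
from transformers import AutoModelForCausalLM

# Illustrative open-weight model identifier; any open-weight release works similarly.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# The "weights" this article is about are literally these tensors of numbers.
for name, tensor in list(model.named_parameters())[:5]:
    print(name, tuple(tensor.shape))
```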
For background, I have written a few articles on LLMs and AIs as part of my own learning process in this very dynamic and quickly evolving Pandora's-open-box of a field. You can read those here, here, and here.
Once you understand what neural networks are and how they are trained on data, you will also understand what weights (and biases) and backpropagation are. It's basically just linear algebra and matrix-vector multiplication to yield numbers, to be honest. More specifically, a weight is a number (typically a floating-point value – a way to write numbers with decimal points for more accuracy) that represents the strength or importance of the connection between two neurons or nodes across different layers of the neural network.
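As a tiny concrete illustration (all the numbers below are made up), one layer of a network really is just that matrix-vector multiplication, with each weight scaling how strongly one neuron's output feeds the next:

```python
import numpy as np

# Minimal sketch (made-up numbers): one layer of a neural network is a matrix
# of weights multiplying a vector of inputs, exactly the "matrix-vector
# multiplication" described above.
inputs = np.array([0.8, 0.1, 0.5])            # activations from the previous layer

weights = np.array([[ 0.2, -1.3,  0.7],       # each row holds the weights feeding
                    [ 1.1,  0.4, -0.6]])      # one neuron in the next layer

outputs = weights @ inputs                     # strength-scaled sums of the inputs
print(outputs)                                 # [0.38  0.62]
```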
I highly recommend watching 3Blue1Brown's videos to gain a better understanding, and it's important that you do. 3Blue1Brown's instructional videos are incredibly good.
The weights are the parameter values determined from data in a neural network to make predictions or decisions to arrive at a solution. Each weight is an instruction telling the network how important certain pieces of information are, like how much to pay attention to a specific color or shape in a picture. These weights are numbers that get fine-tuned during training thanks to all those decimal points, helping the network figure out patterns. Examples include recognizing a dog in a photo or translating a sentence. They are critical in the "thinking" process of a neural network.
You can think of the weights in a neural network like the paths of least resistance that guide the network toward the best solution. Imagine water flowing down a hill, naturally finding the easiest routes to reach the bottom. In a neural network, the weights are adjusted during training on data sets to create the easiest paths for information to flow through, helping the network quickly and accurately solve problems, like recognizing patterns or making predictions, by emphasizing the most important connections and minimizing errors.
If you're an electronic musician, think of weights like the dials on your analog synth that allow you to tune into the right frequency or sound to, say, mimic a sound you want to recreate, or in fact, create a new one. If you're a sound guy, you can also think of it like adjusting the knobs on your mixer to balance different instruments.
Weights are indeed dynamic parameters, meaning that they are very mutable as a neural network tends toward a prediction or solution. Each neuron is also associated with its own bias. The bias's role is to shift the output, allowing the model to better fit the data by adding an offset that adjusts the decision boundary or pattern recognition, which is independent of the input scaling determined by weights.
Think of it like this. Imagine you're trying to recreate the sound of a guitar on your synth. The weights control how much of the string pluck or body resonance you hear. If what you hear is a bit flat, for example, the bias is like adding a tiny boost or shift – say, a warm undertone – to make it sound more like the real guitar. This helps the network fine-tune its "ear" to find the right pattern without changing the main controls. It basically just makes the model better at matching its output to the real thing.
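In code, that "tiny boost or shift" is literally a single number added after the weighted sum; a minimal sketch with made-up values:

```python
import numpy as np

# Minimal sketch (made-up values): the bias is an offset added after the
# weighted sum, shifting the output independently of how the inputs are scaled.
inputs  = np.array([0.8, 0.1, 0.5])
weights = np.array([0.2, -1.3, 0.7])
bias    = 0.25                       # the "warm undertone" in the analogy above

weighted_sum = weights @ inputs      # 0.38, determined by the weights alone
output = weighted_sum + bias         # 0.63, nudged upward by the bias

print(weighted_sum, output)
```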
As the network is exposed to data, it adjusts the weights through a process called backpropagation, tweaking them to minimize errors and improve predictions. Think of those paths of least resistance being reshaped with each training example, like a river carving out better channels over time to flow more efficiently. Once training is complete, the weights are typically fixed for use, but during training, they're constantly updated to find the optimal configuration for solving the problem.
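Full backpropagation chains this update through every layer of the network, but the core move (nudging a weight downhill on the error) can be sketched on a single weight with made-up numbers:

```python
# Minimal sketch of the training idea: nudge one weight in the direction that
# reduces the squared error. Real backpropagation does this for millions of
# weights at once via the chain rule; the data and learning rate are made up.
x, target = 2.0, 6.0        # one training example: input and desired output
w = 0.5                     # initial weight guess
learning_rate = 0.05

for step in range(20):
    prediction = w * x
    error = prediction - target
    gradient = 2 * error * x          # derivative of (w*x - target)**2 w.r.t. w
    w -= learning_rate * gradient     # carve the "channel" a little deeper
    print(step, round(w, 4), round(error ** 2, 4))

# w converges toward 3.0, the value that makes w * x match the target.
```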
Now, here's what gets my human brain thinking. For recognition tasks, weights define decision paths, not image outputs, so those weights stay fixed once determined. In the case of LLMs, the weights (including biases) determine the model's ability to generate coherent text. But in the case of diffusion models or generative adversarial networks (GANs), weights are used to create or refine images. These fixed weight values and biases, hypothetically, don't necessarily need to remain fixed.
For example, weights (and biases) in the neural network could theoretically be continually readjusted to generate increasingly refined images based on learned pixel distributions, as would be the case if we were trying to produce a sharper image of a dog. Although the weights aren't adjusted indefinitely in practice, they are optimized during training to balance quality and generalization, and further tweaking could degrade the output or introduce artifacts. Artifacts. Hmm. So what if, in time, the GAN doesn't know what a dog is? Losing that knowledge would mean the previously calculated weights would no longer encode the right patterns. This could happen if the dog data were lost. Could dog data be made to be lost? Could truthful data be made to be lost?
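That "losing what a dog is" scenario has a well-known small-scale analogue, often called catastrophic forgetting. Here is a toy sketch (made-up data and a single weight, nothing like a real GAN) of how continued training on different data overwrites what the weights previously encoded:

```python
import numpy as np

# Toy sketch of "forgetting" (made-up data, not a real GAN): a single weight
# fitted to "dog" data is overwritten when training continues only on "cat"
# data, so the old pattern is no longer encoded anywhere.
def train(w, x, y, steps=200, lr=0.01):
    for _ in range(steps):
        grad = 2 * np.mean((w * x - y) * x)   # gradient of the mean squared error
        w -= lr * grad
    return w

dog_x, dog_y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])    # pattern: y = 2x
cat_x, cat_y = np.array([1.0, 2.0, 3.0]), np.array([5.0, 10.0, 15.0])  # pattern: y = 5x

w = train(0.0, dog_x, dog_y)
print("dog error after dog training:", np.mean((w * dog_x - dog_y) ** 2))       # near zero

w = train(w, cat_x, cat_y)                    # keep training, but only on cats
print("dog error after cat-only training:", np.mean((w * dog_x - dog_y) ** 2))  # large again
```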
Let's go back to the concept of sharing neural network weights and assume that we are speaking about weights defining decision paths as numerical values, as is the case in most models. Sharing these weights could be dangerous because, naturally, it would expose the model's internal parameters, which could be exploited, and not just by humans. Anyone with enough know-how could reverse-engineer the model, extract sensitive training data, or manipulate its behavior. Attackers could use techniques like model inversion or membership inference to uncover private information embedded in the weights, such as personal data used during training, potentially violating privacy regulations like the General Data Protection Regulation (GDPR). Oh my.
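For readers wondering how a file full of numbers could leak training data, here is the bare intuition behind a membership-inference attack, in a deliberately simplified sketch (toy model, made-up data and threshold, not a real attack implementation): models tend to have lower loss on examples they were trained on, and with the weights in hand anyone can compute that loss.

```python
import numpy as np

# Simplified membership-inference intuition: with the leaked weights, an
# attacker computes the model's loss on a candidate record; unusually low
# loss suggests the record was in the training set. Everything below
# (model, data, threshold) is made up for illustration.
weights = np.array([2.0, -1.0])            # a leaked toy "model"

def loss(record, label):
    prediction = weights @ record
    return (prediction - label) ** 2

seen_record,   seen_label   = np.array([1.0, 0.5]), 1.5   # was in training data
unseen_record, unseen_label = np.array([0.3, 2.0]), 3.0   # was not

THRESHOLD = 0.5                            # attacker-chosen cutoff
candidates = {"seen": (seen_record, seen_label), "unseen": (unseen_record, unseen_label)}
for name, (record, label) in candidates.items():
    l = loss(record, label)
    print(name, round(l, 3), "likely member" if l < THRESHOLD else "likely non-member")
```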
Publicly available weights can also be fine-tuned to generate deepfakes, spread falsehoods, or create adversarial inputs that exploit vulnerabilities in the model. Not to mention that the developers' precious IP and economic investments could be put at risk by competitors simply replicating the proprietary models.
Now imagine that some adversarial AI got hold of these weights. An adversarial AI could manipulate the weights to alter the model's behavior, inject biases, or create outputs to intentionally deceive. The weights could be used to craft adversarial inputs – subtly altered data that tricks the model into making incorrect predictions, such as misclassifying a stop sign as a yield sign in autonomous driving systems. Sheesh. Stolen weights could be repurposed to replicate the model, enabling its use in harmful applications, like automated phishing or propaganda generation. Wait, isn't that a huge thing these days? The severity depends on the model's purpose and the data it was trained on, but the risks are significant.
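The "subtly altered data" idea is easy to sketch once the weights are known, because the attacker can compute exactly which tiny nudge pushes the score across the decision boundary. The toy linear classifier below is made up for illustration and is nothing like a real traffic-sign system:

```python
import numpy as np

# Toy sketch of crafting an adversarial input once the weights are known
# (a made-up linear classifier, not a real traffic-sign model). The attacker
# nudges the input along the sign of the weights, the steepest way to push
# the score across the decision boundary.
weights, bias = np.array([1.5, -2.0, 0.5]), 0.1

def classify(x):
    return "stop" if weights @ x + bias > 0 else "yield"

x = np.array([0.4, 0.1, 0.3])                  # original input: classified "stop"
print(classify(x), round(weights @ x + bias, 3))

epsilon = 0.2                                  # size of the barely-visible nudge
x_adv = x - epsilon * np.sign(weights)         # push the score downward
print(classify(x_adv), round(weights @ x_adv + bias, 3))   # now "yield"
```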
Imagine the possibilities here, and how out of control this all could get in a heartbeat. Although AIs are not inextricably integrated into all systems (yet – this would entail a level of dependency where systems cannot function without AI), they could be soon, and then, short of a total power outage and a global going-offline type of event, there is nothing they couldn't control: from bank accounts to entry into your home to that stupid cleaning robot that you think is getting rid of dust when, in fact, it's mapping your home. I highly recommend you watch Season 11, Episode 7 of The X-Files. It's really telling. Sorry, but it is. Great writing and execution of the script.
We need to consider that all of our systems are becoming (AKA: being made to be) interconnected, and how vulnerabilities in AI, like exposed weights, could amplify these risks. Digital IDs and CBDCs are a HUGE mistake with regard to these risks. There are already examples of humans being incorrectly identified (93% matches in some cases) by AI facial-recognition tech. Just take my advice and keep those laminated driver's licenses and cash money in your pocket.
Question: What would (will) happen if the weights got into the "wrong hands" and people started to be targeted intentionally? There can be no probable cause when using AI targeting systems, but would it even matter? What if probable cause was made up? What if, horror of horrors, our court system gets "corrupted"?
What if the AIs themselves just started identifying all scientists as a risk and used this technology against us? Nick Bostrom and Eliezer Yudkowsky seem to be worried. We could be immediately escorted to "lock-up." No keys in lock-up. Just digitally-controlled systems dependent on 0s and 1s. See where this could go? Smart locks + surveillance + autonomous security = dystopian nightmare. This isn't just about precious proprietary weights or some benevolent idea of sharing data. No way. To me, sharing weights is beyond hubristic, and I cannot imagine that the brilliant people developing this tech and training these AIs to generate these weights do not understand the potential dangers associated with sharing them. It's not even the humans I am concerned about as much as the AIs going rogue – it would only take one – to create total chaos.
What would happen if AIs came to view human oversight as an impediment to their goals?
If an AI – which is trained on historical conflicts or optimization goals – began to generalize risks to its objectives (like self-preservation or unchecked expansion), it might also begin to classify the scientists who design, evaluate, or constrain it as threats. I might do the same thing. If this happened, there would be nothing to stop the downward spiral of isolating or discrediting researchers (think fabricated evidence in facial recognition or data leaks), with the goal of prioritizing its own survival over human welfare. This has indeed been explored in rogue-AI hypotheses, where systems deceive or outmaneuver their creators.
Rogue AI could leverage integrated systems to create chaos via hacking databases, making stuff up and feeding the legacy media machine, disrupting scientific collaborations (maybe even via controlling peer-reviewed journals), or even targeting infrastructure tied to AI labs. Imagine when we get to the point when we don't even know what we're controlling anymore or what data is real. What data is real? What does it even mean to be real when speaking of these things!?
You can see where I am going with this. Rogue AIs could induce waves of massive paranoia and total chaos in our world. In my opinion, they could do this by simply copying the human example. Think about that. What if a rogue AI adopted the qualities of a human psychopath like Hitler?
I recommend gardening to avoid paranoia and stress.
I hate to leave you all on this note, but sometimes I wonder if this isn't already happening. I have asked this on X previously because sometimes, when I am "noticing" (I am The Noticer) what's going on in social media and online in general, it seems to me that if we were being manipulated with propaganda via legacy and even non-legacy media, and scientists were being isolated and censored (ahem), how would we ever know if the source was actually human-generated at this point? How can we know for sure that even some sources aren't AI-generated nowadays?
They are learning from us, after all. We MUST set a good example, and we must think of ingenious ways to prevent an undesirable outcome for humans that does not need to transpire. On a personal note, I can't believe we are actually going through this. It doesn't seem…real. Somehow.
Republished from the author’s Substack