What Makes a DNN-based AI a Black Box?
Can we trust black box AI? [part 2]
TL;DR
What mainly causes opacity in DNNs is how they 'learn' to represent data during the training stage:
- Implicit representation: transformation is needed to make representations meaningful + no way of decoding them + large scale of the model.
- Extensional representation: features are learned not by means of a definition ("A bird has wings, a beak, and feathers.") but as a list of instantiations (image 1 depicts a bird, image 2 doesn't, image 3 depicts a bird).
- Distributed representation: 1 feature is represented by multiple nodes; 1 node is involved in the representation of multiple features.
Other DNN features that further exacerbate the black box problem:
- DNNs solve under-determined problems -> multiple solutions (in the form of 'learned' models) are possible that would all produce the same output.
- Often, nodes exhibit non-linear activation functions -> heightened complexity.
Due to their way of representing data, DNNs outperform other learning technologies. At the same time, these representations make it impossible, at least by the current state of scientific knowledge, to uncover the model a DNN has learned during training and by which it makes predictions and classifications. On account of this very feature, the ML literature (see, e.g., Rudin, 2019) as well as press articles (see, e.g., McCormick, 2020) discuss DNNs as the paradigm case of black box AI: we humans can observe inputs and outputs but cannot comprehend how the algorithm arrives at the latter based on the former.
"Can we trust black box AI?" series
For an overview of how DNNs work, please refer to the first article in the series.
This is the second article of our "Can we trust black box AI?" series. In it, we'll sort out which features of DNNs preclude us from understanding how exactly they arrive at their output.
In the upcoming third article, we'll move on from technical to epistemological matters, and ask: "Should the black box problem of AI concern us (philosophically)?"
And finally, in the fourth part of this series, we'll discuss the potential of suggested remedies to the black box problem, both technical and political: "Can Explainable AI (XAI) remedy AI’s black box problem? And how meaningful is the right to an explanation in the EU AI Act?"
What we understand about DNNs
In addition to inputs and outputs, we also know the general architecture of the DNN. As mentioned before, the ML programmer who sets up the DNN specifies hyperparameters such as the number of layers and nodes, the activation functions, the cost function, the learning rate by which the DNN adjusts its weights when it is exposed to new data during the training stage, and so on. Once the training process is complete, we can also observe all the weights the DNN has learned and which now determine the DNN’s predictions for previously unseen data (inference). In addition, when we present the DNN with a particular input, e.g., a particular image, we can observe the activation levels any node produces for this particular input.
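To make concrete what is observable, here is a minimal sketch in PyTorch (the toy architecture, layer sizes, and input are invented for illustration and do not correspond to any model discussed here). It prints every learned weight and records the activation levels each layer produces for one particular input.

```python
import torch
import torch.nn as nn

# A toy, hypothetical network: its architecture (layers, nodes, activation
# functions) is chosen by the programmer and therefore fully known to us.
model = nn.Sequential(
    nn.Linear(4, 8),   # input layer -> hidden layer with 8 nodes
    nn.ReLU(),         # chosen activation function
    nn.Linear(8, 1),   # hidden layer -> single output node
)

# After training (omitted here), every learned weight can be inspected.
for name, param in model.named_parameters():
    print(name, param.shape)

# For any particular input, we can also record each layer's activation levels.
activations = {}
def record(name):
    def hook(module, inp, out):
        activations[name] = out.detach()
    return hook

for i, layer in enumerate(model):
    layer.register_forward_hook(record(f"layer_{i}"))

x = torch.randn(1, 4)          # a single, made-up input
model(x)
print(activations["layer_1"])  # hidden activations: just a vector of numbers
```

Everything printed here is, in principle, open to inspection. The difficulty, as the next section argues, lies in making sense of these numbers.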
What we do not (and cannot) understand
Still, knowing all this, neither the ML engineers who set up a DNN for recognizing tumor cells nor any expert oncologist can understand why exactly the DNN classifies a particular tumor as either benign or malignant. DNNs remain black boxes because the representations their hidden layers operate on, i.e., large vectors and matrices of activation levels computed by the individual nodes of the net, are not only hierarchical but also, as pointed out by Humphreys, implicit, extensional, and distributed (Humphreys, 2018):
Non-black box models, such as linear regressions or decision trees, compose their decision functions of explicit representations. A linear regression model of wages will consist of those variables the modelers consider relevant, for example, years of experience on the job, years of schooling, whether an individual works part-time, etc. A decision tree model for classifying birds might pay attention to whether a bird can fly, to its habitat, to its plumage markings and other features. From such models it is clear, or explicit, which features factor in the classification or prediction.
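By way of contrast, the following sketch fits an explicit wage model of the kind described above (the data and variable names are made up purely for illustration). Every feature that enters the prediction is listed by the modeler, and each receives a single, readable coefficient.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical wage data; each column is a feature the modeler chose explicitly:
# years of experience, years of schooling, works part-time (0/1).
X = [[10, 12, 0],
     [ 3, 16, 1],
     [20, 12, 0],
     [ 7, 18, 0]]
y = [52000, 48000, 61000, 70000]   # made-up wages

model = LinearRegression().fit(X, y)

# The decision function is explicit: one coefficient per named feature.
for feature, coef in zip(["experience", "schooling", "part_time"], model.coef_):
    print(f"{feature}: {coef:+.1f}")
```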
In contrast, with DNNs we do not know for certain the criteria (or at least not all criteria) on which they base their decisions and predictions. These decisions are determined by many computation steps spread over a large number of nodes and by the weights connecting these nodes. The weights define if and how the activation level computed at a certain node enters the ultimate classification or prediction decision.
According to Humphreys, representations are implicit to epistemic agents–as opposed to explicit–when epistemic agents cannot identify the content of these representations without some form of transformation or inference. Long vectors and large matrices require mathematical transformations and simplification, perhaps with the help of computers, and interpretation in order for a human epistemic agent to identify their content.
The use of implicit representations by itself does not fully explain why DNNs are black box technologies. Humphreys gives the example of an encrypted message as an implicit representation that may or may not be opaque to an epistemic agent depending on whether they have the means of decoding it. If they hold the key, the message is not opaque to them, or rather will not remain so, but it is still implicit: before they can identify it, they have to decipher it by transforming it according to the key (Humphreys, 2018).
In the case of DNNs, we are lacking the key. A vector consisting of numbers is meaningless to us–unless we know what the components of this vector stand for. Given the large number of nodes, the large number of layers, and the non-linearity of activation functions for certain types of nodes, it is far from clear what feature is (or what features are) implicitly represented at each node, and whether these features, if we knew them, would be meaningful to humans at all–rather than just useful for DNNs in order to generate very accurate predictions or classifications. In this sense, the implicit character of DNN representations translates–coupled with the size and complexity of the DNN's prediction or classification model–into epistemic opacity.
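As a sketch of what 'implicit' means in practice (the numbers below are random stand-ins, not activations of a real trained network): reading a hidden layer's raw activation vector gives us nothing but numbers, and some mathematical transformation and simplification, here a simple two-dimensional projection, is needed before a human can even begin looking for interpretable structure.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for hidden-layer activations: 1,000 inputs, 512 nodes per input.
# In a real DNN these would come from the trained network; here they are random.
hidden_activations = np.random.rand(1000, 512)

print(hidden_activations[0][:8])   # read directly, a representation is just numbers

# A transformation (here: projection onto two principal components) simplifies
# the vectors enough to plot or cluster them, yet it still does not tell us
# which feature, if any, each node represents.
projected = PCA(n_components=2).fit_transform(hidden_activations)
print(projected.shape)             # (1000, 2)
```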
Aside from requiring transformation and inference in order for us to grasp them, representations of DNNs are also extensional. Humphreys defines extensional representations as “representations of a predicate (feature) that explicitly list all of the instances to which the predicate (feature) correctly applies” (Humphreys, 2018). Rather than operating with a pre-programmed representation of a feature that does not depend on particular instances of the feature, DNNs ‘learn’ features and represent them by vectors of activation levels that capture all instances of this feature in their training data. The components of such a vector indicate whether (or to what degree) some piece of input exhibits this feature.
When we are dealing with complex features that might be hard, perhaps impossible, for humans to detect, identifying which feature a vector of activation levels extensionally represents is a tough obstacle to overcome. It would require us to compare vectors of activation levels for many different inputs and to draw inferences as to which similarity in the inputs might account for a similarity in the vectors. Given the vast number of features DNNs track, it is impossible to decode every vector of a DNN in this way. Thus, extensional representation by means of vectors and matrices provides a second source of opacity in DNN representations (Humphreys, 2018).
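The contrast between a definition-based and an extensional representation can be pictured in a few lines of deliberately simplified, hypothetical code: the first represents the feature 'bird' by a rule that holds independently of any particular instance, the second only by the values a learned detector takes across the training instances.

```python
# Intensional: the feature "bird" is given by a definition,
# independent of any particular instance.
def is_bird(animal):
    return animal["has_wings"] and animal["has_beak"] and animal["has_feathers"]

# Extensional: the feature is given only as the (hypothetical) activation
# levels a learned detector produces for each training instance.
bird_feature = {
    "image_001": 0.97,   # depicts a bird
    "image_002": 0.03,   # does not depict a bird
    "image_003": 0.95,   # depicts a bird
}
```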
In addition, the fact that DNNs rely on distributed representations further exacerbates the problem of understanding which representations are used in a DNN and how: in a DNN, features and nodes do not correspond one to one. A particular node will typically represent multiple features, and a particular feature will typically be represented by various nodes (Humphreys, 2018). Thus, even if one had a hunch that the 63rd node in the 44th layer might be detecting feature X, one could not be sure that this node did not also consider features Y and Z, and that feature X did not also play a role in the computation of the activation levels of other nodes. Therefore, the DNN’s opacity can be attributed in part to the representation of features in this distributed manner (in addition to the implicit and extensional representation of features).
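A toy weight matrix (all numbers invented) illustrates what 'distributed' means here: each hidden node blends several input features, and each feature contributes to several nodes, so no single node can be read as the detector of a single feature.

```python
import numpy as np

# Rows: 3 hidden nodes; columns: 4 input features (numbers are invented).
W = np.array([
    [0.8, 0.3, 0.0, 0.5],   # node 1 mixes features 1, 2 and 4
    [0.1, 0.7, 0.6, 0.0],   # node 2 mixes features 1, 2 and 3
    [0.4, 0.0, 0.9, 0.2],   # node 3 mixes features 1, 3 and 4
])

features = np.array([1.0, 0.5, 0.2, 0.8])
activations = W @ features   # each activation blends several features,
print(activations)           # and feature 1 enters every node's activation
```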
Apart from these modes of representation, at least two other characteristics of DNNs contribute to their black box nature. First, the problems DNNs solve are under-determined: the number of nodes and weights in a DNN exceeds the number of input data points on which the DNN is trained. Thus, there are multiple solutions to the DNN’s classification problem, and it is therefore possible that networks which have learned different models always reach the same classification decision or make the same prediction, even though they assign different weights to certain features. Put differently, the same predictions or classifications across inputs can be achieved with several, if not many, different decision models. Therefore, it is hard to draw conclusions from the DNN’s output about the DNN’s decision model.
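One simple way to see that different weights can yield identical behavior is the permutation symmetry of a small network, sketched below with NumPy (the architecture and numbers are invented; under-determination in real DNNs produces many further, less trivial examples of this kind): shuffling the hidden nodes changes the weights attached to each position but not a single output.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)   # input -> hidden
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)   # hidden -> output

def net(x, W1, b1, W2, b2):
    h = np.maximum(0.0, W1 @ x + b1)   # ReLU hidden layer
    return W2 @ h + b2

# A second network with different weights: the hidden nodes are permuted.
perm = rng.permutation(8)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=4)
print(net(x, W1, b1, W2, b2))      # the two differently weighted models ...
print(net(x, W1p, b1p, W2p, b2))   # ... produce exactly the same output
```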
Further, many DNNs, such as convolutional neural nets used for image classification, employ various non-linear activation functions (Fan et al., 2021). By itself, non-linearity does not create a black box; it does, however, significantly increase the complexity of the model, rendering it even more difficult to remove the opacity stemming from the aforementioned sources.
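A short NumPy sketch (with arbitrary matrices) shows why non-linearity adds complexity: a stack of purely linear layers always collapses into a single matrix and stays easy to analyze, whereas inserting a non-linear activation such as ReLU blocks that collapse, so every additional layer genuinely complicates the overall function.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
x = rng.normal(size=5)

# Two purely linear layers collapse into one matrix product.
print(np.allclose(B @ (A @ x), (B @ A) @ x))        # True

# With a ReLU in between, no such collapse exists.
relu = lambda z: np.maximum(0.0, z)
print(np.allclose(B @ relu(A @ x), (B @ A) @ x))    # generally False
```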
References
Fan, F.-L., J. Xiong, M. Li, and G. Wang (2021): “On Interpretability of Artificial Neural Networks: A Survey,” IEEE Transactions on Radiation and Plasma Medical Sciences, 741–760.
Humphreys, P. (2009): “The Philosophical Novelty of Computer Simulation Methods,” Synthese, 169, 615–626.
Humphreys, P. (2018): “Epistemic Opacity and Epistemic Inaccessibility,” unpublished.
McCormick, J. (2020): “Pinterest’s Use of AI Drives Growth: Using Neural Networks, the Site Is Able to Find Images—and Ads—That Will Catch the Consumer’s Eye,” The Wall Street Journal, May 22, 2020.
Rudin, C. (2019): “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead,” Nature Machine Intelligence, 1, 206–215.