Researchers are trying to get to a provably beneficial AI that would, for example, prevent a self-driving car from veering into evil-doing. (GETTY IMAGES)

By Lance Eliot, the AI Trends Insider

As AI systems continue to be developed and fielded, one nagging and serious concern is whether the AI will achieve beneficial results.

Perhaps among the plethora of AI systems are some that are or might eventually become untoward, working in non-beneficial ways, carrying out detrimental acts that in some manner cause irreparable harm, injury, and possibly even death to humans. There is a distinct possibility that there are toxic AI systems among the ones that are aiming to help mankind.

We do not know whether it might be just a scant few that are reprehensible or whether it might be the preponderance that goes that malevolent route.

One crucial twist is that AI systems are often devised to learn while in use. Thus, there is a real chance that the original intent will be waylaid and overtaken into foul territory over time, with the AI ultimately exceeding any preset guardrails and veering into evil-doing.

Proponents of AI cannot assume that AI will necessarily always be cast toward goodness.

There is the noble desire to achieve AI For Good, and likewise the ghastly underbelly of AI For Bad.

To clarify, even if AI developers had something virtuous in mind, realize that their creation can either on its own transgress into badness as it adjusts on-the-fly via Machine Learning (ML) and Deep Learning (DL), or it could contain unintentionally seeded errors or omissions that when later encountered during use are inadvertently going to generate bad acts.

Somebody ought to be doing something about this, you might be thinking and likewise wringing your hands worryingly.

Proposed Approach Of Provably Beneficial AI

One such proposed solution is an arising focus on provably beneficial AI.

Here’s the background.

If an AI system could be mathematically modeled, it might be feasible to perform a mathematical proof that would logically indicate whether the AI will be beneficial or not.
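To make the idea of proving a property about a modeled system a bit more concrete, here is a minimal sketch. It assumes, very generously, that the system can be reduced to a tiny finite-state model. The "gap to an obstacle" state, the two policies, and the `is_safe` property are all hypothetical and invented purely for illustration. AI systems that learn while in use are far harder to model than this.

```python
from collections import deque

MAX_SPEED = 2  # hypothetical top speed, in cells per time step


def cautious(gap):
    """Hypothetical policy: never travel farther than the gap allows, leaving one cell."""
    return max(0, min(MAX_SPEED, gap - 1))


def reckless(gap):
    """Hypothetical policy: always travel at top speed, ignoring the gap."""
    return MAX_SPEED


def is_safe(gap):
    """The property to be proven: at least one cell of gap always remains."""
    return gap >= 1


def find_violation(policy, initial_gaps):
    """Explore every reachable gap; return an unsafe one, or None if there is none."""
    seen = set(initial_gaps)
    queue = deque(initial_gaps)
    while queue:
        gap = queue.popleft()
        if not is_safe(gap):
            return gap  # counterexample: the property does not hold
        next_gap = gap - policy(gap)  # the car closes the gap by its chosen speed
        if next_gap not in seen:
            seen.add(next_gap)
            queue.append(next_gap)
    return None  # every reachable state satisfies the property


print("cautious:", find_violation(cautious, range(1, 6)))
print("reckless:", find_violation(reckless, range(1, 6)))
```

Because the search visits every state the toy model can reach, a result of `None` amounts to a proof of the property for that model, while any other result is a concrete counterexample. The proof says nothing about whether the model faithfully reflects the real system.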

As such, anyone embarking on putting an AI system into the world would be able to run the AI through this provability approach and then be confident that their AI will be in the AI For Good camp, and those that endeavor to use the AI or that become reliant upon the AI will be comforted by the aspect that the AI was proven to be beneficial.

Voila, we turn the classic notion of A is to B, and as B is to C, into the strongly logical conclusion that A is to C, as a kind of tightly interwoven mathematical logic that can be applied to AI.

For those that look to the future and see a potential for AI that might overtake mankind, perhaps becoming a futuristic version of a frightening Frankenstein, this idea of clamping down on AI by having it undergo a provability mechanism to ensure it is beneficial offers much relief and excitement.

We all ought to rejoice in the goal of being able to provably showcase that an AI system is beneficial.

Well, other than those that are on the foul side of AI, aiming to use AI for devious deeds and purposely seeking to do AI For Bad. They would be likely to eschew any such proofs and instead offer pretenses that their AI is aimed at goodness, as a means of distracting from its true goals (meanwhile, some might come straight out and proudly proclaim they are making AI for destructive aspirations, the so-called Dr. Evil flair).

There seems to be little doubt that overall, the world would be better off if there was such a thing as provably beneficial AI.

We could use it on AI that is being unleashed into the real world and then be heartened that we have done our best to keep AI from doing us in, and accordingly use our remaining energies on keeping watch on the non-proven AI that is either potentially afoul or that might be purposely crafted to be adverse.

Regrettably, there is a rub.

The rub is that creating or verifying provably beneficial AI is a lot harder than it might sound.

Let’s consider one such approach.

Professor Stuart Russell at the University of California, Berkeley is at the forefront of provably beneficial AI and proposes in his research paper that there are three core principles involved:

1) “The machine’s purpose is to maximize the realization of human values. In particular, it has no purposes of its own and no innate desire to protect itself.”

2) “The machine is initially uncertain about what those human values are. The machine may learn more about human values as it goes along, of course, but it may never achieve complete certainty.”

3) “Machines can learn about human values by observing the choices that we humans make.”

Those core principles are then formulated into a mathematical framework, and an AI system is either designed and built according to those principles from the ground-up, or an existent AI system might be retrofitted to abide by those principles (the retrofitting would be generally unwise as it is easier and more parsimonious to start things the right way rather than trying to, later on, squeeze a square peg into a round hole, as it were).

For those of you that are AI insiders, you might recognize this approach as a Cooperative Inverse Reinforcement Learning (CIRL) scheme, whereby multiple agents work cooperatively. The agents in this case are a human and an AI, and the AI attempts to learn from the actions of the human rather than from the AI’s own direct actions per se.
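The learning half of such a scheme can be sketched in a few lines. What follows is not the CIRL formulation itself, only a minimal Bayesian illustration of the second and third principles: the machine starts out uncertain about human values and updates its belief by observing human choices. The two value hypotheses, their reward numbers, the noisily rational (softmax) model of the human, and the `rationality` parameter are all assumptions made up for this example.

```python
import math

# Hypothetical candidate "human values": how much the human is assumed to
# value each option under each hypothesis. All numbers are invented.
HYPOTHESES = {
    "values_safety": {"slow_down": 1.0, "speed_up": 0.0},
    "values_speed": {"slow_down": 0.0, "speed_up": 1.0},
}


def choice_likelihood(rewards, choice, rationality=2.0):
    """P(choice | hypothesis), assuming a noisily rational (softmax) human."""
    weights = {option: math.exp(rationality * r) for option, r in rewards.items()}
    return weights[choice] / sum(weights.values())


def update_belief(belief, choice):
    """One Bayesian update of the machine's belief after observing a human choice."""
    unnormalized = {
        name: prob * choice_likelihood(HYPOTHESES[name], choice)
        for name, prob in belief.items()
    }
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


# The machine is initially uncertain about which values the human holds.
belief = {"values_safety": 0.5, "values_speed": 0.5}

# It then watches what the human actually does, including one inconsistent choice.
for observed in ["slow_down", "slow_down", "speed_up", "slow_down"]:
    belief = update_belief(belief, observed)

print(belief)
```

The belief shifts strongly toward the safety hypothesis yet never reaches complete certainty, in keeping with the second principle. The sketch also shows that the machine can only ever learn the values exhibited by whichever humans it happens to observe.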

Some would bluntly say that this particular approach to provably beneficial AI is shaped around making humans happy with the results of the AI efforts.

And making humans happy sure seems like a laudable ambition.

The Complications Involved

It turns out that there is no free lunch in trying to achieve provably beneficial AI.

Consider some of the core principles and what they bring about.

The first stated principle is that the AI is aimed to maximize the realization of human values and that the AI has no purposes of its own, including no desire to protect itself.

Part of the basis for making this rule is that it would seem to do away with the classic paperclip problem or the King Midas problem of AI.

Allow me to explain.

Hypothetically, suppose an AI system was set up to produce paperclips. If the AI is solely devoted to that function, it might opt to do so in ways that are detrimental to mankind. For example, to produce as many paperclips as possible, the AI begins to take over steel production to ensure that there are sufficient materials to make paper clips. Soon, in a draconian way, the AI has marshaled all of the world’s resources to incessantly make those darned paperclips.

Plus, horrifically, humanity might be deemed as getting in the way of the paperclip production, and so the AI then wipes out humanity too.

All in all, this is decidedly not what we would have hoped for as a result of the AI paperclip making system.

This is similar to the fable of King Midas, whereby everything he touched turned to gold. At first this seemed like a handy way to get rich, but then the water he touched turned to gold, and his food turned to gold, and so on. Ultimately, he does himself in and realizes that his wish was a curse.

Thus, rather than AI having a goal that it embodies, such as making paper clips, the belief in this version of provably beneficial AI is that it would be preferred that the AI not have any self-beliefs and instead entirely be driven by the humans around it.

Notice too that the principle states that the AI is established such that it has no desire to protect itself.

Why so?

Aha, this relates to another classic AI problem, the off-switch or kill-switch issue.

Assume that any AI that we humans craft will have some form of off-switch or kill-switch, meaning that if we wanted to do so, we could stop the AI, presumably whenever we deemed it desirable to do so. Certainly, this would be a smart thing for us to do, else we might have that crazed paperclip maker and have no means to prevent it from overwhelming the planet in paperclips.

If the AI has any wits about it, which we are kind of assuming it would, the AI would be astute enough to realize that there is an off-switch and that humans could use it. But if the AI is doggedly determined to make those paper clips, the use of an off-switch would prevent it from meeting its overarching goal, and therefore the proper thing to do would be for the AI to disable that kill-switch.

It might be one of the first and foremost acts that the AI would undertake, seeking to preserve its own “lifeblood” by disabling the off switch.

To try and get around this potential loophole, the stated principle in this provably beneficial AI framework indicates that the AI is not going to have that kind of self-preservation cooked into its inherent logic.

Presumably, if the AI is going to seek to maximize the realization of human values, it could be that the AI will itself realize that disabling the off-switch is not in keeping with the needs of society and thus will refrain from doing so.  Furthermore, maybe the AI eventually realizes that it cannot achieve the realization of human values, or that it has begun to violate that key premise, and the AI might overtly turn itself off, viewing that its own “demise” is the best way to accede to human values.
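A small expected-value comparison shows one way that uncertainty about human values could lead a machine to leave its off-switch alone. This sketch rests on assumptions that go beyond the principles quoted above: the machine holds a belief over how much its proposed action is worth to the human (the numbers are invented), the human knows the true value, and the human presses the off-switch exactly when the action would be harmful.

```python
# Hypothetical belief held by the machine:
# (value of its proposed action to the human, probability of that value).
belief = [(-10.0, 0.3), (2.0, 0.5), (5.0, 0.2)]


def act_regardless(belief):
    """Disable the off-switch and act: the machine gets whatever the value turns out to be."""
    return sum(value * prob for value, prob in belief)


def switch_self_off(belief):
    """Shut down immediately: nothing gained, nothing lost."""
    return 0.0


def defer_to_human(belief):
    """Leave the switch alone: the human blocks the action whenever its value is negative."""
    return sum(max(value, 0.0) * prob for value, prob in belief)


for option in (act_regardless, switch_self_off, defer_to_human):
    print(option.__name__, option(belief))
```

Under these assumptions, deferring is never worse than either alternative, since for every possible outcome max(value, 0) is at least as large as both the value itself and zero. The comparison leans entirely on the assumption that the human judges correctly, which is exactly the point that the critics discussed next take issue with.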

This does seem enterprising and perhaps gets us out of the AI doomsday predicaments.

Not everyone sees it that way.

One concern is that if the AI does not have a cornerstone of any semblance of self, it will potentially be readily swayed in directions that are not quite so desirable for humanity.

Essentially, without an ironclad truism at its deepest realm about not harming humans, perhaps using Isaac Asimov’s famous first rule that a robot may not injure a human being or, through inaction, allow a human being to come to harm, there is no failsafe for preventing the AI from going off-kilter.

That being said, the counter-argument is that the core principles of this kind of provably beneficial AI are indicative that the AI will learn about human values, doing so by observation of human acts, and we might assume this includes that the AI will inevitably and inextricably discover on its own Asimov’s first rule, doing so by the mere act of observing human behavior.

Will it?

A counter to the counter-argument is that the AI might learn that humans do kill each other, somewhat routinely and with at times seemingly little regard for human life, out of which the AI might then divine that it is okay to harm or kill humans.

Since the AI lacks any ingrained precept that precludes harming humans, the AI will be open to whatever it seems to “learn” about humans, including the worst and most vile of acts.

Additionally, critics of this variant of provably beneficial AI are apt to point out that the word “beneficial” is potentially being used in a misleading and confounding way.

It would seem that the core principles do not mean to achieve “beneficial” in the sense of arriving at a decidedly “good” result per se (in any concrete or absolute way), and instead beneficial is intended as relative to whatever humans happen to be exhibiting as seemingly so-called beneficial behavior. This might be construed as a relativistic ethics stance, and in that manner, does not abide by any presumed everlasting or considered unequivocal rules of how humans ought to behave (even if they do not necessarily behave in such ways).

You can likely see that this topic can indubitably get immersed in, and possibly mired in, foundational philosophical and ethical debates.

This also takes things into the qualms about basing the AI on the behaviors of humans.

We all know that oftentimes humans say one thing and yet do another.

As such, one might construe that it is best to base the AI on what people do, rather than what they say since their actions presumably speak louder than their words. The problem with this viewpoint of humanity is that it seems to omit that words do matter and that inspection of behavior alone might be a rather narrow means of ascribing things like intent, which would seem to be an equally important element for consideration.

There is also the open question about which humans are to be observed.

Suppose the humans are part of a cult that is bent on death and destruction, in which case their “happiness” might be shaped around the beliefs that lead to those dastardly results, and the AI would dutifully “learn” those as the human values to maximize.

And so on.

In short, as pointed out earlier, seeking to devise an approach for provably beneficial AI is a lot more challenging than meets the eye at first glance.

That being said, we should not cast aside the goal of finding a means to arrive at provably beneficial AI.

Keep on trucking, as they say.

Meanwhile, how might the concepts of provably beneficial AI be applied in a real-world context?

Consider the matter of AI-based true self-driving cars.

The Role of AI-Based Self-Driving Cars

True self-driving cars are ones in which the AI drives the car entirely on its own and there isn’t any human assistance during the driving task.

These driverless vehicles are considered Level 4 and Level 5, while a car that requires a human driver to co-share the driving effort is usually considered Level 2 or Level 3. The cars that co-share the driving task are described as being semi-autonomous, and typically contain a variety of automated add-ons that are referred to as ADAS (Advanced Driver-Assistance Systems).

There is not yet a true self-driving car at Level 5, and we don’t yet even know whether this will be possible to achieve, nor how long it will take to get there.

Meanwhile, the Level 4 efforts are gradually trying to get some traction by undergoing very narrow and selective public roadway trials, though there is controversy over whether this testing should be allowed per se (we are all life-or-death guinea pigs in an experiment taking place on our highways and byways, some point out).

Since semi-autonomous cars require a human driver, the adoption of those types of cars won’t be markedly different than driving conventional vehicles, so there’s not much new per se to cover about them on this topic (though, as you’ll see in a moment, the points next made are generally applicable).

For semi-autonomous cars, it is important that the public be forewarned about a disturbing aspect that’s been arising lately: despite those human drivers that keep posting videos of themselves falling asleep at the wheel of a Level 2 or Level 3 car, we all need to avoid being misled into believing that the driver can take their attention away from the driving task while driving a semi-autonomous car.

You are the responsible party for the driving actions of the vehicle, regardless of how much automation might be tossed into a Level 2 or Level 3.

Self-Driving Cars And Provably Beneficial AI

For Level 4 and Level 5 true self-driving vehicles, there won’t be a human driver involved in the driving task.

All occupants will be passengers.

The AI is doing the driving.

One hope for true self-driving cars is that they will mitigate the approximately 40,000 deaths and about 1.2 million injuries that occur due to human driving in the United States alone each year. The assumption is that since the AI won’t be driving and drinking, for example, it will not incur drunk driving-related car crashes (which account for nearly a third of all driving fatalities).

Some offer the following “absurdity” instance for those that are considering the notion of provably beneficial AI as an approach based on observing human behavior.

Suppose AI observes the existing driving practices of humans. Undoubtedly, it will witness that humans crash into other cars, and presumably not know that it is due to being intoxicated (in that one-third or so of such instances).

Presumably, we as humans allow those humans to do that kind of driving and cause those kinds of deaths.

We must, therefore, be “satisfied” with the result, else why would we allow it to continue?

The AI then “learns” that it is okay to ram and kill other humans in such car crashes, and has no semblance that it is due to drinking and that it is an undesirable act that humans would prefer to not have taken place.

Would the AI be able to discern that this is not something it should be doing?

I realize that those of you in the provably beneficial AI camp will be chagrined at this kind of characterization, and indeed there are loopholes in the aforementioned logic, but the point generally is that these are quite complex matters and undoubtedly disconcerting in many ways.

Even the notion of having foundational precepts as absolutes is not so readily viable either.

Take as a quick example the assertion by some that an AI driving system ought to have an absolute rule like Asimov’s about not harming humans and thus this apparently resolves any possible misunderstanding or mushiness on the topic.

But, as I’ve pointed out in an analysis of a recent incident in which a man rammed his car into an active shooter, there are going to be circumstances whereby we might want an AI driving system to undertake harm, and cannot necessarily have one ironclad rule thereof.

Again, there is no free lunch in any direction that one takes on these matters.

There is no question that we could greatly benefit from a viable means to provably showcase that AI is beneficial.

If we cannot manage to show that the AI is beneficial, we could at least provide a mathematical proof that the AI will keep to its stated requirements (well, this opens another can of worms, but at least sidesteps the notion of “beneficial,” rightfully or wrongly so).

Imagine an AI-based self-driving car that, before getting onto the roadways, was subjected to a provable safety theorem, and that had something similar working in real-time as the vehicle navigated our public streets.

Researchers are trying to get there and we can all hope they keep trying.

At this juncture, one thing that is provably the case is that all of the upcoming AI that is rapidly emerging into society is going to be extraordinarily vexing and troublesome, and that’s something we can easily prove.

Copyright 2020 Dr. Lance Eliot

This content is originally posted on AI Trends.

[Ed. Note: For readers interested in Dr. Eliot’s ongoing business analyses about the advent of self-driving cars, see his online Forbes column.]