When we think about an intelligent being, we have an implicit set of expectations for what its capabilities should be. I would argue that almost everything considered “Artificial Intelligence” today falls far short of these expectations. For instance, an intelligent being should have some ability to solve entirely new problems, not just ones it has solved before or was explicitly trained to solve. The idea of this post is to think through the components that would be needed for a “truly” intelligent agent (IA), as an outsider to the field looking in.
I find it helpful to think about the following concepts in the context of an example. Like most good mental tangents, these ideas were conceived of in the shower, so let’s consider the example of shampooing robot hair.
- Objective: The IA must have one or more objectives, along with a priority for each. In other words, it needs to know what it wants. In this example, the primary objective is to shampoo hair, and the secondary objective is to think about designing intelligent robots.
- Toolbox: There must be a collection of tools/actions/capabilities available to the agent that are relevant to achieving the objective. Ideally, this would be hierarchical, where the hierarchy is self-learning. This would mean that common combinations of tools could be grouped into a sequence, which would become a new tool. For instance, commonly observing “left-foot, right-foot” would result in a “walk” tool. This assumes some baseline of capabilities that can be combined into more complex sequences. For shampooing, we need to be able to pick up the bottle, open it, flip it over, squeeze it, and so on, and each of these might itself be a combination of more basic motor skills. If you can’t control your hands, you can’t squeeze a bottle.
- World Model: The IA needs to have a model of its world in order to simulate possible actions. This enables it to simulate possible plans and evaluate their quality. This should include things like “if I let go of the bottle, it will drop”, and “if I squeeze it and the lid is open, the contents will come out”. The model should also allow associations, like “opening the lid on the shampoo bottle is similar to opening the lid on the body wash, which I already know how to do.” This operates as the playground from which the IA can think.
- Prediction Machine: The IA must be able to make predictions for how its available actions will influence the objective in the real world. This depends on the World Model. These predictions are necessary because when reality doesn’t meet expectation, that’s a learning opportunity and a signal that something different might need to be done. If there’s a strong prediction that squeezing the bottle will release the shampoo and it doesn’t, then that might signal the bottle is empty or the lid is closed.
- Planning: The IA must have some plan for how to achieve the objective. This would consist of a sequence of steps/tools as well as predictions about how things will proceed.
- Observation: The IA needs to be able to observe the world and compare expectations to reality. This provides the ground truth and therefore the basis for learning.
- Attention: The IA should be able to focus its compute, particularly when expectation deviates from reality. This should somehow inform the Planning. For instance, when something surprising is observed, the Attention mechanism might stop execution, update its understanding of the situation (and possibly its World Model), restart planning in this new context, and then resume execution with the new plan. Attention also provides the mechanism by which the relevant data is extracted from the raw data. There must be some notion of “these are the observations and predictions relevant to shampooing my hair”. As the raw data comes streaming through the sensors, the IA needs the ability to ignore the extraneous.
- Learning Mechanism: Given the plan, the predictions, and the observations, the IA needs to be able to update its components. For instance, you might need to add a step to check that the bottle is not empty, or learn that the bottle needs to be shaken when it is running low.
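To make the self-learning Toolbox idea a bit more concrete, here is a minimal sketch of how “left-foot, right-foot” could turn into a “walk” tool. Everything in it is an assumption made for illustration: the function names `discover_macros` and `compress`, the action names, and the rule itself (promote any adjacent pair of actions seen at least `min_count` times). A real agent would presumably need something far more robust than counting pairs.

```python
from collections import Counter


def discover_macros(action_log, min_count=3):
    """Return adjacent action pairs that occur at least min_count times."""
    pairs = Counter(zip(action_log, action_log[1:]))
    return [pair for pair, n in pairs.items() if n >= min_count]


def compress(action_log, macro, name):
    """Rewrite the log, replacing each occurrence of the macro pair with its new name."""
    out, i = [], 0
    while i < len(action_log):
        if tuple(action_log[i:i + 2]) == macro:
            out.append(name)   # the pair is now treated as a single tool
            i += 2
        else:
            out.append(action_log[i])
            i += 1
    return out


# Hypothetical log of primitive actions observed so far.
log = ["left_foot", "right_foot"] * 3 + ["grab_bottle"]
macros = discover_macros(log)
walked = compress(log, macros[0], "walk")
```

Because `compress` returns a log in the same format, the same two functions could be run again on `walked` to build tools out of tools, which is the hierarchy part.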
That is a lot of capabilities, and the list surely only scratches the surface. This is important to think through because it’s likely that you can’t solve each of these problems in isolation and then combine the solutions to get a fully functional agent. Instead, you would need to start with all the components from the beginning so that they can interact. Can you conceive of a simple toy problem and IA design that incorporates all of these components? Or can you conceive of an IA that doesn’t have one or more of these components?
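As one possible starting point for such a toy problem, here is a rough sketch in which every component from the list appears in its crudest form. All of it is hypothetical: the `BathroomWorld` environment, the `Agent` class, the hidden “bottle is low and must be shaken” rule, and especially the learning rule, which just adds the first untried tool as a prerequisite after a surprise. It is meant to show the components interacting, not to suggest how any of them should really be built.

```python
class BathroomWorld:
    """Hypothetical environment: the bottle is low, so it must be shaken before squeezing."""

    def __init__(self):
        self.lid_open = False
        self.shaken = False            # hidden from the agent
        self.shampoo_in_hand = False

    def step(self, action):
        if action == "open_lid":
            self.lid_open = True
        elif action == "shake":
            self.shaken = True
        elif action == "squeeze" and self.lid_open and self.shaken:
            self.shampoo_in_hand = True
        # The raw observation includes an extraneous signal the agent should ignore.
        return {"lid_open": self.lid_open,
                "shampoo_in_hand": self.shampoo_in_hand,
                "radio_playing": True}


class Agent:
    def __init__(self):
        self.objective = ("shampoo_in_hand", True)        # Objective
        self.toolbox = ["open_lid", "squeeze", "shake"]   # Toolbox
        self.squeeze_prereqs = ["open_lid"]               # World Model (initially incomplete)
        self.relevant = {"lid_open", "shampoo_in_hand"}   # what Attention lets through

    def plan(self):                                       # Planning
        return self.squeeze_prereqs + ["squeeze"]

    def predict(self, belief, action, done):              # Prediction Machine
        expected = dict(belief)
        if action == "open_lid":
            expected["lid_open"] = True
        elif action == "squeeze" and all(p in done for p in self.squeeze_prereqs):
            expected["shampoo_in_hand"] = True
        return expected

    def attend(self, raw):                                # Attention
        return {k: v for k, v in raw.items() if k in self.relevant}

    def learn(self):                                      # Learning Mechanism (crude stand-in)
        untried = [t for t in self.toolbox
                   if t != "squeeze" and t not in self.squeeze_prereqs]
        if untried:
            self.squeeze_prereqs.append(untried[0])
            return True
        return False

    def run(self, world, max_attempts=3):
        belief = {"lid_open": False, "shampoo_in_hand": False}
        history = []
        key, value = self.objective
        for _ in range(max_attempts):
            done, surprised = [], False
            for action in self.plan():
                expected = self.predict(belief, action, done)
                belief = self.attend(world.step(action))  # Observation
                done.append(action)
                history.append(action)
                if belief != expected:                    # expectation != reality
                    surprised = True
                    break                                 # stop execution and replan
            if belief.get(key) == value:
                return True, history
            if not surprised or not self.learn():
                break
        return False, history


agent = Agent()
achieved, history = agent.run(BathroomWorld())
print(achieved, history)
```

In this run the first squeeze produces no shampoo, which contradicts the prediction. The agent stops, adds `shake` to what it believes squeezing requires, replans, and succeeds on the second attempt. Even at this scale the components are hard to pull apart: the plan is read off the world model, the surprise only exists because there was a prediction, and the learning step only fires because attention filtered the observation down to something comparable.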
I have some awareness that those in the field of Reinforcement Learning have dealt with many of these concepts, but I’m not sure if there’s a system that can check them all off. If you know of one, please let me know! Or if you have a better list, I’d like to see that too.