Neuro-symbolic approaches in artificial intelligence


Due to these problems, most symbolic AI approaches remained in their elegant theoretical forms and never saw wide practical adoption (compared to what we see today). Controversies arose from early on in symbolic AI, both within the field—e.g., between logicists (the pro-logic “neats”) and non-logicists (the anti-logic “scruffies”)—and between those who embraced AI but rejected symbolic approaches—primarily connectionists—and those outside the field. Critiques from outside the field came primarily from philosophers, on intellectual grounds, but also from funding agencies, especially during the two AI winters.

It is one form of assumption, and a strong one, while deep neural architectures contain other assumptions, usually about how they should learn rather than what conclusion they should reach. The ideal, obviously, is to choose assumptions that allow a system to learn flexibly and produce accurate decisions about its inputs. Additionally, we ran SciMED, AI Feynman, and GP-GOMEA on experiments A–E 20 times each. In each experiment, the most frequently returned outcome was taken as the result found by that SR system. The remaining hyperparameter values, shown in Table 1, were obtained through trial and error and tested on various equations and datasets not included in this paper. Forward chaining inference engines are the most common, and are seen in CLIPS and OPS5.
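
A minimal sketch of that selection rule, assuming each repeated run returns the discovered equation as a string (the function name and the example strings are hypothetical, not SciMED's API):

```python
from collections import Counter

def most_repeated_outcome(run_results):
    """Return the equation string that appears most often across repeated SR runs."""
    equation, _count = Counter(run_results).most_common(1)[0]
    return equation

# e.g., outcomes of several runs of a hypothetical SR system on one experiment
results = ["x0*x1", "x0*x1", "x0 + x1", "x0*x1"]
print(most_repeated_outcome(results))  # -> "x0*x1"
```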

All operations are executed in an input-driven fashion, so sparsity and dynamic computation per sample are naturally supported, complementing recent popular ideas of dynamic networks and potentially enabling new types of hardware acceleration. We experimentally show on CIFAR-10 that it can perform flexible visual processing, rivaling the performance of a ConvNet, but without using any convolutions. Furthermore, it can generalize to novel rotations of images that it was not trained on.


Other ways of handling more open-ended domains included probabilistic reasoning systems and machine learning to learn new concepts and rules. McCarthy’s Advice Taker can be viewed as an inspiration here, as it could incorporate new knowledge provided by a human in the form of assertions or rules. For example, experimental symbolic machine learning systems explored the ability to take high-level natural language advice and to interpret it into domain-specific actionable rules.

Both frameworks, like most of the SR frameworks examined in SRBench, were not constructed to work specifically in the physical domain, meaning they do not constrain the outcome with physical reasoning such as dimensional consistency or monotonicity. In contrast, physics-oriented SR frameworks presented in SRBench, such as the strongly-typed GP72, grammatical evolution73, and grammar-guided74 frameworks, were shown to perform, on average, worse than the other models37. This might be because most of the examined datasets were not physical, putting the physics-informed SR systems at a disadvantage. Physical models must adhere to first principles and domain-specific theoretical considerations.

Natural language understanding, in contrast, constructs a meaning representation and uses that for further processing, such as answering questions. Knowledge-based systems have an explicit knowledge base, typically of rules, to enhance reusability across domains by separating procedural code and domain knowledge. A separate inference engine processes rules and adds, deletes, or modifies a knowledge store. A second flaw in symbolic reasoning is that the computer itself doesn’t know what the symbols mean; i.e. they are not necessarily linked to any other representations of the world in a non-symbolic way. Again, this stands in contrast to neural nets, which can link symbols to vectorized representations of the data, which are in turn just translations of raw sensory data. So the main challenge, when we think about GOFAI and neural nets, is how to ground symbols, or relate them to other forms of meaning that would allow computers to map the changing raw sensations of the world to symbols and then reason about them.

YAGO incorporates WordNet as part of its ontology, to align facts extracted from Wikipedia with WordNet synsets. The two biggest flaws of deep learning are its lack of model interpretability (i.e., why did my model make that prediction?) and the large amount of data that deep neural networks require in order to learn. Symbolic capabilities, by contrast, can make it cheaper, faster, and easier to train models while improving their accuracy with a semantic understanding of language. Consequently, using a knowledge graph, taxonomies, and concrete rules is necessary to maximize the value of machine learning for language understanding. The efficiency of a symbolic approach is another benefit, as it doesn’t involve complex computational methods, expensive GPUs, or scarce data scientists.


In this section, we provide an overview of the advantages and limitations of current SR methods. In addition, we focus on knowledge integration methods in the context of SR systems and review the state-of-the-art SR methods that SciMED will be compared to in the experiments section. Note the similarity to the propositional and relational machine learning we discussed in the last article. One of the most successful neural network architectures has been the Convolutional Neural Network (CNN) [3] (tracing back to 1982’s Neocognitron [5]). The distinguishing features introduced in CNNs were the use of shared weights and the idea of pooling. However, there have also been some major disadvantages, including computational complexity and an inability to capture real-world noisy problems, numerical values, and uncertainty.

The GA-based one is less resource- and time-consuming but stochastic, which may produce sub-optimal results. In contrast, the LV-based component is more computationally expensive but more stable and accurate on average. A user can decide whether to use the GA, LV, or both SR components for a given task. By default, the GA-SR is applied 20 times, and only if its results do not pass the criteria for accuracy or stability is the LV-SR initiated.
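
A rough sketch of that default dispatch logic. The `run_ga_sr` and `run_lv_sr` callables and the threshold values are hypothetical stand-ins for SciMED's actual components and criteria:

```python
from collections import Counter
from typing import Callable, Tuple

def search_equation(
    data,
    run_ga_sr: Callable[[object], Tuple[str, float]],  # returns (equation, accuracy score)
    run_lv_sr: Callable[[object], Tuple[str, float]],
    accuracy_threshold: float = 0.95,
    stability_threshold: float = 0.8,
    n_runs: int = 20,
) -> str:
    """Run the cheap, stochastic GA-based SR n_runs times; fall back to the
    LV-based SR only if the GA results fail the accuracy or stability criteria."""
    results = [run_ga_sr(data) for _ in range(n_runs)]
    best_eq, votes = Counter(eq for eq, _ in results).most_common(1)[0]
    best_score = max(score for eq, score in results if eq == best_eq)

    accurate = best_score >= accuracy_threshold      # did the best run reach the target accuracy?
    stable = votes / n_runs >= stability_threshold   # was the same equation found in most runs?
    if accurate and stable:
        return best_eq
    return run_lv_sr(data)[0]                        # slower but more stable and accurate on average
```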

In essence, the concept evolved into a very generic methodology of using gradient descent to optimize the parameters of almost arbitrary nested functions, leading many to rebrand the field yet again as differentiable programming. This view then made even more space for all sorts of new algorithms, tricks, and tweaks that have been introduced under various catchy names for the underlying functional blocks (still consisting mostly of various combinations of basic linear algebra operations). With this paradigm shift, many variants of the neural networks from the ’80s and ’90s have been rediscovered or newly introduced. Benefiting from the substantial increase in the parallel processing power of modern GPUs, and the ever-increasing amount of available data, deep learning has been steadily paving its way to completely dominate (perceptual) ML. A key component of the system architecture for all expert systems is the knowledge base, which stores facts and rules for problem-solving.[52]

The simplest form of an expert system knowledge base is a collection or network of production rules.
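
A minimal illustration of such a rule base with naive forward chaining (the idea behind engines like CLIPS and OPS5, not those engines themselves; the facts and rules are made up):

```python
# Each production rule: if all antecedent facts hold, assert the consequent fact.
rules = [
    ({"has_feathers", "lays_eggs"}, "is_bird"),
    ({"is_bird", "cannot_fly"}, "is_flightless_bird"),
]

def forward_chain(facts, rules):
    """Repeatedly fire rules whose antecedents are satisfied until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)  # rule fires: the derived fact is added to working memory
                changed = True
    return facts

print(forward_chain({"has_feathers", "lays_eggs", "cannot_fly"}, rules))
# -> includes 'is_bird' and 'is_flightless_bird'
```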

From the perspective of the SR task, the search space should be reduced from all possible combinations to only those equations that comply with the physical constraints. Multiple knowledge integration methods have been proposed, which can be roughly divided into three main groups. Each approach—symbolic, connectionist, and behavior-based—has advantages, but has been criticized by the other approaches. Symbolic AI has been criticized as disembodied, liable to the qualification problem, and poor at handling the perceptual problems where deep learning excels. In turn, connectionist AI has been criticized as poorly suited for deliberative step-by-step problem solving, incorporating knowledge, and handling planning.

In that case, AI Feynman trains a NN on the data to estimate the structure of the function via five simplifying properties presumably existing in it (i.e., low-order polynomial structure, compositionality, smoothness, symmetry, and separability). If simplifying properties are detected, they are exploited to simplify and solve the problem recursively. Additionally, if the physical dimensions (units) of the samples are provided, a dimensional analysis solver is applied, doubling as a feature selection method that reduces the search space of the unknown equation. This is done by constructing a new set of non-dimensional features containing at least one representation of each dimensional (original) feature, using the smallest number of non-dimensional features possible. An updated version of the algorithm adds Pareto optimization with an information-theoretic complexity metric to improve robustness to noise37,67.
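
One way to picture the dimensional-analysis step is as finding exponent vectors in the null space of a dimension matrix (columns are features, rows are base units): each null-space vector defines one non-dimensional feature. A small sketch with made-up features (velocity, length, kinematic viscosity), not AI Feynman's own solver:

```python
from sympy import Matrix

# Columns: v [m/s], L [m], nu [m^2/s]; rows: exponents of length and time.
dimension_matrix = Matrix([
    [1, 1, 2],    # length exponents
    [-1, 0, -1],  # time exponents
])

# Each null-space vector x satisfies D @ x = 0 and gives the exponents
# of one dimensionless group built from the original features.
for vec in dimension_matrix.nullspace():
    print(vec.T)  # [-1, -1, 1] -> v**-1 * L**-1 * nu, i.e., the inverse Reynolds number
```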


For these cases, it is not surprising that AI Feynman also demonstrated good performance, as it brute-forces all polynomials up to fourth order, including the two linear configurations of these cases. Nevertheless, in experiment B, SciMED slightly outperformed AI Feynman and GP-GOMEA, finding a more accurate numerical prefactor for the equation. To facilitate quantitative benchmarking of our and other symbolic regression algorithms, we tested SciMED, AI Feynman, and GP-GOMEA on five cases, simulating real-world measurements with significant noise. We compare SciMED to them because AI Feynman is considered the cutting-edge system for physical applications, while the general SR system GP-GOMEA excels at finding accurate and straightforward mathematical models (a requirement for SR in the physical domain). Here, experiment E was repeated 32 times, each time withholding or adding information through a specific input junction, demonstrating the impact of domain knowledge.
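
As a loose illustration of how such noisy synthetic measurements can be generated for benchmarking, here is a sketch assuming a known ground-truth equation and multiplicative Gaussian noise; the ground-truth law, sample ranges, and noise model are invented for the example, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_measurements(n_samples=500, noise_level=0.1):
    """Sample features, evaluate a known ground-truth law, and corrupt it with noise."""
    x0 = rng.uniform(1.0, 10.0, n_samples)
    x1 = rng.uniform(1.0, 10.0, n_samples)
    y_true = 2.5 * x0 * x1                                     # hypothetical ground-truth equation
    y_noisy = y_true * (1.0 + noise_level * rng.normal(size=n_samples))
    return np.column_stack([x0, x1]), y_noisy

X, y = simulate_measurements()  # features and noisy targets for the SR systems to recover
```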

Most recently, an extension to arbitrary (irregular) graphs became extremely popular as Graph Neural Networks (GNNs). This is easy to think of as a boolean circuit (neural network) sitting on top of a propositional interpretation (feature vector). However, the relational program input interpretations can no longer be thought of as independent values over a fixed (finite) number of propositions, but rather as an unbounded set of related facts that are true in the given world (a “least Herbrand model”).


This kind of meta-level reasoning is used in Soar and in the BB1 blackboard architecture. Early work covered both applications of formal reasoning emphasizing first-order logic, along with attempts to handle common-sense reasoning in a less formal manner. The two problems may also overlap, and solving one could lead to solving the other, since a concept that helps explain a model will also help it recognize certain patterns in data using fewer examples.

To that end, we propose Object-Oriented Deep Learning, a novel computational paradigm of deep learning that adopts interpretable “objects/symbols” as its basic representational atoms instead of N-dimensional tensors (as in traditional “feature-oriented” deep learning). For visual processing, each “object/symbol” can explicitly package common properties of visual objects, such as position, pose, scale, probability of being an object, and pointers to parts, providing a full spectrum of interpretable visual knowledge throughout all layers. It achieves a form of “symbolic disentanglement”, offering one solution to the important problem of disentangled representations and invariance. Basic computations of the network include predicting high-level objects and their properties from low-level objects and binding/aggregating relevant objects together. These computations operate at a more fundamental level than convolutions, capturing convolution as a special case while being significantly more general.
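
As a loose illustration (our own sketch, not the authors' code), such an “object/symbol” could be packaged as a structured record with named, interpretable properties rather than an opaque feature vector:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VisualObject:
    """An interpretable representational atom: properties are named, not entangled in a tensor."""
    position: Tuple[float, float]  # (x, y) in image coordinates
    pose: float                    # orientation in radians
    scale: float                   # relative size
    objectness: float              # probability of being a real object
    parts: List["VisualObject"] = field(default_factory=list)  # pointers to constituent parts

wheel = VisualObject(position=(12.0, 40.0), pose=0.0, scale=0.3, objectness=0.9)
car = VisualObject(position=(20.0, 35.0), pose=0.1, scale=1.0, objectness=0.95, parts=[wheel])
```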

In the meantime, a new stream of neural architectures based on dynamic computational graphs became popular in modern deep learning to tackle structured data in the (non-propositional) form of various sequences, sets, and trees. These dynamic models finally make it possible to skip the preprocessing step of turning relational representations, such as interpretations of a relational logic program, into a fixed-size vector (tensor) format. They do so by effectively reflecting the variations in the input data structures as variations in the structure of the neural model itself, constrained by some shared parameterization (symmetry) scheme that reflects the respective model prior.

In this chapter, we outline some of these advancements and discuss how they align with several taxonomies for neuro-symbolic reasoning. Like in so many other respects, deep learning has had a major impact on neuro-symbolic AI in recent years. This appears to manifest, on the one hand, in an almost exclusive emphasis on deep learning approaches as the neural substrate, while previous neuro-symbolic AI research often deviated from standard artificial neural network architectures [2]. This increase in activity is probably primarily due to the fact that advances in deep learning now make it possible to address challenge problems in neuro-symbolic AI that were quite out of reach before the advent of deep learning, adding to its attractiveness for research and applications. However, we may also be seeing indications of a realization that pure deep-learning-based methods are likely to be insufficient for certain types of problems that are now being investigated from a neuro-symbolic perspective. We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers.

SYMBOLIC LEARNING THEORY

In light of the importance of integrating knowledge in SR tasks and the vast potential of knowledge loss during execution65, we believe it is essential to construct the SR pipeline with the scientist at its core. Subsequently, we present five experiments representing the cases SciMED aims to tackle, together with their results. However, to be fair, such is the case with any standard learning model, such as SVMs or tree ensembles, which are essentially propositional, too. Note the similarity to the use of background knowledge in the Inductive Logic Programming approach to Relational ML here.

These problems are known to often require sophisticated and non-trivial symbolic algorithms. Attempting these hard but well-understood problems using deep learning adds to the general understanding of the capabilities and limits of deep learning. It also provides deep learning modules that are potentially faster (after training) and more robust to data imperfections than their symbolic counterparts. That is certainly not the case with unaided machine learning models, as training data usually pertains to a specific problem. When another problem comes up, even if it has some elements in common with the first one, you have to start from scratch with a new model. First of all, a symbolic approach creates a granular understanding of the semantics of the language your intelligent system processes.

Another shortcoming of this system is its high sensitivity to even small amounts of noise37, making it hard to apply to real-world measurements. Deep learning (DL) for SR works well on noisy data due to the general resistance of neural networks to outliers. An example of a Deep Symbolic Regression (DSR) system is the one proposed by48, which is built for general SR tasks rather than specifically for data from the physical domain. This DL-based model uses reinforcement learning to train a generative RNN model of symbolic expressions. Furthermore, it adds a variation of the Monte Carlo policy gradient technique (termed “risk-seeking policy gradient”) to fit the generative model to the precise formula. In the next article, we will then explore how the sought-after relational NSI can actually be implemented with such a dynamic neural modeling approach.
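
The risk-seeking idea can be sketched as follows (our reading of the technique, not the reference implementation): only the sampled expressions whose reward exceeds the batch's (1 − ε) quantile contribute to the gradient, with that quantile acting as the baseline, so the policy is pushed toward its best-case rather than average-case behavior. The function name and ε value below are illustrative:

```python
import numpy as np

def risk_seeking_weights(rewards, epsilon=0.05):
    """Per-sample weights for the policy-gradient update: zero below the
    (1 - epsilon) reward quantile, (reward - quantile) above it."""
    rewards = np.asarray(rewards, dtype=float)
    threshold = np.quantile(rewards, 1.0 - epsilon)   # R_eps: only the best expressions matter
    return np.where(rewards >= threshold, rewards - threshold, 0.0)

# The gradient estimate is then sum_i weights[i] * grad(log p(expression_i)).
print(risk_seeking_weights([0.1, 0.2, 0.5, 0.9, 0.95], epsilon=0.2))
```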


Consequently, all these methods are merely approximations of the true underlying relational semantics. The idea was based on the now commonly exemplified fact that the logical connectives of conjunction and disjunction can easily be encoded by binary threshold units with weights, i.e., the perceptron, for which an elegant learning algorithm was introduced shortly afterward. While the interest in the symbolic aspects of AI from the mainstream (deep learning) community is quite new, there has actually been a long stream of research on this very topic within a rather small community called Neural-Symbolic Integration (NSI) for learning and reasoning [12]. This only escalated with the arrival of the deep learning (DL) era, in which the field became completely dominated by sub-symbolic, continuous, distributed representations, seemingly ending the story of symbolic AI. Among the main advantages of this logic-based approach to ML have been its transparency to humans, deductive reasoning, inclusion of expert knowledge, and structured generalization from small data. Natural language processing focuses on treating language as data to perform tasks such as identifying topics without necessarily understanding the intended meaning.
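
For instance, a single threshold unit with suitably chosen weights and bias computes conjunction or disjunction of its binary inputs; a minimal sketch (the specific weights are just one valid choice):

```python
def threshold_unit(inputs, weights, bias):
    """McCulloch-Pitts style unit: fire iff the weighted sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias >= 0)

AND = lambda x, y: threshold_unit([x, y], weights=[1, 1], bias=-2)  # fires only if both inputs are 1
OR  = lambda x, y: threshold_unit([x, y], weights=[1, 1], bias=-1)  # fires if at least one input is 1

assert [AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]
assert [OR(a, b)  for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 1]
```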

System 1 is the kind used for pattern recognition while System 2 is far better suited for planning, deduction, and deliberative thinking. In this view, deep learning best models the first kind of thinking while symbolic reasoning best models the second kind and both are needed. In this line of effort, deep learning systems are trained to solve problems such as term rewriting, planning, elementary algebra, logical deduction or abduction or rule learning.

In this experiment, SciMED alerted the user that a dependent variable is possibly missing from the data and presented the equation with the lowest MAE score it found. AI Feynman and GP-GOMEA, on the other hand, failed to terminate despite significant computational effort. For each configuration of knowledge insertion, we conducted 20 iterations and recorded the percentage of correct results (grey bars) and the normalized computational time (blue scatter). The X-axis in the graph shows whether knowledge was inserted (grey) or withheld (white) from a specific input junction. First, the number of folds in the cross-validation (\(k\)) is a critical parameter in the AutoML component. Increasing \(k\) improves the performance of each model but reduces the number of ML pipelines that can be evaluated in the same amount of time.
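
This trade-off can be made concrete with simple budget arithmetic: each candidate pipeline costs roughly \(k\) model fits, so under a fixed time budget the number of pipelines the AutoML search can try falls as \(k\) grows. The numbers below are illustrative, not SciMED's actual settings:

```python
def pipelines_within_budget(budget_seconds, fit_seconds_per_fold, k):
    """Rough estimate of how many ML pipelines a fixed budget allows with k-fold CV."""
    cost_per_pipeline = k * fit_seconds_per_fold
    return int(budget_seconds // cost_per_pipeline)

for k in (3, 5, 10):
    print(k, pipelines_within_budget(budget_seconds=3600, fit_seconds_per_fold=2.0, k=k))
# k=3 -> 600 pipelines, k=5 -> 360, k=10 -> 180: better per-model estimates, fewer pipelines tried.
```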

The issue is that in the propositional setting, only the (binary) values of the existing input propositions are changing, with the structure of the logical program being fixed. It has now been argued by many that a combination of deep learning with the high-level reasoning capabilities present in the symbolic, logic-based approaches is necessary to progress towards more general AI systems [9,11,12]. Driven heavily by the empirical success, DL then largely moved away from the original biological brain-inspired models of perceptual intelligence to “whatever works in practice” kind of engineering approach.


As a proof of concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system – though just a prototype – learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game. The Symbolic AI paradigm led to seminal ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems. We investigate an unconventional direction of research that aims at converting neural networks, a class of distributed, connectionist, sub-symbolic models, into a symbolic level with the ultimate goal of achieving AI interpretability and safety.

Neuro-symbolic lines of work include the use of knowledge graphs to improve zero-shot learning. Background knowledge can also be used to improve out-of-sample generalizability, or to ensure safety guarantees in neural control systems. Other work utilizes structured background knowledge for improving coherence and consistency in neural sequence models.

Although deep learning has historical roots going back decades, neither the term “deep learning” nor the approach was popular just over five years ago, when the field was reignited by papers such as Krizhevsky, Sutskever and Hinton’s now classic (2012) deep network model of ImageNet. Symbolic AI is a reasoning-oriented field that relies on classical logic (usually monotonic) and assumes that logic makes machines intelligent. When it comes to implementing symbolic AI, one of the oldest and still most popular logic programming languages, Prolog, comes in handy. Prolog has its roots in first-order logic, a formal logic, and, unlike many other programming languages, is primarily declarative. One can wonder how applicable the SITL approach is in real-world scenarios, where the final result is unknown and a real discovery process is conducted. While integrating the correct guesses or knowledge may be considered more an art than a science, knowledge integration has gained popularity in recent years101 and has been well utilized by researchers and engineers102,103.

In that case, this component assigns fitness scores to each chromosome corresponding to the accuracy achieved on the subset of features dictated by the chromosome. The first group of methods, the structure-related search space reduction, examines the structure of plausible equations, mainly through their partial derivatives, and incorporates assumptions (i.e., constraints) about them. For example,58 suggested that physical models guarantee monotonic behavior with respect to some of their features and narrowed the search space to include only monotonic functions. Extending this line of thought,59,60 suggested adding knowledge about convexity instead of only looking at monotonicity. Brute-force search-based SR systems are, in principle, capable of successfully solving every SR task, as they test all possible equations to find the best performing one40. However, in practice, a naive implementation of brute-force methods is infeasible, even on small datasets, because of its computational expense.
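
A compact sketch of that fitness assignment, assuming a binary chromosome marks which features are active and the cross-validated R² of a simple linear model stands in for the fitness (our stand-in, not SciMED's exact scorer):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def chromosome_fitness(chromosome, X, y, cv=5):
    """Fitness of a binary feature-selection chromosome: CV R^2 of a model trained
    only on the features the chromosome switches on."""
    mask = np.asarray(chromosome, dtype=bool)
    if not mask.any():
        return -np.inf                      # empty feature subsets are invalid
    scores = cross_val_score(LinearRegression(), X[:, mask], y, cv=cv, scoring="r2")
    return scores.mean()

# Example: score a chromosome that keeps features 0 and 2 of a toy dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
print(chromosome_fitness([1, 0, 1, 0], X, y))
```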

  • In order to advance the understanding of the human mind, it therefore appears to be a natural question to ask how these two abstractions can be related or even unified, or how symbol manipulation can arise from a neural substrate [1].
  • This vast exploitation of simplifying properties enabled AI Feynman to excel at detecting 120 different physical equations, significantly outperforming the preexisting state-of-the-art SR for physical data.

Finally, Nouvelle AI excels in reactive and real-world robotics domains but has been criticized for difficulties in incorporating learning and knowledge. Semantic networks, conceptual graphs, frames, and logic are all approaches to modeling knowledge such as domain knowledge, problem-solving knowledge, and the semantic meaning of language. DOLCE is an example of an upper ontology that can be used for any domain, while WordNet is a lexical resource that can also be viewed as an ontology.


The true resurgence of neural networks then started with their rapid empirical success in increasing accuracy on speech recognition tasks in 2010 [2], launching what is now mostly recognized as the modern deep learning era. Shortly afterward, neural networks started to demonstrate the same success in computer vision, too. Historically, the two encompassing streams of symbolic and sub-symbolic stances toward AI evolved largely separately, with each camp focusing on selected narrow problems of its own. Originally, researchers favored the discrete, symbolic approaches toward AI, targeting problems ranging from knowledge representation, reasoning, and planning to automated theorem proving.

  • The second group of methods, the physical laws search space reduction, emphasizes the fundamental laws that any feasible solution should comply with.
  • However, a large value of \(\tau\) may result in drift, and the connections detected by the AutoML component would override the original connections inside the data.

To increase the difficulty of this experiment, the chosen feature has hidden physical relations to other introduced features. In turn, this may lead to misleading performance scores and highlights the difficulty of obtaining a reliable symbolic expression. In experiment E, we compare SciMED to two state-of-the-art systems using a dataset of noisy measurements with \(12\) features. Here, only four features need to be selected to formulate a correct equation, but the high noise levels make SR difficult, as all the features multiply with one another, significantly increasing the noise in the target value. Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques.