Artificial Intelligence Preprint | 2019-07-03
Artificial Intelligence
Inferred successor maps for better transfer learning (1906.07663v2)
Tamas J. Madarasz
2019-06-18
Humans and animals show remarkable flexibility in adjusting their behaviour when their goals, or rewards in the environment change. While such flexibility is a hallmark of intelligent behaviour, these multi-task scenarios remain an important challenge for machine learning algorithms and neurobiological models alike. Factored representations can enable flexible behaviour by abstracting away general aspects of a task from those prone to change, while nonparametric methods provide a principled way of using similarity to past experiences to guide current behaviour. Here we combine the successor representation (SR), that factors the value of actions into expected outcomes and corresponding rewards, with evaluating task similarity through nonparametric inference and clustering the space of rewards. The proposed algorithm improves SR's transfer capabilities by inverting a generative model over tasks, while also explaining important neurobiological signatures of place cell representation in the hippocampus. It dynamically samples from a flexible number of distinct SR maps while accumulating evidence about the current reward context, and outperforms competing algorithms in settings with both known and unsignalled rewards changes. It reproduces the "flickering" behaviour of hippocampal maps seen when rodents navigate to changing reward locations, and gives a quantitative account of trajectory-dependent hippocampal representations (so-called splitter cells) and their dynamics. We thus provide a novel algorithmic approach for multi-task learning, as well as a common normative framework that links together these different characteristics of the brain's spatial representation.
Generalizing from a few environments in safety-critical reinforcement learning (1907.01475v1)
Zachary Kenton, Angelos Filos, Owain Evans, Yarin Gal
2019-07-02
Before deploying autonomous agents in the real world, we need to be confident they will perform safely in novel situations. Ideally, we would expose agents to a very wide range of situations during training, allowing them to learn about every possible danger, but this is often impractical. This paper investigates safety and generalization from a limited number of training environments in deep reinforcement learning (RL). We find RL algorithms can fail dangerously on unseen test environments even when performing perfectly on training environments. Firstly, in a gridworld setting, we show that catastrophes can be significantly reduced with simple modifications, including ensemble model averaging and the use of a blocking classifier. In the more challenging CoinRun environment we find similar methods do not significantly reduce catastrophes. However, we do find that the uncertainty information from the ensemble is useful for predicting whether a catastrophe will occur within a few steps and hence whether human intervention should be requested.
dyngraph2vec: Capturing Network Dynamics using Dynamic Graph Representation Learning (1809.02657v2)
Palash Goyal, Sujit Rokka Chhetri, Arquimedes Canedo
2018-09-07
Learning graph representations is a fundamental task aimed at capturing various properties of graphs in vector space. The most recent methods learn such representations for static networks. However, real world networks evolve over time and have varying dynamics. Capturing such evolution is key to predicting the properties of unseen networks. To understand how the network dynamics affect the prediction performance, we propose an embedding approach which learns the structure of evolution in dynamic graphs and can predict unseen links with higher precision. Our model, dyngraph2vec, learns the temporal transitions in the network using a deep architecture composed of dense and recurrent layers. We motivate the need of capturing dynamics for prediction on a toy data set created using stochastic block models. We then demonstrate the efficacy of dyngraph2vec over existing state-of-the-art methods on two real world data sets. We observe that learning dynamics can improve the quality of embedding and yield better performance in link prediction.
Analysis of the Synergy between Modularity and Autonomy in an Artificial Intelligence Based Fleet Competition (1907.01405v1)
Xingyu Li, Mainak Mitra, Bogdan I. Epureanu
2019-07-02
A novel approach is provided for evaluating the benefits and burdens from vehicle modularity in fleets/units through the analysis of a game theoretical model of the competition between autonomous vehicle fleets in an attacker-defender game. We present an approach to obtain the heuristic operational strategies through fitting a decision tree on high-fidelity simulation results of an intelligent agent-based model. A multi-stage game theoretical model is also created for decision making considering military resources and impacts of past decisions. Nash equilibria of the operational strategy are revealed, and their characteristics are explored. The benefits of fleet modularity are also analyzed by comparing the results of the decision making process under diverse operational situations.
ProLoNets: Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning (1902.06007v2)
Andrew Silva, Matthew Gombolay
2019-02-15
Deep reinforcement learning has seen great success across a breadth of tasks, such as in game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth of readily available domain experts' knowledge that could help "warm start" the learning process. Further, learning from demonstration techniques are not yet efficient enough to infer this knowledge through sampling-based mechanisms in large state and action spaces. We present a new reinforcement learning architecture that can encode expert knowledge, in the form of propositional logic, directly into a neural, tree-like structure of fuzzy propositions amenable to gradient descent and show that our novel architecture is able to outperform reinforcement and imitation learning techniques across an array of reinforcement learning challenges.
GPT-based Generation for Classical Chinese Poetry (1907.00151v2)
Yi Liao, Yasheng Wang, Qun Liu, Xin Jiang
2019-06-29
We present a simple yet effective method for generating high quality classical Chinese poetry with Generative Pre-trained Language Model (GPT). The method adopts a simple GPT model, without using any human crafted rules or features, or designing any additional neural components. While the proposed model learns to generate various forms of classical Chinese poems, including Jueju, L"{u}shi, various Cipai and Couples, the generated poems are of very high quality. We also propose and implement a method to fine-tune the model to generate acrostic poetry. To the best of our knowledge, this is the first to employ GPT in developing a poetry generation system. We will release an online demonstration system in the near future to show the generation capability of the proposed method for classical Chinese poetry.
Voting-Based Multi-Agent Reinforcement Learning (1907.01385v1)
Yue Xu, Zengde Deng, Mengdi Wang, Wenjun Xu, Anthony Man-Cho So, Shuguang Cui
2019-07-02
The recent success of single-agent reinforcement learning (RL) encourages the exploration of multi-agent reinforcement learning (MARL), which is more challenging due to the interactions among different agents. In this paper, we consider a voting-based MARL problem, in which the agents vote to make group decisions and the goal is to maximize the globally averaged returns. To this end, we formulate the MARL problem based on the linear programming form of the policy optimization problem and propose a distributed primal-dual algorithm to obtain the optimal solution. We also propose a voting mechanism through which the distributed learning achieves the same sub-linear convergence rate as centralized learning. In other words, the distributed decision making does not slow down the global consensus to optimal. We also verify the convergence of our proposed algorithm with numerical simulations and conduct case studies in practical multi-agent systems.
Pathologist-Level Grading of Prostate Biopsies with Artificial Intelligence (1907.01368v1)
Peter Ström, Kimmo Kartasalo, Henrik Olsson, Leslie Solorzano, Brett Delahunt, Daniel M. Berney, David G. Bostwick, Andrew J. Evans, David J. Grignon, Peter A. Humphrey, Kenneth A. Iczkowski, James G. Kench, Glen Kristiansen, Theodorus H. van der Kwast, Katia R. M. Leite, Jesse K. McKenney, Jon Oxley, Chin-Chen Pan, Hemamali Samaratunga, John R. Srigley, Hiroyuki Takahashi, Toyonori Tsuzuki, Murali Varma, Ming Zhou, Johan Lindberg, Cecilia Bergström, Pekka Ruusuvuori, Carolina Wählby, Henrik Grönberg, Mattias Rantalainen, Lars Egevad, Martin Eklund
2019-07-02
Background: An increasing volume of prostate biopsies and a world-wide shortage of uro-pathologists puts a strain on pathology departments. Additionally, the high intra- and inter-observer variability in grading can result in over- and undertreatment of prostate cancer. Artificial intelligence (AI) methods may alleviate these problems by assisting pathologists to reduce workload and harmonize grading. Methods: We digitized 6,682 needle biopsies from 976 participants in the population based STHLM3 diagnostic study to train deep neural networks for assessing prostate biopsies. The networks were evaluated by predicting the presence, extent, and Gleason grade of malignant tissue for an independent test set comprising 1,631 biopsies from 245 men. We additionally evaluated grading performance on 87 biopsies individually graded by 23 experienced urological pathologists from the International Society of Urological Pathology. We assessed discriminatory performance by receiver operating characteristics (ROC) and tumor extent predictions by correlating predicted millimeter cancer length against measurements by the reporting pathologist. We quantified the concordance between grades assigned by the AI and the expert urological pathologists using Cohen's kappa. Results: The performance of the AI to detect and grade cancer in prostate needle biopsy samples was comparable to that of international experts in prostate pathology. The AI achieved an area under the ROC curve of 0.997 for distinguishing between benign and malignant biopsy cores, and 0.999 for distinguishing between men with or without prostate cancer. The correlation between millimeter cancer predicted by the AI and assigned by the reporting pathologist was 0.96. For assigning Gleason grades, the AI achieved an average pairwise kappa of 0.62. This was within the range of the corresponding values for the expert pathologists (0.60 to 0.73).
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation (1904.08311v2)
Gakuto Kurata, Kartik Audhkhasi
2019-04-17
Conventional automatic speech recognition (ASR) systems trained from frame-level alignments can easily leverage posterior fusion to improve ASR accuracy and build a better single model with knowledge distillation. End-to-end ASR systems trained using the Connectionist Temporal Classification (CTC) loss do not require frame-level alignment and hence simplify model training. However, sparse and arbitrary posterior spike timings from CTC models pose a new set of challenges in posterior fusion from multiple models and knowledge distillation between CTC models. We propose a method to train a CTC model so that its spike timings are guided to align with those of a pre-trained guiding CTC model. As a result, all models that share the same guiding model have aligned spike timings. We show the advantage of our method in various scenarios including posterior fusion of CTC models and knowledge distillation between CTC models with different architectures. With the 300-hour Switchboard training data, the single word CTC model distilled from multiple models improved the word error rates to 13.7%/23.1% from 14.9%/24.1% on the Hub5 2000 Switchboard/CallHome test sets without using any data augmentation, language model, or complex decoder.
Knowledge Graph Embedding for Ecotoxicological Effect Prediction (1907.01328v1)
Erik B. Myklebust, Ernesto Jimenez-Ruiz, Jiaoyan Chen, Raoul Wolf, Knut Erik Tollefsen
2019-07-02
Exploring the effects a chemical compound has on a species takes a considerable experimental effort. Appropriate methods for estimating and suggesting new effects can dramatically reduce the work needed to be done by a laboratory. In this paper we explore the suitability of using a knowledge graph embedding approach for ecotoxicological effect prediction. A knowledge graph has been constructed from publicly available data sets, including a species taxonomy and chemical classification and similarity. The publicly available effect data is integrated to the knowledge graph using ontology alignment techniques. Our experimental results show that the knowledge graph based approach improves the selected baselines.