-
Permanent and Transient Representations for Continual Reinforcement Learning. Nishanth Anand, Doina Precup. Preprint 2026.
[PDF]
Continual reinforcement learning agents struggle to adapt to new situations while retaining past knowledge, a manifestation of the stability-plasticity trade-off. An appealing solution is to decompose the agent's predictions into permanent and transient components---one for long-term retention and the other for rapid adaptation---thereby achieving a better balance (Anand & Precup, 2023). Building on this idea, we propose using different sets of feature representations to estimate the permanent and transient value functions, enabling even faster adaptation. We demonstrate the effectiveness of our approach on small-scale prediction and control tasks, analyze its theoretical properties, and show its benefits on the Craftax-Classic benchmark using a novel non-parametric approximator for transient value function estimation. Our method facilitates online learning and outperforms the PQN baseline.
-
AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models. Jacob Chmura, Shahrad Mohammadzadeh, Ivan Anokhin, Jacob-Junqi Tian, Mandana Samiei, Taz Scott-Talib, Irina Rish, Doina Precup, Reihaneh Rabbany, Nishanth Anand. CodeML Workshop, ICML 2025.
[PDF]
Reinforcement learning has proven effective for fine-tuning large language models (LLMs) using reward models trained on human preference data. However, collecting such feedback remains expensive, especially in dynamic settings like personalized tutoring, where users' preferences shift over time and through past interactions. To address this, we present AIF-GEN, the first synthetic preference data generation platform designed for both traditional and lifelong RLHF. We use AIF-GEN to instantiate 18 synthetic datasets and evaluate their quality using an LLM. We also perform a human evaluation on a subset of the generated datasets to further confirm their quality. Our results show AIF-GEN's potential to support the development of traditional and lifelong RLHF algorithms that align LLMs.
-
Prediction and Control in Continual Reinforcement Learning. Nishanth Anand, Doina Precup. NeurIPS 2023.
[PDF]
[YouTube]
Temporal difference (TD) learning is often used to update the estimate of the value function, which RL agents use to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach improves performance significantly on both prediction and control problems.
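The decomposition above admits a compact tabular sketch. The function names, update rule, and learning rates below are illustrative assumptions rather than the paper's exact algorithm: the transient value function adapts quickly via TD(0) on the combined estimate, and is periodically consolidated into a slowly changing permanent value function.

```python
import numpy as np

def pt_td_update(v_perm, v_trans, s, r, s_next, alpha_t=0.5, gamma=0.9):
    """One TD(0) step on the transient component, bootstrapping from the
    combined estimate v_perm + v_trans (fast timescale)."""
    v = lambda x: v_perm[x] + v_trans[x]   # combined value estimate
    td_error = r + gamma * v(s_next) - v(s)
    v_trans[s] += alpha_t * td_error
    return td_error

def consolidate(v_perm, v_trans, alpha_p=0.1):
    """Slowly absorb transient knowledge into the permanent component,
    then reset the transient part (e.g. at a task boundary)."""
    v_perm += alpha_p * v_trans
    v_trans[:] = 0.0
```

The two learning rates encode the two timescales: `alpha_t` governs rapid adaptation, while the smaller `alpha_p` governs slow, stable retention.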
-
Preferential Temporal Difference Learning. Nishanth Anand, Doina Precup. ICML 2021.
[PDF]
[YouTube]
Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD learning updates states whenever they are visited. When the agent lands in a state, its value can be used to compute the TD error, which is then propagated to other states. However, when computing updates, it may be useful to take into account information other than whether a state was visited. For example, some states might be more important than others (such as states frequently seen along successful trajectories), while others might have unreliable value estimates (for example, due to partial observability or lack of data), making their values less desirable as targets. We propose an approach for re-weighting the states used in TD updates, both when they are the input and when they provide the target for the update. We prove that our approach converges with linear function approximation and illustrate its desirable empirical behaviour compared to other TD-style methods.
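One way such a reweighting could look in code, as a hedged sketch: a preference weight `beta[s]` in [0, 1] scales how strongly each state is updated, and an eligibility-style trace passes credit through low-preference states. The recursion and parameter names are illustrative, not the paper's exact Preferential TD update (which also reweights states on the target side).

```python
import numpy as np

def preferential_td_episode(V, beta, transitions, alpha=0.1, gamma=0.9):
    """Run TD updates over one episode, weighting each state's update
    by its preference beta[s]; credit flows through low-preference
    states via the trace rather than updating them directly."""
    e = np.zeros_like(V)                # eligibility-style trace
    for s, r, s_next in transitions:    # transitions: (state, reward, next state)
        e *= gamma * (1.0 - beta[s])    # low-preference states pass credit along
        e[s] += beta[s]                 # update s in proportion to its preference
        td_error = r + gamma * V[s_next] - V[s]
        V += alpha * td_error * e
    return V
```

A state with `beta[s] = 0` receives no direct update, while a state with `beta[s] = 1` behaves as in ordinary TD(0).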
-
Recurrent Learning in Reinforcement Learning. Pierre Thodoroff*, Nishanth Anand*, Lucas Caccia, Doina Precup, Joelle Pineau. SPiRL workshop, ICLR 2019.
[PDF]
In sequential modelling, exponential smoothing is one of the most widely used techniques for maintaining temporal consistency in estimates. In this work, we propose Recurrent Learning, a method that estimates the value function in reinforcement learning using exponential smoothing along the trajectory. We establish its asymptotic convergence properties under smoothness assumptions on the reward. The proposed algorithm yields a natural way to learn a state-dependent emphasis function that selectively learns to emphasize or ignore states based on trajectory information. We demonstrate the potential of this selective updating on a partially observable domain and several continuous control tasks.
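The smoothing itself is a one-line recursion. A minimal sketch, assuming per-step smoothing weights are given (in the paper the emphasis is a learned, state-dependent function, not a fixed sequence):

```python
def smooth_along_trajectory(values, betas):
    """Exponentially smooth value estimates along a trajectory:
    v_rec[t] = (1 - beta[t]) * v[t] + beta[t] * v_rec[t-1],
    so beta[t] near 1 leans on past estimates (ignoring the current
    state) and beta[t] near 0 keeps the current estimate unchanged."""
    v_rec = values[0]
    smoothed = [v_rec]
    for v, b in zip(values[1:], betas[1:]):
        v_rec = (1.0 - b) * v + b * v_rec
        smoothed.append(v_rec)
    return smoothed
```

This is why the method helps under partial observability: a noisy or aliased state can be largely ignored by pushing its smoothing weight toward 1.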
-
Recurrent Value Functions. Pierre Thodoroff*, Nishanth Anand*, Lucas Caccia, Doina Precup, Joelle Pineau. RLDM 2019.
[PDF]
Despite recent successes in reinforcement learning, value-based methods often suffer from high variance, hindering performance. In this paper, we illustrate this in a continuous control setting where state-of-the-art methods perform poorly whenever sensor noise is introduced. To overcome this issue, we introduce Recurrent Value Functions (RVFs) as an alternative way to estimate the value function of a state: we estimate the value function of the current state using the value functions of past states visited along the trajectory. Due to the nature of their formulation, RVFs have a natural way of learning an emphasis function that selectively emphasizes important states. First, we establish the asymptotic convergence properties of RVFs in the tabular setting. We then demonstrate their robustness on a partially observable domain and on continuous control tasks. Finally, we provide a qualitative interpretation of the learned emphasis function.
-
Temporal Credit Assignment via Traces in Reinforcement Learning. Nishanth Anand. MSc Thesis.
[PDF]
Reinforcement learning is a framework for sequential decision making that is widely used in many domains, such as robotics and autonomous driving. Due to this sequential nature, there arises the problem of assigning credit to actions taken in the past; in reinforcement learning, this is known as temporal credit assignment. Temporal credit assignment lies at the core of many methods within the reinforcement learning framework, such as options, online learning, and off-policy learning. Several problems, including high variance in value function estimates, sub-optimal policies, and high sample complexity, are consequences of improper temporal credit assignment. In this thesis, we introduce and examine a couple of temporal credit assignment techniques. Specifically, we mitigate the problem of variance in the value function by effectively assigning credit. First, we discuss the fundamental concepts of signals and reinforcement learning. Then, we introduce Recurrent Learning, which smooths the value function along the trajectory, and analyze its strengths experimentally. Finally, we introduce filters from signal processing as a general framework for various traces in reinforcement learning, and show the effectiveness of filters on a couple of toy examples.
-
Stock Market Prediction Using Optimum Threshold Based Relevance Vector Machines. HS Karthik, Nishanth Anand, J Manikandan. ADCOM 2016.
[PDF]
Machine learning is employed for a myriad of applications ranging from engineering to non-engineering, medicine to finance, and sports to studies. The huge demand for machine learning has spearheaded various techniques such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Relevance Vector Machines (RVM). It is well reported in the literature that RVM outperforms SVM in terms of sparseness as well as accuracy, and hence it is employed for the proposed work. In this paper, stock market prediction using an optimum-threshold-based RVM is reported and its performance is evaluated using given input parameters for the share market. To assess the performance of the proposed system, datasets from four stock exchanges are considered for evaluation: NASDAQ, the National Stock Exchange (NSE), the New York Stock Exchange (NYSE), and the London Stock Exchange (LSE). It is observed that 19.17 - 83.33% of relevance vectors are pruned using the proposed optimum-threshold-based RVM technique. A user-friendly graphical user interface is also developed for the proposed work, which can easily be extended to various other machine learning applications.
-
SAR image compression using Relevance Vector Machines. Nishanth Anand, J Manikandan. INDICON 2015.
[PDF]
Synthetic Aperture Radar (SAR) images are built on board an aircraft or spacecraft with the help of backscatter and, depending on the system in which they are employed, are displayed in the cockpit, transmitted to a ground station, or stored on on-board storage disks. SAR images carry vital information for a large variety of applications, including automatic target recognition; hence there is a need to compress these images with negligible degradation in image quality. In this paper, a novel attempt is made to compress SAR images using RVM for aerospace and satellite applications. An optimum-threshold-based RVM image compressor is also proposed and its performance is evaluated. To assess the effectiveness of the proposed system, datasets from the USC-SIPI image database are used. It is observed that the images are compressed by 40.36% to 88.53%, with a PSNR ranging from 24.34 dB to 33.81 dB, using the proposed optimum-threshold-based RVM model.
-
Sparse representation using optimum threshold based relevance vector machine. Nishanth Anand, J Manikandan. INDICON 2015.
[PDF]
Sparse representation is a signal processing technique capable of reconstructing an entire signal from relatively few samples. Support vector machines (SVM) and relevance vector machines (RVM) are among the most commonly used sparse representation techniques, where the ability of the model to estimate the output is directly related to its sparsity. It is also reported in the literature that the performance of RVM is superior to that of SVM in terms of accuracy and sparseness. In this paper, an optimum-threshold-based relevance vector machine is proposed for sparse representation. To assess the sparseness of the proposed approach, three signals and datasets from UCI databases are used for sparse approximation with the proposed RVM model, and the results are reported. The performance of the proposed system is assessed using two metrics: relative error and mean square error. It is observed that the number of relevance vectors is pruned by 7.18 - 69.46% using the proposed optimum-threshold-based RVM model for sparse approximation.
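The pruning step shared by the three RVM papers above can be sketched in a few lines. The function below is an illustrative assumption of how threshold-based pruning might work, not the papers' actual optimum-threshold selection procedure (which chooses the threshold rather than fixing it in advance):

```python
import numpy as np

def prune_relevance_vectors(weights, threshold):
    """Drop relevance vectors whose weight magnitude is at or below
    the threshold; return the keep-mask and the percentage pruned."""
    keep = np.abs(weights) > threshold
    pct_pruned = 100.0 * (1.0 - keep.mean())
    return keep, pct_pruned
```

The reported pruning percentages (e.g. 7.18 - 69.46% here) correspond to `pct_pruned`; sparser models trade a larger pruned fraction against approximation error.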