Rajat Sen

I am a Research Scientist at Google. My areas of interest are bandit algorithms, black-box optimization and time-series forecasting. I obtained my PhD at UT Austin, where I was lucky to be advised by Dr. Sanjay Shakkottai.
Authored Publications
    Preview abstract We introduce TimesX, a new context-enriched time series forecasting benchmark. TimesX contains a wide selection of high-quality real-world time series and diverse textual contexts from an automated generation pipeline, which helps address three main issues of existing benchmarks: (1) poor generalization due to low data volume and synthetic data, (2) restricted forms of context, and (3) an inability to mitigate data leakage. We conduct a thorough empirical study of current multimodal solutions on TimesX. Our results suggest that most multimodal solutions that work well on existing benchmarks may fail on TimesX. In contrast, simple ensemble methods that leverage the rich textual context can outperform strong unimodal baselines and other multimodal baselines. ** Below this is what was submitted to ITP. ** We create a real-world multimodal time-series forecasting benchmark that encompasses diverse domains and regions. Each time series is annotated with various kinds of context, such as metadata, date and holiday information, and dynamic events related to the time series. This is considerably more advanced than other available benchmarks, which rely either on static metadata alone or on synthetic examples. This forms a test bed for multimodal forecasting. We also present some baseline results showing that ensembles of publicly available LLMs and time-series foundation models can demonstrate non-trivial performance on this benchmark. View details
    Preview abstract We pioneer the study of in-context training for time-series foundation models. We create finetuning examples that include not only the usual (context, horizon) pairs for forecasting but also related time-series examples in-context. We finetune a pretrained time-series foundation model on in-context examples of this type. Our training is decoder-only and can adapt not only to any (context, horizon) pair (up to a certain maximum context) but also to any number of supplementary time-series examples (again up to a certain maximum number of examples). Appropriately trained models can then learn to borrow patterns from these related examples to do better on the original forecasting task. We show that this opens up interesting features like the ability to prompt the time-series foundation model with different related examples. This can help the finetuned model adapt to specific features of a dataset at inference time. We show that such adaptations can lead to better zero-shot performance on popular forecasting benchmarks as compared to supervised deep learning methods, statistical models, and other time-series foundation models. View details
    Exploiting LLM for Multimodal Time Series Forecasting
    Zihao Zhou
    Mathew Luo
    Benjamin Xue
    Weihao Kong
    Abhimanyu Das
    Abhishek Tanpure
    Howard Tsa
    Kai Kim
    2024
    Preview abstract Current deep learning methods study time series data only in isolation. There exists a considerable gap between the time series community and communities that work with vision and language data. With the tremendous success of Large Language Models (LLMs), many have started to wonder how to exploit LLMs for improved forecasting. A few recent works explored LLMs by either converting numerical time series into text strings or fine-tuning a pretrained vision-language model. In this paper, we argue that these approaches do not fully exploit the predictive power of LLMs for text data and are susceptible to outputting irrelevant tokens. We propose to exploit LLMs for multimodal time series forecasting by combining textual data with numerical time series. We develop a framework that can efficiently encode multimodal sequence data and generate only time series data as forecasts. To validate our framework, we collect 3 large-scale real-world multimodal time series datasets from different domains: e-commerce, health care and climate science. Compared to single-modal deep learning models and methods that use LLMs, our approach leads to xx\% improvement in forecasting accuracy. Furthermore, with the addition of text prompts, our framework also enables efficient time series scenario creation in a highly interpretable manner. View details
    Preview abstract Motivated by recent advances in large language models for NLP, we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of datasets matches the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time series dataset, and can work well across different forecasting history lengths, prediction lengths and temporal granularities. View details
    Preview abstract Hierarchical forecasting is a key problem in many practical multivariate forecasting applications: the goal is to obtain coherent predictions for a large number of correlated time series that are arranged in a pre-specified tree hierarchy. In this paper, we present a novel probabilistic top-down approach to hierarchical forecasting that uses an attention-based RNN model to learn the distribution of the proportions according to which each parent prediction is split among its child nodes at any point in time. It also relies on an independent probabilistic forecasting model for the (univariate) root time series, which can be a sequence-to-sequence model or even a traditional autoregressive-style univariate forecasting model. The resulting forecasts are computed in a top-down fashion, are naturally coherent, and also support probabilistic predictions over all time series in the hierarchy. We provide theoretical justification for the superiority of our top-down approach compared to traditional bottom-up hierarchical modeling. Finally, we experiment on several public datasets and demonstrate significantly improved probabilistic forecasts, compared to state-of-the-art probabilistic hierarchical models. View details
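The coherence property mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single parent with three children and uses fixed point-valued proportions, whereas the paper learns a distribution over proportions with an attention-based RNN. The function name `top_down_forecast` and all numbers are hypothetical and chosen only for illustration.

```python
def top_down_forecast(root_forecast, proportions):
    """Split a root forecast among child series, one time step at a time.

    root_forecast: list of floats, one forecast per time step.
    proportions:   list (per time step) of child shares that sum to 1.
    Returns a list (per time step) of child forecasts.
    """
    children = []
    for total, shares in zip(root_forecast, proportions):
        if abs(sum(shares) - 1.0) > 1e-9:
            raise ValueError("child proportions must sum to 1")
        # Each child receives its share of the parent forecast.
        children.append([total * share for share in shares])
    return children


# Hypothetical example: two time steps, three children.
root = [100.0, 120.0]
shares = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]
child_forecasts = top_down_forecast(root, shares)

# Coherence: at every step the children add back up to the parent.
coherent = all(
    abs(sum(kids) - total) < 1e-9
    for kids, total in zip(child_forecasts, root)
)
```

Because every child forecast is a share of the parent, the forecasts are coherent by construction, with no reconciliation step needed afterwards.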
    Long Horizon Forecasting with TiDE: Time-series Dense Encoder
    Abhimanyu Das
    Andrew Leach
    Rose Yu
    Weihao Kong
    Transactions on Machine Learning Research (2023)
    Preview abstract We propose a simple MLP-based encoder-decoder architecture for long-term time-series forecasting that can handle non-linear dependencies and dynamic covariates. Our method can achieve better results on several long-term forecasting benchmarks while being 5-10x faster in terms of training and inference compared to the best transformer-based baselines. We also show theoretically and empirically that, compared to RNNs and transformers, linear models can be near optimal when the ground truth is generated from a linear dynamical system (LDS). View details
    Black-Box Optimization of Unimodal Functions
    Abhimanyu Das
    Ashok Cutkosky
    Chansoo Lee
    Weihao Kong
    UAI 2023 (2023)
    Preview abstract We provide an intuitive new algorithm for black-box stochastic optimization of unimodal functions, a function class that we observe empirically can capture hyperparameter-tuning loss surfaces. Our method's convergence guarantee automatically adapts to Lipschitz constants and other problem difficulty parameters, recovering and extending prior results. We complement our theoretical development with experimental validation on hyperparameter tuning tasks. View details
    Efficient List-Decodable Regression using Batches
    Abhimanyu Das
    Ayush Jain
    Weihao Kong
    ICML (2023)
    Preview abstract We begin the study of list-decodable linear regression using batches. In this setting only an $\alpha \in (0,1]$ fraction of the batches are genuine, each providing a batch of $\ge n$ i.i.d. samples from a common unknown distribution. The remaining batches may contain arbitrary or even adversarial samples. We derive a polynomial time algorithm that for any $n\ge \tilde \Omega(1/\alpha)$ returns a list of size $\mathcal O(1/\alpha)$ such that one of the items in the list is close to the true regression parameter. The algorithm requires only $\tilde{\mathcal O}(d)$ genuine batches, and works under fairly general assumptions on the distribution. The results demonstrate the utility of batch structure, which allows for the first polynomial time algorithm for list-decodable regression; this may be impossible in the non-batch setting, as suggested by a recent SQ lower bound~\cite{diakonikolas2021statistical}. View details
    Preview abstract We study the problem of learning generalized linear models under adversarial corruptions. We analyze a classical heuristic called the iterative trimmed maximum likelihood estimator, which is known to be effective against label corruptions in practice. Under label corruptions, we prove that this simple estimator achieves minimax near-optimal risk on a wide range of generalized linear models, including Gaussian regression, Poisson regression and Binomial regression. Finally, we extend the estimator to the much more challenging setting of label and covariate corruptions and demonstrate its robustness and optimality in that setting as well. View details
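For intuition, the iterative trimming idea in the abstract above can be sketched for the Gaussian regression case, where the negative log-likelihood of a sample is its squared residual (up to constants). This is a minimal illustration under simplifying assumptions, not the paper's estimator: it uses one scalar covariate, a line through the origin, and a known number of corrupted labels. The function name `iterative_trimmed_fit`, the parameter `n_trim`, and the toy data are hypothetical.

```python
def iterative_trimmed_fit(xs, ys, n_trim, n_iters=20):
    """Fit y ~ w * x while repeatedly discarding the n_trim worst-fit samples."""
    n = len(xs)
    keep = n - n_trim
    subset = list(range(n))  # start from all samples
    w = 0.0
    for _ in range(n_iters):
        # Maximum likelihood step: least squares on the current subset.
        sxx = sum(xs[i] * xs[i] for i in subset)
        sxy = sum(xs[i] * ys[i] for i in subset)
        w = sxy / sxx
        # Trimming step: keep the samples with the smallest squared residual.
        by_loss = sorted(range(n), key=lambda i: (ys[i] - w * xs[i]) ** 2)
        new_subset = sorted(by_loss[:keep])
        if new_subset == subset:
            break  # the kept set is stable, so w will not change
        subset = new_subset
    return w


# Toy data: y = 2x, with two labels corrupted (at x = 3 and x = 8).
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x for x in xs]
ys[2] = -30.0
ys[7] = -40.0

naive_w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
trimmed_w = iterative_trimmed_fit(xs, ys, n_trim=2)
```

On this toy data the plain least-squares slope is pulled far from 2 by the two corrupted labels, while the trimmed fit discards them after the first pass and recovers the clean slope.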
    On Learning Mixtures of Linear Regressions in a Non-Realizable Setting
    Arya Mazumdar
    Avishek Ghosh
    Soumyabrata Pal
    On Learning Mixture of Linear Regressions in the Non-Realizable Setting (2022)
    Preview abstract While mixture of linear regressions is a well-studied topic, prior works in the literature usually focus on the realizable setting, i.e., when data is generated by a mixed linear (noisy) model. In this paper we show that a version of the popular alternating minimization (AM) algorithm finds the best fit lines in a dataset even when a realizable model is not assumed, under some regularity conditions on the dataset and the initial point. In general, finding the best fit lines in the dataset involves solving a nonconvex optimization problem. We further provide an algorithm that runs in polynomial time in the number of datapoints, and recovers a good approximation of the best fit lines. In addition, we show that mixture of linear regressions can be used in a list prediction framework, with small prediction error, via solving the aforementioned optimization problem. View details
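A generic alternating minimization loop for fitting several lines to one dataset can be sketched as follows. This is a textbook-style illustration, not the specific AM variant analyzed in the paper: it assumes scalar covariates and lines through the origin, and the function name `alternating_minimization`, the initial slopes, and the toy data are hypothetical.

```python
def alternating_minimization(xs, ys, init_slopes, n_iters=50):
    """Alternate between assigning points to lines and refitting each line."""
    slopes = list(init_slopes)
    for _ in range(n_iters):
        # Step 1: assign each point to the line with the smallest squared residual.
        labels = [
            min(range(len(slopes)), key=lambda k: (y - slopes[k] * x) ** 2)
            for x, y in zip(xs, ys)
        ]
        # Step 2: refit each line by least squares on its assigned points.
        new_slopes = []
        for k in range(len(slopes)):
            sxx = sum(x * x for x, lab in zip(xs, labels) if lab == k)
            sxy = sum(x * y for x, y, lab in zip(xs, ys, labels) if lab == k)
            # Keep the old slope if no point was assigned to this line.
            new_slopes.append(sxy / sxx if sxx > 0 else slopes[k])
        if new_slopes == slopes:
            break  # assignments and fits have stopped changing
        slopes = new_slopes
    return slopes


# Toy data: five points on y = 2x and five points on y = -x.
xs = [float(x) for x in range(1, 6)] * 2
ys = [2.0 * x for x in xs[:5]] + [-1.0 * x for x in xs[5:]]

fitted = alternating_minimization(xs, ys, init_slopes=[1.0, 0.0])
```

As the abstract notes, the outcome of such a procedure depends on the initial point; the initial slopes here were picked so that each one starts closer to a different group of points.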