Anant Mehta

I'm a graduate student at Texas A&M University in College Station. My thesis is on large-scale optimization and improving CLIP training. I am part of the OptMAI Lab at TAMU, led by Prof. Tianbao Yang.

Email  /  CV  /  LinkedIn /  Scholar  /  Twitter  /  GitHub



Research and Interests

I'm interested in Deep Learning, Computer Vision, Large Language Models, and Optimization. Most of my research uses robust optimization to accelerate CLIP training. It builds on a paper that learns the temperature of large foundation models with a temperature network trained via distributionally robust optimization (DRO): To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO.

Education

Texas A&M University (USA)
Master of Science - MS, Computer Science
Thesis: Advancing Contrastive Learning with Scaling Laws and Dynamic Temperature Prediction
Aug 2024 - Jun 2026
GPA: 4.0/4.0

Relevant Coursework:
Generative AI, Deep Learning, Large Scale Optimization for Machine Learning, Natural Language Processing

Thapar Institute of Engineering & Technology (India)
Bachelor of Engineering - BE, Computer Engineering
Sep 2020 - Jun 2024
GPA: 9.77/10.00

Relevant Coursework:
Data Structures, Analysis of Algorithms, Computer Networks, Operating Systems, Database Management Systems, Machine Learning, Conversational AI, Probability & Statistics, Optimization & Numerical Analysis

Research Papers

HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection
WACVW, 2025  (Oral Presentation)
project page / arXiv

We propose HFMF, a two-stage deepfake detection framework that combines hierarchical cross-modal feature fusion with multi-stream feature extraction to improve detection of images produced by state-of-the-art generative AI models.

An automated diagnosis model for classifying cardiac abnormality utilizing deep neural networks
Multimedia Tools & Applications, Impact Factor: 3.0
Publication

The work proposes a classification system based on the UNet architecture that processes transformed spectrograms of phonocardiogram (PCG) signals. The augmented spectrograms yielded the best results: on the PhysioNet 2016 dataset, the proposed model achieves an accuracy of 96.05%, a specificity of 98.82%, and an F1 score of 0.91.

AmCLR: Unified Augmented Learning for Cross-Modal Representations
project page / arXiv

We introduce AmCLR and xAmCLR, objective functions tailored to bimodal vision-language models that further improve the robustness of contrastive learning. AmCLR integrates diverse augmentations, including text paraphrasing and image transformations, to strengthen the alignment of contrastive representations while keeping the batch size to a few hundred samples, unlike CLIP, which relies on batches of 32,768 samples to perform well. xAmCLR extends this approach with intra-modal alignment between the original and augmented views of each modality, for richer feature learning.
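AmCLR's exact objective is defined in the paper and is not reproduced here. As a hedged reference point, the sketch below shows the standard symmetric CLIP-style contrastive (InfoNCE) loss that such objectives build on. In a batch of n image-text pairs, each pair's only negatives are the other n − 1 samples, which is why CLIP-style training benefits from very large batches. The helper `augmented_clip_loss` is a hypothetical illustration of averaging the loss over original and augmented views; it is not the AmCLR or xAmCLR formulation, and the temperature of 0.07 is an assumed default.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def _cross_entropy_row(logits: Vector, target: int) -> float:
    """-log softmax(logits)[target], computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def clip_loss(image_emb: List[Vector], text_emb: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: (image_emb[i], text_emb[i]) is the positive pair,
    every other pairing in the batch acts as a negative."""
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cos(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
              for i in range(n)]
    image_to_text = sum(_cross_entropy_row(logits[i], i) for i in range(n)) / n
    text_to_image = sum(_cross_entropy_row([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)


def augmented_clip_loss(image_emb: List[Vector], text_emb: List[Vector],
                        aug_image_emb: List[Vector], aug_text_emb: List[Vector],
                        temperature: float = 0.07) -> float:
    """Hypothetical illustration only: average the cross-modal loss over
    original/augmented combinations of each image-text pair."""
    pairs = [(image_emb, text_emb), (aug_image_emb, text_emb),
             (image_emb, aug_text_emb), (aug_image_emb, aug_text_emb)]
    return sum(clip_loss(i, t, temperature) for i, t in pairs) / len(pairs)
```

Lowering the temperature sharpens the softmax over the batch, which is the quantity a temperature network would adjust per sample instead of fixing globally.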

HeartBeatNet: Unleashing the Power of Attention in Cardiology
CINS, 2023   (Oral Presentation)
project page / Publication (🏆 Best Paper Award)

This paper proposes HeartBeatNet, an attention UNet-based model for heart sound classification that performs better than the approaches it is compared against. The system combines attention mechanisms with the UNet architecture to capture relevant features effectively and make accurate predictions.

A feature extraction and time warping based neural expansion architecture for cloud resource usage forecasting
Cluster Computing, Impact Factor: 3.6
Publication

This work proposes a computationally inexpensive hybrid approach that combines cluster analysis, deep neural networks, and transfer learning to forecast machine-level workload. Clustering is used to identify similarity patterns among the non-linear usage profiles of the machines in the input dataset.

Calibrating Machine Learning Models For Accurate Stroke Type Prediction In Low-resource Settings
International Journal of Stroke, Impact Factor: 6.3
Publication (16th World Stroke Congress Proceedings)

This study used retrospective and prospective data from AIIMS-New Delhi, with 2190 and 92 samples respectively, split 70% ischemic stroke (IS) and 30% hemorrhagic stroke (HS). Stroke-type classifiers were trained on these data and calibrated with three techniques: Platt scaling, histogram binning, and isotonic regression. Calibration quality was measured by Expected Calibration Error (ECE).
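For readers unfamiliar with ECE, a minimal sketch of the standard binned estimate follows: predictions are grouped by confidence, and ECE is the sample-weighted average gap between each bin's accuracy and its mean confidence. The use of 10 equal-width bins is an assumption for illustration; the study's binning choice is not stated here.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Binned ECE = sum_b (|B_b| / n) * |accuracy(B_b) - mean_confidence(B_b)|,
    using equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty confidences and correct")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, correct):
        # Map p to a bin index; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        accuracy = sum(1 for _, y in b if y) / len(b)
        mean_conf = sum(p for p, _ in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece
```

A perfectly calibrated model scores 0; a model that predicts 0.95 confidence but is right only half the time in that bin contributes a 0.45 gap weighted by the bin's share of samples. Calibration methods such as Platt scaling remap the confidences to shrink these gaps.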

Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans
IEEE BigData Workshop on AI-Driven Agriculture, 2023  (Oral Presentation)
arXiv

By analysing the Dry Bean dataset, plant breeders and agricultural researchers can identify desirable traits, disease resistance, and nutritional content, and thereby increase crop productivity. This study compares Support Vector Machine (SVM) classifiers with linear, polynomial, and radial basis function (RBF) kernels, along with other popular classification algorithms.
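For reference, the three kernels compared above are written below in the parameterization used by common SVM libraries such as scikit-learn. Here γ, r, and the degree d are tuning hyperparameters whose values in the study are not given. The last line is the binary SVM decision function, where the x_i are support vectors with labels y_i ∈ {−1, +1}, learned weights α_i ≥ 0, and bias b. A multi-class problem like Dry Bean combines several such binary classifiers (for example, one-vs-one).

```latex
\begin{aligned}
K_{\text{linear}}(\mathbf{x}, \mathbf{z}) &= \mathbf{x}^\top \mathbf{z} \\
K_{\text{poly}}(\mathbf{x}, \mathbf{z}) &= \left(\gamma\, \mathbf{x}^\top \mathbf{z} + r\right)^{d} \\
K_{\text{RBF}}(\mathbf{x}, \mathbf{z}) &= \exp\!\left(-\gamma\, \lVert \mathbf{x} - \mathbf{z} \rVert^{2}\right), \quad \gamma > 0 \\
f(\mathbf{x}) &= \operatorname{sign}\!\Big(\sum_{i} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big)
\end{aligned}
```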

Projects

InDocQ: Intelligent Document Q&A System 🦜🔗
Skills Used: Large Language Models, LangChain, Streamlit, Hugging Face, FAISS
project page

InDocQ is a document question-answering system built with LangChain and Large Language Models (LLMs) and served with Streamlit. It lets users upload PDF documents and ask questions about their content interactively, combining semantic search with state-of-the-art language models.

m-Height Generator for Analog ECC
Skills Used: Analysis of Algorithms, PyTorch, CUDA optimization, min-max optimization
project page

This project provides an efficient approach to finding generator matrices G that minimize the "m-height" of an analog code. For a message vector x, the codeword is c = xG, and the m-height of c is the ratio of the largest to the mth largest absolute entry of c. The m-height of G is the maximum m-height over all of its codewords, that is, over all possible x. The implementation combines genetic programming and stochastic optimization to iteratively refine both the G matrices and the x vectors.
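The definitions above can be sketched directly. This is a minimal illustration, not the project's implementation: it reads "mth largest" as 1-indexed, exactly as worded in the text (some formulations count the largest entry as index 0 and divide by the (m+1)th largest instead), and it returns infinity when the denominator is zero. Because the m-height of G is a maximum over all x, the hypothetical `estimate_matrix_m_height` helper can only give a lower bound by random search; the project's genetic and stochastic search over x is the stronger version of this inner maximization.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def encode(x: Sequence[float], G: Matrix) -> List[float]:
    """Codeword c = xG for a length-k message x and a k x n generator matrix G."""
    k, n = len(G), len(G[0])
    if len(x) != k:
        raise ValueError("x must have one entry per row of G")
    return [sum(x[i] * G[i][j] for i in range(k)) for j in range(n)]


def codeword_m_height(c: Sequence[float], m: int) -> float:
    """Largest |c_j| divided by the m-th largest |c_j| (1-indexed m, as in the text).
    Returns inf when the m-th largest magnitude is zero."""
    mags = sorted((abs(v) for v in c), reverse=True)
    if not 1 <= m <= len(mags):
        raise ValueError("m must be between 1 and len(c)")
    if mags[m - 1] == 0.0:
        return float("inf")
    return mags[0] / mags[m - 1]


def estimate_matrix_m_height(G: Matrix, m: int, trials: int = 10_000, seed: int = 0) -> float:
    """Hypothetical helper: lower-bound the m-height of G by sampling random x.
    The true value is a max over all x, so sampling can only under-estimate it."""
    rng = random.Random(seed)
    k = len(G)
    best = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        best = max(best, codeword_m_height(encode(x, G), m))
    return best
```

For example, the codeword [3, -1, 2] has magnitudes 3, 2, 1, so its 2-height is 3 / 2 = 1.5. An outer loop that mutates G to lower this estimate, while an inner search hunts for worst-case x, gives the min-max structure listed under Skills Used.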

NCERT Based Search Engine
Skills Used: Information Storage & Retrieval, Django, Hugging Face, Elasticsearch, SQL, Haystack
project page / Website Demonstration

This project involved designing and deploying an extractive search engine for the NCERT History textbook. It gives high-school students precise, contextually accurate answers to their queries by extracting the relevant passages from the textbook, so they no longer need to search through the book by hand. It supports over 1 million students.

GradFlow: A Custom Automatic Differentiation Library and Neural Network Framework
Skills Used: Deep Learning, PyTorch, CUDA optimization, Graphs
project page

GradFlow is a Python library that implements automatic differentiation from scratch. It provides the core building blocks for constructing and training neural networks.
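GradFlow's own API is not shown here. As a hedged illustration of the reverse-mode automatic differentiation technique such a library is built on, the sketch below records each scalar operation as a graph node with a local backward rule, then applies the chain rule in reverse topological order. The class name `Value` and its methods are hypothetical and are not GradFlow's interface.

```python
import math
from typing import Callable, List, Tuple, Union


class Value:
    """A scalar node in a computation graph that knows how to back-propagate."""

    def __init__(self, data: float, parents: Tuple["Value", ...] = ()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward: Callable[[], None] = lambda: None  # leaf nodes do nothing

    def __add__(self, other: Union["Value", float]) -> "Value":
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward() -> None:
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    __radd__ = __add__

    def __mul__(self, other: Union["Value", float]) -> "Value":
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward() -> None:
            # d(a * b)/da = b, d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    __rmul__ = __mul__

    def tanh(self) -> "Value":
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward() -> None:
            # d tanh(a)/da = 1 - tanh(a)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self) -> None:
        """Topologically sort the graph, then apply the chain rule from output to inputs."""
        order: List[Value] = []
        seen = set()

        def visit(v: "Value") -> None:
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()
```

Gradients are accumulated with `+=` so that a node used more than once (for example `x * y + x`) receives the sum of its contributions; a neural-network layer is then just compositions of these operations.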

SmartBreathlyzer: Non-Invasive Tuberculosis (TB) Diagnosis Using VOC Detection
Skills Used: Edge Artificial Intelligence, TensorFlow, Arduino, Nanotechnology
Funded by ICMR, Ministry of Health, India

(Approximately $100,000 in funding over three years.)

The project aimed to develop a non-invasive, rapid, and cost-effective device for diagnosing tuberculosis (TB), which is caused by Mycobacterium tuberculosis. The device detects volatile organic compounds (VOCs) in a patient's breath as biomarkers for TB, eliminating the need for invasive or time-consuming tests such as sputum analysis or chest X-rays.

