
Rusiru Thushara

Email  /  CV  /  Google Scholar  /  LinkedIn  /  GitHub

Rusiru is currently pursuing an MSc in Computer Vision at MBZUAI, working under the supervision of Prof. Ivan Laptev. He is collaborating with Dr. Jiawang Bian on research in unconstrained novel view synthesis with sparse input views, using diffusion and depth priors to push the boundaries of 3D vision. This builds upon his earlier work during an internship at the same lab, where he focused on 3D reconstruction with the Unitree Go2 robot.

Before joining MBZUAI, Rusiru was a remote Research Fellow at Harvard University, where he worked with Dr. Dushan Wadduwage and Prof. Bevin P. Engelward on research at the intersection of computer vision and biology.

He completed his BSc in Computer Engineering with first-class honours at the University of Peradeniya.

He is particularly interested in vision-language modeling and 3D vision.


News

Here are some of the recent updates in my academic journey.

[Oct 2024]   Laika Robot demo presented at IROS 2024.
[Sep 2024]   The Web2Code paper was accepted at NeurIPS 2024.
[June 2024]   Started an internship as a Research Assistant under Prof. Ivan Laptev.
[June 2024]   Released the paper Web2Code. URL
[Nov 2023]   Released the paper PG-Video-LLaVA. URL
[Aug 2023]   Admitted to MBZUAI with a full scholarship for an MSc in Computer Vision.
[Aug 2023]   Paper accepted at the IEEE 17th International Conference on Industrial and Information Systems (ICIIS). URL
[May 2023]   Abstract selected for an oral presentation at the Optica Imaging Congress 2023.
[Feb 2023]   Graduated with a BSc (Hons.) in Computer Engineering with first-class honours from the University of Peradeniya.
[Nov 2016]   Won a Gold Medal at the Sri Lankan Physics Olympiad. (National Rank - 2nd)

Research

Rusiru is fascinated by the rapid advancements in computer vision, particularly how models are increasingly able to perceive and understand the world like humans do. This progress, especially in the field of robotics, is enabling machines to recognize complex scenes and navigate environments more naturally, bringing us closer to seamless human-machine interaction.

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
*Sukmin Yun, *Haokun Lin, *Rusiru Thushara, *Mohammad Qazim Bhat, *Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen
* Equal contribution.
NeurIPS 2024
Paper / Code / Project Page / Dataset
  • Description: Addresses the challenge MLLMs face in understanding webpage screenshots and generating the corresponding HTML code, proposing a large-scale benchmark dataset and evaluation framework. Extensive experiments show significant improvements in webpage-to-code generation and general visual tasks.
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
*Shehan Munasinghe, *Rusiru Thushara, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Mubarak Shah, Fahad S. Khan
* Equal contribution.
Paper / Code / Project Page
  • Description: Extends image-based LLMs to video understanding, incorporating audio transcripts for enhanced context, and introduces a baseline framework and benchmark for conversation-driven spatial grounding.

Quantification of Cells in Native Tissues with Object Detection and Weak Supervision
R. Thushara, J. Pradeepkumar, J.J. Corrigan, B.P. Engelward, and D.N. Wadduwage
Abstract accepted for oral presentation at the Optica Imaging Congress 2023
Paper / Poster
  • Description: Investigates deep learning approaches for detecting and quantifying homologous recombination events in rare fluorescent mutant cells deep within the tissue of RaDR mice, using architectures for object detection, classification, and segmentation.

Real-Time Multiple Dyadic Interaction Detection in Surveillance Videos in the Wild
*IM Insaf, *AAP Perera, Rusiru Thushara, GMRI Godaliyadda, MPB Ekanayake, HMVR Herath, JB Ekanayake
ICIIS 2023
Paper
  • Description: This paper proposes a novel computer vision-based system that identifies multiple co-occurring dyadic (two-person) interactions in a crowded scenario and classifies them into six action classes.

Collision-Free Obstacle Robots for a Swarm Robotics Platform
Rusiru Thushara, Dinindu Thilakarathne, Heshan Dissanayake, Isuru Navinna, Roshan Ragel
Project Page / Code / Demo Video
  • Description: An obstacle-bot system for the existing swarm robotics project at the University of Peradeniya. The system uses an overhead camera setup to localize the obstacle bots, letting users place them at desired positions or along desired repetitive paths. The system then drives each bot to its assigned position without colliding with the other robots, modelling the obstacle bots as charged particles in an electric field so that particle repulsion keeps them apart (a minimal sketch of this idea follows below).
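
The "charged particles in an electric field" idea can be illustrated with a simple potential-field update: each bot is attracted to its goal and repelled from the other bots with a Coulomb-style inverse-square force. The snippet below is only a sketch of that principle under assumed values; the function name `repulsion_step`, the gain constants, and the 2D point-robot setup are illustrative and not the project's actual implementation.

```python
import numpy as np

def repulsion_step(positions, targets, k_att=1.0, k_rep=0.05, step=0.05, min_dist=1e-3):
    """One update of a potential-field controller: attract each bot toward its
    target and repel it from every other bot as if all bots were like-signed charges."""
    positions = np.asarray(positions, dtype=float)   # (N, 2) current bot positions
    targets = np.asarray(targets, dtype=float)       # (N, 2) desired bot positions

    forces = k_att * (targets - positions)           # attractive pull toward each goal

    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = positions[i] - positions[j]
            dist = max(np.linalg.norm(diff), min_dist)
            # Coulomb-style repulsion: magnitude falls off with squared distance,
            # pushing bot i directly away from bot j.
            forces[i] += k_rep * diff / dist**3

    return positions + step * forces                 # take one small integration step

if __name__ == "__main__":
    # Toy example (assumed values): three bots swap positions without colliding.
    pos = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0]]
    goal = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    for _ in range(200):
        pos = repulsion_step(pos, goal)
    print(np.round(pos, 2))
```

In practice such a controller would run on the overhead-camera localization loop, recomputing forces from the latest observed bot positions at each step; the gains trade off how aggressively bots reach their goals against how widely they avoid one another.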


Experience
Harvard University, USA
Research Fellow
Aug 2022 - Dec 2023

University of North Florida, USA
External Research Intern
Jan 2022 - Jul 2022

Education
Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Master of Science in Computer Vision
Aug 2023 - Present

University of Peradeniya, Sri Lanka
Bachelor of Science (Engineering), specializing in Computer Engineering
Nov 2017 - Feb 2023
Email | CV | Google Scholar | LinkedIn | GitHub
© 2024 Rusiru Thushara. All Rights Reserved.
Website layout inspired by Jon Barron's website.