Large Scale Machine Learning for Visual and Multimodal Data

Date: Thursday, November 7, 2024, 10:00-13:00, Cyprus time 

Venue: This training event is held as a hybrid event. You are welcome to join us at the Andreas Mouskos Auditorium, José Mariano Gago Hall, The Cyprus Institute. Otherwise, please connect to our live stream of the discussion, available on Zoom (Password: VsSCz1)

Language: English 

Registration: Registration for this event is open until Monday, November 4, 2024. Registration form here. 

Prerequisites:
Attendees must have prior experience with Python, as all hands-on exercises will be conducted using this programming language. Familiarity with using a terminal or command line interface is also beneficial.

Requirements:
On-site attendees should bring their own laptop to participate in the hands-on practicum.

Agenda

10:00 - 10:40

Assoc. Prof. Mihalis Nicolaou 

Title: Large Scale Machine Learning for Visual and Multimodal Data

This talk provides an overview of recent advances in large-scale machine learning, touching upon generative modelling, recently released foundation models, and commonly employed architectures. It will discuss capabilities, limitations, and ongoing research efforts (e.g., on explainable and efficient architectures), primarily for visual data and tasks.


10:40 - 11:40

Mr. Marios Constantinou

Title: Exploring intricate vision tasks with Grounded SAM2

During the presentation we will cover the various tasks that can be accomplished with Grounded SAM2. Grounded SAM2 is the successor of Grounded SAM, which combined Grounding DINO, an open-set object detector, with the original Segment Anything Model (SAM). This combination enables the user to perform object detection and segmentation of a region based on text inputs. Applications of Grounded SAM2 include automatic image annotation, image editing with Stable Diffusion, human motion analysis, and more. After the presentation, there will be a hands-on session where we will explore the model itself.

11:40 - 12:00

Coffee Break

12:00 - 13:00

Dr. Kyriaki Kylili

Title: Multimodal Models for Real-World Applications

Discover how scalable machine learning is transforming computer vision through multimodal models. This presentation showcases the 4M (Massively Multimodal Masked Modeling) framework and its approach to integrating multiple data modalities for computer vision tasks. Participants will engage in a hands-on session applying the model in practical scenarios.