A Practical Guide to LLM Deployment and RAG Systems

Date: Wednesday, 10 September, 2025, 10:00-14:30, Cyprus time

Venue:This training event is held as a hybrid event. You are welcome to join us at the Andreas Mouskos Seminar Room, The Cyprus Institute. Otherwise please, connect to our live stream of the discussion, available on Zoom (Password: VsSCz1)

Language: English 

Registration: Registration for this event is open until Monday, September 8th, 2025. Registration form here. 

Pre-requisites: Attendees should be familiar with Python. Some previous experience with PyTorch. Hands on exercises are part of the training and will be provided in Python.

Requirements:Attendees should bring with them their own laptop (with Administrative privileges) to follow the hands-on practical. They should make sure the following software is available on their laptops: a) A web browser and PDF viewer and b) A command line interface or other client that supports SSH.  

Attendees should be able to access the machine prior to the event. Instructions on how to generate the SSH key can be found at Accessing and Navigating Cyclone Tutorial. We recommend generating the SSH key via Git Bash for Windows (2.5.2. Section – Option 2 on the tutorial).

Agenda

9:30-10:00

Hands-on Setup (Optional)

Please use this session to ensure you can access the HPC system. 

10:00-10:45

Dr. Christodoulos Stylianou

Deploying Large Language Models Locally 

This presentation covers the process of deploying large language models on local machines and high-performance computing systems. It focuses on the tools and workflows needed to run models efficiently without relying on cloud infrastructure.

The talk will include practical tips for setting up environments, managing resources, and avoiding common issues during deployment. It will also introduce retrieval-augmented generation (RAG) systems and explain how they can be used to improve model responses with local or custom data. The goal is to provide a clear, practical overview for anyone interested in working with LLMs in a self-hosted environment. 

10:45-12:00

Dr. Nikolaos Bakas

A Practical overview of Transformers, Embeddings and RAG Systems 

In this session, we will present how to set up and use Large Language Models (LLMs) for various tasks, using the Hugging Face Transformers library. We will cover techniques for inference and text generation, including streaming outputs, and utilize embeddings to understand and visualize semantic relationships between words and sentences using cosine similarity. A key focus of the seminar will be on explaining the basics of Retrieval Augmented Generation (RAG), where we will demonstrate how to build a system that retrieves relevant information from a text corpus to answer user questions. By the end of this session, you will have hands-on experience with powerful LLM tools and an understanding of how to build custom LLM applications that combine language generation with information retrieval. 

Break

12:30-14:30

Mr Marios Constantinou

Hands-On: Model Deployment through vLLM, Communication and Creation of RAG Pipelines 

In this hands-on session, participants will deploy large language models on Cyclone, the National High Performance Computing (HPC) infrastructure, using tools like vLLM for efficient inference and Haystack for building retrieval-augmented generation (RAG) pipelines. The session will guide attendees through the end-to-end process of setting up model environments, running local inference, and integrating retrieval components to create responsive, data-aware applications. By working directly on HPC resources, participants will gain practical experience in managing compute workloads, handling model-serving pipelines, and building systems that combine LLM outputs with relevant external knowledge. 

About the speakers

 

Christodoulos (Chris) Stylianou joined the Computation-based Science and Technology Research Center (CaSToRC) at The Cyprus Institute (CyI) as a Research Engineer for the EuroCC2 Project. He holds an undergraduate degree in Electrical & Electronic Engineering (MEng) from Imperial College London, a Masters Degree in High Performance Computing (HPC) with Data Science (MSc) and a PhD in HPC, Computational & Data Science and Software Engineering from EPCC at The University of Edinburgh.  Chris specializes in High Performance Computing and more specifically in Heterogeneous and Distributed Computing, Performance Portability, and the application of Artificial Intelligence in accelerating Sparse Linear Algebra for the next generation hardware architectures. His current research interests are around acceleration of training time via sparsification of Neural Networks. Chris is also skilled in software development, training, and fostering collaboration between academia, industry, and government with a growing interest in product management. 

Dr. Bakas is a Senior Data Scientist at GRNET, with an extensive background in AI, and 90+ publications in areas including Machine Learning, Numerical Methods, Optimization, and Large Language Models. He has served as PI, researcher, and coordinator for 20+ projects in various research centers. With a Ph.D. from the National Technical University of Athens and 10+ years of university teaching experience, he also brings 20+ years of programming expertise in diverse languages and frameworks. The training seminars he organized have reached over 5,000 engineers.

 

Marios Constantinou holds a bachelor’s degree in Computer Science from the Ionian University in Corfu, Greece. His professional expertise centers on Computer Vision and deep learning, with significant experience in object detection, segmentation, and classification. He is also well-versed in Large Language Models (LLMs), including fine-tuning, model serving, Retrieval-Augmented Generation (RAG), and designing end-to-end MLOps pipelines. His work often involves deploying machine learning systems in real-world settings, including drone-based applications and small object detection. Currently a Research Software Engineer at The Cyprus Institute, he focuses on High Performance Computing (HPC), High Performance Data Analytics (HPDA), and large-scale machine learning projects, while supporting academic and industrial stakeholders in computational skill development. Passionate about innovation, he combines technical expertise and research to tackle complex challenges.