LoRAX (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned models on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency.
Abstract: Recent improvements in the accuracy of machine learning (ML) models in the language domain have propelled their use in a multitude of products and services, touching millions of lives daily.
Abstract: Federated learning (FL) has emerged as an ideal privacy-preserving learning technique which can train a global model in a collaborative way while preserving the private data in the local.