Deploy the vLLM Inference Engine to Run Large Language Models (LLMs) on Koyeb
Learn how to set up a vLLM Instance to run inference workloads and host your own OpenAI-compatible API on Koyeb.
In this tutorial, we showcase how to deploy the vLLM inference engine on Koyeb to run large language models in production. vLLM exposes an OpenAI-compatible API server, so once your Service is live you can point any OpenAI client at your own endpoint and run inference workloads against it.
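As a preview of where we are headed, here is a minimal sketch of what querying the finished deployment could look like from the standard OpenAI Python client. The base URL, API key, and model name below are placeholders, not values from this tutorial; substitute your own Koyeb app URL and the model you choose to serve:

```python
# pip install openai
from openai import OpenAI

# vLLM serves an OpenAI-compatible API, so the official client works as-is.
# "https://your-app.koyeb.app/v1" and "your-model-name" are hypothetical
# placeholders for your actual Koyeb deployment URL and served model ID.
client = OpenAI(
    base_url="https://your-app.koyeb.app/v1",
    api_key="EMPTY",  # only needed if you configured an API key on the server
)

response = client.chat.completions.create(
    model="your-model-name",  # e.g. the Hugging Face model ID you deployed
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
)
print(response.choices[0].message.content)
```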