A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual heavy models.
AI Summary: Implement a new command-line parameter, `--max-model-len`, in the vLLM simulator. This parameter defines the model's maximum context window size in tokens. Requests exceeding this limit should be rejected with a 400 Bad Request error and an error message indicating that the context length was exceeded.
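The intended behavior can be illustrated with a minimal sketch. The snippet below is not taken from the simulator's codebase: the flag name `--max-model-len` comes from the issue, but the function names, the handler wiring, and the exact wording of the error message (modeled on vLLM's context-length error) are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
)

// maxModelLen is set from the --max-model-len command-line flag and
// defines the model's maximum context window in tokens.
// The default value here is illustrative.
var maxModelLen = flag.Int("max-model-len", 1024, "maximum context window size in tokens")

// validateContextWindow checks whether the prompt plus the requested
// completion fits into the configured context window. On failure it
// returns an error message styled after vLLM's context-length error
// (the exact wording is an assumption).
func validateContextWindow(promptTokens, completionTokens int) (bool, string) {
	total := promptTokens + completionTokens
	if total <= *maxModelLen {
		return true, ""
	}
	msg := fmt.Sprintf(
		"This model's maximum context length is %d tokens. However, you requested %d tokens (%d in the messages, %d in the completion). Please reduce the length of the messages or completion.",
		*maxModelLen, total, promptTokens, completionTokens)
	return false, msg
}

// handleCompletion is a hypothetical request handler showing where the
// check would apply: an oversized request is rejected with
// 400 Bad Request before any simulated generation starts.
func handleCompletion(w http.ResponseWriter, promptTokens, completionTokens int) {
	if ok, msg := validateContextWindow(promptTokens, completionTokens); !ok {
		http.Error(w, msg, http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprintln(w, "simulated completion")
}

func main() {
	flag.Parse()
	fmt.Printf("simulator configured with max-model-len=%d\n", *maxModelLen)
}
```

In this sketch the check runs before any simulated generation, so an oversized request is rejected immediately rather than consuming simulated prefill or decode time.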