A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual heavy models.
AI Summary: Implement a new command-line parameter, `--max-model-len`, in the vLLM simulator. This parameter defines the model's maximum context window size in tokens. Requests exceeding this limit should be rejected with a 400 Bad Request error and an error message indicating that the context length was exceeded.
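The intended behavior can be illustrated with a minimal sketch. The snippet below is not taken from the simulator's codebase: the flag name `--max-model-len` comes from the issue, but the function names, the handler wiring, and the exact wording of the error message (modeled on vLLM's context-length error) are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
)

// maxModelLen is set from the --max-model-len command-line flag and
// defines the model's maximum context window in tokens.
// The default value here is illustrative.
var maxModelLen = flag.Int("max-model-len", 1024, "maximum context window size in tokens")

// validateContextWindow checks whether the prompt plus the requested
// completion fits into the configured context window. On failure it
// returns an error message styled after vLLM's context-length error
// (the exact wording is an assumption).
func validateContextWindow(promptTokens, completionTokens int) (bool, string) {
	total := promptTokens + completionTokens
	if total <= *maxModelLen {
		return true, ""
	}
	msg := fmt.Sprintf(
		"This model's maximum context length is %d tokens. However, you requested %d tokens (%d in the messages, %d in the completion). Please reduce the length of the messages or completion.",
		*maxModelLen, total, promptTokens, completionTokens)
	return false, msg
}

// handleCompletion is a hypothetical request handler showing where the
// check would apply: an oversized request is rejected with
// 400 Bad Request before any simulated generation starts.
func handleCompletion(w http.ResponseWriter, promptTokens, completionTokens int) {
	if ok, msg := validateContextWindow(promptTokens, completionTokens); !ok {
		http.Error(w, msg, http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprintln(w, "simulated completion")
}

func main() {
	flag.Parse()
	fmt.Printf("simulator configured with max-model-len=%d\n", *maxModelLen)
}
```

In this sketch the check runs before any simulated generation, so an oversized request is rejected immediately rather than consuming simulated prefill or decode time.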