Appearance
pip install vllm
See: https://docs.vllm.ai/en/latest/getting_started/installation.html
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-14B-Chat-AWQ --quantization awq --host 0.0.0.0 --port 13333 --gpu-memory-utilization 0.8 --max-model-len 8192
See: Supported models