Your Name authored
- Add flash_attn extraction from global_args in _load_default_model()
- Add flash_attn extraction from global_args in _load_model_by_name()
- Now --flash-attn flag will properly enable Flash Attention 2 when loading models
7e4ae96f
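
The commit message describes reading a flash_attn setting from global_args in both model-loading paths. The repository's code is not shown on this page, so the following is only a minimal sketch of that pattern. The helper name `build_load_kwargs` is hypothetical, and the `attn_implementation="flash_attention_2"` keyword assumes the models are loaded through Hugging Face Transformers' `from_pretrained`, which this listing does not confirm.

```python
from argparse import Namespace
from typing import Any, Dict, Optional


def build_load_kwargs(global_args: Optional[Namespace]) -> Dict[str, Any]:
    """Translate CLI options into keyword arguments for a model loader.

    Hypothetical helper: the real _load_default_model() and
    _load_model_by_name() are not shown in the source.
    """
    kwargs: Dict[str, Any] = {}
    # getattr with a default tolerates both a missing attribute and
    # global_args being None.
    flash_attn = bool(getattr(global_args, "flash_attn", False))
    if flash_attn:
        # Assumption: the loader accepts the Transformers-style
        # attn_implementation keyword.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs
```

Both loading paths could call one helper like this, so the flag is handled the same way whether the default model or a named model is loaded.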
| Name | Last commit | Last update |
|---|---|---|
| cache | ||
| __init__.py | ||
| capabilities.py | ||
| grammar.py | ||
| manager.py | ||
| parser.py | ||
| templates.py | ||
| tool_call_grammar.gbnf | ||
| utils.py |