kas

shing3232

AI & ML interests

None yet

Organizations

None yet

shing3232's activity

upvoted a paper 5 days ago

TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published Feb 11 • 50

updated a collection 9 days ago

sakura

Collection

4 items • Updated 9 days ago

upvoted an article 18 days ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Sep 18, 2024

• 242

upvoted a paper 23 days ago

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Paper • 2504.06261 • Published 29 days ago • 107

upvoted a paper 28 days ago

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published about 1 month ago • 25

liked a model about 1 month ago

SakuraLLM/Sakura-GalTransl-7B-v3

Updated 27 days ago • 13.1k • 56

liked a model about 2 months ago

webbigdata/ALMA-7B-Ja-V2

Text Generation • Updated Nov 3, 2024 • 50 • 18

New activity in agentica-org/DeepScaleR-1.5B-Preview 3 months ago

I have difficulty triggering the thinking process

#12 opened 3 months ago by

shing3232

New activity in tencent/Tencent-Hunyuan-Large 6 months ago

What hardware configuration is needed to run this model?

#13 opened 6 months ago by

demo001s

updated a model 6 months ago

shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX

Updated Nov 8, 2024 • 43 • 1

upvoted a collection 8 months ago

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 40 items • Updated 9 days ago • 310

liked a model 10 months ago

UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3

Text Generation • Updated Jul 1, 2024 • 7.13k • 124

updated a model 11 months ago

shing3232/sakura-14b-qwen2beta-v0.9.2-IMX

Updated May 31, 2024 • 8 • 3

New activity in SakuraLLM/Sakura-14B-Qwen2beta-v0.9.2-GGUF 11 months ago

Can't CUDA run BF16 models?

#1 opened 11 months ago by

NeuronAstate

New activity in Qwen/Qwen1.5-7B-Chat-GGUF 11 months ago

Please post f16 quantization.

#1 opened 12 months ago by

ZeroWw

liked a model 12 months ago

shing3232/sakura-14b-qwen2beta-v0.9.2-IMX

Updated May 31, 2024 • 8 • 3

upvoted a paper about 1 year ago

BASS: Batched Attention-optimized Speculative Sampling

Paper • 2404.15778 • Published Apr 24, 2024 • 11

New activity in Qwen/CodeQwen1.5-7B-Chat about 1 year ago

What are the differences between this and Qwen/CodeQwen1.5-7B

#5 opened about 1 year ago by

Kalemnor

liked a model about 1 year ago

databricks/dbrx-instruct

Text Generation • Updated Apr 19, 2024 • 9.21k • 1.11k

New activity in Qwen/Qwen1.5-MoE-A2.7B-Chat about 1 year ago

How does this version's 28 GB GPU memory consumption compare with the 14B model?

#7 opened about 1 year ago by

william0014