Tempo14's Collections

Pre-Training

updated Nov 10, 2024

  • Instruction Pre-Training: Language Models are Supervised Multitask Learners

    Paper • 2406.14491 • Published Jun 20, 2024 • 94

  • What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

    Paper • 2410.23743 • Published Oct 31, 2024 • 64

  • NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

    Paper • 2410.20650 • Published Oct 28, 2024 • 17