---
language:
  - en
tags:
  - pytorch
  - transformer
  - language-model
  - mixture-of-experts
  - tree-of-thoughts
  - neural-memory
datasets:
  - openai/gsm8k
  - cais/mmlu
  - TIGER-Lab/MMLU-Pro
  - openai/MMMLU
  - MMMU/MMMU
  - greengerong/leetcode
  - LimYeri/LeetCode_Python_Solutions_v2
  - newfacade/LeetCodeDataset
  - deepmind/math_dataset
  - google/IFEval
  - Idavidrein/gpqa
  - google/frames-benchmark
  - camel-ai/math
  - camel-ai/code
  - microsoft/SCBench
  - princeton-nlp/SWE-bench_Verified
  - princeton-nlp/SWE-bench
  - wikimedia/wikipedia
  - HuggingFace/C4
  - SamuelYang/bookcorpus
  - sentence-transformers/codesearchnet
  - openai/openai_humaneval
license: mit
pipeline_tag: text2text-generation
---

VishwamAI

VishwamAI is an enhanced transformer model that combines several cutting-edge techniques to improve reasoning, memory retention, and computational efficiency.

Model Details

  • Developers: VishwamAI Team
  • Architecture: Enhanced Transformer with MoE
  • Release Date: 2024
  • Languages: English
  • Framework: PyTorch
  • License: MIT
  • Model Type: Causal Language Model

Technical Specifications

  • Parameters: 671B
  • Context Length: 32,768 tokens
  • Hidden Size: 8,192
  • Attention Heads: 64
  • Layers: 120
  • Vocabulary Size: 64,000
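
As a quick reference, the specifications above can be collected into a single configuration object. This is an illustrative sketch only; the `VishwamAIConfig` class and its field names are assumptions and may not match the released code.

```python
from dataclasses import dataclass

@dataclass
class VishwamAIConfig:
    # Illustrative field names; the actual configuration schema may differ.
    hidden_size: int = 8192
    num_layers: int = 120
    num_attention_heads: int = 64
    vocab_size: int = 64_000
    max_position_embeddings: int = 32_768  # context length in tokens

config = VishwamAIConfig()
print(config)
```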

Key Innovations

  1. Differentiable Cache Augmentation

    • Enhances the transformer's key-value cache with learnable embeddings
    • Enables asynchronous reasoning capabilities
    • Implements gated memory updating mechanism
  2. Neural Long-Term Memory

    • Memory layers with read/write/forget gates (a minimal sketch follows this list)
    • Multi-head memory attention mechanisms
    • Hierarchical memory organization
  3. Tree of Thoughts Reasoning

    • Multi-path reasoning exploration
    • Beam search for solution paths
    • Intermediate step evaluation
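
To make the gated memory update concrete, here is a minimal PyTorch sketch of a memory layer with read/write/forget gates. The class name, tensor shapes, and gating details are assumptions for illustration and are not the actual VishwamAI implementation.

```python
import torch
import torch.nn as nn

class GatedMemoryLayer(nn.Module):
    """Illustrative memory layer with read/write/forget gates (not the released code)."""

    def __init__(self, hidden_size: int, memory_slots: int, num_heads: int = 8):
        super().__init__()
        self.memory = nn.Parameter(torch.zeros(memory_slots, hidden_size))
        self.read_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.write_gate = nn.Linear(hidden_size, hidden_size)
        self.forget_gate = nn.Linear(hidden_size, hidden_size)
        self.candidate = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size)
        batch = hidden_states.size(0)
        memory = self.memory.unsqueeze(0).expand(batch, -1, -1)

        # Read: tokens attend over the memory slots.
        read_out, _ = self.read_attn(hidden_states, memory, memory)

        # Write/forget: summarize the sequence and gate the memory update.
        summary = hidden_states.mean(dim=1, keepdim=True)      # (batch, 1, hidden)
        write = torch.sigmoid(self.write_gate(summary))        # how much new content to write
        forget = torch.sigmoid(self.forget_gate(summary))      # how much old content to keep
        candidate = torch.tanh(self.candidate(summary))        # proposed new content
        updated_memory = forget * memory + write * candidate   # broadcast over memory slots

        # In a full model the updated memory would persist across segments.
        return hidden_states + read_out, updated_memory

# Toy usage
layer = GatedMemoryLayer(hidden_size=64, memory_slots=16)
x = torch.randn(2, 10, 64)
out, mem = layer(x)  # out: (2, 10, 64), mem: (2, 16, 64)
```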

Training Data

The model is being trained on a diverse set of datasets (a loading sketch follows this list):

  1. GSM8K

    • Grade school math word problems
    • Tests mathematical reasoning capabilities
  2. MMLU (Massive Multitask Language Understanding)

    • Broad knowledge evaluation
    • Multiple academic and professional domains
  3. MMLU-Pro

    • Professional and specialized knowledge
    • Advanced reasoning tasks
  4. MMMLU (Multilingual Massive Multitask Language Understanding)

    • MMLU questions professionally translated into additional languages
    • Evaluates knowledge and reasoning across languages

Training Procedure

Hardware Requirements

  • Minimum: Single NVIDIA A100 (80GB)
  • Recommended: Multiple A100s with NVLink
  • Distributed Training: Supported via FSDP

Software Requirements

  • PyTorch >= 2.0
  • CUDA >= 11.8
  • [Optional] NCCL for distributed training

Optimization

  • FP8 precision training
  • Fully Sharded Data Parallel (FSDP); see the sketch after this list
  • Gradient checkpointing
  • Mixed precision training
  • CPU offloading capabilities
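
The sketch below shows one way to wrap a model with PyTorch FSDP, mixed precision, and optional CPU offloading. It is an assumption about the training setup, not the project's actual training script; FP8 training additionally requires dedicated tooling (for example NVIDIA Transformer Engine), so bfloat16 is used here.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, CPUOffload

# Assumes torch.distributed is already initialized (e.g. launched via torchrun)
# and `model` is the VishwamAI transformer on the local GPU.
bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

fsdp_model = FSDP(
    model,
    mixed_precision=bf16_policy,
    cpu_offload=CPUOffload(offload_params=True),   # optional CPU offloading
    device_id=torch.cuda.current_device(),
)

# Gradient (activation) checkpointing can be layered on top, e.g. via torch.utils.checkpoint.
```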

Intended Use

This model is designed for:

  • Research in language model capabilities
  • Development of reasoning-enhanced applications
  • Exploration of memory-augmented architectures

Primary Intended Uses

  1. Research and Development

    • Study of neural memory mechanisms
    • Investigation of reasoning capabilities
    • Architecture optimization research
  2. Educational Applications

    • Mathematical problem solving
    • Complex reasoning tasks
    • Knowledge retrieval and application

Out-of-Scope Uses

  • Production deployment (currently in research phase)
  • Safety-critical applications
  • Real-time applications requiring low latency

Evaluation Results

The model is currently in the training and evaluation phase. Initial metrics will be published after training is complete.

Limitations

  1. Current Development Status

    • Training in progress
    • Performance metrics are preliminary
    • Features under active development
  2. Technical Limitations

    • High computational requirements
    • Large memory footprint
    • Complex deployment needs
  3. Capability Limitations

    • Reasoning capabilities still being optimized
    • Memory mechanisms under refinement
    • Limited multilingual support

Bias and Ethics

  • Model is currently in research phase
  • Full bias evaluation pending
  • Not recommended for production use
  • Safety measures being implemented

Environmental Impact

We are working to minimize environmental impact through:

  • Efficient training procedures
  • Optimized architecture
  • Resource-aware deployment options

Citation

```bibtex
@software{vishwamai2024,
  author = {Kasinadhsarma},
  title = {VishwamAI: Enhanced Transformer with Advanced Reasoning Capabilities},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/VishwamAI/VishwamAI}
}
```

Example Usage

```python
from vishwamai.model_utils import load_model

# Load the model
model = load_model("vishwamai/model", device="cuda")

# A compatible tokenizer must be loaded separately; its loading API is not
# shown in this example. The call below assumes a `tokenizer` object exposing
# an `encode(..., return_tensors="pt")` method.
input_ids = tokenizer.encode("Solve this problem step by step:", return_tensors="pt").to("cuda")

# Forward pass; use the model's generation utilities to decode text
output = model(input_ids)
```

Additional Information

Acknowledgments

This project builds upon several research papers and open-source projects. We thank the authors and contributors of:

  • Transformer architectures
  • Mixture of Experts implementations
  • Tree of Thoughts reasoning
  • Neural memory architectures