---
language:
- code
tags:
- code-generation
- ai-assistant
- code-completion
- python
- machine-learning
- transformer
- gpt
license: mit
datasets:
- github-code
- stackoverflow
- synthetic-code
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: Jaleah AI Code Generator
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: Multi-Source Python Code Corpus
      type: mixed
    metrics:
    - type: code-generation
      name: Code Generation Score
      value: experimental
    - type: syntax-correctness
      name: Syntax Correctness Rate
      value: high
    - type: contextual-relevance
      name: Contextual Relevance
      value: moderate
parameters:
  max_length:
    default: 200
    range:
    - 50
    - 500
  temperature:
    default: 0.7
    range:
    - 0.1
    - 1.0
  top_k:
    default: 50
    range:
    - 1
    - 100
  top_p:
    default: 0.95
    range:
    - 0.1
    - 1.0
model_type: causal
architectures:
- GPTNeoForCausalLM
training_config:
  base_model: microsoft/CodeGPT-small-py
  training_objective: causal-language-modeling
  compute_environment:
  - gpu
  - cloud
  training_time: ~3 hours
  hardware:
  - cuda
  - t4-gpu
---
# Jaleah AI Code Generation Model

## Model Description

Jaleah AI is a fine-tuned version of the Microsoft CodeGPT small Python model, specialized in generating high-quality Python code snippets across various domains.

## Model Details

- Developed by: TeckMill AI Research Team
- Base Model: microsoft/CodeGPT-small-py
- Language: Python
- Version: 1.0
## Intended Uses & Limitations

### Intended Uses

- Code snippet generation
- Assisting developers with Python programming
- Providing intelligent code suggestions
- Rapid prototyping of Python functions and classes

### Limitations

- May generate syntactically incorrect code
- Requires human review and validation
- Performance may vary across different coding domains
- Not suitable for complete project generation
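Because generated snippets may be syntactically invalid, a lightweight pre-check with Python's built-in `ast` module can catch obvious failures before human review. This is a minimal sketch of such a check; it is not part of the model itself:

```python
import ast

def is_valid_python(snippet: str) -> bool:
    """Return True if the snippet parses as Python source code."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

print(is_valid_python("def add(a, b):\n    return a + b"))  # True
print(is_valid_python("def add(a, b) return a + b"))        # False
```

Parsing only confirms syntax, not correctness: generated code that parses can still have logic errors, so human validation remains necessary.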
## Training Data

### Data Sources

The model was trained on a diverse dataset including:

- GitHub trending repositories
- Stack Overflow top-rated code answers
- Open-source Python project codebases
- Synthetically generated code samples
- Complex algorithmic implementations

### Data Preprocessing

- Syntax validation
- Comment and docstring removal
- Length and complexity filtering
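The preprocessing steps above can be sketched with the standard `ast` module. This is an illustrative reconstruction, not the team's actual pipeline; `MAX_LINES` is a hypothetical threshold:

```python
import ast

MAX_LINES = 200  # hypothetical cutoff for the length-filtering step

def strip_docstrings(tree):
    """Remove leading docstring expressions from modules, classes, and functions."""
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            if (node.body and isinstance(node.body[0], ast.Expr)
                    and isinstance(node.body[0].value, ast.Constant)
                    and isinstance(node.body[0].value.value, str)):
                node.body = node.body[1:] or [ast.Pass()]
    return tree

def preprocess(source):
    """Syntax-validate, strip comments/docstrings, and length-filter one sample."""
    if source.count("\n") + 1 > MAX_LINES:
        return None                       # length filtering
    try:
        tree = ast.parse(source)          # syntax validation
    except SyntaxError:
        return None
    # ast.unparse regenerates source from the tree, which also drops comments
    return ast.unparse(strip_docstrings(tree))

print(preprocess('def f():\n    """A docstring."""\n    return 1'))
```

Round-tripping through the AST normalizes formatting as a side effect, which can also help deduplicate near-identical samples.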
## Training Procedure

### Training Hyperparameters

- Learning Rate: 5e-05
- Batch Size: 4
- Epochs: 12
- Optimizer: AdamW
- Learning Rate Scheduler: Linear
- Weight Decay: 0.01

### Training Process

- Fine-tuning of pre-trained CodeGPT model
- Multi-source code collection
- Advanced synthetic code generation
- Rigorous code validation
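The linear scheduler listed above decays the learning rate from its initial value to zero over the course of training. A minimal sketch of that decay, where the total step count is hypothetical (it depends on the dataset size together with the batch size and epoch count above):

```python
BASE_LR = 5e-05       # initial learning rate, as listed above
TOTAL_STEPS = 10_000  # hypothetical; depends on dataset size, batch size 4, 12 epochs

def linear_lr(step):
    """Linearly decay the learning rate from BASE_LR to 0 over TOTAL_STEPS."""
    return BASE_LR * max(0.0, 1.0 - step / TOTAL_STEPS)

print(linear_lr(0))       # full base rate at the start of training
print(linear_lr(5_000))   # half the base rate at the midpoint
```

In practice, `transformers` implements this schedule via `get_linear_schedule_with_warmup`; the sketch omits warmup for brevity.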
## Evaluation

Detailed evaluation metrics to be added in future versions.

## Ethical Considerations

- Designed to assist, not replace, human developers
- Encourages learning and code understanding
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("teckmill/jaleah-ai-model")
tokenizer = AutoTokenizer.from_pretrained("teckmill/jaleah-ai-model")

def generate_code(prompt, max_length=200):
    # Tokenize the prompt and generate a single completion
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_length=max_length, num_return_sequences=1)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate_code("def fibonacci(n):"))
```