# AI Models

Approved AI models for custom-developed GenAI applications on the OneAI platform.

## General Requirements & Compliance

- All models must comply with company data governance policies and security standards
- Usage must adhere to vendor-specific terms of service and acceptable use policies
- PII and sensitive data handling requires additional approval and encryption
- Production deployments require security review and compliance sign-off
- Cost monitoring and budget allocation must be configured before use
- Models with specific legal requirements (flagged in the table below) require additional documentation

## Model Catalog

| Vendor / Model | Description | Specifications | When to Use | Pros | Cons |
|---|---|---|---|---|---|
| Google<br>Gemini 2.5 Pro (v2.5)<br>via Google Cloud<br>Approved | Google’s most advanced model with massive context window and multimodal capabilities. Excellent for complex reasoning, long document analysis, and multimodal tasks. | Context: 1M tokens<br>Output: 8,192 tokens<br>Cutoff: Jan 2025<br>Speed: Moderate (3–6s) | | | |
| Google<br>Gemini 2.5 Flash (v2.5)<br>via Google Cloud<br>Approved | Fast and efficient Gemini model with large context window. Optimized for speed while maintaining strong capabilities across text and multimodal tasks. | Context: 1M tokens<br>Output: 8,192 tokens<br>Cutoff: Jan 2025<br>Speed: Fast (1–3s) | | | |
| Amazon Bedrock<br>Anthropic Claude (3.5 Sonnet)<br>via AWS Bedrock<br>Approved | Flagship model balancing intelligence, speed, and cost. Strong at analysis, coding, content creation, and reasoning with nuanced understanding. | Context: 200K tokens<br>Output: 8,192 tokens<br>Cutoff: Apr 2024<br>Speed: Fast (2–4s) | | | |
| Amazon Bedrock<br>Anthropic Claude (3.5 Haiku)<br>via AWS Bedrock<br>Approved | Fast and efficient Claude model optimized for speed and cost. Best for high-volume processing and quick response times while maintaining quality. | Context: 200K tokens<br>Output: 8,192 tokens<br>Cutoff: Jul 2024<br>Speed: Very Fast (<1s) | | | |
| Amazon Bedrock<br>Anthropic Claude (3.7 Sonnet)<br>via AWS Bedrock<br>Approved | Balanced Claude Sonnet model for general-purpose tasks. Enhanced version with improved performance over 3.5. | Context: 200K tokens<br>Output: 8,192 tokens<br>Cutoff: Oct 2024<br>Speed: Fast (2–4s) | | | |
| Amazon Bedrock<br>Anthropic Claude (Sonnet 4)<br>via AWS Bedrock<br>Approved | High-performance Claude Sonnet model with strong reasoning capabilities. Next-generation model with significant improvements. | Context: 200K tokens<br>Output: 8,192 tokens<br>Cutoff: Mar 2025<br>Speed: Fast (2–4s) | | | |
| Amazon Bedrock<br>Anthropic Claude (Sonnet 4.5)<br>via AWS Bedrock<br>Approved | Latest iteration of Claude Sonnet with enhanced reasoning, improved coding capabilities, and better instruction following. Offers superior performance while maintaining the balanced approach of the Sonnet series. | Context: 200K tokens<br>Output: 8,192 tokens<br>Cutoff: Jul 2025<br>Speed: Fast (2–4s) | | | |
| Amazon Bedrock<br>Anthropic Claude (Haiku 4.5)<br>via AWS Bedrock<br>Approved | Fast and efficient Claude model optimized for speed and cost. Latest Haiku version with improved capabilities. | Context: 200K tokens<br>Output: 8,192 tokens<br>Cutoff: Feb 2025<br>Speed: Very Fast (<1s) | | | |
| Amazon Bedrock<br>Cohere Command (R/R+)<br>via AWS Bedrock<br>Approved | Enterprise-focused models optimized for retrieval and business applications. Strong RAG capabilities and multilingual support. | Context: 128K tokens<br>Output: 4,096 tokens<br>Cutoff: Early 2024<br>Speed: Fast (2–4s) | | | |
| Amazon Bedrock<br>Cohere Embed (3)<br>via AWS Bedrock<br>Approved | Semantic search and embedding model for document processing and similarity matching. High-quality vector representations. | Context: N/A<br>Output: Embeddings<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
| Amazon Bedrock<br>Cohere Embed (4)<br>via AWS Bedrock<br>Approved | Enhanced embedding model with improved accuracy and better multilingual capabilities. Latest version with optimized performance. | Context: N/A<br>Output: Embeddings<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
| Amazon Bedrock<br>Cohere Rerank (3.5)<br>via AWS Bedrock<br>Approved | Specialized model for reranking search results and relevance scoring. Optimized for improving search quality. | Context: N/A<br>Output: Rankings<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
| Amazon Bedrock<br>Llama (2)<br>via AWS Bedrock<br>Approved | Open source model with versatile capabilities. Community-driven with good performance for general tasks. | Context: 4K tokens<br>Output: 2,048 tokens<br>Cutoff: Sep 2022<br>Speed: Fast (2–4s) | | | |
| Amazon Bedrock<br>Llama (3)<br>via AWS Bedrock<br>Approved | Improved version of Llama with better reasoning and enhanced performance. Stronger capabilities than Llama 2. | Context: 8K tokens<br>Output: 4,096 tokens<br>Cutoff: Mar 2023<br>Speed: Fast (2–4s) | | | |
| Amazon Bedrock<br>Mistral (7B)<br>via AWS Bedrock<br>Approved | Efficient open-source model with 7 billion parameters. Good balance of performance and resource usage. | Context: 32K tokens<br>Output: 4,096 tokens<br>Cutoff: Early 2023<br>Speed: Very Fast (<2s) | | | |
| Amazon Bedrock<br>Mistral (8x7B)<br>via AWS Bedrock<br>Approved | Mixture of Experts model with efficient 8x7B parameter architecture. Better performance than standard 7B. | Context: 32K tokens<br>Output: 4,096 tokens<br>Cutoff: Early 2024<br>Speed: Very Fast (<2s) | | | |
| GCP<br>Mistral OCR (25.05)<br>via GCP<br>Pending | Specialized optical character recognition model for document processing and text extraction. | Context: N/A<br>Output: Text extraction<br>Cutoff: Not applicable<br>Speed: Fast (1–3s) | | | |
| Amazon Bedrock<br>OpenAI GPT OSS (20B)<br>via AWS Bedrock<br>Approved | Mid-size open source model with 20 billion parameters. Good balance for self-hosting scenarios. | Context: 64K tokens<br>Output: 4,096 tokens<br>Cutoff: Jun 2024<br>Speed: Fast (2–4s) | | | |
| Amazon Bedrock<br>OpenAI GPT OSS (120B)<br>via AWS Bedrock<br>Approved | Large open source model with 120 billion parameters. High capability for complex tasks. | Context: 64K tokens<br>Output: 4,096 tokens<br>Cutoff: Jun 2024<br>Speed: Slower (5–10s) | | | |
| Amazon Bedrock<br>DeepSeek (R1)<br>via AWS Bedrock<br>Approved | Advanced reasoning model optimized for complex analytical tasks, mathematical problems, and logical reasoning. | Context: 64K tokens<br>Output: 4,096 tokens<br>Cutoff: Mid 2024<br>Speed: Moderate (3–5s) | | | |
| GCP<br>Imagen (4)<br>via GCP<br>Approved | Advanced image generation model with high-quality output and customizable options. Fast rendering capabilities. | Context: N/A<br>Output: Images<br>Cutoff: Not applicable<br>Speed: Fast (3–5s) | | | |
| Black Forest Labs<br>Flux (.1 schnell)<br>via GCP<br>Requested | Fast image generation model optimized for speed. Quick iterations and rapid prototyping. | Context: N/A<br>Output: Images<br>Cutoff: Not applicable<br>Speed: Fast (3–5s) | | | |
| Krea<br>Flux (.1 krea)<br>via Direct API<br>Requested | Krea’s variant of Flux optimized for creative image generation with artistic style control. | Context: N/A<br>Output: Images<br>Cutoff: Not applicable<br>Speed: Moderate (4–7s) | | | |
| GCP<br>Veo (3, 3 Fast)<br>via GCP<br>Requested | Google’s advanced video generation model with multiple format support and quality output options. | Context: N/A<br>Output: Video<br>Cutoff: Not applicable<br>Speed: Slow (30–60s) | | | |
| Azure<br>GPT (5)<br>via Azure<br>Requested | OpenAI’s flagship general-purpose model with strong reasoning and coding capabilities. | Context: TBD<br>Output: TBD<br>Cutoff: TBD<br>Speed: TBD | | | |
| Self-hosted<br>Qwen3-coder (480B-A35B-Instruct)<br>via Self-hosted / Ollama<br>Approved | Code-focused large model with extensive context. Self-hosted for privacy-preserving code generation. | Context: 128K tokens<br>Output: 8,192 tokens<br>Cutoff: Mid 2024<br>Speed: Moderate (4–8s) | | | |
| Self-hosted<br>Qwen3-coder (30B-A3B-Instruct)<br>via Self-hosted / Ollama<br>Approved | Efficient coding model with fast inference. Self-hosted for internal development use. | Context: 64K tokens<br>Output: 4,096 tokens<br>Cutoff: Mid 2024<br>Speed: Fast (2–4s) | | | |
| Self-hosted<br>Codellama (7b, 13b, 34b, 70b)<br>via Self-hosted / Ollama<br>Pending determination | Code generation models in multiple sizes. Open source with flexible deployment options. | Context: 16K tokens<br>Output: 4,096 tokens<br>Cutoff: Early 2023<br>Speed: Fast (2–5s) | | | |
| Self-hosted<br>Codegemma (2b, 7b)<br>via Self-hosted / Ollama<br>Approved | Lightweight coding models for fast performance. Easy deployment for code assistance. | Context: 8K tokens<br>Output: 2,048 tokens<br>Cutoff: Early 2024<br>Speed: Very Fast (<2s) | | | |
| Self-hosted<br>Codestral (22b)<br>via Self-hosted / Ollama<br>Denied | Advanced coding model with high quality output. Requires commercial license for use. | Context: 32K tokens<br>Output: 4,096 tokens<br>Cutoff: Mid 2024<br>Speed: Moderate (3–6s) | | | |
| Self-hosted<br>DeepSeek-coder (v2)<br>via Self-hosted / Ollama<br>Approved | Strong code generation model with reasoning capabilities. Self-hosted for secure environments. | Context: 64K tokens<br>Output: 4,096 tokens<br>Cutoff: Mid 2024<br>Speed: Moderate (3–5s) | | | |
| Self-hosted<br>Granite-code<br>via Self-hosted / Ollama<br>Requested | IBM enterprise-focused code generation model. Designed for regulated environments and corporate projects. | Context: 32K tokens<br>Output: 4,096 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Self-hosted<br>Llama4Scout<br>via Self-hosted / Ollama<br>Requested | Scout variant of Llama 4, Meta’s efficient mixture-of-experts release. Versatile open source model for research and experimentation. | Context: 32K tokens<br>Output: 4,096 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Self-hosted<br>Llama4Maverick<br>via Self-hosted / Ollama<br>Requested | Maverick variant of Llama 4, the larger mixture-of-experts release with enhanced capabilities. Open source with flexible deployment. | Context: 32K tokens<br>Output: 4,096 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Self-hosted<br>Llama3-gradient<br>via Self-hosted / Ollama<br>Requested | Llama 3 variant from Gradient AI with extended context length. Research-focused model. | Context: 16K tokens<br>Output: 4,096 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Self-hosted<br>Kimi-K2<br>via Self-hosted / Ollama<br>Requested | Moonshot AI’s long-context multilingual model. Self-hosted for document analysis and multilingual tasks. | Context: 200K tokens<br>Output: 8,192 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Self-hosted<br>GPT-OSS (20B)<br>via Self-hosted / Ollama<br>Requested | Open source GPT model for self-hosting. Community-driven with flexible deployment. | Context: 64K tokens<br>Output: 4,096 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Self-hosted<br>GPT-OSS (120B)<br>via Self-hosted / Ollama<br>Requested | Large open source GPT model with advanced capabilities. Self-hosted for high-performance tasks. | Context: 64K tokens<br>Output: 4,096 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Azure<br>GPT-OSS-Safeguard (120B)<br>via Azure<br>Requested | Safety-enhanced large GPT model with moderation capabilities. Enterprise-ready for regulated environments. | Context: 128K tokens<br>Output: 8,192 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Azure<br>GPT-OSS-Safeguard (20B)<br>via Azure<br>Requested | Efficient safeguard model with moderation. Cost-effective with security features. | Context: 64K tokens<br>Output: 4,096 tokens<br>Cutoff: TBD<br>Speed: TBD | | | |
| Self-hosted<br>BGE-BASE-EN (1.5)<br>via Self-hosted<br>Approved | English-focused embedding model. Self-hosted for semantic search and document retrieval. | Context: N/A<br>Output: Embeddings<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
| Self-hosted<br>clip-ViT-B-32<br>via Self-hosted<br>Denied | Vision-language model for image-text matching and cross-modal retrieval. | Context: N/A<br>Output: Embeddings<br>Cutoff: Not applicable<br>Speed: Fast (1–3s) | | | |
| Amazon Bedrock<br>Anthropic Claude (Opus 4.5)<br>via AWS Bedrock<br>Approved | Premium performance Claude model with maximum capability. Advanced reasoning and enterprise features for mission-critical tasks. | Context: 200K tokens<br>Output: 16,384 tokens<br>Cutoff: Aug 2024<br>Speed: Moderate (4–8s) | | | |
| GCP<br>Gemini (3)<br>via GCP<br>Approved | Next-generation Gemini model with advanced multimodal capabilities and enhanced reasoning. | Context: 1M tokens<br>Output: 8,192 tokens<br>Cutoff: Feb 2025<br>Speed: Moderate (3–6s) | | | |
| GCP<br>NanoBanana (1)<br>via GCP<br>Requested | Google’s image generation and editing model (Gemini Flash Image, known as “Nano Banana”). | Context: N/A<br>Output: Images<br>Cutoff: Not applicable<br>Speed: TBD | | | |
| Self-hosted<br>HunyuanOCR<br>via Self-hosted<br>Requested | Tencent’s optical character recognition model for document processing and text extraction. | Context: N/A<br>Output: Text extraction<br>Cutoff: Not applicable<br>Speed: TBD | | | |
| Self-hosted<br>SigLIP-base (16-384)<br>via Self-hosted<br>Approved | Vision-language embedding model for image-text matching and zero-shot image classification. | Context: N/A<br>Output: Embeddings<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
| Self-hosted<br>all-MiniLM-L6-v2<br>via Self-hosted<br>Approved | Compact sentence-embedding model for semantic search, clustering, and similarity matching. | Context: N/A<br>Output: Embeddings<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
| Self-hosted<br>Openai/whisper-base (20250625)<br>via Self-hosted<br>Approved | Speech-to-text model for audio transcription. Self-hosted for processing voice recordings. | Context: N/A<br>Output: Text transcription<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
| Self-hosted<br>Salesforce/blip-image (16-Dec-25)<br>via Self-hosted<br>Approved | Vision-language model for image captioning and image-text understanding. | Context: N/A<br>Output: Text (captions)<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
| Amazon Bedrock<br>Nova 2 Lite<br>via AWS Bedrock<br>Approved | Fast, cost-effective model in Amazon’s Nova family, suited to high-volume workloads. | Context: TBD<br>Output: TBD<br>Cutoff: TBD<br>Speed: TBD | | | |
| Amazon Bedrock<br>Nova 2 Pro<br>via AWS Bedrock<br>Approved | More capable Nova model for complex tasks requiring stronger reasoning. | Context: TBD<br>Output: TBD<br>Cutoff: TBD<br>Speed: TBD | | | |
| Amazon Bedrock<br>Nova 2 Omni<br>via AWS Bedrock<br>Approved | Multimodal Nova model designed to handle multiple input modalities in a single model. | Context: TBD<br>Output: TBD<br>Cutoff: TBD<br>Speed: TBD | | | |
| Amazon Bedrock<br>Nova 2 Sonic<br>via AWS Bedrock<br>Approved | Nova speech model for real-time voice interaction. | Context: TBD<br>Output: Audio<br>Cutoff: TBD<br>Speed: TBD | | | |
| Amazon Bedrock<br>Nova Multimodal Embeddings<br>via AWS Bedrock<br>Approved | Embedding model for text, image, and other content types, enabling cross-modal search and retrieval. | Context: N/A<br>Output: Embeddings<br>Cutoff: Not applicable<br>Speed: Very Fast (<1s) | | | |
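
## Usage Sketches

The sketches below are minimal, illustrative examples rather than production code. Most of the hosted text models in the catalog are reached through Amazon Bedrock, so the first sketch calls a Claude model via the Bedrock Converse API with boto3. The model ID and region are assumptions; use the identifiers provisioned for your OneAI account.

```python
# Minimal sketch: calling a Bedrock-hosted Claude model through the
# Converse API. The model ID and region are illustrative placeholders.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",  # assumed ID; check your account
    messages=[
        {"role": "user", "content": [{"text": "Summarize the key risks in this clause: ..."}]}
    ],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```

The same `converse` call shape generally works across the Bedrock-hosted chat models in the table (Claude, Mistral, Llama, DeepSeek), which avoids maintaining per-vendor request formats.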
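For the self-hosted embedding models (e.g., BGE-BASE-EN or all-MiniLM-L6-v2), a typical retrieval flow encodes documents and queries into vectors and ranks them by cosine similarity. This sketch uses the sentence-transformers library with all-MiniLM-L6-v2; the model name shown is the public Hugging Face identifier, which may differ from an internally mirrored copy.

```python
# Minimal sketch: semantic search with a self-hosted embedding model.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "Quarterly revenue report for the EU region",
    "Incident response runbook for production outages",
]
query = "Where is the outage procedure documented?"

# normalize_embeddings=True makes cosine similarity a plain dot product.
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = doc_vecs @ query_vec
best = int(np.argmax(scores))
print(f"Best match ({scores[best]:.3f}): {docs[best]}")
```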