AI Models

General Requirements & Compliance
  • All models must comply with company data governance policies and security standards
  • Usage must adhere to vendor-specific terms of service and acceptable use policies
  • PII and sensitive data handling requires additional approval and encryption
  • Production deployments require security review and compliance sign-off
  • Cost monitoring and budget allocation must be configured before use
  • Models with specific legal requirements (shown in red below) require additional documentation

| Vendor | Model (Version) | Access | Status | Context | Output | Knowledge Cutoff | Speed |
|---|---|---|---|---|---|---|---|
| Google | Gemini 2.5 Pro (v2.5) | Google Cloud | Approved | 1M tokens | 8,192 tokens | Jan 2025 | Moderate (3–6s) |
| Google | Gemini 2.5 Flash (v2.5) | Google Cloud | Approved | 1M tokens | 8,192 tokens | Jan 2025 | Fast (1–3s) |
| Amazon Bedrock | Anthropic Claude (3.5 Sonnet) | AWS Bedrock | Approved | 200K tokens | 8,192 tokens | Apr 2024 | Fast (2–4s) |
| Amazon Bedrock | Anthropic Claude (3.5 Haiku) | AWS Bedrock | Approved | 200K tokens | 8,192 tokens | Jul 2024 | Very Fast (<1s) |
| Amazon Bedrock | Anthropic Claude (3.7 Sonnet) | AWS Bedrock | Approved | 200K tokens | 8,192 tokens | Oct 2024 | Fast (2–4s) |
| Amazon Bedrock | Anthropic Claude (Sonnet 4) | AWS Bedrock | Approved | 200K tokens | 8,192 tokens | Mar 2025 | Fast (2–4s) |
| Amazon Bedrock | Anthropic Claude (Sonnet 4.5) | AWS Bedrock | Approved | 200K tokens | 8,192 tokens | Jul 2024 | Fast (2–4s) |
| Amazon Bedrock | Anthropic Claude (Haiku 4.5) | AWS Bedrock | Approved | 200K tokens | 8,192 tokens | Feb 2025 | Very Fast (<1s) |
| Amazon Bedrock | Cohere Command (R/R+) | AWS Bedrock | Approved | 128K tokens | 4,096 tokens | Early 2024 | Fast (2–4s) |
| Amazon Bedrock | Cohere Embed (3) | AWS Bedrock | Approved | N/A | Embeddings | Not applicable | Very Fast (<1s) |
| Amazon Bedrock | Cohere Embed (4) | AWS Bedrock | Approved | N/A | Embeddings | Not applicable | Very Fast (<1s) |
| Amazon Bedrock | Cohere Rerank (3.5) | AWS Bedrock | Approved | N/A | Rankings | Not applicable | Very Fast (<1s) |
| Amazon Bedrock | Llama (2) | AWS Bedrock | Approved | 4K tokens | 2,048 tokens | Sep 2022 | Fast (2–4s) |
| Amazon Bedrock | Llama (3) | AWS Bedrock | Approved | 8K tokens | 4,096 tokens | Mar 2023 | Fast (2–4s) |
| Amazon Bedrock | Mistral (7B) | AWS Bedrock | Approved | 32K tokens | 4,096 tokens | Early 2023 | Very Fast (<2s) |
| Amazon Bedrock | Mistral (8x7B) | AWS Bedrock | Approved | 32K tokens | 4,096 tokens | Early 2024 | Very Fast (<2s) |
| GCP | Mistral OCR (25.05) | GCP | Pending | N/A | Text extraction | Not applicable | Fast (1–3s) |
| Amazon Bedrock | OpenAI GPT OSS (20B) | AWS Bedrock | Approved | 64K tokens | 4,096 tokens | Jun 2024 | Fast (2–4s) |
| Amazon Bedrock | OpenAI GPT OSS (120B) | AWS Bedrock | Approved | 64K tokens | 4,096 tokens | Jun 2024 | Slower (5–10s) |
| Amazon Bedrock | DeepSeek (R1) | AWS Bedrock | Approved | 64K tokens | 4,096 tokens | Mid 2024 | Moderate (3–5s) |
| GCP | Imagen (4) | GCP | Approved | N/A | Images | Not applicable | Fast (3–5s) |
| Black Forest Labs | Flux (.1 schnell) | GCP | Requested | N/A | Images | Not applicable | Moderate (4–7s) |
| Krea | Flux (.1 krea) | Direct API | Requested | N/A | Images | Not applicable | Fast (3–5s) |
| GCP | Veo (3, 3 Fast) | GCP | Requested | N/A | Video | Not applicable | Moderate (4–7s) |
| Azure | GPT (5) | Azure | Requested | N/A | Text | Not applicable | Slow (30–60s) |
| Self-hosted | Qwen3-coder (480B-A35B-Instruct) | Self-hosted / Ollama | Approved | 128K tokens | 8,192 tokens | Mid 2024 | Moderate (4–8s) |
| Self-hosted | Qwen3-coder (30B-A3B-Instruct) | Self-hosted / Ollama | Approved | 64K tokens | 4,096 tokens | Mid 2024 | Fast (2–4s) |
| Self-hosted | Codellama (7b, 13b, 34b, 70b) | Self-hosted / Ollama | Pending determination | 16K tokens | 4,096 tokens | Early 2023 | Fast (2–5s) |
| Self-hosted | Codegemma (2b, 7b) | Self-hosted / Ollama | Approved | 8K tokens | 2,048 tokens | Early 2024 | Very Fast (<2s) |
| Self-hosted | Codestral (22b) | Self-hosted / Ollama | Denied | 32K tokens | 4,096 tokens | Mid 2024 | Moderate (3–6s) |
| Self-hosted | DeepSeek-coder (v2) | Self-hosted / Ollama | Approved | 64K tokens | 4,096 tokens | Mid 2024 | Moderate (3–5s) |
| Self-hosted | Granite-code | Self-hosted / Ollama | Requested | 32K tokens | 4,096 tokens | TBD | TBD |
| Self-hosted | Llama4Scout | Self-hosted / Ollama | Requested | 32K tokens | 4,096 tokens | TBD | TBD |
| Self-hosted | Llama4Maverick | Self-hosted / Ollama | Requested | 32K tokens | 4,096 tokens | TBD | TBD |
| Self-hosted | Llama3-gradient | Self-hosted / Ollama | Requested | 16K tokens | 4,096 tokens | TBD | TBD |
| Self-hosted | Kimi-K2 | Self-hosted / Ollama | Requested | 200K tokens | 8,192 tokens | TBD | TBD |
| Self-hosted | GPT-OSS (20B) | Self-hosted / Ollama | Requested | 64K tokens | 4,096 tokens | TBD | TBD |
| Self-hosted | GPT-OSS (120B) | Self-hosted / Ollama | Requested | 64K tokens | 4,096 tokens | TBD | TBD |
| Azure | GPT-OSS-Safeguard (120B) | Azure | Requested | 128K tokens | 8,192 tokens | TBD | TBD |
| Azure | GPT-OSS-Safeguard (20B) | Azure | Requested | 64K tokens | 4,096 tokens | TBD | TBD |
| Self-hosted | BGE-BASE-EN (1.5) | Self-hosted | Approved | N/A | Embeddings | Not applicable | Very Fast (<1s) |
| Self-hosted | clip-ViT-B-32 (32) | Self-hosted | Denied | N/A | Embeddings | Not applicable | Fast (1–3s) |
| Amazon Bedrock | Anthropic Claude (Opus 4.5) | AWS Bedrock | Approved | 200K tokens | 16,384 tokens | Aug 2024 | Moderate (4–8s) |
| GCP | Gemini (3) | GCP | Approved | 1M tokens | 8,192 tokens | Feb 2025 | Moderate (3–6s) |
| GCP | NanoBanana (1) | GCP | Requested | 4K tokens | 1,024 tokens | TBD | Very Fast (<1s) |
| Self-hosted | HunyuanOCR | Self-hosted | Requested | 8K tokens | 2,048 tokens | TBD | Very Fast (<1s) |
| Self-hosted | SigLIP-base (16-384) | Self-hosted | Approved | 4K tokens | 1,024 tokens | TBD | Very Fast (<1s) |
| Self-hosted | all-MiniLM-L6-v2 (2) | Self-hosted | Approved | 8K tokens | 2,048 tokens | TBD | Very Fast (<1s) |
| Self-hosted | openai/whisper-base (20250625) | Self-hosted | Approved | 4K tokens | 1,024 tokens | TBD | Very Fast (<1s) |
| Self-hosted | Salesforce/blip-image (16-Dec-25) | Self-hosted | Approved | 8K tokens | 2,048 tokens | TBD | Very Fast (<1s) |
| Amazon Bedrock | Nova 2 Lite | AWS Bedrock | Approved | 4K tokens | 1,024 tokens | TBD | Very Fast (<1s) |
| Amazon Bedrock | Nova 2 Pro | AWS Bedrock | Approved | 8K tokens | 2,048 tokens | TBD | Very Fast (<1s) |
| Amazon Bedrock | Nova 2 Omni | AWS Bedrock | Approved | 4K tokens | 1,024 tokens | TBD | Very Fast (<1s) |
| Amazon Bedrock | Nova 2 Sonic | AWS Bedrock | Approved | 8K tokens | 2,048 tokens | TBD | Very Fast (<1s) |
| Amazon Bedrock | Nova Multimodal Embeddings | AWS Bedrock | Approved | 4K tokens | 1,024 tokens | TBD | Very Fast (<1s) |
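
Example Usage

Most of the approved text models above are reached through AWS Bedrock. A minimal sketch of a call through the Bedrock Converse API follows, assuming boto3 credentials are already configured; the region and the exact Bedrock model ID (or inference profile) are illustrative assumptions that depend on your account, not values taken from this catalog.

```python
import boto3

# Assumed region and model ID for the approved Claude 3.5 Sonnet entry;
# confirm the exact ID or inference profile enabled in your account.
REGION = "us-east-1"
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

client = boto3.client("bedrock-runtime", region_name=REGION)

response = client.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Summarize our data governance policy in one sentence."}]}],
    # Keep maxTokens at or below the model's output limit listed in the table.
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```

Self-hosted entries are served locally through Ollama, which listens on port 11434 by default. The sketch below assumes the approved Codegemma model has been pulled under the tag codegemma:7b; check `ollama list` for the tags actually available on your host.

```python
import requests

# Assumed local tag for the approved Codegemma (7b) entry; adjust to the tag
# your Ollama host actually serves.
payload = {
    "model": "codegemma:7b",
    "messages": [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}],
    "stream": False,
}

resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```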