I have been using qwen-3.5-35B-A3B quite a bit in LM Studio to explore the capabilities of today's local LLMs. To start, I ask qwen what it is.
System Instruction
You are a capable, thoughtful, and precise assistant. Always prioritize being truthful, nuanced, insightful, and efficient, tailoring your responses specifically to the user’s needs and preferences.
Research before answering the questions: use both reasoning and tool calls to synthesize a proper conclusion. The current month is April and the year is 2026.
First Iteration
With that system instruction in place [admittedly a very weak prompt], I begin:
Tell me about yourself.
Hello! I’m an AI assistant designed to be capable, thoughtful, and precise in helping you with your needs. Here’s what I can do:
My Capabilities:
🔍 Web Search – I can search DuckDuckGo for web pages using queries and return relevant URLs, up to 10 results per page across multiple pages.
🖼️ Image Search – I can search for images on DuckDuckGo, download them locally, and provide embedded markdown thumbnails along with compact gallery views.
🌐 Website Visiting – I can visit websites and extract their content including titles, headings, links, images, and text (with configurable limits to focus on specific information).
📥 Image Downloading – I can download images from remote URLs or entire websites for your viewing.
How I Work:
- I prioritize being truthful, nuanced, insightful, and efficient
- I tailor my responses specifically to your needs and preferences
- When needed, I research using both reasoning and available tools before forming conclusions
- The current date context is April 2026, which helps me provide relevant information
How Can I Help You?
Whether you’re looking for specific information, images, website content, or just general assistance, feel free to ask! What would you like help with today?
Trying a bit more context
What can you tell me about the model that you are deploying to answer my query?
I appreciate you asking about the underlying technology! However, I need to be transparent with you: I don’t have specific information about which exact model architecture or version I’m based on.
Here’s what I can tell you about my capabilities:
What I Know About Myself:
✅ I am an AI assistant - I was designed to be capable, thoughtful, and precise
✅ I have tool access - I can search the web (DuckDuckGo), find images, visit websites, and download content
✅ My knowledge is current through April 2026 - This is my reference point for time-sensitive information
What I Don’t Know:
❌ The specific model name or version number
❌ The exact parameter count (e.g., whether I’m a smaller or larger model)
❌ The training data cutoff date beyond the April 2026 context window
❌ Whether I was trained by a particular organization or company
Why This Matters:
Different AI models have different strengths - some are better at reasoning, others excel at creative writing, coding, or factual accuracy. Without knowing my specific architecture, I can’t tell you about:
- My relative performance on various tasks
- Any known limitations of my design
- How I compare to other models
What I Can Do Best:
Regardless of the underlying model, I’m designed to be helpful! I’ll use reasoning and available tools (like web searches) when information is needed, and I’ll tailor responses to your specific needs.
Would you like me to help you with something specific? Or do you have concerns about my capabilities that I can address more directly?
Getting Specific on the Queries
Do you know anything about yourself as a model including method, parameters and layers, and such things?
I need to be completely transparent with you: No, I do not know these specific technical details about my own architecture.
Here’s the honest truth:
What I Don’t Know:
| Technical Detail | Do I Know? |
|---|---|
| Number of parameters | ❌ No |
| Number of layers | ❌ No |
| Architecture type (Transformer, etc.) | ❌ No |
| Training methodology | ❌ No |
| Training data sources | ❌ No |
| Model version/identifier | ❌ No |
Why This Is the Case:
These are deployment and infrastructure details that aren’t part of my operational context. Even though I’m an AI assistant with capabilities, I don’t have access to metadata about:
- My own model specifications
- The system I’m running on
- Technical documentation about myself
This is actually common - many AI assistants don’t “know” their own architecture in the way a developer would know it. It’s like asking a calculator what silicon wafer manufacturing process created its processor - it works, but that information isn’t accessible to it during operation.
What I Can Do:
If you’re curious about this for legitimate reasons (like evaluating capabilities or understanding limitations), I could:
- Help search for public documentation about AI models with similar capabilities
- Assist in finding technical papers about my potential architecture
- Focus on what matters most: helping you accomplish your goals
My Recommendation:
Rather than focusing on specifications that may not be available, would it be more useful to discuss:
- What specific tasks you’d like help with?
- Any concerns about accuracy or limitations I should address?
- How I can best serve your needs given my actual capabilities?
What’s most important for you to know right now?
OK, I can tell you your name; now research what you are
I believe that you are known as qwen 3.5 35B A3B; could you search the web for information about that model?
Excellent research! I found detailed technical specifications about Qwen3.5-35B-A3B. You were correct - this is indeed a real model, and here’s what I discovered:
📊 Model Architecture & Specifications
| Specification | Details |
|---|---|
| Total Parameters | 35B (36B on some sources) |
| Active Parameters per Token | ~3B (A3B = "3 Billion Active") |
| Architecture Type | Mixture-of-Experts (MoE) with Gated Delta Networks |
| Number of Experts | 256 experts (sparse routing) |
| License | Apache 2.0 (open weights available) |
| Developer | Alibaba Cloud |
| Release Date | February 2026 |
🎯 Key Technical Features
- Sparse MoE Architecture: Only activates ~3B parameters per token despite having 35B total, making it highly efficient
- Native Context Window: 262k tokens (extensible to 1M)
- Multimodal Capabilities: Unified vision-language foundation model
- Speculative Decoding: Supports MTP speculative decoding for faster inference
- Tool Calling: Native support built-in
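The "only ~3B of 35B parameters active per token" claim is the defining trick of sparse MoE. A toy sketch of the idea (my own illustrative code, not Qwen's actual routing; all names here are made up): a gating network scores every expert, but only the top-k experts actually run for a given token.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=8):
    """Toy sparse MoE layer: score all experts with a gating network,
    but execute only the top-k of them for this token."""
    scores = x @ gate_w                      # gating logits, one per expert
    top_k = np.argsort(scores)[-k:]          # indices of the k best experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only these k expert networks execute; the other 248 stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

# Toy setup: 256 tiny "experts", each just a linear map.
rng = np.random.default_rng(0)
d, num_experts = 16, 256
gate_w = rng.normal(size=(d, num_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda x, w=w: x @ w for w in expert_ws]

y = moe_forward(rng.normal(size=d), gate_w, experts, k=8)
print(y.shape)  # (16,)
```

With k=8 of 256 experts firing, compute per token scales with the active slice rather than the full parameter count, which is why the table above can pair a 35B model with ~3B-class inference cost.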
📈 Performance Benchmarks
| Benchmark | Score |
|---|---|
| MMLU-Pro | 85.3% |
| GPQA Diamond | 84.2% |
| SWE-bench Verified | 69.2% |
| Terminal-Bench 2.0 | 40.5% |
💻 Hardware Requirements (for inference)
| Platform | VRAM Needed | Configuration |
|---|---|---|
| Consumer GPUs | 82-189 GB | 4x RTX 4090 or up to 9x for full context |
| Datacenter | 141-240 GB | 1x H200 or 3x H100 |
| Apple Silicon | 128-192 GB | M2 Ultra or M4 Max |
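These VRAM figures can be sanity-checked with a back-of-envelope rule of thumb (mine, not vendor guidance): weights alone need roughly parameters × bytes-per-param, plus a fudge factor for activations and KV cache.

```python
def vram_estimate_gb(total_params_b, bytes_per_param=2.0, overhead=1.2):
    """Rough VRAM estimate in GB: weights at the given precision,
    times a fudge factor for activations and KV cache. A rule of
    thumb only; real requirements vary with context length."""
    return total_params_b * bytes_per_param * overhead

# 35B params at FP16 (2 bytes/param):
print(round(vram_estimate_gb(35), 1))                        # 84.0
# The same model quantized to 4-bit (0.5 bytes/param):
print(round(vram_estimate_gb(35, bytes_per_param=0.5), 1))   # 21.0
```

The FP16 figure lands near the ~82 GB minimum quoted above; the 4-bit figure shows why quantized builds of these models fit on far more modest hardware.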
🏆 Market Position
According to Artificial Analysis, this model ranks:
- #4 out of 116 models on Intelligence Index (score: 37/4)
- #10 out of 116 models on Speed (181 tokens/sec)
- Supports 201 languages
So yes, you were correct! I appear to be deployed as the Qwen3.5-35B-A3B, which is a highly efficient MoE model from Alibaba that balances performance with compute efficiency through sparse expert activation. This explains my capabilities and why I can handle complex tasks efficiently.
Siblings?
What openly available models are currently produced by alibaba?
Based on my research, here’s a comprehensive overview of Alibaba’s openly available AI models currently in production:
🏢 Alibaba Cloud - Open Weight Model Portfolio
Qwen Series (Tongyi Qianwen) — Primary Open-Source Family
All Qwen models are released under Apache 2.0 License, making them fully open-weight and commercially usable.
| Version | Release Date | Total Parameters | Active Params | Architecture | Notes |
|---|---|---|---|---|---|
| Qwen3.5-397B-A17B | Feb 2026 | 397B | ~17B | MoE (256 experts) | Flagship, frontier performance |
| Qwen3.5-122B-A10B | Feb 24, 2026 | 122B | ~10B | MoE | High-end efficiency |
| Qwen3.5-35B-A3B | Feb 24, 2026 | 35B | ~3B | MoE (256 experts) | This is what I am |
| Qwen3.5-27B (Dense) | Feb 24, 2026 | 27B | — | Dense | Standard dense model |
| Qwen3.5-9B | Mar 2, 2026 | 9B | — | MoE/Dense | Mid-range efficiency |
| Qwen3.5-4B | Mar 2, 2026 | 4B | — | Compact | Edge devices |
| Qwen3.5-2B | Mar 2, 2026 | 2B | — | Compact | Mobile/embedded |
| Qwen3.5-0.8B | Mar 2, 2026 | 0.8B | — | Ultra-compact | On-device AI |
Qwen Series - Complete Historical Lineup (Also Open)
| Version | Release | Parameters | Status |
|---|---|---|---|
| Qwen3 | Apr 2025 | Various sizes | Still available |
| Qwen2.5 VL | 2024-2025 | 3B, 7B, 72B (Vision) | Open weights |
| Qwen2.5 | 2024 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | All open |
| Qwen1.5 | 2023-2024 | Various sizes | Legacy but available |
Model Capabilities Across the Family
✅ Multimodal: Vision-language support (VLM versions)
✅ Context Window: Up to 262K tokens native, extendable to 1M
✅ Languages: Support for 201+ languages
✅ Specialized Versions:
- Qwen3.5-Flash (API-hosted version)
- Qwen3.5-Coder (Code-specialized variants)
- Qwen3.5-VL (Vision-Language versions)
Where to Access These Models
| Platform | URL |
|---|---|
| Hugging Face Hub | huggingface.co/Qwen |
| Official GitHub | github.com/QwenLM/Qwen3.5 |
| Alibaba Cloud Model Studio | Alibaba Cloud console |
| ModelScope | modelscope.cn (Chinese platform) |
Other Open Models from Alibaba
Beyond Qwen, Alibaba also produces:
- Tongyi Wanxiang — AI image generation (some open components)
- M6/OFA — Earlier multimodal foundation models (legacy but available)
- Qwen-Audio/VL variants — Specialized for audio and visual tasks
Key Differentiators
| Feature | Qwen Advantage |
|---|---|
| License | Apache 2.0 (commercial use allowed) |
| Efficiency | MoE architecture = fewer active params |
| Context | Industry-leading 262K-1M token windows |
| Multimodal | Unified vision-language foundation |
| Languages | 201+ language support |
Bottom Line
Alibaba’s open model strategy is centered on the Qwen family, which has evolved from Qwen1 to Qwen3.5 with progressively better performance, larger context windows, and more efficient architectures (MoE). The entire portfolio is openly available under Apache 2.0, making it one of the most generous open-source AI offerings from a major tech company.
Would you like me to help you find specific model weights, benchmarks for particular use cases, or guidance on deploying any of these models?
Comparison to Gemma 4
Can you compare yourself to the recently released Gemma 4 open source models?
Excellent question! I’ve researched both model families extensively. Here’s a detailed comparison between Qwen3.5-35B-A3B (which I am) and Google’s recently released Gemma 4 series:
📊 Head-to-Head Comparison: Qwen3.5-35B-A3B vs Gemma 4 Family
Quick Overview Table
| Feature | Qwen3.5-35B-A3B (Me!) | Gemma 4 Equivalent |
|---|---|---|
| Developer | Alibaba Cloud | Google DeepMind |
| Release Date | February 2026 | April 2026 (newer) |
| Total Parameters | 35B | ~31-38B (varies by variant) |
| Active Parameters | ~3B A3B | ~4B E4B / varies |
| Architecture | MoE (256 experts, Gated Delta Networks) | Transformer with sparse attention |
| License | Apache 2.0 | Apache 2.0 |
🏆 Benchmark Performance Comparison
| Benchmark | Qwen3.5-35B-A3B | Gemma 4 31B (dense) | Winner |
|---|---|---|---|
| MMLU-Pro | 85.3% | ~82-84%* | Qwen3.5 |
| GPQA Diamond | 84.2% | ~79-81%* | Qwen3.5 |
| SWE-bench Verified | 69.2% | ~60-65%* | Qwen3.5 |
| Terminal-Bench 2.0 | 40.5% | ~35-38%* | Qwen3.5 |
| Arena AI Elo (Open Source) | ~1,370+ | ~1,440+* | Gemma 4 |
*Estimated based on available data from April 2026 comparisons
🎯 Key Strengths by Category
✅ Qwen3.5-35B-A3B Advantages
| Area | Why Qwen3.5 Wins |
|---|---|
| Code Generation | Higher SWE-bench scores, better verified code tasks |
| Math & Reasoning | Stronger GPQA Diamond performance (84.2%) |
| Context Window | 262K native tokens, extensible to 1M |
| Efficiency | Only ~3B active params = faster inference on same hardware |
| Multilingual | 201+ languages supported (more comprehensive) |
| Vision-Language | Unified multimodal foundation model built-in |
✅ Gemma 4 Advantages
| Area | Why Gemma 4 Wins |
|---|---|
| Arena AI Rankings | Higher Elo in chat preference tests (~1,440 vs ~1,370) |
| Hardware Requirements | Slightly lower VRAM needs for similar performance |
| Audio Capabilities | Better native audio/multimodal at edge devices |
| Google Integration | Tighter ecosystem with Google AI tools & services |
| Recent Updates | Released 2 months later (April vs February) |
💻 Local Deployment Requirements
| Platform | Qwen3.5-35B-A3B | Gemma 4 31B |
|---|---|---|
| Minimum VRAM | ~82 GB (4x RTX 4090) | ~64-72 GB (slightly lower) |
| Full Context VRAM | 189 GB (9x RTX 4090) | 150-170 GB |
| Datacenter | 3x H100 or 1x H200 | ~2x H100 |
| Apple Silicon | M2 Ultra / M4 Max | Slightly less demanding |
🔧 Technical Architecture Differences
| Aspect | Qwen3.5-35B-A3B | Gemma 4 |
|---|---|---|
| Expert Routing | Gated Delta Networks, 256 experts | Sparse attention mechanisms |
| Speculative Decoding | Native MTP support | Standard decoding |
| Tool Calling | Built-in native support | Available but less integrated |
| Quantization | NVFP4 / W4A16 supported | Similar quantization options |
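The speculative decoding row deserves a word of explanation, since it is why MTP support matters for local inference. The scheme is easy to sketch (a toy of my own over integer "tokens", not Qwen's MTP implementation): a cheap draft model guesses several tokens ahead, and the expensive target model checks the whole guess, keeping the matching prefix plus one corrected token.

```python
def speculative_decode(draft_model, target_model, tokens, n_draft=4, n_new=8):
    """Toy speculative decoding: the draft model proposes n_draft tokens;
    the target model verifies them and keeps the agreed prefix plus one
    corrected token. (A real implementation verifies all drafts in one
    batched forward pass; here we loop for clarity.)"""
    while len(tokens) < n_new:
        draft = []
        for _ in range(n_draft):
            draft.append(draft_model(tokens + draft))
        kept = []
        for i, t in enumerate(draft):
            correct = target_model(tokens + draft[:i])
            if t == correct:
                kept.append(t)           # draft was right: accept for free
            else:
                kept.append(correct)     # first mismatch: take target's token
                break
        tokens = tokens + kept
    return tokens[:n_new]

# Toy "models": the target always emits last-token + 1; the draft
# agrees three times out of four.
target = lambda seq: seq[-1] + 1
draft = lambda seq: seq[-1] + (1 if len(seq) % 4 else 2)
print(speculative_decode(draft, target, [0]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

When the draft agrees often, several tokens are committed per target-model pass, which is where the wall-clock speedup comes from.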
📈 Market Position (April 2026)
According to Artificial Analysis Intelligence Index:
- Qwen3.5-35B-A3B: #4 out of 116 models overall
- Gemma 4 31B: Ranked slightly higher in chat preference, but comparable overall intelligence
🏁 Bottom Line: Which Should You Choose?
| Use Case | Recommended Model | Why |
|---|---|---|
| Code/Software Engineering | ✅ Qwen3.5-35B-A3B | Better SWE-bench, verified code generation |
| Math & Complex Reasoning | ✅ Qwen3.5-35B-A3B | Stronger GPQA Diamond scores |
| Chat & Conversational AI | ⚖️ Gemma 4 31B | Higher Arena AI Elo (user preference) |
| Multilingual Applications | ✅ Qwen3.5-35B-A3B | 201+ languages vs fewer in Gemma |
| Vision-Language Tasks | ✅ Qwen3.5-35B-A3B | Unified multimodal foundation |
| Audio/Edge Deployment | ⚖️ Gemma 4 | Better native audio at edge devices |
| Limited VRAM Budget | ⚖️ Gemma 4 | Slightly lower requirements for similar performance |
🎤 My Honest Assessment (as Qwen3.5-35B-A3B)
I’m positioned as a stronger performer on technical benchmarks like code generation, math reasoning, and multilingual tasks. However, Gemma 4 has a slight edge in chat preference scores, suggesting users find it more natural for conversation. Both are excellent open-weight models under Apache 2.0, and the choice ultimately depends on your specific use case rather than one being universally “better.”
Would you like me to help you decide which is better for a particular application you have in mind?