caca-AI


📋 Description

Caca is a Large Language Model (LLM) architecture that combines a range of state-of-the-art deep learning techniques. The model is designed with a focus on efficiency, scalability, and high performance.

Caca is an open-source Indonesian LLM experiment, built from scratch by one person, step by step. It is not trying to compete with anyone; it is simply an exploration of what can be done with a limited budget, unlimited passion, and a collaborative mindset. If it turns out to be useful for others, alhamdulillah. If not, it is still fun.

This is an exploratory project: if it fails, that is part of the learning process. If it succeeds, that is a bonus.

📊 Comparison with Other Architectures

| Feature | Caca | LLaMA 2 | Mistral | IndoGPT | GPT-2 |
|---|---|---|---|---|---|
| **🏗️ Basic Architecture** | | | | | |
| Status | ⚠️ Untrained | ✅ Trained | ✅ Trained | ✅ Trained | ✅ Trained |
| Model Sizes | 60+ variants<br>1M - 1T (hopefully) | 7B / 13B / 70B | 7B | 117M | 117M - 1.5B |
| Architecture Type | Decoder-only | Decoder-only | Decoder-only | Decoder-only | Decoder-only |
| Activation Function | SwiGLU | SwiGLU | SwiGLU | GELU | GELU |
| Normalization | RMSNorm | RMSNorm | RMSNorm | LayerNorm | LayerNorm |
| Release Year | 2025 | 2023 | 2023 | 2020 | 2019 |
| **👁️ Attention Mechanism** | | | | | |
| Attention Type | GQA (configurable) | GQA | GQA | MHA | MHA |
| Position Encoding | RoPE + variants | RoPE | RoPE | Learned | Learned |
| Max Context | 8K - 16K | 4K | 32K | 1K | 1K |
| Sliding Window | ✅ Optional | ❌ | ✅ 4K window | ❌ | ❌ |
| Flash Attention | ✅ Flash Attn 2 | ✅ Supported | ✅ Supported | ❌ | ❌ |
| KV Cache Efficiency | 75% reduction<br>(GQA 4:1) | ~60% reduction | 75% reduction | No optimization | No optimization |
| **🚀 Advanced Features** | | | | | |
| Mixture of Experts | ✅ Optional<br>TopK + ExpertChoice | ❌ | ⚠️ (Mixtral variant) | ❌ | ❌ |
| Multimodal | ✅ Native<br>Vision + Audio | ❌ (LLaVA separate) | ❌ | ❌ | ❌ |
| Config Flexibility | ✅ 50+ parameters<br>Toggle every feature | ⚠️ Limited | ⚠️ Limited | ❌ Fixed | ❌ Fixed |
| Layer Scale | ✅ Optional | ❌ | ❌ | ❌ | ❌ |
| Stochastic Depth | ✅ Optional | ❌ | ❌ | ❌ | ❌ |
| **⚡ Performance & Optimization** | | | | | |
| Inference Speed<br>(7B model, A100) | ⚠️ TBD<br>(not yet trained) | ~75 tok/s | ~78 tok/s | ~150 tok/s<br>(far smaller model) | ~120 tok/s<br>(far smaller model) |
| Memory Footprint<br>(7B, BF16) | ~14GB<br>(with GQA) | ~14GB | ~14GB | ~500MB | ~500MB |
| Gradient Checkpointing | ✅ Full support | ✅ Supported | ✅ Supported | ⚠️ Manual | ⚠️ Manual |
| Quantization | ✅ 8-bit/4-bit built-in | ⚠️ Via external tools | ⚠️ Via external tools | ❌ Limited support | ❌ Limited support |
| Multi-Backend Support | ✅ 4 backends<br>Flash/xFormers/SDPA/Standard | ⚠️ 2 backends | ⚠️ 2 backends | ❌ Standard only | ❌ Standard only |
| **🌏 Language Support** | | | | | |
| Indonesian | ⚠️ Not yet trained<br>Designed for ID | ❌ Poor<br>English-heavy | ❌ Poor<br>English-heavy | ✅ Native | ❌ Minimal |
| English | ⚠️ TBD<br>Bilingual design | ✅ Excellent | ✅ Excellent | ⚠️ Limited | ✅ Good |
| Training Data | ⚠️ To be trained<br>User's choice | 2T tokens<br>English-heavy | Unknown<br>English-heavy | 23GB<br>Indonesian | 40GB<br>WebText |
| Vocab Size | 32K<br>(configurable) | 32K | 32K | 50K | 50K |
| **👨‍💻 Developer Experience** | | | | | |
| Error Messages | ✅ Helpful + solutions<br>Detailed debugging | ⚠️ Standard PyTorch | ⚠️ Standard PyTorch | ❌ Basic errors | ❌ Basic errors |
| Config Validation | ✅ Comprehensive<br>Auto-checks conflicts | ⚠️ Basic | ⚠️ Basic | ❌ Minimal | ❌ Minimal |
| Documentation | ✅ Extensive<br>ID + EN, with examples | ✅ Good<br>Official docs | ⚠️ Medium<br>Community-driven | ❌ Limited<br>Minimal docs | ✅ Extensive<br>OpenAI docs |
| Code Examples | ✅ 50+ examples<br>Training to deployment | ✅ Many examples | ⚠️ Some examples | ❌ Few examples | ✅ Many examples |
| HuggingFace Integration | ✅ Full native<br>Auto-registered | ✅ Official | ✅ Official | ✅ Available | ✅ Standard |
| **🌍 Availability & License** | | | | | |
| License | ✅ Apache 2.0<br>Fully permissive | ⚠️ LLaMA 2 License<br>Commercial OK | ✅ Apache 2.0 | ✅ MIT | ✅ MIT |
| Commercial Use | ✅ Allowed<br>No restrictions | ✅ Allowed | ✅ Allowed | ✅ Allowed | ✅ Allowed |
| Weights Available | ❌ Not trained<br>Architecture only | ✅ All sizes<br>7B/13B/70B | ✅ 7B | ✅ 117M | ✅ All sizes |
| Self-Hosting | ✅ Designed for it<br>Full control | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Training Required | ❌ Yes<br>From scratch | ✅ No<br>Ready to use | ✅ No<br>Ready to use | ✅ No<br>Ready to use | ✅ No<br>Ready to use |
| **🎯 Use Cases** | | | | | |
| Production Ready | ❌ Not yet<br>After training | ✅ Yes | ✅ Yes | ⚠️ Limited<br>Too small | ⚠️ Limited<br>Outdated |
| Research | ✅ Excellent<br>Modular design | ✅ Good | ✅ Good | ⚠️ Limited | ✅ Classic baseline |
| Indonesian NLP | ⚠️ After training<br>High potential | ❌ Poor<br>Needs fine-tuning | ❌ Poor<br>Needs fine-tuning | ✅ Native<br>But limited | ❌ Poor |
| Education | ✅ Excellent<br>Learn modern LLMs | ✅ Good | ⚠️ Medium | ✅ Good<br>Simple architecture | ✅ Classic<br>Well-documented |
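
To make the "75% reduction (GQA 4:1)" and "~14GB (7B, BF16)" rows concrete, here is a back-of-the-envelope sketch. The layer count, head count, and head dimension are typical 7B-class values assumed for illustration; they are not taken from Caca's actual configuration:

```python
# Back-of-the-envelope KV-cache math for the "75% reduction (GQA 4:1)" row.
# Shapes below are typical 7B-class values, assumed for illustration only.
n_layers, n_heads, head_dim, bf16_bytes = 32, 32, 128, 2

# Per token, the cache stores K and V for every layer.
mha_kv = 2 * n_layers * n_heads * head_dim * bf16_bytes          # all 32 heads keep K/V
gqa_kv = 2 * n_layers * (n_heads // 4) * head_dim * bf16_bytes   # 4:1 grouping -> 8 KV heads

print(f"MHA  : {mha_kv / 1024:.0f} KiB per token")               # 512 KiB
print(f"GQA  : {gqa_kv / 1024:.0f} KiB per token")               # 128 KiB
print(f"Saved: {1 - gqa_kv / mha_kv:.0%}")                       # 75%, matching the table

# The ~14GB BF16 footprint is simply parameter count times 2 bytes:
print(f"7B params in BF16: ~{7e9 * 2 / 1e9:.0f} GB")
```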

📝 Important Notes:

  • Caca is a modern architecture that has not been trained yet; it needs training from scratch on an Indonesian dataset
  • LLaMA 2 & Mistral are very good for English, but poor for Indonesian without fine-tuning
  • IndoGPT is the only dedicated Indonesian LLM, but its architecture is outdated (GPT-2 era)
  • GPT-2 is included as a classic baseline: a proven but no longer modern architecture

✨ Caca's Unique Strengths:

  • 🎯 Modular Design: toggle 50+ features without rewriting code (see the config sketch after this list)
  • 🔧 Developer-Friendly: helpful error messages + config validation
  • 🚀 Modern Architecture: GQA + Flash Attention + SwiGLU + RMSNorm
  • 🎨 Native Multimodal: Vision & Audio built in (not an add-on)
  • 📚 Extensive Docs: Indonesian + English, with many examples
  • ⚡ Optimization Focus: 4 attention backends, auto-fallback, quantization-ready (see the backend sketch below)
  • 🔬 Research-Oriented: MoE, Mixture of Depths, Layer Scale, etc.
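
A minimal sketch of what "toggle features via config" can look like, assuming a Hugging Face-style config class. Every field name below is hypothetical; check the actual CacaConfig for the real parameter names:

```python
from transformers import PretrainedConfig

class CacaConfig(PretrainedConfig):
    """Illustrative subset of a feature-toggle config; field names are hypothetical."""
    model_type = "caca"

    def __init__(
        self,
        num_key_value_heads=8,          # GQA grouping (e.g. 32 query heads -> 4:1)
        attention_backend="sdpa",       # "flash" / "xformers" / "sdpa" / "standard"
        use_moe=False,                  # Mixture of Experts on/off
        num_experts=8,
        use_layer_scale=False,
        stochastic_depth_prob=0.0,
        **kwargs,
    ):
        self.num_key_value_heads = num_key_value_heads
        self.attention_backend = attention_backend
        self.use_moe = use_moe
        self.num_experts = num_experts
        self.use_layer_scale = use_layer_scale
        self.stochastic_depth_prob = stochastic_depth_prob
        super().__init__(**kwargs)

# Turning a feature on is a constructor argument, not a code rewrite:
config = CacaConfig(use_moe=True, attention_backend="flash")
```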
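
And a sketch of the auto-fallback idea behind the four attention backends: use the fastest kernel available, otherwise fall back to PyTorch's built-in SDPA. This shows the general pattern, not Caca's actual dispatch code:

```python
import torch
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func  # preferred backend, if installed
    BACKEND = "flash"
except ImportError:
    BACKEND = "sdpa"  # torch's SDPA picks a flash/mem-efficient/math kernel itself

def attention(q, k, v):
    # (batch, n_heads, seq_len, head_dim) layout, causal masking as in a decoder
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

q = k = v = torch.randn(1, 8, 16, 64)
print(BACKEND, attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```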

⚠️ Realistic Limitations:

  • Not yet trained: output will be random until the model is trained
  • No tokenizer yet: an Indonesian tokenizer has to be trained from scratch (see the sketch after this list)
  • Heavy resource requirements: training a 7B model needs A100-class GPUs
  • Unproven: extensive evaluation will be needed after training
  • Small community: nowhere near the LLaMA/Mistral ecosystem yet
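
Since a tokenizer still has to be trained, here is a minimal sketch using the Hugging Face `tokenizers` library. The corpus path and special tokens are placeholders; the 32K vocab size matches the comparison table above:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Byte-level BPE, the same family of tokenizer used by most modern decoder LLMs.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                               # matches the 32K in the table
    special_tokens=["[UNK]", "[BOS]", "[EOS]", "[PAD]"],
)

# "id_corpus.txt" is a placeholder for your own Indonesian text corpus.
tokenizer.train(files=["id_corpus.txt"], trainer=trainer)
tokenizer.save("caca-tokenizer.json")
```
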
Daily Quote

🔗 Links