Silvretta Research is dedicated to advancing conditional computation for foundation models. Grounded in rigorous experimental and theoretical methods, our research focuses on dynamic Mixtures of Experts (MoEs) that adapt computational budget and inference precision to input complexity. Central to our vision is a neural architecture with the following properties:
- Adaptive compute scaling – resource usage during both inference and training adjusts dynamically to input difficulty: challenging instances activate deeper or wider computational paths, while simpler inputs consume fewer resources, improving overall efficiency.
- Progressive solution refinement – outputs are generated as a cascade: a rapid, coarse approximation followed by increasingly fine-grained, more expensive refinements. This multi-stage inference enables anytime trade-offs between speed and precision, which is crucial for latency-sensitive domains such as robotics (a minimal sketch of the idea follows this list).
- Specialization-driven accelerated training – experts specialize through iterative sampling and path-focused learning, reducing the number of training epochs by 10–40% without relying on explicit load-balancing or gating regularization.
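
To make the first two properties concrete, here is a minimal, self-contained sketch of input-adaptive, anytime inference in a stacked MoE. It is an illustration of the general idea only, not the published architecture: all names (`RefinementStage`, `AnytimeMoE`, `halt_threshold`) and the halting rule are hypothetical.

```python
# Minimal sketch of input-adaptive, anytime inference in a stacked MoE.
# Illustrative only -- not the published architecture. All names
# (RefinementStage, AnytimeMoE, halt_threshold) are hypothetical.
import torch
import torch.nn as nn


class RefinementStage(nn.Module):
    """One refinement cycle: a small pool of experts plus a router."""

    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)  # scores experts per input
        self.halt = nn.Linear(dim, 1)              # per-input stop signal

    def forward(self, h):
        # Sample one expert per input from the router distribution;
        # sampling (rather than a penalized top-k) is what lets experts
        # specialize without an explicit load-balancing loss.
        probs = self.router(h).softmax(dim=-1)
        idx = torch.multinomial(probs, num_samples=1).squeeze(-1)
        out = torch.stack([self.experts[int(i)](x) for i, x in zip(idx, h)])
        halt_p = torch.sigmoid(self.halt(out)).squeeze(-1)
        return h + out, halt_p  # residual update + halting probability


class AnytimeMoE(nn.Module):
    """Stack of stages; easy inputs exit early, hard inputs go deeper."""

    def __init__(self, dim: int, num_stages: int = 6, halt_threshold: float = 0.9):
        super().__init__()
        self.stages = nn.ModuleList(RefinementStage(dim) for _ in range(num_stages))
        self.head = nn.Linear(dim, dim)
        self.halt_threshold = halt_threshold

    @torch.no_grad()  # inference-only sketch
    def forward(self, h):
        predictions = []  # coarse-to-fine outputs, one per executed stage
        for stage in self.stages:
            h, halt_p = stage(h)
            predictions.append(self.head(h))
            if bool((halt_p > self.halt_threshold).all()):
                break  # the whole batch is confident enough: stop early
        return predictions


if __name__ == "__main__":
    model = AnytimeMoE(dim=32)
    outs = model(torch.randn(8, 32))
    print(f"refinement stages executed: {len(outs)}")
```

Because each executed stage emits a usable prediction, a caller can stop after any stage; the halting head merely automates that decision per input.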
~ 0 ~
Publications
- Mixture of Raytraced Experts (2025) introduces a stacked MoE architecture that selects expert sequences dynamically, growing both depth and width in response to input complexity and yielding higher accuracy with each refinement cycle. Notably, the model matches or exceeds accuracy on standard benchmarks while requiring 10–40% fewer training epochs, all without explicit load-balancing mechanisms [arXiv].
- Ray-Tracing for Conditionally Activated Neural Networks (2025) outlines the RayTracing framework: a hierarchical, sampling-guided MoE architecture whose active inference paths vary by input. It shows that the number of active parameters correlates with input difficulty, with no auxiliary penalty objectives; a toy sketch of such a sampled path appears below [arXiv].
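
As a rough intuition for the sampled-path mechanism these papers describe, the sketch below traces one "ray" through a grid of experts: at each layer a router either halts or samples the next expert, so path depth and the set of active parameters vary per input. This is a toy under stated assumptions, not the released code; `RayRouter` and `trace_ray` are invented names.

```python
# Toy sketch of a sampling-guided expert path (a "ray"): at each layer
# the router either halts or samples the next expert, so path depth and
# the set of active parameters vary per input. Not the released code;
# RayRouter and trace_ray are invented names.
import torch
import torch.nn as nn


class RayRouter(nn.Module):
    """Chooses 'halt' or one of the experts in the next layer."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.logits = nn.Linear(dim, num_experts + 1)  # slot 0 = halt

    def forward(self, h):
        probs = self.logits(h).softmax(dim=-1)
        choice = torch.multinomial(probs, num_samples=1).item()
        return None if choice == 0 else choice - 1     # expert index or stop


def trace_ray(layers, routers, x):
    """Follow one sampled expert path; returns the output and the path."""
    h, path = x, []
    for layer, router in zip(layers, routers):
        expert_idx = router(h)
        if expert_idx is None:        # input judged easy enough: stop here
            break
        h = h + layer[expert_idx](h)  # only the sampled expert runs
        path.append(expert_idx)
    return h, path


if __name__ == "__main__":
    dim, width, depth = 16, 4, 5
    layers = [nn.ModuleList(nn.Linear(dim, dim) for _ in range(width))
              for _ in range(depth)]
    routers = [RayRouter(dim, width) for _ in range(depth)]
    y, path = trace_ray(layers, routers, torch.randn(dim))
    print("experts visited:", path)   # path length tracks input difficulty
```

In training, backpropagating only through the experts on the sampled path concentrates gradient signal on those experts, which is one plausible way specialization can emerge without a load-balancing objective.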
~ 0 ~
Research Objectives
Silvretta Research is committed to exploring and developing neural systems that fundamentally shift how we allocate computational resources:
- Compute-dynamic inference – researching and designing systems whose resource use scales with problem difficulty.
- Anytime architecture – models that deliver fast approximate predictions with optional iterative refinement (see the usage sketch after this list).
- Training acceleration – leveraging expert-level specialization to reduce training time without auxiliary losses or other supervisory overhead.
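
As a usage-level illustration of the anytime objective, the following hypothetical helper consumes progressively refined predictions under a wall-clock budget. It reuses the `AnytimeMoE` sketch above; `predict_anytime` and `budget_s` are illustrative names, not an existing API.

```python
# Hypothetical anytime-consumption helper: return the most refined
# prediction available within a wall-clock budget. Reuses the AnytimeMoE
# sketch above; predict_anytime and budget_s are illustrative names.
import time

import torch


def predict_anytime(model, x, budget_s: float):
    """Run refinement stages until the latency budget expires."""
    deadline = time.monotonic() + budget_s
    h, best = x, None
    with torch.no_grad():
        for stage in model.stages:
            h, _ = stage(h)
            best = model.head(h)             # latest, most refined answer
            if time.monotonic() >= deadline:
                break                        # out of time: use what we have
    return best
```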
Our ultimate aim is a new generation of foundation models that are not only more efficient and scalable but also adaptable to the real-time demands of complex systems such as autonomous agents and robotics.
~ 0 ~
Resources
- Code for MoEs with adaptive width & depth
- US Patent Number US-12321860-B1
- US Patent Number US-12175355-B1
- US Patent Number US-11823027-B1
- [PDF] Mixture of Raytraced Experts
- [PDF] Ray-Tracing for Conditionally Activated Neural Networks