Tyr
Autonomous Driving Made Easy
3.2 CUDA-free Petaflops
High-Performance computing for the vehicle
Handling the Power Gap
Most discussions shy away from examining, or even mentioning, effective compute power: the percentage of the nominal gross compute power that can actually be used at any given instant. It is algorithm dependent. For example, if a one-PetaFlops processor can deliver only 20% of its nominal power at a given instant, its effective compute power is 200 TFlops.
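The arithmetic in this example can be written out as a short Python sketch. The function name `effective_tflops` and its `utilization` parameter are illustrative names chosen here, not part of any VSORA tooling.

```python
def effective_tflops(nominal_tflops: float, utilization: float) -> float:
    """Return usable compute: nominal compute times a utilization fraction in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return nominal_tflops * utilization


# The example from the text: a 1 PetaFlops (1,000 TFlops) processor
# that can deliver only 20% of its nominal power.
example = effective_tflops(1000.0, 0.20)
print(f"{example:.0f} TFlops effective")
```

Because utilization is algorithm dependent, the same nominal figure can yield very different effective figures from one workload to the next.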
Effective compute power becomes critical in L4/L5 systems, as the total processing power requirement is massive. It impacts the system cost in multiple ways – power consumption, silicon cost, PCB area, cooling provisions, etc. In addition, the effective compute power must be delivered within a very restrictive timing budget, typically less than 25 ms.
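To make the timing constraint concrete, the sketch below estimates how much floating-point work fits inside one timing window. It assumes the 200 TFlops effective figure from the earlier example and the 25 ms budget quoted above. The function name `flop_budget_per_window` is hypothetical, and real budgets also depend on memory traffic and scheduling, which this sketch ignores.

```python
def flop_budget_per_window(effective_tflops: float, budget_ms: float) -> float:
    """Return the work, in TFlop, that fits in one timing window.

    effective_tflops: sustained usable compute in TFlops (not the nominal rating).
    budget_ms: length of the timing window in milliseconds.
    """
    if effective_tflops < 0 or budget_ms < 0:
        raise ValueError("inputs must be non-negative")
    return effective_tflops * (budget_ms / 1000.0)


# 200 TFlops effective, sustained for a 25 ms window.
budget = flop_budget_per_window(200.0, 25.0)
print(f"{budget:.1f} TFlop available per 25 ms window")
```

An algorithm that needs more operations per window than this budget requires either a higher effective compute power or a longer window, which is why the effective figure, not the nominal one, drives the sizing.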
The Tyr family of chips is designed around the VSORA scalable, multi-core architecture and integrates ISO26262 / ASIL-B/D functional safety elements such as lockstep. The result is a solution that integrates as readily into existing systems as into completely new, radical designs. Tyr is fully re-programmable, which makes over-the-air updates simple.
The family is algorithm and host-processor agnostic and can handle virtually any algorithm with near-theoretical usable efficiency. New algorithms such as Transformers and BEVFormer are fully supported. Everything is developed in a high-level language, which makes development easy and fast and also guarantees that anything that can be expressed in this high-level language will run on any chip in the family.
If you develop your code for the Tyr4 and find that it is more powerful than the application needs, that is not an issue. Run the same code on the Tyr1 or Tyr2 and save both power and money!
VSORA Solution Features
MIXED MODE & SPARSITY
- Handles AI instructions & GP Instructions
- fp8/fp16/fp32/int8/int16/int32
- Quantization conversion on the fly
- Layerwise quantization possible
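As a generic illustration of what layerwise quantization means, the sketch below applies symmetric int8 quantization with one scale factor per layer. This is a common textbook scheme, shown here as an assumption. It is not a description of how Tyr performs its on-the-fly conversion, and the function names and sample weights are invented for the example.

```python
def quantize_layer_int8(weights):
    """Symmetric int8 quantization with a single scale for the whole layer."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero layer.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_layer(quantized, scale):
    """Recover approximate floating-point values from int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical layers with very different value ranges: each gets its own scale.
layers = {
    "layer_a": [0.5, -1.0, 0.25],
    "layer_b": [10.0, -20.0, 5.0],
}
for name, weights in layers.items():
    codes, scale = quantize_layer_int8(weights)
    print(name, codes, f"scale={scale:.6f}")
```

Choosing the scale per layer keeps the rounding error proportional to each layer's own range, rather than to the widest range in the network.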
MULTIFUNCTIONAL CORES
- Supports non-regular sparsity (random)
- Data sparsity handled on the fly
- Activation functions handled on the fly
- IEEE754 floating point: fp8 / fp16 / fp32
FULLY PROGRAMMABLE
- Supports new generation of algorithms
- Supports SDV
- Programmable activation functions
- Supports federated learning
RAPID DEVELOPMENT
- Very high implementation efficiency
- Linear development flow
- Reduces development cost and development risk
- Reduces Time-To-Market and Time-To-Money
Tyr Family - Next Generation AD/ADAS
Fully programmable Companion Chip
Any Algorithm
Any Host processor
AI & DSP on same chip, selectable on layer-by-layer basis
- Minimizes latency and power consumption
- Increases flexibility
Fully programmable
- High-level programming throughout
- Supports Software Defined Vehicle
- Handles next generation algorithms
Very high performance
- 40W / Petaflops
- Close-to-theoretical implementation efficiency
IEEE754 floating point / Integer
- fp8 / fp16 / fp32
- int8 / int16 / int32
ISO26262 / ASIL-D ready
Low power
Core Architecture
Product Selection
Performance numbers at 1.6 GHz
Tyr1
- 2 cores
- 800* Tflops (fp8 Tensorcore)
- 200* Tflops (fp16 Tensorcore)
- 12 Tflops (fp8)
- 6 Tflops (fp16)
- 3 Tflops (fp32)
- 10W (peak)
- 16GB on-chip memory
* With sparsity
Tyr2
- 4 cores
- 1,600* Tflops (fp8 Tensorcore)
- 400* Tflops (fp16 Tensorcore)
- 24 Tflops (fp8)
- 12 Tflops (fp16)
- 6 Tflops (fp32)
- 30W (peak)
- 16GB on-chip memory
Tyr4
- 8 cores
- 3,200* Tflops (fp8 Tensorcore)
- 800* Tflops (fp16 Tensorcore)
- 48 Tflops (fp8)
- 24 Tflops (fp16)
- 12 Tflops (fp32)
- 60W (peak)
- 16GB on-chip memory
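The selection logic implied by the list above can be sketched in a few lines of Python. The core counts, fp8 Tensorcore ratings (with sparsity) and peak power figures are taken from the list. The `utilization` parameter is an assumption to be measured for a given algorithm, and the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TyrVariant:
    name: str
    cores: int
    fp8_tensor_tflops: int  # nominal fp8 Tensorcore rating, with sparsity
    peak_watts: int


# Figures from the Product Selection list (performance numbers at 1.6 GHz),
# ordered from smallest to largest.
VARIANTS = [
    TyrVariant("Tyr1", 2, 800, 10),
    TyrVariant("Tyr2", 4, 1600, 30),
    TyrVariant("Tyr4", 8, 3200, 60),
]


def smallest_variant(required_tflops: float, utilization: float) -> Optional[TyrVariant]:
    """Return the smallest variant whose effective fp8 compute meets the requirement.

    utilization is an assumed, algorithm-dependent fraction in [0, 1].
    Returns None when no variant is sufficient.
    """
    for variant in VARIANTS:
        if variant.fp8_tensor_tflops * utilization >= required_tflops:
            return variant
    return None


# Hypothetical requirement: 1,000 TFlops effective at an assumed 90% utilization.
choice = smallest_variant(1000.0, 0.9)
print(choice.name if choice else "no single variant is sufficient")
```

Because the same code runs across the family, a requirement that shrinks during development only changes which entry this lookup returns, not the software.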