Tyr

Autonomous Driving Made Easy

3.2 CUDA-free Petaflops
High-Performance computing for the vehicle

Handling the Power Gap

Most discussions shy away from examining, or even mentioning, the effective compute power. This is the fraction of the nominal (peak) compute power that can actually be used at any given instant, and it is algorithm dependent. For example, if a one-Petaflops processor can deliver only 20% of its nominal power at a given instant, its effective compute power is 200 Tflops.
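The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration: the function name `effective_compute_tflops` and the parameter `utilization` are hypothetical, and the only figures used are the ones quoted in the text (1 Petaflops nominal, 20% usable).

```python
def effective_compute_tflops(nominal_tflops: float, utilization: float) -> float:
    """Usable throughput in Tflops, given the nominal peak and the
    fraction of that peak an algorithm actually achieves (0.0 to 1.0)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return nominal_tflops * utilization


# Figures from the text: 1 Petaflops (1,000 Tflops) nominal at 20% utilization.
effective = effective_compute_tflops(1000.0, 0.20)
print(f"{effective:.0f} Tflops effective")  # 200 Tflops effective
```

Because utilization is algorithm dependent, the same chip would yield a different effective figure for each workload.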

Effective compute power becomes critical in L4/L5 systems, as the total processing power requirement is massive. It impacts the system cost in multiple ways – power consumption, silicon cost, PCB area, cooling provisions, etc. In addition, the effective compute power must be delivered within a very restrictive timing budget, typically less than 25 ms.
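To make the timing constraint concrete, the sketch below converts an effective throughput into the number of floating-point operations that fit in one timing window. The 25 ms budget and the 200 Tflops effective figure come from the text; the function names and the two workload sizes are hypothetical and chosen only for illustration.

```python
def flop_budget(effective_tflops: float, budget_ms: float) -> float:
    """Floating-point operations that can complete within the timing budget."""
    return effective_tflops * 1e12 * (budget_ms / 1000.0)


def fits_budget(workload_flop: float, effective_tflops: float,
                budget_ms: float = 25.0) -> bool:
    """True if the workload completes within the budget at the given rate."""
    return workload_flop <= flop_budget(effective_tflops, budget_ms)


# 200 Tflops effective over a 25 ms window.
print(f"{flop_budget(200.0, 25.0):.2e} operations per window")

# Hypothetical per-frame workloads (assumed values, not from the source).
print(fits_budget(4e12, 200.0))  # True
print(fits_budget(6e12, 200.0))  # False
```

The point of the sketch is that the budget scales with effective, not nominal, throughput: a chip that reaches only a small fraction of its peak has a correspondingly smaller per-window operation count.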

The Tyr family of chips is designed around the VSORA scalable, multi-core architecture and integrates ISO26262 / ASIL-B/D functional safety elements such as lockstep. The result is a solution that integrates as easily into existing systems as into completely new, radical designs. Tyr is fully re-programmable, which makes over-the-air updates simple.

The family is algorithm and host-processor agnostic and can handle virtually any algorithm at close to theoretical efficiency. Newer algorithms such as Transformers and BEVFormer are fully supported. Everything is developed in a high-level language, which makes development easy and fast and also guarantees that anything that can be expressed in that language will run on any chip in the family.

If you develop your code for the Tyr4 and find that the chip is more powerful than the application needs, that is not an issue: run the same code on the Tyr1 or Tyr2 and save both power and money.

VSORA Solution Features

MIXED MODE & SPARSITY

MULTIFUNCTIONAL CORES

FULLY PROGRAMMABLE

RAPID DEVELOPMENT

Tyr Family - Next Generation AD/ADAS

Fully programmable Companion Chip
Any Algorithm
Any Host processor

AI & DSP on same chip, selectable on layer-by-layer basis

Minimizes latency and power consumption
Increases flexibility

Fully programmable

High-level programming throughout
Supports Software Defined Vehicle
Handles next generation algorithms

Very high performance

40W / Petaflops
Close to theoretical implementation efficiency

IEEE754 floating point / Integer

fp8 / fp16 / fp32
int8 / int16 / int32

ISO26262 / ASIL-D ready

Low power

Core Architecture

Product Selection

Performance numbers at 1.6 GHz

                            2 cores    4 cores    8 cores
fp8 Tensorcore (Tflops)     800*       1,600*     3,200*
fp16 Tensorcore (Tflops)    200*       400*       800*
fp8 (Tflops)                12         24         48
fp16 (Tflops)               6          12         24
fp32 (Tflops)               3          6          12
Peak power                  10W        30W        60W
On-chip memory              16GB       16GB       16GB

* with sparsity
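As a rough illustration of how these figures might be used when sizing a system, the Python sketch below picks the smallest configuration whose effective throughput covers a requirement. The fp16 Tensorcore and peak power values are copied from the figures above; the `Config` type, the `smallest_config` function, the 80% efficiency factor and the example requirements are all assumptions made for the sketch, not VSORA data or tooling.

```python
from typing import NamedTuple, Optional


class Config(NamedTuple):
    cores: int
    fp16_tensor_tflops: float  # nominal, with sparsity (from the figures above)
    peak_watts: float


CONFIGS = [
    Config(cores=2, fp16_tensor_tflops=200.0, peak_watts=10.0),
    Config(cores=4, fp16_tensor_tflops=400.0, peak_watts=30.0),
    Config(cores=8, fp16_tensor_tflops=800.0, peak_watts=60.0),
]


def smallest_config(required_effective_tflops: float,
                    efficiency: float) -> Optional[Config]:
    """Return the configuration with the fewest cores whose nominal
    throughput times the assumed efficiency meets the requirement,
    or None if no configuration is large enough."""
    for cfg in sorted(CONFIGS, key=lambda c: c.cores):
        if cfg.fp16_tensor_tflops * efficiency >= required_effective_tflops:
            return cfg
    return None


# Hypothetical requirement of 300 effective Tflops at an assumed 80% efficiency.
choice = smallest_config(300.0, efficiency=0.8)
print(choice.cores if choice else "no configuration is large enough")  # 4
```

Since the same code is stated to run on every chip in the family, a selection like this can be revisited late in a project without rewriting the application.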

Mobility in 2030 will be autonomous, digital, smart, sustainable and safe. Autonomous driving will become a mass market.

Herbert Diess

ex-CEO Volkswagen AG, 13 July 2021
