Welcome to OAAX
OAAX serves as a bridge between popular AI frameworks and diverse hardware accelerators. Models developed in frameworks such as TensorFlow, PyTorch, Hugging Face, and others are first exported to the ONNX format, a widely adopted standard for interoperability. OAAX then connects ONNX models to a variety of hardware backends, including CPUs and accelerators from Intel, NVIDIA, DEEPX, EdgeCortix, Hailo, and more, enabling seamless deployment across heterogeneous compute platforms without framework- or vendor-specific integration.
This is achieved through unified conversion and runtime interfaces, which let developers convert ONNX models into hardware-specific formats and run them across different platforms using a standardized API.
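For instance, a model authored in PyTorch can be exported to ONNX with the framework's built-in exporter before being handed to an OAAX conversion toolchain. Here is a minimal sketch; the ResNet-18 model and input shape are illustrative, not part of OAAX:

```python
# Minimal sketch: export a PyTorch model to ONNX ahead of OAAX conversion.
# The model and input shape are illustrative; any ONNX-exportable model works.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)
model.eval()  # switch to inference mode for export

dummy_input = torch.randn(1, 3, 224, 224)  # one RGB 224x224 image
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",  # this file is what an OAAX toolchain consumes
    input_names=["input"],
    output_names=["output"],
)
```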
Terminology
Before delving into the OAAX standard, it's important to understand some key terms that are frequently used:
- OAAX: Open AI Accelerator eXchange, a standard (specification, recipe) for deploying AI models across different hardware accelerators.
- XPU, AI accelerator or AI hardware: Any processing unit that can execute AI models, such as GPUs, TPUs, or specialized AI accelerators.
- Compile or Convert: The process of translating an ONNX model into a format optimized for and supported by a specific XPU.
- Runtime: The shared library (.so or .dll) that implements the OAAX runtime interface to interact with an XPU.
- Conversion Toolchain: The software that compiles ONNX models into the XPU-specific format.
- Input/Output Tensors: Data structures that hold the input and output data for the model.
- Host or Runtime host: The software that interacts with the runtime to offload model computation to the AI hardware.
- OAAX-compliant XPU: Any AI accelerator with an implementation of the OAAX standard, including both a conversion toolchain and a runtime library.
Usage workflow
To run an AI model on an OAAX-compliant XPU, a typical workflow looks like this:
- Convert the ONNX model into an XPU-specific OAAX bundle/binary using the provided toolchain.
- In the host application, load the OAAX runtime appropriate to the XPU.
- Initialize the runtime by calling `runtime_initialization()`.
- Load the model with `runtime_model_loading("./model.oaax")`.
- Exchange data asynchronously:
  a. Send inputs with `send_input(input_tensors)`.
  b. Retrieve outputs with `receive_output(output_tensors_holder)`.
- When finished, clean up resources by calling `runtime_destruction()` (see the sketch below).
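The sequence above maps directly onto the runtime's shared-library interface. Below is a minimal sketch that drives it from Python via `ctypes`; only the function names come from the workflow above, while the library filename, the integer status codes, and the argument types are assumptions made for illustration. The tensor exchange calls are elided because the tensor structures are defined by the OAAX specification.

```python
# Minimal sketch of the OAAX runtime lifecycle, driven via ctypes.
# Assumptions (illustrative, not from this page): the library filename,
# that each call returns an int status (0 = success), and that
# runtime_model_loading() takes a C string path.
import ctypes

rt = ctypes.CDLL("./liboaax_runtime.so")  # XPU-specific OAAX runtime

# Initialize the runtime.
assert rt.runtime_initialization() == 0

# Load the converted model.
assert rt.runtime_model_loading(b"./model.oaax") == 0

# send_input() / receive_output() would exchange tensors here; their
# argument types are defined by the OAAX specification.

# Clean up resources when finished.
assert rt.runtime_destruction() == 0
```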
Outline
- To get started with using OAAX, please check out the hello world example.
- To learn about the OAAX specification, please check out the OAAX Specification Document.
- To check if an AI Accelerator is compliant with OAAX, please refer to this list.
- To contribute a new OAAX implementation, improve an existing one, develop an example, or propose a change to the standard, please refer to the Contributing Guide.
- Have more questions? Please check out the FAQ.