[CS.AI] ANEForge: Python for Direct Computation on Apple Neural Engine

ANEForge is a Python package that programs the Apple Neural Engine (ANE), the fixed-function neural accelerator on every recent Apple device, directly and without CoreML. In production, the engine is accessible only through CoreML, which treats it as a scheduling option: no configuration forces execution on the ANE, so a model can silently run on the CPU or GPU instead.

ANEForge compiles a lazy tensor graph built from 58 fused operators and 19 native bridge operators into a single ANE program. This program is dispatched through the same ANE daemon and kernel-driver stack as Apple's internal framework. Beyond inference, the package accesses the engine's native fused attention, streams int8, int4, and sparse weights, keeps decoder and optimizer state resident across steps, and runs the forward pass, backward pass, and optimizer update on the engine.
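To make "a lazy tensor graph compiled into a single program" concrete, here is a minimal sketch in plain Python. It is not ANEForge's API: `LazyTensor`, `linearize`, and `compile_program` are hypothetical names invented for illustration, and the "program" is only a flat instruction list, not an ANE binary. The point is that building the expression computes nothing; the whole graph is lowered in one step at the end.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LazyTensor:
    """Hypothetical graph node: records an op and its inputs, computes nothing."""
    op: str
    inputs: Tuple["LazyTensor", ...] = ()
    name: str = ""

    def matmul(self, other: "LazyTensor") -> "LazyTensor":
        return LazyTensor("matmul", (self, other))

    def add(self, other: "LazyTensor") -> "LazyTensor":
        return LazyTensor("add", (self, other))

    def relu(self) -> "LazyTensor":
        return LazyTensor("relu", (self,))


def linearize(root: LazyTensor) -> List[LazyTensor]:
    """Return the nodes in topological order (inputs before their consumers)."""
    order: List[LazyTensor] = []
    seen = set()

    def visit(node: LazyTensor) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(root)
    return order


def compile_program(root: LazyTensor) -> List[Tuple[str, str, Tuple[int, ...]]]:
    """Lower the whole graph into one flat list of (op, name, input_slots)."""
    order = linearize(root)
    slot = {id(node): i for i, node in enumerate(order)}
    return [
        (node.op, node.name, tuple(slot[id(inp)] for inp in node.inputs))
        for node in order
    ]


# Building the expression only records nodes; nothing is evaluated here.
x = LazyTensor("input", name="x")
w = LazyTensor("input", name="w")
b = LazyTensor("input", name="b")
y = x.matmul(w).add(b).relu()

# One compile step turns the recorded graph into a single program.
program = compile_program(y)
for instruction in program:
    print(instruction)
```

Deferring all work until the compile step is what lets a compiler see the whole graph at once, which is the precondition for emitting one program instead of one dispatch per operator.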

A small fused program completes a call in about 90 µs, close to the engine's 70 µs per-program dispatch floor, and a pretrained ResNet-18 forward pass runs end-to-end in 0.33 ms. ResNet-18, a sentence encoder, and a Vision Transformer are validated end-to-end against framework references, and the forward pass of a Stable Diffusion U-Net is validated as well. ANEForge targets Apple Silicon under macOS 14 and later, with each release verified against a recorded macOS and ANE-compiler version.
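A quick back-of-the-envelope calculation puts the latency figures in context. This sketch uses only the three numbers quoted above and rests on two assumptions that the summary does not state: that a call's time splits cleanly into the dispatch floor plus everything else, and that the ResNet-18 forward pass is issued as a single program with calls running back to back.

```python
# Figures quoted in the summary.
DISPATCH_FLOOR_US = 70.0      # per-program dispatch floor
SMALL_CALL_US = 90.0          # small fused program, one call
RESNET18_FORWARD_MS = 0.33    # pretrained ResNet-18, end-to-end forward

# Assumption: call time = dispatch floor + remaining work.
remaining_us = SMALL_CALL_US - DISPATCH_FLOOR_US
floor_share_small = DISPATCH_FLOOR_US / SMALL_CALL_US

# Assumption: ResNet-18 is one program, so it pays the floor once.
resnet_us = RESNET18_FORWARD_MS * 1000.0
floor_share_resnet = DISPATCH_FLOOR_US / resnet_us

# Assumption: back-to-back calls with no overlap or batching.
forwards_per_second = 1000.0 / RESNET18_FORWARD_MS

print(f"work above the floor in a small call: {remaining_us:.0f} us")
print(f"dispatch share of a small call:       {floor_share_small:.0%}")
print(f"dispatch share of a ResNet-18 pass:   {floor_share_resnet:.0%}")
print(f"ResNet-18 forwards per second:        {forwards_per_second:.0f}")
```

Under these assumptions, dispatch accounts for roughly three quarters of a small call but only about a fifth of a ResNet-18 pass, which is why packing more work into each program matters.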

Blogger's Review: ANEForge gives developers a direct way to program the Neural Engine on Apple devices, sidestepping the scheduling limitations of CoreML. Deep learning workloads can now target the ANE itself rather than hoping the scheduler picks it. Its fused operators and low per-call latency make it promising for practical applications, especially in edge computing scenarios. Further optimization of the tool may bring better performance to future deep learning applications.
