FPGA · AI · Edge Computing

LLM Intelligence
On The Edge // FPGA-Accelerated AI Inference · No Cloud Required

Rona Technology builds FPGA-based AI accelerators that run large language models locally — ultra-low latency, minimal power, zero data exposure. Deploy Qwen-class intelligence inside any device.

RT-1 AI ACCELERATOR FPGA · INT8
10× Faster than CPU
<5W Power Consumption
INT4 Quantization Support
0ms Cloud Latency
1.8B Max Model Params
// Core Capabilities

Built for the Edge.
Powered by AI.

Our FPGA accelerator architecture delivers server-class LLM inference in an embedded form factor — fully customizable, fully yours.

⚡
Hardware-Level Acceleration

Custom matrix-multiply and attention blocks implemented in RTL. Runs the same workload 10× faster than an ARM Cortex-A core.

🔒
Air-Gap Privacy

All inference runs on-device. No data leaves your system, no cloud dependency, no API keys or subscriptions. Critical for industrial and medical use cases.

🧠
Qwen / DeepSeek Ready

Optimized for quantized versions of Qwen1.5 and DeepSeek models. INT4/INT8 quantization preserves >95% model accuracy at a fraction of the compute cost.

🔧
Fully Programmable

Based on Xilinx Zynq UltraScale+. Reconfigure the accelerator architecture without hardware replacement — adapt to new model architectures as AI evolves.

📦
Modular Form Factor

Small enough to integrate into existing products. PCIe and M.2 interface options available. Drop-in acceleration for existing hardware platforms.

🌐
China-Compliant Stack

Built entirely on China-accessible components and software. No dependency on sanctioned technology. Supports domestic FPGA alternatives (Gowin, PANGO).

How It Works

From model training to edge deployment — a complete pipeline designed for hardware efficiency.

01
Model Quantization
Qwen1.5-0.5B or Qwen1.5-1.8B is quantized to INT4/INT8 using GPTQ, reducing model size by 4–8× with minimal accuracy loss.
PyTorch · Brevitas · GPTQ
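The full GPTQ flow needs the toolchain above, but the core idea behind the 4× INT8 size reduction, mapping FP32 weights to an 8-bit grid plus one scale factor, can be sketched in a few lines of NumPy. The function names here are illustrative, not part of GPTQ or Brevitas:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = float(np.abs(w).max()) / 127.0        # map largest weight to ±127
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from the INT8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(256, 256)).astype(np.float32)  # toy FP32 layer
q, scale = quantize_int8(w)

# INT8 storage is 4x smaller than FP32; per-weight error is below one step
print(w.nbytes // q.nbytes)                                   # 4
print(float(np.abs(w - dequantize(q, scale)).max()) < scale)  # True
```

GPTQ improves on this naive rounding by choosing the rounding direction per weight to minimize layer output error, and INT4 simply halves the grid again (hence the 8× figure).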
02
ONNX Export & Optimization
Model exported to ONNX with operator fusion applied. Attention heads and FFN layers mapped to custom FPGA IP blocks.
ONNX · Vitis AI · FINN
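Operator fusion collapses adjacent graph nodes (e.g. MatMul → Add → activation) into one kernel so intermediate tensors never round-trip through memory, which is exactly the shape a single FPGA IP block wants. A toy NumPy illustration of the idea; the function names are ours, not ONNX or FINN APIs:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, common in transformer FFN layers
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def ffn_unfused(x, w, b):
    """Three separate graph ops: each materializes an intermediate tensor."""
    y = x @ w          # MatMul
    y = y + b          # Add (bias)
    return gelu(y)     # activation

def ffn_fused(x, w, b):
    """One kernel: MatMul + bias + GELU in a single pass, the form
    a fused hardware block would implement."""
    return gelu(x @ w + b)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 64))
w = rng.normal(size=(64, 64))
b = rng.normal(size=64)
print(np.allclose(ffn_unfused(x, w, b), ffn_fused(x, w, b)))  # True
```

On hardware the win is not arithmetic (the math is identical) but bandwidth: the fused block streams activations through once instead of writing two intermediates to memory.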
03
RTL Synthesis
Custom HLS-generated accelerator core synthesized via Vivado. Pipelined MAC arrays with 4K parallel multiply units.
Vitis HLS · Vivado · Verilog
04
Edge Deployment
Bitstream deployed to Zynq SoC. ARM CPU handles tokenization; FPGA fabric runs inference. Result in <100ms latency.
Custom SoC · PYNQ Runtime
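The runtime split in step 04, tokenization and decoding on the ARM PS with the heavy matrix work offloaded to the PL, can be mocked in plain Python. The accelerator call below is a stand-in, not the PYNQ API, and the tiny vocabulary is invented for illustration:

```python
import numpy as np

VOCAB = {"hello": 0, "edge": 1, "ai": 2, "<eos>": 3}
IVOCAB = {i: t for t, i in VOCAB.items()}

def tokenize(text: str) -> list[int]:
    """Runs on the ARM CPU (PS): text -> token ids."""
    return [VOCAB[w] for w in text.lower().split()]

def fpga_forward(token_ids: list[int]) -> int:
    """Stand-in for the PL accelerator. A real deployment would DMA
    embeddings over AXI4 and read logits back from the fabric; here
    we fake deterministic logits to show the pipeline shape."""
    rng = np.random.default_rng(sum(token_ids))
    logits = rng.normal(size=len(VOCAB))
    return int(np.argmax(logits))        # greedy next-token id

def decode(token_id: int) -> str:
    """Back on the ARM CPU (PS): token id -> text."""
    return IVOCAB[token_id]

ids = tokenize("hello edge ai")          # PS
next_tok = decode(fpga_forward(ids))     # PL, then PS
print(ids, next_tok)
```

In the actual system the PS side also owns the KV cache manager, so each `fpga_forward` call only processes the newest token rather than the whole prompt.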
// System Architecture
HOST APPLICATION User Input / Application Layer
ARM CPU (PS) Tokenizer · KV Cache Manager · Output Decoder
↓ AXI4 High-Speed Bus ↓
FPGA FABRIC (PL) — AI ACCELERATOR CORE
┌─────────────────────────────┐
│ INT8 Matrix Multiply Array │
│ Attention Mechanism Block │
│ Softmax / LayerNorm Units │
│ Weight Cache (BRAM/HBM) │
└─────────────────────────────┘
MEMORY SUBSYSTEM DDR4 · 4GB · Model Weight Storage
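Functionally, the accelerator-core blocks in the diagram compute standard transformer math. A NumPy reference for what the INT8 matrix-multiply array and the Softmax/LayerNorm units implement (this is the math the RTL realizes, not the RTL itself):

```python
import numpy as np

def int8_matmul(a_q, b_q, scale_a, scale_b):
    """Matrix-multiply array datapath: INT8 × INT8 accumulated in
    INT32, then rescaled back to float with the two quant scales."""
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (scale_a * scale_b)

def softmax(x, axis=-1):
    # subtract the row max first; hardware uses the same trick to
    # keep the exponent range small
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layernorm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

a = np.array([[2, 0], [0, 2]], dtype=np.int8)
b = np.array([[1, 1], [1, 1]], dtype=np.int8)
print(int8_matmul(a, b, 0.5, 0.5))   # rescaled INT32 accumulator
```

The INT32 accumulator is the key detail: INT8 products overflow INT8 almost immediately, so the MAC arrays widen to 32 bits and only rescale at the block boundary.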
// Product Line

Evaluation Boards
& Development Kits

Start prototyping with our development kits. Volume pricing and custom modules available for OEM integration.

Preliminary Release Date: July 1, 2026
RT-Lite
Entry Evaluation Kit

Perfect for proof-of-concept and developer evaluation. Runs Qwen-0.5B at INT8 precision.

  • 2GB onboard memory
  • USB 3.0 + Ethernet
  • Qwen-0.5B INT8
  • ~8 tokens/sec
RT-OEM
Custom Integration Module

Tailored to your hardware requirements. Compact M.2 or custom PCB form factor for direct product integration.

  • Custom FPGA selection
  • Configurable memory
  • Any interface (M.2/PCIe/UART)
  • Model fine-tuning support
  • NDA + IP protection
Custom / contact us

Building the Future of Edge AI

Rona Technology is a Guangzhou-based company dedicated to bringing large language model intelligence to embedded systems. We design FPGA accelerators from the ground up — from RTL architecture to production deployment.

01
Founded in Guangzhou, 2024
Born out of research at the intersection of AI hardware design and large language models.
02
Industrial · Medical · Robotics
Targeting sectors where data privacy, low latency, and offline reliability are non-negotiable.
03
China-Compliant, Export-Ready
Built on a fully domestic-accessible stack — no dependency on sanctioned components or cloud services.
// Work With Us

Let's Build Together

Whether you're exploring evaluation kits or planning a full OEM integration, our team is ready to help you bring edge AI to your hardware.

Get in Touch →

Start Your Edge AI
Project Today

Whether you need an evaluation kit, custom OEM module, or technical consultation — we're ready to help you deploy AI at the edge.

Location
Guangzhou, China
Tianhe District
Email
admin@rona-technology.com
Response within 24 hours
Partnerships
OEM · Distributor · Research Collaboration
We welcome partnerships across all sectors