How We Made Flutter Image Compression 26× Faster Using Rust


At LOGIQUE, we build Flutter applications for clients who require high performance in real-world environments. This article documents how our team implemented Flutter Rust FFI for image compression and achieved a 26× speedup compared to the baseline Dart implementation. Repository: the full benchmark source code is available on GitHub.

This repository can be run locally to compare Dart and Rust implementations, and tested using your own set of images. Learn how we implemented Flutter Rust FFI for significantly faster image compression below.

The Problem: Compression as a UX Bottleneck

Image compression before upload sounds routine. Shrink the file, save bandwidth, move on. For most apps handling one or two photos at a time, it is fine. But in our case, users could attach large batches of documentation photos in a single workflow, and that’s when it turned into a serious UX problem.

The application we built for the client is used in the field to upload photos as report attachments. The pipeline was straightforward: select images → compress → upload. Nobody noticed a problem when uploading two or three photos.

The problem surfaced when users started attaching larger batches of images in a single session. Because every image had to be fully processed before the upload could begin, users were left watching a loading screen for an uncomfortably long time.

Key Insight: From the user’s perspective, the app wasn’t “compressing images”, it was just frozen. A technical implementation detail had become a perceived reliability problem.

What We Tried in Dart First

Before reaching for native code, we exhausted the obvious Dart-level optimizations. This is the right order of operations: measure first, optimize the existing stack, then escalate.

Isolates for concurrency

Moving compression to isolates kept the UI thread responsive. Scroll animations stayed smooth, buttons still responded. But isolates don’t magically make the underlying processing free. They improve concurrency and responsiveness, and with bounded parallelism they can improve throughput, but only up to a point.

Earlier resizing

By resizing images earlier in the pipeline (before the full decode + re-encode cycle), we reduced the amount of data each step had to process. Useful, but still not enough to remove the wait time in larger batches.

Quality tuning and pipeline cleanup

Adjusting compression settings and cleaning up unnecessary work helped reduce overhead. These are worthwhile improvements, but still incremental.

After those optimizations, our Dart implementation improved from 16.24 seconds to 8.16 seconds for a 21-image batch in release mode. Better, but still much slower than we wanted.

Why Dart Has a Ceiling Here

After profiling, the picture became clear. Image compression is fundamentally CPU-bound. Each image goes through four expensive operations:

  1. Decode the raw image bytes into an in-memory representation
  2. Resize the image
  3. Re-encode it with the target quality and constraints
  4. Generate additional outputs such as thumbnails or processed variants

A single image might not feel expensive on its own. But repeat that across 20 images and the total wait becomes very noticeable. The limiting factor isn’t how we structured the code. It’s the raw compute throughput available to Dart’s runtime. To go faster, we needed to get closer to the metal.
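The re-encode step above often includes a target-size constraint: if the first encode is still too large, the encoder retries at a lower quality. A minimal sketch of that search loop in Rust, where `estimate_encoded_kb` is a hypothetical stand-in for a real JPEG encode-and-measure step (both the function and its linear size model are illustrative, not the project's actual code):

```rust
// Hypothetical size model standing in for a real JPEG encode + measure.
// Assumes encoded size shrinks roughly linearly with quality.
fn estimate_encoded_kb(raw_kb: u32, quality: u8) -> u32 {
    raw_kb * quality as u32 / 800
}

/// Lower quality in steps until the (estimated) output fits the target,
/// never dropping below a floor that would make the image unusable.
fn find_quality_for_target(raw_kb: u32, start_quality: u8, target_kb: u32) -> u8 {
    let mut q = start_quality;
    while q > 10 && estimate_encoded_kb(raw_kb, q) > target_kb {
        q = q.saturating_sub(5);
    }
    q
}

fn main() {
    // A 4000 KB source, starting at quality 85, targeting 300 KB.
    let q = find_quality_for_target(4000, 85, 300);
    println!("chosen quality: {q}");
    assert!(estimate_encoded_kb(4000, q) <= 300);
}
```

Each retry repeats a full encode, which is exactly why this loop dominates the per-image cost and why raw compute throughput matters so much here.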

Why Rust (Not C, Not Kotlin/Swift Native)

When moving work to native code, the practical options were C/C++, Kotlin (Android), Swift (iOS), or Rust. We chose Rust for four reasons:

  • Performance comparable to C/C++.
    No managed runtime, direct machine code, no GC pauses.
  • Memory safety at compile time.
    No buffer overflows or use-after-free bugs; the compiler enforces the safety rules.
  • First-class FFI support.
    Rust exposes C-compatible symbols, and Flutter’s dart:ffi can call those directly.
  • A strong ecosystem for systems-style image processing.
    Enough control to build a focused native pipeline cleanly.

Writing one cross-platform Rust library was also more maintainable than keeping separate Kotlin and Swift implementations in sync.

Architecture: Flutter Orchestrates, Rust Executes

The integration follows a clean separation of concerns. Flutter remains responsible for everything product-facing; Rust handles only the compute-heavy transformation.

In this benchmark project, we deliberately pass raw image bytes across the FFI boundary along with a small processing config. Rust processes the bytes and returns result buffers plus metadata back to Dart.
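The result that crosses the boundary can be modeled as a `#[repr(C)]` struct holding the output buffer plus metadata. The field names and layout below are assumptions for illustration; the repository's actual `CProcessedImage` may differ:

```rust
use std::os::raw::{c_uchar, c_uint};

/// Illustrative layout for the value returned across FFI.
/// `#[repr(C)]` guarantees a C-compatible field order so Dart's
/// `ffi.Struct` bindings can read it. Field names are assumptions.
#[repr(C)]
pub struct CProcessedImage {
    pub data: *mut c_uchar,      // compressed image bytes (Rust-owned)
    pub data_len: c_uint,        // length of `data` in bytes
    pub thumbnail: *mut c_uchar, // optional thumbnail buffer (may be null)
    pub thumbnail_len: c_uint,
    pub width: c_uint,           // dimensions after resizing
    pub height: c_uint,
}

fn main() {
    // Construct a value the way the Rust side would before handing it back.
    let img = CProcessedImage {
        data: std::ptr::null_mut(),
        data_len: 0,
        thumbnail: std::ptr::null_mut(),
        thumbnail_len: 0,
        width: 1920,
        height: 1080,
    };
    assert!(img.data.is_null());
    assert_eq!((img.width, img.height), (1920, 1080));
}
```

Because the buffers are Rust-owned, Dart must treat the pointer as borrowed and hand it back through a dedicated free function rather than freeing it itself.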

Implementation

The Dart Side: calling Rust via FFI

Flutter loads the compiled Rust library and binds a single function. It passes image bytes and a small config payload. That is the actual FFI surface in this project.

typedef ProcessImageNative = Pointer<CProcessedImage> Function(
  Pointer<Uint8> imageBytes,
  Uint32 imageLen,
  Uint32 maxWidth,
  Uint32 maxHeight,
  Uint8 quality,
  Uint32 targetSizeKb,
  Uint32 thumbnailSize,
);

final processImageFunc = _library
    .lookup<NativeFunction<ProcessImageNative>>('rust_process_image')
    .asFunction<Pointer<CProcessedImage> Function(
      Pointer<Uint8>,
      int,
      int,
      int,
      int,
      int,
      int,
    )>();

final result = processImageFunc(
  imagePtr,
  imageBytes.length,
  config.maxWidth,
  config.maxHeight,
  config.quality,
  config.targetSizeKb,
  config.thumbnailSize,
);

The Rust Side: the compression engine

Rust exposes a C-compatible function. It receives raw image bytes plus config, runs the native processing pipeline, and returns a pointer to a processed result structure.

use std::os::raw::{c_uchar, c_uint};
use std::ptr;

// `CompressionConfig`, `process_image`, `convert_result_to_c`, and the
// clamp/error helpers are defined elsewhere in the crate.
#[no_mangle]
pub extern "C" fn rust_process_image(
    image_bytes: *const c_uchar,
    image_len: c_uint,
    max_width: c_uint,
    max_height: c_uint,
    quality: c_uchar,
    target_size_kb: c_uint,
    thumbnail_size: c_uint,
) -> *mut CProcessedImage {
    let run = || -> Result<*mut CProcessedImage, String> {
        if image_bytes.is_null() {
            return Err(String::from("image_bytes pointer is null"));
        }
        if image_len == 0 {
            return Err(String::from("image_len is 0"));
        }

        let input = unsafe { std::slice::from_raw_parts(image_bytes, image_len as usize) };
        let config = CompressionConfig {
            max_width: clamp_u32(max_width, 1),
            max_height: clamp_u32(max_height, 1),
            quality: clamp_quality(quality),
            target_size_kb: clamp_u32(target_size_kb, 1),
            thumbnail_size: clamp_u32(thumbnail_size, 1),
        };

        let result = process_image(input, config)?;
        Ok(convert_result_to_c(result))
    };

    match std::panic::catch_unwind(run) {
        Ok(Ok(ptr)) => { clear_last_error(); ptr }
        Ok(Err(message)) => { set_last_error(message); ptr::null_mut() }
        Err(_) => { set_last_error("Rust panic while processing image"); ptr::null_mut() }
    }
}
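Two support pieces are implied by the function above but not shown: the storage behind `set_last_error`/`clear_last_error`, and a free function so Dart can hand ownership of the result back to Rust. A minimal stdlib-only sketch (names and struct layout are assumptions, not the repository's exact code):

```rust
use std::cell::RefCell;

// Per-thread last-error slot; a companion FFI getter would expose it to Dart.
thread_local! {
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

fn set_last_error(message: impl Into<String>) {
    LAST_ERROR.with(|e| *e.borrow_mut() = Some(message.into()));
}

fn clear_last_error() {
    LAST_ERROR.with(|e| *e.borrow_mut() = None);
}

fn take_last_error() -> Option<String> {
    LAST_ERROR.with(|e| e.borrow_mut().take())
}

// Minimal stand-in for the real result struct.
#[repr(C)]
pub struct CProcessedImage {
    pub data_len: u32,
}

/// Hands ownership of a heap-allocated result back to Rust so it can be
/// dropped. Dart must call this exactly once per non-null pointer returned
/// by `rust_process_image`, otherwise the buffer leaks.
#[no_mangle]
pub extern "C" fn rust_free_processed_image(ptr: *mut CProcessedImage) {
    if ptr.is_null() {
        return;
    }
    unsafe { drop(Box::from_raw(ptr)) };
}

fn main() {
    set_last_error("image_len is 0");
    assert_eq!(take_last_error().as_deref(), Some("image_len is 0"));
    clear_last_error();
    assert!(take_last_error().is_none());

    // Allocate and free a result the way the FFI pair would.
    let boxed = Box::new(CProcessedImage { data_len: 42 });
    rust_free_processed_image(Box::into_raw(boxed));
}
```

The thread-local keeps error reporting race-free without locks, and the null-check in the free function makes a double-free from a failed call path harmless.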

Benchmark Results

We measured three implementations against the same batch in a release build run.

Batch size: 21 images

Implementation   | Total Time | Avg/Image | Images/Sec | Speedup
Dart Original    | 16.24s     | 773.1ms   | 1.29       | 1.00× (baseline)
Dart Optimized   | 8.16s      | 388.4ms   | 2.57       | 1.99× vs baseline
Rust via FFI     | 624ms      | 29.7ms    | 33.65      | 26.02× vs baseline

The full benchmark project used for these measurements is available in the GitHub repository above, so readers can reproduce the run on their own devices.

Benchmark screenshots from the release build

The screenshots below show the benchmark cards from the same release run used in the table above. They are useful as visual proof that the reported numbers came from an actual device run.

[Screenshots: Dart (Baseline), Dart (Optimized), Rust (FFI), and a speedup summary card]

What the Numbers Actually Mean

A few points are worth calling out.

First, the Dart optimization work was not wasted. Moving from 16.24s to 8.16s is already a meaningful improvement, and it shows that isolate-based concurrency can help throughput when applied carefully.

Second, Rust is where the major jump happened. In this release run, the Rust FFI path finished the same 21-image batch in 624ms, which is 26.02× faster than the Dart baseline and 13.07× faster than the optimized Dart version.

Third, this benchmark is mainly demonstrating throughput improvement, not necessarily better compression ratio on this particular dataset.

Important Note: In this run, the processed output was actually larger than the original total input. That can happen when the source images are already small or already compressed efficiently, and then get re-encoded with the chosen settings. So the key result here is speed, not size reduction.

When to Use Dart vs. Rust

This wasn’t a case for “rewrite everything in Rust.” The right approach is surgical: keep Dart for what it’s good at, and escalate to Rust only when profiling confirms a genuine CPU bottleneck.

Stay with Dart when:

  • Users upload 1–3 images at a time
  • Performance is already acceptable
  • Simplicity matters more than raw speed
  • Compression is infrequent

Consider Rust when:

  • Batch sizes are large
  • Profiling confirms a CPU-bound bottleneck
  • Wait time is degrading the user experience
  • The operation is pure computation with no UI involvement

Key Takeaways

  1. Measure before optimizing.
    Profiling revealed that the problem was CPU throughput, not code structure. Intuition alone would have led us in the wrong direction.
  2. Exhaust your current stack first.
    The Dart optimizations were worth doing. They improved the experience for users who upload a few images.
  3. Isolates help a lot, but they don’t remove the ceiling. 
    They improve responsiveness and can improve throughput, but native execution still wins on this kind of repeated CPU-heavy workload.
  4. Keep the FFI surface narrow and explicit. 
    A small, well-defined boundary made the integration easier to reason about.
  5. Rust + Flutter is a practical, production-ready combination. 
    With proper error handling and ownership boundaries, it is a strong option for hot-path optimization.

Conclusion

Image processing looked like a footnote in our upload feature. It became a serious performance bottleneck the moment batch size grew.

The solution wasn’t to rewrite the app or adopt a new framework. It was to identify exactly which part of the system was CPU-constrained, move that specific operation to Rust, and leave everything else exactly as it was.

In this benchmark project, batch latency went from 16.24 seconds in the Dart baseline to 624ms with Rust via FFI in release mode. That is a 26.02× throughput improvement on the same run.

Sometimes the best architectural decision is knowing which part of your stack deserves a better tool — and having the discipline to change only that part.

If your team is facing similar performance bottlenecks in Flutter, or needs help deciding when to move to native code, the LOGIQUE engineering team is open to discussion. Contact us to get started.

LOGIQUE helps your business grow through targeted digital transformation. We provide IT consulting, website development, web and mobile app development, system development, and digital marketing services.
