The Mireplica Visual-Processing Architecture addresses a wide range of vision-processing applications, including AI, programmed in conventional C++. A single program thread makes efficient use of hundreds of datapaths that operate simultaneously on contexts that are logically fully shared, up to an entire image frame of any dimensions. Although the programming model uses vector abstractions, the architecture supports on-the-fly processing of streams, without requiring that large vectors, arrays, or line buffers be assembled in memory before processing begins. The architecture employs a new form of instruction generation and execution: asynchronous distributed single-instruction, multiple-data (AD-SIMD). Execution is highly out-of-order, but conforms to a single-threaded model implemented on a standard CPU, such as RISC-V, with custom instructions emitted by class-library abstractions. The platforms are highly customizable using C++ high-level synthesis, and they approach the power, performance, and area metrics of hardware ASIC blocks while remaining fully programmable.
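
As a rough illustration of processing a stream on the fly behind a vector-style interface, the following C++ sketch implements a 3-tap horizontal averaging filter that consumes one sample at a time and keeps only a three-element window, so no full line or frame is assembled before processing. It is an assumption-laden toy, not the Mireplica class library: the names `StreamingBox3`, `push`, and `box3` are hypothetical, and the sketch runs sequentially on a host CPU rather than modeling AD-SIMD execution or custom instructions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only: this is not the Mireplica API.
// A 3-tap horizontal averaging filter over a pixel stream. Its only state
// is a 3-sample window, so no line buffer or frame is ever built up.
class StreamingBox3 {
public:
    // Accept one input sample. Returns true and writes `out` once three
    // samples have arrived, i.e. from the third sample onward.
    bool push(std::uint16_t in, std::uint16_t& out) {
        window_[count_ % 3] = in;  // overwrite the oldest sample
        ++count_;
        if (count_ < 3) {
            return false;          // window not yet full
        }
        // Operands promote to int, so the sum cannot overflow uint16_t.
        out = static_cast<std::uint16_t>(
            (window_[0] + window_[1] + window_[2]) / 3);
        return true;
    }

private:
    std::array<std::uint16_t, 3> window_{};
    std::size_t count_ = 0;
};

// Vector-style convenience wrapper (assumed for illustration): the caller
// sees a whole-line operation, while the filter itself touches one sample
// at a time. The input vector here merely stands in for a sensor stream.
std::vector<std::uint16_t> box3(const std::vector<std::uint16_t>& line) {
    StreamingBox3 filter;
    std::vector<std::uint16_t> result;
    std::uint16_t y = 0;
    for (std::uint16_t x : line) {
        if (filter.push(x, y)) {
            result.push_back(y);
        }
    }
    return result;
}
```

In this sketch the caller writes what looks like a whole-line operation, `box3(line)`, while each output becomes available as soon as its three inputs have arrived; an input of N samples (N ≥ 3) yields N − 2 outputs.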