RISC-V public test platform released · 7-zip test

Introduction

7-Zip is an open-source file archiver with a high compression ratio and fast decompression. Besides ordinary file compression and decompression, 7-Zip provides a built-in benchmark that evaluates a system's processing power by compressing and decompressing test data in memory.

The benchmark can be run with different settings, so users can test system performance according to their needs: you can choose the amount of data compressed and decompressed in the test (the dictionary size) and the number of cores/threads used. The benchmark reports compression and decompression speeds, together with the corresponding MIPS (million instructions per second) ratings, which make it possible to compare performance across configurations and hardware.
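
As a minimal sketch of how the overall MIPS rating can be read from the benchmark's text output: the helper below assumes the output ends with a line beginning with `Tot:` followed by the usage, R/U and rating columns, as 7-Zip 16.02 appears to print. The function name `total_rating` is hypothetical and not part of 7-Zip.

```shell
# Read benchmark output on stdin and print the overall rating in MIPS.
# Assumption: the summary line looks like
#   "Tot:  <usage %>  <R/U MIPS>  <rating MIPS>"
# so the rating is the fourth whitespace-separated field.
total_rating() {
    awk '$1 == "Tot:" { print $4 }'
}
```

With 7-Zip installed, it could be used as `7z b | total_rating`. If the output layout differs in another 7-Zip version, the field number would need adjusting.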

Benchmark results for other processors can be found on the 7-Zip benchmark website, https://www.7-cpu.com/.

Platform Environment

【Hardware parameters】

Processor: Sophon SG2042 x 1

Number of cores: 64 cores

L1 Cache: I: 64KB and D: 64KB

L2 Cache: 1MB/Cluster

L3 Cache: 64MB System Cache

DRAM: DDR4 16Gx4

【Software Environment】

Linux version: Ubuntu 22.10

gcc version: 12.2.0

7-Zip version: 16.02

Test project introduction

Compression

Compression speed depends heavily on memory (RAM) latency, data cache size/speed, and TLB. The tests also use simple 32-bit integer instructions: “shift”, “add”, “multiply”, etc. In addition, the out-of-order execution characteristics of the CPU are also important for this test.

Decompression

Decompression speed depends heavily on CPU integer operations. The most important things for this test are: branch misprediction penalty (pipeline length) and latency of 32-bit instructions (“multiply”, “shift”, “add”, etc.). The decompression test has a large number of unpredictable branches. Note that some CPU architectures (such as 32-bit ARM) support instructions that can be executed conditionally. Therefore, in many cases, such CPUs can work without branches (and without pipeline flushes) in LZMA decompression code. Such CPUs have some speed advantages over other architectures that do not support complex conditional execution.

Test

  # -mmt=32 sets the number of threads used by the benchmark
  
  ubuntu@perfxlab:~$ 7z b -mmt=32
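
To repeat the run for several thread counts, the command lines can be generated in a loop. This is only a sketch: `bench_cmds` is a hypothetical helper, and it prints the commands rather than executing them, so it can be checked before any long benchmark run starts.

```shell
# Print one benchmark command per requested thread count.
bench_cmds() {
    for n in "$@"; do
        printf '7z b -mmt=%s\n' "$n"
    done
}

# Thread counts used in this test.
bench_cmds 1 2 4 32 64
```

Piping the output to a shell, for example `bench_cmds 1 2 4 32 64 | sh`, would run the benchmarks one after another on a machine where `7z` is installed.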

The 7-Zip test with 1/2/4/32/64 threads was carried out on SG2042, and the test results are as follows:

Performance comparison

We selected three CPUs for comparison: SiFive FU740, Loongson 3A5000, and Ryzen 3950X (Zen2).

Data Source:

https://www.7-cpu.com/

The following are SiFive FU740 test results:

The following are the Loongson 3A5000 test results:

The following are Ryzen 3950X (Zen2) test results:

We can see that, under these test conditions, the single-core performance of the SG2042 and the SiFive FU740 is basically the same, and the SG2042 leads in 64-thread performance.

The single-core performance of the LoongArch-based 3A5000 is still good. Does that mean a Loongson chip with 64 such cores would be very powerful? Not necessarily! Scaling across many cores is an art in itself.

Out of curiosity, we also made a comparison with the Ryzen 3950X (Zen2). Unsurprisingly, all of its results are much better than those of the SG2042. On paper, the Ryzen 3950X (Zen2) boosts up to 4.7GHz, far above the 2GHz of the SG2042. For an early many-core RISC-V processor, reaching this level was not easy for the SG2042. Seeing the gap clearly, we will roll up our sleeves and keep working hard.

Finally, it should be emphasized that optimization of the basic software for the SG2042 has only just begun, and there is still great potential for improvement.
