A clpeak-like program for Tenstorrent devices using tt-Metalium

TT-Bounty-Hunters/ttPEAK

ttPEAK

A synthetic benchmarking tool that measures the peak capabilities of Tenstorrent devices. It only measures peak metrics that are easy to measure and easy to program for, so the results do not represent the device's actual performance in real-world applications.

How to build

This section needs to be cleaned up and given clearer instructions.

Before building this project, set up the environment variables that Metalium needs:

export ARCH_NAME=grayskull
export TT_METAL_HOME=/path/to/your/root/of/tt-metal/
export TT_METAL_ENV=dev
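
A forgotten export is easy to miss until the build or the run fails. The following is a minimal sketch of a pre-build check for the three variables listed above. The function name `check_metalium_env` is hypothetical and not part of ttPEAK or tt-Metalium; it only tests that each variable is non-empty, not that its value is valid.

```shell
# Hypothetical helper (not part of ttPEAK): report which of the
# Metalium variables named in this README are unset or empty.
check_metalium_env() {
    missing=""
    for name in ARCH_NAME TT_METAL_HOME TT_METAL_ENV; do
        # POSIX sh has no ${!name}, so eval reads the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

Calling `check_metalium_env` in the shell you plan to build from prints `ok` when all three variables are set, or the names of the ones that are not.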

Then build the project:

cd ttPEAK
mkdir build
cd build
cmake ..
make
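
The same configure-and-build steps can also be written without changing directories. This is a sketch only, assuming CMake 3.13 or newer (for the `-S` and `-B` options) and that the project's CMake setup has no special requirements. The function `build_ttpeak` and its optional second argument are hypothetical; passing `echo` as the second argument prints the commands instead of running them.

```shell
# Hypothetical helper (not part of ttPEAK): configure and build out of source.
# $1: path to the ttPEAK checkout
# $2: optional command prefix, e.g. "echo" for a dry run
build_ttpeak() {
    src="$1"
    run="${2:-}"
    # Configure into <src>/build, then invoke the native build tool there.
    $run cmake -S "$src" -B "$src/build" &&
    $run cmake --build "$src/build"
}
```

For example, `build_ttpeak . echo` shows what would be executed, and `build_ttpeak .` runs it from the root of the checkout.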

To run the program (the output below is from my system, a Grayskull e75 on a PCIe Gen4 x4 link):

➜ cd .. # Move to the root of the project. NOT the build directory
➜ build/ttpeak 
                  Metal | INFO     | Initializing device 0
                 Device | INFO     | Opening user mode device driver
2024-03-28 08:52:16.488 | INFO     | SiliconDriver   - Detected 1 PCI device : {0}
                  Metal | INFO     | AI CLK for device 0 is:   1300 MHz
Device info:
  Architecture                    : GRAYSKULL
  Device ID                       : 0
  # of hardware command queues    : 1
  L1 memory per core              : 1024 KiB
  Logical core grid size          : 12x8
  Compute with storage grid size  : 11x8
  DRAM banks                      : 8
  DRAM bank size                  : 1023 MiB
  DRAM channels                   : 8
  DRAM size per channel           : 1024 MiB
  Machine epsilon                 : 0.00195312

Bandwidth (GB/s):
  DRAM read bandwidth (1 core)     : 22.99
  DRAM read bandwidth (all cores)  : 67.45
  Adjacent core NoC write          : 38.15
  Adjacent core NoC read           : 38.47
  SRAM scalar r/w (per core)       : 0.98 (this is expected to be slow)

Compute (BFP16, GFLOPS): 
  Element wise math (1 core)       : 113.32
  Element wise math (all cores)    : 7926.89
  Matrix multiplcation (1 core)    : 940.12
  Matrix multiplcation (all cores) : 64805.11

Transfer bandwidth (GB/s):
  Download          : 2.77
  Upload            : 1.13

Empty program launch latency: 63793 ns
                  Metal | INFO     | Closing device 0
                 Device | INFO     | Closing user mode device drivers
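
One way to read the sample output above is to compare each all-core figure with its single-core figure. The sketch below does only that arithmetic, using numbers copied from the output. The helper name `scaling` is hypothetical, and what the ratios imply about the hardware is not stated here.

```shell
# Hypothetical helper: ratio of an all-core figure to its single-core figure.
# $1: all-core value, $2: single-core value (same units)
scaling() {
    awk -v all="$1" -v one="$2" 'BEGIN { printf "%.1f\n", all / one }'
}

scaling 7926.89 113.32     # element-wise math, GFLOPS
scaling 64805.11 940.12    # matrix multiplication, GFLOPS
scaling 67.45 22.99        # DRAM read bandwidth, GB/s
```

On the numbers above this prints 70.0, 68.9 and 2.9: in this run the compute figures grow by roughly 70x from one core to all cores, while DRAM read bandwidth grows by roughly 3x.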
