Release v0.46.0 · tenstorrent/tt-metal

📦 Uncategorized

user-triggerable C++ post-commit suite
- PR: #6626
#6406: add missing position_ids/attention_mask to bert demo
- PR: #6617
#6282: Add AdamW
- PR: #6333
#6315: Fix dprint tests for T3000
- PR: #6599
FD2: prefetch stall, dispatch wait, linear read, delay and cleanup
- PR: #6620
#6609: update wording in demo section of main README.md
- PR: #6639
#6364: Autocomplete for pybinded types
- PR: #6440
Asarje/ttnn rn50 b20
- PR: #6629
FD2.0 Test - Fix l1 buffer not page-size aligned after FD-on-eth changes to L1_UNRESERVED_BASE
- PR: #6646
#6593: Add resharding to Llama2 model when possible.
- PR: #6595
#6572: Fix ttnn.repeat_interleave example in documentation
- PR: #6574
#5780: Re-enable 100K enqueue program stress test on grayskull
- PR: #6648
Enable basic width sharding support in all-gather
- PR: #6642
Alex/metal/remove cb wait markers
- PR: #6628
#6657: Use sysmem manager cq size instead of recomputing it each time…
- PR: #6658
#0: (MINOR) Add Grayskull purchase link and update version to 0.46.0
- PR: #6667
#5063: add TopK API to metal
- PR: #6563
#5480: FD2.0 Test - Fix test_prefetcher for dram paged read test (-t 3) on whb0
- PR: #6663
Fix logit low pcc
- PR: #6538
Backward op - Fixed ldexp, hardsigmoid and asin
- PR: #6542
#6598: Fix softplus
- PR: #6675
Add support for BFP4_B tensor serialization
- PR: #6545
Eltwise mul for different batch size
- PR: #6587
#6575: Split docs into separate Metalium and nn docs
- PR: #6666
#0: Add two separate links for documentation (tt-metalium/ttnn) on README
- PR: #6697
#6361: Update ttnn repeat to use correct shapes when formatting output
- PR: #6526
#0: Sayonaraaaaaaa
- PR: #6702
FD2.0 Test fix test_prefetcher add_paged_dram_data_to_worker_data dropping start_page
- PR: #6703
#5785: Watcher ringbuffer implementation
- PR: #6652
Add FD 2.0 WriteHost Command
- PR: #6614
#0: Put back frequent api tests because I'm an idiot
- PR: #6698
Optimize All Gather Interleaved Worker send/receive
- PR: #6706
#0: changing all #include common/* to #include tt_metal/common/*
- PR: #6669
#6676: Fix issues related to unary lte and gte
- PR: #6685
#5817: Fix lerp
- PR: #6630
#6589: Fix for relu_bw
- PR: #6631
#6633: Backward test update
- PR: #6679
#0: Skip logit, logiteps test
- PR: #6714
#0: Testing CI fix
- PR: #6708
#5480: Update test_prefetcher to pass added hugepage args to dispatch kernel
- PR: #6717
Fix l1 acc, add whb0 optimized conv tests
- PR: #6668
Alignment fix for eth core kernels
- PR: #6696
Add data parallel (multi-chip) for Falcon7b (prefill/decode) model and corresponding tests
- PR: #6656
CQ_DISPATCH_CMD_WRITE_PAGED support in test_dispatcher and passing tests
- PR: #6641
#6647: disable failing ci cpp tests and reenable cpp pipeline on CI
- PR: #6704
Backward test updates
- PR: #6692
Ngrujic/check bugs
- PR: #6688
Add Llama matmul perf tests to main
- PR: #6690
TTLIB: removing working tests from broken
- PR: #6718
#6443: Update backward asin and addcdiv logic
- PR: #6715
#0: Fix output cb size calculation in reshard op for bfp8b
- PR: #6739
#0: use smart ptrs in allocator
- PR: #6719
Jvasilje docs 0322
- PR: #6745
DRAM based device profiler with Tracy support
- PR: #6460
#6553: Fix ttnn.reshape(..) handling for bfloat16, TILE_LAYOUT
- PR: #6746
Add Llama2 demo to tt-metal docs
- PR: #6682
Mistral-7B WH demo
- PR: #6501
Revert "#0: Put back frequent api tests because I'm an idiot"
- PR: #6755
FP32 support
- PR: #6747
#0: Add back frequent api tests to run.sh
- PR: #6756
Bteng/watcher ci3
- PR: #6530
Remove cpuprof
- PR: #6758
logo update
- PR: #6762
#6184: sharded row major silu support.
- PR: #6643
#6443: Update div_bw and backward ops test file
- PR: #6742
#6705: Relax forcing of keyword argument in ttnn.open_device
- PR: #6707
Forward op tests
- PR: #6730
#6691: Allow blocking of inner dim within a core for sharded in0 for 2d and 1d systolic matmuls
- PR: #6640
#6662: Width Sharding support for eltwise OP
- PR: #6671
Stable diffusion python API level perf improvements
- PR: #6681
Add get_compute_kernel_config_args function
- PR: #6768
#0: Add fd-2/main triggers for pull_request and push for post-commit
- PR: #6709
#5480: FD2 refactor for pre/dis patch variants
- PR: #6655
#6654: Add perf tests for ttnn ResNet50
- PR: #6673
#5480: Fix fd gtest unit test test_write_host
- PR: #6778
#0: Set myself as setup.py owner
- PR: #6779
#6780: Add mistral7b to demos list in getting started
- PR: #6781
#4003: re-added TTNN_ENABLE_LOGGING as runtime flag
- PR: #6750
#0: Fix semaphore address gen bug
- PR: #6233
#6769: Disable program caching for failing Llama tests.
- PR: #6770
#5480: Fix zero sized write transaction request that could occur in write_linear_host
- PR: #6784
#6077: Fix unet pcc issues
- PR: #6660
Remove DstSync from llk api templates
- PR: #6753
FP32 Support
- PR: #6785
#6680: Reverting move op change
- PR: #6811
#6443: Update asinh and softsign backward
- PR: #6773
Backward tests with updated test modules
- PR: #6765
Ngrujic/check bugs 1
- PR: #6734
#6654: Moving init for self.compute_kernel_config
- PR: #6782
#6805: reproduce the bug with sharded split_query_key_value_and_split_heads
- PR: #6806
#6832: Account for tile-padding in softmax for mistral 7B
- PR: #6833
Enable support for uint32 format to be consumed by SFPU (issue #4624)
- PR: #6796
#4252: fix clang build error since std::log2 only constexpr in gcc
- PR: #6835
#4003: log, debug and add pre- and post- hooks only for top-level ttnn ops
- PR: #6841
#6823: Fix core count to not include dispatch cores in op report
- PR: #6831
#6197: Align pages for interleaved <-> sharded.
- PR: #6828
METALIUM_GUIDE
- PR: #6846
Bteng/watcher post commit
- PR: #6760
#6443: update backward test file for relational ops and concat op
- PR: #6817
Revert "Bteng/watcher post commit"
- PR: #6866
#6443: Update backward ops
- PR: #6826
Backward test updates
- PR: #6822
#0: Add the dim 0 support repeat backward
- PR: #5596
Update hard related test ops
- PR: #6816
#6757: Remove set_profiler_location
- PR: #6824
#6443: Update backward ops erfinv elu hypot cos sin
- PR: #6827
#6861: Enable Watcher/dprint tests on T3000 CI
- PR: #6869
Update Mistral perf regression for CI, until issue is resolved
- PR: #6883
Mamba/perf v1
- PR: #6744
#0: remove data movement ops related to silu in SD
- PR: #6798
#4003: added proper fallback for getitem of ttnn.Tensor. Slice the tensor only on the tile boundary but set the shape based on whatever user provided
- PR: #6886
#4003: added proper fallbacks for every op that falls back to torch
- PR: #6888
#6731: add fix to LN width sharding
- PR: #6891
#5797: add back sweep test for ln
- PR: #6893
Integrate GroupNorm V2 to SD model
- PR: #6862
METALIUM_GUIDE.md updates
- PR: #6863
[Falcon7b] Fix bugs with inference throughput measurements in demo
- PR: #6884
#0: shallow unet add perf_mode
- PR: #6904
#6154: 2d matmul in0 height, in1 width sharding
- PR: #6821
#5249: Various Falcon40b test and demo cleanup
- PR: #6764
#0: fix incremental build
- PR: #6914
#0: remove upsample spill to DRAM
- PR: #6905
[Llama2 Prefill] Model Functionality completed
- PR: #6800
Watcher alignment checking for PCIe/DRAM <-> L1
- PR: #6901
#6920: fixed the error in whisper
- PR: #6921
Update METALIUM_GUIDE.md
- PR: #6902
#6644: save l1 buffers to database
- PR: #6856
Update usage.rst
- PR: #6929
#6804: fix ttnn falcon7b demo regression + add to CI regressions
- PR: #6924
#6285: Add backward support for floor round and div_no_nan
- PR: #6290
[skip ci] Update INSTALLING.md
- PR: #6936
#6873: Add more test combinations to tt_lib sweeps add, add_unary, su…
- PR: #6887
Ngrujic/check bugs 3
- PR: #6951
#6882: Updated Mistral-7b perf estimate
- PR: #6892
#6850: Update install links in Sphinx docs to point directly to INSTALLING.md
- PR: #6953
#6619: Fix per op profiler sum
- PR: #6955
#6644: sync before calling print l1 buffers
- PR: #6958
Barsic/ttlib ops check
- PR: #6772
Barsic/ttlib params fix
- PR: #6944
#6962: Move cd tt-metal earlier in the command list of INSTALLING.md
- PR: #6966
#6819: Add support for CreateKernel absolute file paths
- PR: #6922
#6356: Remove half-half grid logic for bmms
- PR: #6968
#4003: added a flag to disable ttnn fallbacks. Don't throw an error w…
- PR: #6961
#0: Correct FW versions, tt-smi versions, and add note about tt-topology
- PR: #6971
#0: Capitalize tt to TT consistently for marketing
- PR: #6973
#0: Add myself as CODEOWNER for INSTALLING.md
- PR: #6974
#6644: ttnn visualizer
- PR: #6935
#6847: Allow disabling individual watcher features
- PR: #6855
#6889: Support printing/padding/tilizing multi-device tensors
- PR: #6976
#4003: removed ttnn.print_l1_buffers and consolidated all ttnn flags into a CONFIG class
- PR: #6980
#6217: tt_lib async mode support (single chip tensors supported)
- PR: #6700
Reshard With Ranges
- PR: #6919
#4003: updated buffer report to show the input/output tensors, buffer report of the previous operation and the buttons to go to the reports of previous/next operations. Load ttnn.CONFIG from a json file and override it using a single environment variable
- PR: #6994
#4003: disable all tests in test_reports
- PR: #6997
New TTNN sweeps
- PR: #6632
#0: Put sfpi/ CODEOWNERS directive back on separate line because I'm an idiot and broke it
- PR: #7002
#6957: Upload artifacts regardless of the device perf results
- PR: #6975
#5592: Optimize Falcon 7b lm head matmul
- PR: #6956
#4003: set delete_reports_on_start to false in the visualizer
- PR: #7005
#6969: Split watcher noc alignment checks for reads vs writes
- PR: #6979
#7012: Add support for sharding in Mamba model
- PR: #7011
#6217: Async Mode Changes
- PR: #7010
#6886: ttnn slicing bug for padded input
- PR: #6999
#7023: Use bfloat8 weights in Mamba block MLPs
- PR: #7024
#6937: Silu fix for multiple calls. Bug fix. Some name changes.
- PR: #7022
#6306: Enable N150,N300 ttnn unit tests in CI Regressions; disable failing ones
- PR: #7016
Fix minor grammatical errors in METALIUM-GUIDE.md
- PR: #7027
#4003: ttnn visualizer
- PR: #7025
#4003: re-enabled test_reports
- PR: #7034
Sharded attention in stable diffusion.
- PR: #7013
#7041: GS watcher error
- PR: #7042
#7041: GS watcher error
- PR: #7043
#0: update path to watcher.log
- PR: #7046
Ngrujic/check bugs
- PR: #7001
build C++ tests in release mode
- PR: #7053
#6443: Update backward ops
- PR: #6877
#6443: Update backward ops
- PR: #6946
#6443: Update backward ops
- PR: #6912
[skip ci] Update CODEOWNERS
- PR: #7029
frequent pipeline updates
- PR: #7055
Clean up Mamba unit tests and configs
- PR: #7062
#6873: TTLIB modified sweeps GS and WH
- PR: #7004
#6443: Update Unary Div backward
- PR: #6878
More aggressive deallocation, fewer spills to DRAM.
- PR: #7076
#4003: use reports_path instead of tmp_path
- PR: #7074
#6838: Add tracy timeout for op reports
- PR: #6852
#6873: Add more sweep combinations for tt_lib bcast and sum operations
- PR: #7060
#0: Add link to programming guide (METALIUM_GUIDE.md) instead of the bad paragraph we had before
- PR: #7093
#5489: re-enable profiler regression on N300
- PR: #7079
TTNN sweep tests - zeros, zeros like, nextafter, empty, attention softmax inplace
- PR: #6551
