v0.46.0

@github-actions github-actions released this 05 Apr 13:57
· 1819 commits to main since this release

📦 Uncategorized

  • user-triggerable C++ post-commit suite
  • #6406: add missing position_ids/attention_mask to bert demo
  • #6282: Add AdamW
  • #6315: Fix dprint tests for T3000
  • FD2: prefetch stall, dispatch wait, linear read, delay and cleanup
  • #6609: update wording in demo section of main README.md
  • #6364: Autocomplete for pybinded types
  • Asarje/ttnn rn50 b20
  • FD2.0 Test - Fix l1 buffer not page-size aligned after FD-on-eth changes to L1_UNRESERVED_BASE
  • #6593: Add resharding to Llama2 model when possible.
  • #6572: Fix ttnn.repeat_interleave example in documentation
  • #5780: Re-enable 100K enqueue program stress test on grayskull
  • Enable basic width sharding support in all-gather
  • Alex/metal/remove cb wait markers
  • #6657: Use sysmem manager cq size instead of recomputing it each time…
  • #0: (MINOR) Add Grayskull purchase link and update version to 0.46.0
  • #5063: add TopK API to metal
  • #5480: FD2.0 Test - Fix test_prefetcher for dram paged read test (-t 3) on whb0
  • Fix logit low pcc
  • Backward op - Fixed ldexp, hardsigmoid and asin
  • #6598: Fix softplus
  • Add support for BFP4_B tensor serialization
  • Eltwise mul for different batch size
  • #6575: Split docs into separate Metalium and nn docs
  • #0: Add two separate links for documentation (tt-metalium/ttnn) on README
  • #6361: Update ttnn repeat to use correct shapes when formatting output
  • #0: Sayonaraaaaaaa
  • FD2.0 Test fix test_prefetcher add_paged_dram_data_to_worker_data dropping start_page
  • #5785: Watcher ringbuffer implementation
  • Add FD 2.0 WriteHost Command
  • #0: Put back frequent api tests because I'm an idiot
  • Optimize All Gather Interleaved Worker send/receive
  • #0: changing all #include common/* to #include tt_metal/common/*
  • #6676: Fix issues related to unary lte and gte
  • #5817: Fix lerp
  • #6589: Fix for relu_bw
  • #6633: Backward test update
  • #0: Skip logit, logiteps test
  • #0: Testing CI fix
  • #5480: Update test_prefetcher to pass added hugepage args to dispatch kernel
  • Fix l1 acc, add whb0 optimized conv tests
  • Alignment fix for eth core kernels
  • Add data parallel (multi-chip) for Falcon7b (prefill/decode) model and corresponding tests
  • CQ_DISPATCH_CMD_WRITE_PAGED support in test_dispatcher and passing tests
  • #6647: disable failing ci cpp tests and reenable cpp pipeline on CI
  • Backward test updates
  • Ngrujic/check bugs
  • Add Llama matmul perf tests to main
  • TTLIB: removing working tests from broken
  • #6443: Update backward asin and addcdiv logic
  • #0: Fix output cb size calculation in reshard op for bfp8b
  • #0: use smart ptrs in allocator
  • Jvasilje docs 0322
  • DRAM based device profiler with Tracy support
  • #6553: Fix ttnn.reshape(..) handling for bfloat16, TILE_LAYOUT
  • PR: #6746
  • Add Llama2 demo to tt-metal docs
  • Mistral-7B WH demo
  • Revert "#0: Put back frequent api tests because I'm an idiot"
  • FP32 support
  • #0: Add back frequent api tests to run.sh
  • Bteng/watcher ci3
  • Remove cpuprof
  • logo update
  • #6184: sharded row major silu support.
  • #6443: Update div_bw and backward ops test file
  • #6705: Relax forcing of keyword argument in ttnn.open_device
  • Forward op tests
  • #6691: Allow blocking of inner dim within a core for sharded in0 for 2d and 1d systolic matmuls
  • #6662: Width Sharding support for eltwise OP
  • Stable diffusion python API level perf improvements
  • Add get_compute_kernel_config_args function
  • #0: Add fd-2/main triggers for pull_request and push for post-commit
  • #5480: FD2 refactor for pre/dis patch variants
  • #6654: Add perf tests for ttnn ResNet50
  • #5480: Fix fd gtest unit test test_write_host
  • #0: Set myself as setup.py owner
  • #6780: Add mistral7b to demos list in getting started
  • #4003: re-added TTNN_ENABLE_LOGGING as runtime flag
  • #0: Fix semaphore address gen bug
  • #6769: Disable program caching for failing Llama tests.
  • #5480: Fix zero sized write transaction request that could occur in write_linear_host
  • #6077: Fix unet pcc issues
  • Remove DstSync from llk api templates
  • FP32 Support
  • #6680: Reverting move op change
  • #6443: Update asinh and softsign backward
  • Backward tests with updated test modules
  • Ngrujic/check bugs 1
  • #6654: Moving init for self.compute_kernel_config
  • #6805: reproduce the bug with sharded split_query_key_value_and_split_heads
  • #6832: Account for tile-padding in softmax for mistral 7B
  • Enable support for uint32 format to be consumed by SFPU (issue #4624)
  • #4252: fix clang build error since std::log2 only constexpr in gcc
  • #4003: log, debug and add pre- and post- hooks only for top-level ttnn ops
  • #6823: Fix core count to not include dispatch cores in op report
  • #6197: Align pages for interleaved <-> sharded.
  • METALIUM_GUIDE
  • Bteng/watcher post commit
  • #6443: update backward test file for relational ops and concat op
  • Revert "Bteng/watcher post commit"
  • #6443: Update backward ops
  • Backward test updates
  • #0: Add the dim 0 support repeat backward
  • Update hard related test ops
  • #6757: Remove set_profiler_location
  • #6443: Update backward ops erfinv elu hypot cos sin
  • #6861: Enable Watcher/dprint tests on T3000 CI
  • Update Mistral perf regression for CI, until issue is resolved
  • Mamba/perf v1
  • #0: remove data movement ops related to silu in SD
  • #4003: added proper fallback for getitem of ttnn.Tensor. Slice the tensor only on the tile boundary but set the shape based on whatever user provided
  • #4003: added proper fallbacks for every op that falls back to torch
  • #6731: add fix to LN width sharding
  • #5797: add back sweep test for ln
  • Integrate GroupNorm V2 to SD model
  • METALIUM_GUIDE.md updates
  • [Falcon7b] Fix bugs with inference throughput measurements in demo
  • #0: shallow unet add perf_mode
  • #6154: 2d matmul in0 height, in1 width sharding
  • #5249: Various Falcon40b test and demo cleanup
  • #0: fix incremental build
  • #0: remove upsample spill to DRAM
  • [Llama2 Prefill] Model Functionality completed
  • Watcher alignment checking for PCIe/DRAM <-> L1
  • #6920: fixed the error in whisper
  • Update METALIUM_GUIDE.md
  • #6644: save l1 buffers to database
  • Update usage.rst
  • #6804: fix ttnn falcon7b demo regression + add to CI regressions
  • #6285: Add backward support for floor round and div_no_nan
  • [skip ci] Update INSTALLING.md
  • #6873: Add more test combinations to tt_lib sweeps add, add_unary, su…
  • Ngrujic/check bugs 3
  • #6882: Updated Mistral-7b perf estimate
  • #6850: Update install links in Sphinx docs to point directly to INSTALLING.md
  • #6619: Fix per op profiler sum
  • #6644: sync before calling print l1 buffers
  • Barsic/ttlib ops check
  • Barsic/ttlib params fix
  • #6962: Move cd tt-metal earlier in the command list of INSTALLING.md
  • #6819: Add support for CreateKernel absolute file paths
  • #6356: Remove half-half grid logic for bmms
  • #4003: added a flag to disable ttnn fallbacks. Don't throw an error w…
  • #0: Correct FW versions, tt-smi versions, and add note about tt-topology
  • #0: Capitalize tt to TT consistently for marketing
  • #0: Add myself as CODEOWNER for INSTALLING.md
  • #6644: ttnn visualizer
  • #6847: Allow disabling individual watcher features
  • #6889: Support printing/padding/tilizing multi-device tensors
  • #4003: removed ttnn.print_l1_buffers and consolidated all ttnn flags into a CONFIG class
  • #6217: tt_lib async mode support (single-chip tensors supported)
  • Reshard With Ranges
  • #4003: updated buffer report to show the input/output tensors, buffer report of the previous operation and the buttons to go to the reports of previous/next operations. Load ttnn.CONFIG from a json file and override it using a single environment variable
  • #4003: disable all tests in test_reports
  • New TTNN sweeps
  • #0: Put sfpi/ CODEOWNERS directive back on separate line because I'm an idiot and broke it
  • #6957: Upload artifacts regardless of the device perf results
  • #5592: Optimize Falcon 7b lm head matmul
  • #4003: set delete_reports_on_start to false in the visualizer
  • #6969: Split watcher noc alignment checks for reads vs writes
  • #7012: Add support for sharding in Mamba model
  • #6217: Async Mode Changes
  • #6886: ttnn slicing bug for padded input
  • #7023: Use bfloat8 weights in Mamba block MLPs
  • #6937: Silu fix for multiple calls. Bug fix. Some name changes.
  • #6306: Enable N150,N300 ttnn unit tests in CI Regressions; disable failing ones
  • Fix minor grammatical errors in METALIUM_GUIDE.md
  • #4003: ttnn visualizer
  • #4003: re-enabled test_reports
  • Sharded attention in stable diffusion.
  • #7041: GS watcher error
  • #7041: GS watcher error
  • #0: update path to watcher.log
  • Ngrujic/check bugs
  • build C++ tests in release mode
  • #6443: Update backward ops
  • #6443: Update backward ops
  • #6443: Update backward ops
  • [skip ci] Update CODEOWNERS
  • frequent pipeline updates
  • Clean up Mamba unit tests and configs
  • #6873: TTLIB modified sweeps GS and WH
  • #6443: Update Unary Div backward
  • More aggressive deallocation, fewer spills to DRAM.
  • #4003: use reports_path instead of tmp_path
  • #6838: Add tracy timeout for op reports
  • #6873: Add more sweep combinations for tt_lib bcast and sum operations
  • #0: Add link to programming guide (METALIUM_GUIDE.md) instead of the bad paragraph we had before
  • #5489: re-enable profiler regression on N300
  • TTNN sweep tests - zeros, zeros like, nextafter, empty, attention softmax inplace