
coremltools 8.0

Released by @junpeiz on 16 Sep 20:50

Release Notes

Changes compared to 7.2 (including features from 8.0b1 and 8.0b2)

  • Support for the latest dependencies
    • Compatible with the latest protobuf Python package, which improves serialization latency.
    • Supports torch 2.4.0, numpy 2.0, and scikit-learn 1.5.
  • Support for stateful Core ML models
    • Updates the converter to produce Core ML models with the new state type (introduced in iOS 18 / macOS 15); see the stateful conversion sketch after this list.
    • Adds a toy stateful attention example model that shows how to use an in-place KV cache.
  • Increased conversion support coverage for models produced by torch.export (see the torch.export sketch after this list)
    • Op translation support is at 56% parity with our mature torch.jit.trace converter.
    • Representative deep learning models (mobilebert, deeplab, edsr, mobilenet, vit, inception, resnet, wav2letter, emformer) are supported.
    • Representative foundation models (llama, stable diffusion) are supported.
    • Models quantized by ct.optimize.torch can be exported with torch.export and then converted.
  • New compression features
    • coremltools.optimize
      • Supports compression at more granularities: blockwise quantization and grouped channel-wise palettization (see the compression sketch after this list).
      • 4-bit weight quantization and 3-bit palettization.
      • Supports joint compression modes (8-bit look-up tables for palettization, pruning combined with quantization or palettization).
      • Vector palettization by setting cluster_dim > 1, and palettization with per-channel scale by setting enable_per_channel_scale=True.
      • Experimental activation quantization (takes a W16A16 Core ML model and produces a W8A8 model).
      • API updates for coremltools.optimize.coreml and coremltools.optimize.torch.
    • Supports some models quantized by torchao (including ops produced by torchao, such as _weight_int4pack_mm).
    • Supports more ops in the quantized_decomposed namespace, such as embedding_4bit.
  • Support for new ops and bug fixes for existing ops
    • Compression-related ops: constexpr_blockwise_shift_scale, constexpr_lut_to_dense, constexpr_sparse_to_dense, etc.
    • Updates to the GRU op.
    • The SDPA op scaled_dot_product_attention.
    • The clip op.
  • Updated the model loading API (see the loading sketch after this list)
    • Supports optimizationHints.
    • Supports loading a specific function for prediction.
  • New utilities in coremltools.utils
    • coremltools.utils.MultiFunctionDescriptor and coremltools.utils.save_multifunction, for creating an mlprogram with multiple functions that can share weights (see the multifunction sketch after this list).
    • coremltools.models.utils.bisect_model can break a large Core ML model into two smaller models of similar size (sketched after this list).
    • coremltools.models.utils.materialize_dynamic_shape_mlmodel can convert a flexible-input-shape model into a static-input-shape model.
  • Various other bug fixes, enhancements, cleanups, and optimizations
  • Special thanks to our external contributors for this release: @sslcandoit @FL33TW00D @dpanshu @timsneath @kasper0406 @lamtrinhdev @valfrom @teelrabbit @igeni @Cyanosite
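
Example sketches

A minimal sketch of converting a stateful torch model, using a toy accumulator buffer as a stand-in for a real KV cache. This is illustrative rather than the release's own example: the module, shapes, and names are placeholders, and the state name must match the registered buffer name.

```python
import numpy as np
import torch
import coremltools as ct


class Accumulator(torch.nn.Module):
    """Toy stateful module: adds each input into a persistent buffer in place."""

    def __init__(self):
        super().__init__()
        self.register_buffer("accumulator", torch.zeros(1, 10))

    def forward(self, x):
        self.accumulator.add_(x)       # in-place state update, like a kv-cache write
        return self.accumulator + 0.0  # return the current state as a regular output


traced = torch.jit.trace(Accumulator().eval(), torch.zeros(1, 10))

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1, 10), dtype=np.float16)],
    # Each StateType is matched to a registered buffer by name.
    states=[
        ct.StateType(
            wrapped_type=ct.TensorType(shape=(1, 10), dtype=np.float16),
            name="accumulator",
        )
    ],
    minimum_deployment_target=ct.target.iOS18,  # the state type requires iOS 18 / macOS 15
)

# At prediction time, the state is created once and passed to every call.
state = mlmodel.make_state()
out = mlmodel.predict({"x": np.ones((1, 10), dtype=np.float16)}, state=state)
```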
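A sketch of converting a model exported with torch.export instead of torch.jit.trace; the torchvision model here is only a stand-in for the architectures listed above.

```python
import torch
import torchvision
import coremltools as ct

model = torchvision.models.mobilenet_v3_small(weights=None).eval()
example_inputs = (torch.rand(1, 3, 224, 224),)

# torch.export produces an ExportedProgram, which ct.convert accepts directly.
exported_program = torch.export.export(model, example_inputs)
mlmodel = ct.convert(exported_program)
mlmodel.save("mobilenet_v3_small.mlpackage")
```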
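A sketch of the new post-training compression granularities applied to an existing mlprogram. The model path is a placeholder and the block_size/group_size values are arbitrary choices; check the coremltools.optimize.coreml documentation for the exact options supported for your model.

```python
import coremltools as ct
import coremltools.optimize.coreml as cto_coreml

mlmodel = ct.models.MLModel("model.mlpackage")  # placeholder path

# Blockwise 4-bit weight quantization.
quant_config = cto_coreml.OptimizationConfig(
    global_config=cto_coreml.OpLinearQuantizerConfig(
        mode="linear_symmetric",
        dtype="int4",
        granularity="per_block",
        block_size=32,
    )
)
quantized = cto_coreml.linear_quantize_weights(mlmodel, config=quant_config)

# Grouped channel-wise 3-bit palettization.
palett_config = cto_coreml.OptimizationConfig(
    global_config=cto_coreml.OpPalettizerConfig(
        nbits=3,
        granularity="per_grouped_channel",
        group_size=16,
    )
)
palettized = cto_coreml.palettize_weights(mlmodel, config=palett_config)
```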
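A sketch of the updated loading API. The package path is a placeholder, and the reshapeFrequency/specializationStrategy hint keys and values are the ones I would expect, so verify them against the MLModel documentation.

```python
import coremltools as ct

mlmodel = ct.models.MLModel(
    "combined.mlpackage",      # placeholder path
    function_name="main",      # load one specific function for prediction
    optimization_hints={
        "reshapeFrequency": ct.ReshapeFrequency.Frequent,
        "specializationStrategy": ct.SpecializationStrategy.FastPrediction,
    },
)
```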
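A sketch of merging two single-function mlprograms into one multifunction package with shared weights; the paths and function names are placeholders.

```python
import coremltools as ct
from coremltools.utils import MultiFunctionDescriptor, save_multifunction

desc = MultiFunctionDescriptor()
desc.add_function(
    "model_a.mlpackage", src_function_name="main", target_function_name="func_a"
)
desc.add_function(
    "model_b.mlpackage", src_function_name="main", target_function_name="func_b"
)
desc.default_function_name = "func_a"
save_multifunction(desc, "combined.mlpackage")

# Load one of the merged functions for prediction.
mlmodel = ct.models.MLModel("combined.mlpackage", function_name="func_b")
```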
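A sketch of splitting a large model into two similarly sized chunks with bisect_model; the paths are placeholders and the exact keyword arguments may differ, so treat this as an assumption to verify.

```python
from coremltools.models.utils import bisect_model

# Break a large model into two chunks of roughly equal size (placeholder paths).
bisect_model("large_model.mlpackage", "./chunked_models/")
```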