Commit

Update docs from d068cf9
olivedevteam committed Apr 18, 2024
1 parent e9370fc commit bc6459a
Showing 6 changed files with 28 additions and 16 deletions.
1 change: 0 additions & 1 deletion _sources/examples.md.txt
@@ -11,7 +11,6 @@
||red pajama|[Link](https://github.com/microsoft/Olive/tree/main/examples/red_pajama)| `CPU`: with Optimum conversion and merging and ONNX Runtime optimizations for a single optimized ONNX model
||bert|[Link](https://github.com/microsoft/Olive/tree/main/examples/bert)|`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model<br>`CPU`: with ONNX Runtime optimizations and Intel® Neural Compressor quantization for optimized INT8 ONNX model<br>`CPU`: with PyTorch QAT Customized Training Loop and ONNX Runtime optimizations for optimized ONNX INT8 model<br>`GPU`: with ONNX Runtime optimizations for CUDA EP<br>`GPU`: with ONNX Runtime optimizations for TRT EP
||deberta|[Link](https://github.com/microsoft/Olive/tree/main/examples/deberta)|`GPU`: Optimize Azureml Registry Model with ONNX Runtime optimizations and quantization
-||dolly_v2|[Link](https://github.com/microsoft/Olive/tree/main/examples/directml/dolly_v2)|`GPU`: with Optimum conversion and merging and ONNX Runtime optimizations with DirectML EP
||gptj|[Link](https://github.com/microsoft/Olive/tree/main/examples/gptj)|`CPU`: with Intel® Neural Compressor static/dynamic quantization for INT8 ONNX model
|Audio|whisper|[Link](https://github.com/microsoft/Olive/tree/main/examples/whisper)|`CPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br>`CPU`: with ONNX Runtime optimizations for all-in-one ONNX model in INT8<br>`CPU`: with ONNX Runtime optimizations and Intel® Neural Compressor Dynamic Quantization for all-in-one ONNX model in INT8<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP16<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in INT8
||audio spectrogram<br>transformer|[Link](https://github.com/microsoft/Olive/tree/main/examples/AST)|`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model
9 changes: 9 additions & 0 deletions api/passes.html
@@ -444,6 +444,15 @@
<p><strong>searchable_values:</strong> None</p>
</dd></dl>

+<dl class="std option">
+<dt class="sig sig-object std" id="cmdoption-arg-num_key_value_heads">
+<span class="sig-name descname"><span class="pre">num_key_value_heads</span></span><span class="sig-prename descclassname"></span><a class="headerlink" href="#cmdoption-arg-num_key_value_heads" title="Permalink to this definition"></a></dt>
+<dd><p>Number of key/value attention heads.</p>
+<p><strong>type:</strong> int</p>
+<p><strong>default_value:</strong> 0</p>
+<p><strong>searchable_values:</strong> None</p>
+</dd></dl>
+
<dl class="std option">
<dt class="sig sig-object std" id="cmdoption-arg-hidden_size">
<span class="sig-name descname"><span class="pre">hidden_size</span></span><span class="sig-prename descclassname"></span><a class="headerlink" href="#cmdoption-arg-hidden_size" title="Permalink to this definition"></a></dt>
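The api/passes.html hunk above documents a new `num_key_value_heads` option (type `int`, default `0`). As a minimal sketch of where such an option could appear — the pass name `OrtTransformersOptimization` and the surrounding workflow-config shape are assumptions based on typical Olive configuration files, not taken from this commit — a grouped-query-attention model's pass config might look like:

```python
import json

# Hypothetical Olive-style pass configuration (a sketch, not from this commit).
# `num_key_value_heads` is the option documented in the hunk above:
# type int, default 0 (per the documented default_value).
pass_config = {
    "transformer_optimization": {
        "type": "OrtTransformersOptimization",  # assumed pass name
        "config": {
            "num_heads": 32,           # query attention heads
            "num_key_value_heads": 8,  # fewer KV heads than query heads (grouped-query attention)
            "hidden_size": 4096,
        },
    }
}

# Serialize as it would appear in a JSON workflow file.
print(json.dumps(pass_config, indent=2))
```

A model using ordinary multi-head attention would simply leave `num_key_value_heads` at its documented default of 0.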
23 changes: 9 additions & 14 deletions examples.html
@@ -184,56 +184,51 @@ <h1>Examples<a class="headerlink" href="#examples" title="Permalink to this head
<td><p><code class="docutils literal notranslate"><span class="pre">GPU</span></code>: Optimize Azureml Registry Model with ONNX Runtime optimizations and quantization</p></td>
</tr>
<tr class="row-odd"><td><p></p></td>
-<td><p>dolly_v2</p></td>
-<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/directml/dolly_v2">Link</a></p></td>
-<td><p><code class="docutils literal notranslate"><span class="pre">GPU</span></code>: with Optimum conversion and merging and ONNX Runtime optimizations with DirectML EP</p></td>
-</tr>
-<tr class="row-even"><td><p></p></td>
<td><p>gptj</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/gptj">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with Intel® Neural Compressor static/dynamic quantization for INT8 ONNX model</p></td>
</tr>
-<tr class="row-odd"><td><p>Audio</p></td>
+<tr class="row-even"><td><p>Audio</p></td>
<td><p>whisper</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/whisper">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with ONNX Runtime optimizations for all-in-one ONNX model in INT8<br><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with ONNX Runtime optimizations and Intel® Neural Compressor Dynamic Quantization for all-in-one ONNX model in INT8<br><code class="docutils literal notranslate"><span class="pre">GPU</span></code>: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br><code class="docutils literal notranslate"><span class="pre">GPU</span></code>: with ONNX Runtime optimizations for all-in-one ONNX model in FP16<br><code class="docutils literal notranslate"><span class="pre">GPU</span></code>: with ONNX Runtime optimizations for all-in-one ONNX model in INT8</p></td>
</tr>
-<tr class="row-even"><td><p></p></td>
+<tr class="row-odd"><td><p></p></td>
<td><p>audio spectrogram<br>transformer</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/AST">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model</p></td>
</tr>
-<tr class="row-odd"><td><p>Vision</p></td>
+<tr class="row-even"><td><p>Vision</p></td>
<td><p>stable diffusion <br> stable diffusion XL</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">GPU</span></code>: with ONNX Runtime optimization for DirectML EP<br><code class="docutils literal notranslate"><span class="pre">GPU</span></code>: with ONNX Runtime optimization for CUDA EP<br><code class="docutils literal notranslate"><span class="pre">Intel</span> <span class="pre">CPU</span></code>: with OpenVINO toolkit</p></td>
</tr>
-<tr class="row-even"><td><p></p></td>
+<tr class="row-odd"><td><p></p></td>
<td><p>squeezenet</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/directml/squeezenet">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">GPU</span></code>: with ONNX Runtime optimizations with DirectML EP</p></td>
</tr>
-<tr class="row-odd"><td><p></p></td>
+<tr class="row-even"><td><p></p></td>
<td><p>mobilenet</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/mobilenet">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">Qualcomm</span> <span class="pre">NPU</span></code>: with ONNX Runtime static QDQ quantization for ONNX Runtime QNN EP</p></td>
</tr>
-<tr class="row-even"><td><p></p></td>
+<tr class="row-odd"><td><p></p></td>
<td><p>resnet</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/resnet">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with ONNX Runtime static/dynamic Quantization for ONNX INT8 model<br><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with PyTorch QAT Default Training Loop and ONNX Runtime optimizations for ONNX INT8 model<br><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with PyTorch QAT Lightning Module and ONNX Runtime optimizations for ONNX INT8 model<br><code class="docutils literal notranslate"><span class="pre">AMD</span> <span class="pre">DPU</span></code>: with AMD Vitis-AI Quantization<br><code class="docutils literal notranslate"><span class="pre">Intel</span> <span class="pre">GPU</span></code>: with ONNX Runtime optimizations with multiple EPs</p></td>
</tr>
-<tr class="row-odd"><td><p></p></td>
+<tr class="row-even"><td><p></p></td>
<td><p>VGG</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/vgg">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">Qualcomm</span> <span class="pre">NPU</span></code>: with SNPE toolkit</p></td>
</tr>
-<tr class="row-even"><td><p></p></td>
+<tr class="row-odd"><td><p></p></td>
<td><p>inception</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/inception">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">Qualcomm</span> <span class="pre">NPU</span></code>: with SNPE toolkit</p></td>
</tr>
-<tr class="row-odd"><td><p></p></td>
+<tr class="row-even"><td><p></p></td>
<td><p>super resolution</p></td>
<td><p><a class="reference external" href="https://github.com/microsoft/Olive/tree/main/examples/super_resolution">Link</a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">CPU</span></code>: with ONNX Runtime pre/post processing integration for a single ONNX model</p></td>
9 changes: 9 additions & 0 deletions genindex.html
@@ -658,6 +658,8 @@ <h2 id="C">C</h2>
<li><a href="api/passes.html#cmdoption-arg-num_epochs">num_epochs</a>
</li>
<li><a href="api/passes.html#cmdoption-arg-num_heads">num_heads</a>
</li>
+<li><a href="api/passes.html#cmdoption-arg-num_key_value_heads">num_key_value_heads</a>
+</li>
<li><a href="api/search-algorithms.html#cmdoption-arg-0">num_samples</a>, <a href="api/search-algorithms.html#cmdoption-arg-num_samples">[1]</a>
</li>
@@ -1818,6 +1820,13 @@ <h2 id="N">N</h2>

<ul>
<li><a href="api/passes.html#cmdoption-arg-num_heads">command line option</a>
</li>
</ul></li>
+<li>
+num_key_value_heads
+
+<ul>
+<li><a href="api/passes.html#cmdoption-arg-num_key_value_heads">command line option</a>
+</li>
+</ul></li>
<li>
Binary file modified objects.inv
Binary file not shown.
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
