Fast Resource Sharing in VLSI Routing

Müller, Dirk

Volltext

Dokument öffnen (2.1MB)

Autor

Müller, Dirk

Art der Hochschulschrift

Dissertation

Prüfungsdatum

18.12.2009

Datum der Veröffentlichung

04.02.2010

Erstgutachter

Vygen, Jens

Zweitgutachter

Korte, Bernhard

Grad-verleihende Institutionen

Rheinische Friedrich-Wilhelms-Universität Bonn

Metadaten

Zur Langanzeige

Zitierbare Links

Handle: https://hdl.handle.net/20.500.11811/4503
URN: https://nbn-resolving.org/urn:nbn:de:hbz:5N-19896

Inhalt

Routing is the last major step in chip design. It naturally splits into global and detailed routing: In global routing, an approximate layout of electrical connections is computed, subject to constraints on space and signal propagation delays. Among the feasible solutions, a solution with (near-)optimum power consumption, manufacturing yield or a combination of both is to be found. Detailed routing generates an exact layout subject to constraints on the geometrical arrangement of metal shapes (so-called design rules), restricting search space to corridors defined by the approximate solution obtained in global routing. BonnRoute^®, the routing tool developed at the Reasearch Institute for Discrete Mathematics at the University of Bonn, follows this partitioning and contains a global and a detailed routing module.
In detailed routing, BonnRoute^® discretizes routing space by a track graph in order to simplify the problem. Routing tracks can be left if necessary, e.g. for pin access, but most wires are supposed to run on-track in order to reduce search space. We show how track positions can be found such that routing space is optimally used in this discretized representation. In general, track-to-track distances are not uniform with optimum track positions. We present data structures for representing the track graph and gridless routing space, respectively, which are fast and memory efficient also with irregular track-to-track distances. To achieve this, both data structures have to be aligned with each other in a certain way. With our fast grid data structure for representing the track graph, we obtain a runtime improvement of more than a factor 5 in the core path search algorithm of the BonnRoute^® detailed router.
At global routing scale, most design rules are irrelevant; only minimum spacing requirements are translated into resource consumptions of edge capacities in a global routing graph which is used as a much coarser representation of routing space than in detailed routing. Signal delays, power consumption and manufacturing yield depend non-linearly on the spacing between wires. Because all these dependencies are given by convex functions, we can show that a fractional relaxation of the global routing problem can be formulated as min-max resource sharing problem, which is defined as follows: Given finite sets R of resources and C of customers, a convex set B_c, called block, and a continuous convex function g_c: B_c → (R₊)^R for c ∈ C (ℜ₊ being the set of non-negative real numbers), the task is to find b_c ∈ B_c (c ∈ C) approximately attaining
λ^* := inf { max_r∈R ∑_c∈C (g_c(b_c))_r : b_c ∈ B_c (c ∈ C) }
Given a constant ε₀ = 0 and oracle functions f_c : (R₊)^R → B_c, called block solvers, which for c ∈ C and ω ∈ (R₊)^R return an element b_c ∈ B_c with
ω^Tg_c(b_c) = (1+ε₀) inf_{b∈ B_c} ω^Tg_c(b),
we describe a simple algorithm which solves this problem with an approximation guarantee 1+ε₀+ε for any ε > 0, and whose running time is
O(|C|θρ log |R| (log |R| + ε^-2))
for any fixed ε₀ = 0, whereθ is the time for an oracle call and
ρ := max {1, sup { (g_c(b))_r / λ^* : r ∈ R, c ∈ C, b ∈ B_c } }.
While the fastest previously known algorithms have a linear runtime dependency on |R|, the runtime of our algorithm grows only logarithmically with |R|. In contrast to many previous results which require strong block solvers, i.e. ε₀ = 0 or ε₀ > 0 can be chosen arbitrarily small, our result applies to both strong and weak block solvers, i.e. there is no constraint on the choice of ε₀.
In the context of global routing, customers are sets of pins (called nets) to be connected with each other, blocks are convex hulls of incidence vectors of Steiner forests, and block solvers are implemented by an algorithm for the (group) Steiner tree algorithm in graphs. Our result is of immediate significance for global routing: Here, as in many other practical applications, ρ is bounded by a small constant (in most cases even ρ = 1), and weak block solvers are used to achieve fast runtimes. With our algorithm, it is possible for the first time to find solutions to large instances of the global routing problem with provably near-optimum power consumption or manufacturing yield. In the largest instances today, up to 10 million customers must be coordinated to concurrently use 40 million resources in a (near-)optimum way.
In order to further reduce runtime, we present an efficient parallelized version of our algorithm designed for shared-memory parallel computers. It allows block solvers to work on outdated data, making use of the approximation parameter e to accomodate for communication delays between processors in a shared-memory multi-processor computer. The algorithm detects cases in which solutions returned by a block solver have been computed based on too old data and in these cases schedules the corresponding block solver for sequential recomputation. This makes it possible to prove correctness of the algorithm despite block solvers working with outdated information. It turns out that the time spent on sequential recomputations is sufficiently low to obtain speedups of up to 8x on 8 processors and 14x on 16 processors, in particular on large global routing instances.
We discuss the global routing problem in detail and show how it can be (fractionally) solved by the resource sharing algorithm developed in this thesis. As we are interested in an integral solution, i.e. a Steiner forest for each net, we combine our algorithm with randomized rounding. A feasible integral solution can then be found easily by an iterative refinement procedure in practice. We generalize Steiner forests to tree hierarchies with topology restrictions and present a block solver algorithm which for given resource prices computes a tree hierarchy of minimum cost. This allows to address an important step in hierarchical design methodologies, namely the port assignment at hierarchy boundaries. We discuss many implementation details of the new BonnRoute^® global router developed within the scope of this thesis and propose an approach to implement very fast block solvers by fast tree enumeration. Finally, experimental results on a large number of recent industrial chips demonstrate that optimizing yield or power with our algorithm significantly improves these objectives. Our algorithm reduces critical area, which measures the defect sensitivity of a chip, by up to 30 percent, increasing manufacturing yield correspondingly. Thanks to the efficient parallelization of the resource sharing algorithm, the runtimes to achieve these results are drastically cut down. Even the largest chips with millions of customers and resources can be optimized in a few minutes to a few hours, depending on the optimization objective.

Schlagwörter

Approximation Algorithms, Global Routing, Parallel Algorithms, Packing Problems, Electronic Design Automation

Klassifikation (DDC)

004 Informatik

510 Mathematik

Zitiervorschlag
BibTeX

Müller, Dirk: Fast Resource Sharing in VLSI Routing. - Bonn, 2010. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5N-19896

@phdthesis{handle:20.500.11811/4503,
urn: https://nbn-resolving.org/urn:nbn:de:hbz:5N-19896,
author = {{Dirk Müller}},
title = {Fast Resource Sharing in VLSI Routing},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2010,
month = feb,
note = {Routing is the last major step in chip design. It naturally splits into global and detailed routing: In global routing, an approximate layout of electrical connections is computed, subject to constraints on space and signal propagation delays. Among the feasible solutions, a solution with (near-)optimum power consumption, manufacturing yield or a combination of both is to be found. Detailed routing generates an exact layout subject to constraints on the geometrical arrangement of metal shapes (so-called design rules), restricting search space to corridors defined by the approximate solution obtained in global routing. BonnRoute^®, the routing tool developed at the Reasearch Institute for Discrete Mathematics at the University of Bonn, follows this partitioning and contains a global and a detailed routing module.
In detailed routing, BonnRoute^® discretizes routing space by a track graph in order to simplify the problem. Routing tracks can be left if necessary, e.g. for pin access, but most wires are supposed to run on-track in order to reduce search space. We show how track positions can be found such that routing space is optimally used in this discretized representation. In general, track-to-track distances are not uniform with optimum track positions. We present data structures for representing the track graph and gridless routing space, respectively, which are fast and memory efficient also with irregular track-to-track distances. To achieve this, both data structures have to be aligned with each other in a certain way. With our fast grid data structure for representing the track graph, we obtain a runtime improvement of more than a factor 5 in the core path search algorithm of the BonnRoute^® detailed router.
At global routing scale, most design rules are irrelevant; only minimum spacing requirements are translated into resource consumptions of edge capacities in a global routing graph which is used as a much coarser representation of routing space than in detailed routing. Signal delays, power consumption and manufacturing yield depend non-linearly on the spacing between wires. Because all these dependencies are given by convex functions, we can show that a fractional relaxation of the global routing problem can be formulated as min-max resource sharing problem, which is defined as follows: Given finite sets R of resources and C of customers, a convex set B_c, called block, and a continuous convex function g_c: B_c → (R₊)^R for c ∈ C (ℜ₊ being the set of non-negative real numbers), the task is to find b_c ∈ B_c (c ∈ C) approximately attaining
λ^* := inf { max_r∈R ∑_c∈C (g_c(b_c))_r : b_c ∈ B_c (c ∈ C) }
Given a constant ε₀ = 0 and oracle functions f_c : (R₊)^R → B_c, called block solvers, which for c ∈ C and ω ∈ (R₊)^R return an element b_c ∈ B_c with
ω^Tg_c(b_c) = (1+ε₀) inf_{b∈ B_c} ω^Tg_c(b),
we describe a simple algorithm which solves this problem with an approximation guarantee 1+ε₀+ε for any ε > 0, and whose running time is
O(|C|θρ log |R| (log |R| + ε^-2))
for any fixed ε₀ = 0, whereθ is the time for an oracle call and
ρ := max {1, sup { (g_c(b))_r / λ^* : r ∈ R, c ∈ C, b ∈ B_c } }.
While the fastest previously known algorithms have a linear runtime dependency on |R|, the runtime of our algorithm grows only logarithmically with |R|. In contrast to many previous results which require strong block solvers, i.e. ε₀ = 0 or ε₀ > 0 can be chosen arbitrarily small, our result applies to both strong and weak block solvers, i.e. there is no constraint on the choice of ε₀.
In the context of global routing, customers are sets of pins (called nets) to be connected with each other, blocks are convex hulls of incidence vectors of Steiner forests, and block solvers are implemented by an algorithm for the (group) Steiner tree algorithm in graphs. Our result is of immediate significance for global routing: Here, as in many other practical applications, ρ is bounded by a small constant (in most cases even ρ = 1), and weak block solvers are used to achieve fast runtimes. With our algorithm, it is possible for the first time to find solutions to large instances of the global routing problem with provably near-optimum power consumption or manufacturing yield. In the largest instances today, up to 10 million customers must be coordinated to concurrently use 40 million resources in a (near-)optimum way.
In order to further reduce runtime, we present an efficient parallelized version of our algorithm designed for shared-memory parallel computers. It allows block solvers to work on outdated data, making use of the approximation parameter e to accomodate for communication delays between processors in a shared-memory multi-processor computer. The algorithm detects cases in which solutions returned by a block solver have been computed based on too old data and in these cases schedules the corresponding block solver for sequential recomputation. This makes it possible to prove correctness of the algorithm despite block solvers working with outdated information. It turns out that the time spent on sequential recomputations is sufficiently low to obtain speedups of up to 8x on 8 processors and 14x on 16 processors, in particular on large global routing instances.
We discuss the global routing problem in detail and show how it can be (fractionally) solved by the resource sharing algorithm developed in this thesis. As we are interested in an integral solution, i.e. a Steiner forest for each net, we combine our algorithm with randomized rounding. A feasible integral solution can then be found easily by an iterative refinement procedure in practice. We generalize Steiner forests to tree hierarchies with topology restrictions and present a block solver algorithm which for given resource prices computes a tree hierarchy of minimum cost. This allows to address an important step in hierarchical design methodologies, namely the port assignment at hierarchy boundaries. We discuss many implementation details of the new BonnRoute^® global router developed within the scope of this thesis and propose an approach to implement very fast block solvers by fast tree enumeration. Finally, experimental results on a large number of recent industrial chips demonstrate that optimizing yield or power with our algorithm significantly improves these objectives. Our algorithm reduces critical area, which measures the defect sensitivity of a chip, by up to 30 percent, increasing manufacturing yield correspondingly. Thanks to the efficient parallelization of the resource sharing algorithm, the runtimes to achieve these results are drastically cut down. Even the largest chips with millions of customers and resources can be optimized in a few minutes to a few hours, depending on the optimization objective.},
url = {https://hdl.handle.net/20.500.11811/4503}
}

Die folgenden Nutzungsbestimmungen sind mit dieser Ressource verbunden: