VLSI - Physical Design: January 2019

Sunday, January 27, 2019

Placement

Placement is the third step in the physical design flow. During placement, all standard cells are placed automatically in the chip core based on timing, die size, and power constraints.

Placement takes place in two steps:

Coarse/Global placement
Legalization/Detail Placement.

During coarse placement, the design is divided into small, equal-sized square boxes called GRCs (global routing cells). Standard cells are dropped into approximate locations and are not yet optimized at this stage. They may not sit on the placement grid and may overlap.

During legalization (the second step), standard cells are snapped onto the placement grid and any overlaps are removed, while optimizing for timing and congestion.
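The legalization step can be pictured with a small sketch. The following is a simplified single-row illustration, not what any particular tool does: the function name `legalize_row`, the cell names, and the greedy left-to-right strategy are all assumptions made for this example. Each cell is snapped to the nearest placement site and pushed right if it would overlap the previous cell.

```python
import math

def legalize_row(cells, site_width, row_start=0):
    """Greedy single-row legalization (illustrative only).

    cells: list of (name, x, width); widths are assumed to be
    multiples of site_width. Returns {name: legal_x}.
    """
    placed = {}
    next_free = row_start  # first x position not yet occupied
    for name, x, width in sorted(cells, key=lambda c: c[1]):
        # Snap the coarse x position to the nearest placement site.
        site_index = math.floor((x - row_start) / site_width + 0.5)
        snapped = row_start + site_index * site_width
        # Remove any overlap by pushing the cell to the first free site.
        legal_x = max(snapped, next_free)
        placed[name] = legal_x
        next_free = legal_x + width
    return placed

# Coarse placement result: u1 and u2 overlap and are off-grid.
coarse = [("u1", 3, 4), ("u2", 5, 4), ("u3", 14, 2)]
print(legalize_row(coarse, site_width=2))
# {'u1': 4, 'u2': 8, 'u3': 14}
```

Real legalizers also weigh timing and congestion when choosing where to move a cell; this sketch only shows the grid snapping and overlap removal.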

If cells are not placed properly, chip performance suffers.

Placement determines the die utilization. A typical utilization of about 60% is reached after placement (depending on technology and design size). Placement must honor certain boundary conditions while placing cells. They are:

1. Timing

2. Design Rule

OBJECTIVES OF PLACEMENT

To minimize the delay of all critical nets.

To minimize the total estimated interconnect length (wire length). This helps minimize cost and chip size.
To minimize power dissipation as far as possible. This involves distributing the standard cells so as to reduce the overall power consumption.
To minimize congestion. Congestion means excessive crowding, which occurs when too many standard cells compete for too little space. It should always be avoided while placing standard cells.
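The wire length objective above is commonly estimated with half-perimeter wire length (HPWL), the half perimeter of the bounding box around a net's pins. HPWL is a standard estimate but is not named in the text above, so treat its use here as an assumption; the net names and coordinates are made up for illustration.

```python
def hpwl(pins):
    """Half-perimeter wire length of one net.

    pins: list of (x, y) pin coordinates.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets):
    """Sum of HPWL over all nets; nets maps net name -> pin list."""
    return sum(hpwl(pins) for pins in nets.values())

nets = {
    "n1": [(0, 0), (3, 4)],          # bounding box 3 x 4 -> 7
    "n2": [(1, 1), (2, 5), (6, 2)],  # bounding box 5 x 4 -> 9
}
print(total_hpwl(nets))  # 16
```

A placer that moves cells closer together shrinks these bounding boxes, which lowers the estimate.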

PLACEMENT OPTIONS AVAILABLE IN THE TOOL ARE

Congestion driven
Timing Driven
For timing-driven placement, the following timing constraints are read from the SDC (Synopsys Design Constraints) file:
Clock period constraint
Input/Output delays
Uncertainty values

POST PLACEMENT CHECKS:

During the post-placement stage, we need to review the timing report. For each path, the timing report lists the start point, the end point, and the logic between them.
Observe the clock delay. Before CTS, the clock should be ideal.
Check that the operating conditions (setup, hold time, etc.) are set.

PLACEMENT OPTIMIZATION:

Before performing placement optimization, we must reserve more than 5% of the targeted final design utilization as free placement space, to ensure that there is room to add buffers and remap logic to meet timing requirements.

The methods generally adopted by the tool for optimization are:

Adding/Deleting Buffers.
Remapping logic.
Resizing gates (upsizing/downsizing).

ADVANTAGES OF UP SIZING:

Increases speed and drive strength.
Upsizing means increasing the width of the transistor, that is, increasing the width of the diffusion. Because upsizing reduces the delay of the standard cell, it helps fix setup violations; downsizing, which adds delay, can help with hold violations.
After upsizing, a standard cell becomes faster; after downsizing, it becomes slower.

SWAPPING

Swapping involves replacing a cell of one threshold voltage (VT) type with a cell of another type.

LVT → MVT → HVT

Advantage of LVT : Faster.

Advantage of HVT: less leakage power.
Disadvantage: LVT cells have more leakage power, so we prefer LVT only in timing-critical parts.

TIMING :
Until the placement and placement optimization stages, the clock is ideal: all clock pins are connected directly to the clock source, and skew, transition, and network delay are assumed to be 0.

Congestion Analysis:

After placement, the first thing we check is congestion (local and global), followed by timing, DRVs (transition, capacitance, and fanout), utilization, etc.

Definition of congestion: when the number of available routing tracks in a particular area is less than the number of tracks required, the area is said to be congested.
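This definition can be expressed as a per-GRC overflow: required tracks minus available tracks, where a positive value marks a congested cell. The sketch below is only an illustration of that arithmetic; the function name `congested_grcs` and the track counts are invented, and real tools report overflow per layer and per direction.

```python
def congested_grcs(demand, supply):
    """Return {grc: overflow} for every GRC whose demand exceeds supply.

    demand, supply: dicts mapping a GRC (row, col) to a track count.
    """
    overflow = {}
    for grc, required in demand.items():
        over = required - supply[grc]
        if over > 0:  # fewer available tracks than required
            overflow[grc] = over
    return overflow

demand = {(0, 0): 12, (0, 1): 8, (1, 0): 15}
supply = {(0, 0): 10, (0, 1): 10, (1, 0): 10}
print(congested_grcs(demand, supply))
# {(0, 0): 2, (1, 0): 5}
```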

Local congestion: when congested cells are concentrated in a particular area, they create local hotspots, i.e. highly congested areas. This is called local congestion.

Reasons: a large number of AOI/OAI cells in a particular area.

High pin density, timing-critical logic, and high-utilization designs.

To do the analysis, we open the congestion map from the congestion tab in the GUI. The whole design is shown divided into GRCs (global routing cells). Each GRC is highlighted in green, yellow, or red.

Here, highly congested areas are highlighted in red.

The reason may be high cell density as well as high pin density. We need to hover the cursor over each red GRC to find the type of cell (whether it is an OAI or AOI cell) and the number of pins on each cell.

OAI and AOI cells have a large number of pins relative to their area; they are also called complex gates.

For more detailed analysis, we can further check in the GUI how the paths are routed. There may be paths with a single start point and multiple endpoints.

Solution: spreading out the cells and cell padding.

To do the cell padding, we filter out the cells with more than 4 pins and add padding to them. We can use set_keepout_margin (the manual way) or a custom script that adds cell padding of 2 or 4 to the filtered cells.
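The filtering step of such a custom script might look like the sketch below. It only selects cells and assigns a padding value; it does not call any tool command, and the function name `choose_padding`, the instance names, and the pin counts are hypothetical.

```python
def choose_padding(pin_counts, pin_threshold=4, padding=2):
    """Pick the cells that should receive padding.

    pin_counts: dict mapping instance name -> number of pins.
    Returns {instance: padding} for cells with more than
    pin_threshold pins.
    """
    return {
        inst: padding
        for inst, pins in pin_counts.items()
        if pins > pin_threshold
    }

# Hypothetical instances: two complex gates and one inverter.
pin_counts = {"u_aoi22": 5, "u_inv": 2, "u_oai221": 6}
print(choose_padding(pin_counts))
# {'u_aoi22': 2, 'u_oai221': 2}
```

The resulting dictionary could then drive whatever padding command the tool provides.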

Global congestion: when high-density (cell/pin) regions are spread all over the design, it is called global congestion.

Reasons: cells of a particular module are not placed together for some reason, such as bad macro placement.

Solution: improve macro placement and use module padding/instance padding.

Padding is a way to fool the tool into leaving room next to a particular cell, so that buffers etc. can be added there during timing optimization.

Thursday, January 24, 2019

Power Planning

Now it is time to plan and create power and ground structures for both I/O pads and core logic.
The I/O pads’ power and ground buses are built into the pad itself and will be connected by abutment.
For core logic, there is a core ring enclosing the core with one or more sets of power and ground rings.
A horizontal metal layer is used to define the top and bottom sides, or any other horizontal segment, while the vertical metal layer is utilized for left, right, and any other vertical segment.
These vertical and horizontal segments are connected through an appropriate via cut.
The next consideration is to construct the standard cell power and ground that is internal to the core logic.
Each of these power and ground strips runs vertically, horizontally, or in both directions.
If these strips run both vertically and horizontally at regular intervals, the style is known as a power mesh.
The total number of strips and the interval distance depend solely on the ASIC core power consumption. As the ASIC core power consumption (dynamic and static) increases, the distance between power and ground strips decreases.
This reduction in the power and ground strip interval is used mainly to reduce overall ASIC voltage drop, thereby improving ASIC design performance.
In addition to the core power and ground ring, macro power and ground rings need to be created using proper vertical and horizontal metal layers.
A macro ring encloses one or more macros, completely or partially, with one or more sets of power and ground rings.
It is strongly recommended to check for power and ground connectivity and/or any physical design rule violations after construction of the entire power and ground network.

The goal of power planning is to provide power to all macros and standard cells within the given IR-drop limit.
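As a rough illustration of what an IR-drop limit means, the sketch below estimates the voltage drop along a single strap from Ohm's law, with the strap resistance taken as sheet resistance times length over width. This is a deliberate simplification that assumes all of the current flows end to end through one strap; the function name and every number are assumptions for the example, not values from a real technology.

```python
def strap_ir_drop(current_a, sheet_res_ohm_sq, length_um, width_um):
    """Voltage drop (V) along one strap carrying current_a amps.

    Resistance = sheet resistance * number of squares (length / width).
    """
    resistance_ohm = sheet_res_ohm_sq * (length_um / width_um)
    return current_a * resistance_ohm

# Assumed values: 4 mA through a 1000 um x 5 um strap at 0.05 ohm/sq.
drop_v = strap_ir_drop(0.004, 0.05, 1000.0, 5.0)
limit_v = 0.05 * 1.0  # assumed budget: 5% of a 1.0 V supply
print(f"drop = {drop_v * 1000:.1f} mV, within limit: {drop_v <= limit_v}")
```

Widening the strap or adding more straps in parallel lowers the resistance, which is why strap width and spacing appear among the power planning inputs.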

Power planning management can be divided into two major categories:

core cell power management
I/O cell power management.

In core cell power planning, power rings are formed around the core and macros.

In I/O cell power planning, power rings are formed for the I/O cells, and trunks are created between the core power ring and the power pads.

Power planning is part of the floorplan stage.

Input Required In Power Planning

1. Database with valid floorplan.

2. Power ring and power strap widths.

3. Spacing between VDD and VSS straps.

Output Of Power Planning

Design with Power Structure

Floorplanning

Once we are done with the initial sanity checks on the PD inputs, the next step is the floorplan.

It includes:

a. Die size estimation

b. IO placement

c. Macro Placement

d. Power routing

The floorplan is a critical step in PnR. A bad floorplan can cause all kinds of issues related to timing, congestion, EM, IR, routing, and noise.
A basic understanding of the design, data flow, and module interactions is a must to come up with an optimum floorplan.
Adding physical cells, special cells, blockages, and the power mesh is part of the flow. Placement of I/Os and macros is crucial and needs thorough understanding and analysis.

For the top-down approach, we get the block size and shape from the top level, whereas in the bottom-up approach we need to come up with the best block size ourselves, based on instance count and other design requirements.

Die size consists of the following important terms:

• Core area = Standard cell area + memory area + macro area

• IO area = Area of all signal pads + area of all power pads

Design could become:

Core limited: where the core logic determines the die size.
Pad limited: where the number of pads determines the die size.
Using both pad limited and core limited for a square die.

Important Formulas:

• Aspect Ratio = Height/Width

• Total Utilization = (Area of standard cells + macros + IO)/ Total area of die

FLOORPLAN CONTROLLING PARAMETERS:

1. Aspect Ratio = Horizontal routing resources/Vertical routing resources= H/V

A.R. =1, defines a square shape for the block

2. Utilization:

Core utilization: is the ratio of area occupied by standard cell, macros and blockages to the total core area. i.e.

Core utilization = (standard cell area+ macro cells area+blockages)/ total core area

A core utilization of 0.8 means that 80% of the core area is occupied by standard cells, macros, and blockages, whereas 20% is left free for routing.
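The core utilization formula can be checked with a quick calculation. The areas below are made-up numbers chosen to reproduce the 0.8 example, and the function name is only for this sketch.

```python
def core_utilization(std_cell_area, macro_area, blockage_area, core_area):
    """Core utilization = (std cells + macros + blockages) / core area."""
    return (std_cell_area + macro_area + blockage_area) / core_area

# Assumed areas in square microns.
util = core_utilization(
    std_cell_area=300_000,
    macro_area=150_000,
    blockage_area=50_000,
    core_area=625_000,
)
print(f"core utilization = {util:.2f}")  # 0.80
```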

STD cell utilization: the ratio of the area occupied by std cells to the total core area plus channel area.

Std cell utilization = std cell area / (total core area + channel area)

For Die Size Estimation

Technology Inputs:

Gate Density per sq. mm = D

Number of Horizontal Layers = H

Number of Vertical Layers = V

Design Inputs:

Gate count (excluding memories, macros & subchips) = G

IO area, in sq. mm = I

Memory + Macros + Subchips area, in sq.mm = M

Target Utilization, in percentage = U %

Additional gate count for CTS, timing closure etc, in percentage = T %

Additional gate count for ECOs, in percentage = E %

Die area calculation:

Die Area in sq.mm = {[(Gate count + Additional gate count for CTS & ECO) / Gate density] + IO area + Mem, Macro area} / Target utilization

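Die Area = {[G * (1 + T/100 + E/100) / D] + I + M} / (U/100)

(T, E and U are percentages, so T and E scale the gate count G, and U is applied as a fraction.)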

Aspect ratio, width, height calculation:

Aspect Ratio: AR = Height / Width

= Number of horizontal resources / Number of vertical resources

AR = H / V (H and V are the horizontal and vertical layer counts defined above)

Width: AR = Height / Width

Height = Width * AR ----- (1)

Area = Width * Height

= Width * Width * AR (expressing Height in terms of Width from (1))

Width^2 = Area / AR

Width = SQRT (Die Area / AR)

Height: Height = Width * AR
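The die size estimate can be worked through numerically. The sketch below assumes that T, E and U are given as percentages, that T and E scale the gate count G, and that the aspect ratio is height / width as defined under Important Formulas. The function names and all input values are invented for the example.

```python
import math

def die_area_mm2(gates, density_per_mm2, io_mm2, macro_mm2,
                 util_pct, cts_pct, eco_pct):
    """Die area = {[G*(1 + T% + E%) / D] + I + M} / U%."""
    total_gates = gates * (1 + cts_pct / 100 + eco_pct / 100)
    logic_mm2 = total_gates / density_per_mm2
    return (logic_mm2 + io_mm2 + macro_mm2) / (util_pct / 100)

def die_dimensions(area_mm2, aspect_ratio):
    """Return (width, height) assuming aspect_ratio = height / width."""
    width = math.sqrt(area_mm2 / aspect_ratio)
    height = width * aspect_ratio
    return width, height

# Assumed inputs: 1M gates, 100k gates/mm^2, 60% target utilization.
area = die_area_mm2(gates=1_000_000, density_per_mm2=100_000,
                    io_mm2=2.0, macro_mm2=4.5,
                    util_pct=60, cts_pct=10, eco_pct=5)
width, height = die_dimensions(area, aspect_ratio=1.2)
print(f"area = {area:.1f} mm^2, width = {width:.2f} mm, height = {height:.2f} mm")
```

With these inputs the logic takes 11.5 sq. mm, IO and macros add 6.5 sq. mm, and dividing by 0.6 gives about 30 sq. mm, or roughly 5 mm by 6 mm at an aspect ratio of 1.2.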

Correct I/O pad placement and selection is important for the correct function of any ASIC design.

For a given ASIC design there are three types of I/O pads: power, ground, and signal.
It is critical to the functional operation of an ASIC design to ensure that the pads have adequate power and ground connections and are placed properly in order to eliminate electromigration and current-switching noise related problems.

BOND PADS

Pad placements are of two types.

▪Inline Bonding:

All pads are of same height.
Generally used in most of the designs.
Can be used when design is not pad limited.

▪Staggered Bonding :

Pads are of different heights.
Can be used when design is pad limited.
In an area-bump bonded chip (flip-chip), the chip is turned upside down and solder bumps connect the pads to the lead frame.

MACRO PLACEMENT

Manual macro placement is the preferred method for memory placement, with the help of:

Fly-line analysis
Data flow diagram
Grouping (Macro Grouping, logic hierarchy)

An auto-placement option is also available in the tool. We can use it for initial memory placement when the number of macros is large, and refine the result later.

Fly-line analysis:

By enabling fly lines in the GUI during macro placement, we can check and analyze how the various modules, standard cells, and I/Os communicate with each other.

Data flow diagram

The data flow diagram is provided by the synthesis team. It helps in memory placement by giving an overview of how data moves through the design.

Grouping

In the GUI, under the hierarchy tab, we can get the information needed before memory placement by highlighting different modules in different colors at different hierarchy depths.

Guidelines to place the Macros :

Place the macros around the periphery of the core as much as possible.
Do not place macros in the middle of the core, which would create congestion problems.
Leave sufficient clearance around macros for power routes and macro signal routes.
Place the macros such that their pins face the core area, since in most cases the macro pins interact with standard cells.
Place a macro close to the I/O pins if it interacts with specific I/Os.
Group the macros based on hierarchy and place each group close together.
Check the fly-line connections and place the macros based on connectivity.
Noise-sensitive macros should be placed away from noisy macros.
Take a look at PORT COMMUNICATIONS.
Avoid crisscross connectivity between macros.
Leave a halo space between macros on all sides.
For the non-pin sides of macros, a minimal separation is adequate.
For the pin sides of macros, a larger separation is appropriate.
Leave space between a macro and the edge of the chip/block to allow for buffer insertion and for power stripes to feed the std cell rows between macros and blockages.
Calculate the distance between macros by:

[Number of pins * pitch / Number of available metal layers] x [Margin (5-10%)]

Here the available metal layers are only the effective routing layers, usually the vertical layers after excluding M1, since M1 is dedicated to standard cells.
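The spacing formula can be tried out with a short calculation. The sketch below reads the margin as an extra 5-10% on top of the base spacing, i.e. a multiplier of 1.05 to 1.10; that reading, the function name, and the numbers are assumptions for illustration.

```python
def macro_channel_width(num_pins, pitch_um, routing_layers, margin_pct=10):
    """Channel width (um) = (pins * pitch / layers) * (1 + margin)."""
    base = num_pins * pitch_um / routing_layers
    return base * (1 + margin_pct / 100)

# Assumed: 200 pins facing the channel, 0.2 um pitch, 4 usable layers.
spacing = macro_channel_width(num_pins=200, pitch_um=0.2, routing_layers=4)
print(f"macro spacing = {spacing:.1f} um")  # about 11.0 um
```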

Guidelines for placing block-level pins:

Determine the correct layer for the pins.
Spread out the pins to reduce congestion.
Avoid placing pins in corners where routing access is limited.
Use multiple pin layers for less congestion.
Never place cells within the perimeter of hard macros.
To keep from blocking access to signal pins, avoid placing cells under power straps unless the straps are on metal layers higher than metal2.
Use density constraints or placement-blockage arrays to reduce congestion.
Avoid creating any blockage that increases congestion.

Qualifying floorplan:

All macros should be placed and fixed.
All IO pins must be placed and fixed.
The block should have a uniform std cell area.
Avoid notches.
The legality check must pass. There should not be any overlaps between memories.
Try to place memories along the boundary unless it is a design requirement to place them in the center of the design (only for exceptional cases such as a PVT sensor).
Blockages must be applied properly around the memories to avoid unnecessary congestion.
There must be at least one VDD and one VSS power strap between memories to avoid IR issues.
Try to keep free space in front of ports so that the std cells talking to them have easy access.
IO timing.
Macro to macro timing.
Macro to standard cell timing with margin.
IR drop must be considered.
Base DRCs should be clean. We consider the base layers up to M3 for DP (double patterning) violations, since double patterning is used up to M3.

Arriving at a good floorplan takes multiple iterations, but it's worth spending the time to get it right. It will make the later steps easier.

FLOORPLAN WITH POWER ISLANDS

Static power consumption is determined by the foundry process.
All CMOS processes leak whether the circuit is operating or not, so long as the circuit is powered on.
To reduce this leakage we can vary the voltage, but varying it too much affects functional behavior.
So the solution is to divide the design into power islands; turning off inactive islands then reduces their leakage to nearly zero.
Dynamic power is the power required to produce work, whereas static power is the cost of having the power on. We derive dynamic power consumption using a simple equation: PD = C * V^2 * f
where C is the capacitance at the node, V is the voltage at which the node switches, and f is the switching frequency.
A common technique for optimizing dynamic power is to employ hardware accelerators to perform functions that would have otherwise been a software-intensive, power-consuming load on the CPU.
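A short calculation shows how strongly the dynamic power equation depends on voltage. The node capacitance, voltages, and frequency below are arbitrary example values, and the function name is only for this sketch.

```python
def dynamic_power_w(cap_f, voltage_v, freq_hz):
    """Dynamic power of one switching node: P = C * V^2 * f."""
    return cap_f * voltage_v ** 2 * freq_hz

# Assumed node: 10 fF switching at 1 GHz.
p_nominal = dynamic_power_w(10e-15, 1.0, 1e9)
p_scaled = dynamic_power_w(10e-15, 0.8, 1e9)
print(f"at 1.0 V: {p_nominal * 1e6:.2f} uW")
print(f"at 0.8 V: {p_scaled * 1e6:.2f} uW")
print(f"saving from voltage scaling: {(1 - p_scaled / p_nominal) * 100:.0f}%")
```

Because of the V^2 term, a 20% drop in voltage cuts dynamic power by about 36%, which is the motivation for running separate power domains at independent voltages.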

Power Domain Partitioning

▪ More power regimes from the PMIC (power management integrated circuit, usually DC/DC converters or their control part)

Better Power Control and Efficiency
Independent Voltage Scaling and Power Collapsing
Increased PDN impedance and IR Drop
Increased Bill Of Materials (BOM)
Requires level shifters and resynchronization at boundaries.
Need a small on-chip regulator with fast response and good efficiency
IR Drop and PDN impedance
Increased Power Density
Increased metal resistance
Increased IR due to Power Switches
Dynamic IR effects on skew and timing are not well modeled.

Level Shifter:

▪ The purpose of this cell is to shift a signal's voltage from low to high or from high to low. Generally, buffer-type and latch-type level shifters are available.

High to Low Level Shifter:

Low to High Level Shifter:

▪ Isolation Cell:

These are special cells required at the interface between blocks that are shut down and blocks that are always on. They clamp the output node to a known voltage. These cells need to be placed in an ‘always on’ region only, and the enable signal of the isolation cell needs to be ‘always_on’.
In a nutshell, an isolation cell is necessary to isolate floating inputs.

Basic Isolation cell, Clamp to 0 and Clamp to 1.

There are 2 types of isolation cells: (a) Retain "0" (b) Retain "1"

▪ Enable Level Shifter:

This cell is a combination of a level shifter and an isolation cell.

▪ Retention Flops:

These cells are special flops with multiple power supplies. They are typically used as shadow registers to retain their value even if the block in which they reside is shut down. All the paths leading to these registers need to be ‘always_on’, and hence special care must be taken to synthesize/place/route them.

▪ AON cells:

Generally these are buffers that remain always powered irrespective of where they are placed.
They can be either special cells or regular buffers. If special cells are used, they have their own secondary power supply and hence can be placed anywhere in the design. When regular buffers are used, they must be placed where they get a continuous power supply, e.g. inside an always-on voltage island.

Voltage Island

Applying proper Blockages:

A blockage guides the tool in placing different kinds of cells in different regions of the design, based on the type of blockage.

Types: Hard, Soft, Partial, Placement, Routing, Halos, Keep-Out regions

Hard Blockage: a blockage that strictly does not allow placement of any standard cell, buffer, or inverter inside it.

=> Usually used in highly congested regions to prevent congestion.

=> It causes an abrupt increase in the utilization factor, so it should be used only where critical or when we have enough margin.

Soft Blockage: allows only buffers and inverters to get placed. It does not impact total utilization of design.

Partial Blockage: using a blockage factor, we can guide the tool to limit the placement of standard cells of a particular module in specific locations. It is one of the best methods to control cell density and congestion. The blockage factor is the percentage by which we want to restrict std cell placement.

Placement Blockage:

A placement blockage is an area that cells must avoid during placement, optimization and legalization, including overlapping any part of the placement blockage. A placement blockage can be hard or soft.

•A hard blockage prevents cells from being placed in the blockage area.

•A soft blockage restricts the coarse placer from putting cells in the blockage area, but optimization and legalization can place cells in a soft blockage area.

If you define both hard and soft placement blockages in a block, the hard placement blockages take priority over the soft placement blockages in places where they overlap.

Routing Blockage:

Routing blockages block routing resources on one or more layers. They can be created at any point in the design flow. A routing blockage defines a region where routing is not allowed on specific layers. Zroute treats routing blockages as hard constraints.

HALO ( Keep-Out Region): 

A halo is the region around the boundary of a fixed macro in which no other macro or std cell can be placed. A halo allows only buffers and inverters to be placed in its area.

Halos of two adjacent macros can overlap.

If a macro is moved from one place to another, its halo moves with it. A blockage, by contrast, does not move when the macro is moved.

A halo does not add anything to total utilization, whereas using a hard blockage causes an abrupt increase in the utilization factor.

Wednesday, January 23, 2019

Sanity Checks

To make sure that all the inputs (provided by the synthesis team) are complete and error-free, we need to verify the correctness and quality of the inputs (netlist/libs/SDC) by means of sanity checks. A few checks are listed below:

a) Netlist check: check_design -netlist

This check verifies the correctness of the netlist.

1. Input pins should not be floating. A floating input may lead to power issues or IR drop in later stages.

2. There should not be any direct connection to VDD or VSS. It may lead to circuit burns.

3. Multi-driven nets are not allowed in the netlist.

4. There should be no combinational loops; they may lead to a metastable state.

5. There should not be any assign statements.

6. checkUnique checks whether the netlist is uniquified, i.e. whether the same module is instantiated multiple times inside the netlist. It helps to avoid multiple instantiations.
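Two of the netlist checks above, floating inputs and multi-driven nets, can be illustrated on a toy connectivity table. This is not how check_design is implemented; the data layout, the function name `check_netlist`, and the pin names are assumptions made for the example.

```python
def check_netlist(nets):
    """Flag multi-driven nets and nets that leave input pins floating.

    nets: dict mapping net name -> {"drivers": [...], "loads": [...]},
    where each entry is an instance pin name.
    Returns (multi_driven, floating) as sorted lists of net names.
    """
    multi_driven = sorted(
        name for name, conn in nets.items() if len(conn["drivers"]) > 1
    )
    # A net with loads but no driver leaves those input pins floating.
    floating = sorted(
        name for name, conn in nets.items()
        if not conn["drivers"] and conn["loads"]
    )
    return multi_driven, floating

nets = {
    "n_ok":    {"drivers": ["u1/Y"], "loads": ["u2/A"]},
    "n_multi": {"drivers": ["u3/Y", "u4/Y"], "loads": ["u5/A"]},
    "n_float": {"drivers": [], "loads": ["u6/B"]},
}
print(check_netlist(nets))
# (['n_multi'], ['n_float'])
```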

b) Timing check: check_timing

This includes a few checks that verify the correctness of the SDC. The timing constraints in the SDC are provided by the synthesis team, and the check_timing command verifies the following:

Unconstrained end points
Missing input output delays
Missing clock definitions
MultiClockDriven registers

Unconstrained end points

If timing paths are unconstrained, the check_timing command only reports the unconstrained endpoints, not the unconstrained start points.

Similarly, for paths constrained only by set_max_delay, set_min_delay, or both rather than set_input_delay and set_output_delay, the check_timing command only reports any unconstrained endpoints, not unconstrained start points.

To check for unconstrained endpoints use: report_timing -exceptions

Missing input/output delays and missing clock definitions may lead to incorrect timing calculations.
set_input_delay: specifies the arrival time of data at an input port relative to a clock edge.
set_output_delay: specifies the delay outside the output port relative to a clock edge, i.e. the signal must be available at the output port at least the specified amount of time before the capturing clock edge.

MultiClockDriven registers: the same flop is driven by more than one clock, which is logically incorrect. In such cases the tool won't be able to decide the correct clock, and the data may see more than one launch and capture edge, which can further lead to setup and hold violations.

c) Library check

This check verifies whether the physical libraries are consistent with the logical libraries. For any missing library, the check_library command will report black boxes, and timing calculation will not be correct.

It checks whether the cells used in the design are defined in the timing library. If multiple delay corners are being analyzed, each cell needs to be characterized for each corner.

check_design -physicalLibrary: checks the physical libraries and reports whether all cells have corresponding LEF views.

check_design -timingLibrary: checks whether all cells used in the design have a corresponding timing library.

Tuesday, January 15, 2019

HFN vs CTS

What are High fan out Nets, how do they differ from other nets?

HFNs are nets that drive a large number of loads.

We set a maximum fanout limit using set_max_fanout.

Nets whose fanout exceeds this limit are considered HFNs.

e.g. clock, set/reset, and scan enable nets are high fanout nets.

HFNS (high fanout net synthesis) is the process of buffering high fanout nets to balance the load.

Since delay and transition time increase with load, too much load degrades both.

So this load can be balanced by buffering.
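The idea can be sketched in two steps: flag the nets whose fanout exceeds the limit, then estimate how many levels of buffers a balanced tree would need so that no driver sees more than the limit. The balanced-tree model, the function names, and the fanout numbers are simplifying assumptions for this example; real HFNS also accounts for placement, capacitance, and transition limits.

```python
def find_hfns(fanouts, max_fanout):
    """Return the nets whose fanout exceeds max_fanout, sorted by name."""
    return sorted(net for net, fo in fanouts.items() if fo > max_fanout)

def buffer_levels(sinks, max_fanout):
    """Buffer levels needed so every driver has at most max_fanout loads.

    Models a balanced tree: 0 levels means the source can drive the
    sinks directly.
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    levels = 0
    reach = max_fanout  # sinks reachable with the current level count
    while reach < sinks:
        reach *= max_fanout
        levels += 1
    return levels

fanouts = {"scan_en": 1000, "rst_n": 2000, "data_a": 6}
print(find_hfns(fanouts, max_fanout=32))  # ['rst_n', 'scan_en']
print(buffer_levels(1000, 32))            # 1
print(buffer_levels(2000, 32))            # 2
```

With a limit of 32, one level of 32 buffers covers up to 1024 sinks, so the 1000-sink net needs one level and the 2000-sink net needs two.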

HFN (high fanout net) synthesis is performed during the placement stage.

The clock is a high fanout net, but buffering is not performed on it at this stage because it is treated as an ideal net.

Clock nets are synthesized separately during CTS.

This is because HFNS and CTS have different targets.

CTS mainly targets minimal skew, whereas HFNS does not.