VLSI - Physical Design

Wednesday, March 20, 2019

RC Variation

RC variation is also treated as a set of corners for setup and hold checks. RC variation arises from the fabrication process: the width of a metal layer can differ from its intended value.

Critical corners for Setup and Hold check

We always check that the chip works in the worst-case scenarios. Setup and hold checks should be pessimistic, so the worst-case corners are considered first.

A setup violation occurs when data arrives too late. So slow process, minimum voltage and maximum temperature is the worst case for the setup check.

A hold violation occurs when data arrives too early. So fast process, maximum voltage and minimum temperature is the worst case for the hold check.

If setup and hold are met at the worst corners, the chip should work in every scenario. We still check them at typical corners, because we need to analyze power consumption. Refer to the following table for the worst-case scenarios for setup and hold.
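As a rough illustration of why slow corners stress setup and fast corners stress hold, the sketch below evaluates one register-to-register path at three PVT corners. The corner derate factors, delays, setup/hold times and clock period are hypothetical numbers, and only the data path is scaled (the clock is treated as ideal, with zero skew).

```python
# Hypothetical delay derates relative to the typical corner.
CORNERS = {
    "SS": 1.30,  # slow process, minimum voltage, maximum temperature
    "TT": 1.00,  # typical
    "FF": 0.70,  # fast process, maximum voltage, minimum temperature
}

CLOCK_PERIOD = 1.00  # ns
CLK_TO_Q = 0.10      # ns at the typical corner
LOGIC_DELAY = 0.60   # ns at the typical corner
SETUP_TIME = 0.05    # ns
HOLD_TIME = 0.05     # ns


def arrival_time(derate):
    """Data arrival time at the capture flop (ideal clock, zero skew)."""
    return (CLK_TO_Q + LOGIC_DELAY) * derate


def setup_slack(derate):
    # Data must arrive a setup time before the next capture edge.
    return CLOCK_PERIOD - SETUP_TIME - arrival_time(derate)


def hold_slack(derate):
    # Data must stay stable for a hold time after the same capture edge.
    return arrival_time(derate) - HOLD_TIME


for name, derate in CORNERS.items():
    print(f"{name}: setup slack {setup_slack(derate):+.3f} ns, "
          f"hold slack {hold_slack(derate):+.3f} ns")

worst_setup = min(CORNERS, key=lambda c: setup_slack(CORNERS[c]))
worst_hold = min(CORNERS, key=lambda c: hold_slack(CORNERS[c]))
print("worst corner for setup:", worst_setup)
print("worst corner for hold:", worst_hold)
```

With these numbers the setup slack is smallest at SS and the hold slack is smallest at FF, matching the worst-case corners described above.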

In 90 nm technology and above, a timing path is predominantly governed by cell delays. That is why only the two RC interconnect corners below are sufficient for all timing analysis:

1. Cbest (Also known as Cmin) – minimizes C, maximizes R

2. Cworst (Also known as Cmax) – maximizes C, minimizes R

However, below the 90 nm node, interconnect delay becomes a significant part of a timing path, and the coupling capacitance component (Cc) of net delay can significantly alter the slack at the endpoint of a timing path. So the RC corners have to be split according to the contribution of each component: ground capacitance (Cg) and coupling capacitance (Cc). On top of the two conventional RC corners, Cmax and Cmin, foundries therefore introduced two more RC corners:

1. RC best (also known as the XTALK corner): Cc is max, Cg x R is min

2. RC worst (also known as the Delay corner): Cc is min, Cg x R is max

So, in total, there are 5 parasitic corners:

1. Cbest

2. Cworst

3. RCbest

4. RCworst

5. Typical

Some details about each corner:

1. C-best:

It has the minimum capacitance, so it is also known as the Cmin corner.

Interconnect resistance is larger than at the typical corner.

This corner gives the smallest delay for paths with short nets and can be used for min-path analysis.

2. C-worst:

It includes the corners that result in maximum capacitance, so it is also known as the Cmax corner.

Interconnect resistance is smaller than at the typical corner.

This corner gives the largest delay for paths with short nets and can be used for max-path analysis.

3. RC-best:

It includes the corners that minimize the interconnect RC product, so it is also known as the RC-min corner.

It typically corresponds to a smaller etch, which increases the trace width. This gives the smallest resistance but a larger-than-typical capacitance.

This corner has the smallest path delay for paths with long interconnects and can be used for min-path analysis.

4. RC-worst:

It includes the corners that maximize the interconnect RC product, so it is also known as the RC-max corner.

It typically corresponds to a larger etch, which reduces the trace width. This gives the largest resistance but a smaller-than-typical capacitance.

This corner has the largest path delay for paths with long interconnects and can be used for max-path analysis.

5. Typical:

This refers to the nominal values of interconnect resistance and capacitance.

So there are two types of parasitic corners:

-C-based (capacitance dominates on short wires)

-RC-based (resistance dominates on long wires)

In the C-based corners, C means worst-case and best-case capacitance. In the RC-based corners, RC means worst-case and best-case resistance, with the capacitance adjusted towards its worst or best value while keeping the process planar.

Experience has shown that, for internal timing paths, C-based extraction gives the worst and best cases (rather than RC-based extraction), because capacitance dominates on short wires. However, in large designs, inter-block timing paths were often worst with RC-worst parasitics, since resistance dominates on long wires.
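The C-versus-RC behaviour described above can be reproduced with a simple model of one driver, one wire and one load, using the Elmore-style estimate Rdrv(Cw + CL) + Rw(Cw/2 + CL). This is a sketch, not foundry data: the driver resistance, load, per-micron wire R and C and, above all, the (R, C) scale factors assigned to each corner are hypothetical.

```python
# Hypothetical (R scale, C scale) factors relative to the typical corner.
CORNERS = {
    "Typical": (1.00, 1.00),
    "Cworst": (0.95, 1.10),   # maximum C, smaller R
    "Cbest": (1.05, 0.90),    # minimum C, larger R
    "RCworst": (1.15, 0.95),  # maximum R*C, smaller C
    "RCbest": (0.85, 1.05),   # minimum R*C, larger C
}

R_DRIVER = 200.0     # ohm, driver output resistance (hypothetical)
C_LOAD = 2e-15       # F, receiver pin capacitance (hypothetical)
R_PER_UM = 2.0       # ohm/um, typical wire resistance (hypothetical)
C_PER_UM = 0.2e-15   # F/um, typical wire capacitance (hypothetical)


def net_delay(length_um, corner):
    """Elmore delay (seconds) of a driver driving one wire to one load."""
    r_scale, c_scale = CORNERS[corner]
    r_wire = R_PER_UM * length_um * r_scale
    c_wire = C_PER_UM * length_um * c_scale
    driver_term = R_DRIVER * (c_wire + C_LOAD)  # driver charges all of C
    wire_term = r_wire * (c_wire / 2 + C_LOAD)  # distributed wire RC
    return driver_term + wire_term


def worst_corner(length_um):
    return max(CORNERS, key=lambda c: net_delay(length_um, c))


def best_corner(length_um):
    return min(CORNERS, key=lambda c: net_delay(length_um, c))


for length in (10, 2000):
    print(f"{length} um net: worst = {worst_corner(length)}, "
          f"best = {best_corner(length)}")
```

With these numbers the 10 um net is slowest at Cworst and fastest at Cbest, while the 2000 um net is slowest at RCworst and fastest at RCbest: on the short net the driver term, which scales only with C, dominates, whereas on the long net the wire term, which scales with R x C, takes over.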

Note: No corner guarantees the minimum or maximum delay for an arbitrary transistor driving an arbitrary wire topology.

The picture below illustrates this:

Monday, March 11, 2019

Extraction

After completion of standard cell placement and power analysis, the next phase is to route the ASIC design and perform extraction of routing and parasitic parameters for the purpose of static timing analysis and simulation.

Parasitic extraction is the calculation of all routed net capacitances and resistances for delay calculation, static timing analysis, circuit simulation and signal integrity analysis.

Parasitic extraction is performed by analyzing each net in the design and taking into account the effects (such as dielectric stack) of the net’s own topology and proximity to other nets.

To calculate delay, we need to know the resistance and capacitance of the nets and devices. This R/C information can be extracted from the layout, and parasitic extraction does this job efficiently.

Effects of parasitic devices on circuit design:

Extra power consumption

Affect the delay of the circuit

Reduce the noise margin

Increase signal noise

Increase IR drop on power supply lines

Parasitic extraction results are used:

During static timing analysis
During noise analysis, crosstalk analysis and signal integrity checks
In logic simulation
During IR analysis
In substrate noise analysis

Parasitic extraction tools from different vendors offer several modes, so that the user can extract only the required information. A few of them are:

Extract resistance (R) only
Extract capacitance (C) only
Extract both resistance and capacitance (RC)

Capacitance can also be extracted in two modes:

Decoupled Capacitance
Coupled Capacitance

So we can use any combination to decrease or increase the runtime, depending on what is needed and at which stage. There are two types of network:

Lumped network (lumped C and lumped RC)
Distributed network

Lumped capacitance is a first-order approximation that considers only the total capacitance of an interconnect and ignores its resistance. It was used in the early days of process technology, when wire delay contributions were negligible.

Lumped resistance and capacitance (the lumped RC model) is considered a second-order approximation; it takes into account the loading capacitance as well as the total wire resistance of the interconnect.

Distributed RC Network

Distributed resistance and capacitance (the so-called pi model) is classified as a third-order interconnect delay approximation. In this model, the interconnect is segmented into a series of resistor-capacitor sections resembling a transmission line.
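The three network models can be compared using the Elmore delay of an RC chain. In the sketch below, lumped C ignores the wire resistance, lumped RC puts the whole wire R in front of the whole wire C, and the distributed model splits the wire into N pi-sections. The driver resistance, wire R/C and load are hypothetical values, and the Elmore metric is used only as a convenient first-order estimate, not as what any particular extraction or timing tool computes.

```python
R_DRIVER = 200.0   # ohm (hypothetical)
R_WIRE = 400.0     # ohm, total wire resistance (hypothetical)
C_WIRE = 100e-15   # F, total wire capacitance (hypothetical)
C_LOAD = 5e-15     # F, receiver pin capacitance (hypothetical)


def elmore_delay(chain):
    """Elmore delay at the far end of an RC chain.

    chain is a list of (R, C) pairs ordered from the driver outwards:
    R is the resistance feeding a node, C is the capacitance at that node.
    """
    delay = 0.0
    downstream_c = 0.0
    # Walk from the far end back to the driver; each resistor charges
    # all of the capacitance downstream of it.
    for r, c in reversed(chain):
        downstream_c += c
        delay += r * downstream_c
    return delay


def lumped_c():
    # Wire resistance ignored: the driver sees only the total capacitance.
    return elmore_delay([(R_DRIVER, C_WIRE + C_LOAD)])


def lumped_rc():
    # Total wire resistance placed in front of the total capacitance.
    return elmore_delay([(R_DRIVER, 0.0), (R_WIRE, C_WIRE + C_LOAD)])


def distributed_pi(sections):
    # Each pi-section puts half of its C on either side of its R.
    r_seg = R_WIRE / sections
    c_seg = C_WIRE / sections
    chain = [(R_DRIVER, c_seg / 2)]
    for i in range(sections):
        last = i == sections - 1
        node_c = c_seg / 2 + C_LOAD if last else c_seg
        chain.append((r_seg, node_c))
    return elmore_delay(chain)


for name, d in (("lumped C", lumped_c()), ("lumped RC", lumped_rc()),
                ("distributed (10 pi)", distributed_pi(10))):
    print(f"{name:>20}: {d * 1e12:.1f} ps")
```

With these numbers the sketch reports about 21 ps, 63 ps and 43 ps. Lumped C underestimates because it ignores wire resistance, lumped RC overestimates because it places all wire capacitance behind all wire resistance, and the pi-section chain matches the distributed-line Elmore value Rdrv(Cw + CL) + Rw(Cw/2 + CL) for any number of sections.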

Apart from the above extraction modes, the runtime of an extraction tool also depends on several parameters:

Design size
Process or technology node
Output format
System configuration (available machine resources) such as the number of CPUs, memory and machine type

One of the most commonly used formats for importing and exporting distributed RC parasitics, extracted per net from each net's actual geometry, layer width and spacing, is the Standard Parasitic Exchange Format (SPEF).

SPEF is an Institute of Electrical and Electronics Engineers (IEEE) standard.
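To show how SPEF stores the extracted RC of one net, the sketch below summarizes a single *D_NET block: the header line carries the net's total capacitance, *CAP lines with one node are capacitances to ground, *CAP lines with two nodes are coupling capacitances to another net, and *RES lines are resistor segments between nodes. The net names, values and the tiny parser are illustrative only; a real SPEF file also has a header (units such as *C_UNIT and *R_UNIT, name maps) that this sketch does not handle, so values are left in whatever units the file declares.

```python
SAMPLE_D_NET = """\
*D_NET net_a 0.0045
*CONN
*I u1:Z O
*I u2:A I
*CAP
1 u1:Z 0.0010
2 net_a:1 0.0020
3 u2:A 0.0010
4 net_a:1 net_b:3 0.0005
*RES
1 u1:Z net_a:1 12.5
2 net_a:1 u2:A 7.5
*END
"""


def parse_d_net(text):
    """Summarize one SPEF *D_NET block (values stay in the file's units)."""
    net = {"name": None, "total_cap": 0.0, "ground_cap": 0.0,
           "coupling_cap": 0.0, "resistors": []}
    section = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key = fields[0]
        if key == "*D_NET":
            net["name"] = fields[1]
            net["total_cap"] = float(fields[2])
        elif key in ("*CONN", "*CAP", "*RES"):
            section = key
        elif key == "*END":
            break
        elif section == "*CAP" and len(fields) == 3:
            net["ground_cap"] += float(fields[2])    # id node value
        elif section == "*CAP" and len(fields) == 4:
            net["coupling_cap"] += float(fields[3])  # id node other_node value
        elif section == "*RES":
            # id node1 node2 value
            net["resistors"].append((fields[1], fields[2], float(fields[3])))
    return net


net = parse_d_net(SAMPLE_D_NET)
print(net["name"], f"{net['ground_cap']:.4f}", f"{net['coupling_cap']:.4f}",
      len(net["resistors"]))
```

In this sample the ground and coupling capacitances add up to the 0.0045 total given on the *D_NET line.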

Basics of Layout:

1. Layout can be very time-consuming.

-Design gates to fit together nicely

-Build a library of std cells

-Must follow the technology rules

2. Standard cell design methodology

-Vdd and GND should abut (std height)

-Adjacent gates should satisfy design rules

-nMOS at the bottom and pMOS at the top

-All gates include well and substrate contacts

What are Layout Extractors?

1. Once the layout is made, there are always parasitic capacitances and resistances associated with the design.

2. This is because layouts are made compact to keep chips small. The more compact the layout, the more of these parasitic components it introduces.

3. These parasitics interfere with the functioning and performance of the circuit in terms of timing, speed and power consumption.

4. Layout extractors examine the interrelationship of mask layers to infer the existence of transistors and other components.

5. They are related to design rule checkers (design rule verification).

6. Some form of layout extraction is usually done to create data for back annotation.

Tools used for extraction:

1. FastCap

2. Star-RCXT

3. QRC

4. Calibre xACT3D

Input files required for extraction

1. .def

2. qrc tech file

Outputs of extraction

1. spef

Steps to extract a layout:

1. Create the layout cellview.

2. Design Rule Checking: the check passes when the results report "0 Total errors found".

3. Layout Parameter Extraction: the mask layout contains only physical data. The extraction process identifies the devices in the layout and generates a SPICE-like netlist and the other files needed to complete the design process.

4. Layout vs Schematic Comparison

Friday, March 1, 2019

Routing

After standard cell placement and power analysis are complete, routing is the next phase. It is followed by extraction of the routing parasitics for static timing analysis and simulation.

As ASIC designs get larger and more complex, routing becomes more difficult and challenging. Routing may fail to complete, or may take an unacceptable amount of run time.

Besides the routing algorithms, the factors that influence the routability of a given ASIC are the standard cell layout style, a well-prepared floorplan and the quality of standard cell placement, as discussed in previous chapters.

Due to the inherent complexity of ASIC designs and the very large numbers of interconnections associated with them, the overall routing is performed in three stages: special routing, global routing and detail routing.

1. Special Routing

Special routing is used for the power and ground connections of standard cells and macros. Most special routers use line-probe algorithms. The line-probe method uses line segments to connect the power and ground ports of standard cells and macros to the ASIC power and ground supplies.

2. Global Routing

Global routing is the decomposition of ASIC design interconnections into net segments and the assignment of these net segments to regions without specifying their actual layouts. Thus, the first step of the global routing algorithm is to define routing regions or cells (i.e. a rectangular area with terminals on all sides) and calculate their corresponding routing density.

These routing regions are commonly known as Global Routing Cells (GRC's).

Global routing algorithms generate a non-restricted route (i.e. not a detail route) for each net in the design and use some method of estimation to compute wire lengths and extract their corresponding parasitics.

After global routing is performed, the pin locations are determined such that the connectivity among all standard cells in the ASIC core area is minimal. Almost all global routers report design routability using overflow or underflow for the Global Routing Cells (GRCs), which relates each routing cell's capacity to the number of nets that must be routed through it, for all vertical and horizontal routing layers.
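Global routers differ in exactly how they report overflow, so the sketch below assumes one simple convention: for each GRC and routing direction, overflow is the number of nets in excess of the available tracks, max(0, demand - capacity). The grid size, capacities and demands are made-up numbers.

```python
def grc_overflow(capacity, demand):
    """Per-GRC overflow for one routing direction.

    capacity and demand are 2-D lists (rows x cols) of track counts.
    """
    return [[max(0, d - c) for c, d in zip(cap_row, dem_row)]
            for cap_row, dem_row in zip(capacity, demand)]


def overflow_report(capacity, demand):
    """Total overflow, number of overflowed GRCs and their percentage."""
    over = grc_overflow(capacity, demand)
    cells = [v for row in over for v in row]
    total = sum(cells)
    overflowed = sum(1 for v in cells if v > 0)
    return total, overflowed, 100.0 * overflowed / len(cells)


# Hypothetical 3 x 3 GRC map for the horizontal layers.
h_capacity = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
h_demand = [[6, 9, 12],
            [8, 14, 10],
            [5, 7, 9]]

total, count, pct = overflow_report(h_capacity, h_demand)
print(f"H overflow: {total} tracks in {count} GRCs ({pct:.1f}% of GRCs)")
```

The same calculation would be repeated for the vertical layers; GRCs with nonzero overflow are the ones a detail router will struggle to complete.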

3. Detail Routing

The objective of detail routing is to follow the global routing and perform the actual physical interconnections of ASIC design. Therefore, the detail router places the actual wire segments within the region defined by the global router to complete the required connections between the ports.

Detail routers use both horizontal and vertical routing grids for actual routing. The horizontal and vertical routing grids are defined in the technology file for all layers that are being used. The detail router can be grid-based, gridless-based, or subgrid-based.

Grid-based routing imposes a routing grid (evenly spaced routing tracks running both vertically and horizontally across the design area) that all routing segments must follow.

In addition, the router is allowed to change direction at the intersection of vertical and horizontal tracks as indicated.

The advantage of grid-based routing is efficiency. When using a grid-based router, one needs to make sure that the ports of all instances are on the grid.

Otherwise, they can create physical design rule errors that are difficult for the router to resolve.
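One classic way to find a connection on a routing grid is Lee's maze-routing algorithm, a breadth-first wave expansion over grid points. The text does not say which algorithm a given detail router uses, so treat this as an illustrative stand-in: the sketch routes one two-pin connection on a single layer, with 1 marking blocked grid points (for example, existing wires or obstructions) and direction changes happening only at track intersections.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Shortest path on a routing grid using Lee's BFS maze routing.

    grid[r][c] == 0 is a free track intersection, 1 is blocked.
    Returns the list of (row, col) points from src to dst, or None.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}               # visited points and their predecessor
    queue = deque([src])
    while queue:                     # wave expansion from the source
        point = queue.popleft()
        if point == dst:
            break
        r, c = point
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in prev):
                prev[nxt] = point
                queue.append(nxt)
    if dst not in prev:
        return None                  # no legal route on this grid
    path, point = [], dst
    while point is not None:         # backtrace from target to source
        path.append(point)
        point = prev[point]
    return path[::-1]


GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

print(lee_route(GRID, (0, 0), (4, 4)))
```

Because the wave expands one grid step at a time, the first time it reaches the target is along a shortest legal path; the trade-off is that it visits many grid points, which is one reason practical routers add heuristics on top of this idea.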

Gridless-based (or shape-based) routers do not follow a routing grid explicitly; they work over the entire routing area and are not limited by grid restrictions. They can use different wire widths and spacings without routing grid requirements. The most fundamental problem with this type of router is that it is very slow and can be very complicated.

The subgrid-based router brings together the efficiency of grid-based routers with the flexibility (of varying the wire width and spacing) of the gridless-based routers. The subgrid-based router follows the normal grid similar to the grid-based router. However, a subgrid-based router considers these grids only as guidelines for routing and is not required to use them.

This procedure of detail routing is very similar to global routing. The only difference is that during detail routing, physical wire segments will be used for connection rather than connectivity projections. Thus, it is important to have strong correlation between the detail and global routers with regard to the wire length approximation and actual wire connection.

Close correlation between the global and detail routers matters because it lets one determine, early in the physical design cycle, whether the ASIC design meets its timing requirements by estimating wire resistance and capacitance.

Saturday, February 16, 2019

CTS

Clock Tree Synthesis (CTS) is the process of automatically inserting buffers/inverters along the clock path to balance the clock delay to all clock inputs in the ASIC design.

Every physical wire has some delay because of its RC, so without CTS the clock would not reach the clock pins of all the flops in the design at the same time. By adding buffers/inverters, CTS tries to achieve zero skew (impossible in practice) and minimum insertion delay.

Selecting the right set of buffers and inverters plays a very important role in the performance of the design.

If clock buffers are not selected correctly, they may cause the clock pulse width to degrade as the clock propagates through them.

In most ICs, the clock consumes 30-40% of the total power. So an efficient clock architecture, clock gating and clock tree implementation help to reduce power.
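Given the clock arrival time (latency) at each sink after CTS, the two CTS targets are easy to quantify: insertion delay is the latency from the clock source to the sinks (the maximum is the figure usually quoted), and global skew is the spread between the latest and earliest arrivals. The pin names and arrival times below are hypothetical.

```python
# Hypothetical post-CTS clock arrival times (ns) at each sink.
arrivals = {
    "u_core/ff1/CK": 0.52,
    "u_core/ff2/CK": 0.55,
    "u_io/ff3/CK": 0.49,
    "u_io/ff4/CK": 0.53,
}


def clock_tree_summary(arrivals):
    """Insertion delay range and global skew over all sinks."""
    latest = max(arrivals.values())
    earliest = min(arrivals.values())
    return {
        "max_insertion_delay": latest,
        "min_insertion_delay": earliest,
        "global_skew": latest - earliest,
    }


def local_skew(arrivals, launch, capture):
    """Skew seen by one timing path: capture arrival minus launch arrival.

    A positive value (late capture clock) helps setup and hurts hold.
    """
    return arrivals[capture] - arrivals[launch]


summary = clock_tree_summary(arrivals)
print(summary)
print(local_skew(arrivals, "u_core/ff1/CK", "u_io/ff3/CK"))
```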

CTS Goals:

a) Meeting the clock tree DRCs, which include:

▪ Max. Transition: the clock transition limit should be neither too tight nor too relaxed.

If it is too tight, more buffers are needed.

If it is too relaxed, dynamic power is higher.

▪ Max. Capacitance

▪ Max. Fan-out

b) Meeting the clock tree targets.

▪ Minimal skew: this is the main reason we need to synthesize the clock tree.

▪ Minimum insertion delay

c) After CTS, all hold violations should be fixed.

CTS steps:

▪ Create the clock spec file.

▪ Create and compile the clock tree (compile_clock_tree).

▪ Fine-tune the clock tree to meet the clock skew and insertion delay targets (clock_opt).

Implementing Clock Tree:

To implement the clock tree, use clock_opt, which performs:

* CTS and

* incremental physical optimization.

Synthesize the clock tree:

Before implementing the clock tree, the tool upsizes and possibly moves the existing clock gates, which improves the quality of results (QoR) and reduces the number of clock tree levels.

Optimize the clock tree: this is done by any of the following steps:

▪ Buffer relocation.

▪ Buffer sizing.

▪ Gate relocation.

▪ Gate sizing.

▪ Improve skew.

▪ Delay insertion.

▪ Perform inter-clock delay balancing

▪ Balancing has to be done between two flops driven by two different clocks.

▪ Clock groups between which balancing has to be performed need to be specified.

▪ Perform detail routing of clock nets [NDR rule].

▪ Apply non-default routing (NDR) rules for clock nets.

Double width & Double spacing.

Shielding

▪ By default, the tool applies the clock routing rules all the way to the sink pins. It is better to use normal routing rules at the sink pins, because this reduces congestion and makes tapping the clock easier.

▪ Perform RC extraction of the clock nets and compute accurate clock arrival time.

▪ Adjust the I/O timings.

After implementing the clock tree, the tool can update the input and output delays to reflect the actual clock arrival time.

▪ Perform power optimization.

Use a large/Max clock gating fanout during insertion of the ICG cells.

Merge ICG cells that have the same enable signal.

Perform power-aware placement of ICG and registers.

▪ Check and fix any congestion hotspots.

▪ Optimize the scan chain.

▪ Fix the placement of the clock tree buffers and inverters.

▪ Perform placement and timing optimization.

▪ Check for major hold time violations.

Clock Tree Exceptions:

Clock tree exceptions are declared before clock tree synthesis to guide the tool for timing analysis and skew balancing. We can control clock tree tracing by including or excluding particular pins explicitly.

To define clock tree exceptions, use the set_clock_tree_exceptions command or choose Clock > Set Clock Tree Exceptions in the GUI. Clock tree exceptions can be set on pins or hierarchical pins.

ICC prioritizes the clock tree pin exceptions as follows:

1. Nonstop pins

2. Exclude pins

3. Float pins

4. Stop pins

1. Nonstop Pin:

Nonstop pins are pins that would normally be considered endpoints of the clock tree, but ICC instead traces through them to find the clock tree endpoints. The clock pins of sequential cells driving generated clocks are implicit nonstop pins. In addition, ICC supports user-defined (or explicit) nonstop pins.

Examples:

The clock pins of sequential cells driving generated clocks are implicit nonstop pins.

The clock pins of ICG cells.

2. Exclude Pin: CTS excludes these pins from skew analysis.

3. Float Pin (macro model):

Float pins are clock pins that have special insertion delay requirements; balancing is done according to that delay.


4. Stop Pin:

Stop pins are the endpoints of the clock tree that are used for delay balancing. During CTS, the tool uses stop pins in calculations and optimization for both DRC and clock tree timing.

Example:

Clock sinks are implicit stop pins.

5. Leaf Pin:

CTS treats these pins as sinks, stops tracing beyond them and balances clock skew to them.
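The effect of the exception types and their priority order can be illustrated with a toy tracer. This is a hypothetical model, not how ICC implements tracing: the clock network is a simple fanout map from pin to downstream pins (cells are traced straight through), each pin may carry several exception labels, the highest-priority label in the order listed above wins, and float pins carry a hypothetical macro-internal delay that the tree has to compensate for.

```python
# Priority order from the list above: nonstop > exclude > float > stop.
PRIORITY = ("nonstop", "exclude", "float", "stop")


def effective_exception(labels):
    """Pick the highest-priority exception among a pin's labels."""
    for kind in PRIORITY:
        if kind in labels:
            return kind
    return None


def trace_clock(fanout, exceptions, float_delays, root):
    """Return (sinks to balance with their internal delay, excluded pins)."""
    balanced, excluded = {}, []
    stack = [root]
    while stack:
        pin = stack.pop()
        kind = effective_exception(exceptions.get(pin, ()))
        if kind == "exclude":
            excluded.append(pin)                        # left out of skew balancing
        elif kind == "stop":
            balanced[pin] = 0.0                         # ordinary sink
        elif kind == "float":
            balanced[pin] = float_delays.get(pin, 0.0)  # macro-internal delay
        else:
            # Nonstop pins and plain clock nets: keep tracing downstream.
            # A pin with no label and no fanout is simply dropped here.
            stack.extend(fanout.get(pin, []))
    return balanced, sorted(excluded)


# Hypothetical clock network.
fanout = {
    "clk": ["icg/CK", "ff1/CK", "mem/CLK", "dbg/CK"],
    "icg/CK": ["ff2/CK", "ff3/CK"],
}
exceptions = {
    "icg/CK": {"nonstop"},          # ICG clock pin: traced through
    "ff1/CK": {"stop"},
    "ff2/CK": {"stop"},
    "ff3/CK": {"stop"},
    "mem/CLK": {"float"},
    "dbg/CK": {"stop", "exclude"},  # exclude outranks stop
}
float_delays = {"mem/CLK": 0.30}    # ns inside the macro (hypothetical)

balanced, excluded = trace_clock(fanout, exceptions, float_delays, "clk")
print(balanced)
print(excluded)
```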

Sunday, January 27, 2019

Placement

Placement is the third step in the physical design flow. During placement, all standard cells are placed automatically in the chip core based on timing, die size and power constraints.

Placement takes place in two steps:

Coarse/Global placement
Legalization/Detail Placement.

During coarse placement, the design is divided into small, equal-sized square boxes called GRCs (Global Routing Cells), and the standard cells are placed approximately into them without being legalized at this stage. They may not be on the placement grid and may overlap.

During legalization (the second step), standard cells are placed properly on the grid and any overlaps are removed, based on optimum timing and congestion.

Improper placement of cells affects chip performance.

Placement determines the die utilization. A typical utilization of 60% is reached after placement (depending on technology and design size). Placement follows certain boundary conditions while placing cells. They are:

1. Timing

2. Design Rule

OBJECTIVES OF PLACEMENT

To minimize the delay of all critical nets.

To minimize the total estimated interconnect length/wire length. This helps in minimizing the cost and chip size.
To minimize power dissipation as much as possible. This involves distributing the standard cells so as to reduce overall power consumption.
To minimize congestion. Congestion means excessive crowding: too many standard cells, and the wires they need, for the available space. This should always be avoided when placing standard cells.
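Placers need a quick estimate of wire length before routing exists. A common estimate (assumed here, since the text does not name one) is the half-perimeter wire length (HPWL): the half perimeter of the bounding box around each net's pins, summed over all nets. The net names and pin coordinates below are hypothetical.

```python
def hpwl(pins):
    """Half-perimeter of the bounding box around a net's (x, y) pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets):
    """Sum of HPWL over all nets."""
    return sum(hpwl(pins) for pins in nets.values())


# Hypothetical pin locations in microns after placement.
nets = {
    "n1": [(0, 0), (10, 5)],
    "n2": [(2, 3), (8, 1), (5, 9)],
    "n3": [(4, 4)],  # single-pin net contributes nothing
}
print(total_wirelength(nets))  # 29
```

Moving a cell so that the bounding boxes of its nets shrink lowers this total, which is the quantity a wirelength-driven placer tries to minimize.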

PLACEMENT OPTIONS AVAILABLE IN THE TOOL ARE

Congestion driven
Timing Driven
For timing-driven placement, the following timing constraints are read from the SDC (Synopsys Design Constraints) file:
Clock period constraint
Input/Output delays
Uncertainty values

POST PLACEMENT CHECKS:

During the post-placement stage, we need to examine the timing report. For each path, the timing report gives the start point, the end point and the logic between them.
Observe the clock delay. Before CTS, the clock should be ideal.
Check that the operating conditions (setup, hold time, etc.) are set.

PLACEMENT OPTIMIZATION:

Before performing placement optimization, we must reserve more than 5% placement space beyond the targeted final design utilization, so that there is room to add buffers and remap logic to meet timing requirements.

The methods generally adopted by the tool for optimization are:

Adding/Deleting Buffers.
Remapping logic.
Resizing gate/Up sizing/Downsizing.

ADVANTAGES OF UP SIZING:

Increases speed & Driving strength.
Upsizing means increasing the width of the transistors, that is, increasing the width of the diffusion. Upsizing reduces the delay of a standard cell: after upsizing the cell becomes faster, and after downsizing it becomes slower. So upsizing helps fix setup violations, while downsizing can help fix hold violations.

SWAPPING

This involves replacing a cell of one threshold-voltage type with another type.

LVT → MVT → HVT

Advantage of LVT: faster.

Advantage of HVT: less leakage power.
Disadvantage of LVT: more leakage power. We prefer LVT cells in timing-critical parts of the design.

TIMING:
Until the placement and placement optimization stage, the clock is ideal: all clock pins are treated as connected directly to the clock source, and factors like skew, transition and network delay are assumed to be 0.

Congestion Analysis:

After placement, we first check congestion (local and global), then timing and DRVs (transition, capacitance and fanout), utilization, etc.

Definition of congestion: when the number of available routing tracks in a particular area is less than the number of tracks required, the area is said to be congested.

Local congestion: when congested cells are concentrated in a particular area, they create local hotspots, i.e. highly congested areas. This is called local congestion.

Reasons: larger count of AOI/OAI cells in particular area.

High Pin density & timing critical, high utilization designs.

To do the analysis, open congestion analysis in the congestion tab of the GUI. The whole design is shown divided into GRCs (Global Routing Cells), each highlighted in green, yellow or red.

Here highly congested areas are highlighted in red.

The reason may be high cell density as well as high pin density. Move the cursor over each red GRC to see the type of cell (for example, OAI or AOI) and the number of pins on each cell.

OAI and AOI cells have many pins relative to their area; they are also called complex gates.

For more detailed analysis, we can further check in the GUI how the paths travel. There may be paths with a single start point and multiple endpoints.

Solution: spread out the cells and apply cell padding.

To apply cell padding, filter out the cells with more than 4 pins and add padding to them. This can be done manually with set_keepout_margin or with a custom script that adds a cell padding of 2 or 4 to the filtered cells.
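The filtering step of such a padding script can be sketched in a tool-independent way. The instance list, the pin threshold, the cut-off between the two padding values and the paddings themselves (in placement sites) are hypothetical, and the sketch stops at producing a cell-to-padding plan; it does not generate any tool commands.

```python
# Hypothetical placed instances: (instance name, reference cell, pin count).
cells = [
    ("u1", "AOI22X1", 5),
    ("u2", "INVX2", 2),
    ("u3", "OAI222X1", 7),
    ("u4", "NAND2X1", 3),
    ("u5", "AOI211X2", 5),
]

PIN_THRESHOLD = 4    # pad cells with more than 4 pins
SMALL_PAD_SITES = 2  # padding for moderately pin-dense cells
LARGE_PAD_SITES = 4  # padding for the most pin-dense cells
LARGE_PAD_PINS = 6   # hypothetical cut-off for the larger padding


def padding_plan(cells):
    """Map each pin-dense instance to a padding width in sites."""
    plan = {}
    for inst, _ref, pins in cells:
        if pins <= PIN_THRESHOLD:
            continue
        if pins > LARGE_PAD_PINS:
            plan[inst] = LARGE_PAD_SITES
        else:
            plan[inst] = SMALL_PAD_SITES
    return plan


print(padding_plan(cells))  # {'u1': 2, 'u3': 4, 'u5': 2}
```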

Global Congestion: when high-density (cell/pin) regions are spread all over the design, it is called global congestion.

Reasons: When cells of particular modules are not placed all together due to any reason, like bad macro placement etc.

Solution: improve macro placement and module padding/instance padding.

Padding is a way to fool the tool: it reserves room next to particular cells so that buffers and other cells can be added during timing optimization.

Thursday, January 24, 2019

Power Planning

Now it is time to plan and create power and ground structures for both I/O pads and core logic.
The I/O pads’ power and ground buses are built into the pad itself and will be connected by abutment.
For core logic, there is a core ring enclosing the core with one or more sets of power and ground rings.
A horizontal metal layer is used to define the top and bottom sides, or any other horizontal segment, while the vertical metal layer is utilized for left, right, and any other vertical segment.
These vertical and horizontal segments are connected through an appropriate via cut.
The next consideration is to construct the standard cell power and ground that is internal to the core logic.
Each of these power and ground strips runs vertically, horizontally, or in both directions.
If these strips run both vertically and horizontally at regular intervals, then the style is known as power mesh.
The total number of strips and the interval between them depend on the ASIC core power consumption. As core power consumption (dynamic and static) increases, more power and ground strips are needed and the interval between them decreases.
This denser strip arrangement is used mainly to reduce the overall ASIC voltage drop, thereby improving ASIC design performance.
In addition to the core power and ground ring, macro power and ground rings need to be created using proper vertical and horizontal metal layers.
A macro ring encloses one or more macros, completely or partially, with one or more sets of power and ground rings.
It is strongly recommended to check for power and ground connectivity and/or any physical design rule violations after construction of the entire power and ground network.
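The link between strap spacing and voltage drop can be checked with a back-of-the-envelope model. Assume (hypothetically) that each vertical strap supplies the current drawn uniformly by the standard cells in a band as wide as the strap pitch, and that the strap is fed from the power ring at both ends. For a uniformly loaded line fed from both ends, the worst drop occurs at the middle and equals I x R / 8, where R is the strap's end-to-end resistance and I the total current it carries. The sheet resistance, dimensions and current density are illustrative numbers.

```python
def strap_resistance(sheet_res_ohm_sq, length_um, width_um):
    """End-to-end resistance of a strap: R = Rs * L / W."""
    return sheet_res_ohm_sq * length_um / width_um


def worst_ir_drop(pitch_um, length_um, width_um, sheet_res_ohm_sq,
                  current_density_a_per_um2):
    """Mid-strap IR drop (V) for a strap fed from both ends.

    The strap carries the current of a band pitch_um wide and length_um
    long, drawn uniformly along its length; the peak drop is I * R / 8.
    """
    current = current_density_a_per_um2 * pitch_um * length_um
    resistance = strap_resistance(sheet_res_ohm_sq, length_um, width_um)
    return current * resistance / 8


# Hypothetical values: 1 mm strap, 2 um wide, 0.02 ohm/sq,
# cells drawing 0.5 uA per square micron.
for pitch in (100.0, 50.0):
    drop = worst_ir_drop(pitch, 1000.0, 2.0, 0.02, 0.5e-6)
    print(f"pitch {pitch:.0f} um -> worst drop {drop * 1000:.2f} mV")
```

Halving the pitch halves the current each strap has to carry and therefore halves the drop, which is why a design that draws more power needs more closely spaced straps.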

The goal of power planning is to provide power to all macros and standard cells within the given IR-drop limit.

Power planning can be divided into two major categories:

core cell power management
I/O cell power management.

In core cell power planning power rings are formed around the core and macro.

In IO cell power planning power rings are formed for I/O cells and trunks are created between core power ring and power pads.

Power planning is part of floor plan stage.

Input Required In Power Planning

1. Database with valid floorplan.

2. Widths of the power rings and power straps.

3. Spacing between VDD and VSS Straps.

Output Of Power Planning

Design with Power Structure

Floorplanning

Once we are done with the initial sanity checks on the PD inputs, the next step is the floorplan.

It includes:

a. Die size estimation

b. IO placement

c. Macro Placement

d. Power routing

Floorplan is a critical and important step in PNR. A bad floorplan can cause all kinds of issues related to timing, congestion, EM, IR, routing and noise.
A basic understanding of the design, its data flow and module interactions is a must to come up with an optimum floorplan.
Adding physical cells, special cells, blockages and the power mesh is part of the flow. Placement of IOs and macros is crucial and needs thorough understanding and analysis.

In a top-down approach, the block size and shape come from the top level, whereas in a bottom-up approach we need to come up with the best block size ourselves, based on instance count and other design requirements.

Die size consists of the following important terms:

• Core area = Standard cell area + memory area + macro area

• IO area: determined by the total number of signal pads plus the total number of power pads

Design could become:

Core limited: where the core logic determines the die size.
Pad limited: where the number of pads determines the die size.
Using both pad limited and core limited for a square die.

Important Formulas:

• Aspect Ratio = Height/Width (note that the die-size calculation below uses the inverse convention, AR = Width/Height)

• Total Utilization = (Area of standard cells + macros + IO)/ Total area of die

FLOORPLAN CONTROLLING PARAMETERS:

1. Aspect Ratio = Horizontal routing resources/Vertical routing resources= H/V

A.R. =1, defines a square shape for the block

2. Utilization:

Core utilization is the ratio of the area occupied by standard cells, macros and blockages to the total core area, i.e.

Core utilization = (standard cell area + macro cell area + blockage area) / total core area

A core utilization of 0.8 means that 80% of the area is available for placement of cells, whereas 20% is left free for routing.

STD Cell utilization: ratio of area occupied by std cells to the total core area and channel area.

Std cell utilization = std cell area/ (total cell area + channel area)

For Die Size Estimation

Technology Inputs:

Gate Density per sq. mm = D

Number of Horizontal Layers = H

Number of Vertical Layers = V

Design Inputs:

Gate count (excluding memories, macros & subchips) = G

IO area, in sq. mm = I

Memory + Macros + Subchips area, in sq.mm = M

Target Utilization, in percentage = U %

Additional gate count for CTS, timing closure etc, in percentage = T %

Additional gate count for ECOs, in percentage = E %

Die area calculation:

Die Area in sq.mm = {[(Gate count + Additional gate count for CTS & ECO) / Gate density] + IO area + Mem, Macro area} / Target utilization

Die Area = {[G × (1 + T/100 + E/100) / D] + I + M} / (U/100)

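Aspect ratio, width, height calculation:

Aspect Ratio: AR = Width / Height = Number of horizontal resources / Number of vertical resources = H / V

Writing W for the die width and H_die for the die height (to keep it distinct from H, the number of horizontal layers):

$$W = AR \times H_{die} \qquad (1)$$

$$\text{Die Area} = W \times H_{die} = AR \times H_{die}^{2} \qquad \text{(substituting (1))}$$

$$H_{die} = \sqrt{\frac{\text{Die Area}}{AR}}$$

$$W = H_{die} \times AR$$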
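The die-size formulas above can be collected into a small calculator. It follows the text's definitions, with two interpretation choices made explicit: T and E are treated as percentages of the gate count G and U as a percentage, and the aspect ratio is taken as width/height = H/V, as in the derivation. All input numbers are hypothetical.

```python
import math


def die_area_mm2(G, D, I, M, U, T, E):
    """Die area in sq. mm.

    G: gate count (excluding memories/macros), D: gates per sq. mm,
    I: IO area, M: memory + macro + subchip area (sq. mm),
    U: target utilization (%), T, E: extra gates for CTS/timing and ECOs (%).
    """
    total_gates = G * (1 + T / 100 + E / 100)
    return (total_gates / D + I + M) / (U / 100)


def die_dimensions(area_mm2, h_layers, v_layers):
    """Width and height (mm) for AR = width / height = H / V."""
    ar = h_layers / v_layers
    height = math.sqrt(area_mm2 / ar)
    width = height * ar
    return width, height


area = die_area_mm2(G=2_000_000, D=500_000, I=1.0, M=2.4, U=80, T=10, E=5)
width, height = die_dimensions(area, h_layers=4, v_layers=5)
print(f"area {area:.2f} sq.mm, width {width:.3f} mm, height {height:.3f} mm")
```

With these inputs the die comes out at about 10 sq. mm, with the width and height in the ratio 4:5.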

Correct I/O pad placement and selection is important for the correct function of any ASIC design.

For a given ASIC design there are three types of I/O pads. These pads are power, ground, and signal.
It is critical to the functional operation of an ASIC design to ensure that the pads have adequate power and ground connections and are placed properly, in order to eliminate electromigration and current-switching noise related problems.

BOND PADS

Pad placements are of two types.

▪ Inline Bonding:

All pads are of the same height.
Generally used in most designs.
Can be used when the design is not pad limited.

▪ Staggered Bonding:

Pads are of different heights.
Can be used when the design is pad limited.
In an area-bump bonded chip (or flip-chip), the chip is turned upside down and solder bumps connect the pads to the lead frame.

MACRO PLACEMENT

Manual macro placement is the preferred method for memory placement, with the help of:

Fly-line analysis
Data flow diagram
Grouping (macro grouping, logic hierarchy)

An auto-placement option is also available in the tool. It can be used for the initial memory placement when the number of macros is huge, and the result can be refined later.

Fly-line analysis:

By enabling fly lines in the GUI during macro placement, we can check and analyze how various modules, standard cells and IOs communicate with each other.

Data flow diagram

The data flow diagram is provided by the synthesis team. It helps in memory placement by giving an overview of how data transfer happens in the design.

Grouping

In the GUI, under the hierarchy tab, we can get all the information needed before memory placement by highlighting different modules, with different colors for different hierarchy depths.

Guidelines to place the Macros :

Place the macros around the periphery of the core as much as possible.
Do not place macros in the middle of the core, which would create congestion problems.
Leave sufficient clearance around macros for power routes and macro signal routes.
Place the macros so that their pins face the core area, since in most cases the macro pins interact with standard cells.
Place macros close to the IO pins if they interact with specific IOs.
Group the macros based on hierarchy and place each group close together.
Check the fly-line connections and place the macros based on connectivity.
Place noise-sensitive macros away from noisy macros.
Take a look at port communications.
Avoid crisscross connectivity between macros.
Leave a halo space around macros on all sides.
For non-pin sides of macros, a minimal separation is adequate.
For pin sides of macros, a larger separation is appropriate.
Leave space between macros and the edge of the chip/block to allow buffer insertion and power stripes that feed the std cell rows between macros and blockages.
Calculate the channel width between macros as:

Channel width = [(Number of pins × Routing pitch) / Number of available metal layers] × (1 + Margin), where Margin is typically 5-10%

Here, "available metal layers" means only the effective routing layers, usually the vertical layers, excluding M1 since it is dedicated to standard cells.
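The channel-width rule of thumb above can be sketched as a small helper. This is a minimal illustration, assuming that `num_pins` counts the pins whose nets must route through the channel, that the margin is applied as an extra 5-10% on top of the base width, and that the numbers in the example are hypothetical.

```python
def macro_channel_width(num_pins: int, routing_pitch_um: float,
                        routing_layers: int, margin: float = 0.10) -> float:
    """Estimate the channel width between two macros (rule of thumb).

    num_pins: pins whose nets must route through the channel (assumption).
    routing_pitch_um: track pitch of the routing layers.
    routing_layers: effective routing layers in the channel direction (M1 excluded).
    margin: extra fraction added on top, typically 0.05 to 0.10.
    """
    if routing_layers <= 0:
        raise ValueError("need at least one effective routing layer")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    base_width = num_pins * routing_pitch_um / routing_layers
    return base_width * (1 + margin)


# Hypothetical example: 200 pins, 0.1 um pitch, 2 usable vertical layers, 10% margin.
print(round(macro_channel_width(200, 0.1, 2), 3))  # 11.0
```

Adding usable routing layers shrinks the channel proportionally, which is why only layers that can actually route in the channel direction should be counted.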

Guidelines for placing block-level pins:

Determine the correct layer for the pins.
Spread out the pins to reduce congestion.
Avoid placing pins in corners where routing access is limited.
Use multiple pin layers for less congestion.
Never place cells within the perimeter of hard macros.
To keep from blocking access to signal pins, avoid placing cells under power straps unless the straps are on metal layers higher than metal2.
Use density constraints or placement-blockage arrays to reduce congestion.
Avoid creating any blockage that increases congestion.

Qualifying the floorplan:

All macros should be placed and fixed.
All IO pins must be placed and fixed.
The block should have a uniform standard-cell area.
Avoid notches.
The legality check must pass; there should be no overlaps between memories.
Place memories along the boundary unless the design requires them in the center (only in exceptional cases, such as a PVT sensor).
Apply blockages properly around the memories to avoid unnecessary congestion.
There must be at least one VDD and one VSS power strap between memories to avoid IR drop.
Keep free space in front of ports so that the standard cells connecting to them have easy access.
Check IO timing.
Check macro-to-macro timing.
Check macro-to-standard-cell timing with margin.
IR drop must be considered.
Base-layer DRCs (up to M3) should be clean. Base layers up to M3 are checked for double-patterning (DP) violations, since double patterning is used up to M3.

Arriving at a good floorplan takes multiple iterations, but it is worth spending the time: a good floorplan makes the subsequent steps easier.

FLOORPLAN WITH POWER ISLANDS

Static power consumption is determined by the foundry process.
All CMOS processes leak whether the circuit is operating or not, so long as the circuit is powered on.
To reduce this leakage we can lower the supply voltage, but lowering it too far affects functional behavior.
The solution is to divide the design into power islands; turning off the inactive islands reduces their leakage to essentially zero.
Dynamic power is the power required to produce work, whereas static power is the cost of having the power on. Dynamic power consumption is given by the simple equation PD = C × V² × f,
where C is the capacitance at the node, V is the voltage at which the node switches, and f is the switching frequency.
A common technique for optimizing dynamic power is to employ hardware accelerators to perform functions that would otherwise be a software-intensive, power-consuming load on the CPU.
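Because voltage enters the dynamic power equation PD = C × V² × f squared, even a modest supply reduction pays off. The sketch below evaluates the equation for a hypothetical node (1 pF switching at 1 GHz) at 1.0 V and 0.8 V; the values are illustrative, not taken from any process.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Dynamic power PD = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz


# Hypothetical node: 1 pF switching at 1 GHz.
p_nominal = dynamic_power(1e-12, 1.0, 1e9)
p_scaled = dynamic_power(1e-12, 0.8, 1e9)  # supply lowered by 20%
print(f"{p_nominal * 1e3:.3f} mW")  # 1.000 mW
print(f"{p_scaled * 1e3:.3f} mW")   # 0.640 mW
```

A 20% voltage reduction cuts dynamic power by 36% (0.8² = 0.64), whereas a 20% frequency reduction cuts it by only 20%.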

Power Domain Partitioning

▪ More power regimes from the PMIC (power management integrated circuit, usually DC/DC converters or their control part).

Benefits:
Better Power Control and Efficiency
Independent Voltage Scaling and Power Collapsing

Costs:
Increased PDN impedance and IR Drop
Increased Bill Of Materials (BOM)
Requires level shifters and resynchronization at domain boundaries.
Needs a small on-chip regulator with fast response and good efficiency.

IR Drop and PDN impedance:
Increased Power Density
Increased metal resistance
Increased IR drop due to Power Switches
Dynamic IR effects on skew and timing are not well modeled.

▪ Level Shifter:

The purpose of this cell is to shift a signal's voltage level from low to high or from high to low. Buffer-type and latch-type level shifters are generally available.

(Figures: High to Low Level Shifter; Low to High Level Shifter.)

▪ Isolation Cell:

These are special cells required at the interface between blocks that are shut down and blocks that are always on. They clamp the output node to a known voltage. These cells need to be placed in an 'always on' region only, and the enable signal of the isolation cell needs to be 'always_on'.
In a nutshell, an isolation cell is necessary to isolate floating inputs.

(Figure: basic isolation cell, clamp to 0 and clamp to 1.)

There are 2 types of isolation cells: (a) Retain "0" (b) Retain "1".
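The clamping behavior can be summarized with a behavioral model. This is a sketch, not a cell netlist: the active-high `iso_enable` polarity is an assumption (real libraries use either polarity), and `None` is used here to model the floating output of a powered-down block.

```python
from typing import Optional


def isolation_cell(data_in: Optional[int], iso_enable: bool, clamp_value: int) -> int:
    """Behavioral model of an isolation cell.

    data_in: output of the shut-down domain; None models a floating/unknown value.
    iso_enable: isolation enable from the always-on domain (active-high assumed).
    clamp_value: 0 for a clamp-to-0 (retain "0") cell, 1 for clamp-to-1 (retain "1").
    """
    if clamp_value not in (0, 1):
        raise ValueError("clamp_value must be 0 or 1")
    if iso_enable:
        # Source domain may be off: drive a known value into the always-on logic.
        return clamp_value
    if data_in is None:
        # Isolation released while the input still floats: a power-sequencing bug.
        raise ValueError("isolation disabled while the input is floating")
    return data_in


print(isolation_cell(None, True, 0))  # 0: clamp-to-0 cell hides the floating input
print(isolation_cell(None, True, 1))  # 1: clamp-to-1 cell
print(isolation_cell(1, False, 0))    # 1: normal operation, data passes through
```

The error branch reflects why the enable must come from the always-on domain: isolation has to stay asserted for as long as the source block is off.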

▪ Enable Level Shifter:

This cell is a combination of a level shifter and an isolation cell.

▪ Retention Flops:

These cells are special flops with multiple power supplies. They are typically used as shadow registers that retain their value even when the block in which they reside is shut down. All the paths leading to this register need to be 'always_on', and hence special care must be taken to synthesize, place and route them.
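The save/power-down/restore sequence of a retention flop can be modeled behaviorally as below. This sketch assumes a main flop on the switchable supply plus a shadow element on the always-on supply; the `save` and `restore` method names stand in for the cell's control signals and are hypothetical.

```python
class RetentionFlop:
    """Behavioral sketch: a main flop on the switchable supply plus a
    shadow element kept alive by the always-on supply (assumption)."""

    def __init__(self) -> None:
        self.q = None        # main flop value; None models an unknown (X) state
        self.shadow = None   # retained copy, powered by the always-on supply
        self.powered = True  # state of the switchable domain

    def clock(self, d: int) -> None:
        # The main flop only captures data while its domain is powered.
        if self.powered:
            self.q = d

    def save(self) -> None:
        # Hypothetical save control: copy the main value into the shadow element.
        if not self.powered:
            raise RuntimeError("save must happen before power-down")
        self.shadow = self.q

    def power_down(self) -> None:
        self.powered = False
        self.q = None  # the main flop loses its contents

    def power_up(self) -> None:
        self.powered = True  # q stays unknown until restored

    def restore(self) -> None:
        # Hypothetical restore control: reload the main flop from the shadow.
        if not self.powered:
            raise RuntimeError("restore must happen after power-up")
        self.q = self.shadow


ff = RetentionFlop()
ff.clock(1)
ff.save()
ff.power_down()
print(ff.q)  # None: main flop state lost while the block is off
ff.power_up()
ff.restore()
print(ff.q)  # 1: value recovered from the shadow copy
```

Because the save and restore controls must remain valid while the block is off, they are among the paths that need to stay always-on.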

▪ AON cells:

These are generally buffers that remain powered regardless of where they are placed.
They can be either special cells or regular buffers. Special cells have their own secondary power supply and can therefore be placed anywhere in the design. Regular buffers must be placed where they receive an always-on supply, e.g., inside an always-on voltage island.

(Figure: voltage island.)

Applying proper Blockages:

Blockages guide the tool's placement of different kinds of cells in different regions of the design, depending on the type of blockage.

Types: Hard, Soft, Partial, Placement, Routing, Halos, Keep-Out regions

Hard Blockage: strictly prevents placement of any standard cell, buffer or inverter inside the blockage area.

=> Usually used in highly congested regions to prevent congestion.

=> It causes an abrupt increase in the utilization factor, so use it only where critical or when there is enough margin.
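The utilization jump from a hard blockage follows from removing its area from the placeable area. A minimal sketch, assuming utilization is defined as standard-cell area divided by (core area minus hard-blockage area); tools may use a different definition (for example, also excluding macro area), and the areas below are hypothetical.

```python
def utilization(std_cell_area: float, core_area: float,
                hard_blockage_area: float = 0.0) -> float:
    """Standard-cell utilization with hard blockages removed from the
    placeable area (definition assumed for illustration)."""
    placeable_area = core_area - hard_blockage_area
    if placeable_area <= 0:
        raise ValueError("hard blockages cover the whole core")
    return std_cell_area / placeable_area


# Hypothetical block: 6000 um^2 of cells in a 10000 um^2 core.
print(utilization(6000.0, 10000.0))          # 0.6
print(utilization(6000.0, 10000.0, 2000.0))  # 0.75 with a 2000 um^2 hard blockage
```

Blocking 20% of the core raises utilization from 60% to 75% without adding a single cell, which is why hard blockages should be used only where there is margin.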

Soft Blockage: allows only buffers and inverters to be placed. It does not impact the total utilization of the design.

Partial Blockage: Using a blockage factor, we can guide the tool to limit the placement of standard cells for a particular module in specific locations. It is one of the best methods to control cell density and congestion. The blockage factor is a percentage that sets how strongly standard-cell placement is restricted in that region.
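The effect of a blockage factor can be sketched as a density cap on a region. This assumes the factor is the percentage of the region blocked from standard cells, so the remaining percentage is the maximum cell area allowed there; the function names and numbers are hypothetical, and real tools may interpret the factor differently.

```python
def partial_blockage_capacity(region_area: float, blocked_percentage: float) -> float:
    """Maximum standard-cell area allowed in a partially blocked region.

    blocked_percentage: share of the region blocked from cells (assumption).
    """
    if not 0.0 <= blocked_percentage <= 100.0:
        raise ValueError("blocked_percentage must be between 0 and 100")
    return region_area * (1.0 - blocked_percentage / 100.0)


def fits_in_region(cell_area: float, region_area: float,
                   blocked_percentage: float) -> bool:
    # True if the cells assigned to the region respect the density cap.
    return cell_area <= partial_blockage_capacity(region_area, blocked_percentage)


# Hypothetical 1000 um^2 region with a 40% blockage factor.
print(round(partial_blockage_capacity(1000.0, 40.0), 6))  # 600.0
print(fits_in_region(550.0, 1000.0, 40.0))                # True
print(fits_in_region(650.0, 1000.0, 40.0))                # False
```

Capping density this way spreads cells out of a hotspot instead of forbidding placement entirely, which is what distinguishes a partial blockage from a hard one.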

Placement Blockage:

A placement blockage is an area that cells must avoid during placement, optimization and legalization, including overlapping any part of the placement blockage. A placement blockage can be hard or soft.

•A hard blockage prevents cells from being placed in the blockage area.

•A soft blockage restricts the coarse placer from putting cells in the blockage area, but optimization and legalization can place cells in a soft blockage area.

If you define both hard and soft placement blockages in a block, the hard placement blockages take priority over the soft placement blockages in places where they overlap.

Routing Blockage:

Routing blockages block routing resources on one or more layers. They can be created at any point in the design. A routing blockage defines a region where routing is not allowed on specific layers. Zroute treats routing blockages as hard constraints.

HALO (Keep-Out Region):

A halo is the region around the boundary of a fixed macro in which no other macros or standard cells can be placed, except that buffers and inverters are allowed in the halo area.

Halos of two adjacent macros can be overlapped. 

If a macro is moved from one place to another, its halo moves with it. A blockage, by contrast, stays where it was created when the macro is moved.
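The difference between a halo and a blockage can be modeled by deriving the halo from the macro's current position while storing a blockage as a fixed rectangle. This sketch assumes rectangular macros, a uniform halo width on all sides, and hypothetical cell-type labels ("BUF", "INV", "AND2"); it is an illustration, not any tool's data model.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        # Strict inequality: a point on the edge counts as outside.
        return self.x1 < x < self.x2 and self.y1 < y < self.y2


@dataclass
class Macro:
    x: float      # lower-left corner
    y: float
    w: float
    h: float
    halo: float   # uniform keep-out width on all sides (assumption)

    def body(self) -> Rect:
        return Rect(self.x, self.y, self.x + self.w, self.y + self.h)

    def halo_rect(self) -> Rect:
        # Recomputed from the current position, so the halo moves with the macro.
        return Rect(self.x - self.halo, self.y - self.halo,
                    self.x + self.w + self.halo, self.y + self.h + self.halo)

    def move_to(self, x: float, y: float) -> None:
        self.x, self.y = x, y


BUFFER_TYPES = {"BUF", "INV"}  # hypothetical labels for buffers and inverters


def can_place(cell_type: str, x: float, y: float, macro: Macro) -> bool:
    if macro.body().contains(x, y):
        return False
    if macro.halo_rect().contains(x, y):
        return cell_type in BUFFER_TYPES  # only buffers/inverters inside a halo
    return True


m = Macro(x=100, y=100, w=50, h=50, halo=5)
blockage = m.halo_rect()   # a blockage drawn once at the original location
m.move_to(300, 100)
print(can_place("AND2", 103, 120, m))  # True: old location is free for the halo
print(blockage.contains(103, 120))     # True: the fixed blockage stayed behind
print(can_place("BUF", 297, 120, m))   # True: buffer inside the moved halo
print(can_place("AND2", 297, 120, m))  # False: logic cell inside the moved halo
```

Storing the halo as a property of the macro is what makes it follow the macro; the blockage, stored as a plain rectangle, has to be recreated by hand after the move.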

A halo does not add anything to the total utilization, whereas a hard blockage causes an abrupt increase in the utilization factor.