I've completed a new, tested, fully working controller - have a look at Simple_SDRAM_Controller
Although aimed at 100MHz, all of the designs below can be adapted to other clock speeds. The only change needed is to adjust the number of NOPs in the refresh chain (increasing it for faster clocks) so that the refresh takes at least 70ns.
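As a rough sketch of that adjustment, the number of clock cycles the refresh chain has to span can be worked out from the 70ns figure and the clock frequency. The function name and the integer-MHz interface are my own illustrative choices, and the 70ns default should be checked against the datasheet of your SDRAM.

```python
def refresh_cycles(clock_mhz: int, t_refresh_ns: int = 70) -> int:
    """Smallest whole number of clock cycles lasting at least t_refresh_ns.

    One cycle lasts 1000 / clock_mhz nanoseconds, so the chain needs
    ceil(t_refresh_ns * clock_mhz / 1000) cycles. Integer ceiling
    division is used to avoid floating-point rounding surprises.
    """
    if clock_mhz <= 0 or t_refresh_ns <= 0:
        raise ValueError("clock_mhz and t_refresh_ns must be positive")
    return -(-t_refresh_ns * clock_mhz // 1000)


for mhz in (50, 100, 133, 143):
    print(f"{mhz} MHz: {refresh_cycles(mhz)} cycles")
```

How many of those cycles are NOPs depends on how your FSM counts the refresh command itself.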
Adapting to a CAS setting of 2 is only a little bit more difficult, as the data is available one cycle earlier. A CAS of 2 can only be used with a clock speed of 100MHz, and will make the biggest difference with the simple FSM, where it saves a cycle on every read, or with the most complex design, where it saves a cycle when flipping between reads and writes.
The first priority should be to perform any pending refresh, but whether reads take priority over writes depends on your target application. For example, if you are generating a VGA video signal, reads should take priority over writes; otherwise "tearing" of the picture could occur.
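The priority rule can be sketched as a small arbiter. This is only an illustration: the `Command` names and the `reads_first` switch are hypothetical, and a real controller would make this choice in hardware inside the FSM.

```python
from enum import Enum


class Command(Enum):
    REFRESH = "refresh"
    READ = "read"
    WRITE = "write"
    IDLE = "idle"


def next_command(refresh_pending: bool, read_pending: bool,
                 write_pending: bool, reads_first: bool = True) -> Command:
    """Pick the next operation: refresh always wins, then reads or writes."""
    if refresh_pending:
        return Command.REFRESH
    # Order of the remaining two is application dependent (assumed switch).
    order = [(read_pending, Command.READ), (write_pending, Command.WRITE)]
    if not reads_first:
        order.reverse()
    for pending, command in order:
        if pending:
            return command
    return Command.IDLE
```

For the VGA case above you would leave `reads_first` set, so a pending write never delays the read that feeds the display.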
FSM1 - Simple controller
Based on the information above, here is the design for a simple FSM to access the SDRAM with a burst length of 4 at 100MHz, CAS = 3:
(The blue circles indicate where data is transferred to/from the SDRAM)
Performance:
- Read is 11 cycles for four words = 72MB/s @ 100MHz, excluding refresh overhead
- Write is 11 cycles for four words = 72MB/s @ 100MHz, excluding refresh overhead
- Refresh is 8 cycles.
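The throughput figures in these performance lists follow from simple arithmetic. The sketch below assumes 16-bit (2-byte) words, which is what the 72MB/s figure implies but is not stated in the text; the function name is just for illustration.

```python
def throughput_mb_s(cycles: int, words: int = 4, bytes_per_word: int = 2,
                    clock_mhz: float = 100.0) -> float:
    """Peak throughput in MB/s (10**6 bytes/s), ignoring refresh overhead."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    # bytes per transfer * transfers per microsecond = MB/s
    return words * bytes_per_word * clock_mhz / cycles


print(f"{throughput_mb_s(11):.1f} MB/s")  # 11-cycle transfer
print(f"{throughput_mb_s(10):.1f} MB/s")  # 10-cycle transfer
```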
Pros:
- Simple to implement
- Fast logic - only two nodes have multiple exits, and the choice is simple.
- Predictable performance.
Cons:
- Poor performance
This can be slightly improved on with a few little changes. These focus around skipping the idle stage where possible.
FSM2 - Optimised simple controller
(The blue circles indicate where data is transferred to/from the SDRAM)
Performance:
- Read is 10 cycles for four words = 80MB/s @ 100MHz, excluding refresh overhead
- Write is 10 cycles for four words = 80MB/s @ 100MHz, excluding refresh overhead
- Refresh is 7 cycles.
Pros:
- Simple to implement
- Fast logic - only three nodes have multiple exits, and the choice at these nodes is simple.
- Predictable performance.
Cons:
- Poor performance
Further improvements can be made by not activating and precharging the row every time.
FSM3 - With back-to-back reads or back-to-back writes
(The blue circles indicate where data is transferred to/from the SDRAM)
Performance:
- Single read is 10 cycles for four words = 80MB/s @ 100MHz, excluding refresh overhead. For back-to-back reads this gets close to 200MB/s
- Single write is 10 cycles for four words = 80MB/s @ 100MHz, excluding refresh overhead. For back-to-back writes this gets close to 200MB/s
- For mixed read/write workloads performance can be as low as 72MB/s
- Refresh is 7 cycles.
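To see why back-to-back transfers approach 200MB/s, here is a rough model. It assumes 16-bit words, and that each additional chained burst costs only its four data cycles while the other 6 cycles of a 10-cycle single transfer are paid once per chain. The real per-burst cost depends on the FSM diagram, so treat the numbers as indicative only.

```python
def chained_throughput_mb_s(bursts: int, single_cycles: int = 10,
                            words_per_burst: int = 4, bytes_per_word: int = 2,
                            clock_mhz: float = 100.0) -> float:
    """Throughput in MB/s for `bursts` chained bursts on one open row."""
    if bursts < 1:
        raise ValueError("bursts must be at least 1")
    # Assumed fixed cost per chain: activate, latency, precharge and so on.
    overhead = single_cycles - words_per_burst
    total_cycles = overhead + bursts * words_per_burst
    total_bytes = bursts * words_per_burst * bytes_per_word
    return total_bytes * clock_mhz / total_cycles


for n in (1, 2, 10, 100):
    print(f"{n:3d} bursts: {chained_throughput_mb_s(n):.1f} MB/s")
```

With one burst the model gives the single-transfer figure of 80MB/s; as the chain grows, the fixed overhead is amortised and the rate climbs towards the 200MB/s limit of one 16-bit word per cycle.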
Pros:
- Much improved performance for back-to-back operations (as long as you don't mix reads and writes).
- You can choose to allow back-to-back operations in only the write or read sections, optimising for the application
Cons:
- Logic is starting to get complex (and slow).
- Unpredictable latency.
One major issue with this design is that it is possible to get stuck in a loop in either the 'read' or 'write' operations. In the unlikely case that this occurs, there is a chance that refresh operations will not be performed as needed. The easy solution would be to not perform back-to-back operations if a refresh operation is pending.
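A minimal sketch of that guard, using hypothetical names: the decision to chain another burst is gated on no refresh being pending, so an endless stream of requests cannot keep the FSM in the loop.

```python
def may_chain(next_is_same_kind: bool, refresh_pending: bool) -> bool:
    """Allow a back-to-back burst only when no refresh is waiting."""
    return next_is_same_kind and not refresh_pending


def bursts_until_refresh(refresh_due_at_burst: int,
                         max_bursts: int = 1000) -> int:
    """Count chained bursts in an endless request stream before the FSM
    leaves the loop to refresh (capped at max_bursts for the simulation)."""
    done = 0
    while done < max_bursts:
        done += 1  # one burst completes
        refresh_pending = done >= refresh_due_at_burst
        if not may_chain(True, refresh_pending):
            break  # fall back to idle so the refresh can run
    return done


print(bursts_until_refresh(5))
```

Without the `refresh_pending` term in `may_chain`, the loop in this toy simulation would only end at the `max_bursts` cap.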
To improve on this we have to start mixing the read and write operations, as long as they are on the same row. This is where things get complex!
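Deciding whether two requests are "on the same row" comes down to comparing the bank and row fields of their addresses. The sketch below assumes a hypothetical word-address layout of row, then bank, then column, with 13 row bits, 2 bank bits and 9 column bits. The actual split depends on your SDRAM part and on how the controller maps addresses.

```python
from typing import Tuple

COL_BITS = 9    # assumed column address width
BANK_BITS = 2   # assumed bank address width
ROW_BITS = 13   # assumed row address width


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a word address into (row, bank, column) fields."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, bank, col


def same_open_row(addr_a: int, addr_b: int) -> bool:
    """True if both addresses fall in the same row of the same bank."""
    row_a, bank_a, _ = split_address(addr_a)
    row_b, bank_b, _ = split_address(addr_b)
    return (row_a, bank_a) == (row_b, bank_b)
```

In hardware this is just an equality comparison between the upper address bits of the new request and those latched when the row was activated.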
FSM4 - With mixed back-to-back reads and writes
(The blue circles indicate where data is transferred to/from the SDRAM)
Performance:
- Single read is 10 cycles for four words = 80MB/s @ 100MHz, excluding refresh overhead. For back-to-back reads this gets close to 200MB/s
- Single write is 10 cycles for four words = 80MB/s @ 100MHz, excluding refresh overhead. For back-to-back writes this gets close to 200MB/s
- For mixed read/write workloads performance can be up to 145MB/s
- Refresh is 7 cycles.
Pros:
- Much improved performance for back-to-back operations, including mixed reads and writes.
Cons:
- Logic is getting complex (and slow).
- Unpredictable latency
- Only back-to-back operations are improved; if the requests to the same row are separated by a few clock cycles, the row gets precharged and opened again.
One other issue with this design is that it is possible to get stuck in a loop in either the 'read' or 'write' operations. In the unlikely case that this occurs, there is a chance that refresh operations will not be performed as needed. The easy solution is to not perform back-to-back operations if a refresh operation is pending.
Further improvements - FSM5
The FSM4 design can also be improved on. It involves having an "idle row activated" state, which would reduce latency for operations that are interspersed with a few idle cycles - down from 10 cycles to 7 for reads, and down from 7 cycles to 4 for writes. These are pretty big improvements.
As it involves a lot more complexity than the above designs, the diagram looks completely different:
(Blue circles are data transfers from the SDRAM, red circles are data transfers to the SDRAM)
Pros:
- Nearly a full-featured design; everything but the ability to abort burst transfers is catered for
Cons:
- Very complex to code and test.
- Complexity may reduce speed.
- Large number of states to understand and manage.
- In some cases the design is slower than the simpler designs. For example, going from an idle active row to a completed read in a different row takes three cycles longer.
As long as priority is given to getting back to the idle state when a refresh is pending, this seems to be close to an optimal design.
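The trade-off between faster same-row reads and slower different-row reads can be put into rough numbers. This sketch uses the read latencies quoted in this section (7 cycles when the idle active row is the right one, 10 cycles for the FSM4 path) and takes "three cycles longer" to mean 13 cycles when a different row is open. It also assumes the controller always sits in the idle-row-activated state between requests, which is a simplification.

```python
def fsm5_avg_read_cycles(hit_rate: float, hit_cycles: int = 7,
                         miss_cycles: int = 13) -> float:
    """Average read latency for a given fraction of same-row requests."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


def break_even_hit_rate(baseline_cycles: int = 10, hit_cycles: int = 7,
                        miss_cycles: int = 13) -> float:
    """Hit rate at which the average equals the fixed baseline latency."""
    return (miss_cycles - baseline_cycles) / (miss_cycles - hit_cycles)


print(break_even_hit_rate())
print(fsm5_avg_read_cycles(0.8))
```

Under these assumptions, keeping the row open pays off for reads once more than half of the requests land in the row that is already active.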
Source code
This is the source code for the FSM.