HDL Express

Welcome to HDL Express, the personal webpages of Kirk Weedman.

HDL stands for Hardware Description Language.

This website also contains information on various Verilog/FPGA tutorials, Alternative Energy projects, and the progress of a new CPU architecture that I'm designing.

I'm an electronic design engineer specializing in contract Verilog RTL FPGA design, functional verification, testbench creation, simulation and debug. I have a varied background in other disciplines too.

My resume: Download the PDF version here. Download the Word version here.

Currently available for new FPGA design contract work.

 

April 23, 2017: Current Status of the new Out of Order CPU Architecture.

This architecture differs from the typical OoO architectures in use today. The goal is to improve OoO IPC.

4/27/2017 - Since this architecture can be applied to almost any ISA, I am switching from the ARMv7 ISA to RISC-V. RISC-V will be simpler to implement and there is good software tool support for it. Due to this change, many of the % Written and % Debugged numbers in the tables below have decreased. For some modules the change makes little if any difference because of how the code is written.

I'm looking for possible help. Read my OFFER to join in helping me progress faster on this new architecture.

See CPU History for more information about the progress on this architecture.

See Branch Prediction Elimination for more info about the progress on this method.

1. Current simplified block diagram of new Out of Order Microarchitecture

2. Debugging RV32I instructions & flow through all stages. Getting ready to debug ls_process.v and all related files.

3. Adding RISC-V CSR instructions to the decode, disassembly (debug only), and other modules.

4. Creating a behavioural memory interface module for simulation purposes. Also creating an RTL version of a 32KB 8-way set-associative L1 data cache.
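
To make the cache geometry in item 4 concrete, here is a minimal Python sketch of how a 32-bit address could be split into tag, set index, and byte offset for a 32KB 8-way cache. The 64-byte line size is an assumption (the page does not state it), and the names `LINE_BYTES` and `split_address` are hypothetical, not taken from the RTL.

```python
# Geometry of a 32KB, 8-way set-associative cache (sizes from the text).
CACHE_BYTES = 32 * 1024
WAYS = 8
LINE_BYTES = 64      # ASSUMPTION: line size is not given on this page
ADDR_BITS = 32       # RV32I uses 32-bit addresses

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)     # 64 sets under this assumption
OFFSET_BITS = LINE_BYTES.bit_length() - 1         # log2 of a power of two
INDEX_BITS = NUM_SETS.bit_length() - 1
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS


def split_address(addr):
    """Return (tag, set_index, byte_offset) for a 32-bit address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(NUM_SETS, OFFSET_BITS, INDEX_BITS, TAG_BITS)
    print([hex(x) for x in split_address(0x12345678)])
```

A different line size only changes `LINE_BYTES`; the set count and bit widths follow from it.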

More details on specific modules:

Module Name | % Written | % Debugged | Description
KPU_OoOe.pptx | 95 | N/A | PowerPoint presentation: overview and details about how this new microarchitecture works.
top_tb1.v | 20 | 10 | Top level test bench #1. There's enough written to start the debug process.
cpu_params.h | 80 | 50 | Include file for all design modules. Additions/changes are made as needed.
disasm.v | 90 | 80 | Used for debugging. It displays RV32I instructions in real time in ModelSim simulations.
kpu_oooe.v | 85 | 60 | Top level module. Ties all CPU design related modules together.
fetch.v | 75 | 75 | Fetches instructions from memory. Still needs logic to handle branches.
decode.v | 75 | 80 | Decodes RV32I instructions. Some system level instructions may not be implemented. The main objective is to show the new OoOE method, not to build a complete CPU.
microcode.v | 75 | 80 | Logic & data for ROM/RAM microcode table.
dependency_control.v | 100 | 50 | Determines instruction execution dependencies. Seems to be working well so far.
llrs.v | 100 | 80 | Linked List Reservation Stations (not similar to any known RS). This is a linked list type queue. This module keeps track of all instructions and whether they are ready to start execution. The oldest instruction that's ready to be executed is offered to the appropriate functional unit in the Issue/Execute stage. Seems to be working well.
L1_ram.v | 80 | 20 | Currently using this behavioural model to emulate memory for simulation purposes ONLY.
L1_dcache | 70 | 0 | 32KB 8-way set-associative L1 data cache for use with the CPU. Currently using a behavioural model for simulation.
sll.v | 100 | 95 | Contains the core logic for a Singly Linked List queue and is instantiated in llrs.v. The queue list is linked from newest to oldest instruction.
pool.v | 100 | 60 | Implements the Pool of Functional Units. This working code was formerly inline inside kpu_oooe.v, but now has its own module.
gpr.v | 90 | 50 | Contains the CPU architectural registers and the read/write logic connected to them. This is not shown in the above diagram, but commit.v writes to gpr.v (the architectural register set).
rob.v | 95 | 95 | Reorder Buffer to queue up Out of Order instructions to be committed In Order.
commit.v | 85 | 80 | Commits/retires instructions (multiple per clock if available) In Order.
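
The llrs.v and sll.v descriptions above can be illustrated with a small behavioural model. This is only a sketch of the idea as described on this page (a singly linked list linked from newest to oldest, from which the oldest ready instruction is issued); it is not the RTL, it ignores functional unit types, and the class and method names are hypothetical.

```python
class Node:
    """One pending instruction in the list."""

    def __init__(self, tag, ready=False):
        self.tag = tag
        self.ready = ready
        self.next = None   # link to the next OLDER instruction


class LinkedListRS:
    """Behavioural sketch: list is linked from newest (head) to oldest (tail)."""

    def __init__(self):
        self.head = None

    def insert(self, tag, ready=False):
        # A new instruction becomes the newest entry, i.e. the new head.
        node = Node(tag, ready)
        node.next = self.head
        self.head = node

    def mark_ready(self, tag):
        # Flag an instruction as ready once its dependencies are resolved.
        node = self.head
        while node is not None:
            if node.tag == tag:
                node.ready = True
                return True
            node = node.next
        return False

    def issue_oldest_ready(self):
        # Walk newest -> oldest; the LAST ready node seen is the oldest one.
        prev, node = None, self.head
        pick, pick_prev = None, None
        while node is not None:
            if node.ready:
                pick, pick_prev = node, prev
            prev, node = node, node.next
        if pick is None:
            return None
        # Unlink the issued instruction from the list.
        if pick_prev is None:
            self.head = pick.next
        else:
            pick_prev.next = pick.next
        return pick.tag


if __name__ == "__main__":
    rs = LinkedListRS()
    for tag in ("i0", "i1", "i2"):   # i0 is the oldest
        rs.insert(tag)
    rs.mark_ready("i2")
    rs.mark_ready("i1")
    print(rs.issue_oldest_ready())   # i1 issues before i2 because it is older
```

In hardware the walk would be replaced by parallel logic; the loop here only models the selection rule.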


Pool of Functional Units - This is actually a collection of the modules below with controlling logic. In the current simulations, there are 5 alu_functional_units, 1 br_functional_unit, and 5 ls_functional_units in the "pool". Each type processes certain types of instructions. The number of ALU, branch, load/store, and other functional units is set by individual parameters, so the design can be varied between simulations to see the effect of different numbers of functional units. In general, many areas of the design have been parameterized so that different simulations can determine the optimal parameters for a given target CPU.

Module Name | % Written | % Debugged | Description
alu_functional_unit.v | 80 | 75 | ALU logic. Contains logical functions, add, subtract, etc.
br_functional_unit.v | 70 | 40 | Branch Functional Unit. Enough is written to pass instructions on to commit.v so they don't hold up the data processing instructions I'm currently debugging.
fm_functional_unit.v | 0 | 0 | Floating Point Multiply Functional Unit. Not currently needed for RV32I version.
fa_functional_unit.v | 0 | 0 | Floating Point Add Functional Unit. Not currently needed for RV32I version.
ls_functional_unit.v | 100 | 5 |
sys_functional_unit.v | 50 | 5 | System functional unit. Handles system instructions such as CSR for the RV32IM.
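
As a rough illustration of why the unit counts are parameters, the toy Python model below estimates how many cycles a batch of instructions needs just to be issued for a given pool configuration. It assumes single-cycle units and no dependencies between instructions, which the real design does not; `DEFAULT_UNITS` mirrors the 5/1/5 counts mentioned above, and all names are hypothetical.

```python
import math
from collections import Counter

# Unit counts used in the current simulations, per the text above.
DEFAULT_UNITS = {"alu": 5, "br": 1, "ls": 5}


def cycles_to_issue(kinds, units=DEFAULT_UNITS):
    """Lower bound on issue cycles for a list of instruction kinds.

    ASSUMPTIONS: every unit accepts one instruction per cycle and there
    are no data dependencies, so each kind is limited only by unit count.
    """
    demand = Counter(kinds)
    cycles = 0
    for kind, count in demand.items():
        available = units.get(kind, 0)
        if available == 0:
            raise ValueError(f"no functional unit for kind {kind!r}")
        cycles = max(cycles, math.ceil(count / available))
    return cycles


if __name__ == "__main__":
    workload = ["alu"] * 10 + ["br"] * 2 + ["ls"] * 5
    print(cycles_to_issue(workload))
    print(cycles_to_issue(workload, {"alu": 2, "br": 1, "ls": 5}))
```

Changing one entry in the dictionary plays the role of changing one parameter between simulation runs.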

 

L/S Process - This is also a collection of several modules, listed below. It always consists of 1 ls_queue.v, 1 ls_dependency_control.v (which includes 1 ls_cam_cache.v), several lsrs.v, and 1 ls_mem_rw.v. The number of lsrs.v instances is a parameter that can be changed. In current simulations there are 5 lsrs.v (matching the number of ls_functional_unit.v).

Module Name | % Written | % Debugged | Description
ls_process.v | 100 | 5 | Load/Store Processing top module. This module connects the ls_queue.v, ls_dependency_control.v, lsrs.v and ls_mem_rw.v modules into one module.
ls_queue.v | 100 | 5 | Load/Store reordering queue. Holds instructions only until they can be output "In Order" (L/S order) to ls_dependency_control.v.
ls_dependency_control.v | 100 | 5 | Load/Store Dependency Control. Similar to dependency_control.v, but uses addresses resolved in ls_functional_unit.v instead of the architectural registers.
ls_cam_cache.v | 100 | 5 | Content Addressable Memory used as a cache to store N Load/Store addresses, tags, and flags.
lsrs.v | 100 | 5 | Load/Store Reservation Station. Similar to llrs.v in that it temporarily holds pending L/S instructions from ls_dependency_control.v.
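
The ls_cam_cache.v entry can be pictured with a small behavioural model of a content addressable memory holding N load/store entries. This is a sketch only: the allocation policy (first free slot), the single `is_store` flag, and the method names are assumptions for illustration, not the behaviour of the actual module.

```python
class CamCache:
    """Behavioural sketch of an N-entry CAM of (address, tag, is_store)."""

    def __init__(self, n_entries):
        self.entries = [None] * n_entries   # None marks an invalid entry

    def allocate(self, addr, tag, is_store):
        # ASSUMPTION: use the first free slot; return None when full.
        for i, entry in enumerate(self.entries):
            if entry is None:
                self.entries[i] = (addr, tag, is_store)
                return i
        return None

    def match(self, addr):
        # Hardware compares every entry in parallel; a loop models that here.
        return [e[1] for e in self.entries if e is not None and e[0] == addr]

    def invalidate(self, tag):
        # Free the entry once the instruction with this tag has completed.
        for i, entry in enumerate(self.entries):
            if entry is not None and entry[1] == tag:
                self.entries[i] = None
                return True
        return False


if __name__ == "__main__":
    cam = CamCache(4)
    cam.allocate(0x100, tag=1, is_store=True)
    cam.allocate(0x200, tag=2, is_store=False)
    print(cam.match(0x100))   # tags of pending accesses to address 0x100
```

A later load or store to the same address would find the matching tags and could be held back until those entries are invalidated.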

 
