HDL Express

Welcome to HDL Express, the personal web pages of Kirk Weedman.

HDL stands for Hardware Description Language.

This website also contains Verilog/FPGA tutorials, information on Alternative Energy projects, and the progress of a new CPU architecture that I'm designing.

I'm an electronic design engineer specializing in contract Verilog RTL FPGA design, functional verification, testbench creation, simulation and debug. I have a varied background in other disciplines too.

My resume: download the PDF version here, or the Word version here.

Currently available for new FPGA design contract work.


Aug. 2017: Current status of the new Out-of-Order CPU architecture, based on a newly invented dynamic instruction scheduling algorithm.

Engineers currently working on this CPU project are listed below. If you are working on the project and want your name listed, please let me know. To those who prefer not to have their names posted: thank you for your help.

Anurag Bhat Masters in Electrical & Computer Engineering Portland State University
Apurva Gondole Masters in Electrical & Computer Engineering Portland State University
Parimal Kulkarni Masters in Electrical & Computer Engineering Portland State University
Shreya Mehrotra Masters in Electrical & Computer Engineering University of Florida


This new dynamic instruction scheduling algorithm is unlike the typical out-of-order (OoO) methods used today; its goal is to improve OoO IPC. Most modern out-of-order CPUs use either the Tomasulo or the scoreboarding algorithm (or a variant of one) for dynamic scheduling. The method used in this CPU is completely different and simpler, although it follows several specific rules. There is no register renaming, because the method effectively provides unlimited renaming; as a result there is no physical register set, only the architectural registers. The dynamic scheduling logic appears to grow roughly linearly as IPC increases, instead of exponentially as with the Tomasulo algorithm. The goal of building a working CPU is to prove that the method works and to vary design parameters to find the maximum performance (IPC throughput) for a given microarchitecture.

4/27/2017: Since this algorithm can be applied to almost any ISA, I am switching from the ARMv7 ISA to RISC-V. RISC-V will be simpler to implement, and it has good software tool support. Because of this change, many of the "% written/debugged" numbers in the tables below have decreased. For some modules the change makes little if any difference because of how the code is written.

See CPU History for more information about the progress on this architecture.

See Branch Prediction Elimination for more info about the progress on this method.

1. Current simplified block diagram of new Out of Order Microarchitecture

2. Debugging RV32I instructions & flow through all stages. Getting ready to debug ls_process.v and all related files.

3. Adding RISC-V CSR instructions to the decode, disassembly (debug only), etc. modules.

4. Creating a behavioural memory interface module for simulation purposes. Also creating an RTL version of a 32KB 8-way set-associative L1 data cache.
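For a set-associative cache, the size, associativity, and line size fix how an address is split into tag, set index, and byte offset. The sketch below works through that arithmetic for a 32KB 8-way cache. It assumes 32-bit byte addresses (as in RV32I) and a 64-byte line; the line size of the RTL cache is not stated here, so treat these numbers as an illustration only.

```python
from typing import Tuple

# Hypothetical address split for a 32KB, 8-way set-associative L1 data cache.
# ASSUMPTIONS: 32-bit byte addresses and 64-byte cache lines.
CACHE_BYTES = 32 * 1024
WAYS = 8
LINE_BYTES = 64      # assumed line size
ADDR_BITS = 32


def log2_exact(n: int) -> int:
    """Return log2(n); n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, f"{n} is not a power of two"
    return n.bit_length() - 1


NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)     # 32768 / 512 = 64 sets
OFFSET_BITS = log2_exact(LINE_BYTES)              # 6 bits select a byte in the line
INDEX_BITS = log2_exact(NUM_SETS)                 # 6 bits select the set
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS   # remaining 20 bits are the tag


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a byte address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(NUM_SETS, OFFSET_BITS, INDEX_BITS, TAG_BITS)   # 64 6 6 20
    print([hex(f) for f in split_address(0x80001234)])  # ['0x80001', '0x8', '0x34']
```

On a lookup, the index selects one of the 64 sets, and the 20-bit tag is compared against all 8 ways of that set in parallel.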

More details on specific modules:

Module Name | % Written | % Debugged | Description
KPU_OoOe.pptx | 95 | N/A | PowerPoint presentation: overview and details of how this new microarchitecture works.
top_tb1.v | 20 | 10 | Top-level testbench #1. Enough is written to start the debug process.
cpu_params.h | 80 | 50 | Include file for all design modules. Additions/changes are made as needed.
disasm.v | 90 | 80 | Used for debugging. Displays RV32I instructions in real time during ModelSim simulations.
kpu_oooe.v | 85 | 60 | Top-level module. Ties all CPU design modules together.
fetch.v | 75 | 75 | Fetches instructions from memory. Still needs logic to handle branches.
decode.v | 75 | 80 | Decodes RV32I instructions. Some system-level instructions may not be implemented; the main objective is to demonstrate the new OoOE method, not to build a complete RISC-V CPU.
microcode.v | 75 | 80 | Logic and data for the ROM/RAM microcode table.
dependency_control.v | 100 | 50 | Determines instruction execution dependencies. Seems to be working well so far.
llrs.v | 100 | 80 | Linked List Reservation Stations (not similar to any known RS). This is a linked-list type of queue that keeps track of all instructions and whether they are ready to start execution. The oldest ready instruction is offered to the appropriate functional unit in the Issue/Execute stage. Seems to be working well.
sll.v | 100 | 95 | Core logic for a singly linked list queue; instantiated in llrs.v. The list is linked from the newest to the oldest instruction.
pool.v | 100 | 60 | Implements the Pool of Functional Units. This working code was formerly inline in kpu_oooe.v but now has its own module.
gpr.v | 90 | 50 | CPU architectural registers and the read/write logic connected to them. Not shown in the diagram above, but commit.v writes to gpr.v (the architectural register set).
rob.v | 95 | 95 | Reorder buffer that queues out-of-order instructions so they can be committed in order.
commit.v | 85 | 80 | Commits/retires instructions in order (multiple per clock when available).
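The llrs.v, sll.v, and rob.v descriptions above combine two ideas: picking the oldest ready instruction from a list linked newest-to-oldest, and retiring completed instructions in program order. The Python model below is a minimal sketch of those two ideas only, not the author's RTL or the new scheduling algorithm itself. The entry fields (`tag`, `fu_type`, `ready`), the helper names, and the commit width are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Entry:
    tag: int                          # program-order sequence number (hypothetical)
    fu_type: str                      # functional-unit type, e.g. "alu", "br", "ls"
    ready: bool = False               # all source operands available
    older: Optional["Entry"] = None   # link toward the next-older entry


class NewestToOldestList:
    """Singly linked list whose head is the newest instruction."""

    def __init__(self) -> None:
        self.head: Optional[Entry] = None

    def push_newest(self, entry: Entry) -> None:
        entry.older = self.head
        self.head = entry

    def oldest_ready(self, fu_type: str) -> Optional[Entry]:
        # Walking newest -> oldest, the last ready match seen is the oldest one.
        found = None
        node = self.head
        while node is not None:
            if node.ready and node.fu_type == fu_type:
                found = node
            node = node.older
        return found

    def remove(self, target: Entry) -> None:
        # Unlink an entry once it has been issued to a functional unit.
        prev, node = None, self.head
        while node is not None and node is not target:
            prev, node = node, node.older
        if node is None:
            return
        if prev is None:
            self.head = node.older
        else:
            prev.older = node.older


def rob_commit(rob: deque, done: Set[int], width: int) -> List[int]:
    """Retire up to `width` completed tags from the ROB head, in program order."""
    retired: List[int] = []
    while rob and len(retired) < width and rob[0] in done:
        retired.append(rob.popleft())
    return retired


if __name__ == "__main__":
    rs = NewestToOldestList()
    entries = [Entry(t, k) for t, k in enumerate(["alu", "ls", "alu", "alu"])]
    for e in entries:
        rs.push_newest(e)
    entries[2].ready = entries[3].ready = True
    pick = rs.oldest_ready("alu")
    print(pick.tag)                       # 2 (tag 0 is older but not ready)
    rs.remove(pick)

    rob = deque(range(4))
    print(rob_commit(rob, {1, 2}, 4))     # [] because tag 0 has not completed
    print(rob_commit(rob, {0, 1, 2}, 4))  # [0, 1, 2]
```

The sequential walk stands in for what hardware would evaluate in parallel across all entries. The ROB stalls at its head until the oldest instruction completes, even when younger ones have finished, which is what keeps commit in order.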

Pool of Functional Units - This is actually a collection of the modules below plus controlling logic. In the current simulations there are 5 alu_functional_units, 1 br_functional_unit, and 5 ls_functional_units in the "pool". Each type processes certain kinds of instructions. The number of ALU, branch, load/store, etc. functional units is set by individual parameters, which allows the design to be varied between simulations to see the effect of different numbers of functional units. In general, many areas of the design have been parameterized, so that different simulations can determine the optimal parameters for a given target CPU.
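To illustrate why the per-type unit counts matter, the sketch below issues ready instructions oldest-first and stops issuing a type once all units of that type are taken in that cycle. It assumes every unit is free at the start of the cycle and accepts one instruction per cycle. `POOL_PARAMS` and `issue_cycle` are hypothetical names; the counts mirror the current simulations described above.

```python
from typing import Dict, List, Tuple

# Hypothetical per-type unit counts, matching the current simulations (5 ALU, 1 branch, 5 L/S).
POOL_PARAMS: Dict[str, int] = {"alu": 5, "br": 1, "ls": 5}


def issue_cycle(ready_oldest_first: List[Tuple[int, str]],
                params: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Issue (tag, fu_type) pairs oldest-first, limited by free units per type.

    ASSUMPTION: every unit is free at the start of the cycle and takes one
    instruction. Returns (issued tags, stalled tags).
    """
    free = dict(params)  # copy so the parameter table is not modified
    issued: List[int] = []
    stalled: List[int] = []
    for tag, fu_type in ready_oldest_first:
        if free.get(fu_type, 0) > 0:
            free[fu_type] -= 1
            issued.append(tag)
        else:
            stalled.append(tag)
    return issued, stalled


if __name__ == "__main__":
    ready = [(0, "br"), (1, "br"), (2, "alu")]
    print(issue_cycle(ready, POOL_PARAMS))                      # ([0, 2], [1])
    print(issue_cycle(ready, {"alu": 5, "br": 2, "ls": 5}))     # ([0, 1, 2], [])
```

With one branch unit, the second branch waits a cycle. Raising the `br` count removes that stall, which is the kind of trade-off the parameterized pool lets simulations explore.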

Module Name | % Written | % Debugged | Description
alu_functional_unit.v | 80 | 75 | ALU logic: logical functions, add, subtract, etc.
br_functional_unit.v | 70 | 40 | Branch Functional Unit. Enough is written to pass instructions on to commit.v so they don't hold up the data-processing instructions currently being debugged.
spf_functional_unit.v | 0 | 0 | Single-Precision Floating-Point Functional Unit. Not currently needed for the RV32I version.
ls_functional_unit.v | 100 | 5 | Load/Store Functional Unit. Resolves the load/store addresses used by ls_dependency_control.v.
sys_functional_unit.v | 50 | 5 | System Functional Unit. Handles system instructions such as the CSR instructions for RV32IM.


L/S Process - This is also a collection of several modules, listed below. It always consists of 1 ls_queue.v, 1 ls_dependency_control.v (which includes 1 ls_cam_cache.v), several lsrs.v, and 1 ls_mem_rw.v. The number of lsrs.v instances is a parameter that can be changed; current simulations use 5 lsrs.v (matching the number of ls_functional_unit.v instances).

Module Name | % Written | % Debugged | Description
ls_process.v | 100 | 5 | Load/Store Processing top module. Connects the ls_queue.v, ls_dependency_control.v, lsrs.v and ls_mem_rw.v modules into one module.
ls_queue.v | 100 | 5 | Load/Store reordering queue. Holds instructions only until they can be output in order (L/S order) to ls_dependency_control.v.
ls_dependency_control.v | 100 | 5 | Load/Store Dependency Control. Similar to dependency_control.v, but uses addresses resolved in ls_functional_unit.v instead of the architectural registers.
ls_cam_cache.v | 100 | 5 | Content Addressable Memory used as a cache to store N load/store addresses, tags, and flags.
lsrs.v | 100 | 5 | Load/Store Reservation Station. Similar to llrs.v in that it temporarily holds pending L/S instructions from ls_dependency_control.v.
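Because load/store dependencies depend on resolved addresses rather than register numbers, a CAM of in-flight addresses is a natural way to find conflicts. The sketch below models such a CAM in software. The ordering rule it applies (a load waits on older stores to the same address; a store waits on any older access to that address) is an assumption about a typical policy, not a description of ls_dependency_control.v. It also assumes exact address matches on aligned accesses. `CamEntry`, `AddrCam`, and their methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CamEntry:
    seq: int        # load/store program-order number (hypothetical)
    addr: int       # resolved address (produced by ls_functional_unit.v in the real design)
    is_store: bool


class AddrCam:
    """Software model of an N-entry CAM of in-flight load/store addresses."""

    def __init__(self, n: int) -> None:
        self.entries: List[Optional[CamEntry]] = [None] * n

    def insert(self, entry: CamEntry) -> bool:
        for i, e in enumerate(self.entries):
            if e is None:
                self.entries[i] = entry
                return True
        return False  # CAM full: the caller would have to stall

    def conflicts(self, seq: int, addr: int, is_store: bool) -> List[int]:
        # Hardware compares every entry in parallel; this loop models that match.
        # ASSUMED rule: a load waits on older stores to the same address, and a
        # store waits on any older load or store to that address.
        hits = [
            e.seq
            for e in self.entries
            if e is not None and e.seq < seq and e.addr == addr
            and (is_store or e.is_store)
        ]
        return sorted(hits)

    def release(self, seq: int) -> None:
        # Free the entry once its access has completed.
        self.entries = [None if e is not None and e.seq == seq else e
                        for e in self.entries]


if __name__ == "__main__":
    cam = AddrCam(5)
    cam.insert(CamEntry(0, 0x100, True))    # older store to 0x100
    cam.insert(CamEntry(1, 0x104, False))   # older load from 0x104
    print(cam.conflicts(2, 0x100, False))   # [0]: load must wait for the store
    print(cam.conflicts(2, 0x104, False))   # []: load after load is independent
```

Loads that hit no older store can proceed out of order, which is where an address-based dependency check gains over strict L/S ordering.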


Caches - Two level-0 caches are in progress: one for instruction fetch and one for load/store memory accesses (D0_Cache).

