
refdesign-sim demo fails on Error: SIGNALS:: Unable to connect top.pcie_bridge.signals-master-tieoff_0.awvalid #18

Open
gricardo99 opened this issue Jun 2, 2023 · 5 comments

Comments

@gricardo99

Hello,
I'm trying to run the refdesign-sim demo, connecting to the Xilinx QEMU VM. The QEMU is running and waiting for connection:

(qemu) device_add remote-port-pci-adaptor,bus=rootport1,id=rp0
Failed to connect to 'machine-x86/qemu-rport-_machine_peripheral_rp0_rp': Connection refused
info: QEMU waiting for connection on: disconnected:unix:machine-x86/qemu-rport-_machine_peripheral_rp0_rp,server=on

The host is:
 > g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0

 > verilator --version
Verilator 4.038 2020-07-11 rev v4.036-114-g0cd4a57ad

SystemC 2.3.3

 > ldd tests/rtl-bridges/pcie/refdesign-sim
        linux-vdso.so.1 (0x00007ffd663da000)
        libsystemc-2.3.3.so => /lib/x86_64-linux-gnu/libsystemc-2.3.3.so (0x00007fc0d02df000)

I get a strange error when running refdesign-sim on the host:

> sudo ./refdesign-sim unix:machine-x86/qemu-rport-_machine_peripheral_rp0_rp 1000
        SystemC 2.3.3-Accellera --- Mar 17 2022 13:55:26
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED

Error: SIGNALS:: Unable to connect top.pcie_bridge.signals-master-tieoff_0.awvalid
In file: ../../test-modules/signals-common.h:97

After some debugging, the error appears to come from these lines in refdesign-sim.cc:

        snprintf(pname, sizeof(pname) - 1, "m_axi_usr_%d_", bi);
        signals_m_tieoff[i].connect(ep_bridge, pname);

The verilated ep_bridge module (Vpcie_ep) seems to return only generic port names for its child objects, which leads to this error.
For example, when I print all child object names seen in the signal_find_child function, they are all of the form:
port_0
port_1
port_2
etc...
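
To make the failure mode concrete, here is a minimal sketch in plain C++ (no SystemC) of an exact-match lookup by name. `make_port_name` and `find_child` are hypothetical helpers written for illustration; they are not the code in signals-common.h. The `awvalid` suffix is taken from the error message, and the assumption that the lookup concatenates prefix and suffix is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: build the full port name a test bench might look for,
// e.g. prefix "m_axi_usr_0_" followed by the suffix "awvalid".
std::string make_port_name(int bridge_index, const std::string &suffix)
{
    char prefix[64];
    // snprintf always NUL-terminates within the size it is given.
    std::snprintf(prefix, sizeof(prefix), "m_axi_usr_%d_", bridge_index);
    return std::string(prefix) + suffix;
}

// Hypothetical stand-in for a child lookup: return the index of the child
// whose name matches exactly, or -1 if no child has that name.
int find_child(const std::vector<std::string> &children, const std::string &name)
{
    for (std::size_t i = 0; i < children.size(); ++i) {
        if (children[i] == name)
            return static_cast<int>(i);
    }
    return -1;
}
```

With children named only `port_0`, `port_1`, `port_2`, a lookup for `make_port_name(0, "awvalid")` returns -1 in this sketch, which would correspond to the "Unable to connect" case.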

Looking at the Vpcie_ep.cpp/h files, the module appears to have the correct SC ports. These should match the refdesign-sim.cc code if the ep_bridge children returned the names of the SC module port variables.
Example: the ports in tests/rtl-bridges/pcie/obj_dir/Vpcie_ep.h, which should match the connections made in refdesign-sim.cc:

    // PORTS
    // The application code writes and reads these signals to
    // propagate new values into/out from the Verilated model.
    sc_in<bool> clk;
    sc_in<bool> resetn;
    sc_out<bool> usr_resetn;
    sc_in<bool> s_axi_pcie_m0_awvalid;
    sc_out<bool> s_axi_pcie_m0_awready;
    sc_in<bool> s_axi_pcie_m0_wvalid;
    sc_out<bool> s_axi_pcie_m0_wready;
    sc_out<bool> s_axi_pcie_m0_bvalid;

Any ideas why the child names in ep_bridge do not match and instead seem to be generic port+number names, such as 'port_0'?

@gricardo99
Author

gricardo99 commented Jun 7, 2023

Just an update. I was able to get past the refdesign-sim errors that I reported in my initial post, but my workaround suggests there's something wrong with my setup or steps, and I'm not sure what exactly.

After playing around with SystemC (apologies, I'm a SystemC noob), I realized that any sc_in/sc_out without an explicit name defaults to the generic names I was seeing (i.e. port_0, port_1, etc.). As a side note, I also moved to SystemC 2.3.2, since that is the version in the instructions and I wanted to rule it out as a cause (I hit the same initial refdesign-sim error with both 2.3.3 and 2.3.2).

Looking at the verilated ep_bridge module (tests/rtl-bridges/pcie/obj_dir/Vpcie_ep.h), I could see that the ports do not have names, and the default SC_CTOR constructor is being used.
I hacked up Vpcie_ep.h to comment out the default SC_CTOR constructor and add a constructor whose initializer list passes each port its name.
E.g.: obj_dir/Vpcie_ep.h

  public:
    // SC_CTOR(Vpcie_ep);
    typedef Vpcie_ep SC_CURRENT_USER_MODULE;
    Vpcie_ep( ::sc_core::sc_module_name ) :
      s_axi_pcie_m0_awvalid("s_axi_pcie_m0_awvalid"),
      s_axi_pcie_m0_awready("s_axi_pcie_m0_awready"),
      s_axi_pcie_m0_wvalid("s_axi_pcie_m0_wvalid"),
      s_axi_pcie_m0_wready("s_axi_pcie_m0_wready"),
      s_axi_pcie_m0_bvalid("s_axi_pcie_m0_bvalid"),
      s_axi_pcie_m0_bready("s_axi_pcie_m0_bready"),
                    ...etc...
      s_axi_usr_5_ruser("s_axi_usr_5_ruser"),
      s_axi_usr_5_wid("s_axi_usr_5_wid")
         { };
    virtual ~Vpcie_ep();
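
The difference between default-constructed and explicitly named ports can be sketched without SystemC. The toy classes below (`ToyPort`, `UnnamedModule`, `NamedModule`) are hypothetical stand-ins, not the real sc_in/sc_out or the verilated module; they only mimic the naming behavior described above, where an unnamed port falls back to a generated `port_<n>` name.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a port object; NOT the real sc_in/sc_out.
class ToyPort {
public:
    // Default construction: fall back to a generated name "port_<n>".
    ToyPort() : name_("port_" + std::to_string(next_id()++)) {}
    // Explicit construction: use the caller-supplied name.
    explicit ToyPort(const char *name) : name_(name) {}

    const std::string &name() const { return name_; }
    static void reset_counter() { next_id() = 0; }

private:
    // Shared counter used to generate fallback names.
    static int &next_id() { static int id = 0; return id; }
    std::string name_;
};

// Members are default-constructed, so their names are generic.
struct UnnamedModule {
    ToyPort awvalid;
    ToyPort awready;
};

// Members are named in the constructor's initializer list,
// in the same spirit as the workaround above.
struct NamedModule {
    ToyPort awvalid;
    ToyPort awready;
    NamedModule()
        : awvalid("s_axi_pcie_m0_awvalid"),
          awready("s_axi_pcie_m0_awready") {}
};
```

In this sketch an `UnnamedModule` exposes only generated names, while a `NamedModule` exposes the member names, which is what a name-based connect would need.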

Unfortunately re-running make after obj_dir/Vpcie_ep.h source code changes was not possible, since this rebuilds the verilated source code Vpcie_ep.h/.cc, and clobbers any changes.

Thus, I manually rebuilt refdesign-sim by running each relevant compile/link step separately, to pick up my manual changes to the verilated source code:

  1. recompiling: obj_dir/Vpcie_ep.o

  2. relinking all the obj_dir/Vpcie_ep*.o files into the static lib: Vpcie_ep__ALL.a

  3. recompiling refdesign-sim.o

  4. recompiling refdesign-sim

Luckily, make prints each of the compile/link steps, so I could rerun the steps above after modifying the verilated source code.

After this, I'm able to connect refdesign-sim to the Qemu VM:

> sudo ./refdesign-sim unix:/nis/asic/us_dump2/ricardga/temp/qemu_playground/machine-x86/qemu-rport-_machine_peripheral_rp0_rp 1000

        SystemC 2.3.2-Accellera --- Jun  2 2023 16:03:03
        Copyright (c) 1996-2017 by all Contributors,
        ALL RIGHTS RESERVED

Info: (I702) default timescale unit used for tracing: 1 ps (./refdesign-sim.vcd)
connect to /nis/asic/us_dump2/ricardga/temp/qemu_playground/machine-x86/qemu-rport-_machine_peripheral_rp0_rp

And on the Qemu, after the device_add commands, I can see this device:

01:00.0 Serial controller: Xilinx Corporation Device d004 (rev 12) (prog-if 01 [16450])
	Subsystem: Red Hat, Inc. Device 1100
	Physical Slot: 0
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 23
	Region 0: Memory at fe800000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [40] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 4
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, TPHComp-, ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Kernel driver in use: vfio-pci

Note that this also shows "Kernel driver in use: vfio-pci", after the refdesign-sim steps on the Qemu guest:

$ sudo modprobe vfio-pci nointxmask=1
$ sudo sh -c 'echo 10ee d004 > /sys/bus/pci/drivers/vfio-pci/new_id'

However, when I go to run the test on the Qemu guest, the test hangs:

> ls -l /sys/bus/pci/devices/0000\:01\:00.0/iommu_group
lrwxrwxrwx 1 root root 0 Jun  7 20:35 /sys/bus/pci/devices/0000:01:00.0/iommu_group -> ../../../../kernel/iommu_groups/3
ubuntu@ubuntu:~/Downloads/github/libsystemctlm-soc/tests/rtl-bridges/pcie$ sudo ./test-pcie-ep-master-vfio 0000:01:00.0 3 0

        SystemC 2.3.2-Accellera --- Jun  2 2023 18:34:05
        Copyright (c) 1996-2017 by all Contributors,
        ALL RIGHTS RESERVED
Device supports 9 regions, 5 irqs
mapped 0 at 0x7f9ec018c000

Info: (I702) default timescale unit used for t

I get no further output after this. I've tried a few different times, including after adding ssh port forwarding to the qemu guest and running from an ssh/terminal (in case the stdio/serial port was somehow causing some issue).

I do see the refdesign-sim.vcd file increasing in size after launching the test. The QEMU guest just hangs and is non-responsive at this point. I can, however, Ctrl-C refdesign-sim, and that seems to crash/kill the QEMU VM.
After that I see:

(qemu) qemu-system-x86_64: /machine/peripheral/rp0/rp: Disconnected clk=144537801101 ns

And qemu VM dies.

So I'm hoping someone may have some clues or insights into what is going wrong for me here, or some debug tips. As I mentioned, the fact that I had to hack a workaround for the initial refdesign-sim error suggests to me that I must have something set up or configured wrong, or I'm missing a step.

Thanks!

@kevinyuan

Hi @gricardo99 ,

I got the same error. Have you had a chance to fix it or work around it?

Best regards :)

Kevin

@gricardo99
Author

gricardo99 commented Nov 7, 2023

Hi @kevinyuan
What error did you hit? My initial compile error, or the subsequent hang (after my compile error workaround)?
Unfortunately no, I never resolved these issues with the demo. I had to move on to other things, so I haven't looked at this in a while. It would be great if someone from the project, maybe @edgarigl or @franciscoIglesias could take a look? Hopefully it's something very simple with the setup.

@kevinyuan

Hi @gricardo99 ,

I found that this error happens with the Verilator from the Ubuntu repo, i.e. installed via "apt install ...".

I then compiled SystemC and Verilator from source, and the connection error disappeared. I double-checked by looking at the debug messages: there are no port_0/1/2 names anymore.

However, another error then appeared, involving a dynamic_cast<> that tries to cast an sc_object to an sc_out.
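
The thread does not say which cast failed or why, so the following is only a general C++ illustration, not a diagnosis. It uses toy classes (`ToyObject`, `ToyIn`, `ToyOut`, all hypothetical, not the SystemC types) to show that a `dynamic_cast` to a class template instance yields a null pointer when the object is a different direction or is parameterized on a different data type.

```cpp
#include <cassert>

// Toy hierarchy; NOT the real SystemC classes.
struct ToyObject {
    // A virtual member makes the base polymorphic, which dynamic_cast requires.
    virtual ~ToyObject() = default;
};

template <typename T>
struct ToyIn : ToyObject {};

template <typename T>
struct ToyOut : ToyObject {};

// Returns true only if obj's dynamic type is ToyOut<T> (or derives from it).
template <typename T>
bool is_out_of(ToyObject *obj)
{
    return dynamic_cast<ToyOut<T> *>(obj) != nullptr;
}
```

Here `is_out_of<bool>` succeeds for a `ToyOut<bool>` but fails for a `ToyOut<unsigned>` or a `ToyIn<bool>`, so printing the dynamic type of the object being cast may help narrow down such an error.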

@kevinyuan

Hi @gricardo99 ,

I hit this error initially:

Error: SIGNALS:: Unable to connect top.pcie_bridge.signals-master-tieoff_0.awvalid
