Genvid Forum

Cannot start game job


#1

Hi,

When I click start on my game job from the cluster UI, it now just sits there spinning in the pending state. This only happens on the AWS cloud; it’s fine locally.

The AWS/AMI machine is showing the following, which I assume is relevant:

[screenshot not preserved]

(this is from the Python script that runs automatically at startup)

Do you have any suggestions for how I can debug what’s going on here, please? I have already tried a clean-config/load-config-sdk cycle, to no avail.

Thanks,

Adrian


#2

Hi Adrian,

I’ve emailed you so we can schedule a call.

Thanks,
Sophie


#3

FYI, if anyone stumbles across this: it seems to be caused by the secret ID in the Nomad settings becoming wrong on the AWS Windows machine running the game after a reboot.

The workaround I used was to change the UUID of the Nomad client (something I learnt from reading up elsewhere on issues with Nomad).

The UUID can be found in Z:\services\nomad\data\client\client_id. Even a single-digit change to the UUID is enough to get it working again, as it tricks Nomad into thinking it is a different node.
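If you’d rather not hand-edit the file, here is a minimal Python sketch of the same tweak. It assumes the client_id file holds a single UUID string and that the Nomad client service is stopped while you edit it. The helper name bump_client_id and the .bak backup are hypothetical conveniences, not part of Genvid or Nomad.

```python
import shutil
import uuid
from pathlib import Path

# Path taken from the post above; adjust if your install differs.
CLIENT_ID_PATH = Path(r"Z:\services\nomad\data\client\client_id")


def bump_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Change the last hex digit of the stored UUID so Nomad sees a 'new' node.

    Hypothetical helper. Assumes the file contains one UUID string and that
    the Nomad client is stopped while the file is edited.
    """
    raw = path.read_text()
    old = raw.strip()
    uuid.UUID(old)  # raises ValueError if the file is not a valid UUID
    # Keep a backup next to the original so the change can be reverted.
    shutil.copyfile(path, path.with_name(path.name + ".bak"))
    last = old[-1]
    # Rotate the final hex digit (0->1, ..., f->0). The last UUID group holds
    # no version/variant bits, so the result is still a valid UUID.
    new_last = format((int(last, 16) + 1) % 16, "x")
    if last.isupper():
        new_last = new_last.upper()
    new = old[:-1] + new_last
    # Preserve any surrounding whitespace/newline from the original file.
    path.write_text(raw.replace(old, new, 1))
    return new
```

Call bump_client_id() with the Nomad client stopped, then start it again; restoring client_id.bak undoes the change.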

Before hacking around with it, you can diagnose whether this is your issue by looking at the nomad.err/nomad.out logs in z:\services\nomad; there will be a bunch of complaints about the secret ID being wrong.
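A quick way to do that check is to scan both logs for lines mentioning the secret ID. This is a sketch only: the exact Nomad error wording is an assumption and may vary between versions, so it does a case-insensitive substring match on "secret" rather than matching one specific message. The function name is hypothetical.

```python
from pathlib import Path

# Log directory from the post above; adjust if your install differs.
NOMAD_DIR = Path(r"Z:\services\nomad")
LOG_NAMES = ("nomad.err", "nomad.out")


def find_secret_id_complaints(log_dir: Path = NOMAD_DIR, needle: str = "secret") -> list[str]:
    """Return log lines that mention the needle, prefixed with the log name.

    Assumption: matching a loose substring is safer than matching one exact
    Nomad error message, since the wording can change between versions.
    """
    hits = []
    for name in LOG_NAMES:
        log = log_dir / name
        if not log.exists():
            continue  # skip logs that have not been created yet
        # errors="replace" keeps the scan going if the log has odd bytes.
        with log.open(encoding="utf-8", errors="replace") as f:
            for line in f:
                if needle.lower() in line.lower():
                    hits.append(f"{name}: {line.rstrip()}")
    return hits
```

If this returns a long list of secret-ID complaints after a reboot, you are probably hitting the same problem.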

I’m almost certain there’s a cleaner workaround that fetches the “correct” secret ID from somewhere, but that’s beyond me …


#4

Sorry for my late answer. The client_id is based on a fingerprint of the machine, and the way that fingerprint is computed seems to change after the Windows machine reboots. Since the machine doesn’t change its name, when its client_id no longer matches, Nomad refuses to let it join its network. A quick workaround is to do a nomad gc, but I think the current version of Nomad no longer has this problem. Sadly, it has other problems with Windows and IPv6 interfaces, but we are working on it.
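For anyone who wants to script that garbage collection: Nomad’s HTTP API has a force-GC endpoint, PUT /v1/system/gc (the CLI equivalent in recent releases is nomad system gc). The sketch below assumes that this system GC is what “nomad gc” refers to above, that a Nomad agent is reachable at the default address http://127.0.0.1:4646, and that TLS is not in use; an ACL token is optional and sent via the X-Nomad-Token header. The function names are hypothetical.

```python
import urllib.request
from typing import Optional

# Assumption: default Nomad HTTP address, no TLS configured.
NOMAD_ADDR = "http://127.0.0.1:4646"


def build_gc_request(addr: str = NOMAD_ADDR, token: Optional[str] = None) -> urllib.request.Request:
    """Build a PUT request for Nomad's force-GC endpoint (/v1/system/gc)."""
    headers = {"X-Nomad-Token": token} if token else {}
    return urllib.request.Request(
        f"{addr.rstrip('/')}/v1/system/gc", headers=headers, method="PUT"
    )


def force_gc(addr: str = NOMAD_ADDR, token: Optional[str] = None, timeout: float = 10.0) -> int:
    """Send the GC request and return the HTTP status code (200 on success)."""
    with urllib.request.urlopen(build_gc_request(addr, token), timeout=timeout) as resp:
        return resp.status
```

Building the request separately from sending it makes it easy to inspect the URL and headers before touching a live cluster.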