Genvid Forum

Reconnecting bastion to AWS after reboot


#1

Hi,

Following a reboot of a development PC with bastion setup on it, it is necessary to re-run genvid-bastion install (I think this is mentioned in another thread).

What steps are necessary to point this back at an existing Cloud cluster setup on AWS (which was setup using the genvid-bastion stuff on an earlier run), so that the cluster machines can be destroyed if necessary?

I followed the usual steps described under “The Cloud Environment” > “Create a Cluster” and set the terraform settings to be the same as before; however, pressing the Destroy button on the Terraform page says:

Destroy complete! Resources: 0 destroyed

That suggests it isn’t connected to the existing cluster in AWS?


#2

Hi Adrian,

Running the genvid-bastion install command is necessary every time the dev PC is rebooted. We have a solution for this in the development pipeline, but it is not yet available. In the meantime, this command is still necessary.
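
For reference, the reinstall after a reboot is the same command used for the initial setup (if your original setup passed extra arguments, repeat them; this sketch assumes the defaults):

genvid-bastion install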

If you created a new cluster on AWS but did not upload anything into it, that is why you got the “Destroy complete! Resources: 0 destroyed” message: there were no resources to destroy, so the (empty) cluster was still successfully destroyed.

Thanks,
Sophie


#3

Should that destroy (terminate) the machines on AWS? They were still up and running afterwards.

Thanks,

Adrian


#4

Hi Adrian,

It’s possible the status was not updated after the Destroy was initiated. I’ve never experienced this myself, but I’ll investigate further on my side to validate this behavior (it could be a minor bug).

So far, I have observed two behaviors:

Test 1:
Create Cluster > Import Module > Settings modified/saved > Terraform = Apply
Result: Destroy complete! Resources: 0 destroyed

Test 2:
Create Cluster > Import Module > Settings modified/saved > Terraform (did not click Apply)
Result: Destroy complete! Resources: 0 destroyed

Is it possible that you did not click Apply before you clicked Destroy? This seems to explain why there were 0 resources destroyed.

In both cases, I still had the status DOWN for each cluster I tested.

The best way to confirm that the cluster was successfully destroyed and removed is to run:

genvid-clusters cluster-delete clustername

An additional inspection on AWS confirmed that no traces remained after running this command.
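
If you want to double-check on the AWS side yourself, a query along these lines can list any instances still associated with the cluster (the tag filter here is an assumption; adjust it to however your resources are actually tagged):

# List instances whose Name tag starts with the cluster name (tag is assumed)
aws ec2 describe-instances --filters "Name=tag:Name,Values=mycluster*" --query "Reservations[].Instances[].[InstanceId,State.Name]" --output table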

If this behavior causes you further issues, I will be more than happy to assist with additional debugging to correct it. Otherwise, I recommend upgrading to our 1.13.0 version, which is scheduled to be released on Monday.

Thanks,
Sophie


#5

Hi Sophie,

I definitely did click Apply, as that is what created the servers on AWS for me. It was after clicking Apply that my (local dev) machine had to be rebooted; I then installed Bastion again and clicked Destroy.

I will indeed try again after the new release.

Adrian


#6

Still having issues with this after upgrading to 1.13 …

I have an existing cluster I want to reconnect to. My AWS machine has been restarted (NOT terminated and not reinstalled from an image), which has generated a new IP address.

When I run:

genvid-sdk -c mycluster upload-images-sdk --update-config

I get the error:

"Remote IP not found for for cluster mycluster. You may need to do apply on the cluster"

Where can that remote IP be set?

Does that refer to a terraform apply? I don’t believe I want to do that again, as I don’t want to recreate all the Amazon cluster machines, just reconnect to the existing ones after a reboot.

Thanks,

Adrian


#7

Hi Adrian,

You need to refresh the terraform information for your cluster, either by using the Refresh button in the Cluster Terraform interface or by running the genvid-clusters terraform-refresh command, like this:

genvid-clusters terraform-refresh -c mycluster

Note that this is not a well-supported scenario, so please report any problems you encounter. We are planning to improve this in the next release or shortly after.
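
A quick way to check whether the refresh actually recovered the state is to inspect the terraform outputs again afterwards:

genvid-clusters terraform-output -c mycluster

If the refresh worked, outputs such as server_public_ips should be populated again.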


#8

The refresh reports back with a private key (whether run from the command line or from the UI).

However, the errors I mentioned above about the remote IP still occur. Same error if I try a “genvid-sdk clean” command too.

Adrian


#9

Could you send me the output of

genvid-clusters terraform-output -c mycluster

You can mask all sensitive values in it (like public IPs and private keys); I’m only interested in what it was able to generate.


#10

{
  "cluster": {
    "type": "string",
    "value": "mycluster"
  },
  "private_key_pem": {
    "type": "string",
    "value": "-----BEGIN RSA PRIVATE KEY-----\n… key values go here …\n-----END RSA PRIVATE KEY-----\n"
  },
  "region": {
    "type": "string",
    "value": "us-east-1"
  }
}


#11

That looks like a completely empty cluster.

Could you send me the output of:

genvid-clusters terraform-plan -c mycluster

It should not contain private information, just what terraform will try to create as resources.

genvid-clusters terraform-plan -c mycluster -d

This could also be useful: it will output a destruction plan without actually destroying anything, which will help me figure out which resources terraform is actually tracking.

Thanks,
Fabien


#12

Hi Fabien,

I’ve PM’ed you those details.

After a reboot and installing bastion again, the terraform refresh now returns just the cluster id and region, and no longer the private key (which it did return yesterday).

A terraform plan apply now says it needs to add everything, and a terraform plan destroy says no changes are required. So it seems to have lost all links to the cluster machines that definitely exist in AWS (I have confirmed they are there, that I’m using the correct prefix, and that the login credentials are correct).

Looking at the cluster-ui, it is unable to connect, complaining that ‘server_public_ips’ is not available from the terraform output.

Thanks,

Adrian


#13

Hi Adrian,

That’s bad news, sadly. By default, the terraform state is saved in the Consul backend, so that probably means your Consul data got erased during the reboot (by default, it lives in %USERPROFILE%/.genvid/consul). There is a way to reimport the state manually, but it is quite complex: it requires querying AWS to find the resource IDs and then manually running terraform against the terraform workdir for each of them. I can guide you through it, but it is quite a runaround. I’m not sure I understand how that happened, however.
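
For the record, a rough sketch of what that manual reimport looks like, run from the cluster’s terraform workdir. The tag filter and the resource address are assumptions for illustration; the real addresses depend on the modules your cluster uses, and the instance ID is a placeholder:

# Find the instance IDs on AWS (tag filter is an assumption)
aws ec2 describe-instances --filters "Name=tag:Name,Values=mycluster*" --query "Reservations[].Instances[].InstanceId"

# Reimport each resource into the terraform state
# (aws_instance.server is a hypothetical resource address; the ID is a placeholder)
terraform import aws_instance.server i-0123456789abcdef0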

I’m also preparing a post on how to clean up your instances in AWS. I will try to get it out today.


#14

Understood. For now, I’ve recreated the cluster again from the beginning (the good news is that this now works). It is quite time consuming, though, as it has to load a new instance from the AMI again rather than just reusing the existing machine.

In terms of cleaning up the old disconnected instances, the process I followed was to run the terraform apply, see which resources it complained already existed, and then manually delete them from the AWS console or command line.
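
For anyone following the same route, the command-line side of that cleanup can look roughly like this (the tag filter is an assumption and the instance ID is a placeholder; confirm what you are deleting before terminating anything):

# Find leftover instances for the old cluster prefix (tag is assumed)
aws ec2 describe-instances --filters "Name=tag:Name,Values=oldcluster*" --query "Reservations[].Instances[].[InstanceId,State.Name]" --output table

# Terminate them once identified
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0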


#15

Hi Adrian,

We will work on making the process more robust in future releases.

Thanks!
Sophie