This fixes the Vagrant-based sandbox, which was not working. It was particularly
annoying to track down because `setup.sh` did not run with `set -x`, yet stderr
contained what looked like xtrace output. That xtrace output actually
came from the `generate_certificates` container:
```
provisioner: 2021/04/26 21:22:32 [INFO] signed certificate with serial number 142120228981443865252746731124927082232998754394
provisioner: + cat
provisioner: server.pem
provisioner: ca.pem
provisioner: + cmp
provisioner: -s
provisioner: bundle.pem.tmp
provisioner: bundle.pem
provisioner: + mv
provisioner: bundle.pem.tmp
provisioner: bundle.pem
provisioner: Error: No such object:
==> provisioner: Clearing any previously set forwarded ports...
==> provisioner: Removing domain...
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
```
I ended up doubting the `if ! cmp` blocks until I added `set -euxo pipefail`, at
which point the issue was pretty obviously in docker-compose land.
```
$ vagrant destroy -f; vagrant up provisioner
==> worker: Domain is not created. Please run `vagrant up` first.
==> provisioner: Domain is not created. Please run `vagrant up` first.
Bringing machine 'provisioner' up with 'libvirt' provider...
==> provisioner: Checking if box 'tinkerbelloss/sandbox-ubuntu1804' version '0.1.0' is up to date...
==> provisioner: Creating image (snapshot of base box volume).
==> provisioner: Creating domain with the following settings...
...
provisioner: 2021/04/27 18:20:13 [INFO] signed certificate with serial number 138080403356863347716407921665793913032297783787
provisioner: + cat server.pem ca.pem
provisioner: + cmp -s bundle.pem.tmp bundle.pem
provisioner: + mv bundle.pem.tmp bundle.pem
provisioner: + local certs_dir=/etc/docker/certs.d/192.168.1.1
provisioner: + cmp --quiet /vagrant/deploy/state/certs/ca.pem /vagrant/deploy/state/webroot/workflow/ca.pem
provisioner: + cp /vagrant/deploy/state/certs/ca.pem /vagrant/deploy/state/webroot/workflow/ca.pem
provisioner: + cmp --quiet /vagrant/deploy/state/certs/ca.pem /etc/docker/certs.d/192.168.1.1/tinkerbell.crt
provisioner: + [[ -d /etc/docker/certs.d/192.168.1.1/ ]]
provisioner: + cp /vagrant/deploy/state/certs/ca.pem /etc/docker/certs.d/192.168.1.1/tinkerbell.crt
provisioner: + setup_docker_registry
provisioner: + local registry_images=/vagrant/deploy/state/registry
provisioner: + [[ -d /vagrant/deploy/state/registry ]]
provisioner: + mkdir -p /vagrant/deploy/state/registry
provisioner: + start_registry
provisioner: + docker-compose -f /vagrant/deploy/docker-compose.yml up --build -d registry
provisioner: + check_container_status registry
provisioner: + local container_name=registry
provisioner: + local container_id
provisioner: ++ docker-compose -f /vagrant/deploy/docker-compose.yml ps -q registry
provisioner: + container_id=
provisioner: + local start_moment
provisioner: + local current_status
provisioner: ++ docker inspect '' --format '{{ .State.StartedAt }}'
provisioner: Error: No such object:
provisioner: + start_moment=
provisioner: + finish
provisioner: + rm -rf /tmp/tmp.ve3XJ7qtgA
```
Notice that `container_id` is empty. This turns out to be because
`docker-compose` is an empty file!
```
vagrant@provisioner:/vagrant/deploy$ docker-compose up --build registry
vagrant@provisioner:/vagrant/deploy$ which docker-compose
/usr/local/bin/docker-compose
vagrant@provisioner:/vagrant/deploy$ docker-compose -h
vagrant@provisioner:/vagrant/deploy$ file /usr/local/bin/docker-compose
/usr/local/bin/docker-compose: empty
```
So with the following test patch:
```diff
diff --git a/deploy/vagrant/scripts/tinkerbell.sh b/deploy/vagrant/scripts/tinkerbell.sh
index 915f27f..dcb379c 100644
--- a/deploy/vagrant/scripts/tinkerbell.sh
+++ b/deploy/vagrant/scripts/tinkerbell.sh
@@ -34,6 +34,14 @@ setup_nat() (
main() (
export DEBIAN_FRONTEND=noninteractive
+ local name=docker-compose-$(uname -s)-$(uname -m)
+ local url=https://github.com/docker/compose/releases/download/1.26.0/$name
+ curl -fsSLO "$url"
+ curl -fsSLO "$url.sha256"
+ sha256sum -c <"$name.sha256"
+ chmod +x "$name"
+ sudo mv "$name" /usr/local/bin/docker-compose
+
if ! [[ -f ./.env ]]; then
./generate-env.sh eth1 >.env
fi
```
We can try again and we're back to a working state:
```
$ vagrant destroy -f; vagrant up provisioner
==> worker: Domain is not created. Please run `vagrant up` first.
==> provisioner: Domain is not created. Please run `vagrant up` first.
Bringing machine 'provisioner' up with 'libvirt' provider...
==> provisioner: Checking if box 'tinkerbelloss/sandbox-ubuntu1804' version '0.1.0' is up to date...
==> provisioner: Creating image (snapshot of base box volume).
==> provisioner: Creating domain with the following settings...
...
provisioner: + setup_docker_registry
provisioner: + local registry_images=/vagrant/deploy/state/registry
provisioner: + [[ -d /vagrant/deploy/state/registry ]]
provisioner: + mkdir -p /vagrant/deploy/state/registry
provisioner: + start_registry
provisioner: + docker-compose -f /vagrant/deploy/docker-compose.yml up --build -d registry
provisioner: Creating network "deploy_default" with the default driver
provisioner: Creating volume "deploy_postgres_data" with default driver
provisioner: Building registry
provisioner: Step 1/7 : FROM registry:2.7.1
...
provisioner: Successfully tagged deploy_registry:latest
provisioner: Creating deploy_registry_1 ...
Creating deploy_registry_1 ... done
provisioner: + check_container_status registry
provisioner: + local container_name=registry
provisioner: + local container_id
provisioner: ++ docker-compose -f /vagrant/deploy/docker-compose.yml ps -q registry
provisioner: + container_id=2e3d9557fd4c0d7f7e1c091b957a0033d23ebb93f6c8e5cdfeb8947b2812845c
...
provisioner: + sudo -iu vagrant docker login --username=admin --password-stdin 192.168.1.1
provisioner: WARNING! Your password will be stored unencrypted in /home/vagrant/.docker/config.json.
provisioner: Configure a credential helper to remove this warning. See
provisioner: https://docs.docker.com/engine/reference/commandline/login/#credentials-store
provisioner: Login Succeeded
provisioner: + set +x
provisioner: NEXT: 1. Enter /vagrant/deploy and run: source ../.env; docker-compose up -d
provisioner: 2. Try executing your fist workflow.
provisioner: Follow the steps described in https://tinkerbell.org/examples/hello-world/ to say 'Hello World!' with a workflow.
```
:toot:
Except that my results were not due to the way docker-compose is being installed
at all. After still running into this issue with a box built using the new
install method, I was still seeing empty docker-compose files. I ran a bunch of
experiments to figure out what was going on. The issue is strictly
in vagrant-libvirt, since vagrant-virtualbox works fine. It turns out data isn't
being flushed back to disk at shutdown. Either calling `sync` or writing multiple
copies of the binary to the filesystem (3x at least) ended up working. Then I was informed
of a known vagrant-libvirt issue that matches this behavior: https://github.com/vagrant-libvirt/vagrant-libvirt/issues/1013!
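A minimal sketch of the workaround (the path and payload below are stand-ins for illustration, not the real provisioning code):

```shell
#!/bin/sh
# Hypothetical provisioning snippet: write the binary, then force dirty
# pages to disk so vagrant-libvirt's shutdown doesn't lose the data.
payload=/tmp/docker-compose-demo   # stand-in for /usr/local/bin/docker-compose
printf 'fake compose binary\n' > "$payload"
sync   # flush the page cache to the backing store before the box is packaged
# Sanity check mirroring the debugging session above: the file must not be empty.
[ -s "$payload" ] && echo "binary is non-empty after sync"
```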
Fixes #59
Signed-off-by: Manuel Mendez <mmendez@equinix.com>
The tinkerbell.sh script does some more work after
calling setup.sh and runs with `set -x` enabled, so the whats_next message
is likely to be missed. So now save it for later reading and print it as the last
thing done.
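A minimal sketch of the deferral (the message text is from the real output above; the temp-file path and structure are assumptions, not the script's actual code):

```shell
#!/bin/sh
# Buffer the final instructions instead of printing them mid-run,
# where xtrace noise would bury them.
msg_file=/tmp/whats_next.txt   # assumed path, not the script's real one
whats_next() {
  echo "NEXT: 1. Enter /vagrant/deploy and run: source ../.env; docker-compose up -d"
}
whats_next > "$msg_file"
# ... remaining provisioning steps run here with set -x enabled ...
set +x
cat "$msg_file"   # printed last, so the user actually sees it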
Signed-off-by: Manuel Mendez <mmendez@equinix.com>
pipefail for more safety and xtrace for better debuggability.
The missing xtrace here is likely what let the docker-compose
issue go unfixed for so long: the last bit of output was
from the gencerts container and did not make any sense (because it
wasn't the issue :D ).
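For illustration, the difference `pipefail` makes (a generic bash demo, not code from the repo):

```shell
#!/bin/bash
# Without pipefail a pipeline's status is the status of its LAST command,
# so the failing producer here is silently swallowed:
false | cat
echo "without pipefail: $?"    # prints 0

# With pipefail the pipeline fails if ANY stage fails:
set -o pipefail
false | cat
echo "with pipefail: $?"       # prints 1
```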
Signed-off-by: Manuel Mendez <mmendez@equinix.com>
Both the `[[ ]]` and `(( ))` bashisms are better than their POSIX sh
alternatives, since they are builtins and don't suffer from quoting
or number-of-args issues.
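A quick illustration of the quoting issue (generic bash, not repo code):

```shell
#!/bin/bash
unset maybe
# POSIX test needs careful quoting: [ $maybe = x ] expands to [ = x ]
# and fails with "unary operator expected" when maybe is unset or empty.
# [[ ]] is a parser-level construct, so no word splitting occurs:
if [[ $maybe = x ]]; then echo match; else echo "no match, no quoting needed"; fi

# (( )) compares numerically without needing -eq/-lt flags:
count=3
if (( count > 2 )); then echo "count exceeds 2"; fi
```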
Signed-off-by: Manuel Mendez <mmendez@equinix.com>
## Description
This is a follow-up to #76 which introduced a failure:
```
provisioner: ./setup.sh: line 117: NAT_INTERFACE: unbound variable
```
## Why is this needed
Unbreak `setup.sh` when used by Vagrant
Fixes #77
## How Has This Been Tested?
I used the following simple test case. It works now that the variable is declared first, but still breaks as reported without the fix.
```bash
#!/bin/bash
set -eu
NAT_INTERFACE=""
if [ -r .nat_interface ]; then
NAT_INTERFACE=$(cat .nat_interface)
fi
if [ -n "$NAT_INTERFACE" ] && ip addr show "$NAT_INTERFACE" &>/dev/null; then
echo "$NAT_INTERFACE"
fi
```
## How are existing users impacted? What migration steps/scripts do we need?
Vagrant users are currently broken as reported in the community Slack.
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
## Description
The NAT setup commands assume that the interface is named `eth1`, when, per the [documentation](https://github.com/tinkerbell/tinkerbell-docs/blame/master/docs/setup/equinix-metal-terraform.md#L118), it is named `enp1s0f1`. This commit fixes the NAT setup commands accordingly.
## Why is this needed
NAT doesn't work by default on Equinix Metal when following the documentation
## How Has This Been Tested?
- [x] Tested with Terraform in Equinix Metal
## How are existing users impacted? What migration steps/scripts do we need?
Existing sandboxes (that are broken) should either be rebuilt, or can run the commands manually to enable NAT
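For reference, a hedged sketch of the kind of manual commands involved (the uplink name `enp1s0f0` is my assumption; `enp1s0f1` comes from the linked documentation — verify both with `ip addr` first). The script only prints the commands so they can be reviewed before running them as root:

```shell
#!/bin/sh
# Print (not execute) typical NAT enablement commands for review.
cat <<'EOF'
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -o enp1s0f0 -j MASQUERADE
iptables -A FORWARD -i enp1s0f1 -o enp1s0f0 -j ACCEPT
iptables -A FORWARD -i enp1s0f0 -o enp1s0f1 -m state --state ESTABLISHED,RELATED -j ACCEPT
EOF
```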
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
Signed-off-by: Nahum Shalman <nshalman@equinix.com>
## Description
Add `jq` to the nix-shell environment
## Why is this needed
There are bits of documentation that use the sandbox and reference using `jq` from the command line.
This makes them work nicely.
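For example, the sort of pipeline the docs rely on (the JSON here is illustrative, not actual sandbox output):

```shell
#!/bin/sh
# With jq on PATH inside nix-shell, documented one-liners like this work:
echo '{"network":{"interfaces":[{"name":"enp1s0f1"}]}}' \
  | jq -r '.network.interfaces[0].name'
```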
## How Has This Been Tested?
On NixOS running `nix-shell` now has `jq` in the PATH.
## How are existing users impacted? What migration steps/scripts do we need?
N/A
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
## Description
This change causes Nix to pull in Terraform v0.14 rather than v0.12 when users run `nix-shell`
## Why is this needed
Without this change I get this error on both Mac and NixOS:
```
[nix-shell:~/sandbox/deploy/terraform]$ terraform init --upgrade
Warning: Provider source not supported in Terraform v0.12
on main.tf line 4, in terraform:
4: metal = {
5: source = "equinix/metal"
6: version = "1.0.0"
7: }
A source was declared for provider metal. Terraform v0.12 does not support the
provider source attribute. It will be ignored.
(and 2 more similar warnings elsewhere)
Error: Unsupported Terraform Core version
This configuration does not support Terraform version 0.12.30. To proceed,
either choose another supported Terraform version or update the root module's
version constraint. Version constraints are normally set for good reason, so
updating the constraint may lead to other errors or unexpected behavior.
```
## How Has This Been Tested?
I used this on both a Mac and a NixOS machine using `nix-shell`
## How are existing users impacted? What migration steps/scripts do we need?
Running `nix-shell` on existing checkouts will now pull down a newer version of Terraform.
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
As I explained in
https://github.com/tinkerbell/sandbox/pull/66#issuecomment-803009169, the
current OSIE on master broke how tink-worker gets installed in sandbox.
Due to a series of bad habits, the PR got merged even though e2e tests were
broken, leaving sandbox/master in a non-working state.
This commit reverts OSIE back to a fully operational version.
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
## Description
Allows for deploying the vagrant/libvirt setup without NAT and with multiple workers, which enables testing with cluster-api-provider-tink
## Why is this needed
Helps with testing CAPT
## How Has This Been Tested?
Currently testing at the moment, but all testing will consist of manual testing with vagrant/libvirt
## How are existing users impacted? What migration steps/scripts do we need?
This could affect existing vagrant/libvirt users if they have an existing worker running when they update, not sure if there is a good way to avoid that, though.
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
## Description
Using a custom endpoint instead of the default endpoint that `hegel` serves
## Why is this needed
The reason is that, while using the sandbox in combination with tinkerbell's example [workflows](https://github.com/tinkerbell/workflows), the [functions.sh](https://github.com/tinkerbell/workflows/blob/master/ubuntu_18_04/00-base/functions.sh) script will fail due to a lack of information in the metadata retrieved from `hegel`. The default endpoint filters out needed metadata such as `plan_slug`. This PR removes that filtration criteria.
Also, I think we are safe to provide this info (the full hardware spec) to the worker in the Sandbox setup, as this is mainly used as an example setup, not a production one.
Fixes: #64
## How Has This Been Tested?
Yes, it was tested locally by setting the env var in the docker-compose.yml file and patching it into the sandbox setup.
## How are existing users impacted? What migration steps/scripts do we need?
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
Apparently the idea of prefixing a package with an underscore is not as
smart as I thought. Yes, `go test` does not run it by default when you
run `go test ./...`, but other commands like `go mod tidy` do not
work consistently either.
Nothing changes in practice. By default only unit tests run. Setting the
new environment variable `TEST_WITH_VAGRANT` includes the test that
uses Vagrant.
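The gating works like the following shell sketch (the Go test checks the variable; treating any non-empty value as "on" is my assumption — check the test source for the exact expectation):

```shell
#!/bin/sh
# Illustration of the opt-in pattern the test suite now uses.
if [ -z "${TEST_WITH_VAGRANT:-}" ]; then
  echo "skipping vagrant-backed test (set TEST_WITH_VAGRANT to enable)"
else
  echo "running vagrant-backed test"
fi
```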
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
## Description
Mounting the `current_versions.sh` file into the target provisioner when installing Tinkerbell on Equinix using Terraform
## Why is this needed
Because otherwise `generate-envrc.sh` will fail and no Tinkerbell env file will be created.
Fixes: #60
## How Has This Been Tested?
Simply ran the provisioner on Equinix and the env file was created with all the needed info.
## How are existing users impacted? What migration steps/scripts do we need?
None
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
I am not sure when this happened; it could have been when we removed the NGINX_IP,
or when we checked that every service was using ports OR network_mode,
but we exposed nginx and boots over the same port.
This commit fixes that.
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
… container definition
## Description
Resolves #53
## Why is this needed
This conflict causes container creation to fail.
Fixes: #
## How Has This Been Tested?
I ran the setup and was able to run a workflow and deployment without issue.
## How are existing users impacted? What migration steps/scripts do we need?
No impact.
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
This commit contains a new utility that helps automate a version bump
for sandbox.
You can run this command to get a sense of what it does:
```
$ go run cmd/bump-version/main.go -help
```
To try it out, run this command from the sandbox root. By
default it won't overwrite anything; it will print to stdout a new
version of the current_versions.sh file where all the image versions are
calculated by cloning the various repositories:
```
$ go run cmd/bump-version/main.go
```
If you want to overwrite the current_versions file, use the flag
`-overwrite`.
More will come, but for now that's the PoC. Ideally this can be hooked
into CI/CD and run periodically, opening a PR that can be evaluated and
merged.
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
## Description
Updates [Packet Terraform](https://docs.tinkerbell.org/setup/packet-terraform/) plan to use the Equinix Metal provider.
## Why is this needed
Consistent with rebranding efforts across the organization.
Fixes: #
## How Has This Been Tested?
This plan validates, and applies as expected (and has previously) with the renamed resources, and updates outputs.
## How are existing users impacted? What migration steps/scripts do we need?
Existing users may need to reinitialize their Terraform environment, but existing resources in state can be imported.
## Checklist:
I have:
- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
This PR contains a provisioning mechanism for the Vagrant boxes we ship
as part of Sandbox.
To self-contain and distribute the required dependencies for Tinkerbell
and Sandbox without having to download all of them at runtime, we decided to use
[Packer.io](https://packer.io) to build boxes that you can use when provisioning
Tinkerbell on Vagrant.
Currently the generated boxes are available via [Vagrant
Cloud](https://app.vagrantup.com/tinkerbelloss).
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>