Add some more Bare Metal guides #498

Merged: 4 commits, Feb 10, 2021

GingerGeek
Member

This is a WIP PR for adding more information on how to deploy OKD onto bare metal.

These are mainly lessons learnt "the hard way" from deploying OKD into the bare metal environment.

The guides and methods followed may not be best practice, so I'm sharing early while I'm still writing the documents, in case something is well off the mark of what should be done!

These are the guides I'm planning to author:

| Document | Overview |
| --- | --- |
| Installer Workspace | When doing bare metal deploys you may be doing a lot of them whilst having to tweak config files, manifests and more. This document contains a few tips and tricks on workspace layout. |
| Manual DNS Configuration and Fix Fedora CoreOS (FCOS) resolv bug | FCOS 33 contains a bug which breaks DNS during a fresh bare metal deploy. This document also explains how to inject custom DNS configuration into your deploy. |
| Disable or Enable certain Network Interfaces | Some servers may have redundant management interfaces or extra NICs which you never expect to come up. This can delay startup. This document discusses how to push configurations to enable or disable network interfaces. |
| Dual Interface Metal (Public/Private adapters) | You may be doing an OKD deployment where the nodes have multiple interfaces (e.g. one for public and one for private). This document goes over some gotchas with this setup and also covers firewalling to ensure cluster traffic goes over the private address space where possible. |
| Customising hostname logic | Misconfigured hostnames will cause clusters to fail in bizarre ways. Sometimes you may need to inject custom scripts to resolve the hostname for a given node (e.g. call a central API or do PTR lookups). |
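
To illustrate the PTR lookup idea from the hostname entry, here is a minimal Python sketch. It is not taken from the guides: the function name `hostname_from_ptr` and the injectable `reverse_lookup` parameter are hypothetical, and it assumes the node can reach a resolver that serves reverse records for its address.

```python
import socket
from typing import Callable, Optional


def hostname_from_ptr(
    ip_address: str,
    reverse_lookup: Callable[[str], tuple] = socket.gethostbyaddr,
) -> Optional[str]:
    """Return the lower-cased reverse (PTR) name for ip_address, or None.

    reverse_lookup is injectable (an assumption of this sketch) so the
    logic can be exercised without a live DNS server.
    """
    try:
        # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        name = reverse_lookup(ip_address)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return None
    # Drop a trailing root dot and normalise case.
    name = name.rstrip(".").lower()
    return name or None
```

A script along these lines could fall back to another source (such as a central API) whenever the function returns `None`.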

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 2, 2021
The OKD installation docs mention that setting valid PTR records will allow the system to automatically detect the hostname. This does not seem to conistently be the case. In addition you may be in an environment where you are unable to edit PTR records.
Member

Suggested change
The OKD installation docs mention that setting valid PTR records will allow the system to automatically detect the hostname. This does not seem to conistently be the case. In addition you may be in an environment where you are unable to edit PTR records.
The OKD installation docs mention that setting valid PTR records will allow the system to automatically detect the hostname. This does not seem to consistently be the case. In addition you may be in an environment where you are unable to edit PTR records.

I think it was a NetworkManager bug in OKD 4.6 from December. It shouldn't be the case anymore, so it's probably not worth mentioning.

Member Author

I observed this bug in the second and third weeks of January during cluster deploys. If it's been fixed since then, I wouldn't have noticed, since I've been setting the hostname manually via the MachineConfig.
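
For readers unfamiliar with this approach, the following is a rough Python sketch of a MachineConfig that writes `/etc/hostname`. The thread does not show the actual config used, so everything here is an assumption: the function name `hostname_machine_config`, the `99-<role>-hostname` naming, and the Ignition spec version `3.1.0`. Because a MachineConfig applies to every node in its pool, a fixed hostname like this only makes sense for a single-node pool or as a template.

```python
import json
from urllib.parse import quote


def hostname_machine_config(hostname: str, role: str = "worker",
                            ignition_version: str = "3.1.0") -> dict:
    """Build a MachineConfig manifest (as a dict) that writes /etc/hostname.

    The name, role label value, and Ignition version are illustrative
    assumptions, not values taken from the PR.
    """
    # Ignition file contents are given as a URL; a data URL embeds them inline.
    source = "data:," + quote(hostname + "\n")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": f"99-{role}-hostname",
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": ignition_version},
                "storage": {
                    "files": [{
                        "path": "/etc/hostname",
                        "mode": 0o644,  # serialised as decimal 420 in JSON
                        "overwrite": True,
                        "contents": {"source": source},
                    }]
                },
            }
        },
    }


if __name__ == "__main__":
    # JSON is also valid YAML, so the output can be saved as a manifest file.
    print(json.dumps(hostname_machine_config("node1.example.com"), indent=2))
```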

Member

Hmm, okay, let's file a bug on OKD (most likely there's already one) and refer to it.

Member Author

I'll see if it's still reproducible on the latest version of OKD and, if so, file a bug unless one already exists. The related issue I looked at was #394, which is now closed, but most of that conversation was about vSphere.

## Edit `bootstrap.ign`
Member

This is strongly discouraged. A better idea would be to create a new Ignition config that refers to the unmodified bootstrap.ign; see ignition.config.merge in https://github.com/coreos/ignition/blob/master/docs/configuration-v3_2.md

Member Author

I agree that editing bootstrap.ign is not ideal, but using the merge method would be troublesome: I would either need to base64-encode the vanilla bootstrap.ign and embed it, or make the files accessible on a server, which doesn't happen in my deploy pattern.

In my deploy use case, the physical machines are net-booted into a "rescue" OS. The Ignition files are copied over SFTP into the rescue OS, and then we use the run-from-container method to install to the local disk.

In future we will likely switch to embedding the Ignition config into an ISO created for each server.

I can switch the docs to base64-encode the original and embed it, but I think this adds more room for human error and vastly reduces the ability to inspect and debug what's going on in the system when something goes wrong.
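
For reference, the base64 variant discussed in this thread can be sketched in a few lines of Python. This is only an illustration of `ignition.config.merge` with a `data:` URL source, not the approach the guide ended up documenting. The function name `wrap_bootstrap_ignition` is hypothetical, and the wrapper version `3.2.0` is an assumption taken from the linked spec page; it has to match what the Ignition build on the target image actually supports.

```python
import base64
import json


def wrap_bootstrap_ignition(original: bytes, version: str = "3.2.0") -> dict:
    """Return a wrapper Ignition config that merges the unmodified original.

    Customisations (extra files, units, and so on) would be added to the
    returned dict by the caller, leaving the original config untouched.
    """
    # Fail early if the input is not valid JSON.
    json.loads(original)
    encoded = base64.b64encode(original).decode("ascii")
    return {
        "ignition": {
            "version": version,
            "config": {
                # Each merge entry points at a config by URL; a data URL
                # embeds it inline so no server is needed.
                "merge": [{"source": "data:;base64," + encoded}]
            },
        }
    }


def embedded_config(wrapper: dict) -> dict:
    """Decode the first merged config back to a dict for inspection."""
    source = wrapper["ignition"]["config"]["merge"][0]["source"]
    return json.loads(base64.b64decode(source.split(",", 1)[1]))
```

The `embedded_config` helper is there because of the inspection concern raised above: the embedded config is opaque in the wrapper file, so debugging requires decoding it first.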

Member

Oh, hmm, okay, let's describe this as generally not recommended, but still possible of course.

Member Author

@GingerGeek GingerGeek Feb 4, 2021

Yeah, unfortunately there's no config we can modify in earlier steps to influence the rendering of bootstrap.ign to any significant degree, which would be the far superior way.

You're really not going to like the workaround for nodes with dual IPs (public/private), which requires a patch to bootkube.sh to ensure cluster traffic uses the correct interfaces!

Member Author

I've added text mentioning that editing the rendered files is not recommended, and pointed readers to the merge functionality if that suits their flow.

…d and that alternatives may be more suitable
@vrutkovs changed the title from "WIP: Add some more Bare Metal guides" to "Add some more Bare Metal guides" on Feb 10, 2021
@vrutkovs vrutkovs merged commit b70b5e9 into okd-project:master Feb 10, 2021
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 10, 2021