This repository was archived by the owner on Jul 24, 2019. It is now read-only.

Conversation

@galthaus (Contributor) commented Jan 27, 2017

I have been trying to bring up OpenStack on Kubernetes deployed by DigitalRebar with Kargo on KVM instances. :-)

On my KVM systems, I have "real" disks attached for ceph consumption. These changes allow all the current defaults to be maintained, but real disks can be injected into the configuration. It assumes system homogeneity, but it is a start.



values.yml or cli override - still need docs
@v1k0d3n (Collaborator) left a comment

thank you for submitting a PR! we're going to need a lot of expertise with ceph in the near future.

privileged: true
env:
- name: CEPH_DAEMON
value: osd_directory
Collaborator

I'd like to see a boolean conditional used for this approach; it allows the deployment to be more flexible. this is partially our fault...we are in the process of defining where we would want conditionals. docs take a while, but they will really help new developers. we still need to define what this ceph conditional would be. let me nail this down with @alanmeadows and submit it in the PR notes.

Example of a conditional in a deployment: https://github.com/att-comdev/openstack-helm/blob/master/mariadb/templates/deployment.yaml#L157-L172
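
A minimal sketch of the kind of boolean conditional meant here, assuming a hypothetical osd.use_physical_disks values key (the real key still needs to be defined, as noted above):

```
      containers:
        - name: osd-pod
          env:
            - name: CEPH_DAEMON
{{- if .Values.osd.use_physical_disks }}
              value: osd
{{- else }}
              value: osd_directory
{{- end }}
```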

Contributor Author

Not necessarily a problem, but there are more than two ceph osd options, I think. Now we may want validators that will fail (if that is possible in helm).

Collaborator

i agree. this is something we have to discuss as well: "what do we do when there are a, b, c, etc. options?" this is something we need to lay out in docs for developers who are contributing. we're at that point now, and we're taking just a little step back from developing so we can lay out clear, sane goals for developers to follow. up to this point in our two-month project we've written issues and folks have plucked out issues they know we want to collectively solve, but this is changing now that we're trying to find a permanent home for the project (i want these docs drafted now).

be overridden by adding additional set parameters. This will tell each osd to use physical disks
/dev/sdb and /dev/sdc as drives.
```
admin@kubenode01:~$ helm install --set network.public=$osd_public_network,osd.daemon="osd",osd.zap="1",osd.disks="0:sdb 1:sdc" --name=ceph local/ceph --namespace=ceph
```
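
The same overrides could also be supplied through a values file rather than --set flags; a minimal sketch (the file name and the public network value are placeholders):

```
# ceph-overrides.yaml
network:
  public: 192.168.0.0/24   # substitute your $osd_public_network
osd:
  daemon: "osd"
  zap: "1"
  disks: "0:sdb 1:sdc"
```

followed by something like `helm install -f ceph-overrides.yaml --name=ceph local/ceph --namespace=ceph`.
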
Collaborator

very nice. i like the override option. ceph is going to be one of those charts where we're going to need a lot of conditionals, and it will end up very different from how it stands today!

@alanmeadows (Contributor) left a comment

This is a good "real world" enhancement, but I think we need to think through how single-host + multi-disk should be handled (one OSD per disk) and get helm to manage that. Will that fit within the existing daemonset model, or do we need to step outside that to support this?

An armchair idea we had was that what would be required is something like:

hostname:
  - /dev/sdc
  - /dev/sdd

and so on, as values input, allowing helm to create host-locked deployments (node labels pinned to specific hosts), with specific physical disks fed into each specific deployment.
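
A minimal sketch of what one such host-locked pod spec could render to, assuming pinning via the kubernetes.io/hostname node label; the host name, device names, and OSD_DEVICE wiring are illustrative assumptions:

```
    spec:
      nodeSelector:
        kubernetes.io/hostname: kubenode01
      containers:
        # image and other container settings omitted for brevity
        - name: osd-sdc
          env:
            - name: OSD_DEVICE
              value: /dev/sdc
        - name: osd-sdd
          env:
            - name: OSD_DEVICE
              value: /dev/sdd
```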

value: osd_directory
value: {{ .Values.osd.daemon }}
# Type, Disks, and Zap are ignored for directory-based OSDs.
- name: OSD_TYPE
Contributor

Do any of these need to be conditional if .Values.osd.daemon is osd_directory? It seems like they should, even if just from a clarity perspective--e.g.

{{- if not (eq .Values.osd.daemon "osd_directory") }}
...
{{- end }}

Contributor Author

The entrypoint.sh script doesn't need them to be, but it would be a safety thing more than anything else.

Contributor

I'm all for safety when we talk about zapping.

# Change to osd to use real disks and set zap to 1
daemon: osd_directory
type: devices
disks: "0:sdb 1:sdc 2:sdd 3:sde"
Contributor

We may want to examine this approach more closely -- isn't the ceph best practice recommendation to leverage one osd per physical disk?

Helm can be used to iterate over the disks, creating an OSD per disk.

Contributor Author

That is what the entrypoint script does when you specify the type devices: it launches each OSD and aggregates their logging, all in the single container.

@intlabs (Contributor) commented Jan 28, 2017

Offhand I think it would be better to launch a container per device, rather than rely on the launching of multiple processes that entrypoint.sh does. The suggestion Alan has made from his armchair would be worth exploring here, I think. The primary downside I see is the need to specify the OSD/host topology at chart launch rather than have a somewhat organic approach to deployment, but I'm on the fence as to whether this constraint is actually a limitation or a sensible belt-and-braces approach to ensuring our cluster does what we expect.

If we explored this option (and from that decided to go down this path), it would be fantastic to try to get a somewhat standardised format for this values specification across different storage backends, as I believe Gluster may use a similar pattern currently.

@alanmeadows (Contributor) commented Jan 29, 2017

The reason for articulating this at the chart level would be twofold -- one is that we do what ceph wants us to do: one OSD process per physical device. The second is that it is very clear what is going to happen, at the expense of forcing the operator to be verbose. I think this is a must when we talk about "zapping" disks or start working through workflows for physical host disk replacement. Once a container is running multiple processes, you've given up several kubernetes process-management resiliency features, such as restarting the container when the process dies. I think supervisor and other intermediaries (entrypoint.sh spawning multiple processes) need to have a good reason for being in the middle.

If we find that this causes tremendous operational pain, I'm willing to entertain having entrypoint.sh manage the processes but I think we should try to do the right thing first and see what issues we uncover.

I think ending up with chart overrides that look like:

osd_map:
  $hostname:
    device_regexp: /dev/sd[e-z]+ # may prove difficult to allow helm to iterate to create container:osd definitions but would be convenient

or more likely, to enable helm to iterate over the list creating an osd container per device:

osd_map:
  $hostname:
    device_list: 
      - /dev/sdb
      - /dev/sdc
...

is not really asking the operator to go to extreme lengths, and it supports varying disk capabilities for each host. Also, if there is a discrepancy between what the operator expects to be on the host and what we're able to launch an OSD against, it is immediately apparent. For someone doing this at scale, that is an important distinction. I believe an entrypoint.sh approach may well hide an issue such as having six hosts each with six disks, and one of them only having four online, for instance.
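
A rough, purely illustrative sketch of how helm could iterate over such an osd_map to emit one deployment per host and one OSD container per device (the API version, image key, and OSD_DEVICE wiring are assumptions, not the chart's current layout):

```
{{- range $hostname, $host := .Values.osd_map }}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ceph-osd-{{ $hostname }}
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ceph-osd
    spec:
      nodeSelector:
        kubernetes.io/hostname: {{ $hostname }}
      containers:
{{- range $device := $host.device_list }}
        - name: osd-{{ $device | base }}
          image: {{ $.Values.images.daemon }}
          env:
            - name: OSD_DEVICE
              value: {{ $device }}
{{- end }}
{{- end }}
```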

This is further complicated if operators want to dedicate some devices (e.g. SSDs) to journals, as they need to help us identify those.

I would also add that we should think about how we can support /dev/disk/by-* style identification/wildcards for this, as opposed to spelling out actual device paths such as /dev/sdb. In other words, the operator may know each of his hosts has two SSDs for journaling, but they may not be the equivalent /dev/sdX on all hosts.
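
One hypothetical shape for that, extending the osd_map sketch above with /dev/disk/by-* identifiers and a separate journal list (all paths below are made-up examples):

```
osd_map:
  $hostname:
    device_list:
      - /dev/disk/by-path/pci-0000:00:1f.2-ata-3
    journal_list:
      - /dev/disk/by-id/ata-SSD_EXAMPLE_SERIAL
```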

Even if the gluster use case doesn't require anything like an "osd per device", the mapping of disks to rope in for dedicated storage is likely similar so this has value to work through for any storage backend.

Contributor Author

@alanmeadows - I agree, but I was trying to get something to play with more quickly, to get better feedback on the actual problem space. I also agree that the way ceph chooses to run multiple OSD processes in the same container misses out on some of the k8s/container value, but it does understand the nuances of running multiple OSD processes on the node (like port assignment and rotation).

What you describe is fine and actually works quite nicely with what DR can provide from a hardware enumeration perspective. DR can build the drive map for nodes in this environment and update it as appropriate (even with disk-by-* ids). My helm foo is not good enough to understand or build this yet. I was mostly building this to see what could be done.

So, what do we want to do with this one?

@v1k0d3n (Collaborator) commented Mar 6, 2017

there hasn't been activity on this PR in quite some time. the issue will end up getting dropped once the project is imported into openstack proper (potentially next week).

@v1k0d3n (Collaborator) commented Mar 8, 2017

there doesn't seem to be any interest in contributing back to the requests that have been recommended. closing this PR.

@v1k0d3n closed this Mar 8, 2017
@alanmeadows (Contributor) commented

Let's make sure to note links to these somewhere, perhaps in the roadmap document. It contains discussion points we want to preserve because, even though the issue itself is closed, the work still needs to be done. The new request, blueprint, or whatever form that takes within OpenStack should benefit from this conversation/direction.

@v1k0d3n (Collaborator) commented Mar 8, 2017

@alanmeadows yes, I'm actually in the process of including these items in the roadmap docs and slating the rework for a release (i'm looking at 0.3.0 or 0.4.0 as a target). the discussion points you made about drive zapping and general drive layout definition in the chart values are a perfectly valid ask/approach for this rework effort.
