Allow CEPH OSD to use devices on the systems through values.yml or cli override #158 (still needs docs)
v1k0d3n
left a comment
thank you for submitting a PR! we're going to need a lot of expertise with ceph in the near future.
    privileged: true
    env:
      - name: CEPH_DAEMON
        value: osd_directory
I'd like to see a boolean conditional used for this approach; it allows the deployment to be more flexible. this is partially our fault...we are in the process of defining where we would want conditionals. docs take a while, but will really help new developers. we still need to define what this ceph conditional would be; let me nail this down with @alanmeadows and submit it in the PR notes.
Example of a conditional in a deployment: https://github.com/att-comdev/openstack-helm/blob/master/mariadb/templates/deployment.yaml#L157-L172
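For illustration, a minimal sketch of the boolean-conditional pattern being suggested; the `ceph.enabled` flag is hypothetical, not an existing chart value:

```yaml
# values.yaml (hypothetical flag):
#   ceph:
#     enabled: true
#
# In the deployment template, the guarded block is only rendered
# when the flag is set:
{{- if .Values.ceph.enabled }}
- name: CEPH_DAEMON
  value: osd_directory
{{- end }}
```

The same pattern generalizes to any chart feature that should be switchable from values.yaml or a `--set` override.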
Not necessarily a problem, but there are more than two ceph osd options, I think. Now we may want validators that will fail (if that is possible in helm).
i agree. this is something we have to discuss as well: "what do we do when there are a, b, c, etc. options?" this is something we need to lay out in docs for developers who are contributing. we're at that point now, and we're taking just a little step back from developing so we can lay out clear, sane goals for developers to follow. up to this point in our two-month project we've written issues and folks have plucked out issues they know we want to collectively solve, but this is changing now that we're trying to find a permanent home for the project (i want these docs drafted now).
    be overridden by adding additional --set parameters. This will tell each OSD to use physical disks
    /dev/sdb and /dev/sdc as drives.

    admin@kubenode01:~$ helm install --set network.public=$osd_public_network,osd.daemon="osd",osd.zap="1",osd.disks="0:sdb 1:sdc" --name=ceph local/ceph --namespace=ceph
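For reference, the same overrides could be carried in a values file and passed with helm's `-f` flag instead of a long `--set` string. A hypothetical equivalent, with key names taken from the command above and a placeholder network value:

```yaml
# ceph-overrides.yaml (hypothetical file name)
network:
  public: 192.168.0.0/24   # substitute your $osd_public_network
osd:
  daemon: osd
  zap: "1"
  disks: "0:sdb 1:sdc"
```

Usage would then be `helm install -f ceph-overrides.yaml --name=ceph local/ceph --namespace=ceph`, which keeps site-specific disk layout under version control rather than in shell history.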
very nice. i like the override option. ceph is going to be one of those charts where we're going to need a lot of conditionals, and it will end up very different from how it stands today!
alanmeadows
left a comment
This is a good "real world" enhancement, but I think we need to think through how a single host with multiple disks should be handled (one OSD per disk) and how to get helm to manage that. Will that fit within the existing daemonset model, or do we need to step outside that to support this?
An armchair idea we had was that what would be required is something like:

    hostname:
      /dev/sdc
      /dev/sdd

and so on, as values input, allowing helm to create host-locked deployments (node labels pinned to specific hosts), with specific physical disks fed into each specific deployment.
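As a sketch of what "host-locked" could mean in practice, a per-host deployment pinned to one node via a label selector; the deployment name is illustrative, and `kubernetes.io/hostname` is the label the kubelet sets on every node:

```yaml
# Hypothetical sketch: one OSD deployment locked to kubenode01.
# Specific disks for that host would be fed in as env/values below this.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ceph-osd-kubenode01
spec:
  replicas: 1
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: kubenode01
```

Helm would need to stamp out one such deployment per entry in the host map, which is the part that steps outside the current daemonset model.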
    value: osd_directory
    value: {{ .Values.osd.daemon }}
    # Type, Disks, and Zap are ignored for directory-based OSDs.
    - name: OSD_TYPE
Do any of these need to be conditional if .Values.osd.daemon is osd_directory? It seems like they should, even if just from a clarity perspective, e.g.
{{- if ne .Values.osd.daemon "osd_directory" }}
...
{{- end }}
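Applied to the env vars in this diff, the guard might look like the following; the OSD_TYPE/OSD_DISKS/OSD_ZAP names and the corresponding values keys are assumed from the surrounding context, not confirmed against the chart:

```yaml
env:
  - name: CEPH_DAEMON
    value: {{ .Values.osd.daemon }}
  {{- /* Only meaningful for disk-based OSDs; skipped for osd_directory. */}}
  {{- if ne .Values.osd.daemon "osd_directory" }}
  - name: OSD_TYPE
    value: {{ .Values.osd.type }}
  - name: OSD_DISKS
    value: {{ .Values.osd.disks | quote }}
  - name: OSD_ZAP
    value: {{ .Values.osd.zap | quote }}
  {{- end }}
```

This keeps zap-related settings out of the rendered manifest entirely when they cannot apply, which matters given what zapping does to a disk.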
The entrypoint.sh script doesn't require it, but it would be a safety thing more than anything else.
I'm all for safety when we talk about zapping.
    # Change daemon to osd to use real disks, and set zap to 1
    daemon: osd_directory
    type: devices
    disks: "0:sdb 1:sdc 2:sdd 3:sde"
We may want to examine this approach more closely: isn't the ceph best practice recommendation to leverage one osd per physical disk?
Helm can be used to iterate over the disks, creating an OSD per disk.
That is what the entrypoint script does when you specify the type devices: it actually launches, and aggregates the logging for, each osd. It is all in the single container.
Offhand, I think it would be better to launch a container per device rather than rely on entrypoint.sh launching multiple processes. The suggestion Alan has made from his armchair would be worth exploring here, I think. The primary downside I see is the need to specify the OSD/host topology at chart launch rather than have a somewhat organic approach to deployment, but I'm on the fence as to whether this constraint is actually a limitation or a sensible belt-and-braces approach to ensuring our cluster does what we expect.
If we explored this option (and from that decide to go down this path) it would be fantastic to try and get a somewhat standardised format for this values specification between different storage backends, as I believe Gluster may use a similar pattern currently.
The reason for articulating this at the chart level would be twofold: one is that we do what ceph wants us to do, one osd process per physical device. The second is that it is very clear what is going to happen, at the expense of forcing the operator to be verbose. I think this is a must when we talk about "zapping" disks or start working through workflows on physical host disk replacement. Once a container is running multiple processes, you've just given up several kubernetes process-management resiliency features, such as restarting the container when the process dies. I think supervisor and other intermediaries (entrypoint.sh spawning multiple processes) need to have good reason for being in the middle.
If we find that this causes tremendous operational pain, I'm willing to entertain having entrypoint.sh manage the processes but I think we should try to do the right thing first and see what issues we uncover.
I think ending up with chart overrides that look like:

    osd_map:
      $hostname:
        device_regexp: /dev/sd[e-z]+  # may prove difficult to allow helm to iterate to create container:osd definitions, but would be convenient

or, more likely, to enable helm to iterate over the list creating an osd container per device:

    osd_map:
      $hostname:
        device_list:
          - /dev/sdb
          - /dev/sdc
          ...

is not really requiring the operator to go to extreme lengths, and it supports varying disk capabilities for each host. Also, if there is a discrepancy between what the operator expects to be on the host and what we're able to launch an OSD against, it is immediately realized. For someone doing this at scale, that is an important distinction. I believe an entrypoint.sh may likely hide an issue such as six hosts each with six disks, and one of them only having four online, for instance.
This is further complicated if operators want to dedicate some devices (e.g. SSDs) to journals, as they need to help us identify those.
I would also add that we should think about how we can support /dev/disk/by-* type identification/wildcards for this, as opposed to spelling out actual device paths such as /dev/sdb. In other words, the operator may know each of his hosts has two SSDs for journaling, but they may not be equivalent /dev/sdX on all hosts.
Even if the gluster use case doesn't require anything like an "osd per device", the mapping of disks to rope in for dedicated storage is likely similar so this has value to work through for any storage backend.
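To make the iteration idea concrete, a rough sketch of a template loop over such an osd_map, emitting one OSD container per listed device. The `osd_map`/`device_list` keys and the `OSD_DEVICE` variable are assumptions drawn from the discussion, not existing chart values, and the image is illustrative:

```yaml
{{- /* One OSD container per disk, per host entry in .Values.osd_map. */}}
{{- range $hostname, $host := .Values.osd_map }}
{{- range $disk := $host.device_list }}
- name: osd-{{ base $disk }}
  image: ceph/daemon   # illustrative image reference
  securityContext:
    privileged: true
  env:
    - name: OSD_DEVICE
      value: {{ $disk }}
{{- end }}
{{- end }}
```

A loop like this makes the operator's disk expectations explicit in the rendered manifest, so a host with fewer disks online than declared fails visibly instead of being papered over by entrypoint.sh.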
@alanmeadows - I agree, but I was trying to get something to play with more quickly, to get better feedback on the actual problem space. I also agree that the way ceph chooses to run multiple OSD processes in the same container misses out on some of the k8s/container value, but it does understand the nuances of running multiple OSD processes on the nodes (like port assignment and rotation).
What you describe is fine and actually works quite nicely with what DR can provide from a hardware-enumeration perspective. DR can build drives for nodes in this environment and update the map as appropriate (even with disk/by-* ids). My helm-fu is not good enough to understand or build this yet. I was mostly building this to see what could be done.
So, what do we want to do with this one?
there hasn't been activity on this PR in quite some time. the issue will end up getting dropped once the project is imported into openstack proper (potentially next week).
there doesn't seem to be any interest in following up on the changes that have been requested. closing this PR.
Let's make sure to note links to these somewhere, perhaps in the roadmap document. It contains discussion points we want to preserve because even though the issue itself is closed, the work still needs to be done. The new request, blueprint, or whatever form that takes within OpenStack should benefit from this conversation/direction.
@alanmeadows yes, I'm actually in the process of including these items in the roadmap docs and slating the rework for a release (i'm looking at 0.3.0 or 0.4.0 as a target). the discussion points you made about drive zapping and general drive-layout definition in the chart values are a perfectly valid ask/approach for this rework effort.
I have been trying to bring up OpenStack on Kubernetes deployed by DigitalRebar with Kargo on KVM instances. :-)
On my KVM systems, I have "real" disks attached for ceph consumption. These changes allow the current defaults to be maintained, but real disks can be injected into the configuration. It assumes system homogeneity, but it is a start.