OpenStack Down Under
This year the travelling circus that is the OpenStack summit migrated to Sydney. A lot of us in Europe / North America found out exactly how far away from our normal venues it really is. (#openstacksummit on twitter for the days before the summit was an entertaining read :) )
Sunday Board / Joint Leadership Meeting
As I was in Sydney, and staying across the road from the meeting, I decided to drop in and listen. It was an interesting discussion, with a couple of highlights.
Chris Dent raised a very interesting item about developer satisfaction - he has written it up at anticdent.org and it is well worth the read.
Jonathan Bryce led the presentation of a proposed expansion of the foundation, which he touched on in the Keynote the next day - I have a few concerns, but they are all much longer term issues, and may just be my own internal biases. I think the first new addition to the foundation will let us know how the rest of the process is going to go.
Colleen Murphy and Julia Kreger told us that they (along with Flavio Percoco) will be starting research to help improve our inclusiveness in the community.
The last item was brought forward by 2 board members, and they focused on LTS (Long Term Support / Stable) branches. The time from an upstream release until a user has it in production is actually longer than expected - with a lot of time being used by distros packaging and ensuring installers are up to date.
This means that by the time users have a release in production, the upstream branches may be fully deprecated. There was a follow up Forum Session, and there is now an effort to co-ordinate a new methodology for long term collaboration in the LTS Etherpad.
There seems to be an assumption that distros are keeping actual git branches around for the longer term, and not layering patches inside of deb / rpm files, which I think is much more likely. I hope this effort succeeds, but my cynical side thinks this is more of a "fix it for us" cry, than "help us fix it". I suppose we will see if people show up.
One slide from this section was not discussed but concerned me. It was talking about having an enforced "TC Roadmap" which had lines from various workgroups and SIGs. Coming from a project that gets a lot of "Can you do x feature?" (to which I usually respond with "Do you have anyone to write the code?") this concerns me. I understand that it can be hard to get things changed in OpenStack, really I do, but a top down enforced "Roadmap" is not the way forward. Honestly, that two board members of an Open Source foundation think it is, is worrying.
Designate had 3 sessions in Sydney:
- Our project update
- Project On Boarding
- Ops Feedback
The project update was good - much improved from Boston, where the 2 presenters were not paid to work on the project. We covered the major potential features, and where we were on cycle goals (both Pike goals completed, and the Queens goals underway).
Project onboarding was not hugely attended, but I am hoping that was a side effect of the summit being both smaller and far away.
Ops feedback was great - we got a lot of bugs that were impacting our users and deployers, and collected them in our Feedback Etherpad (any comments welcome).
Cross Project Work
I went to quite a few cross project sessions - there was a good amount of discussion, and some useful work came out of it.
This is something that had completely slipped past me until now, but the ideas were great, and it would have made things I have done in previous companies much much easier.
Healthchecks per service
We came to a good agreement on how we can do standardised health checks across OpenStack, we now need to write a spec and start coding a new piece of middleware :)
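To make that concrete, a standardised healthcheck would most likely end up as a small piece of WSGI middleware that every service loads into its request pipeline. A minimal sketch of the idea - the `/healthcheck` path and the plain-text `OK` body are my assumptions here, not the agreed spec:

```python
class HealthcheckMiddleware:
    """Short-circuit requests to a well-known path with a 200,
    before they reach the application itself."""

    def __init__(self, app, path="/healthcheck"):
        self.app = app
        self.path = path

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") == self.path:
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"OK"]
        # Anything else passes straight through to the wrapped app.
        return self.app(environ, start_response)
```

A real version would also want pluggable backend checks (database reachable, message queue up, etc.), which is where most of the spec discussion will be.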
Not so sure this was worth a visit - it was much more crowded than any of the other Forum sessions I went to, and ended up bikeshedding on where the Edge ends (we literally spent 10 minutes talking about whether a car was part of the Edge or a thing managed by the Edge).
I kept hearing "smaller and lighter OpenStack" in that session, but have yet to hear what is too heavy about what we currently have. Nearly all our services scale down to some extent, and you can run a complete infrastructure on an 8GB VM.
Overall, it was a good summit - not too busy, and short. Looking forward to not traveling for the next PTG; I think the DUB -> DOH -> SYD round trip drained my enthusiasm for flights for the next few months.
I would like to submit my candidacy for the Technical Committee for the upcoming election.
TL;DR: So, I bit the bullet and ran for the TC :)
I have been contributing to OpenStack since the Havana cycle, mainly in Designate. I have also sporadically been involved with the TC and its meetings since Designate applied for incubation, all the way back in Atlanta.
I have been PTL for Designate for the Mitaka, Newton, Ocata and Queens cycles, and a core reviewer for longer. I was also PTL for the Global Load Balancing project before it became an unfortunate early casualty of the recent reshuffling within sponsoring organizations in the community.
As part of previous projects, I was both a developer and a heavy user of OpenStack. As part of contributing to the Kubernetes OpenStack integration we ran into a lot of the problems that impact our users, and people who try to integrate with us.
I believe that we already have a great base structure in place to help OpenStack evolve, and part of that is to have a group of people from different companies, backgrounds, genders and cultures driving the project in the Technical Committee.
I believe my experience working in a younger, smaller project within OpenStack, along with my experience of working on software as an end user of OpenStack, can help us ensure the Technical Committee is mindful of the unique challenges these projects and users can face.
I have not traditionally been shy about broaching these topics in the past, but I feel it is time I started following through, and helping guide the resolution of these questions - and I now have an employer who is supportive of me spending more time in the community.
I do really like this community, and I want us to grow, expand and evolve the software we write, without changing what we stand for.
Thank you for taking the time to read this.
- Graham Hayes (mugsie)
(See https://review.openstack.org/#/c/312267/ and the related discussion.)
Today marks the start of a new chapter for me - I started as an employee of SUSE this morning. After 3.5 years (I had no idea it was that long) at HPE the time came, and I, the rest of the team I work with, the Cloud Foundry team, and the OpenStack teams all moved en masse to the new company.
I am really excited to work with SUSE - they really get open source, and do it well. I think with the progression of IaaS / CaaS / PaaS in recent years, we will end up with something like the Linux kernel: a few distros built around core open source components like OpenStack, Kubernetes and Cloud Foundry (or OpenShift).
SUSE has already shown they know how to do an enterprise distribution of an open source product, and I am looking forward to seeing how we do it for the future of compute.
There is a feeling of wistfulness leaving HP(E) - we did some amazing things there, and we had an amazing group of teams working closely to produce the best products we could. Running services in the public cloud will always be a highlight for me, and the level of freedom we were given to work upstream allowed us to have great projects, not just for HPE customers but for the community.
But alas, life is like a river, and the flow has sped up, carrying me into the next adventure!
I have been asked a few times recently "What is the state of the Designate project?" and "How is Designate getting on?" - and, by people who know what is happening, "What are you going to do about Designate?".
Needless to say, all of this is depressing to me, and the people that I have worked with for the last number of years to make Designate a truly useful, feature rich project.
TL;DR for this: Designate is not in a sustainable place.
To start out - Designate has always been a small project. DNS does not have massive cool appeal - it's not shiny, pretty, or something you see on the front page of Hacker News (unless it breaks - then oh boy do people become DNS experts).
A line a previous PTL for the project used to use, and that I have happily robbed, is "DNS is like plumbing, no one cares about it until it breaks, and then you are standing knee deep in $expletive". (As an aside, that was the reason we chose the crocodile as our mascot - it's basically a dinosaur, old as dirt, and when it bites it causes some serious complications.)
Unfortunately that sometimes carries over into the development of DNS products. DNSaaS is a check box on a tender response, an assumption.
We were lucky in the beginning - we had 2 large(ish) public clouds that needed DNS services, and nothing currently existed in the eco-system, so we got funding for a team from a few sources.
We got a ton done in that period - we moved from a v1 API which was synchronous to a new v2 async API, we massively increased the amount of DNS servers we supported, and added new features.
Unfortunately, this didn't last. Internal priorities within companies sponsoring the development changed, and we started to shed contributors, which happens, however disappointing. Usually when this happens if a project is important enough the community will pick up where the previous group left off.
We have yet to see many (meaningful) commits from the community though. We have some great deployers who will file bugs and, if they can, put up patch sets - but those are (incredibly valuable and appreciated) tactical contributions. A project cannot survive on them, and we are no exception.
So where does that leave us? Let's have a look at how many actual commits we have had:
Next cycle, we are going to have 2 community goals:
- Control Plane API endpoints deployment via WSGI
- Python 3.5 functional testing
We would actually have been OK for the tempest one - we were one of the first projects with an external-repo plugin, designate-tempest-plugin.
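For context, the way an external tempest plugin hooks in is via an entry point declared in the plugin repository's setup.cfg - roughly like this (the module path shown is illustrative, not copied from the real repo):

```ini
[entry_points]
tempest.test_plugins =
    designate_tests = designate_tempest_plugin.plugin:DesignateTempestPlugin
```

tempest discovers the plugin class through that entry point, so the tests can live entirely outside the tempest tree.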
For WSGI based APIs, this will be a chunk of work - due to our internal code structure, splitting out the API is going to be ... an issue. (And I think it will be harder than most people expect - anyone using oslo.service has eventlet imported, and I am not sure how that affects running in a WSGI server.)
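As a rough illustration of what the goal asks for: instead of a console script that spawns its own (eventlet) server, the API has to expose a plain WSGI callable that a server such as uwsgi or mod_wsgi can load. A minimal sketch, with a stand-in response body:

```python
def application(environ, start_response):
    # A bare WSGI callable: the deployment goal is to expose an object
    # like this at module level so any WSGI server can host the API,
    # rather than the service binding its own eventlet listener.
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"versions": []}']
```

The hard part the paragraph above alludes to is not this callable itself, but untangling everything the API process currently imports alongside it.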
Python 3.5 - I have no idea. We can't even run all our unit tests on python 3.5, so I suspect getting functional testing may be an issue. And convincing management to refactor parts of the code base for "community goals" or a future potential pay-off can be more difficult than it should be.
We now have a situation where the largest "non-core" project  in the tent has a tiny number of developers working on it. 42% of deployers are evaluating Designate, so we should see this start to increase.
How did this happen?
Like most situations, there is no single cause.
Certainly there may have been fault on the side of the Designate leadership. We had started out as a small team, and had built a huge amount of trust and respect based on in person interactions over a few years, which meant that there was a fair bit of "tribal knowledge" in the heads of a few people, and that new people had a hard time becoming part of the group.
Also, due to the volume of work done by this small group, a lot of users / distros were OK leaving the work to us - some of us were also running a production designate service during this time, so we knew what we needed to develop, and we had pretty quick feedback when we made a mistake or caused a bug. All of this resulted in the major development cost being funded by two companies, which left us vulnerable to changes in direction from those companies. Then that shoe dropped. We are now one corporate change of direction away from having no cores paid to work on the project.
Preceding this, the governance of OpenStack changed to the Big Tent. While this change was a good thing for the OpenStack project as a whole, it had quite a bad impact on us.
Pre Big Tent, you got integrated. This took at least a cycle, during which you moved docs to docs.openstack.org, integrated with QA testing tooling, got packaged by Linux distros, and built cross project features.
When this was a selective thing, there were teams available to help with it: the docs team would help with content (and tooling - docs was a mass of XML back then), QA would help with tempest and devstack, horizon would help with panels.
In the Big Tent, there just weren't the resources to do this - the scope of the expansion was huge. The big tent happened (in my opinion - I have written about this before) before the horizontal / cross project teams were ready. They stuck to covering the previously "integrated" projects, which was all they could do at the time.
This left us in a position of having to reimplement tooling, figure out what tooling we did have access to, and migrate everything we had on our own. And, as a project that (at our peak level of contribution) only ever had 5% of the contributors of a project like nova, this put quite a load on our developers. Things like grenade, tempest and horizon plug-ins took weeks to figure out, all of which took time from other vital things like docs, functional tests and getting designate into other tools.
One of the companies that invested in designate had a QE engineer who used to contribute, and I can honestly say that the quality of our testing improved tenfold during the time he worked with us. Not just from in-repo tests, but from standing up full deployment stacks and trying to break them - we learned a lot about how we could improve things from his expertise.
Which is kind of the point I think. Nobody is amazing at everything. You need people with domain knowledge to work on these areas. If you asked me to do a multi-node grenade job, I would either start drinking, throw my laptop at you or do both.
We still have some of these problems to this day - most of our docs are in a messy pile at docs.openstack.org/developer/designate, and we still have a small number of old functional tests that are not ported from our old non-plug-in style.
All of this adds up to make projects like Designate much less attractive to users - we just need to look at the project navigator to see what a bad image potential users get of us. This is for a project that ran as a full (non beta) service in a public cloud.
Where to now, then?
Well, this is where I call out to people who actually use the project - don't jump ship and use something else because of the picture I have painted. We are a dedicated team, who cares about the project. We just need some help.
I know there are large telcos who use Designate. I am sure there is tooling, or docs, built up in these companies that could be very useful to the project.
Nearly every commercial OpenStack distro has Designate. Some have had it since the beginning. Again: developers, docs, tooling, testers - anything and everything is welcome. We don't need a massive amount of resources - we are a small-ish, stable project.
We need developers with upstream time allocated, and the budget to go to events like the PTG - for cross project work, and internal designate road map, these events form the core of how we work.
We also need help from cross project teams - the work done by them is brilliant, but it can be hard for smaller projects to consume. We have had a lot of progress since the Level Playing Field debate, but a lot of work is still optimised for the larger teams who get direct support, or well resourced teams who can dedicate people to implementing plugins / code.
As someone I was talking to recently said: AWS is not winning public cloud because of commodity compute (though that helps - a lot), but because of the added services that make using the cloud, well, cloud-like. OpenStack needs to decide whether it is just compute, or whether it wants the eco-system. Designate is far from alone in this.
I am happy to talk to anyone about helping to fill in the needed resources - Designate is a project that started in the very office I am writing this blog post in, and something I want to last.
For a visual: this is the Designate team in Atlanta, just before we got incubated.
and this was our last mid cycle:
and in Atlanta at the PTG, there will be two of us.
(In the Oct-2016 User Survey, Designate was deployed in 23% of clouds.)
(I have been lucky to have a management chain that is OK with me spending some time on Designate, and has not asked me to take time off for Summits or Gatherings, but my day job is working on a completely different project.)
(I do have other issues with the metrics - mainly that we existed before leaving stackforge, and some of the other stats are set so high that non-"core" projects will probably never meet them.)
(I recently went to an internal training talk where they were talking about new features in Newton. There was a whole slide about how projects had improved, or gotten worse, on these scores. A whole slide. With tables of scores, and I think there may even have been a graph.)
(Now, I am slightly biased, but I would argue that DNS is needed in commodity compute - but again, that is my view.)
Non Candidacy for Designate PTL - Pike
Happy new year!
As you may have guessed from the title, I have decided that the time has come to step aside as PTL for the upcoming cycle. It is unfortunate, but my work has pivoted in a different direction over the last year (containers all the way down man - but hey, I got part of my wish to write Golang, just not on the project I envisaged :) ).
As a result, I have been trying to PTL out of hours for the last cycle and a half. Unfortunately, this has had a bad impact on this cycle, and I don't think we should repeat the pattern.
We have done some great work over the last year or so - the Worker Model, the new dashboard, being one of the first projects to have an external tempest plugin, and getting lost in the west of Ireland in the aftermath of the flooding.
I can honestly say, I have enjoyed my entire time with this team, from our first meeting in Austin, back in the beginning of 2014, the whole way through to today. We have always been a small team, but when I think back to what we have produced over the last few years, I am incredibly proud.
Change is healthy, and I have been in a leadership position in Designate longer than most - no project should rely on one person, or a few people, to continue to exist.
I will stick around on IRC, and still remain a member of the core review team, as a lot of the roadmap is still in the heads of myself and 2 or 3 others, but my main aim will be to document the roadmap in a single place, and not just in thousands of etherpads.
It has been a fun journey - I have gotten to work with some great people, see some amazing places, work on really interesting problems and contribute to a project that was close to my heart.
This is not an easy thing to do, but I think the time is right for the project and me to let someone else make their stamp on the project, and bring it to the next level.
Thank you for this opportunity to serve the community for so long, it is not something I will forget.
graham.hayes (a) hpe.com
I was pleasantly surprised - no one started shouting at me - but by trying not to point fingers at individual teams, I made the text too convoluted.
So, in an effort to clarify things, here is an overview of what has been said so far, both in the mailing list and the gerrit review itself.
... does this also include plugins within projects, like storage backends in cinder and hypervisor drivers in nova?
No - this was not clear enough. This change is aimed at projects that are points of significant cross project interaction. While there may come a point in the future where Nova compute drivers are developed out of tree (though I doubt it), that is not happening today. As a result, there are no projects in the list that would need to integrate with Nova.
Could you please clarify: do you advocate for a generic plugin interface for every project, or that each project should expose a plugin interface that allows plugin to behave as in-tree components? Because the latter is what happens with Tempest, and I see the former a bit complicated.
For every project that has cross project interaction - tempest is a good example.
For these projects, they should either allow all projects in tree (like Nova, Neutron, Cinder etc. are today), or they should have a plugin interface (as they currently do) - but then all projects must use it, and not use parts of tempest that are not exposed in that interface.
This would mean that tempest would move the nova, neutron, etc tests to use the plugin interface.
Now, that plugin could be kept in the tempest repo, and still maintained by the QA team, but should use the same interface as the other plugins that are not in that repository.
Of course, it is not just tempest - an incomplete list looks like:
- OpenStack Client
- OpenStack SDK
And I am sure I have missed some obvious ones. (if you see a project missing let me know on the motion)
I think I disagree here. The root cause is being addressed: external tests can use the Tempest plugin interface, and use the API, which is being stabilized. The fact that the Tempest API is partially unstable is a temporary things, due to the origin of the project and the way the scope was redefined, but again it's temporary.
This seems to be the core of a lot of the disagreement - this is only temporary, it will all be fixed in the future, and it should stay this way.
Unfortunately the discrepancy between projects is not temporary. The specific problems I have highlighted in the thread for one of the projects are temporary, but I believe the only long-term solution is to remove the difference between projects.
Before we start making lots of specific rules about how teams coordinate, I would like to understand the problem those rules are meant to solve, so thank you for providing that example. ... It's not clear yet whether there needs to be a new policy to change the existing intent, or if a discussion just hasn't happened, or if someone simply needs to edit some code.
Unfortunately there is a big push back on editing code to help plugins from some of the projects. Again, having the differing access between projects will continue to exacerbate the problem.
"Change the name of the resolution"
—(Paraphrase from a few people)
That was done in the last patchset. I think the "Level Playing Field" title bounced around my head from the other resolution that was titled Level Playing Field. It may well have been confusing.
I feel like I have been picking on tempest a little too much, it just captures the current issues perfectly, and a large number of the community have some knowledge of it, and how it works.
There are other areas across OpenStack that need attention as well:
Horizon: privileged projects have access to many more panels than plugins do (service status, quotas, overviews etc.). Plugins have to rely on tarballs of horizon.
OpenStack CLI: privileged projects have access to more commands, as plugins cannot hook into them (e.g. quotas).
Plugins may or may not have their tempest tests run (I think that patch merged); to get the tests to run at that point, they had to use parts of tempest I was explicitly told plugins should not use.
We can now add install guides and hook into the API Reference and API guides. This is great - and I am really happy about it. We still have issues trying to integrate with other areas of the docs, and most non docs-privileged projects end up with massive amounts of user docs in docs.openstack.org/developer/<project>, which is not ideal.
I just proposed a review to the openstack/governance repo that aims to have everything across OpenStack be plugin based for all cross project interaction, or to allow all projects access to the same internal APIs, and I wanted to give a bit of background on my motivation and how it came about.
Coming from a smaller project, I can see issues for new projects, smaller projects, and projects that may not be seen as "important".
As a smaller project trying to fit into cross project initiatives, (and yes, make sure our software looks at least OK in the Project Navigator) the process can be difficult.
A lot of projects / repositories have plugin interfaces, but also have project integrations in tree, that do not follow the plugin interface. This makes it difficult to see what a plugin can, and should do.
When we moved to the big tent, we wanted as a community to move to a flatter model, removing the old integrated status.
Unfortunately we still have areas where some projects are more equal than others - there is a lingering set of projects who were integrated at the point in time that we moved, and who retain preferential status.
A lot of the effects are hard to see, and are not insurmountable, but do cause projects to re-invent the wheel.
For example, quotas - there is no way for a project that is not nova, neutron or cinder to hook into the standard CLI or UI for setting quotas. They can be exposed as extra commands (openstack dns quota set --foo bar) or as custom panels, but not the way other quotas get set.
Tempest plugins are another example. Approximately 30 of the 36 current plugins are using resources that are not supposed to be used, and are an unstable interface. Projects in tree in tempest are in a much better position, as any change to the internal API has to be fixed before the gate merges, but out of tree plugins can be broken at any point.
None of this is meant to single out projects, or teams. A lot of the projects that are in this situation have inordinate amounts of work placed on them by the big tent, and I can empathize with why things are this way. These were the examples that currently stick out in my mind, and I think we have come to a point where we need to make a change as a community.
By moving to a "plugins for all" model, these issues are reduced. It will undoubtedly cause new ones, but it is closer to our goal of recognizing all of our community as part of OpenStack, and differentiating projects by tags.
This won't be a change that happens tomorrow, next week, or even next cycle, but I think, as a goal, we should start moving in this direction as soon as we can, and start building momentum.
This was originally posted to the openstack-dev mailing list.
I started writing this on a plane on the way home from Austin, TX, where we just finished up the Newton design summit for OpenStack, and finished it a few days later - please excuse any weirdness in syntax or flow :)
Austin was its usual weird, wacky, wonderful self. Every time we are here I have a great time - I just can't deal with the heat :).
The summit format this year worked really well - I pretty much stayed in the design summit hotel for the week - and got some very good work done.
6 Months On: Where are we
One of the things I did before this design summit was to look at what we had achieved last cycle.
We had a quiet cycle overall, but we did merge some vital features.
Operators no longer need to use the awful config format we created for pools in Kilo; we now have a much easier to read YAML file that is loaded into the database via the designate-manage CLI, and we now actually support multiple pools in a real way with the addition of a scheduler.
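To give a flavour of that YAML format, here is an illustrative sketch of a single-pool file - the names and addresses are placeholders, not a canonical example:

```yaml
# One pool, with the nameservers designate checks for propagation
# and the backend target it writes zones to.
- name: default
  description: Default pool
  ns_records:
    - hostname: ns1.example.org.
      priority: 1
  nameservers:
    - host: 192.0.2.10
      port: 53
  targets:
    - type: bind9
      description: BIND9 server
      masters:
        - host: 192.0.2.1
          port: 5354
      options:
        host: 192.0.2.10
        port: 53
```

This gets loaded with something along the lines of `designate-manage pool update --file pools.yaml`, which is a lot friendlier than hand-writing the old config sections.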
Where are we going?
So - the point of the design summit is to plan the future of the projects - and designate is no exception. We were in a (very nice) boardroom for a few hours, and we talked through quite a few things.
The collection of etherpads for the summit are available as well.
Golang Replacement of MiniDNS
One of the nicer things about our current architecture is the flexibility we get because we use the standard DNS protocols to update the target DNS servers. This has the downside, however, that we are writing code that deals with DNS packets in python, which is slow. timsim from Rackspace has written a POC in Go that shows a very large performance improvement.
This needs to be documented, and permission requested from the TC to move this component to Go (that will be a separate post in its own right).
After we got back it turned out that we were not the only project considering this - swift actually has a feature branch with code in place. So, based on this, we are going to collaborate with them on the integration of Go into OpenStack.
As we discussed in Galway, we need to replace the current service with a generic one that can scale out horizontally. As part of this we planned out our upgrade and implementation plans for these services.
Docs, Deprecations and more Docs
Docs were a common theme - we were asked to improve them, and also have them located in the main docs on the OpenStack.org website.
We had a member of the docs team in the room, who gave us some great guidance on how to include our docs in the correct place.
VMT - The application process
One of the teams that supports OpenStack is the Vulnerability Management Team. They deal with disclosures and assign OSSA numbers to issues that could present a risk to our deployers.
They had a session this summit on how that process might work, and designate was one of the projects chosen as a pilot, as I had previously produced Threat Analysis documentation for Designate inside Hewlett Packard Enterprise - this information is currently being processed for release to the community.
Searchlight is a new (ish) project in OpenStack that enables true searching capabilities on clouds. We have a designate plugin, but there are issues with how we emit notifications from the v1 and the v2 API.
We decided that when we move the Horizon panels to v2, we will just listen for v2 notifications in Searchlight.
There was an interesting session on how the community is moving to document their APIs. The docs will now reside in each project's repo, and are based on a custom sphinx extension that was written for OpenStack.
As this progresses we will migrate designate to these docs, and remove our current docs, as they are much harder to read.
We had this bug come in yesterday.
It was a bit unexpected - as we tested it pretty extensively when it was being developed.
The line in question was this:
In eventlet 0.17.x this behaved like the standard socket.sendall(), instead of a plain socket.send().
The other major problem is that the bug did not manifest itself until we pushed the AXFR over a long range connection.
Designate Mid Cycle
This was a little over a week ago, and I have been trying to get my thoughts down on paper / rst.
It was a good few days overall - we had our usual arguments about the implementation of some familiar features, listed out the reasons our past selves were wrong about everything, and discussed how we should fix our mistakes.