OSCON 2017 and InVision Engineering in Open Source
Our Platform Team headed to Austin, Texas for this year’s O’Reilly Open Source Convention (OSCON 2017). It was a great opportunity to see what kinds of innovations are happening in open source, connect with companies whose products we use, and generally catch up on technology trends. InVision builds much of its software stack on open source projects: Docker, Kubernetes, Linux, Golang, NodeJS, React, and many others make up the stack at InVision. We benefit enormously from staying connected with the community, and we want to contribute back by building great services and libraries that can be reused by the community as a whole! Below are summaries of some great talks we saw at OSCON 2017, along with summaries of our current open source offerings. Come check out our projects, contribute, and use them!
Open Source AI at AWS and Apache MXNet
Summary by Tatsuro Alpert, Senior Software Engineer, Core Services
Adrian Cockcroft talked about AWS’s open source work on AI algorithms. They offer easy access to these AI engines as well as EC2 instances whose specs are ideal for running them, such as the P2 instances with multiple GPUs and lots of RAM. They also offer access to the services that back some of Amazon’s products, such as Alexa. You can use their APIs to take advantage of conversation engines as well as face and image recognition engines. Cockcroft concluded with a demo of a self-learning, self-driving toy car. The car had an onboard RaspberryPi and camera that ran an MXNet model to control it. The data processing and generation of the model are done in EC2 on one of the more powerful instances.
Scaling massive, real-time data pipelines with Go
Summary by Tatsuro Alpert, Senior Software Engineer, Core Services
Jean de Klerk spoke about data pipelines written in Go. He began with a comparison of several network protocols and their strengths and weaknesses for transferring large amounts of data: HTTP, UDP, gRPC unary, websocket streaming, and gRPC streaming. While the streaming mechanisms were by far the fastest, they are more difficult to use from an implementation perspective. He then went on to discuss queueing data between producers and consumers, comparing several models in Go using arrays, channels, and ring buffers. He concluded that any approach requiring mutexes did not perform well. Atomics, on the other hand, perform well but are very complex to implement. Channels are the easiest to implement, but they do not give you the flexibility of decoupling the producer from the consumer, since they will eventually block. If your use case requires this decoupling, ring buffers are the best solution.
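To make that trade-off concrete, here is a minimal sketch of our own (not from the talk) of the channel-based model: a buffered channel decouples the producer and consumer up to its capacity, but once the buffer fills, the producer blocks until the consumer catches up.
package main

import (
	"fmt"
	"time"
)

func main() {
	// A buffered channel acts as the queue between producer and consumer.
	// Once it holds 16 items, any further send blocks the producer.
	queue := make(chan int, 16)

	// Producer: pushes 100 messages as fast as it can, then closes the queue.
	go func() {
		for i := 0; i < 100; i++ {
			queue <- i // blocks whenever the consumer falls 16 items behind
		}
		close(queue)
	}()

	// Consumer: deliberately slower than the producer, so back pressure kicks in.
	for msg := range queue {
		fmt.Println("consumed", msg)
		time.Sleep(5 * time.Millisecond)
	}
}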
Monitoring at scale at Salesforce
Summary by Adam Frank, Engineering Manager, SRE
Salesforce is a giant HBase shop, so it’s always really interesting to see how they run something like that at scale. This talk was about Argus, the tool Salesforce uses for gathering millions of metrics, which, naturally, uses HBase on the backend. The data structure it uses in HBase is defined by OpenTSDB, which I believe makes it compatible with the various OpenTSDB tools for data collection, like tcollector. It uses Kafka to queue ingest, which makes it extremely flexible and not as susceptible to the sort of back pressure you see in other solutions, especially third-party hosted solutions. The talk presented a great world where we can collect as many metrics as we want with minimal sampling, but it’s of somewhat limited use to us, because anyone who’s run an HBase cluster before knows you only do so if you have to 🙂 All the same, it clearly demonstrated the value of treating your internal metrics with the same big data analytics you’d consider for customer data.
Using NGINX as an effective and highly available content cache
Summary by Adam Frank, Engineering Manager, SRE
This ended up being the most technical NGINX-related talk at the conference, which surprised me since NGINX is so integral to most ingress tiers in the container and VM space. Although the talk was specifically about content caching, which didn’t seem particularly sexy, it ended up being very informative (and the speaker closed by handing out developer licenses for NGINX Plus, which was a nice touch). One thing that was covered was using dynamic variables in NGINX.
For example, you can set up a variable in the nginx config and map a value to it based on the output of a regex:
map $http_user_agent $dynamic {
    "~*oogle"  google;
    default    not_google;
}
In that example the variable $dynamic will be set to either google or not_google depending on the user agent. The speaker also covered the split_clients directive, which lets you assign a value to a variable based on percentages.
For example:
split_clients $request_uri $variable {
    50% "var1";
    50% "var2";
}
The caching-specific configuration that was discussed is too much to describe here, but generally he covered different types of hashes, improved logging, a couple of fairly clever ways of doing HA, and how to properly configure disk caching without your disks becoming the bottleneck.
How and why we’re opening our code at Octopus Deploy
Summary by Chuck Freitas, Lead Software Engineer, Engineering Velocity
Damian Brady gave a talk on how Octopus Deploy decided to open source their deployment tooling, and how they determined how much of it to open source. Since I work on the deployment tools at InVision and we are also in the process of open sourcing some of our internal projects, I was especially interested in this talk.
While some of the talk revolved around the business case for releasing a core part of the company’s revenue stream as open source (not directly applicable to my current team’s case), I did enjoy the discussion of what makes a tool truly useful as a public open source project versus just an exercise in the process. Part of that comes down to whether the tool is general enough to be useful and whether it is ready to be released. Is there a user base for the project, and is there leadership to help steer it in the future?
Evolutionary architectures
Summary by Chuck Freitas, Lead Software Engineer, Engineering Velocity
Great talk around architecture patterns for supporting evolution of your application stack. As part of this, Neal Ford discussed the idea of fitness functions, which rate or validate certain characteristics of your application architecture, such as security or performance. These can then run as part of your continuous deployment pipeline to validate the “fitness” of your application and confirm that your changes have not degraded any important aspects of it.
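As a toy illustration of our own (not from the talk), a performance fitness function could be as simple as a Go test that runs in the pipeline and fails the build when a key endpoint exceeds its latency budget. The endpoint URL and budget below are hypothetical placeholders.
package fitness

import (
	"net/http"
	"testing"
	"time"
)

// TestHealthEndpointLatency is a simple "performance fitness function":
// it fails the pipeline if our (hypothetical) health endpoint responds
// slower than the agreed budget.
func TestHealthEndpointLatency(t *testing.T) {
	const (
		endpoint = "https://staging.example.com/healthz" // hypothetical URL
		budget   = 250 * time.Millisecond                // hypothetical budget
	)

	start := time.Now()
	resp, err := http.Get(endpoint)
	if err != nil {
		t.Fatalf("fitness check could not reach %s: %v", endpoint, err)
	}
	defer resp.Body.Close()

	elapsed := time.Since(start)
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200 from %s, got %d", endpoint, resp.StatusCode)
	}
	if elapsed > budget {
		t.Fatalf("latency fitness failed: %v exceeds budget of %v", elapsed, budget)
	}
}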
In describing an architecture that is flexible and can support evolution, Neal showed an architecture diagram using microservices, which allow for scalability and isolation of concerns, with an API gateway to isolate changes and support a more flexible design. Since InVision is actively implementing this sort of design, it was good validation that we are on the right path.
Open Source at InVision
InVision has only just begun to contribute some of our in-house projects back to the community. We have more planned for the future, but our first four efforts are already out on GitHub, waiting for your contributions, input, and use. Below is a quick summary of each project and how we use it.
Kit
Kubernetes + Git github.com/InVisionApp/kit
Kit is a full system for pushing Kubernetes deployments out based on a Docker pipeline. There are multiple pieces of Kit that help keep your Kubernetes deployments simple and manageable. We run many Kubernetes clusters here at InVision and use Kit to manage the interactions. Our continuous integration pipeline allows us to do a great number of deployments. We still take action on individual deployments (it’s not continuous deployment), but those actions are all automated through our own internal chat bot. Kit helps us push these deployments to MANY clusters without hassle. Feel free to take a gander at Kit and give it a try!
Kit-Overwatch
github.com/InVisionApp/kit-overwatch
Kit-Overwatch is a simple service whose sole responsibility is to watch the Kubernetes event stream and push notifications to other services. This can be really useful for getting the stream into something your engineering staff can work with. Currently, we only have a few notifiers built, but this could easily be expanded based on your needs. We run it in a Docker container and it helps us get the Kubernetes event stream into Slack and Datadog. There’s also a stdout logging feature for testing. Enjoy!
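Under the hood, the idea is simply to subscribe to the Kubernetes event stream and hand each event to a notifier. Here is a rough sketch of that idea using a recent version of client-go directly; this is not Kit-Overwatch’s actual code, and the kubeconfig path and log-based “notifier” are just placeholders.
package main

import (
	"context"
	"log"
	"os"
	"path/filepath"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (assumes ~/.kube/config).
	home, _ := os.UserHomeDir()
	config, err := clientcmd.BuildConfigFromFlags("", filepath.Join(home, ".kube", "config"))
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// Watch events in all namespaces and forward each one to a "notifier";
	// here the notifier is just a log line.
	watcher, err := clientset.CoreV1().Events(metav1.NamespaceAll).Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for item := range watcher.ResultChan() {
		if ev, ok := item.Object.(*corev1.Event); ok {
			log.Printf("[%s] %s/%s: %s", ev.Type, ev.Namespace, ev.InvolvedObject.Name, ev.Message)
		}
	}
}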
Rye
So, everyone needs middleware. That’s the truth. However, in Golang, middleware is one of those things you could do eight different ways and it wouldn’t matter; they would all work. Rye is our answer to middleware. We built a very simple middleware library to give us some out-of-the-box functionality, such as StatsD integration and timing on our middleware methods. Additionally, we built out some base middlewares to go with the library, including request logging, CORS, JWT verification, and Golang 1.7 context support. Rye has turned out to be very useful for us and is being used by multiple teams here at InVision. Feel free to give it a whirl!
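To show the general pattern (one of those “eight ways,” not Rye’s actual API), here is a hedged sketch of handler-wrapping middleware in plain net/http, with a hypothetical timing middleware.
package main

import (
	"log"
	"net/http"
	"time"
)

// timing is a hypothetical middleware in the classic handler-wrapping style:
// it records how long the wrapped handler takes and logs it (a real setup
// might ship this to StatsD instead).
func timing(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("%s %s took %v", r.Method, r.URL.Path, time.Since(start))
	})
}

func main() {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	// Chain middleware by wrapping handlers.
	http.Handle("/hello", timing(hello))
	log.Fatal(http.ListenAndServe(":8080", nil))
}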
Conjungo
github.com/InVisionApp/conjungo
So! Have you ever had a situation where you had two instances of the same struct and needed to merge them together? Well, we did. Basically, imagine that your service has a PATCH endpoint that takes in your struct, but you need to merge it with the existing value in your Mongo database. In that case, you can use Conjungo to merge the structs together. Conjungo lets you control much of the merging process and supports many use cases. This library was put together by our Senior Software Engineer, Tatsuro Alpert, to solve a problem in a service catalog service that we built in-house. It’s turned out to be a very useful library for us! Check it out, use it, and enjoy it. We’d love to know how we can make it better!
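To illustrate the problem, here is a hand-rolled merge for a made-up Profile type; this is a hedged sketch of the use case, not Conjungo’s API, so see the repo for how the library actually handles it.
package main

import "fmt"

// Profile is a hypothetical document stored in Mongo.
type Profile struct {
	Name  string
	Email string
	Age   int
}

// mergeProfiles is a naive, hand-rolled merge: non-zero fields from the
// PATCH payload override the stored document. A generic merge library saves
// you from writing this for every type and lets you customize the rules.
func mergeProfiles(existing, patch Profile) Profile {
	if patch.Name != "" {
		existing.Name = patch.Name
	}
	if patch.Email != "" {
		existing.Email = patch.Email
	}
	if patch.Age != 0 {
		existing.Age = patch.Age
	}
	return existing
}

func main() {
	stored := Profile{Name: "Ada", Email: "ada@example.com", Age: 36}
	patch := Profile{Email: "ada@invision.example"} // body of a PATCH request

	fmt.Printf("%+v\n", mergeProfiles(stored, patch))
	// Output: {Name:Ada Email:ada@invision.example Age:36}
}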
Coming Soon
We have other open source projects in the hopper here at InVision. One of the projects coming up is Chronos, a very tiny Golang library for managing the scheduling, logging, and reporting of recurring tasks. Look for it in the near future! Many other projects are growing here at InVision as we continue to improve our stack. Keep an eye out as we tweet new projects when they become available.
The last word…
InVision is striving to build a platform we can be proud of, and open source has been a big part of that. As an engineering practice, not only do we rely on open source to build that platform, but we have also started to contribute back our internal efforts. We’d love your input and help as we grow this effort. We welcome contributions, GitHub issues, pull requests, and feedback. Additionally, come join us at InVision and help us contribute more open source efforts to the community! We can all be better together!