Let's talk about consolidating across other tools

So this is a topic near and dear to my heart since I spend a pretty terrible amount of time bouncing across platforms and trying to sort things out for people. One of the things I have been excited by in relation to NR since we started using it is that it seems like there is good potential for scraping data out of all kinds of places and bundling it up in here in a nice cohesive package. So far I have lots of big ideas but very little in the way of concrete progress to refer to :frowning: and wanted to see what everyone else who has a situation anything like mine is doing.

So we currently lean heavily on New Relic, Splunk, SolarWinds, Stackdriver, and Netcool, with ServiceNow as the ticketing for all of that, to monitor an environment where most of our systems have made the jump to GCP and AWS, with some legacy systems living in colos and on vCenter VMs. We’re also tracking network gear spread all around the world. Lots of the servers were classic lift-and-shift monoliths, but some teams are living deeply in that SRE/DevOps universe where they need to monitor their automated deployment pipelines in Kubernetes and such. Everyone is in varying stages of technological maturity and time is always a constraint. It’s essentially the wild west, and my 3-man team’s job is to herd these cats as best we can.

So just to throw an example out there, I recently put together a script that I can use to crawl through a team’s subaccount and check if they have any servers with leftover process monitors on any SolarWinds SAM templates. Since the Infra agent is already grabbing that data, I figured it was a fairly easy target: set up Infra alerts to match all the process monitors and be one step closer to removing their systems from the second tool. Now I’m a bit bogged down in communicating with all the users and working through change control before I get to pull the trigger on forcing the change, but it’s an encouraging step forward for me.
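As a rough illustration of that kind of check (a sketch only, not the actual script), the comparison step might look like the Python below. It assumes the two inputs have already been exported by whatever means are available: the process monitors assigned per host in SolarWinds SAM, and the process names the Infra agent reports per host. The function name `split_monitor_coverage`, the dict shapes, and the sample hostnames are all hypothetical.

```python
from typing import Dict, List, Set


def split_monitor_coverage(
    sam_monitors: Dict[str, Set[str]],
    infra_processes: Dict[str, Set[str]],
) -> Dict[str, Dict[str, List[str]]]:
    """For each host with SAM process monitors, split the monitored process
    names into those the Infra agent already reports ("covered") and those
    it does not ("missing"). Both input shapes are assumptions."""
    # Normalize the Infra side once: hostnames and process names often
    # differ only by case between tools.
    infra_by_host = {
        host.lower(): {name.lower() for name in names}
        for host, names in infra_processes.items()
    }
    report: Dict[str, Dict[str, List[str]]] = {}
    for host, monitored in sam_monitors.items():
        seen = infra_by_host.get(host.lower(), set())
        report[host] = {
            "covered": sorted(m for m in monitored if m.lower() in seen),
            "missing": sorted(m for m in monitored if m.lower() not in seen),
        }
    return report


# Hypothetical sample data, just to show the call.
sam = {"web01": {"nginx", "java"}, "db01": {"mysqld"}}
infra = {"WEB01": {"nginx", "sshd"}}
for host, result in sorted(split_monitor_coverage(sam, infra).items()):
    print(host, result)
```

Anything under "covered" is a candidate for an equivalent Infra alert; anything under "missing" needs a closer look before the SAM monitor is retired.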

I’m also trying to explore what options I have with NR One to visualize info from other tools. Just as an example, I want to be able to pull up a given server or APM app and cook up some wizardry to see some of the info I need from their CMDB CI records. The idea is just to save our users some clicks hunting down info about a host that the person who got the ticket may not be as familiar with as we would have hoped. I’ve done similar sorts of correlations in Orion so I know how I want it to look under the covers, I just need to skill up on JS and the nerdpacks and see how I can make it happen there.
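A nerdpack itself would be written in JS, but the correlation underneath is just a join on hostname. Here is a minimal Python sketch of that join, assuming the CI records have already been pulled from the CMDB as plain dicts. The field names (`name`, `owner`, `support_group`, `environment`) and the helper names are hypothetical, and the hostname normalization assumes hosts are identified by name rather than by IP address.

```python
from typing import Dict, Iterable, Optional

# Hypothetical CI fields a ticket assignee might want at a glance.
DISPLAY_FIELDS = ("name", "owner", "support_group", "environment")


def normalize_hostname(name: str) -> str:
    """Lowercase and drop any domain suffix so that
    'WEB01.corp.example.com' and 'web01' compare equal."""
    return name.strip().lower().split(".")[0]


def index_ci_records(ci_records: Iterable[dict]) -> Dict[str, dict]:
    """Index CI records by normalized hostname.
    Assumes each record carries its hostname under the 'name' key."""
    return {normalize_hostname(ci["name"]): ci for ci in ci_records}


def enrich_host(hostname: str, ci_index: Dict[str, dict]) -> Optional[dict]:
    """Return the display fields of the CI matching this host,
    or None when the CMDB has no matching record."""
    ci = ci_index.get(normalize_hostname(hostname))
    if ci is None:
        return None
    return {key: ci.get(key) for key in DISPLAY_FIELDS}


# Hypothetical sample record, just to show the call.
cis = [{"name": "WEB01.corp.example.com", "owner": "team-a",
        "support_group": "web-ops", "environment": "prod"}]
ci_index = index_ci_records(cis)
print(enrich_host("web01", ci_index))
print(enrich_host("db01", ci_index))
```

Hosts that come back as `None` are worth listing separately, since a missing CI record is often a finding in its own right.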

Anyway, that gives some idea of my struggles and successes. Hopefully the community has some neat tricks up their sleeves, or just wants to commiserate on the challenges we have.


For the curious I posted a sanitized version of the script I mentioned here

Worth mentioning a lot of this was built on top of an existing code base left behind by @zackm when he used to have my job.


very cool stuff! i’m curious, can you share your process on how you audited your tools and went through the rationalization of each? i’ve got my own opinions (probably biased) on that, but since you went into the environment with fresh eyes it would be interesting to see how you approached it since we have such similar backgrounds.

Well, it’s very much a thing in flux. Lots of leadership and staff changes in the past year, so everything is kind of up for grabs. The goal for our team right now is a lot of looking at the existing tooling and workflows to see what we can do to make sure our team isn’t an obstacle between the various app and system owners and what they want from a monitoring and notification platform. We have to keep the wheels turning, so we can’t just upend everything unless whatever replacement we are offering is more compelling to them than whatever they are used to, which means the legacy platforms are carrying a lot of inertia while we sort out all the details. So we’re currently delivering a mix of incremental quality-of-life improvements within the tools they know while trying to design a smooth runway toward the destination. I don’t want to draw negative focus toward my team by taking features and capabilities away from people without providing an equal or improved experience in whatever I’m offering as a replacement. Ultimately I’m trying to coalesce the info we get from every source into something people find easy to use, where they can find whatever it is they want to know about the infra and apps they are on the hook for. Of course, on top of everything, the business will always want better, faster, cheaper.
