This content has been archived, and while it was correct at time of publication, it may no longer be accurate or reflect the current situation at Microsoft.
When ownership of returns and repairs at Microsoft shifted hands, Microsoft Digital saw an opportunity to transform the supply chain infrastructure supporting those services.
Microsoft’s Customer Repair Experience and Warranty (CREW) team relies on DeviceCare, an internal suite of tools and background processes for managing repairs and returns, to keep customers satisfied. But DeviceCare was becoming a bit much to handle.
“We knew what we signed up for, so excitement fueled our confidence,” says Abhinav Mahajan, a principal group engineering manager with Microsoft Digital, the organization that powers, protects, and transforms Microsoft, including CREW. “There were infrastructure opportunities to be had—between our two teams, we were spending more than $2 million a year to keep the lights on.”

With multiple applications running in DeviceCare’s backend, updating the supply chain infrastructure meant consolidating several stacks into one. The system handles over 6 million requests a day, making it an important tool for customer satisfaction. Microsoft Digital would need to update the backend without interrupting business.
“As much as we wanted to move from legacy to modern, it couldn’t just be lift and shift,” Mahajan says. “We had to tread carefully to avoid any negative impacts to our customers’ experience.”
The stakes were high, but by transforming DeviceCare, Microsoft Digital built a modern supply chain infrastructure that improved developer productivity, made it easier to onboard new employees, allowed the team to retire CREW’s legacy system, and saves close to $1 million a year in operating costs.
Too much can be a bad thing
CREW inherited DeviceCare in December 2019. When first tasked with supporting CREW and its new application, Microsoft Digital tried to work with the system it was handed.
“It was like hammering nails into a black box, never knowing if you were about to hit your finger,” says Deepak Sanghi, a senior software engineer with Microsoft Digital. “Any change required us to figure something out, and we weren’t always certain there wouldn’t be issues.”
The challenge stemmed from years of point solutions and legacy systems that had made DeviceCare’s supply chain infrastructure very piecemeal. Since its inception, DeviceCare’s backend had been assembled from dozens of applications and powered by multiple datasets merged into the environment.
“These solutions are great, but not in our context,” Sanghi says. “It was too much to manage. If anything was asked of us—for example, updating an aging feature—we had to figure out how it was hosted, sort out which tools we would need to use to do the work, and just sort out the entire process that we needed to follow.”
DeviceCare used more than 170 APIs, depended on 27 upstream systems, and spanned around 17 subscriptions. It was a lot to manage, expensive, and not particularly efficient.
“It’s not scalable,” says Amulya Pokala, a software engineer with Microsoft Digital. “There are so many APIs that if we wanted to scale a background job, we couldn’t. Plus, when someone new comes onto the team, they have to learn all of these processes and tools.”
The crowded backend left Microsoft Digital routinely on the hunt for issues, because letting DeviceCare go down was simply not an option.
“Customers are already having some problem, that’s why they need to interact with DeviceCare,” Sanghi says. “We don’t want them to have a frustrating time with their return or repair. Reputation is important; you have to win them back with a great experience.”
And there was one other looming concern.
“We knew compliance would be pressed hard on this infrastructure,” Mahajan says. Microsoft Digital knew that DeviceCare’s legacy solutions would eventually fall short of Microsoft’s compliance standards.
This made changing the infrastructure even more urgent.
Microsoft Digital would need to come up with new supply chain infrastructure to support DeviceCare.
Transforming the infrastructure
Bringing DeviceCare to its future state meant equipping CREW with the latest technology, improved developer productivity, a better user experience, the ability to scale up quickly, and a lower cost of ownership.
To get there, Microsoft Digital decided to use the Microsoft Azure Kubernetes Service (AKS).
“That would make us future ready,” Sanghi says.
Kubernetes is a container orchestration platform. Apps are deployed as containers, and AKS provides self-healing out of the box.
“It’s really needed for resiliency and scaling,” Sanghi says. “It’s pay as you go, you give your app what you need and get the best out of it. You decide how much you want out of an instance and efficiently manage resources and cost. Kubernetes makes scaling instantaneous.”
AKS allowed CREW to host workloads based on need. Previously, DeviceCare ran on a single instance, so everything had to be scaled together even when only one part needed it. Now, with AKS, Microsoft Digital can scale each component precisely.

“Modernization really helped us with scalability,” says Renjuraj T, a software engineer with Microsoft Digital. “We had over 35 background processors on a single instance. In the new state, we separate it out so that we only have to scale up a single background processor.”
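The article doesn’t say which autoscaling mechanism DeviceCare uses on AKS. As one illustration of why splitting the processors apart matters, here is a minimal sketch that assumes the replica rule of the Kubernetes Horizontal Pod Autoscaler (HPA), desired = ceil(current × currentMetric / targetMetric). All processor names, utilization figures, and the 0.7 target are hypothetical; only the count of 35 processors comes from the quote above. It compares how many processor copies run when each processor scales on its own versus when all of them share one instance and the busiest one forces the whole instance to scale.

```python
import math
from dataclasses import dataclass


@dataclass
class Processor:
    """A hypothetical background processor and its observed load."""
    name: str
    replicas: int
    cpu_utilization: float  # average observed CPU / requested CPU per replica


def desired_replicas(current: int, current_metric: float, target_metric: float) -> int:
    """Core Kubernetes HPA rule: ceil(current * current_metric / target_metric).

    Simplified: ignores the HPA tolerance band and min/max replica bounds,
    except for keeping at least one replica.
    """
    return max(1, math.ceil(current * current_metric / target_metric))


def scale_per_processor(processors: list[Processor], target: float) -> dict[str, int]:
    """Each processor is its own deployment, so each one scales independently."""
    return {p.name: desired_replicas(p.replicas, p.cpu_utilization, target) for p in processors}


def monolith_copies(processors: list[Processor], replicas: int, target: float) -> int:
    """All processors share one instance: the busiest one dictates the replica
    count, and every replica runs a copy of every processor."""
    busiest = max(p.cpu_utilization for p in processors)
    return len(processors) * desired_replicas(replicas, busiest, target)


if __name__ == "__main__":
    # Hypothetical load: 34 quiet processors and one hot one, 35 in total.
    procs = [Processor(f"proc-{i}", 1, 0.5) for i in range(34)]
    procs.append(Processor("hot-proc", 1, 2.0))
    split = scale_per_processor(procs, target=0.7)
    print("split deployment, processor copies:", sum(split.values()))
    print("single instance, processor copies:", monolith_copies(procs, 1, target=0.7))
```

With these made-up numbers, only the hot processor grows to 3 replicas in the split layout (37 copies in total), while the single-instance layout must triple everything (105 copies) to relieve that one processor.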
Microsoft Digital would also need to migrate the supply chain infrastructure from .NET Framework to .NET Core, but doing so added another layer of complexity.
“Testing .NET Core in the legacy environment would double our effort,” Pokala says. “We decided to move it to Kubernetes and address any problems there.”
The decision ended up paying off.
“When we were doing this, the system was high-volume all day,” Sanghi says. “We could get everything back to the old state in five minutes if we needed to roll it back. But we also knew that if we could make it through the high-volume period, we could survive anything.”
Optimal outcomes for DeviceCare
The Microsoft Digital team has not only created a sustainable backend, they’ve also optimized the way DeviceCare works. Under the legacy environment, a large amount of data was being generated and uploaded.
“The system was creating 100 GB a day,” Pokala says. “If we lift and shift, that data would have stayed there forever and no one would look at it. The way Azure Kubernetes works, that’s redundant data and we took the opportunity to remove it.”
Through this optimization, Microsoft Digital cut 97 GB from the daily system upload, reducing the data generated to just 3 GB per day.
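Expressed as a share of the legacy volume, the figures quoted above work out as follows. The annual line is a rough illustration that assumes the daily volume stays constant all year, which the article does not state.

```latex
\begin{aligned}
\text{reduction} &= \frac{100\,\text{GB} - 3\,\text{GB}}{100\,\text{GB}} = \frac{97}{100} = 97\% \\
\text{volume avoided per year} &\approx 97\,\text{GB/day} \times 365\,\text{days} = 35{,}405\,\text{GB} \approx 35\,\text{TB}
\end{aligned}
```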
Development time and onboarding have improved as well.
With a streamlined environment powered by the latest framework, developers can build features that will continue to improve DeviceCare.
“We were able to decrease release time from five hours to 30 minutes,” Pokala says. “The legacy environment was so complex that build-to-release time was long. Deploying a new feature meant we had to wait, and rolling it back took several hours. That’s no longer the case.”
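Put in relative terms, going from five hours to 30 minutes is a tenfold speedup, or a 90% cut in release time:

```latex
\begin{aligned}
5\,\text{h} &= 300\,\text{min} \\
\text{speedup} &= \frac{300\,\text{min}}{30\,\text{min}} = 10\times \\
\text{reduction} &= \frac{300\,\text{min} - 30\,\text{min}}{300\,\text{min}} = 90\%
\end{aligned}
```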
But there’s one other important outcome as well: “We anticipated compliance issues for the legacy infrastructure,” Mahajan says. “We were well-positioned to retire the old environment because of our decisions.”
Updating the supply chain infrastructure allowed Microsoft Digital to be proactive.
Paving the way for others
Changes to DeviceCare have all been implemented without disruption. AKS gave Microsoft Digital a supply chain infrastructure that improves developer productivity, strengthens compliance, and introduces new efficiencies and cost savings.
Now that Microsoft Digital has completed the first leg of improving DeviceCare’s backend, there’s an opportunity to support the rest of the supply chain.
“With this infrastructure, we can help out others as well,” Mahajan says. “We can impact CREW and other engineering teams in supply chain. No one told us that we would save a million dollars a year on this effort. We can now onboard other tenants from supply chain.”
Support from leadership, even in the face of the unknown, meant everyone contributed. As a result, the systems that support supply chain are scalable, more resilient, and better equipped to serve Microsoft in the future.
“This is all about helping engineering, but everyone contributed in their own way,” Sanghi says. “We did our due diligence, went ahead, and had to see things in production to learn. We kept moving and made progress.”
Check out this introduction to Azure Kubernetes Service.
Learn how Azure Data Lake connects supply chain data for advanced analytics.