Database departure: How to migrate millions of users (while preparing for millions more)
Thu 19 Jan 2017
Paul Nues, Systems Architect at about.me, shares his experience of a cloud database migration, discussing the challenges, processes, and technical strategy behind the move…
Whenever an active web hosting service has sky-high growth expectations and any moment of downtime could be crippling, transferring a database becomes a high-wire act.
This was the situation we found ourselves facing after we bought ourselves back from AOL. Tasked with a new migration project, our team decided to move all our infrastructure back to the AWS cloud. We already had existing infrastructure (S3 and EC2) running there, but our team was also interested in AWS services higher up the stack, such as Redshift and CloudFront.
However, we still needed to migrate our existing services – including our MongoDB database – from AOL’s private data center to the AWS cloud, and we needed to do this quickly with absolutely no interruptions.
We had already expanded our service to millions of users and were looking to add millions more. Maintaining that kind of growth demands fluid scalability and no major outages putting a dent in users’ goodwill. Doing both would require exceptional precision in database administration.
Our team was lean and we didn’t have deep database expertise, so we began evaluating cloud platforms that could provide that expertise for us. Our goal was to solve not just our short-term technical quandary – migrating away from AOL seamlessly and without catastrophe – but also our long-term growth requirements around performance and scaling.
We needed a reliable and reputable MongoDB-centric solution that would serve as the foundation for our future projects. It was also important that we could get advanced MongoDB recommendations on a variety of topics (such as performance tuning and indexing). And we wanted a comprehensive suite of tools that made it easy for our engineers to create, analyze, and scale any MongoDB deployment.
After some research and honing our strategy, we ended up implementing mLab’s fully managed MongoDB hosting platform (then known as MongoLab). By provisioning their databases directly in AWS, we could get low latency and secure network traffic between our application servers and mLab databases.
Once the decision was made, the platform and database tools carried us through the AOL-to-AWS live migration with minimal downtime and no impact on our users. We also used the opportunity to move to a simpler architecture that reduced costs while still maintaining solid performance for our user base.
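The article doesn’t detail the migration mechanics, but the usual pattern behind a low-downtime database move – an initial bulk sync, replay of the changes captured while the sync ran, then a cutover – can be sketched in miniature. The function and data below are hypothetical illustrations, not mLab’s actual tooling:

```python
# Toy illustration of a live-migration pattern: bulk-copy the source,
# replay changes captured during the copy (conceptually, an oplog),
# then cut the application over to the target. Hypothetical sketch.

def live_migrate(source, captured_changes):
    """Return a target copy of `source` that includes changes made
    while the initial bulk copy was in flight."""
    # Phase 1: initial sync - bulk-copy every document.
    target = {doc_id: dict(doc) for doc_id, doc in source.items()}

    # Phase 2: replay the recorded change stream so the target
    # catches up with writes that happened during the copy.
    for op, doc_id, doc in captured_changes:
        if op == "upsert":
            target[doc_id] = doc
        elif op == "delete":
            target.pop(doc_id, None)

    # Phase 3: cutover - the application now reads from `target`.
    return target

source = {1: {"user": "alice"}, 2: {"user": "bob"}}
changes = [("upsert", 3, {"user": "carol"}), ("delete", 2, None)]
migrated = live_migrate(source, changes)
print(sorted(migrated))  # → [1, 3]
```

The point of the replay phase is that the service keeps taking writes during the copy; only the brief cutover at the end needs coordination, which is what keeps downtime minimal.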
What comes next?
With the migration safely in the rear-view mirror, we’ve focused our efforts on new products and features that enrich the about.me user experience – while continuing to meet customers’ expectations of near-zero latency. Rapid provisioning of production-ready MongoDB clusters on demand, a capability we did not possess prior to migration, has radically accelerated our development and testing processes. With our new platform, we press a button and a MongoDB database appears a minute later.
On the technical tools side, two new additions have been particularly helpful to the development team during this stage. To proactively index new operations that are running slowly without an index, we now use a slow query analyzer that provides our team with index suggestions.
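The core idea behind such a suggestion is simple: a slow query whose filtered fields aren’t covered by the leading fields of any existing index is a candidate for a new one. A minimal sketch of that check – the helper and data below are hypothetical, not mLab’s actual analyzer:

```python
# Hypothetical sketch of an index-suggestion check: if no existing
# compound index covers the queried fields as a prefix, suggest a
# new compound index on those fields.

def suggest_index(query_fields, existing_indexes):
    """Return a suggested compound index (tuple of field names) for
    `query_fields`, or None if an existing index already covers them."""
    for index in existing_indexes:
        # An index can serve the query if its leading fields cover
        # every queried field (the prefix rule for compound indexes,
        # treating equality filters as order-insensitive).
        if set(query_fields) <= set(index[:len(query_fields)]):
            return None
    return tuple(query_fields)

existing = [("user_id",), ("created_at", "status")]

# Covered: ("created_at", "status") serves a filter on both fields.
print(suggest_index(["status", "created_at"], existing))  # → None

# Uncovered: no index leads with "email", so suggest one.
print(suggest_index(["email"], existing))  # → ('email',)
```

A real analyzer works from the database’s slow-operation log and weighs scan counts, selectivity, and write overhead before suggesting anything, but the prefix check above is the heart of it.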
The Telemetry monitoring and alerting system has also proven effective as a tool for monitoring and digging into database performance – an area we previously had little insight into.
With our database migration and expertise taken care of, we also evaluated additional AWS services and third-party service providers to see how they could help us stay agile.
In addition to EC2 and S3, we utilized AWS CloudFront (and other CDNs) in our infrastructure to ensure that users around the world can quickly view our site and enjoy the same customer experience as those local to the United States. We also implemented Datadog and Sysdig to provide better insight into the health of our architecture. Because we use immutable containers in our infrastructure, we needed tools that could provide more information than just host/network health.
Looking to the future of our company, there will of course be challenges ahead as we continue to pursue greater growth. But we feel that we’ve now gotten over a big hurdle and given our technical team the freedom to focus on the larger goal: creating the best platform for enabling our users to represent themselves online.