Hi everyone, 👋
We are very excited to bring you the ‘To Do or Not To Do’ series. The goal of this series is to explore the nuances and trade-offs that organizations should consider while making security, compliance, infrastructure or DevOps decisions. If you find yourself needing to make an infrastructure decision, we hope this column gives you a framework for thinking through the variables around it.
While there are a lot of great technical blogs out there that focus on implementation, we feel there isn’t enough content on the strategic and business reasons behind engineering decisions.
In our first newsletter, we focus on whether organizations should share infrastructure access with team members or not. This has become a critical question post-Covid, as companies are still adapting to the remote set-up. Also, recent high-profile stories like the Twitter whistleblower incident and the Uber and crypto hacks have created an urgency in organizations to really think through their infrastructure access strategies.
Cost of Unsupervised Access
Before we dive deeper into the why and how of infrastructure access management, it is important to understand what happens when you don’t have a strategy.
Unsupervised access often leads to compromised credentials, which in turn lead to data breaches. Data breaches in organizations have been increasing at an alarming rate of roughly 15% every year since 2015. In the US alone, the number of data breaches has gone up from 785 in 2015 to more than 1,800 in 2021.
And these data breaches are extremely expensive. An average data breach costs about $4.35M, and 19% of breaches are caused by compromised credentials. Hence, unsupervised access is extremely dangerous and expensive!
In fact, what happened at Uber recently is a perfect example of a hack caused by compromised credentials.
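As a quick sanity check on the growth figure quoted earlier, the compound annual growth rate implied by the two breach counts (785 in 2015, more than 1,800 in 2021) can be worked out directly. This is only a sketch: the function name is ours, and 1,800 is treated as a lower bound since the count is given as "more than 1,800".

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Breach counts quoted in the text; 1,800 is a lower bound for 2021.
rate = compound_annual_growth_rate(start=785, end=1800, years=2021 - 2015)
print(f"Implied annual growth: {rate:.1%}")
```

The result lands just under 15%, which is consistent with the yearly rate mentioned above.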
Okay, great - unsupervised access is expensive. But why care about this now?
There are a few trends in the tech industry, all gaining momentum at the same time, that have forced organizations to start thinking about their infrastructure access strategy.
Trend 1: Covid-19 forces companies to go remote-first
Trend 2: Increased Cloud Adoption
Trend 3: Compliance Requirements becoming more important and moving downstream to early-stage companies
Trend 4: Crypto creates new financial incentives for hackers!
Trend 1: Covid-19 forces companies to go remote-first
In March 2020 Covid struck, and it completely transformed the operating paradigm of the world - not just for people, but also for organizations. Remote work went mainstream on an unnaturally rushed timeline, and as a consequence organizations did not get enough time to think through all the dimensions and processes needed to adapt to this change sustainably.
Tech companies were the quickest to respond to Covid and move to work-from-home policies. They promoted remote collaboration, virtual setups, and the use of cloud services, and quickly embraced the new normal of remote work culture. For example, in mid-2020, Facebook said it expected as much as 50% of its workforce to be working remotely within the next 5 to 10 years.
Organizations are forced to reimagine their infrastructure access management policies and processes in this new remote-first world.
Trend 2: Increased Cloud Adoption
Nothing screams validation of cloud adoption like AWS revenues, which went from $3.1 Billion in 2014 to $61.5 Billion in 2021.
Cloud adoption has been a major help for tech-first companies in maintaining business continuity as well as growth during Covid. While other sectors struggled to stay afloat during the pandemic, tech-first companies did quite well. Amazon, for instance, went on a hiring spree during Covid and onboarded close to 275,000 employees in a span of about 6 months, a sign that cloud-based operations helped maintain business continuity.
A common infrastructure setup in the pre-Covid era was an on-premise server with a perimeter network to restrict access. However, the move to the cloud has changed how infrastructure is set up, resulting in the need for new access policies.
Trend 3: Compliance Requirements becoming more important and moving downstream to early-stage companies
Zero Trust has changed the way companies operate and infrastructure access is one of the most critical components that ensure compliance. In the next article, we’ll dive into Zero Trust and why it is an important security measure.
These days, even Seed and Series A stage organizations are forced to adhere to compliance standards like SOC 2, ISO 27001, and HIPAA - especially if they operate in regulated industries. Most organizations scramble to gather evidence during compliance audits, and they struggle the most with access audit logs. Anytime an employee touches an infrastructure resource such as an Ubuntu VM on AWS or a managed SQL Server database on Azure (especially in production environments), there needs to be a log that captures all the queries/commands that were executed as well as the resulting outputs.
In order to stay continuously compliant with various standards, infrastructure access hygiene is crucial!
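To make the audit-log requirement concrete, here is a minimal sketch of what a single access log record could capture: who ran which command against which resource, and what came back. The class name, field names and sample values are all hypothetical; real compliance tooling will define its own schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Timestamp in ISO 8601, always in UTC so records sort consistently."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AccessLogEntry:
    """One executed command or query (hypothetical schema)."""
    user: str          # the individual, not a shared account
    resource: str      # e.g. a VM or database name
    environment: str   # e.g. "production"
    command: str       # the query/command exactly as executed
    output: str        # the resulting output (or a reference to it)
    timestamp: str = field(default_factory=_utc_now)


def to_json_line(entry: AccessLogEntry) -> str:
    """Serialize an entry as one JSON line, ready to append to a log file."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = AccessLogEntry(
    user="alice@example.com",
    resource="orders-db",
    environment="production",
    command="SELECT count(*) FROM orders;",
    output="42",
)
print(to_json_line(entry))
```

The important design choice is that every record names an individual user; as soon as several people share one credential, this field stops being meaningful.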
Trend 4: Crypto creates new financial incentives for hackers!
Crypto has gained a lot of momentum over the past few years (for better or for worse) - and that has changed the financial incentives for hackers. Crypto rails allow funds to be transferred in a way that makes it extremely challenging to trace the sender or the receiver.
On May 7, 2021, Colonial Pipeline, an oil pipeline company that provides roughly 45% of the East Coast’s fuel (gasoline, diesel, home heating oil, jet fuel, and military supplies), was hit by a ransomware attack. Colonial proactively took certain systems offline to contain the threat and had to cease operations temporarily. Within hours of the attack, Colonial also paid 75 Bitcoin (worth roughly $4.4 million at the time) to DarkSide, the Russia-based cybercriminal group responsible for the attack. Colonial was finally able to resume operations 6 days later, but during that time the shutdown plus the panic resulted in fuel shortages in several areas.
Historically, a hack or a compromise in an organization’s IT infrastructure would most likely result in the hacker releasing some confidential information or documents to the public. But now, hackers often blackmail organizations with the same confidential information, and in exchange ask for funds to be transferred via a chain that cannot be traced easily (Finally a use case @Zach Weinberg!)
Now what is worse is if the victim of a hack is itself a crypto company. In that case, the hacker can drain the tokens directly and convert them into fiat. Case in point: Crypto.com, Qubit, Bored Apes, Wormhole, Cashio, Beanstalk, Fei Protocol - should I keep going?
OK! So, how do we figure out the best access management strategy?
So now that we’ve established how important an infrastructure access strategy really is, how do companies currently handle it? Especially with respect to production environments, where the cost of a breach is far higher?
At the end of the day, any organization’s security implementation is a tradeoff between:
Locking everything down, which makes the infrastructure really hard to use and is expensive to implement and maintain
Moving fast with minimal restrictions, key sharing, etc., which is really prone to security incidents
However, there are a few more nuances to the above philosophies that are important to consider. We will evaluate and rate different access strategies on the basis of 4 key dimensions:
Security Posture
Auditability
Developer Productivity
Ease of Set-up & Maintenance
We understand that a lot of people might not agree with our ratings. The goal of this newsletter is not the exact ratings but an understanding of the tradeoffs and potential risks involved in selecting your organization’s access management strategy.
“We don’t believe in sharing access at all”
The first and perhaps most inefficient way organizations tackle infrastructure access is by NOT SHARING ACCESS AT ALL. While one can argue this is a secure practice, the organization’s productivity really suffers as a result.
Let’s say there is an incident, and a developer needs access to a production server. To solve this problem, the dev will first have to get in touch with someone who has access (which might be really difficult in the first place, as very few people in the org have keys/credentials, given the company’s policy). Once that happens, the person with access might have to start a TeamViewer session or do a screen share to let the developer access the resource through their machine. According to Gartner, the average cost of IT downtime is $5,600 per minute. Now imagine all the time spent just trying to access the resource and resolve an incident via screen share.
Thus, while this access management strategy offers good security posture and is easy to manage as well, the trade-offs are low developer productivity and lack of auditability.
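To put the Gartner figure quoted above in perspective, the arithmetic can be sketched in a few lines. The delay durations below are hypothetical examples of how long it might take to find someone with access and set up a screen share; only the $5,600-per-minute figure comes from the text.

```python
GARTNER_COST_PER_MINUTE_USD = 5_600  # average cost of IT downtime quoted above


def downtime_cost_usd(minutes: float,
                      cost_per_minute: float = GARTNER_COST_PER_MINUTE_USD) -> float:
    """Estimated cost of an outage that lasts `minutes` minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# Hypothetical delays spent just getting access during an incident.
for minutes in (5, 20, 60):
    print(f"{minutes:>3} min of downtime ~ ${downtime_cost_usd(minutes):,.0f}")
```

Even a 20-minute delay before anyone can touch the server works out to a six-figure cost at this rate.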
“We are going to create shared keys and credentials for teams!”
This is an extremely popular access management strategy widely adopted by many organizations. While this partially gets the job done, there are a few different challenges to this approach.
1.) Keeping track of who has access
When an organization has shared accounts, the biggest issue arises due to not having a central platform to keep track of who has access to which resources. So, what do companies do to solve this problem? They use our oldest friend, Excel.
We have seen many organizations use Excel files (Google Sheets for the more progressive ones) to manage and keep track of keys and permissions. Here’s what this actually looks like!
2.) Auditability
Another issue that arises from using shared keys is the lack of auditability for any change made. Since the keys are being used by multiple folks, companies cannot pinpoint exactly “Who did What” when an infrastructure resource was changed. Now imagine someone making a change in a production DB (we know this is rare, but it DOES happen!), and the organization does not know who actually made that change. Well, even Twitter suffers from this lack of auditability, as highlighted in their recent whistleblower story.
This strategy scores really low on security posture as well as auditability and creates a high risk factor for any organization. The setup for this strategy is easy, but maintenance is deceptively cumbersome. Let’s say an employee who has access to the shared keys leaves the organization. Now the organization will have to deprecate the earlier keys, create new ones for all affected resources and redistribute them to the team members who are supposed to have access. This can get extremely painful to manage in a large organization where attrition is common.
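The offboarding problem described above can be sketched as a small lookup: given a record of who holds which shared key, find every key the departing person touched and everyone who now needs the replacement. The key names, people and function name below are hypothetical stand-ins for whatever the tracking spreadsheet contains.

```python
def rotation_plan(departing: str,
                  key_holders: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each key the departing person held to the people who need its replacement."""
    return {
        key: holders - {departing}
        for key, holders in key_holders.items()
        if departing in holders
    }


# Hypothetical contents of the shared-keys spreadsheet.
key_holders = {
    "prod-db-password": {"alice", "bob", "carol"},
    "staging-ssh-key": {"bob", "dave"},
    "billing-api-token": {"alice", "carol"},
}

plan = rotation_plan("bob", key_holders)
for key, recipients in sorted(plan.items()):
    print(f"rotate {key}; redistribute to {sorted(recipients)}")
```

One departure here forces two rotations and three redistributions, and the plan is only as accurate as the spreadsheet behind it.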
“No fear when VPN is here!”
VPNs are ubiquitous and almost every organization uses one. Historically, a VPN was great when organizations only cared about protecting servers located on-premise.
Modern-day infrastructure, on the other hand, is cloud-based, ephemeral and collaborative - and that is where relying solely on a VPN becomes complex and cumbersome. First, provisioning access to all these resources via a VPN can be a maintenance challenge.
Second, solely relying on a VPN follows the all-or-nothing principle. This means that when someone is inside the VPN network, they can access ALL the resources. This could be extremely dangerous if a bad actor gets inside the VPN network. This also means that providing infrastructure access to a third-party vendor or a contractor can be painful and slows down the process of granting access.
From an auditability perspective, VPNs provide TCP logs but aren’t able to provide info on individual commands or queries within the resources and hence don’t provide great transparency into user activity.
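The all-or-nothing behaviour described above can be contrasted with per-resource grants in a short sketch. This is a simplified model for illustration, not how any particular VPN product is implemented; the user names, resource names and function names are hypothetical.

```python
def vpn_only_allows(user: str, resource: str, vpn_members: set[str]) -> bool:
    """All-or-nothing: being inside the network is the only check.

    `resource` is deliberately ignored, which is exactly the problem.
    """
    return user in vpn_members


def per_resource_allows(user: str, resource: str,
                        grants: dict[str, set[str]]) -> bool:
    """Per-resource: access is checked against an explicit grant list."""
    return resource in grants.get(user, set())


vpn_members = {"alice", "contractor-1"}
grants = {
    "alice": {"prod-db", "staging-vm"},
    "contractor-1": {"staging-vm"},  # the only resource the contractor needs
}

for resource in ("staging-vm", "prod-db"):
    print(
        resource,
        "| VPN only:", vpn_only_allows("contractor-1", resource, vpn_members),
        "| per-resource:", per_resource_allows("contractor-1", resource, grants),
    )
```

Under the VPN-only model the contractor can reach the production database simply by being on the network; with explicit grants the same request is denied.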
“Bastion Servers are the best!”
A Bastion Server is a specialized computer used to access an infrastructure resource and helps create a separation between the downstream resource and developers. From a security perspective, a Bastion host is the only node in the network exposed to the public.
The fundamental challenges with Bastion nodes are the same as with shared keys/credentials: it is hard to know who has access to that server, and there is a lack of auditability.
An additional challenge arises when organizations have a larger infrastructure. In that scenario, managing Bastion hosts becomes extremely challenging, and you would typically need a separate team just to maintain them. Also, in isolated cases, we have seen someone delete the Bastion host itself by mistake. If a Bastion host is deleted, an immense amount of groundwork is needed to recreate the host and re-enter the credentials of all the downstream resources.
“I’m going to use a combination of Bastion, VPN, and shared credentials!”
Now, this is the ground reality in most organizations - they typically use a combination of VPN, Bastion, and some shared credentials. While this dramatically improves the security posture of the organization, it also increases the security budget - as this setup is extremely cumbersome to manage and maintain.
There is a tremendous amount of friction anytime a new resource is added to the company’s infrastructure or a new employee is onboarded. Also, the lack of auditability continues to be a challenge, and infrastructure access logs are not easy to create with this approach.
About 15% of organizations using DevOps have a separate team to look after providing infrastructure access, defining processes, and even responding to tickets related to infrastructure issues for the rest of the organization. This strategy clearly provides the best security posture but is also the worst when it comes to setup & maintenance given the number of moving parts.
Conclusion
Below is a summary of all the above organization philosophies and their ratings.
Different strategies work better at different stages of an organization. If you are a fast-moving early-stage company looking for product-market fit, focus on developer productivity and ease of setup & maintenance. Post product-market fit, improve your security posture and auditability, as the increased visibility will make you more susceptible to hacks and data breaches.
Also, the vertical in which an organization operates plays a huge part in figuring out the access strategy. If you are operating in a highly regulated vertical with sensitive customer information (e.g. FinTech, healthcare, insurance), ensure a strong security posture and auditability at all times - even when it lowers dev productivity and/or complicates setup & maintenance.
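The guidance above boils down to a small decision rule, sketched here using the four dimensions from earlier in this newsletter. The enum and function names are ours, and a real decision will weigh far more factors than company stage and regulation.

```python
from enum import Enum


class Stage(Enum):
    PRE_PRODUCT_MARKET_FIT = "pre-PMF"
    POST_PRODUCT_MARKET_FIT = "post-PMF"


SECURITY_FIRST = ("Security Posture", "Auditability")
SPEED_FIRST = ("Developer Productivity", "Ease of Set-up & Maintenance")


def priority_dimensions(stage: Stage, regulated: bool) -> tuple[str, str]:
    """Return the two dimensions to prioritize, per the guidance above."""
    # Regulated verticals need security and auditability at all times,
    # regardless of how early-stage the company is.
    if regulated or stage is Stage.POST_PRODUCT_MARKET_FIT:
        return SECURITY_FIRST
    return SPEED_FIRST


print(priority_dimensions(Stage.PRE_PRODUCT_MARKET_FIT, regulated=False))
print(priority_dimensions(Stage.PRE_PRODUCT_MARKET_FIT, regulated=True))
```

Note that regulation overrides stage: an early-stage FinTech or healthcare company ends up with the same priorities as a mature one.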
Every organization is unique, and there are many factors at play. The most critical aspect of selecting a specific access strategy (or not) is to be intentional about the decision. It is extremely risky for an organization to keep granting access ad hoc on an as-needed basis. When the organization eventually matures and plans to put some structure around its access strategy, a lot of time and resources will be needed to rotate and deactivate old keys & credentials.
The takeaway of this article is for organizations to be deliberate about access even at early stages and be aware of the business reasons, tradeoffs and risks involved in different strategies.