Before I begin, I’d just like to thank my friend JP Cedeno for giving me a crash course in Tetration and allowing me to use what he taught me to make the next few blog posts. I’d also like to dedicate this blog post to Matt White who asked for it some months ago. In this blog post, we’re going to go over the fundamentals of Tetration.
So, what does Tetration do? If you read the Cisco Tetration site, it states that Tetration offers holistic workload protection for multi-cloud data centers by enabling a zero-trust model using segmentation. Let’s take a step back and think about what a workload is: A workload is an endpoint or server in a data center or cloud that’s providing any sort of function. Let’s say we have a web tier that consists of three servers. If two of those servers are virtual machines and one is a bare metal server, Tetration would see that as three workloads. A workload could be a virtual machine, bare metal server, VDI environment, etc.
To understand the problems that Tetration can help solve, let’s take a look at a drawing of what might be a typical application.
Note: I’ve totally stolen this drawing from JP Cedeno and am using his real-world example. Thanks, JP!
The above picture is a diagram of an application that needed to be secured. Whether you know it or not, most of you reading this have applications like the one above in your environment. In this case, JP was brought on as a consultant to secure the application and the network around it, which meant making sure everything was in the right VLANs, subnets, etc. and that firewall rules were properly created. This was how discovery went:
How many servers were in the web tier? No one knew for sure. Were they all virtual machines, bare metal, or hybrid? No one knew. They just knew that there was a web tier of some kind and they thought that it might be replicated but weren’t sure.
They knew for sure that the user (Mr User) came in through the edge firewall and hit the web tier over HTTPS on port 443.
The web tier then communicated with an Oracle and SQL server. When asked why both, the client wasn’t sure which one was being used and which one wasn’t. At one point, it was stated that they used to have Oracle but upgraded to SQL. Unfortunately, the client wasn’t sure which servers were communicating with the SQL servers and which ones were communicating with the Oracle servers, so they maintained both. Without knowing which servers were supposed to be talking to which database servers, it was hard to know which ports to open and to write firewall rules. One option was to take packet captures for an extended period of time and sift through those, but that would take some time.
Then the client had Active Directory dependencies since people were being authenticated coming into the web and app tiers. JP needed to know whether they had one server or multiple servers that needed to authenticate. Were they replicated? Were they the same? Were they all running the same processes? Which ports needed to be open on the firewall to make the app work vs features-based ports? Were they actually using these features? If they weren’t using the features, they wouldn’t need the ports open, but they weren’t sure if they were using them or would be using them. So, what could the client do? Usually the client left everything open because they weren’t sure and didn’t want the application to be unusable.
JP also knew that the SQL and Oracle servers were going to need access to NFS and that the servers needed access to data feeds, but the client wasn’t sure what ports, what feeds, how many servers, etc., so again too many things were left wide open.
So once Mr User got past that edge firewall, he would have way too much access since too many things were wide open in this client’s environment, and they didn’t want to lock it down because there were so many variables they were unsure of. If Mr User compromised the web tier, he could potentially go wherever he wanted, and there’s not much to keep him from talking directly to the Oracle or SQL server if everything is left open.
Unfortunately, this is not a rare situation and is all too familiar in typical environments. Just looking over this all, there’s so much missing from this map and this is just one application out of maybe thousands in a data center. So, when your clients or teams call you for help securing an application, how can you help them if these details aren’t readily available? Besides security, how do you begin to troubleshoot this application when issues arise if you have no idea where to look for network congestion?
What does Tetration do?
Workload Discovery - It helps you discover workload dependencies while making applications “hybrid cloud ready.” Essentially, Tetration provides the ability to discover application mappings automatically and allows us to gather information including but not limited to:
Server Response Time (SRT)
TCP resets and retransmits
Window size - This helps us to see if the problem is the application or the network
TCP performance - Lets us see how long it takes for the traffic to get from point A to point B in the data center
Hostnames
Interface information
Processes - Lets us see every process turned on and running on a server and be able to create policy based on that process. For example, one can write a policy that blocks access to the internet or the campus for any server running Docker. If the process is seen, Tetration’s policy can prevent that communication.
User information - Visibility into the user ID of who’s logging in, login failures, what process that user kicks off when logging in, privilege escalation, etc. For example, if one logs in as KatMac, escalates her privileges to root and then executes 5 commands, Tetration is going to see all of this. Tetration can give that full forensic report of everything a user did after they log in to that server. An administrator can then create alerts in Tetration for specific events as well as privilege escalation.
Software inventory & CVE vulnerabilities - Tetration can identify a critical vulnerability in the software packages installed and craft policies around them. One can have Tetration automatically prevent a resource from accessing the internet or some other resource in the data center based on that vulnerability. If the vulnerability exists, Tetration can change or block access immediately and once that vulnerability goes away, the server can automatically have access again.
And much more
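To make the process-based and vulnerability-based policy ideas from the list above a bit more concrete, here is a minimal Python sketch. This is not Tetration’s API or policy language. The `Workload` class, the `internet_access_allowed` function, the `dockerd` process name, the 9.0 severity threshold, and the placeholder CVE identifier are all assumptions I made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    """Hypothetical inventory record for one workload (not a Tetration object)."""
    hostname: str
    processes: set = field(default_factory=set)
    # Maps a CVE identifier to a severity score for packages installed on the host.
    cves: dict = field(default_factory=dict)


BLOCKED_PROCESSES = {"dockerd"}  # assumption: process name used to spot Docker
CRITICAL_SCORE = 9.0             # assumption: score treated as "critical"


def internet_access_allowed(w: Workload) -> bool:
    """Return False if the workload runs a blocked process or has a critical CVE."""
    if w.processes & BLOCKED_PROCESSES:
        return False
    if any(score >= CRITICAL_SCORE for score in w.cves.values()):
        return False
    return True


web1 = Workload("web1", {"nginx"})
web2 = Workload("web2", {"nginx", "dockerd"})
# "CVE-EXAMPLE-0001" is a placeholder, not a real CVE identifier.
db1 = Workload("db1", {"oracle"}, {"CVE-EXAMPLE-0001": 9.8})

for w in (web1, web2, db1):
    print(w.hostname, "internet access allowed:", internet_access_allowed(w))
```

The point of the sketch is that the decision is re-evaluated from the workload’s current attributes, so once the vulnerable package is patched and the CVE entry disappears, the same check lets `db1` back out without anyone editing a rule by hand.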
With all the above information, Tetration starts to piece flows together to see that server 1 might be talking to server 2 and 3 as well as all the flows in the data center. The actual flow might be a large amount of data but what is being sent to Tetration is just the metadata. The Tetration cluster does all the hard work of putting it all together and mapping it out.
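As a rough illustration of what piecing flows together from metadata can look like, here is a small Python sketch that groups flow records into a dependency map. The record layout, hostnames, and the `build_dependency_map` function are hypothetical and far simpler than what the Tetration cluster actually does.

```python
from collections import defaultdict

# Hypothetical flow metadata: (source, destination, protocol, destination port).
# Only these few fields are kept, never the payload itself.
flows = [
    ("web1", "app1", "tcp", 8443),
    ("web1", "app1", "tcp", 8443),  # repeated flow, collapsed below
    ("app1", "sql1", "tcp", 1433),
    ("app1", "ad1", "tcp", 389),
    ("app1", "ad1", "udp", 88),
]


def build_dependency_map(flow_records):
    """Group flow metadata into {source: {destination: {(protocol, port), ...}}}."""
    deps = defaultdict(lambda: defaultdict(set))
    for src, dst, proto, port in flow_records:
        deps[src][dst].add((proto, port))
    return deps


dependency_map = build_dependency_map(flows)
for src, dsts in sorted(dependency_map.items()):
    for dst, services in sorted(dsts.items()):
        print(src, "->", dst, sorted(services))
```

Even with five records, the map already answers the kind of questions JP’s client couldn’t: which database the app tier really talks to and which authentication ports are actually in use.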
Note: AppDynamics and Tetration both use the term “Application Dependency Mapping” but what each product provides is different. In AppDynamics, it’s just the layer 7 application, i.e. “I have an app server talking to a web server and here are the exact queries going out in between.” It focuses only on layer 7 visibility. In Tetration, application dependency mapping includes all the dependent services in the data center. For example, “I have an app server that’s not only talking to a web server but it’s also performing authentication to an Active Directory cluster and here are the 5 servers in that AD cluster.” Tetration can be as granular or high level as one wishes.
Workload Protection - Helps to secure workloads with portable policies across any cloud, any data center, and any OS. After Tetration discovers all the applications and what those applications talk to, it will map the ports and protocols being utilized by that application for up to a year. Tetration will then take that information and automatically generate a whitelist policy. An administrator can analyze and edit these policies after they are created. Tetration even provides the ability to test the policies before deploying them. The policies themselves are enforced through the host firewall or iptables. Tetration can also export the rules and policies into JSON, YAML, XML, etc. if you would like to import them into another platform that can ingest the rules, such as a firewall. As of ACI 3.1, Tetration can also drive the ACI configuration, which means that when the whitelist is created in Tetration, it can post it to ACI. ACI would then ingest that policy and update contracts for the endpoint groups. In short, Tetration would map out the data center and give ACI all the policies it needs to enforce.
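Here is a minimal Python sketch of the idea of turning observed flows into a whitelist and exporting it as JSON. The tier names, the rule fields (`action`, `consumer`, `provider`, `protocol`, `port`), and the `generate_allowlist` function are assumptions for illustration only and do not reflect Tetration’s actual export schema.

```python
import json

# Hypothetical observed flows: (consumer, provider, protocol, port).
observed = [
    ("web", "app", "tcp", 8443),
    ("app", "sql", "tcp", 1433),
    ("app", "sql", "tcp", 1433),  # duplicates collapse into one rule
    ("app", "ad", "tcp", 389),
]


def generate_allowlist(flow_records):
    """Emit one ALLOW rule per unique flow, followed by a catch-all DENY."""
    rules = [
        {"action": "ALLOW", "consumer": c, "provider": p,
         "protocol": proto, "port": port}
        for c, p, proto, port in sorted(set(flow_records))
    ]
    # Anything that was never observed falls through to this final rule.
    rules.append({"action": "DENY", "consumer": "*", "provider": "*",
                  "protocol": "*", "port": "*"})
    return rules


policy = generate_allowlist(observed)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The catch-all deny at the end is what makes this a whitelist: only traffic that was actually seen gets a rule, and everything else is closed by default, which is the opposite of the leave-everything-open approach from the example earlier in this post.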
Workload Assurance - Tetration gives you the ability to analyze security policies prior to deploying them and ensure policy compliance. After Tetration creates the whitelist rules, one can run them through post assurance and ensure compliance. If Tetration is enforcing the policy, it won’t allow the traffic through but it can still alert administrators if the behavior was attempted. Tetration is aware of the policies it’s enforcing in a cluster and if it sees something behaving outside of what was mapped, it can alert someone immediately.
Network Insights - Tetration gives performance insights per workload in real time with historical references. With Tetration, one has the ability to do a flow search anywhere in the cluster so it can show where one server might communicate with another, all ports and protocols it might use, and run reports on those flows. If one has ACI deployed, Tetration can do a hop-by-hop topology. Tetration will draw out those leaves and spines. With that topology, one can show where one server might communicate with another and how that flow moved through the topology.
How does Tetration gather its data?
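The compliance check described above can be sketched in a few lines of Python: compare each attempted flow against the whitelist and raise an alert for anything outside it. The `ALLOWED` set, the tier names, and the `audit_flows` function are hypothetical, and this is only a sketch of the idea rather than how Tetration implements it.

```python
# Hypothetical whitelist: each entry is (consumer, provider, protocol, port).
ALLOWED = {
    ("web", "app", "tcp", 8443),
    ("app", "sql", "tcp", 1433),
}


def audit_flows(flow_records, allowed=ALLOWED):
    """Split flows into permitted ones and violations that should raise an alert."""
    permitted, violations = [], []
    for flow in flow_records:
        (permitted if flow in allowed else violations).append(flow)
    return permitted, violations


attempted = [
    ("web", "app", "tcp", 8443),
    ("web", "sql", "tcp", 1433),  # web tier talking straight to the database
]
permitted, violations = audit_flows(attempted)
for consumer, provider, proto, port in violations:
    print(f"ALERT: {consumer} -> {provider} {proto}/{port} is outside the policy")
```

The second flow is exactly the Mr User scenario from earlier: a compromised web tier trying to reach the database directly. It is not in the whitelist, so it is both blocked and reported.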
There are a lot of methods including:
Software sensors - Installed on the endpoint. This is highly recommended and grants a lot of visibility down to the process level. With these software sensors, it doesn’t matter if the workload is in the public cloud or in your local on-prem data center. This captures all activity on the servers themselves including east-west traffic. It has a very small footprint on the server itself (within 3% server utilization) and supports bare metal servers, virtual machines, and containers.
ERSPAN sensors - For rich telemetry data from portions of the network in which software and hardware sensors are not present.
NetFlow sensors - For rich telemetry data from portions of the network in which software and hardware sensors are not present.
Hardware sensors - Line-rate telemetry within the switch’s ASIC (Nexus 9000s)
AnyConnect NVM proxy sensor - Telemetry from endpoint devices such as laptops, desktops, etc
As far as Tetration itself, it can be installed as a large form factor Tetration hardware cluster, a small form factor Tetration hardware cluster, a virtual form factor cluster, or deployed as a SaaS.
I hope this post provided an overview of some of the benefits of Tetration, what it’s trying to solve, where it gets its data, and how it’s deployed. In my next couple of posts, I’m going to dig into the Tetration UI and configuration a bit more to show some of its power.