Saturday, September 8, 2018

NACL and Security Group settings in Databricks



AWS offers organizations virtual firewalls for filtering traffic that crosses their cloud network segments. These firewalls are managed using a concept called Security Groups.


Security Groups are:

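  1. Stateful -- return traffic for an allowed connection is allowed automatically, so rules only need to be written for the direction that initiates the connection.
  2. VPC scoped -- can be attached to instances in any Availability Zone or subnet of the VPC.
  3. Allow rules only -- anything not explicitly allowed is implicitly denied (whitelisting only).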
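To make the three properties above concrete, here is a minimal Python model of how the security groups attached to an instance decide on an inbound packet. It is a simplification of my own, not AWS code. It assumes a rule only looks at protocol, port range and source CIDR (real rules can also reference other security groups). It also models statefulness crudely, as a set of connections the instance itself opened. The class and function names, and the example rules, are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class SgRule:
    protocol: str   # "tcp", "udp" or "all"
    from_port: int  # use 0-65535 together with "all"
    to_port: int
    cidr: str       # allowed source CIDR (inbound rule)


def rule_matches(rule, protocol, port, source_ip):
    return (
        rule.protocol in ("all", protocol)
        and rule.from_port <= port <= rule.to_port
        and ip_address(source_ip) in ip_network(rule.cidr)
    )


def sg_allows_inbound(groups, protocol, port, source_ip, established=frozenset()):
    # Statefulness, simplified: replies to connections the instance opened
    # (tracked as (protocol, remote_ip, local_port)) need no inbound rule.
    if (protocol, source_ip, port) in established:
        return True
    # Allow-only: a match in ANY attached group allows the packet, so the
    # order of groups and rules does not matter. No match = implicit deny.
    return any(
        rule_matches(rule, protocol, port, source_ip)
        for group in groups
        for rule in group
    )


# Hypothetical rules for two attached security groups.
managed = [SgRule("tcp", 22, 22, "10.0.0.0/16")]
unmanaged = [SgRule("tcp", 10000, 10000, "203.0.113.0/24")]
groups = [managed, unmanaged]

print(sg_allows_inbound(groups, "tcp", 22, "10.0.1.5"))         # True
print(sg_allows_inbound(groups, "tcp", 10000, "198.51.100.7"))  # False
print(sg_allows_inbound(groups, "tcp", 50000, "198.51.100.7",
                        established={("tcp", "198.51.100.7", 50000)}))  # True
```

The last call shows the stateful part. No inbound rule mentions port 50000, but the packet is a reply to a connection the instance opened, so it gets through.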

To further enhance its security filtering capabilities, AWS also offers Network Access Control Lists (NACLs). Like a security group, each NACL is a list of rules. Unlike a security group, a NACL can also contain Deny rules, and you can specify the order in which its rules are applied.
NACLs are:
  1. Stateless -- return traffic is not allowed automatically, so both inbound and outbound rules must always be configured.
  2. Subnet scoped -- must be explicitly associated with one or more subnets (a subnet without an explicit association uses the VPC's default NACL).
  3. Both Allow and Deny rules can be set.
  4. Rules are processed in order of rule number, lowest first -- once a rule matches, no rules further down the list are evaluated.
  5. Rules are processed at the subnet boundary.
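The "first match wins" behaviour is easiest to see in a small simulation. The Python sketch below is my own model, not AWS code. It assumes a rule matches on protocol, port range and CIDR, that rules are checked from the lowest rule number up, and that the catch-all `*` rule at the end denies everything else. The class name, function name, CIDRs and rule numbers are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    rule_number: int
    action: str     # "allow" or "deny"
    protocol: str   # "tcp", "udp" or "all"
    from_port: int
    to_port: int
    cidr: str


def nacl_decision(rules, protocol, port, ip):
    # Lowest rule number first; the first matching rule decides.
    for rule in sorted(rules, key=lambda r: r.rule_number):
        if (rule.protocol in ("all", protocol)
                and rule.from_port <= port <= rule.to_port
                and ip_address(ip) in ip_network(rule.cidr)):
            return rule.action
    return "deny"  # the implicit '*' rule at the end of every NACL


# Deny DNS from one CIDR (rule 90) ahead of an allow-all rule (100).
rules = [
    NaclRule(100, "allow", "all", 0, 65535, "0.0.0.0/0"),
    NaclRule(90, "deny", "udp", 53, 53, "192.0.2.0/24"),
]
print(nacl_decision(rules, "udp", 53, "192.0.2.10"))    # deny
print(nacl_decision(rules, "udp", 53, "198.51.100.1"))  # allow

# Swapping the rule numbers changes the policy: allow-all now matches first.
swapped = [
    NaclRule(90, "allow", "all", 0, 65535, "0.0.0.0/0"),
    NaclRule(100, "deny", "udp", 53, 53, "192.0.2.0/24"),
]
print(nacl_decision(swapped, "udp", 53, "192.0.2.10"))  # allow
```

The rules contain the same two entries in both lists. Only the rule numbers differ, yet the second list no longer blocks anything.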

How are network rules applied to a specific EC2 instance? The answer: "It’s all about the order."

Because a NACL can contain both ‘allow’ and ‘deny’ rules, the order of the rules becomes important. If you swap the order of a ‘deny’ rule and an ‘allow’ rule, you may change your filtering policy quite dramatically. To manage this, AWS gives each rule in a NACL a ‘rule number’. Rules are evaluated from the lowest number upward, so choosing the rule numbers lets you set the order that fits your needs. You can choose which traffic you deny at the outset, and which you then actively allow. NACLs therefore let you manage security tasks in a way that you cannot with security groups alone. Keep in mind, however, that an instance is subject to the rules of both its security groups and the NACL of its subnet:
-  For inbound traffic, AWS’s infrastructure first assesses the NACL rules.  If traffic gets through the NACL, then all the security groups that are associated with that specific instance are evaluated, and the order in which this happens within and among the security groups is unimportant because they are all ‘allow’ rules.
-  For outbound traffic, this order is reversed:  the traffic is first evaluated against the security groups, and then finally against the NACL that is associated with the relevant subnet.
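The two bullets above can be sketched as two small pipelines. Here each rule is reduced to a hypothetical predicate on a packet dict, and statelessness of the NACL is ignored to keep it short. A packet must pass both layers in either direction, so the final allow/deny answer is the same. What the order changes is which layer stops a packet first, and that is where you should look when debugging.

```python
def nacl_allows(nacl, packet):
    # NACL: (rule_number, action, predicate) tuples, lowest number first,
    # first match wins, implicit '*' deny at the end.
    for _number, action, matches in sorted(nacl, key=lambda rule: rule[0]):
        if matches(packet):
            return action == "allow"
    return False


def sgs_allow(groups, packet):
    # Security groups: allow-only predicates; any match in any group allows.
    return any(matches(packet) for group in groups for matches in group)


def inbound_verdict(nacl_in, groups_in, packet):
    # Inbound: the subnet's NACL is evaluated first, then the security groups.
    if not nacl_allows(nacl_in, packet):
        return "blocked by NACL"
    if not sgs_allow(groups_in, packet):
        return "blocked by security groups"
    return "allowed"


def outbound_verdict(nacl_out, groups_out, packet):
    # Outbound: the security groups are evaluated first, then the NACL.
    if not sgs_allow(groups_out, packet):
        return "blocked by security groups"
    if not nacl_allows(nacl_out, packet):
        return "blocked by NACL"
    return "allowed"


# Hypothetical setup: the NACL denies port 25 (rule 50) and otherwise allows
# everything (rule 100); the security groups allow only SSH inbound and only
# HTTPS outbound.
nacl = [
    (100, "allow", lambda p: True),
    (50, "deny", lambda p: p["port"] == 25),
]
groups_in = [[lambda p: p["protocol"] == "tcp" and p["port"] == 22]]
groups_out = [[lambda p: p["protocol"] == "tcp" and p["port"] == 443]]

smtp = {"protocol": "tcp", "port": 25}
print(inbound_verdict(nacl, groups_in, smtp))    # blocked by NACL
print(outbound_verdict(nacl, groups_out, smtp))  # blocked by security groups
print(inbound_verdict(nacl, groups_in, {"protocol": "tcp", "port": 22}))  # allowed
```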
Now that we understand how security groups and NACLs work for AWS instances, let's look at the default security groups and NACL that apply when Databricks spins up cluster instances in a newly provisioned shard.


Currently, every EC2 instance that Databricks launches in the configured VPC is associated with two security groups by default:

1) *-worker - managed by Databricks engineering; its port ranges and source CIDRs are predefined and cannot be changed.
2) *-worker-unmanaged - can be customized to whitelist ports and source traffic as appropriate.

Let's see, in practice, how to find the security groups of a running Spark cluster.

1) Click the Spark UI tab to get the hostname of the driver.



2) Search for that hostname in the EC2 dashboard to pull up the instance details.




Security group ID (sg-a2614edc) of the managed SG.




Unmanaged SG
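If you prefer the API to the console, steps 1 and 2 can be scripted with boto3. This is a sketch under a few assumptions. It assumes boto3 is installed and configured for the account and region hosting the Databricks VPC. It assumes the driver hostname from the Spark UI is the instance's private DNS name; if the Spark UI shows an IP address instead, the `private-ip-address` filter would be used. Finally, it assumes the group names end in `-worker` and `-worker-unmanaged` as described above. The helper names and the sample group names are hypothetical. Only the parsing helper runs offline.

```python
def driver_security_groups(response):
    """Split the security groups in an EC2 describe_instances response into
    the '*-worker' (managed) and '*-worker-unmanaged' groups."""
    groups = {"managed": [], "unmanaged": [], "other": []}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            for sg in instance.get("SecurityGroups", []):
                name = sg["GroupName"]
                # Check the longer suffix first: '-worker-unmanaged' does not
                # end with '-worker', but keeping this order is clearer.
                if name.endswith("-worker-unmanaged"):
                    groups["unmanaged"].append(sg["GroupId"])
                elif name.endswith("-worker"):
                    groups["managed"].append(sg["GroupId"])
                else:
                    groups["other"].append(sg["GroupId"])
    return groups


def lookup_driver(hostname, region_name=None):
    import boto3  # only needed for the live lookup
    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_instances(
        Filters=[{"Name": "private-dns-name", "Values": [hostname]}]
    )
    return driver_security_groups(response)


# Offline example with a hypothetical, trimmed-down response.
sample = {"Reservations": [{"Instances": [{"SecurityGroups": [
    {"GroupName": "example-worker", "GroupId": "sg-a2614edc"},
    {"GroupName": "example-worker-unmanaged", "GroupId": "sg-0hypothetical"},
]}]}]}
print(driver_security_groups(sample))
```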

3) The inbound security group rules for the instance can be viewed as shown below.


Managed SG inbound rules



Unmanaged SG inbound rules

4) The outbound security group rules for the instance can be viewed as shown below.



Managed SG outbound rules

Unmanaged SG outbound rules
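The same inbound and outbound rules can be listed from the EC2 `describe_security_groups` API, which is handy for comparing the two groups side by side. The sketch below flattens the response into one line per rule. It assumes boto3 is installed and configured for the live call. The sample rules in the offline example are made up for illustration; they are not the actual Databricks rules. In the response, `IpProtocol` "-1" means all protocols, and such rules carry no port range.

```python
def format_sg_rules(response):
    """One line per rule: '<group id> <direction> <protocol> <ports> <peers>'."""
    lines = []
    for sg in response.get("SecurityGroups", []):
        for direction, key in (("inbound", "IpPermissions"),
                               ("outbound", "IpPermissionsEgress")):
            for perm in sg.get(key, []):
                proto = "all" if perm["IpProtocol"] == "-1" else perm["IpProtocol"]
                if "FromPort" in perm and proto != "all":
                    lo, hi = perm["FromPort"], perm["ToPort"]
                    ports = str(lo) if lo == hi else f"{lo}-{hi}"
                else:
                    ports = "all"
                # Peers are CIDR ranges and/or other security groups.
                peers = [r["CidrIp"] for r in perm.get("IpRanges", [])]
                peers += [p["GroupId"] for p in perm.get("UserIdGroupPairs", [])]
                lines.append(f"{sg['GroupId']} {direction} {proto} {ports} "
                             f"{', '.join(peers) or '-'}")
    return lines


def fetch_sg_rules(group_ids, region_name=None):
    import boto3  # only needed for the live lookup
    ec2 = boto3.client("ec2", region_name=region_name)
    return format_sg_rules(ec2.describe_security_groups(GroupIds=list(group_ids)))


# Offline example with a hypothetical response (the rules are made up).
sample = {"SecurityGroups": [{
    "GroupId": "sg-a2614edc",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        {"IpProtocol": "tcp", "FromPort": 32768, "ToPort": 61000,
         "UserIdGroupPairs": [{"GroupId": "sg-a2614edc"}]},
    ],
    "IpPermissionsEgress": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}]}
for line in format_sg_rules(sample):
    print(line)
```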



Below is the list of open ports, why they are opened, and which services run or communicate on them.

Port            Port use

22              SSH to the core instance
2200            SSH to Spark containers (driver or workers)
4040-4100       Used internally by Spark
7077            Spark driver port
10000           JDBC port for third-party applications such as Tableau and ZoomData
32768-61000     Ephemeral port range used to bind Spark UIs so they can be accessed by the webapp
6060            Cluster Manager and Webapp
7072            Cluster Manager: Data Daemon ops port
8080            Cluster Manager: Node daemon port
8081            Cluster Manager: Ganglia
8649, 8651      Cluster Manager: JDBC
10000           Cluster Manager: JDBC
29998-29999     Cluster Manager: Tachyon
32768-61000     WebApp: Ephemeral ports for UI
8649 (UDP)      WebApp: Ganglia
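Before adding a deny rule to a NACL, it is worth checking whether the port is on this list. Several ports, such as 10000 and the 32768-61000 range, serve more than one purpose. Below is a small lookup over the table, written as a sketch. It assumes every entry except "8649 (UDP)" is TCP, since the table only marks the protocol for that one port. The names `PORTS` and `port_uses` are hypothetical.

```python
# The port table above as (first_port, last_port, protocol, use).
# Assumption: TCP unless the table says otherwise.
PORTS = [
    (22, 22, "tcp", "SSH to the core instance"),
    (2200, 2200, "tcp", "SSH to Spark containers (driver or workers)"),
    (4040, 4100, "tcp", "Used internally by Spark"),
    (7077, 7077, "tcp", "Spark driver port"),
    (10000, 10000, "tcp",
     "JDBC port for third-party applications such as Tableau and ZoomData"),
    (32768, 61000, "tcp", "Ephemeral port range used to bind Spark UIs "
                          "so they can be accessed by the webapp"),
    (6060, 6060, "tcp", "Cluster Manager and Webapp"),
    (7072, 7072, "tcp", "Cluster Manager: Data Daemon ops port"),
    (8080, 8080, "tcp", "Cluster Manager: Node daemon port"),
    (8081, 8081, "tcp", "Cluster Manager: Ganglia"),
    (8649, 8649, "tcp", "Cluster Manager: JDBC"),
    (8651, 8651, "tcp", "Cluster Manager: JDBC"),
    (10000, 10000, "tcp", "Cluster Manager: JDBC"),
    (29998, 29999, "tcp", "Cluster Manager: Tachyon"),
    (32768, 61000, "tcp", "WebApp: Ephemeral ports for UI"),
    (8649, 8649, "udp", "WebApp: Ganglia"),
]


def port_uses(port, protocol="tcp"):
    """Return every listed use of a port, in table order ([] if unlisted)."""
    return [use for first, last, proto, use in PORTS
            if first <= port <= last and proto == protocol]


print(port_uses(10000))
print(port_uses(8649, "udp"))
print(port_uses(53))  # [] -> not one of the listed Databricks ports
```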


NACL:

The NACL associated with the Databricks subnets is wide open (it allows all inbound and outbound traffic). That makes it a good place to block traffic on any specific port. For example, I have worked on a few issues where all DNS traffic from a specific CIDR needed to be blocked. Updating the inbound rules to deny all traffic on port 53 from that source CIDR, using a rule number lower than the allow-all rule so it is evaluated first, was sufficient.
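A sketch of the DNS example using boto3's `create_network_acl_entry` is shown below. It makes several assumptions. It assumes boto3 is installed and configured. The NACL ID, CIDR and starting rule number are hypothetical placeholders, and the rule numbers must be unused and lower than the existing allow-all rule (commonly 100). It also assumes DNS should be blocked over both UDP and TCP. A NACL rule with protocol "all" ignores ports, so port 53 needs one entry per protocol (17 = UDP, 6 = TCP). Only the entry builder runs offline.

```python
UDP, TCP = "17", "6"  # IP protocol numbers, passed as strings to the EC2 API


def dns_deny_entries(nacl_id, cidr, first_rule_number=90):
    """Build inbound deny entries for port 53 (UDP and TCP) from `cidr`."""
    return [
        {
            "NetworkAclId": nacl_id,
            "RuleNumber": first_rule_number + i,
            "Protocol": proto,
            "RuleAction": "deny",
            "Egress": False,  # False = inbound rule
            "CidrBlock": cidr,
            "PortRange": {"From": 53, "To": 53},
        }
        for i, proto in enumerate((UDP, TCP))
    ]


def apply_entries(entries, region_name=None):
    import boto3  # only needed when actually changing the NACL
    ec2 = boto3.client("ec2", region_name=region_name)
    for entry in entries:
        ec2.create_network_acl_entry(**entry)


# Hypothetical IDs; review the entries before calling apply_entries().
entries = dns_deny_entries("acl-0123456789abcdef0", "192.0.2.0/24")
for entry in entries:
    print(entry["RuleNumber"], entry["Protocol"], entry["RuleAction"])
```

Because the NACL is stateless, this inbound rule only stops DNS traffic arriving from that CIDR. Outbound traffic is governed separately by the outbound rules.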

NACL inbound rules allowing all traffic

NACL outbound rules allowing all traffic






