design a data center for ADS applications

background

A system’s back end can be made up of a number of bare metal servers, data storage facilities, virtual machines, a security mechanism, and services, all built in conformance with a deployment model, and all together responsible for providing a service.

points to consider

  • it’s the primary authority and responsibility of the back end to provide a built-in security mechanism, traffic control and protocols.

  • a central server is responsible for managing and running the system, systematically reviewing the traffic and client requests to make sure that everything is running smoothly.

service viewport

business services

the most common business services, or application-as-a-service (AaaS), in an ADS data center include:

  • massive scenario simulation

  • open-loop replay simulation

  • AI training

  • sensor data analysis

  • test-driven algorithm development

platform as a service

business services can be considered the most abstract level of services; PaaS considers how to support these upper-level needs. a few components are obvious:

  • distributed data storage (aws s3, ceph)

  • massive data analysis (hadoop)

  • massive pool of compute nodes (k8s/docker)

  • vdi server pool

  • sql

  • web server for end-user UI

IaaS

to support PaaS, we need CPUs, GPUs and networking, in either physical or virtual mode. common private cloud vendors would suggest a general virtualization layer to manage all resources in one place. but there is always a balance between ease of management and performance loss.

for large auto industry OEMs, ease of management is no doubt crucial, so it’s suggested to implement a virtualization layer (either vmware or customized kvm). without one, I suspect self-maintenance will become a disaster in the future.

security in cloud

who is a cybersecurity professional

the person who provides security during the development stages of software systems, networks, and data centers.

they must put security measures in place for any information by designing various defensive systems and strategies against intruders.

The specialist must create new defensive systems and protocols and report incidents. Granting permissions and privileges to authorized users is also their job.

The cybersecurity professional must maintain IT security controls documentation, recognize the security gaps, and prepare an action plan accordingly.

Cybersecurity professionals enable security in IT infrastructure, data, edge devices, and networks.

Azure Security: best practices

  • control network access

    At the first (outermost) ring you typically find Firewall policies, Distributed Denial of Service (DDoS) prevention, Intrusion Detection and Intrusion Prevention systems (IDS/IPS), Web Content Filtering, and Vulnerability Management such as Network Anti-Malware, Application Controls, and Antivirus.

    The second ring is often a Network Security Group (NSG) applied to the subnet. Network Security Groups allow you to filter network traffic to and from Azure resources in an Azure virtual network.

    By default, all subnets in an Azure Virtual Network (VNet) can communicate freely. By using a network security group for network access control between subnets, you can establish a different security zone or role for each subnet. As such, all subnets should be associated with a properly configured Network Security Group.

    With a virtual server, there is a third ring, which is a Network Security Group (NSG) applied to the virtual machine's network interfaces.

    Finally, avoid exposure to the internet by using a dedicated WAN connection. Azure offers both site-to-site VPN and ExpressRoute for this purpose.
    
  • disable remote access (ssh/rdp)

disable remote access to VMs from the internet. ssh/rdp should only be provided over a secure dedicated connection, using Just-In-Time (JIT) VM access.

the Just-In-Time VM access policy is configured at the NSG to lock down the virtual machine's remote management ports. When an authorized user requires access to the VM, they use Just-In-Time VM Access to request access for up to three hours. After the requested time has elapsed, Azure locks the management ports down again to help reduce susceptibility to an attack.

  • update vm

you need to run antivirus and anti-malware software, and keep system updates applied on VMs hosted in Azure.

  • safeguard sensitive data

  • enable encryption
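The ring model above can be illustrated with a small sketch. It assumes the usual NSG behaviour that rules are evaluated in priority order (lower number first) and that the first match decides, and that inbound traffic passes the subnet-level NSG before the NIC-level NSG. The `Rule` class, the address ranges and the port are hypothetical values chosen for illustration, not taken from any real deployment.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int          # lower number = evaluated first
    source: str            # source range in CIDR notation
    port: Optional[int]    # destination port, None = any port
    allow: bool            # True = allow, False = deny


def evaluate(rules, src_ip, dst_port):
    """Return the decision of the first matching rule, in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        in_range = ip_address(src_ip) in ip_network(rule.source)
        port_ok = rule.port is None or rule.port == dst_port
        if in_range and port_ok:
            return rule.allow
    return False  # nothing matched: deny


# Hypothetical second ring: the web subnet may reach the SQL port.
subnet_rules = [
    Rule(100, "10.0.1.0/24", 1433, True),
    Rule(4096, "0.0.0.0/0", None, False),
]
# Hypothetical third ring: the SQL VM's NIC accepts a single web host.
nic_rules = [
    Rule(100, "10.0.1.10/32", 1433, True),
    Rule(4096, "0.0.0.0/0", None, False),
]


def reaches_vm(src_ip, dst_port):
    """Inbound traffic has to pass the subnet ring, then the NIC ring."""
    return (evaluate(subnet_rules, src_ip, dst_port)
            and evaluate(nic_rules, src_ip, dst_port))
```

In this sketch a packet from `10.0.1.11` passes the subnet ring but is stopped at the NIC ring, which is the point of layering the two groups.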


aws security: instance level security

  • AWS security groups provide security at the protocol and port access level, working much the same way as a firewall: each contains a set of rules that filter traffic coming in and out of an EC2 instance.

  • OS security patch management

  • key pairs (public and private keys to log in to an EC2 instance)

aws security: network ACL and subnets: network level security

aws security: bastion hosts

the connectivity flowing from an end-user to resources on a private subnet through a bastion host:

[figure: connectivity from an end-user to a private subnet through a bastion host]

  • updates to bastion host

skip the bastion if using Session Manager, which securely connects to private instances in a virtual private cloud without needing a bastion host or key pairs.

you can now push keys for short periods of time and use IAM policies to restrict access as you see fit. this reduces your compliance and audit footprint as well.

  • NAT(network address translation) [gateway] instance

allows private instances outgoing connectivity to the internet while at the same time blocking inbound traffic from the internet

  • VPC(virtual private cloud) peering

[figure: VPC peering]

aws security: identity and access management(IAM)

governs and controls user access to VPC resources; it achieves this through Users/Groups/Roles and Policies.
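How policies turn into an access decision can be sketched as follows. This is a simplified model of the documented IAM rule that an explicit Deny overrides any Allow and that everything not explicitly allowed is denied. The bucket name and the statements are hypothetical, and the wildcard matching via `fnmatchcase` is only an approximation of real IAM matching (which also handles conditions, principals and case-insensitive action names).

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action, resource):
    """Explicit Deny wins; otherwise an Allow is required; default is deny."""
    allowed = False
    for statement in statements:
        action_match = any(fnmatchcase(action, p) for p in statement["Action"])
        resource_match = any(fnmatchcase(resource, p) for p in statement["Resource"])
        if not (action_match and resource_match):
            continue
        if statement["Effect"] == "Deny":
            return False  # an explicit deny ends the evaluation
        allowed = True
    return allowed


# Hypothetical policy statements attached to a simulation engineer's role.
statements = [
    {"Effect": "Allow", "Action": ["s3:Get*"],
     "Resource": ["arn:aws:s3:::ads-sensor-data/*"]},
    {"Effect": "Deny", "Action": ["s3:*"],
     "Resource": ["arn:aws:s3:::ads-sensor-data/restricted/*"]},
]

print(is_allowed(statements, "s3:GetObject", "arn:aws:s3:::ads-sensor-data/drive1.bag"))
```

Reads under `restricted/` are refused even though the first statement allows `s3:Get*` on the whole bucket, and writes are refused because nothing allows them.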

network topology in cloud

network topology at first

I would consider the data center network topology and security mechanism from the following four points:

  • internal network topology

basically the data center will have an internal network to connect infrastructure, e.g. data storage nodes, k8s compute nodes, hadoop compute nodes, web server nodes, vdi server nodes, etc.

  • network gateway to end-users

there should be a single network gateway in the data center, which is the only network entry and exit point for end-user access.

  • network gateway to other private/public cloud

we also need to connect to other IT infrastructure, so we need another network gateway.

the two gateways above can be either virtual or physical, depending on our hardware, e.g. vlan, bridge, or physical gateway.

  • admin pass-through network

the network gateway is the normal user access port, but for admins, especially for system management, troubleshooting, etc., we need a pass-through network, which connects directly to the internal network of the data center.
the admin pass-through network is speed limited, so it is only for special admin usage.
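The four points above can be checked with a small reachability sketch. The topology is modelled as an undirected graph, and a breadth-first search tests whether a client can still reach an internal node when a gateway is taken away. All node names are hypothetical labels for the components described above.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start, goal, blocked=()):
    """Breadth-first search from start to goal that never enters blocked nodes."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical layout: three entry points in front of one internal network.
topology = build_graph([
    ("end_user", "user_gateway"),
    ("other_cloud", "cloud_gateway"),
    ("admin", "admin_passthrough"),
    ("user_gateway", "internal"),
    ("cloud_gateway", "internal"),
    ("admin_passthrough", "internal"),
    ("internal", "storage"),
    ("internal", "k8s"),
    ("internal", "web"),
])

# The user gateway is the only way in for end users.
print(reachable(topology, "end_user", "storage"))
print(reachable(topology, "end_user", "storage", blocked={"user_gateway"}))
```

Blocking `user_gateway` cuts end users off while the admin pass-through path stays usable, which matches the intent of keeping a separate admin network for troubleshooting.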

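h3c: a brief discussion of the evolution of data center network architecture

  • access layer: connects all the compute nodes; in larger data centers it usually takes the form of top-of-rack switches;
  • aggregation layer: interconnects the access layer and acts as the layer-2/layer-3 boundary of its aggregation area; services such as firewalls and load balancers are also deployed here;
  • core layer: interconnects the aggregation layer and provides layer-3 communication between the whole data center and external networks.

in a traditional data center, servers mainly provide externally facing services, and different services are isolated by security zones and VLANs. a zone usually concentrates the compute, network and storage resources that its service needs. access between zones is either prohibited or goes through the core switches over the layer-3 network, so most of the data center traffic is north-south.

under this design, compute resources cannot be shared between zones, and low resource utilization becomes an increasingly prominent problem. virtualization and cloud management technologies pool the resources across zones so that they are used effectively. as these technologies spread, new requirements such as virtual machine migration, data synchronization, data backup and collaborative computing start to be deployed inside the data center, and east-west traffic inside the data center grows substantially.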
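The north-south versus east-west distinction can be made concrete with a short sketch: a flow is east-west when both endpoints sit inside the data center and north-south otherwise. The `10.0.0.0/8` internal range and the sample flows are assumptions made for illustration.

```python
from ipaddress import ip_address, ip_network

DATA_CENTER = ip_network("10.0.0.0/8")  # hypothetical internal address range


def classify(src, dst):
    """east-west if both endpoints are internal, north-south otherwise."""
    inside_src = ip_address(src) in DATA_CENTER
    inside_dst = ip_address(dst) in DATA_CENTER
    return "east-west" if inside_src and inside_dst else "north-south"


def traffic_share(flows):
    """Sum the bytes of (src, dst, nbytes) flows per traffic direction."""
    totals = {"north-south": 0, "east-west": 0}
    for src, dst, nbytes in flows:
        totals[classify(src, dst)] += nbytes
    return totals


# Hypothetical flows: one client request, two internal transfers.
flows = [
    ("198.51.100.7", "10.1.0.5", 500),   # client to web server
    ("10.1.0.5", "10.2.0.9", 4000),      # web server to storage
    ("10.3.0.1", "10.1.0.5", 1500),      # VM migration traffic
]
print(traffic_share(flows))
```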

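h3c: layer-2 network arch

  • layer-3 interconnect, also called data center front-end network interconnect. the "front-end network" is the data center's exit towards the enterprise campus. the front-end networks of different data centers are interconnected over IP, and clients in the campus or in branches reach each data center through the front-end network.
  • layer-2 interconnect, also called data center server network interconnect. at the server network access layer of the different data centers, a large layer-2 network (vlan) is built across data centers, to support scenarios such as server clusters or live migration of virtual machines.

  • san interconnect, also called back-end storage network interconnect. with the help of transmission technologies, it replicates disk array data between the primary center and the disaster recovery center.

business requirement of layer-2 interconnect: guarantee highly available server clusters.

design focus of layer-2 interconnect: aimed at small and medium enterprise customers (ip network)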

openstack neutron: network for cloud

  • data flow in data center:
  • management (API) network, basically the internal management network

  • user network

  • external network, including vpn, firewall

  • storage network, connecting computing nodes to storage nodes

  • NSX arch

[figure: NSX architecture]

the leftmost part is the computing nodes for customers' business; the data flow includes user network, storage and internal network, which together require 3 NICs.

the middle part is infrastructure, including management nodes and shared storage, which provides IP-based storage for the leftmost part.

the right edge part is external (internet) network services for users, including the network to users as well as the network to the Internet, firewall, public IP address translation, etc.


ceph subnets

[figure: ceph subnets]

when considering ceph storage for k8s computing nodes, the public network also includes the network switches to k8s, as well as the network switches for normal user data access.

  • cluster network (Gb NIC, including osd and monitor)

  • ceph client (to k8s) network (Gb)

  • ceph admin/user network (Mb)
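Ceph separates these networks through the `public_network` and `cluster_network` options in the `[global]` section of ceph.conf. The sketch below generates such a fragment with Python's `configparser`. The CIDR ranges are hypothetical, and rejecting overlapping ranges is a choice made in this sketch rather than something Ceph itself enforces.

```python
import configparser
import io
from ipaddress import ip_network


def ceph_network_conf(public_cidr, cluster_cidr):
    """Render a [global] section with separate public and cluster networks."""
    public, cluster = ip_network(public_cidr), ip_network(cluster_cidr)
    if public.overlaps(cluster):
        # Assumption of this sketch: keep the two networks disjoint.
        raise ValueError("public and cluster networks overlap")
    conf = configparser.ConfigParser()
    conf["global"] = {
        "public_network": str(public),    # clients such as k8s nodes and users
        "cluster_network": str(cluster),  # replication traffic between OSDs
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


# Hypothetical address plan for the storage subnets.
print(ceph_network_conf("10.10.0.0/24", "10.20.0.0/24"))
```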

k8s subnets

understanding the k8s network: pods, services, ingress

  • pods: all containers in one pod share the same network namespace. the network namespace of a pod is different from that of the host machine, but the two are connected by the docker bridge

  • services: handle load balancing among pods and encapsulate the pods' IPs, so we don’t deal directly with the dynamic local IPs of pods.

  • how to access a k8s service from outside, or how does a user access k8s hosted in a remote data center? ingress for k8s
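As a minimal sketch of the ingress piece, the snippet below builds an Ingress manifest for the `networking.k8s.io/v1` API and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The ingress name, host name and service name are hypothetical; an ingress controller also has to be running in the cluster for the rule to take effect.

```python
import json


def ingress_manifest(name, host, service, port):
    """Route all HTTP paths for one host to a single backend service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {
                        "service": {"name": service, "port": {"number": port}},
                    },
                }]},
            }],
        },
    }


# Hypothetical simulation UI exposed to users outside the data center.
manifest = ingress_manifest("sim-ui", "sim.example.com", "sim-ui-svc", 80)
print(json.dumps(manifest, indent=2))
```

The service named in the backend keeps hiding the pod IPs, so the ingress only needs a stable service name and port.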

[figure: k8s ingress]

flannel
flannel sets up an overlay network across nodes. the ip of each pod is assigned by flannel, and each node has a flannel0 virtual NIC, used for node-to-node communication.
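The addressing idea behind flannel can be sketched with the standard `ipaddress` module: one cluster-wide range is split into per-node subnets, and pod IPs come from the subnet of the node they run on. The `10.244.0.0/16` range with `/24` per node is assumed here as a typical configuration. Real flannel leases subnets through etcd or the Kubernetes API, so the sequential assignment below is only an illustration, and the node names are hypothetical.

```python
from ipaddress import ip_network


def allocate_node_subnets(nodes, cluster_cidr="10.244.0.0/16", node_prefix=24):
    """Give each node the next free subnet carved out of the cluster range."""
    subnets = ip_network(cluster_cidr).subnets(new_prefix=node_prefix)
    return {node: next(subnets) for node in nodes}


def pod_ip(subnet, index):
    """Return the index-th address inside a node's subnet."""
    return subnet[index]


leases = allocate_node_subnets(["node-a", "node-b", "node-c"])
for node, subnet in leases.items():
    print(node, subnet, "first pod:", pod_ip(subnet, 2))
```

Because each node owns a distinct subnet, the node that hosts a pod can be derived from the pod's IP alone, which is what lets the overlay forward packets between nodes.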

gpu vdi subnets

as mentioned in gpu vdi, most solutions have a customized vdi client. the vm manager's internal network is handled by the vendor, and may communicate through one Gb NIC; the client-server connection is plain TCP/IP.

webserver subnets

a few things in mind:

  • using vm.

web servers are better deployed in VMs, so that whenever there is a hardware failure it can be detected and the service automatically transferred to another VM.

  • communicate with in-house services

in an ADS data center, most web applications need to access data from either sql or even storage services directly, which means the web server needs both an external ingress service and internal IP access.

network manager

the above subnet classification only considers each component by itself. for the data center as a whole, it’s better to manage all internal subnets and access to the external Internet in one module: the network manager.

as mentioned in the previous section, security in cloud, the network manager module can further add some security mechanisms.
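One task such a network manager could take on is keeping the address plan of all subnets consistent. The sketch below is a hypothetical `NetworkManager` class, not an existing tool: it registers each component's subnet, refuses overlapping ranges, and answers which component owns a given address. The subnet names and ranges are made up for illustration.

```python
from ipaddress import ip_address, ip_network


class NetworkManager:
    """Hypothetical central registry of the data center's subnets."""

    def __init__(self):
        self._subnets = {}

    def register(self, name, cidr):
        """Add a subnet, rejecting any overlap with one already registered."""
        new = ip_network(cidr)
        for other_name, other in self._subnets.items():
            if new.overlaps(other):
                raise ValueError(f"{name} ({new}) overlaps {other_name} ({other})")
        self._subnets[name] = new

    def owner_of(self, address):
        """Return the name of the subnet containing address, or None."""
        ip = ip_address(address)
        for name, net in self._subnets.items():
            if ip in net:
                return name
        return None


manager = NetworkManager()
manager.register("ceph-cluster", "10.20.0.0/24")
manager.register("k8s-pods", "10.244.0.0/16")
manager.register("vdi", "10.30.0.0/24")
print(manager.owner_of("10.244.3.7"))
```

Security rules like the ones discussed in the security section could then be attached per registered subnet instead of per component.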

references

cloud arch: front end & back end

Microsoft Azure security tech training courses

introduction the Foundation certificate in Cyber Security

datacenter network: topology and routing

miniNet for data center network topology

openstack neutron: two-layer network