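## 1. Overview
### 1.1 Background
When several teams share one Kubernetes cluster, permission management becomes unavoidable. A developer deletes a Deployment in the production namespace by mistake. An intern wipes out an entire namespace with `kubectl delete`. A CI/CD ServiceAccount with excessive permissions fails a security audit. All of these happen when RBAC is not set up properly.
Kubernetes RBAC (Role-Based Access Control) defines permissions with Role/ClusterRole objects. It attaches those permissions to users, groups, or ServiceAccounts with RoleBinding/ClusterRoleBinding objects. RBAC has been GA since 1.8, and it is the authorization mode enabled by default in kubeadm clusters and most distributions.
This article walks through a complete RBAC setup, covering user authentication, role design, permission assignment, and audit logging. It is based on Kubernetes 1.28.x.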
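### 1.2 Key Characteristics
- Least privilege: grant only the permissions a job actually needs, and nothing more
- Namespace isolation: a Role applies to a single namespace and a ClusterRole to the whole cluster, giving two levels of permission control
- Flexible binding: one Role can be bound to many users, and one user can hold many Roles (many-to-many)
- Built-in roles: Kubernetes ships ClusterRoles such as view, edit, admin, and cluster-admin that cover common scenarios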
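A user can hold several Roles at once, so it helps to know how the authorizer combines them. RBAC rules are purely additive and there are no deny rules. A request is allowed as soon as any rule in any bound role matches its API group, resource, and verb, and `*` matches anything.

The bash sketch below models only that matching step. The `groups|resources|verbs` string encoding is an assumption made for readability, as is spelling the empty core group `""` as `core`. The real authorizer also handles `resourceNames`, subresources, and non-resource URLs.

```bash
#!/usr/bin/env bash
# Simplified model of RBAC rule matching (illustration only, not the real
# authorizer). A rule is written as "apiGroups|resources|verbs", each part a
# comma-separated list; "core" stands in for the core API group "".

# list_has LIST VALUE: succeed if LIST contains VALUE or the wildcard "*"
list_has() {
  local -a items
  local item
  IFS=',' read -r -a items <<< "$1"
  for item in "${items[@]}"; do
    if [[ "$item" == "*" || "$item" == "$2" ]]; then
      return 0
    fi
  done
  return 1
}

# rule_allows RULE GROUP RESOURCE VERB: succeed if this single rule matches
rule_allows() {
  local groups resources verbs
  IFS='|' read -r groups resources verbs <<< "$1"
  list_has "$groups" "$2" && list_has "$resources" "$3" && list_has "$verbs" "$4"
}

# is_allowed GROUP RESOURCE VERB RULE...: allowed if ANY rule matches;
# no rule can take away a permission that another rule grants
is_allowed() {
  local group=$1 resource=$2 verb=$3 rule
  shift 3
  for rule in "$@"; do
    if rule_allows "$rule" "$group" "$resource" "$verb"; then
      return 0
    fi
  done
  return 1
}

# Example: a core-group wildcard also grants Secrets, even when another rule
# mentions "secrets" separately
viewer_rules=("core,apps|*|get,list,watch" "core|secrets|list")
is_allowed core secrets get "${viewer_rules[@]}" && echo "get secrets: allowed"
is_allowed apps deployments delete "${viewer_rules[@]}" || echo "delete deployments: denied"
```

Because matching is an OR over all rules, the only way to keep a resource out of a role is to never grant it in the first place.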
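### 1.3 Use Cases
- Scenario 1: several teams share a cluster, and each team may only operate on resources in its own namespace
- Scenario 2: a CI/CD pipeline ServiceAccount receives only the minimum permissions needed to deploy
- Scenario 3: tiered operations staff, with read-only access for junior operators, write access for senior operators, and cluster-level access for administrators
- Scenario 4: security compliance requires auditing who did what and when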
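### 1.4 Environment Requirements
| Component | Version | Notes |
|---|---|---|
| Kubernetes | 1.24+ | RBAC has been GA since 1.8; this article uses 1.28 |
| kubectl | Matches the cluster version (within one minor version) | Used to manage RBAC resources |
| Authentication | X.509 certificates/OIDC/ServiceAccount | At least one user authentication method must be configured |
| Audit log | Audit logging enabled on the apiserver | Records API operations (scope set by the audit policy) |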
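## 2. Detailed Steps
### 2.1 Preparation
#### 2.1.1 RBAC Core Concepts
RBAC consists of four resource objects:
```text
Role (namespace-scoped permissions)       ClusterRole (cluster-scoped permissions)
        ↓ bound by                                ↓ bound by
RoleBinding (namespace-scoped binding)    ClusterRoleBinding (cluster-wide binding)
        ↓ binds to                                ↓ binds to
User / Group / ServiceAccount             User / Group / ServiceAccount
```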
```bash
# View the cluster's current RBAC configuration
kubectl get roles -A
kubectl get clusterroles
kubectl get rolebindings -A
kubectl get clusterrolebindings

# View the built-in ClusterRoles
kubectl get clusterrole view -o yaml
kubectl get clusterrole edit -o yaml
kubectl get clusterrole admin -o yaml
kubectl get clusterrole cluster-admin -o yaml
```
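#### 2.1.2 Choosing an Authentication Method
Kubernetes does not manage users itself; user authentication is handled by external systems:
| Method | Use case | Complexity | Notes |
|---|---|---|---|
| X.509 client certificates | Small teams, operators | Low | Certificates signed by the cluster CA; CN becomes the username and O the group |
| OIDC (OpenID Connect) | Large teams, enterprise SSO | Medium | Integrates with identity providers such as Keycloak/Dex |
| ServiceAccount token | CI/CD, in-cluster access | Low | Native to Kubernetes; every namespace gets a `default` SA automatically |
| Webhook token | Custom authentication | High | Integrates with an in-house authentication system |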
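For X.509 client certificates, the API server takes the username from the subject's CN. It takes one group from each O attribute, since a certificate may carry several O values. It also adds `system:authenticated` for any authenticated user.

The bash sketch below mimics that CN/O mapping on an openssl-style `-subj` string like the one used in 2.2.1. It assumes attribute values contain no `/` characters. It only illustrates the mapping and does not parse real certificates. The second subject is a hypothetical example.

```bash
#!/usr/bin/env bash
# parse_subject SUBJECT: show the username and groups Kubernetes derives from
# an openssl-style subject like "/CN=zhangsan/O=dev-team" (CN -> user,
# every O -> one group). Assumes no "/" inside attribute values.
parse_subject() {
  local user="" group_list="" part
  local -a parts
  IFS='/' read -r -a parts <<< "${1#/}"
  for part in "${parts[@]}"; do
    case "$part" in
      CN=*) user=${part#CN=} ;;
      O=*)  group_list+="${group_list:+,}${part#O=}" ;;
    esac
  done
  echo "user=${user} groups=${group_list}"
}

parse_subject "/CN=zhangsan/O=dev-team"            # user=zhangsan groups=dev-team
parse_subject "/CN=lisi/O=dev-team/O=team-leads"   # user=lisi groups=dev-team,team-leads
```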
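#### 2.1.3 Permission Planning
Recommended role hierarchy for production:
| Role | Scope | Assigned to |
|---|---|---|
| cluster-admin | All operations on all cluster resources | Kubernetes administrators (1-2 people) |
| namespace-admin | All operations in a given namespace | Team leads |
| developer | Read/write on Deployments/Services/ConfigMaps, read-only on Pods, in a given namespace | Developers |
| viewer | Read-only access in a given namespace | Testers, interns |
| ci-deployer | Update permissions on Deployments/Services in a given namespace | CI/CD ServiceAccount |
| log-reader | Read Pod logs in a given namespace | Staff who troubleshoot via logs |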
### 2.2 Core Configuration
#### 2.2.1 Creating User Certificates (X.509)
Create a Kubernetes access certificate for a developer:
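```bash
# 1. Generate a private key
openssl genrsa -out developer-zhangsan.key 2048

# 2. Generate a certificate signing request (CSR)
#    CN=zhangsan becomes the username, O=dev-team becomes the group
openssl req -new -key developer-zhangsan.key \
  -out developer-zhangsan.csr \
  -subj "/CN=zhangsan/O=dev-team"

# 3. Sign the certificate with the Kubernetes CA (valid for 365 days)
# Option 1: sign directly with the CA certificate and key
# (run on a control-plane node that holds ca.key)
openssl x509 -req -in developer-zhangsan.csr \
  -CA /etc/kubernetes/pki/ca.crt \
  -CAkey /etc/kubernetes/pki/ca.key \
  -CAcreateserial \
  -out developer-zhangsan.crt \
  -days 365
```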
```bash
# Option 2: sign through the Kubernetes CertificateSigningRequest API (recommended)
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: zhangsan-csr
spec:
  request: $(base64 < developer-zhangsan.csr | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 31536000  # 365 days
  usages:
  - client auth
EOF

# Approve the CSR
kubectl certificate approve zhangsan-csr

# Retrieve the issued certificate
kubectl get csr zhangsan-csr -o jsonpath='{.status.certificate}' | base64 -d > developer-zhangsan.crt
```
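`spec.expirationSeconds` is plain seconds, so 365 days is 365 * 86400 = 31536000. The API rejects values below 600 (10 minutes). The kube-controller-manager signer will also not issue a certificate longer than its `--cluster-signing-duration`.

A small bash helper for the conversion is shown below; the function name is illustrative. The 90-day validity recommended in 4.1.2 is used as a second example.

```bash
#!/usr/bin/env bash
# csr_expiration_seconds DAYS: value for spec.expirationSeconds in a
# CertificateSigningRequest; refuses values below the API minimum of 600s.
csr_expiration_seconds() {
  local seconds=$(( $1 * 24 * 60 * 60 ))
  if (( seconds < 600 )); then
    echo "expirationSeconds must be at least 600 (got ${seconds})" >&2
    return 1
  fi
  echo "${seconds}"
}

csr_expiration_seconds 365   # 31536000
csr_expiration_seconds 90    # 7776000
```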
#### 2.2.2 Creating the kubeconfig File
```bash
# Build a kubeconfig for zhangsan
CLUSTER_NAME="prod-cluster"
API_SERVER="https://k8s-api-lb:8443"
CA_CERT="/etc/kubernetes/pki/ca.crt"

# Cluster entry
kubectl config set-cluster "${CLUSTER_NAME}" \
  --certificate-authority="${CA_CERT}" \
  --embed-certs=true \
  --server="${API_SERVER}" \
  --kubeconfig=zhangsan-kubeconfig

# User credentials
kubectl config set-credentials zhangsan \
  --client-certificate=developer-zhangsan.crt \
  --client-key=developer-zhangsan.key \
  --embed-certs=true \
  --kubeconfig=zhangsan-kubeconfig

# Context
kubectl config set-context zhangsan-context \
  --cluster="${CLUSTER_NAME}" \
  --namespace=team-backend \
  --user=zhangsan \
  --kubeconfig=zhangsan-kubeconfig

# Make it the default context
kubectl config use-context zhangsan-context \
  --kubeconfig=zhangsan-kubeconfig

# Verify: no role is bound yet, so the request is rejected
kubectl get pods --kubeconfig=zhangsan-kubeconfig
# Error from server (Forbidden): pods is forbidden: User "zhangsan" cannot list resource "pods" ...
```
#### 2.2.3 Defining Roles and ClusterRoles
```yaml
# File: rbac-roles.yaml

# 1. Developer role (namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: team-backend
rules:
# Deployment management
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets", "statefulsets"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
# Pods and logs: read-only (no delete, to prevent accidental deletion);
# pods/exec is deliberately not listed
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
# Services and Ingresses
- apiGroups: [""]
  resources: ["services", "endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
# ConfigMaps read/write; Secrets read-only (no write access; note that read
# access still exposes Secret values)
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch"]
# HPA
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
# Events (read-only)
- apiGroups: [""]
  resources: ["events"]
  verbs: ["get", "list", "watch"]
---
# 2. Read-only role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: viewer
  namespace: team-backend
rules:
# RBAC has no deny rules: a "*" on the core group would also grant Secrets,
# and "list" on secrets returns their data, not just metadata. Secrets are
# therefore excluded by simply not listing them.
- apiGroups: [""]
  resources: ["pods", "pods/log", "services", "endpoints", "configmaps", "events", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps", "batch", "networking.k8s.io", "autoscaling"]
  resources: ["*"]
  verbs: ["get", "list", "watch"]
---
# 3. CI/CD deployment role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-deployer
  namespace: team-backend
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# 4. Namespace administrator (a ClusterRole that can be bound into different namespaces)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-admin
rules:
# When bound with a RoleBinding, these rules apply only inside that namespace,
# so cluster-scoped resources such as nodes are not granted. A core-group "*"
# may still cover the namespace object itself; list resources explicitly if
# deleting the namespace must be ruled out.
- apiGroups: ["", "apps", "batch", "networking.k8s.io", "autoscaling", "policy"]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles", "rolebindings"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```
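Note: the developer role grants neither `delete` on Pods nor `create` on `pods/exec` (exec into a container requires `create`). Giving developers exec access in production is not recommended; when troubleshooting requires it, operations staff should do it.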
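Several kubectl commands are authorized against a Pod subresource rather than the Pod itself. That is why leaving `pods/exec` or `delete` out of a Role is enough to block those actions.

The bash sketch below maps a few common pod commands to the key verb and (sub)resource RBAC checks. It then builds the matching `kubectl auth can-i ... --subresource=...` check. The function names are illustrative and the table covers only these commands. kubectl usually also reads the Pod itself first, so `get` on `pods` is typically needed as well.

```bash
#!/usr/bin/env bash
# required_permission CMD: key verb and resource (core API group) that RBAC
# checks when kubectl runs CMD against a single, named pod.
required_permission() {
  case "$1" in
    logs)         echo "get pods/log" ;;
    exec)         echo "create pods/exec" ;;
    attach)       echo "create pods/attach" ;;
    port-forward) echo "create pods/portforward" ;;
    delete)       echo "delete pods" ;;
    *)            echo "unsupported command: $1" >&2; return 1 ;;
  esac
}

# can_i_command CMD USER NAMESPACE: print the kubectl auth can-i check for CMD
can_i_command() {
  local perm verb resource sub=""
  perm=$(required_permission "$1") || return 1
  read -r verb resource <<< "$perm"
  if [[ "$resource" == */* ]]; then
    sub=" --subresource=${resource#*/}"
    resource=${resource%%/*}
  fi
  echo "kubectl auth can-i ${verb} ${resource}${sub} --namespace=$3 --as=$2"
}

can_i_command exec zhangsan team-backend
# kubectl auth can-i create pods --subresource=exec --namespace=team-backend --as=zhangsan
```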
#### 2.2.4 Creating RoleBindings and ClusterRoleBindings
```yaml
# File: rbac-bindings.yaml

# 1. Bind the developer role to user zhangsan
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: zhangsan-developer
  namespace: team-backend
subjects:
- kind: User
  name: zhangsan
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
---
# 2. Bind the developer role to group dev-team (every user in the group gets it)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-team-developer
  namespace: team-backend
subjects:
- kind: Group
  name: dev-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
---
# 3. Bind the viewer role to testers
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: qa-team-viewer
  namespace: team-backend
subjects:
- kind: Group
  name: qa-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: viewer
  apiGroup: rbac.authorization.k8s.io
---
# 4. CI/CD ServiceAccount and its binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-deployer
  namespace: team-backend
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deployer-binding
  namespace: team-backend
subjects:
- kind: ServiceAccount
  name: ci-deployer
  namespace: team-backend
roleRef:
  kind: Role
  name: ci-deployer
  apiGroup: rbac.authorization.k8s.io
---
# 5. Bind namespace-admin to the team lead (a ClusterRole scoped to one namespace)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-lead-admin
  namespace: team-backend
subjects:
- kind: User
  name: lisi
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
```
```bash
kubectl apply -f rbac-roles.yaml
kubectl apply -f rbac-bindings.yaml

# Check zhangsan's permissions
kubectl auth can-i get pods --namespace=team-backend --as=zhangsan
# yes
kubectl auth can-i delete pods --namespace=team-backend --as=zhangsan
# no
kubectl auth can-i get pods --namespace=kube-system --as=zhangsan
# no
```
#### 2.2.5 Managing ServiceAccount Tokens
```bash
# Since 1.24, Kubernetes no longer creates a long-lived token Secret for each
# ServiceAccount automatically; tokens must be requested explicitly and expire

# Create a short-lived token (valid for 1 hour)
kubectl create token ci-deployer -n team-backend --duration=1h

# Create a long-lived token Secret (not recommended, but CI/CD sometimes needs it)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: ci-deployer-token
  namespace: team-backend
  annotations:
    kubernetes.io/service-account.name: ci-deployer
type: kubernetes.io/service-account-token
EOF

# Read the token
kubectl get secret ci-deployer-token -n team-backend -o jsonpath='{.data.token}' | base64 -d

# Build a kubeconfig from the token
TOKEN=$(kubectl get secret ci-deployer-token -n team-backend -o jsonpath='{.data.token}' | base64 -d)

kubectl config set-cluster prod-cluster \
  --certificate-authority=/etc/kubernetes/pki/ca.crt \
  --embed-certs=true \
  --server=https://k8s-api-lb:8443 \
  --kubeconfig=ci-deployer-kubeconfig

kubectl config set-credentials ci-deployer \
  --token="${TOKEN}" \
  --kubeconfig=ci-deployer-kubeconfig

kubectl config set-context ci-context \
  --cluster=prod-cluster \
  --namespace=team-backend \
  --user=ci-deployer \
  --kubeconfig=ci-deployer-kubeconfig

kubectl config use-context ci-context \
  --kubeconfig=ci-deployer-kubeconfig
```
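Warning: a long-lived token Secret never expires, so an attacker who obtains it can use it indefinitely, until the Secret or the ServiceAccount is deleted. In production, prefer short-lived tokens with a refresh mechanism, or authenticate via OIDC.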
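You can tell the two token types apart from the token itself. ServiceAccount tokens are JWTs. Those issued through the TokenRequest API (`kubectl create token`) carry an `exp` claim, while legacy Secret-based tokens have none.

The bash sketch below decodes the JWT payload locally and prints `exp` as Unix seconds, or `none`. It does not verify the signature, so treat it as a diagnostic only. It assumes `base64 -d`, `cut`, `tr`, and `sed` are available, and that the payload is compact JSON as Kubernetes emits it. The function names are illustrative.

```bash
#!/usr/bin/env bash
# jwt_payload TOKEN: base64url-decode the payload (second segment) of a JWT.
# No signature check is done; use only for local inspection.
jwt_payload() {
  local payload
  payload=$(cut -d '.' -f 2 <<< "$1" | tr '_-' '/+')
  # base64url drops the "=" padding; restore it before decoding
  while (( ${#payload} % 4 != 0 )); do
    payload+="="
  done
  base64 -d <<< "$payload"
}

# token_exp TOKEN: print the "exp" claim (Unix seconds) or "none"
token_exp() {
  local exp
  exp=$(jwt_payload "$1" | sed -n 's/.*"exp":[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  echo "${exp:-none}"
}

# Usage against a live cluster (GNU date for the conversion):
#   exp=$(token_exp "$(kubectl create token ci-deployer -n team-backend --duration=1h)")
#   [ "$exp" = none ] && echo "no expiry" || date -d "@${exp}"
```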
### 2.3 Applying and Verifying
#### 2.3.1 Permission Checks
```bash
# Verify permissions with kubectl auth can-i
# (--as impersonation requires the impersonate permission, e.g. cluster-admin)

# Check the current user's permissions
kubectl auth can-i --list --namespace=team-backend

# Check permissions as a specific user
kubectl auth can-i create deployments --namespace=team-backend --as=zhangsan
kubectl auth can-i delete pods --namespace=team-backend --as=zhangsan
kubectl auth can-i get secrets --namespace=team-backend --as=zhangsan
kubectl auth can-i get nodes --as=zhangsan

# Check permissions as a ServiceAccount
kubectl auth can-i update deployments --namespace=team-backend \
  --as=system:serviceaccount:team-backend:ci-deployer

# List all permissions of a user
kubectl auth can-i --list --namespace=team-backend --as=zhangsan
```
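The string passed to `--as` above follows a fixed pattern. The API server authenticates a ServiceAccount as `system:serviceaccount:<namespace>:<name>`. It also puts the account in the groups `system:serviceaccounts` and `system:serviceaccounts:<namespace>`, which can be used as Group subjects in bindings.

Below is a small bash sketch for building and splitting such names; the helper names are illustrative.

```bash
#!/usr/bin/env bash
# sa_username NAMESPACE NAME: username the API server assigns to a ServiceAccount
sa_username() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# sa_groups NAMESPACE: groups every ServiceAccount in NAMESPACE belongs to
sa_groups() {
  printf 'system:serviceaccounts system:serviceaccounts:%s\n' "$1"
}

# split_sa_username USERNAME: print "NAMESPACE NAME"; fail for non-SA users
split_sa_username() {
  local rest
  [[ "$1" == system:serviceaccount:*:* ]] || return 1
  rest=${1#system:serviceaccount:}
  echo "${rest%%:*} ${rest#*:}"
}

# Example (the same check as above, built from namespace and name):
#   kubectl auth can-i update deployments --namespace=team-backend \
#     --as="$(sa_username team-backend ci-deployer)"
split_sa_username system:serviceaccount:team-backend:ci-deployer   # team-backend ci-deployer
```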
#### 2.3.2 Hands-on Verification
```bash
# Operate with zhangsan's kubeconfig
export KUBECONFIG=zhangsan-kubeconfig

# These should succeed
kubectl get pods -n team-backend
kubectl get deployments -n team-backend
kubectl get services -n team-backend

# These should fail with a Forbidden error
kubectl delete pod <pod-name> -n team-backend
kubectl get pods -n kube-system
kubectl get nodes

# Switch back to the admin kubeconfig
unset KUBECONFIG
```
#### 2.3.3 Audit Log Verification
```bash
# Show zhangsan's recent operations from the audit log
# (the path is whatever --audit-log-path is set to on the apiserver)
jq -c 'select(.user.username=="zhangsan") | {timestamp: .requestReceivedTimestamp, user: .user.username, verb: .verb, resource: .objectRef.resource, namespace: .objectRef.namespace}' \
  /var/log/kubernetes/audit.log | tail -20
```
## 3. Example Code and Configuration
### 3.1 Complete Configuration Examples
#### 3.1.1 Complete Multi-Team RBAC Setup
```yaml
# File: multi-team-rbac.yaml

# ===== Namespaces =====
apiVersion: v1
kind: Namespace
metadata:
  name: team-backend
  labels:
    team: backend
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-frontend
  labels:
    team: frontend
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-data
  labels:
    team: data
---
# ===== Shared ClusterRole =====
# Cluster-wide read-only access (node status, etc.)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-viewer
rules:
- apiGroups: [""]
  resources: ["nodes", "namespaces"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["nodes", "pods"]
  verbs: ["get", "list"]
---
# Every developer group can view basic cluster information
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: all-developers-cluster-view
subjects:
- kind: Group
  name: dev-team
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: frontend-team
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: data-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-viewer
  apiGroup: rbac.authorization.k8s.io
```
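The ClusterRoleBinding above grants only cluster-wide read access, so each team still needs a namespace-scoped binding. A RoleBinding may reference a ClusterRole, which means one role definition can be reused across namespaces.

The bash sketch below renders one RoleBinding per team namespace; pipe its output to `kubectl apply -f -`. It assumes the built-in `edit` ClusterRole and the namespace/group pairs from the file above. The function name and the `<group>-<role>` binding name are illustrative.

```bash
#!/usr/bin/env bash
# render_rolebinding NAMESPACE GROUP CLUSTERROLE: print a RoleBinding that
# grants CLUSTERROLE to GROUP, limited to NAMESPACE.
render_rolebinding() {
  local ns=$1 group=$2 role=$3
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${group}-${role}
  namespace: ${ns}
subjects:
- kind: Group
  name: ${group}
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: ${role}
  apiGroup: rbac.authorization.k8s.io
EOF
}

# One ClusterRole, one RoleBinding per team namespace
for pair in team-backend:dev-team team-frontend:frontend-team team-data:data-team; do
  echo "---"
  render_rolebinding "${pair%%:*}" "${pair#*:}" edit
done
```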
```yaml
# File: team-backend-rbac.yaml
# Namespace-scoped permissions for the backend team
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backend-developers
  namespace: team-backend
subjects:
- kind: Group
  name: dev-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
---
# The backend team lead gets namespace-admin
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backend-lead-admin
  namespace: team-backend
subjects:
- kind: User
  name: lisi
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
```
#### 3.1.2 User Certificate Creation Script
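```bash
#!/bin/bash
# File: /opt/scripts/create-k8s-user.sh
# Create a Kubernetes user certificate and kubeconfig (run once per user)
set -euo pipefail

USERNAME="${1:?Usage: create-k8s-user.sh <username> <group> <namespace>}"
GROUP="${2:?}"
NAMESPACE="${3:?}"
DAYS=365
CLUSTER_NAME="prod-cluster"
API_SERVER="https://k8s-api-lb:8443"
CA_CERT="/etc/kubernetes/pki/ca.crt"
CA_KEY="/etc/kubernetes/pki/ca.key"
OUTPUT_DIR="/data/k8s-users/${USERNAME}"

mkdir -p "${OUTPUT_DIR}"
echo "Creating K8s user: ${USERNAME}, Group: ${GROUP}, Namespace: ${NAMESPACE}"

# Generate the private key
openssl genrsa -out "${OUTPUT_DIR}/${USERNAME}.key" 2048

# Generate the CSR
openssl req -new \
  -key "${OUTPUT_DIR}/${USERNAME}.key" \
  -out "${OUTPUT_DIR}/${USERNAME}.csr" \
  -subj "/CN=${USERNAME}/O=${GROUP}"

# Sign the certificate
openssl x509 -req \
  -in "${OUTPUT_DIR}/${USERNAME}.csr" \
  -CA "${CA_CERT}" \
  -CAkey "${CA_KEY}" \
  -CAcreateserial \
  -out "${OUTPUT_DIR}/${USERNAME}.crt" \
  -days "${DAYS}"

# Build the kubeconfig
kubectl config set-cluster "${CLUSTER_NAME}" \
  --certificate-authority="${CA_CERT}" \
  --embed-certs=true \
  --server="${API_SERVER}" \
  --kubeconfig="${OUTPUT_DIR}/${USERNAME}-kubeconfig"

kubectl config set-credentials "${USERNAME}" \
  --client-certificate="${OUTPUT_DIR}/${USERNAME}.crt" \
  --client-key="${OUTPUT_DIR}/${USERNAME}.key" \
  --embed-certs=true \
  --kubeconfig="${OUTPUT_DIR}/${USERNAME}-kubeconfig"

kubectl config set-context "${USERNAME}-context" \
  --cluster="${CLUSTER_NAME}" \
  --namespace="${NAMESPACE}" \
  --user="${USERNAME}" \
  --kubeconfig="${OUTPUT_DIR}/${USERNAME}-kubeconfig"

kubectl config use-context "${USERNAME}-context" \
  --kubeconfig="${OUTPUT_DIR}/${USERNAME}-kubeconfig"

# Remove the CSR file
rm -f "${OUTPUT_DIR}/${USERNAME}.csr"

echo "Kubeconfig created: ${OUTPUT_DIR}/${USERNAME}-kubeconfig"
echo "Certificate expires: $(openssl x509 -in "${OUTPUT_DIR}/${USERNAME}.crt" -noout -enddate)"
echo ""
echo "Next step: Create RoleBinding for ${USERNAME} in namespace ${NAMESPACE}"
```

### 3.2 Practical Use Cases

#### Case 1: Least-Privilege Setup for a CI/CD Pipeline

**Scenario**: A GitLab CI pipeline deploys applications to the Kubernetes cluster. It may only update Deployment image versions in one specific namespace. It must not delete resources and must not access other namespaces.

**Implementation**:

```yaml
# File: ci-cd-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-ci-deployer
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-deploy-role
  namespace: production
rules:
# Deployments: update only (no create or delete)
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update", "patch"]
# Pods: read-only (to check the rollout result)
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
# ConfigMaps and Secrets (a deployment may need to update configuration)
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
# Services and Ingresses: read-only
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
# Events: read-only (for troubleshooting deployments)
- apiGroups: [""]
  resources: ["events"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gitlab-ci-deployer-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: gitlab-ci-deployer
  namespace: production
roleRef:
  kind: Role
  name: ci-deploy-role
  apiGroup: rbac.authorization.k8s.io
```

Usage in GitLab CI:

```yaml
# .gitlab-ci.yml
deploy:
  stage: deploy
  image:
    name: bitnami/kubectl:1.28
    entrypoint: [""]   # the image's entrypoint is kubectl; override it so the runner can run the script
  script:
    - kubectl config set-cluster prod --server=$K8S_API_SERVER --certificate-authority=$K8S_CA_CERT
    - kubectl config set-credentials deployer --token=$K8S_TOKEN
    - kubectl config set-context deploy-ctx --cluster=prod --user=deployer --namespace=production
    - kubectl config use-context deploy-ctx
    - kubectl set image deployment/myapp myapp=registry.company.com/myapp:$CI_COMMIT_SHORT_SHA -n production
    - kubectl rollout status deployment/myapp -n production --timeout=300s
```

#### Case 2: RBAC Permission Audit Report

**Scenario**: Security audits require periodic checks of the cluster's RBAC configuration to find users and ServiceAccounts with excessive permissions.

**Implementation script**:

```bash
#!/bin/bash
# File: /opt/scripts/rbac-audit.sh
# RBAC permission audit script
set -euo pipefail

REPORT_FILE="/data/reports/rbac-audit-$(date +%Y%m%d).txt"
mkdir -p /data/reports

{
echo "=========================================="
echo " RBAC Audit Report - $(date)"
echo "=========================================="
echo ""

echo "=== 1. Holders of cluster-admin ==="
echo "(These users/SAs have full control of the cluster; keep the list as short as possible)"
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(.roleRef.name=="cluster-admin") | .subjects[]? | "\(.kind): \(.name) (namespace: \(.namespace // "cluster-wide"))"'
echo ""

echo "=== 2. Non-default ServiceAccounts that can read Secrets ==="
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  for sa in $(kubectl get sa -n "$ns" -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do
    if [[ "$sa" != "default" ]] && kubectl auth can-i get secrets -n "$ns" --as="system:serviceaccount:${ns}:${sa}" 2>/dev/null | grep -q "yes"; then
      echo "  ${ns}/${sa} - can read secrets"
    fi
  done
done
echo ""

echo "=== 3. Roles/ClusterRoles with wildcard permissions ==="
echo "(Roles that use the * wildcard are overly broad)"
kubectl get roles -A -o json | \
  jq -r '.items[] | select(any(.rules[]?; any(.verbs[]?; . == "*") or any(.resources[]?; . == "*"))) | "\(.metadata.namespace)/\(.metadata.name)"'
kubectl get clusterroles -o json | \
  jq -r '.items[] | select(.metadata.name | startswith("system:") | not) | select(any(.rules[]?; any(.verbs[]?; . == "*") or any(.resources[]?; . == "*"))) | .metadata.name'
echo ""

echo "=== 4. ServiceAccounts without any role binding ==="
# Fetch all bindings once instead of once per ServiceAccount
ALL_BINDINGS=$(kubectl get rolebindings,clusterrolebindings -A -o json)
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  for sa in $(kubectl get sa -n "$ns" -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do
    if [[ "$sa" == "default" ]]; then
      continue
    fi
    bindings=$(jq -r --arg sa "$sa" --arg ns "$ns" \
      '.items[] | select(any(.subjects[]?; .kind=="ServiceAccount" and .name==$sa and .namespace==$ns)) | .metadata.name' \
      <<< "$ALL_BINDINGS")
    if [[ -z "$bindings" ]]; then
      echo "  ${ns}/${sa} - no role bindings"
    fi
  done
done
echo ""

echo "=== 5. Long-lived token Secrets ==="
kubectl get secrets -A -o json | \
  jq -r '.items[] | select(.type=="kubernetes.io/service-account-token") | "\(.metadata.namespace)/\(.metadata.name) -> SA: \(.metadata.annotations["kubernetes.io/service-account.name"])"'
echo ""
echo "=========================================="
echo " Audit completed at $(date)"
echo "=========================================="
} > "${REPORT_FILE}"

echo "Report saved to: ${REPORT_FILE}"
```

---

## 4. Best Practices and Caveats

### 4.1 Best Practices

#### 4.1.1 Performance Optimization

- **Reduce the number of ClusterRoleBindings**: the RBAC authorizer walks the ClusterRoleBindings on every request. With hundreds of them, this can show up as extra authorization latency on the apiserver. Prefer binding Groups over binding individual Users.

```yaml
# Not recommended: one binding per user
# 10 users = 10 RoleBindings

# Recommended: bind a Group once
# 10 users in the same Group = 1 RoleBinding
subjects:
- kind: Group
  name: dev-team
  apiGroup: rbac.authorization.k8s.io
```

- **Avoid wildcards in Roles**: `resources: ["*"]` and `verbs: ["*"]` match every resource and operation. That grants far too much, and it also silently covers resources, subresources, and verbs added later (for example, new CRDs). List the resources and verbs you actually need.
- **Use aggregated ClusterRoles**: the built-in view, edit, and admin roles support aggregation. Permissions for custom CRDs can be merged into them automatically through a label, so no separate binding is needed.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: myapp-crd-viewer
  labels:
    # Aggregated automatically into the built-in view role
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: ["myapp.company.com"]
  resources: ["myresources"]
  verbs: ["get", "list", "watch"]
```

#### 4.1.2 Security Hardening

- **Stop the default ServiceAccount from auto-mounting its token**: by default, every Pod that runs as a namespace's `default` SA gets that SA's token mounted. If the Pod is compromised, an attacker can use the token to call the API.

```yaml
# Option 1: disable it on the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: production
automountServiceAccountToken: false
---
# Option 2: disable it on the Pod (Pod spec fragment)
spec:
  automountServiceAccountToken: false
  containers:
  - name: app
    image: myapp:v1.0
```

- **Rotate certificates and tokens regularly**: give user certificates a reasonable validity period (90 days is recommended), because Kubernetes cannot revoke an issued client certificate. Use short-lived ServiceAccount tokens (1-24 hours) together with a token refresh mechanism.

```bash
# Check the expiry date of every user certificate
for crt in /data/k8s-users/*/*.crt; do
  echo "${crt}: $(openssl x509 -in "${crt}" -noout -enddate)"
done
```

- **Scope ServiceAccount token audiences**: tokens issued through the TokenRequest API (`kubectl create token`) are bound tokens. They expire, are tied to the ServiceAccount, and by default are issued for the API server's own audience. Set `--audience` only for tokens meant for another service. The API server rejects tokens whose audience is not in its `--api-audiences`, so such tokens cannot be replayed against it.

```bash
# Token for an external consumer; "vault" is a placeholder audience
kubectl create token ci-deployer -n production \
  --audience=vault \
  --duration=1h
```

#### 4.1.3 High Availability

- **HA option 1**: manage RBAC with GitOps. Store all Role/RoleBinding/ClusterRole/ClusterRoleBinding objects in a Git repository and sync them automatically with ArgoCD.
- **HA option 2**: integrate OIDC with the enterprise SSO (Keycloak/Azure AD). Users are managed in the SSO, and Kubernetes only handles authorization.
- **Backup strategy**: export the YAML of all RBAC resources daily and keep 30 days of history.

### 4.2 Caveats

#### 4.2.1 Configuration Caveats

**Warning**: a misconfigured RBAC setup can lock users out of the cluster or grant excessive permissions. Always verify changes with `kubectl auth can-i` before relying on them.

- **Note**: a RoleBinding can only reference a Role in its own namespace, but it can also reference a ClusterRole. In that case, the ClusterRole's permissions are limited to the RoleBinding's namespace. This is a common technique: define a ClusterRole once and reuse it in many namespaces through RoleBindings.
- **Note**: deleting a namespace deletes all Roles and RoleBindings in it, but not ClusterRoles or ClusterRoleBindings. After recreating the namespace, you must recreate its RoleBindings.
- **Note**: the `--as` flag of `kubectl auth can-i` impersonates a user and does not check whether that user exists. Even for a nonexistent user, it returns yes as long as a matching RoleBinding exists.

#### 4.2.2 Common Errors

| Symptom | Cause | Fix |
|---|---|---|
| `forbidden: User "xxx" cannot get resource "pods"` | The user has no read permission on pods in that namespace | Create a Role and RoleBinding that grant it |
| RoleBinding created but permissions don't take effect | Typo in the user or SA name in the RoleBinding's subjects | Inspect the subjects with `kubectl get rolebinding <name> -o yaml` |
| ServiceAccount token fails to authenticate | The token expired, or the SA was deleted and recreated (old tokens are invalid) | Create a new token |
| `cannot create resource "roles" in API group ...` | The user lacks permission to manage RBAC resources | Use cluster-admin or a role that can manage RBAC |
| User can access namespaces they shouldn't | A ClusterRoleBinding was used instead of a RoleBinding | Switch to a RoleBinding scoped to the specific namespace |
| Pod cannot access the API | automountServiceAccountToken is set to false | Confirm the Pod needs API access, then enable token mounting for that Pod/SA or mount a token manually |

#### 4.2.3 Compatibility

- **Version compatibility**: the RBAC API `rbac.authorization.k8s.io/v1` has been GA since 1.8 and is supported by all modern Kubernetes versions.
- **Platform compatibility**: managed Kubernetes offerings from cloud providers usually add their own IAM integration, so you need to configure both cloud IAM and Kubernetes RBAC.
- **Component dependencies**: OIDC authentication requires apiserver flags such as `--oidc-issuer-url`. In kubeadm clusters, set them in the apiserver static Pod manifest.

---

## 5. Troubleshooting and Monitoring

### 5.1 Troubleshooting

#### 5.1.1 Viewing Logs

```bash
# Authorization failures (HTTP 403) in the apiserver audit log
jq -c 'select(.responseStatus.code==403) | {time: .requestReceivedTimestamp, user: .user.username, verb: .verb, resource: .objectRef.resource, ns: .objectRef.namespace}' \
  /var/log/kubernetes/audit.log | tail -20

# Authentication failures in the apiserver log (run on a control-plane node;
# the apiserver runs as a static Pod, so its log is not in the kubelet journal)
crictl logs $(crictl ps --name kube-apiserver -q) 2>&1 | grep "Unable to authenticate"

# Operations by a specific user
jq -c 'select(.user.username=="zhangsan") | {time: .requestReceivedTimestamp, verb: .verb, resource: .objectRef.resource, code: .responseStatus.code}' \
  /var/log/kubernetes/audit.log | tail -30
```

#### 5.1.2 Common Problems

**Problem 1: a user gets 403 Forbidden**

```bash
# Diagnostics
kubectl auth can-i <verb> <resource> -n <namespace> --as=<user>
kubectl get rolebindings -n <namespace> -o json | \
  jq '.items[] | select(.subjects[]?.name=="<user>") | .metadata.name'
```

**Fix**:
1. Confirm the username is correct (the certificate's CN field)
2. Confirm the RoleBinding is in the correct namespace
3. Confirm the Role contains the required apiGroups, resources, and verbs
4. Use `kubectl auth can-i --list --as=<user> -n <namespace>` to view the user's full permission list

**Problem 2: ServiceAccount token authentication fails**

```bash
# Diagnostics
kubectl get sa <sa-name> -n <namespace>
kubectl get secret -n <namespace> | grep <sa-name>

# Check whether a fresh token works
TOKEN=$(kubectl create token <sa-name> -n <namespace>)
curl --cacert /etc/kubernetes/pki/ca.crt -H "Authorization: Bearer ${TOKEN}" \
  https://k8s-api-lb:8443/api/v1/namespaces/<namespace>/pods
```

**Fix**:
- The SA does not exist: recreate the ServiceAccount
- The token expired: tokens issued on 1.24+ are time-limited, so create a new one
- The SA was deleted and recreated: old tokens are invalid, so obtain a new token

**Problem 3: users cannot log in after OIDC is configured**

- **Symptom**: kubectl reports `Unauthorized`
- **Investigation**:

```bash
# Check the apiserver's OIDC flags
ps aux | grep kube-apiserver | grep oidc

# Check that the OIDC issuer's discovery document is reachable
curl -s https://keycloak.company.com/realms/k8s/.well-known/openid-configuration

# Check the apiserver logs
crictl logs $(crictl ps --name kube-apiserver -q) 2>&1 | grep -i oidc
```

- **Fix**: make sure the apiserver's `--oidc-issuer-url`, `--oidc-client-id`, and `--oidc-username-claim` are set correctly

#### 5.1.3 Debugging Commands

```bash
# Full permission list of a user
kubectl auth can-i --list --as=zhangsan -n team-backend

# Permissions of a ServiceAccount
kubectl auth can-i --list --as=system:serviceaccount:team-backend:ci-deployer -n team-backend

# Detailed rules of a Role
kubectl get role developer -n team-backend -o yaml

# Who is bound to a given ClusterRole
kubectl get clusterrolebindings -o json | \
  jq '.items[] | select(.roleRef.name=="cluster-admin") | {name: .metadata.name, subjects: .subjects}'

# Simulate a request, including group membership
kubectl auth can-i create deployments -n team-backend --as=zhangsan --as-group=dev-team
```

### 5.2 Performance Monitoring

#### 5.2.1 Key Metrics

```bash
# Metric names vary between Kubernetes versions; first list what this apiserver exposes
kubectl get --raw /metrics | grep -E '^# HELP [a-z_]*(authentication|authorization)'

# Authorization latency
kubectl get --raw /metrics | grep apiserver_authorization_duration
# Authentication attempts (also matches the unprefixed authentication_attempts)
kubectl get --raw /metrics | grep authentication_attempts
# Authorization decisions
kubectl get --raw /metrics | grep apiserver_authorization_decisions_total
```

#### 5.2.2 Metric Reference

| Metric | Alert threshold | Notes |
|---|---|---|
| Authorization latency (P99) | > 10ms | Too many RBAC rules increase latency |
| Authentication failures | > 50 per hour | Frequent failures may indicate brute-force attempts |
| Share of 403 responses | > 10% | A high share points to misconfigured permissions or abnormal access |
| Number of cluster-admin bindings | > 10 | cluster-admin must be tightly controlled |
| Number of long-lived tokens | Abnormal growth (normal: as needed) | Clean up long-lived tokens regularly |

#### 5.2.3 Prometheus Alerting Rules

```yaml
# File: rbac-alerts.yaml
# Metric names (apiserver_* and kube-state-metrics) vary by version;
# verify them against /metrics before deploying these rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rbac-alerts
  namespace: monitoring
spec:
  groups:
  - name: rbac-security
    rules:
    - alert: HighAuthenticationFailureRate
      expr: |
        increase(apiserver_authentication_attempts{result="failure"}[1h]) > 50
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High authentication failure rate detected"
        description: "More than 50 auth failures in the last hour"
    - alert: UnauthorizedAccessAttempts
      expr: |
        increase(apiserver_authorization_decisions_total{decision="forbid"}[1h]) > 100
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High number of unauthorized access attempts"
    - alert: NewClusterAdminBinding
      # Requires kube-state-metrics; matches bindings whose name contains "admin".
      # Compares the current count with the count one hour ago, because an info
      # metric's value does not change when a new binding appears.
      expr: |
        count(kube_clusterrolebinding_info{clusterrolebinding=~".*admin.*"})
          > count(kube_clusterrolebinding_info{clusterrolebinding=~".*admin.*"} offset 1h)
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "New cluster-admin binding detected"
    - alert: ServiceAccountTokenLeakSuspicion
      expr: |
        increase(apiserver_authentication_attempts{result="success"}[5m]) > 1000
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Unusually high API authentication rate, possible token leak"
```

### 5.3 Backup and Restore

#### 5.3.1 Backup Strategy

```bash
#!/bin/bash
# RBAC configuration backup script
# File: /opt/scripts/rbac-backup.sh
set -euo pipefail

# Compute the date once so all paths agree even if the script runs across midnight
DATE=$(date +%Y%m%d)
BACKUP_DIR="/data/rbac-backup/${DATE}"
mkdir -p "${BACKUP_DIR}"

# Back up all RBAC resources
kubectl get roles -A -o yaml > "${BACKUP_DIR}/roles.yaml"
kubectl get clusterroles -o yaml > "${BACKUP_DIR}/clusterroles.yaml"
kubectl get rolebindings -A -o yaml > "${BACKUP_DIR}/rolebindings.yaml"
kubectl get clusterrolebindings -o yaml > "${BACKUP_DIR}/clusterrolebindings.yaml"
kubectl get serviceaccounts -A -o yaml > "${BACKUP_DIR}/serviceaccounts.yaml"

# Compress
tar czf "/data/rbac-backup/rbac-${DATE}.tar.gz" -C /data/rbac-backup "${DATE}"
rm -rf "${BACKUP_DIR}"

# Delete backups older than 30 days
find /data/rbac-backup -name "rbac-*.tar.gz" -mtime +30 -delete

echo "[$(date)] RBAC backup completed"
```

#### 5.3.2 Restore Procedure

1. **Pause changes**: notify all users to suspend operations
2. **Restore the data**: extract the archive (`tar xzf /data/rbac-backup/rbac-YYYYMMDD.tar.gz -C /tmp`), then apply the roles and bindings: `kubectl apply -f /tmp/YYYYMMDD/clusterroles.yaml -f /tmp/YYYYMMDD/roles.yaml -f /tmp/YYYYMMDD/clusterrolebindings.yaml -f /tmp/YYYYMMDD/rolebindings.yaml`. If apply rejects objects because of server-populated fields such as `resourceVersion` or `uid`, remove those fields first.
3. **Verify**: run `kubectl get roles -A` and `kubectl get rolebindings -A` to confirm the restore
4. **Resume**: notify users that operations can resume, and verify that their permissions work

---

## 6. Summary

### 6.1 Key Takeaways

- **Point 1**: RBAC rests on four resource objects. Role defines namespace-scoped permissions and ClusterRole defines cluster-scoped permissions. RoleBinding and ClusterRoleBinding bind those permissions to users, groups, or ServiceAccounts. Understanding how the four relate is the foundation of any RBAC setup.
- **Point 2**: least privilege is the first rule of RBAC. Developers get no `delete` on Pods, the CI/CD ServiceAccount gets only what it needs to roll out Deployments, and testers get read-only access. It is better to add permissions when they turn out to be missing than to grant broad permissions up front.
- **Point 3**: bind Groups instead of individual Users, so one RoleBinding for a Group covers every user in it. The certificate's O field or the OIDC groups claim serves as the group name, which keeps management efficient.
- **Point 4**: since 1.24, Kubernetes no longer auto-creates long-lived ServiceAccount tokens. In production, use short-lived tokens (`kubectl create token`) with a refresh mechanism to avoid the risk of a leaked long-lived token.
- **Point 5**: manage RBAC with GitOps. Keep all Role/ClusterRole/RoleBinding/ClusterRoleBinding objects in Git and sync them with ArgoCD. Review permissions regularly against the audit logs to prevent permission creep.

### 6.2 Further Learning

1. **OIDC integration with enterprise SSO**: use Keycloak/Dex/Azure AD for unified identity, so users are managed in the SSO and Kubernetes only handles authorization. This is essential for large teams.
   - Resource: Kubernetes OIDC authentication documentation
   - Suggestion: start with Dex on a test cluster to understand how OIDC tokens are issued and validated, then integrate with the enterprise SSO
2. **OPA/Gatekeeper policy engine**: RBAC controls "who can perform which operations", while OPA controls "whether the content of an operation is compliant". Used together, they form a complete security policy.
   - Resource: the Gatekeeper project
   - Suggestion: start with simple policies, such as banning the `latest` image tag or enforcing resource limits
3. **In-depth Kubernetes audit log analysis**: configure the apiserver audit policy to log at different levels by resource type and operation. Ship the logs to ELK or Loki for analysis and alerting.

### 6.3 References

- Kubernetes RBAC documentation: complete RBAC configuration reference
- Kubernetes authentication documentation: details of each authentication method
- kubectl auth can-i: permission checking tool
- Kubernetes auditing documentation: audit log configuration guide

---

## Appendix

### A. Command Cheat Sheet

```bash
# Role management
kubectl get roles -A                                   # Roles in all namespaces
kubectl get clusterroles                               # ClusterRoles
kubectl get role <role> -n <namespace> -o yaml         # Detailed rules of a Role
kubectl create role <role> --verb=get,list --resource=pods -n <namespace>   # Create a Role from the command line
kubectl create clusterrole <clusterrole> --verb=get,list --resource=nodes   # Create a ClusterRole

# Binding management
kubectl get rolebindings -A                            # RoleBindings in all namespaces
kubectl get clusterrolebindings                        # ClusterRoleBindings
kubectl create rolebinding <binding> --role=<role> --user=<user> -n <namespace>           # Create a RoleBinding
kubectl create clusterrolebinding <binding> --clusterrole=<clusterrole> --user=<user>     # Create a ClusterRoleBinding

# Permission checks
kubectl auth can-i <verb> <resource> -n <namespace>                    # As the current user
kubectl auth can-i <verb> <resource> -n <namespace> --as=<user>        # As another user
kubectl auth can-i --list -n <namespace> --as=<user>                   # Full permission list of a user
kubectl auth can-i <verb> <resource> --as=system:serviceaccount:<namespace>:<sa-name>   # As a ServiceAccount

# ServiceAccount management
kubectl get sa -A                                      # All ServiceAccounts
kubectl create sa <sa-name> -n <namespace>             # Create a ServiceAccount
kubectl create token <sa-name> -n <namespace> --duration=1h   # Create a short-lived token

# Certificate management
kubectl get csr                                        # Certificate signing requests
kubectl certificate approve <csr-name>                 # Approve a CSR
kubectl certificate deny <csr-name>                    # Deny a CSR

# Auditing and troubleshooting
kubectl get events -A | grep -i forbidden              # Controller events mentioning "forbidden"; API denials themselves appear only in the audit log
kubectl describe rolebinding <binding> -n <namespace>  # Binding details
```

### B. Configuration Reference

**Role/ClusterRole `rules` fields**:

| Field | Description | Example |
|---|---|---|
| `apiGroups` | API groups; the empty string `""` means the core API | `[""]` = Pod/Service, `["apps"]` = Deployment |
| `resources` | Resource types | `["pods", "pods/log", "pods/exec"]` |
| `verbs` | Operation verbs | `["get", "list", "watch", "create", "update", "patch", "delete"]` |
| `resourceNames` | Restrict to specific resource names (optional; cannot restrict `create` or `deletecollection`) | `["my-configmap"]` allows operating only on that ConfigMap |

**Common apiGroups**:

| apiGroup | Resources |
|---|---|
| `""` (core) | pods, services, configmaps, secrets, namespaces, nodes, endpoints, events, persistentvolumeclaims |
| `apps` | deployments, replicasets, statefulsets, daemonsets |
| `batch` | jobs, cronjobs |
| `networking.k8s.io` | ingresses, networkpolicies |
| `rbac.authorization.k8s.io` | roles, rolebindings, clusterroles, clusterrolebindings |
| `autoscaling` | horizontalpodautoscalers |
| `policy` | poddisruptionbudgets |
| `storage.k8s.io` | storageclasses, volumeattachments |
| `certificates.k8s.io` | certificatesigningrequests |

**Built-in ClusterRoles**:

| ClusterRole | Scope | Use case |
|---|---|---|
| `cluster-admin` | All operations on all cluster resources | Kubernetes administrators; keep their number small |
| `admin` | All resources in a namespace (no write access to ResourceQuota or the namespace itself) | Team leads |
| `edit` | Read/write on most resources in a namespace (no Roles/RoleBindings) | Developers |
| `view` | Read-only on most resources in a namespace (no Secrets) | Testers, read-only users |
| `system:node` | Permissions needed by kubelets (normally granted through the Node authorizer) | Do not modify manually |

**RoleBinding `subjects` fields**:

| Kind | Description | name format |
|---|---|---|
| `User` | A user; maps to the certificate CN or the OIDC username claim | `zhangsan` |
| `Group` | A group; maps to the certificate O field or the OIDC groups claim | `dev-team` |
| `ServiceAccount` | A service account; requires a namespace | `ci-deployer` (with the `namespace` field also set) |

### C. Glossary

| Term | Explanation |
|---|---|
| RBAC (Role-Based Access Control) | Role-based access control, the default Kubernetes authorization mode |
| Role | A namespace-scoped permission definition containing a set of API operation rules |
| ClusterRole | A cluster-scoped permission definition that can apply to all namespaces or to cluster resources |
| RoleBinding | Binds a Role or ClusterRole to users/groups/SAs within a specific namespace |
| ClusterRoleBinding | Binds a ClusterRole to users/groups/SAs across the whole cluster |
| ServiceAccount | Kubernetes' native identity, used for in-Pod API access or CI/CD authentication |
| CSR (Certificate Signing Request) | A certificate signing request, used to issue user certificates through the Kubernetes API |
| OIDC (OpenID Connect) | An identity protocol built on OAuth 2.0, used to integrate with enterprise SSO |
| CN (Common Name) | The common name field of an X.509 certificate; Kubernetes uses it as the username |
| O (Organization) | The organization field of an X.509 certificate; Kubernetes uses it as the group name |
| Audit Log | The apiserver's log of API operations, used for security audits and troubleshooting |
| Token | The authentication credential of a ServiceAccount; Kubernetes 1.24+ uses short-lived tokens by default |
| Aggregated ClusterRole | A ClusterRole whose permissions are merged automatically from other ClusterRoles via labels; the built-in view/edit/admin roles support aggregation |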