Authentication (verifying the identity of a user, whether human or machine) is the foundation of any successful data governance program. Other components of data governance, like authorization and access management, often get more attention, but neither is effective without proper user authentication first. Does it matter how fine-grained your access controls are if you don’t know who the user is? Not really. You have to have authentication in place before authorization.
It’s also important not to confuse authentication with authorization, a common mistake. While authentication is the process of verifying identity, authorization involves granting access to data and services based on that identity, associated roles and attributes, and other factors.
In the real world, authenticating your identity is sometimes as simple as flashing your driver’s license. Things get more complicated when it comes to authenticating users who are seeking access to cloud-based data and analytics services. Amazon Web Services (AWS), in particular, poses a number of authentication challenges, because it offers different authentication methods depending on which of its services you are using. Enterprises commonly use several AWS services at the same time, which makes it hard to apply authentication consistently across all of them.
So how do you actually establish effective user authentication on AWS? First, let’s take a quick look at the key concepts you need to understand: the Identity and Access Management (IAM) user, the IAM role, and the Security Token Service (STS). Most of these work with AWS federation services, which in turn rely on SAML, along with Kerberos (for EMR Spark, Hive, and so on). We’ll keep these brief, as AWS has much more detailed information on each should you want to dig deeper.
IAM USER
An IAM user represents either a human or an application user that you create. It consists of a name and credentials. The credentials can be (1) a username and password for the AWS Management Console or (2) an access key ID and secret access key for programmatic access through the API, the AWS CLI, or the AWS PowerShell tools.
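To make the access key pair more concrete, here is a minimal sketch of how a secret access key is used to sign an API request under AWS Signature Version 4, rather than being sent with the request. It covers only the signing-key derivation and the final signature; building the canonical request is omitted. The key below is a documentation-style placeholder, and the string to sign is a hypothetical stand-in, not a real request.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_access_key: str, date: str, region: str, service: str) -> bytes:
    """Derive a Signature Version 4 signing key.

    date is the request date as YYYYMMDD; the key is scoped to that
    date, region, and service.
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(secret_access_key: str, date: str, region: str, service: str, string_to_sign: str) -> str:
    """Return the hex signature for an already-built string to sign."""
    key = derive_signing_key(secret_access_key, date, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder secret key, never a real credential.
EXAMPLE_SECRET_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

if __name__ == "__main__":
    # "hypothetical-string-to-sign" stands in for the real canonical string.
    print(sign(EXAMPLE_SECRET_KEY, "20240101", "us-east-1", "s3", "hypothetical-string-to-sign"))
```

In practice the AWS SDKs and the CLI perform this signing for you; the sketch is only meant to show why the access key ID identifies the caller while the secret key never leaves the client.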
IAM ROLE
An IAM role is an identity to which you assign permissions and which users or services can then assume. It is especially helpful in enabling users to communicate with or access one AWS service from another. Imagine that you want to enable users to access S3 from EC2. In AWS, since you can’t directly assign policies to a service, you must create a role whose permissions include access to S3, then attach that role to the EC2 instance. Take a look at this example to understand how exactly it works.
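As a rough illustration of the S3-from-EC2 scenario described above, a role is typically defined by two JSON policy documents: a trust policy saying who may assume the role, and a permissions policy saying what the role may do. The sketch below builds both as Python dictionaries. The bucket name and the chosen S3 actions are hypothetical and would differ in a real deployment.

```python
import json

# Hypothetical bucket name, used for illustration only.
BUCKET = "example-bucket"

# Trust policy: lets the EC2 service assume the role on behalf of an instance.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: what the role may do once it has been assumed.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(trust_policy, indent=2))
    print(json.dumps(permissions_policy, indent=2))
```

The role reaches an EC2 instance through an instance profile, after which applications on the instance receive temporary credentials for the role instead of long-lived access keys.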
SECURITY TOKEN SERVICE
The AWS Security Token Service (STS) is a web service that enables you to request temporary, limited-privilege credentials for AWS Identity and Access Management (IAM) users or for users that you authenticate through identity federation (see below). STS is most useful for identity federation, delegation, cross-account access, and IAM roles.
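Because STS credentials are temporary, client code generally has to notice when they are about to expire and request new ones. The sketch below assumes a dictionary that mirrors the field names of the credentials returned by an STS call such as AssumeRole (AccessKeyId, SecretAccessKey, SessionToken, Expiration). The values are placeholders, and the five-minute safety margin is an arbitrary choice for illustration.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(credentials: dict, now: datetime,
                  margin: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if the temporary credentials expire within margin of now."""
    return now >= credentials["Expiration"] - margin


# Placeholder values shaped like a set of temporary credentials.
sample_credentials = {
    "AccessKeyId": "ASIAEXAMPLEKEYID",
    "SecretAccessKey": "example-secret",
    "SessionToken": "example-session-token",
    "Expiration": datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc),
}

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("refresh needed:", needs_refresh(sample_credentials, now))
```

A caller would check `needs_refresh` before each request and call STS again when it returns True.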
FEDERATED SERVICE
Federation services allow you to centrally manage access to AWS resources using a single sign-on tool or your enterprise directory. Identity and security information is exchanged between the application and the identity provider using standards like SAML. You can dig into the architecture details and a guide to implementing AWS Federated Authentication with Active Directory Federation Services (AD FS) here.
SAML
Security Assertion Markup Language 2.0 (SAML) is a standard for conveying user identities for authentication and authorization. Imagine logging into one system using your username and password, which then authenticates you across various other applications and services. With SAML, you type your password or provide login credentials only once but gain access to multiple services. In AWS, when you log in using SAML, you are mapped to one of the IAM roles. More on SAML from AWS here.
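The mapping from a SAML login to an IAM role is carried inside the assertion itself, in an attribute named `https://aws.amazon.com/SAML/Attributes/Role` whose value pairs a role ARN with the ARN of the SAML identity provider. The sketch below extracts those pairs from a heavily trimmed, hypothetical assertion. A real assertion is signed and carries many more elements, and the account ID, role name, and provider name here are made up.

```python
import xml.etree.ElementTree as ET

SAML_ASSERTION_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
ROLE_ATTRIBUTE = "https://aws.amazon.com/SAML/Attributes/Role"

# Trimmed, unsigned sample with made-up ARNs, for illustration only.
SAMPLE_ASSERTION = """<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:AttributeStatement>
    <saml:Attribute Name="https://aws.amazon.com/SAML/Attributes/Role">
      <saml:AttributeValue>arn:aws:iam::123456789012:role/Analyst,arn:aws:iam::123456789012:saml-provider/ExampleIdP</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>"""


def roles_from_assertion(assertion_xml: str) -> list:
    """Return (role_arn, provider_arn) pairs found in the Role attribute."""
    root = ET.fromstring(assertion_xml)
    pairs = []
    for attribute in root.iter(f"{{{SAML_ASSERTION_NS}}}Attribute"):
        if attribute.get("Name") != ROLE_ATTRIBUTE:
            continue
        for value in attribute.findall(f"{{{SAML_ASSERTION_NS}}}AttributeValue"):
            parts = [p.strip() for p in (value.text or "").split(",")]
            # Identify each ARN by its resource type rather than its position.
            role = next((p for p in parts if ":role/" in p), None)
            provider = next((p for p in parts if ":saml-provider/" in p), None)
            if role and provider:
                pairs.append((role, provider))
    return pairs


if __name__ == "__main__":
    for role_arn, provider_arn in roles_from_assertion(SAMPLE_ASSERTION):
        print(role_arn, provider_arn)
```

When an assertion lists more than one role, the user is usually asked to pick which one to assume.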
KERBEROS
Kerberos is a user and service authentication protocol built around the concept of a principal, which is a unique identity within the Kerberos protocol. Kerberos provides enhanced security because passwords are never sent over the network. Principals belong to a realm and authenticate through a Key Distribution Center (KDC). As part of Amazon EMR, Kerberos plays a key role in authenticating users logged into the EC2 instance and provides security when users try to submit remote jobs to YARN or try to access services like HiveServer2 remotely.
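Principals are conventionally written as `primary/instance@REALM`, where the instance part is optional for users and usually holds a host name for services. The following sketch splits a principal name into those parts. It is a simplification that ignores escaped characters, and the names used are hypothetical.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts.

    Simplified: escaped '/' or '@' characters are not handled.
    """
    local, at, realm = name.rpartition("@")
    if not at or not local or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, slash, instance = local.partition("/")
    if not primary:
        raise ValueError(f"principal has no primary component: {name!r}")
    return Principal(primary, instance if slash else None, realm)


if __name__ == "__main__":
    # Hypothetical user and service principals.
    print(parse_principal("alice@EXAMPLE.COM"))
    print(parse_principal("hive/node1.example.com@EXAMPLE.COM"))
```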
Below is a handy chart identifying various authentication methods and concepts as applied to a number of popular AWS data services. If you’re looking for a service that is not listed, please reach out to Privacera for more details.
Service | IAM User (Access/Secret Keys) | IAM Role | Kerberos | DB User |
S3 | X | X | ||
EMR Hive | X | |||
EMR Spark | X | |||
DynamoDB | X | X | ||
Redshift | X | X | ||
Glue | X | X | ||
Athena | X | X | ||
Lambda | X | X | ||
RDS | X | X | ||
AuroraDB | X | X | ||
AWS CLI | X | X |
Closing Remarks
With user authentication in place, we can move on to the next step in the data governance and security process: access control and policy management. We’ll dig into that step, including authorization and fine-grained access control, in the next post.
Learn more about Privacera here, or contact us to schedule a call to discuss how we can help your organization meet its dual mandate of balancing data democratization with security to maximize business insights while ensuring privacy and compliance.