Personal notes for SAA-C02 test from:

SAA-C02 Notes

These are my personal notes from Adrian Cantrill's (SAA-C02) course. Learning Aids from aws-sa-associate-saac02. There may be errors, so please purchase his course to get the original content and show support.

Table of Contents


Cloud computing provides:

  1. On-Demand Self-Service: Provision and terminate using a UI/CLI without human interaction.
  2. Broad Network Access: Access services over any network on any device using standard protocols and methods.
  3. Resource Pooling: Economies of scale, cheaper service.
  4. Rapid Elasticity: Scale up and down automatically in response to system load.
  5. Measured Service: Usage is measured. Pay only for what you consume.

Public vs Private vs Multi Cloud

Cloud Service Models

The Infrastructure Stack or Application Stack contains multiple components that make up the total service. There are parts that you manage as well as parts the vendor manages. The part the vendor manages, and charges you for, is the unit of consumption.

  1. On-Premises: The individual manages all components from data to facilities. Provides the most flexibility, but is also the most IT intensive.
  2. Data Center Hosting: Place equipment in a building managed by a vendor. You pay for the facilities only.
  3. Infrastructure as a Service (IaaS): The vendor manages the facilities and everything else related to servers, up to but not including the OS. The OS is the unit of consumption, and you pay the vendor per second or minute for each OS you use. You lose some flexibility, but gain big risk reductions.
  4. Platform as a Service (PaaS): Good for running an application only. The unit of consumption is the runtime environment. You manage the application and the data, but the vendor manages everything else.
  5. Software as a Service (SaaS): You consume the software as a service, such as Outlook or Netflix. There are almost no risks or additional costs, but very little control.

There are additional services such as Function as a Service, Container as a Service, and Database as a Service, which will be explained later.


AWS Support Plans

Public vs Private Services

Refers to the networking only, not permissions.

AWS Global Infrastructure


AWS Region is an area of the world they have selected for a full deployment of AWS infrastructure.

Areas such as countries or states

AWS can only deploy regions as fast as their planning allows. Regions are often not near their customers.

AWS Edge Locations

Local distribution points. Useful for services such as Netflix, so they can store data closer to customers for low-latency, high-speed transfers.

If a customer wants to access data stored in Brisbane, they will stream data from the Sydney Region through an Edge Location hosted in Brisbane.

AWS Management

Regions are connected together with high-speed networking. Some services, such as EC2, need to be selected in a region. Some services are global, such as IAM.

Region's 3 Benefits

Regions and AZs

Region Name: Asia Pacific (Sydney)
Region Code: ap-southeast-2

AWS will provide between 2 and 6 AZs per region. AZs are isolated compute, storage, networking, power, and facilities. Components can distribute load and improve resilience by using multiple AZs.

AZs are connected to each other with high speed redundant networks.

Service Resilience

  1. Globally Resilient: IAM or Route 53. Data is replicated throughout multiple regions, so these services are extremely unlikely to go down.
  2. Region Resilient: Operate as separate services in each region. Generally replicate data to multiple AZs in that region.
  3. AZ Resilient: Run from a single AZ. It is possible for hardware to fail in an AZ and the service to keep running because of redundant equipment, but should not be relied on.

AWS Default VPC

A VPC is a virtual network inside of AWS. A VPC is within 1 account and 1 region, which makes it regionally resilient. A VPC is private and isolated until you configure it otherwise.

One default VPC per region. Can have many custom VPCs which are all private by default.

Default VPC Facts

VPC CIDR - defines start and end ranges of the VPC. IP CIDR of a default VPC is always:

Configured to have one subnet in each AZ in the region by default.

Each subnet is given one section of the VPC's IP range. In general, do not use the default VPC in a region because it is not flexible.

The default VPC is large because it uses the /16 range. A subnet is smaller, such as a /20. The higher the / number is, the smaller the grouping.

Two /17's will fit into a /16, sixteen /20 subnets can fit into one /16.
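The subnet arithmetic can be checked with Python's standard ipaddress module. This sketch assumes the default VPC range 172.31.0.0/16 (the range AWS uses for default VPCs) and splits it into /17s and /20s; each extra prefix bit halves the block size, so a /16 holds 2^(20-16) = 16 /20s.

```python
import ipaddress

# Assumed default VPC range (AWS default VPCs use 172.31.0.0/16).
vpc = ipaddress.ip_network("172.31.0.0/16")

# Each extra prefix bit halves the block: a /16 splits into 2 x /17 or 16 x /20.
halves = list(vpc.subnets(new_prefix=17))
twenties = list(vpc.subnets(new_prefix=20))

print(len(halves), len(twenties))                    # 2 16
print(twenties[0], twenties[1])                      # 172.31.0.0/20 172.31.16.0/20
print(vpc.num_addresses, twenties[0].num_addresses)  # 65536 4096
```

Each /20 is 4,096 addresses, which is why the third octet of consecutive /20 subnets steps by 16.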

Elastic Compute Cloud (EC2)

Default compute service. Provides access to virtual machines called instances.

IaaS - Infrastructure as a Service

The unit of consumption is an instance. An EC2 instance is configured to launch into a single VPC subnet. It is a private service by default; public access must be configured, and the VPC needs to support public access. If you use a custom VPC, then you must handle the networking on your own.

EC2 deploys into one AZ. If it fails, the instance fails.

Instances come in different sizes and capabilities, and all use On-Demand Billing, charged per second. Only pay for what you consume.

Charge for running the instance, CPU, memory and storage. Extra cost for any commercial software the instance deploys with.

Local on-host storage or Elastic Block Store (EBS)

Pricing based on:

Running State

Charged for all four categories.

Stopped State

Charged for EBS storage only.

Terminated State

No charges, deletes the disk and prevents all future charges.

AMI (Server Image)

An AMI can be used to create an instance or can be created from an instance. AMIs in one region are not available from other regions.


AMI Types:

Connecting to EC2

Login to the instance using an SSH key pair. Private Key - Stored on local machine to initiate connection. Public Key - AWS places this key on the instance.

S3 (Default Storage Service)

Global storage platform. Runs from all regions and is a public service. Can be accessed from anywhere on the internet by an unlimited number of users.

This should be the default storage platform.

S3 is an object storage, not file, or block storage. You can't mount an S3 Bucket.


An object can be thought of as a file. Two main components:

Other components:


If an object's name contains slashes, such as /old/Koala1.jpg, the UI will present it as being inside a folder. In actuality there are no folders; the whole name is the object's key.
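The "folder" view can be reproduced from a flat list of keys, roughly the way S3 ListObjectsV2 groups keys when given Prefix and Delimiter parameters (returned as CommonPrefixes). This is a minimal sketch: list_level and the Koala keys are hypothetical, and it works on an in-memory list rather than calling S3.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Group flat object keys into (objects, common_prefixes) directly under `prefix`."""
    objects, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is shown as a "folder".
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(prefixes)

keys = ["old/Koala1.jpg", "old/Koala2.jpg", "Koala3.jpg"]
print(list_level(keys))          # (['Koala3.jpg'], ['old/'])
print(list_level(keys, "old/"))  # (['old/Koala1.jpg', 'old/Koala2.jpg'], [])
```

Nothing named "old/" is ever stored; the folder only exists because several keys share that prefix.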

CloudFormation Basics

Templates can create, update, and delete infrastructure.

Written in YAML or JSON

## Optional. The only valid value is "2010-09-09".
AWSTemplateFormatVersion: "2010-09-09"

## Give details as to what this template does.
## If you use this section, it MUST immediately follow the AWSTemplateFormatVersion.
Description:
  A sample template

## Can control how the console UI presents the template. The bigger your template,
## the more likely this section is needed
Metadata:
  template metadata

## Prompt the user for more data. Name of something, size of instance,
## data validation
Parameters:
  set of parameters

## Another optional section. Allows lookup tables, not used often
Mappings:
  set of mappings

## Decision making in the template. Things will only occur if a condition is met.
## Step 1: create condition
## Step 2: use the condition to do something else in the template
Conditions:
  set of conditions

Transform:
  set of transforms

## The only mandatory section of the template
Resources:
  set of resources

## Once the template is finished it can return data or information.
## Could return the admin or setup address of a WordPress blog.
Outputs:
  set of outputs


An example which creates an EC2 instance:

Resources:
  Instance: ## Logical Resource
    Type: 'AWS::EC2::Instance' ## This is what will be created
    Properties: ## Configure the resource in a particular way
      ImageId: !Ref LatestAmiId
      InstanceType: !Ref InstanceType
      KeyName: !Ref KeyName
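Because templates can be written in JSON as well as YAML, the same example can be built as a Python dictionary and serialized with json.dumps. The sketch also checks that every Ref points at a declared parameter or logical resource, for example catching a Ref to Keyname when the parameter is named KeyName. The Parameters block (types and the t3.micro default) is an assumption added for illustration, pseudo parameters such as AWS::Region are simply skipped, and unresolved_refs is a hypothetical helper, not CloudFormation's own validator.

```python
import json

# Hypothetical parameter definitions matching the Ref names used in the resource.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "A sample template",
    "Parameters": {
        "LatestAmiId": {"Type": "AWS::EC2::Image::Id"},
        "InstanceType": {"Type": "String", "Default": "t3.micro"},
        "KeyName": {"Type": "AWS::EC2::KeyPair::KeyName"},
    },
    "Resources": {
        "Instance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": "LatestAmiId"},
                "InstanceType": {"Ref": "InstanceType"},
                "KeyName": {"Ref": "KeyName"},
            },
        }
    },
}

def unresolved_refs(tpl):
    """Return Ref targets that are neither parameters nor logical resources."""
    known = set(tpl.get("Parameters", {})) | set(tpl.get("Resources", {}))
    missing = []

    def walk(node):
        if isinstance(node, dict):
            target = node.get("Ref")
            if len(node) == 1 and isinstance(target, str):
                # Pseudo parameters such as AWS::Region are always available.
                if target not in known and not target.startswith("AWS::"):
                    missing.append(target)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(tpl.get("Resources", {}))
    return missing

print(unresolved_refs(template))      # []
print(json.dumps(template, indent=2)) # the JSON form of the same template
```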

When a template is used, AWS creates a stack. This is a living and active representation of a template. One template can create any number of stacks.

For any logical resources in the stack, CloudFormation (CF) will make corresponding physical resources in your AWS account.

It is CloudFormation's job to keep the logical and physical resources in sync.

A template can be updated and then used to update the same stack.
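A stack update can be pictured as a comparison between the logical resources in the updated template and the logical-to-physical mapping the stack already holds. This is a conceptual sketch, not CloudFormation's actual algorithm: plan_changes, the physical IDs, and the property values are hypothetical, and real updates also depend on which property changes require replacing a resource.

```python
def plan_changes(template_resources, stack_resources):
    """Compare a template with a stack.

    template_resources: {logical_id: properties}
    stack_resources:    {logical_id: (physical_id, properties)}
    """
    create = sorted(set(template_resources) - set(stack_resources))
    delete = sorted(set(stack_resources) - set(template_resources))
    update = sorted(
        lid for lid in set(template_resources) & set(stack_resources)
        if template_resources[lid] != stack_resources[lid][1]
    )
    return {"create": create, "update": update, "delete": delete}

stack = {
    "Instance": ("i-0123456789abcdef0", {"InstanceType": "t3.micro"}),
    "OldBucket": ("my-old-bucket-1a2b", {}),
}
new_template = {
    "Instance": {"InstanceType": "t3.small"},
    "LogGroup": {"RetentionInDays": 7},
}
print(plan_changes(new_template, stack))
# {'create': ['LogGroup'], 'update': ['Instance'], 'delete': ['OldBucket']}
```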

CloudWatch Basics

Collects and manages operational data on your behalf.

Three products in one


A namespace is a container for monitoring data. It can be named anything, so long as the name doesn't start with AWS/, which is reserved for AWS services. For example, AWS/EC2 is used for all metric data of that service.


Time ordered set of data points such as:

A metric is not for a specific server. Its data points could come from different servers.

Anytime CPU Utilization is reported, the datapoint will report

Dimensions separate data points for different things or perspectives within the same metric


An alarm has two main states: OK or ALARM. A state change can send an SNS notification or trigger an action. A third state, INSUFFICIENT_DATA, is not a problem; it just means waiting for more data.
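The relationship between a metric, its datapoints, dimensions, and an alarm's three states can be modelled in a few lines. This is a simplified sketch: the Datapoint class, the instance IDs, and the rule "every recent datapoint above the threshold means ALARM" are assumptions; real alarms evaluate statistics over configurable periods.

```python
from dataclasses import dataclass, field

@dataclass
class Datapoint:
    timestamp: int                      # seconds (simplified)
    value: float
    dimensions: dict = field(default_factory=dict)

def with_dimensions(datapoints, **dims):
    """Keep only datapoints whose dimensions match every given name=value pair."""
    return [d for d in datapoints
            if all(d.dimensions.get(k) == v for k, v in dims.items())]

def alarm_state(datapoints, threshold, periods):
    """Evaluate the most recent `periods` datapoints against a threshold."""
    recent = sorted(datapoints, key=lambda d: d.timestamp)[-periods:]
    if len(recent) < periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(d.value > threshold for d in recent) else "OK"

# One CPUUtilization metric (namespace AWS/EC2) fed by two hypothetical instances.
cpu = [
    Datapoint(0, 20.0, {"InstanceId": "i-aaa"}),
    Datapoint(0, 95.0, {"InstanceId": "i-bbb"}),
    Datapoint(60, 25.0, {"InstanceId": "i-aaa"}),
    Datapoint(60, 97.0, {"InstanceId": "i-bbb"}),
]
print(alarm_state(with_dimensions(cpu, InstanceId="i-bbb"), 90, 2))  # ALARM
print(alarm_state(with_dimensions(cpu, InstanceId="i-aaa"), 90, 2))  # OK
print(alarm_state(with_dimensions(cpu, InstanceId="i-ccc"), 90, 2))  # INSUFFICIENT_DATA
```

The InstanceId dimension is what lets one metric hold data from many servers while still being examined per server.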

Shared Responsibility Model

AWS: Responsible for security OF the cloud

Customer: Responsible for security IN the cloud

High Availability (HA), Fault-Tolerance (FT), and Disaster Recovery (DR)

High Availability (HA)

Fault-Tolerance (FT)

Example: A patient is waiting for life-saving surgery and is under anesthetic. While being monitored, the life support system is dosing medicine. This type of system cannot just be highly available: even a moment of interruption could be deadly.

Disaster Recovery (DR)

This involves:

This is designed to keep the crucial and non replaceable parts of the system in place.

Domain Name System (DNS)

DNS is a discovery service. It translates human-readable names into machine IP addresses and vice versa. It is a huge database and has to be distributed.

Parts of the DNS system


Find the nameserver which hosts a particular zone file. Query that nameserver for a record within that zone. The information is then passed back to the client.

DNS Root

The starting point of DNS. DNS names are read right to left with multiple parts separated by periods.

The trailing period (the root) is assumed by a browser when it's not present. The DNS Root is hosted on 13 DNS Root Servers, operated by 12 large organizations.

Root Hints is a pointer to the DNS Root servers.


  1. DNS client asks DNS Resolver for IP address of a given DNS name.
  2. Using the Root Hints file, the DNS Resolver communicates with one or more of the root servers to access the root zone and begin the process of finding the IP address.

The Root Zone is organized by IANA (Internet Assigned Numbers Authority). Their job is to manage the contents of the root zone. IANA is in charge of the DNS system because they control the root zone.
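The walk from the root zone down to an authoritative answer, reading the name right to left, can be simulated with an in-memory table. In this toy sketch the server names, the example.org zone, and the 192.0.2.10 address (a documentation range) are hypothetical; a real resolver sends network queries and follows NS referrals that include the nameservers' own addresses.

```python
# Hypothetical stand-ins for the root zone, a TLD zone and one domain's zone.
ZONES = {
    "root-server": {"org.": ("NS", "tld-server")},
    "tld-server": {"example.org.": ("NS", "example-ns")},
    "example-ns": {"www.example.org.": ("A", "192.0.2.10")},
}

def labels_right_to_left(name):
    """'www.example.org' -> ['org.', 'example.org.', 'www.example.org.']"""
    parts = name.rstrip(".").split(".")
    return [".".join(parts[i:]) + "." for i in range(len(parts) - 1, -1, -1)]

def resolve(name, start="root-server"):
    """Start at the root and follow NS referrals until an A record is found."""
    server, path = start, [start]
    for zone_name in labels_right_to_left(name):
        record = ZONES[server].get(zone_name)
        if record is None:
            continue
        rtype, value = record
        if rtype == "A":
            return value, path
        server = value                # NS referral: ask the next server down
        path.append(server)
    raise LookupError(name)

print(resolve("www.example.org"))
# ('192.0.2.10', ['root-server', 'tld-server', 'example-ns'])
```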

DNS Hierarchy

Assume a laptop is querying DNS directly for a name, using a root hints file to know how to access a root server and query the root zone.

Top-level domains (TLDs) are the parts of a DNS name immediately to the left of the root, such as .org or .com.

A registry maintains the zone for a TLD (e.g. .ORG). A registrar has relationships with the .org TLD zone manager, allowing domain registration.

Route53 Fundamentals

Register Domains

Has relationships with all major registries

Route53 Details

Zone files in AWS are hosted on four managed name servers.

DNS Record

TTL - Time To Live

This is a numeric setting on DNS records, in seconds. It allows the admin to specify how long the query result can be stored at the resolver server. If you need to update the records, it is smart to lower the TTL value first.

Getting the answer from an Authoritative Source is known as an Authoritative Answer.

If another client queries the same thing before the TTL expires, they will get back a Non-Authoritative response from the resolver's cache.
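How the TTL turns the first authoritative answer into later non-authoritative ones can be shown with a toy resolver cache. The CachingResolver class, the 300-second TTL, and the injectable clock are assumptions for illustration.

```python
class CachingResolver:
    """Toy resolver cache: answers are reused until their TTL (seconds) expires."""

    def __init__(self, authoritative, clock):
        self.authoritative = authoritative   # name -> (value, ttl)
        self.clock = clock                   # callable returning "now" in seconds
        self.cache = {}                      # name -> (value, expires_at)

    def query(self, name):
        now = self.clock()
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0], "non-authoritative"
        value, ttl = self.authoritative[name]
        self.cache[name] = (value, now + ttl)
        return value, "authoritative"

now = [0]
resolver = CachingResolver({"www.example.org": ("192.0.2.10", 300)}, lambda: now[0])
print(resolver.query("www.example.org"))  # ('192.0.2.10', 'authoritative')
now[0] = 120
print(resolver.query("www.example.org"))  # ('192.0.2.10', 'non-authoritative')
now[0] = 301
print(resolver.query("www.example.org"))  # ('192.0.2.10', 'authoritative')
```

If the record changed at t=120, this resolver would keep serving the old value until t=300; lowering the TTL ahead of a planned change shortens that window.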


IAM Identity Policies

Identity Policies are attached to AWS Identities which are IAM users, IAM groups, and IAM roles. These are a set of security statements that ALLOW or DENY access to AWS resources.

When an identity attempts to access AWS resources, that identity needs to prove who it is to AWS, a process known as Authentication. Once authenticated, that identity is known as an authenticated identity.

Statement Components

Priority Level
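IAM resolves overlapping statements in a fixed priority order: an explicit Deny always wins, then an explicit Allow, and otherwise the default (implicit) deny applies. The sketch below models that order in a simplified form: it only looks at Effect, Action, and Resource, ignores conditions and other policy types, and uses Python's fnmatchcase, which is case-sensitive and treats [...] specially, unlike IAM. The bucket names are hypothetical.

```python
from fnmatch import fnmatchcase

def matches(patterns, value):
    """Wildcard match ('*' and '?') against one pattern or a list of patterns."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)

def evaluate(statements, action, resource):
    """Explicit Deny > explicit Allow > implicit (default) deny."""
    allowed = False
    for s in statements:
        if matches(s["Action"], action) and matches(s["Resource"], resource):
            if s["Effect"] == "Deny":
                return "Deny"          # an explicit deny always wins
            allowed = True
    return "Allow" if allowed else "Deny"

policy = [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    {"Effect": "Deny", "Action": "s3:*",
     "Resource": ["arn:aws:s3:::catgifs", "arn:aws:s3:::catgifs/*"]},
]
print(evaluate(policy, "s3:GetObject", "arn:aws:s3:::animals4life/photo.jpg"))  # Allow
print(evaluate(policy, "s3:GetObject", "arn:aws:s3:::catgifs/cat.gif"))         # Deny
print(evaluate(policy, "ec2:StartInstances", "*"))                               # Deny
```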

Inline Policies and Managed Policies

IAM Users

Identity used for anything requiring long-term AWS access

If you can name a specific person or application that will use the AWS account, an IAM user is suitable.

When a principal wants to request to perform an action, it will authenticate against an identity within IAM. An IAM user is an identity which can be used in this way.

There are two ways to authenticate:

Once the principal has authenticated, it becomes an authenticated identity.

Amazon Resource Name (ARN)

Uniquely identify resources within any AWS account.

This allows you to refer to a single or group of resources. This prevents individual resources from the same account but in different regions from being confused.

ARN generally follows the same format:


An example that leads to confusion:

These two ARNs do not overlap
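ARNs follow the general shape arn:partition:service:region:account-id:resource, with some fields left empty for services such as S3. The sketch below splits an ARN into those fields and shows why a bucket ARN and a bucket/* ARN do not overlap: a wildcard match against one never matches the other. The catgifs bucket, the account ID, and the Arn/parse_arn names are illustrative.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple

class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account: str
    resource: str

def parse_arn(text):
    """Split 'arn:partition:service:region:account-id:resource' into its fields."""
    prefix, *rest = text.split(":", 5)   # the resource part may itself contain ':'
    if prefix != "arn" or len(rest) != 5:
        raise ValueError(f"not an ARN: {text!r}")
    return Arn(*rest)

bucket = "arn:aws:s3:::catgifs"      # the bucket itself
objects = "arn:aws:s3:::catgifs/*"   # every object inside the bucket

print(parse_arn(bucket))
# Arn(partition='aws', service='s3', region='', account='', resource='catgifs')
print(fnmatchcase("arn:aws:s3:::catgifs", objects))          # False
print(fnmatchcase("arn:aws:s3:::catgifs/cat.gif", bucket))   # False
```

A policy that should cover both the bucket and its contents therefore has to list both ARNs.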


IAM Groups

Containers for users. You cannot log in to IAM groups; they have no credentials of their own. Used solely for management of IAM users.

Groups bring two benefits

  1. Effective administrative style management of users based on the team
  2. Groups can have Inline and Managed policies attached.

AWS merges all of the policies from all groups the user is in together.

Resource Policy: A bucket can have a policy associated with that bucket. It references identities using an ARN (Amazon Resource Name). A policy on a resource can reference IAM users and IAM roles by their ARN. A bucket can give access to one or more users or one or more roles.


An S3 bucket policy cannot grant access to a group, because a group is not an identity. Groups are used to allow permissions to be assigned to IAM users.

IAM Roles

A single thing that uses an identity is an IAM User.

IAM Roles are also identities, but they are used by large groups of individuals. If you have more than 5000 principals, that could be a candidate for an IAM Role.

IAM Roles are assumed: you become that role.

This can be used short term by other identities.

IAM Users can have inline or managed policies which control which permissions the identity gets within AWS.

Policies which grant, allow or deny, permissions based on their associations.

IAM Roles can have two types of policies attached.

If an identity is allowed by the Trust Policy, it is given a set of Temporary Security Credentials. These are similar to access keys, except they expire after a time limit. The identity will need to renew them by re-assuming the role.

Every time the Temporary Security Credentials are used, the access is checked against the Permissions Policy. If you change the policy, the permissions of the temp credentials also change.

Roles are real identities and can be referenced within resource policies.

Security Token Service (sts:AssumeRole): this is what generates the temporary security credentials (TSC).
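The split between a role's trust policy (who may assume it) and its permissions policy (what the temporary credentials may do) can be modelled as below. This is a toy, not the STS API: Role, assume_role, and authorize are hypothetical names, the key values are random strings, and the one-hour default duration is an assumption. It shows the two behaviors described above: credentials expire, and each use is checked against the permissions policy as it currently stands.

```python
import secrets
import time

class Role:
    """Toy IAM role: a trust policy (who may assume it) and a permissions policy."""

    def __init__(self, trusted_principals, allowed_actions):
        self.trusted = set(trusted_principals)   # trust policy
        self.allowed = set(allowed_actions)      # permissions policy

def assume_role(role, principal, duration=3600, now=None):
    """Hypothetical stand-in for sts:AssumeRole: returns time-limited credentials."""
    now = time.time() if now is None else now
    if principal not in role.trusted:
        raise PermissionError(f"{principal} is not trusted by this role")
    return {
        "AccessKeyId": secrets.token_hex(10).upper(),
        "SecretAccessKey": secrets.token_urlsafe(30),
        "Expiration": now + duration,
        "Role": role,   # toy link so each call can be checked against the current policy
    }

def authorize(creds, action, now=None):
    """Each use is checked against the role's permissions policy as it is now."""
    now = time.time() if now is None else now
    if now >= creds["Expiration"]:
        return False    # expired: the identity must re-assume the role
    return action in creds["Role"].allowed

lambda_role = Role({"lambda.amazonaws.com"}, {"logs:PutLogEvents", "s3:GetObject"})
creds = assume_role(lambda_role, "lambda.amazonaws.com", now=0)
print(authorize(creds, "s3:GetObject", now=10))         # True
lambda_role.allowed.discard("s3:GetObject")             # edit the permissions policy
print(authorize(creds, "s3:GetObject", now=20))         # False: change applies immediately
print(authorize(creds, "logs:PutLogEvents", now=4000))  # False: credentials expired
```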

When to use IAM Roles

Lambda Execution Role: for a given Lambda function, you cannot determine the number of principals, which suggests a role might be the ideal identity to use.

When the function runs, it uses sts:AssumeRole to generate temporary credentials used to access CloudWatch and S3.

It is better when possible to use an IAM Role versus attaching a policy.

Emergency or out of the usual situations

Break Glass Situation: There is a key for something the team does not normally have access to. When you break the glass, you must have a reason to do so. You can have an emergency role which will allow further access if it's really needed.

Adding AWS into existing corp environment

You may have an existing identity provider you are trying to allow access to. This may offer SSO (Single Sign-On) or have over 5000 identities. It is useful to reuse your existing identities for AWS. External accounts can't be used to access AWS directly. To solve this, you allow an IAM role in the AWS account to be assumed by identities from the existing directory, such as Active Directory. This is ID federation: allowing an external service the ability to assume a role.

Making an app with 1,000,000 users

Web Identity Federation uses IAM roles to allow broader access. It allows you to use an existing web identity, such as Google, Facebook, or Twitter, to grant access to the app. We trust these web identities and allow them to assume an IAM role to access AWS resources such as DynamoDB. No AWS credentials are stored in the application, and it can scale quickly to very large numbers of users.

Cross Account Access

You can use a role in the partner account and use that to upload objects to AWS resources.

AWS Organizations

Without an organization, each AWS account needs its own set of IAM users as well as an individual payment method. If you have more than 5 to 10 accounts, you would want to use an org.

Take a single standard AWS account and create an org. The standard AWS account then becomes the master account. The master account can invite other existing standard AWS accounts, which will need to approve joining the org.

When standard AWS accounts become part of the org, they become member accounts. Organizations can only have one master account and zero or more member accounts.

Organization Root

This is a container that can hold AWS member accounts or the master account. It could also contain organizational units which can contain other units or member accounts.

Consolidated billing

The individual billing for the member accounts is removed, and they pass their billing to the master account. Inside an AWS organization, you get a single monthly bill for the master account, which covers the billing for every account. This can offer discounts through consolidation of reservations and volume discounts.

Create new accounts in an org

Adding accounts to an organization is easy, with only an email needed. You no longer need IAM users in each account; you can use IAM roles to access them instead. It is best to have a single AWS account used only for login. Larger enterprises may use a dedicated login account, while smaller ones may use the master account.

Role Switching

Allows you to switch between accounts from the command line

Service Control Policies

Can be used to restrict what member accounts in an org can do.

JSON policy document that can be attached:

The master account cannot be restricted by SCPs, which means it should not be used to run any resources: doing so is a security risk.

SCPs limit what the account, including root can do inside that account. They don't grant permissions themselves, just act as a barrier.

Allow List vs Deny List

Deny list is the default.

When you enable SCPs on your org, AWS applies the FullAWSAccess policy. This means SCPs have no effect at first, because nothing is restricted; by themselves they have zero influence.

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": "*",
    "Resource": "*"
  }
}
SCPs by themselves don't grant permissions. When SCPs are enabled, there is an implicit deny.

You must then add any services you want to deny, for example with a DenyS3 policy:

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Deny",
    "Action": "s3:*",
    "Resource": "*"
  }
}
Deny List is a good default because it allows for the use of growing services offered by AWS. A lot less admin overhead.

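An allow list allows you to be conscious of your costs. To use one, the FullAWSAccess policy is removed and replaced with a policy that explicitly allows only the permitted services (the actions below are illustrative):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "ec2:*"
      ],
      "Resource": "*"
    }
  ]
}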

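The way SCPs and identity policies combine can be summarized as: an explicit deny in an SCP blocks an action, an action not allowed by any SCP is implicitly denied, and whatever remains still needs an identity policy to grant it. The sketch below models that under simplifying assumptions (action names only, wildcard matching with fnmatchcase, hypothetical function names), covering both the deny-list and allow-list setups.

```python
from fnmatch import fnmatchcase

def any_match(patterns, action):
    return any(fnmatchcase(action, p) for p in patterns)

def is_allowed(action, scp_allow, scp_deny, identity_allow):
    """Simplified model: SCPs only filter permissions, identity policies grant them."""
    if any_match(scp_deny, action):
        return False                  # explicit deny in an SCP
    if not any_match(scp_allow, action):
        return False                  # not allowed by any SCP -> implicit deny
    return any_match(identity_allow, action)

# Deny list: FullAWSAccess ("*") plus a DenyS3 policy.
print(is_allowed("s3:PutObject", ["*"], ["s3:*"], ["*"]))                # False
print(is_allowed("ec2:RunInstances", ["*"], ["s3:*"], ["*"]))            # True
# Allow list: FullAWSAccess removed, only S3 and EC2 allowed.
print(is_allowed("rds:CreateDBInstance", ["s3:*", "ec2:*"], [], ["*"]))  # False
# SCPs never grant: an identity policy must still allow the action.
print(is_allowed("s3:GetObject", ["*"], [], []))                         # False
```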
CloudWatch Logs

This is a public service; it can be used from an AWS VPC or an on-premises environment.

It allows you to store, monitor, and access logging data.

Comes with some AWS integrations. Security is provided with IAM roles or service roles. Can generate metrics based on logs using a metric filter.

Architecture of CloudWatch Logs

It is a regional service (for example, us-east-1).

Logging sources, such as external APIs or databases, send information as log events. These are stored in log streams; a log stream is a sequence of log events from the same source.

Log groups are containers for multiple log streams of the same type of logging. They also store configuration settings such as retention settings and permissions.

Once the settings are defined on a log group, they apply to all log streams in that log group. Metric filters are also applied on the log groups.
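The event, stream, and group hierarchy can be sketched as a small class in which retention and metric filters are set once on the group and apply to every stream inside it. The LogGroup class, stream names, and messages are hypothetical, and the regular expressions stand in for CloudWatch Logs' own filter pattern syntax.

```python
import re
from collections import defaultdict

class LogGroup:
    """Toy log group: settings stored here apply to every log stream inside it."""

    def __init__(self, name, retention_days):
        self.name = name
        self.retention_days = retention_days
        self.streams = defaultdict(list)   # stream name -> [(timestamp_seconds, message)]
        self.metric_filters = {}           # metric name -> compiled pattern

    def put(self, stream, timestamp, message):
        """Append one log event to a stream (one stream per source)."""
        self.streams[stream].append((timestamp, message))

    def add_metric_filter(self, metric, pattern):
        self.metric_filters[metric] = re.compile(pattern)

    def metrics(self):
        """Count matching events across all streams in the group."""
        return {
            metric: sum(1 for events in self.streams.values()
                        for _, message in events if rx.search(message))
            for metric, rx in self.metric_filters.items()
        }

    def expire(self, now):
        """Apply the group's retention setting to every stream."""
        cutoff = now - self.retention_days * 86400
        for name in list(self.streams):
            self.streams[name] = [(t, m) for t, m in self.streams[name] if t >= cutoff]

group = LogGroup("/var/log/secure", retention_days=7)
group.put("i-aaa", 0, "Failed password for root")
group.put("i-bbb", 0, "Accepted publickey for ec2-user")
group.put("i-bbb", 8 * 86400, "Failed password for admin")
group.add_metric_filter("FailedSSH", r"Failed password")
print(group.metrics())     # {'FailedSSH': 2}
group.expire(now=8 * 86400)
print(group.metrics())     # {'FailedSSH': 1}
```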

CloudTrail Essentials

Concerned with who did what.

Logs API calls or activities as a CloudTrail Event.

Stores the last 90 days of events in the Event History. This is enabled by default at no additional cost.

To customize the service, you need to create a new trail. There are two types of events; by default, only Management Events are logged.

CloudTrail Trail

Logs events for the AWS region it is created in. It is a regional service.

Once created, it can operate in two ways

Most services log events in the region they occur. The trail then must be a one region trail in that region or an all region trail to log that event.

A small number of services log events globally to one region. Global services such as IAM, STS, or CloudFront always log their events to us-east-1.

A trail must have global service events enabled to log these events.

AWS services are largely split into regional services or global services.

When the services log, they log in the region they are created or to us-east-1 if they are a global service.

A trail can store events in an S3 bucket as a compressed JSON file. It can also use CloudWatch Logs to output the data.

CloudTrail can create an organizational trail. This allows a single management point for all the API and management events for that org.
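Which region an event is logged in, and whether a given trail sees it, can be modelled as below. The trail dictionary format and the function names are hypothetical, and the set of global services is taken from the examples in these notes.

```python
# Global services named in these notes; everything else is treated as regional.
GLOBAL_SERVICES = {"iam", "sts", "cloudfront"}

def logged_region(service, region):
    """Global services log to us-east-1; regional services log where the call happened."""
    return "us-east-1" if service in GLOBAL_SERVICES else region

def trail_captures(trail, service, region):
    """trail: hypothetical dict with 'home_region', 'all_regions' and 'global_events'."""
    if service in GLOBAL_SERVICES and not trail["global_events"]:
        return False
    return trail["all_regions"] or trail["home_region"] == logged_region(service, region)

sydney_only = {"home_region": "ap-southeast-2", "all_regions": False, "global_events": True}
everywhere = {"home_region": "ap-southeast-2", "all_regions": True, "global_events": True}

print(trail_captures(sydney_only, "ec2", "ap-southeast-2"))  # True
print(trail_captures(sydney_only, "ec2", "us-west-1"))       # False
print(trail_captures(sydney_only, "iam", "ap-southeast-2"))  # False: logged to us-east-1
print(trail_captures(everywhere, "iam", "ap-southeast-2"))   # True
```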

CloudTrail Exam PowerUp

CloudTrail Pricing


S3 Security

S3 is private by default! The only identity which has any initial access to an S3 bucket is the account root user of the account which owns that bucket.

S3 Bucket Policy

This is a resource policy

Different from an identity policy

Each bucket can only have one policy, but it can have multiple statements.
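A bucket policy differs from an identity policy mainly in its Principal element, which names who the resource grants access to. The sketch builds a common public-read policy as a Python dictionary and prints it as JSON; the bucket name is hypothetical, and actually applying such a policy would also require the bucket's Block Public Access settings to permit it.

```python
import json

def public_read_policy(bucket):
    """Resource policy allowing anyone (Principal "*") to read every object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",                        # identity policies have no Principal
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],  # objects, not the bucket itself
            }
        ],
    }

print(json.dumps(public_read_policy("my-static-site"), indent=2))
```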

ACLs (Legacy)

A way to apply a subresource to objects and buckets. These are legacy, and AWS does not recommend their use. They are inflexible and allow only simple permissions.

S3 Exam PowerUp

When to use Identity Policy or Bucket Policy:



ACLs: NEVER - unless you must.

S3 Static Hosting

Normal access is via AWS APIs. Static website hosting allows access via HTTP using a web browser.

When you enable static website hosting you need two HTML files:

Static website hosting creates a website endpoint.

This is influenced by the bucket name and region it is in. This cannot be changed.

You can use a custom domain for a bucket, but then the bucket name matters. The name of the bucket must match the domain.


Instead of using EC2 to host an entire website, the compute service can generate an HTML file which points to the resources hosted on a static bucket. This ensures the media is retrieved from S3 and not EC2.

Out-of-band pages

This may be an error page to display maintenance if the server goes offline. We could then change our DNS and move customers to a backup website on S3.

S3 Pricing

Object Versioning and MFA Delete

Without Versioning:


The latest or current version is always returned when an object version is not requested.

When an object is deleted, AWS puts a delete marker on the object and hides all previous versions. You could delete this marker to restore the item.

To permanently delete an object, you must delete all the versions of that object using their version IDs.
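The behavior of versions and delete markers can be modelled with a small class. This is a toy sketch: VersionedBucket, the v1, v2, ... IDs, and the object name are hypothetical (real version IDs are opaque strings), and everything is held in memory.

```python
import itertools

class VersionedBucket:
    """Toy model of keys in a versioning-enabled bucket."""

    _ids = itertools.count(1)

    def __init__(self):
        self.versions = {}   # key -> [(version_id, body), ...]; body None = delete marker

    def put(self, key, body):
        vid = f"v{next(self._ids)}"
        self.versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key, version_id=None):
        if version_id is None:
            # Simple delete: add a delete marker; older versions are hidden, not removed.
            return self.put(key, None)
        # Deleting a specific version (including a delete marker) removes it permanently.
        self.versions[key] = [v for v in self.versions[key] if v[0] != version_id]
        return None

    def get(self, key, version_id=None):
        history = self.versions.get(key, [])
        if version_id is not None:
            return dict(history)[version_id]
        if not history or history[-1][1] is None:
            raise KeyError(key)     # no versions left, or the latest is a delete marker
        return history[-1][1]

b = VersionedBucket()
v1 = b.put("winkie.jpg", "original")
v2 = b.put("winkie.jpg", "edited")
marker = b.delete("winkie.jpg")       # the object now appears deleted
print(b.get("winkie.jpg", v1))        # original: older versions are still there
b.delete("winkie.jpg", marker)        # remove the delete marker...
print(b.get("winkie.jpg"))            # edited: ...and the latest version is current again
```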

MFA Delete

Enabled within the versioning configuration of a bucket. This means MFA is required to change the bucket versioning state, and MFA is required to delete versions of an object.

In order to change a version state or delete a particular version of an object, you need to provide the serial number of your MFA token as well as the code it generates. These are concatenated and passed with any API calls.

S3 Performance Optimization

Single PUT Upload

Multipart Upload

S3 Accelerated Transfer

Encryption 101

Encryption at Rest

Encryption in Transit


Symmetric Encryption

The key is handed from one entity to another before the data. This is difficult because the key needs to be transferred securely. If the data is time sensitive, the key needs to be arranged beforehand.
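The defining property of symmetric encryption, one shared key that both encrypts and decrypts, can be shown with a one-time pad. This cipher is an illustrative choice, not what AWS uses (AWS services use AES), and xor_bytes is a hypothetical helper. A one-time pad key must be random, as long as the message, and never reused, which makes the key-transfer problem described above even sharper.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """One-time pad: the SAME key encrypts and decrypts."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"attack at dawn"
shared_key = secrets.token_bytes(len(message))   # must reach the recipient securely, in advance

ciphertext = xor_bytes(message, shared_key)
print(xor_bytes(ciphertext, shared_key))         # b'attack at dawn'
```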

Asymmetric Encryption

The recipient's public key is uploaded to a location others can access, such as cloud storage. The sender encrypts the data with that public key and sends it back to the recipient. Only the private key can decrypt the data.

This is secure because stolen public keys can only encrypt data. Private keys must be handled securely.
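The public/private key relationship can be illustrated with textbook RSA using tiny primes. This is deliberately insecure (no padding, a 12-bit modulus) and only shows that data encrypted with the public key is recovered with the private key; the function names are hypothetical, and pow(e, -1, phi) requires Python 3.8 or later.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: 2753

public_key, private_key = (e, n), (d, n)

def encrypt(m, key):
    exp, mod = key
    return pow(m, exp, mod)

decrypt = encrypt   # same modular exponentiation, used with the other key

c = encrypt(65, public_key)      # anyone holding the public key can do this
print(c)                         # 2790
print(decrypt(c, private_key))   # 65: recovered with the private key
```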


Encryption by itself does not prove who encrypted the data.

  1. An entity can encrypt a message with their private key.
  2. Their public key is hosted in an accessible location.
  3. The receiving party can use the public key to confirm who sent the message.
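Signing reverses the roles of the keys: the private key produces the signature, and anyone with the public key can check it. This sketch uses insecure textbook RSA with tiny primes (n = 61 × 53 = 3233, e = 17, d = 2753), and reduces a SHA-256 hash modulo n as a stand-in for a real signature padding scheme; the message and function names are hypothetical.

```python
import hashlib

# Insecure textbook-RSA numbers; a real system would use a vetted library.
n, e, d = 3233, 17, 2753

def digest(message: bytes) -> int:
    """Reduce the message to a number below n (toy stand-in for a padding scheme)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    return pow(digest(message), d, n)               # step 1: needs the private key

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, e, n) == digest(message)  # step 3: needs only the public key

msg = b"Battle plans attached"
sig = sign(msg)
print(verify(msg, sig))              # True
print(verify(msg, (sig + 1) % n))    # False: a tampered signature fails
```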


Encryption is obvious when used. There is no denying that the data was encrypted. Someone could force you to decrypt the data packet.

A file can be hidden in an image or other file. It is difficult to find the message unless you know what to look for.

One party would take another party's public key and encrypt some data to create ciphertext. That ciphertext can be hidden in another file so long as both parties know how the data will be hidden.

Key Management Service (KMS)

CMKs - Customer Master Keys

It is logical and contains

Data Encryption Key (DEK)

KMS does not store the DEK; once it is provided to a user or service, KMS discards it. KMS doesn't actually perform the encryption or decryption of data using the DEK, or anything past generating them.

When the DEK is generated, KMS provides two versions.


  1. DEK is generated right before something is encrypted.
  2. The data is encrypted with the plaintext version of the DEK.
  3. Discard the plaintext version of the DEK.
  4. The encrypted DEK is stored next to the ciphertext generated earlier.
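The four envelope-encryption steps above can be sketched end to end. To stay within the standard library this uses a toy SHA-256 counter-mode stream cipher instead of AES, and a local random CMK stands in for a KMS key that would never leave KMS; generate_data_key only mimics the shape of what KMS GenerateDataKey returns (a plaintext DEK plus an encrypted copy). None of this is secure or the real KMS API.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with SHA-256(key || nonce || counter). Encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out += bytes(a ^ b for a, b in zip(data[offset:offset + 32], pad))
    return bytes(out)

# Hypothetical stand-in for a KMS key; real key material never leaves KMS.
CMK = secrets.token_bytes(32)

def generate_data_key():
    """Return a plaintext DEK and the same DEK encrypted under the CMK."""
    dek, nonce = secrets.token_bytes(32), secrets.token_bytes(16)
    return dek, (nonce, keystream_xor(CMK, nonce, dek))

def decrypt_data_key(encrypted_dek):
    nonce, blob = encrypted_dek
    return keystream_xor(CMK, nonce, blob)

def encrypt_object(data: bytes):
    plaintext_dek, encrypted_dek = generate_data_key()      # 1. generate the DEK
    nonce = secrets.token_bytes(16)
    ciphertext = keystream_xor(plaintext_dek, nonce, data)  # 2. encrypt with the plaintext DEK
    del plaintext_dek                                       # 3. drop the plaintext DEK
    return {"encrypted_dek": encrypted_dek,                 # 4. store the encrypted DEK
            "nonce": nonce, "ciphertext": ciphertext}       #    next to the ciphertext

def decrypt_object(stored) -> bytes:
    plaintext_dek = decrypt_data_key(stored["encrypted_dek"])  # requires access to the CMK
    return keystream_xor(plaintext_dek, stored["nonce"], stored["ciphertext"])

stored = encrypt_object(b"battle plans")
print(decrypt_object(stored))   # b'battle plans'
```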

KMS Key Concepts

All CMKs support key rotation.

CMK itself contains:

KMS can create an alias which is a shortcut to a particular CMK. Aliases are also per region. You can create a MyApp1 alias in all regions but they would be separate aliases, and in each region it would be pointing potentially at a different CMK.

Key Policy (resource policy)

KMS Key Demo

Linux/macOS commands

aws kms encrypt \
    --key-id alias/catrobot \
    --plaintext fileb://battleplans.txt \
    --output text \
    --query CiphertextBlob \
    --profile iamadmin-general | base64 --decode > not_battleplans.enc

aws kms decrypt \
    --ciphertext-blob fileb://not_battleplans.enc \
    --output text \
    --profile iamadmin-general \
    --query Plaintext | base64 --decode > decryptedplans.txt

Object Encryption

Buckets aren't encrypted; objects are. Multiple objects in a bucket can use different encryption methods.

There are two main methods of encryption S3 supports. Both types are encryption at rest. Data sent from a user to S3 is automatically encrypted in transit, outside of these methods.

Client-Side encryption

Server-Side encryption

SSE-C (Server-side encryption with customer provided keys)

SSE-C Encryption Steps

  1. When placing an object in S3, you provide encryption key and plaintext object
  2. Once the key and object arrive, it is encrypted.
  3. A hash of the key is taken and attached to the object. The hash can identify if the specific key was used to encrypt the object.
  4. The key is then discarded after the hash is taken.
  5. The encrypted object and the one-way hash are stored persistently on storage.

To decrypt the object, you must tell S3 which object to decrypt and provide it with the key used to encrypt it. If the key you supply is correct (its hash matches), S3 will decrypt the object, discard the key, and return the plaintext version of the object.

SSE-S3 AES256 (Server-side encryption w/ Amazon S3 managed keys)

AWS handles both the encryption and decryption process as well as the key generation and management. This provides very little control over how the keys are used, but has little admin overhead.

SSE-S3 Encryption Steps

  1. When putting data into S3, only need to provide plaintext.
  2. S3 generates fully managed and rotated master key automatically.
  3. S3 generates a key specific to each object that is uploaded.
  4. The master key is used to encrypt the specific object key, and the unencrypted version of that key is discarded.
  5. The encrypted file and encrypted key are stored side by side in S3.

Three Problems with this method:

SSE-KMS (Server-side encryption w/ customer master keys stored in AWS KMS)

Much like SSE-S3, where AWS handles both the keys and encryption process. KMS handles the master key and not S3. The first time an object is uploaded, S3 works with KMS to create an AWS managed CMK. This is the default key which gets used in the future.

Every time an object is uploaded, S3 uses a dedicated key to encrypt that object and that key is a data encryption key which KMS generates using the CMK. The CMK does not need to be managed by AWS and can be a customer managed CMK.

SSE-KMS Encryption Steps

  1. S3 is provided a plaintext version of the data encryption key as well as an encrypted version.
  2. The data is encrypted with the plaintext key and the key discarded.
  3. The encrypted key is stored alongside the encrypted object.

When uploading an object, you can create and use a customer managed CMK. This allows the user to control the permissions and the usage of the key material. In regulated industries, this is reason enough to use SSE-KMS. You can also add logging and see any calls against this key from CloudTrail.

The best benefit is the role separation. To decrypt any object, you need access to the CMK that was used to generate the unique key that encrypted them. The CMK is used to decrypt the data encryption key for that object. That decrypted data encryption key is used to decrypt the object itself. If you don't have access to KMS, you don't have access to the object.

S3 Object Storage Classes

Picking a storage class can be done while uploading a specific object. The default is S3 standard. Once an object is uploaded to a specific class, it can be easily changed as long as some conditions are met.

Objects in S3 are stored in a specific region.

S3 Standard

All of the other storage classes trade some of these compromises for another.

S3 Standard-IA

Designed for data that isn't accessed often, long term storage, backups, disaster recovery files. The requirement for data to be safe is most important.

One Zone-IA

Great choice for secondary copies of primary data or backup copies.

If data is easily creatable from a primary data set, this would be a great place to store the output from another data set.

S3 Glacier

Retrieval methods:

S3 Glacier Deep Archive

S3 Intelligent-Tiering

This is good for objects whose access pattern is unknown.

Object Lifecycle Management

Intelligent-Tiering is used for objects where the access pattern is unknown. A lifecycle configuration is a set of rules that consist of actions.

Transition Actions

Change the storage class over time such as:

Objects must flow downwards, they can't flow in the reverse direction.

Expiration Actions

Once an object has been uploaded and changed, you can purge older versions after 90 days to keep costs down.

S3 Replication

There are two types of S3 replication available.

Architecture for both is similar, only difference is if both buckets are in the same account or different accounts.

The replication configuration is applied to the source bucket and configures S3 to replicate from this source bucket to a destination bucket. It also configures the IAM role to use for the replication process. The role is configured to allow the S3 service to assume it based on its trust policy. The role's permission policy allows it to read objects on the source bucket and replicate them to the destination bucket.

When different accounts are used, the role is not trusted by default by the destination account. If configuring between accounts, you must add a bucket policy on the destination bucket to allow the IAM role from the source account access to the bucket.

S3 Replication Options

Important Replication Tips

Why use replication

  • SRR - Log aggregation
  • SRR - Sync production and test accounts
  • SRR - Resilience with strict sovereignty requirements
  • CRR - Global resilience improvements
  • CRR - Latency reduction

S3 Presigned URL

A way to give another person or application access to an object inside an S3 bucket, using your credentials, in a safe way.

An IAM admin can make a request to S3 to generate a presigned URL by providing:

S3 will create a presigned URL and return it. This URL has encoded inside it the details that the IAM admin provided, and it is configured to expire at the date and time requested by the IAM admin.
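
To illustrate the idea only (details plus an expiry encoded into a signed URL), here is a toy signed URL built with HMAC-SHA256 from the standard library. This is NOT AWS Signature Version 4; real presigned URLs come from the SDKs (for example boto3's `generate_presigned_url`). The secret, host name and query parameter names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"example-secret-key"  # hypothetical stand-in for the admin's credentials


def _sign(bucket: str, key: str, expires_at: int) -> str:
    payload = f"GET\n{bucket}\n{key}\n{expires_at}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def presign(bucket: str, key: str, expires_at: int) -> str:
    """Encode the request details and expiry time into a signed URL."""
    query = urlencode({"expires": expires_at, "signature": _sign(bucket, key, expires_at)})
    return f"https://{bucket}.example.com/{key}?{query}"


def verify(url: str, now: int) -> bool:
    """Recompute the signature from the URL and reject tampered or expired URLs."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires_at = int(params["expires"][0])
    bucket = parsed.hostname.split(".")[0]
    key = parsed.path.lstrip("/")
    expected = _sign(bucket, key, expires_at)
    return hmac.compare_digest(expected, params["signature"][0]) and now < expires_at


url = presign("photos", "cat.jpg", expires_at=1_000_000)
print(verify(url, now=999_000))    # True: valid signature, not yet expired
print(verify(url, now=1_000_001))  # False: expired
```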

S3 Presigned URL Exam PowerUp

S3 Select and Glacier Select

This provides a way to retrieve parts of objects rather than the entire object.

If you retrieve a 5 TB object, it takes time and consumes 5 TB of data transfer. Filtering on the client side doesn't reduce this cost.

S3 Select and Glacier Select let you use SQL-like statements to select part of an object, which is returned already filtered. The filtering happens within the S3 service itself, saving time and data.
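
A local simulation of why server-side filtering saves data: the function below stands in for an S3 Select query such as `SELECT * FROM S3Object s WHERE s.region = 'eu'` and only "returns" matching rows. The CSV contents and `server_side_select` helper are hypothetical; no AWS API is called.

```python
import csv
import io

# Hypothetical object contents: a small CSV standing in for a large object.
rows = [("id", "region", "amount")] + [
    (str(i), "eu" if i % 10 == 0 else "us", str(i * 3)) for i in range(1, 1001)
]
buf = io.StringIO()
csv.writer(buf).writerows(rows)
obj = buf.getvalue().encode()


def server_side_select(data: bytes, region: str) -> bytes:
    """Filter rows where the data lives, so only matches are transferred."""
    reader = csv.DictReader(io.StringIO(data.decode()))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["region"] == region:
            writer.writerow(row)
    return out.getvalue().encode()


filtered = server_side_select(obj, "eu")
print(len(obj), "bytes without Select;", len(filtered), "bytes with Select")
```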


Networking Refresher

IPv4 - RFC 791 (1981)

Dotted decimal notation for human readability.

There are just over 4 billion (2^32) addresses. The original allocation scheme was not very flexible because the networks were either too small or too large for some organizations, so some IP addresses were always left unused.
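
Dotted decimal is just four 8-bit octets of one 32-bit number. A short sketch using a hypothetical example address, checked against Python's standard `ipaddress` module:

```python
import ipaddress


def dotted_to_int(addr: str) -> int:
    """Combine the four 8-bit octets of a dotted-decimal address into one 32-bit number."""
    value = 0
    for octet in addr.split("."):
        value = (value << 8) | int(octet)
    return value


addr = "133.33.3.7"  # hypothetical example address
print(dotted_to_int(addr))
print(int(ipaddress.IPv4Address(addr)))  # same value via the standard library
print(2 ** 32)  # total IPv4 addresses: 4,294,967,296
```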

Classful Addressing

Internet / Private IPs - RFC1918

These addresses are not routed on the public internet and are used internally only.
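
The RFC 1918 private ranges are 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. A minimal check against exactly those three, written out explicitly because `ipaddress`'s own `is_private` flag also covers other special ranges; `is_rfc1918` is a hypothetical helper:

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """True if addr falls inside one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


print(is_rfc1918("172.31.5.9"))  # True: 172.16.0.0/12 runs to 172.31.255.255
print(is_rfc1918("172.32.0.1"))  # False
```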

Classless inter-domain routing (CIDR)

CIDR networks are represented by the starting IP address of the network, called the network address, and the prefix.

CIDR Example:

Some IP address ranges are used so commonly that choosing them is the equivalent of using 1234 as a password. Consider which other ranges people might use, so that your range does not overlap with them.
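
A sketch of reading a CIDR block (network address plus prefix) and checking a planned range for overlaps, using Python's `ipaddress` module. All ranges below are hypothetical examples:

```python
import ipaddress

net = ipaddress.ip_network("10.16.0.0/16")  # hypothetical planned range
print(net.network_address, net.prefixlen)   # 10.16.0.0 16
print(net.num_addresses)                    # 65536
print(net.broadcast_address)                # 10.16.255.255

# Ranges that other networks (hypothetically) already use.
others = [ipaddress.ip_network("10.0.0.0/16"), ipaddress.ip_network("10.16.128.0/17")]
for other in others:
    print(other, "overlaps" if net.overlaps(other) else "is clear")
```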



TCP and UDP are protocols built on top of IP.

A TCP/UDP segment has a source and destination port number. This allows devices to have multiple conversations at the same time. In AWS, when data passes through network devices, filters can be set based on IP addresses and port numbers.

IPv6 - RFC 8200 (2017)


The value is hexadecimal, and each colon-separated group holds two octets (16 bits), called a hextet. The leading zeros in each hextet can be removed to create:


or you can replace one run of consecutive all-zero hextets with :: (this can only be done once per address)


Each address is 128 bits long. Networks are written as the start of the network and the prefix. Since each grouping is 16 bits, you can multiply the number of fixed groups by 16 to get the prefix (three groups give a /48).

2001:0db8:28ac::/48 really means the network starts at 2001:0db8:28ac:0000:0000:0000:0000:0000 and finishes at 2001:0db8:28ac:ffff:ffff:ffff:ffff:ffff

::/0 represents all IPv6 addresses
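
The IPv6 shortening rules and the /48 example can be checked with Python's `ipaddress` module. The full address below is a hypothetical example; the network is the one from the notes:

```python
import ipaddress

# Hypothetical address written out in full: eight hextets of four hex digits.
addr = ipaddress.IPv6Address("2001:0db8:28ac:0000:0000:82f4:0000:0001")
print(addr.exploded)    # 2001:0db8:28ac:0000:0000:82f4:0000:0001
print(addr.compressed)  # 2001:db8:28ac::82f4:0:1 (leading zeros dropped, one zero run -> ::)

# Three 16-bit hextets fixed by the network -> a /48 prefix.
net = ipaddress.IPv6Network("2001:0db8:28ac::/48")
print(net.prefixlen == 3 * 16)  # True
print(net.network_address)      # 2001:db8:28ac::
print(net.broadcast_address)    # 2001:db8:28ac:ffff:ffff:ffff:ffff:ffff

# ::/0 covers every IPv6 address.
print(ipaddress.IPv6Network("::/0").num_addresses == 2 ** 128)  # True
```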

VPC Sizing and Structure

VPC Consideration

Reserve 2+ network ranges per region being used, per account. Think about the highest number of regions you will operate in, and add extra as a buffer.

An example using 4 AWS accounts.

How to size VPC

A subnet is located in one availability zone. Try to split the VPC into tiers (web, application, db, spare). Since each region has at least 3 AZs, it is good practice to start by splitting the network across 4 AZs. This allows for at least one subnet per tier in each AZ, plus one spare. Taking a /16 VPC and splitting it 16 ways (4 AZs × 4 tiers) makes each subnet a /20.
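
A sketch of the 16-way split using `ipaddress.subnets(new_prefix=20)`. The VPC range and the AZ labels are hypothetical:

```python
import ipaddress
from itertools import product

vpc = ipaddress.ip_network("10.16.0.0/16")  # hypothetical VPC range
subnets = list(vpc.subnets(new_prefix=20))  # 16 equal /20 blocks
azs = ["az-a", "az-b", "az-c", "az-spare"]  # hypothetical AZ labels
tiers = ["web", "app", "db", "spare"]

# Assign one /20 to every (AZ, tier) pair, in order.
plan = dict(zip(product(azs, tiers), subnets))

print(len(subnets), subnets[0].num_addresses)  # 16 4096
for (az, tier), subnet in list(plan.items())[:4]:
    print(az, tier, subnet)
```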

Custom VPC

Custom VPC Facts

IPv4 private and public IPs

Single assigned IPv6 /56 CIDR block

DNS provided by R53

Available on the base IP address of the VPC + 2. If the VPC is then the DNS IP will be
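
The "+2" rule as a one-line sketch; the VPC range in the example is hypothetical:

```python
import ipaddress


def vpc_dns_ip(vpc_cidr: str) -> ipaddress.IPv4Address:
    """The VPC DNS server sits at the VPC's base (network) address + 2."""
    return ipaddress.ip_network(vpc_cidr).network_address + 2


print(vpc_dns_ip("10.16.0.0/16"))  # 10.16.0.2
```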

Two options that manage how DNS works in a VPC:

VPC Subnets

Reserved IP addresses

There are five IP addresses within every VPC subnet that you cannot use. Whatever the size of the subnet, the number of usable IP addresses is five less than you would expect.

If using ( -
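
A sketch listing the five reserved addresses (network address, VPC router at +1, DNS at +2, future use at +3, and the last address) for a hypothetical /20 subnet:

```python
import ipaddress


def reserved_ips(subnet_cidr: str) -> dict:
    """The five addresses AWS reserves in every VPC subnet."""
    net = ipaddress.ip_network(subnet_cidr)
    base = net.network_address
    return {
        "network address": base,
        "VPC router": base + 1,
        "DNS (reserved)": base + 2,
        "reserved for future use": base + 3,
        "broadcast address (last IP)": net.broadcast_address,
    }


def usable_ip_count(subnet_cidr: str) -> int:
    return ipaddress.ip_network(subnet_cidr).num_addresses - 5


for role, ip in reserved_ips("10.16.16.0/20").items():
    print(ip, "-", role)
print(usable_ip_count("10.16.16.0/20"))  # 4091
```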

DHCP Options Set

DHCP is how computing devices receive IP addresses automatically. Only one DHCP options set can be applied to a VPC at a time, and this configuration flows through to its subnets.

IP allocation Options

VPC Routing and Internet Gateway

The VPC router is a highly available device present in every VPC that moves traffic from somewhere to somewhere else. The router has a network interface in every subnet of the VPC and routes traffic between subnets.

A route table defines what the VPC router does with traffic when it leaves the associated subnet. A VPC is created with a main route table. If you don't associate a custom route table with a subnet, it uses the main route table of the VPC.

If you associate a custom route table you created with a subnet, the main route table is disassociated from that subnet. A subnet can only have one route table associated at a time, but a route table can be associated with many subnets.

Route Tables

When traffic leaves the subnet that this route table is associated with, the VPC router reviews the IP packets, looking for the destination address. The router matches the destination against the routes in the route table. If more than one route matches, the prefix is used as a priority: the higher the prefix, the more specific the route, and thus the higher its priority. If the target says local, the destination is in the VPC itself. Local routes can never be updated; they are always present, and the local route always takes priority. This is the exception to the prefix rule.
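
A minimal sketch of the selection logic described above: the local route wins, otherwise the longest matching prefix wins. The route table entries and target names are hypothetical, and the real VPC router does more than this:

```python
import ipaddress

# Hypothetical route table: (destination CIDR, target).
ROUTES = [
    ("10.16.0.0/16", "local"),
    ("0.0.0.0/0", "igw-example"),
    ("52.0.0.0/8", "vgw-example"),
    ("52.95.0.0/16", "pcx-example"),
]


def select_target(destination: str) -> str:
    ip = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTES
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The local route always takes priority (the exception to the prefix rule).
    for _, target in matches:
        if target == "local":
            return target
    # Otherwise the longest (most specific) prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(select_target("10.16.48.7"))  # local
print(select_target("52.95.1.1"))   # pcx-example (/16 beats /8 and /0)
print(select_target("52.1.1.1"))    # vgw-example
print(select_target("8.8.8.8"))     # igw-example
```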

Internet Gateway

A managed service that gateways traffic between the VPC and the internet or the AWS public zone (S3, SQS, SNS, etc.).

Using IGW

In this example, an EC2 instance has:

The public IPv4 address is not directly attached to the EC2 instance itself. Instead, the IGW maintains a record that links the instance's private IP to its public IP. This is why an EC2 instance only sees its private IP address. This is IMPORTANT: for IPv4, the public address is never configured in the OS.

When the Linux instance wants to communicate with the Linux update service, it creates a packet of data. The packet has a source address of the EC2 instance's private IP and a destination address of the Linux update server. At this point the packet has no public addressing and could not reach the Linux update server on its own.

The packet arrives at the internet gateway.

The IGW sees that the packet is from the EC2 instance and analyzes the source IP address. It replaces the packet's source IP address (the instance's private IP) with the public IP address associated with that instance. The IGW then pushes the packet onto the public internet.

On the return path, the inverse happens. As far as the Linux update server is concerned, it does not know about the private address and only ever sees the instance's public IP address.
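
A toy model of that one-to-one translation: the IGW record maps the private IP to a public IP, swapping the source on the way out and the destination on the way back. All addresses are hypothetical, and this only mirrors the behaviour described, not how the IGW is implemented:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical IGW record linking an instance's private IP to its public IP.
PUBLIC_FOR = {"10.16.16.20": "203.0.113.20"}
PRIVATE_FOR = {public: private for private, public in PUBLIC_FOR.items()}


def igw_outbound(pkt: Packet) -> Packet:
    """Swap the private source IP for the instance's public IP."""
    return replace(pkt, src=PUBLIC_FOR[pkt.src])


def igw_inbound(pkt: Packet) -> Packet:
    """Swap the public destination IP back to the instance's private IP."""
    return replace(pkt, dst=PRIVATE_FOR[pkt.dst])


update_server = "198.51.100.10"  # hypothetical Linux update server
out = igw_outbound(Packet(src="10.16.16.20", dst=update_server))
print(out)    # Packet(src='203.0.113.20', dst='198.51.100.10')
reply = igw_inbound(Packet(src=update_server, dst=out.src))
print(reply)  # Packet(src='198.51.100.10', dst='10.16.16.20')
```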

If the instance uses an IPv6 address, that address is already publicly routable. The IGW does not translate the packet; it simply forwards it.

Bastion Host / Jumpbox

It is an instance in a public subnet inside a VPC. These are used to allow incoming management connections. Once connected, you can then go on to access internal only VPC resources. Used as a management point or as an entry point for a private only VPC.

This is an inbound management point. It can be configured to only allow specific IP addresses, or to authenticate with SSH. It can also integrate with your on-premises identity service.

Network Access Control List (NACL)

Network Access Control Lists (NACLs) are a type of security filter (like firewalls) which can filter traffic as it enters or leaves a subnet.

All VPCs have a default NACL, which is associated with all subnets of that VPC by default. NACLs are used when traffic enters or leaves a subnet. Since they are attached to a subnet and not a resource, they only filter data as it crosses in or out of the subnet. If two EC2 instances in the same subnet communicate, the NACL does nothing because it is not involved.

NACLs have inbound and outbound sets of rules.

When a rule set is evaluated, the rule with the lowest rule number is processed first. As soon as one rule is matched, processing stops for that particular piece of traffic.

The action can be for the traffic to allow or deny the traffic.

Each rule has the following fields related to traffic


If all of those fields match, then that rule's action either allows or denies the traffic.

The rule at the bottom with * is the implicit deny. It cannot be edited and is present by default in each rule list. If no other rules match the traffic being evaluated, it will be denied.

NACLs example below

NACL Exam PowerUp

NACLs are processed in order starting at the lowest rule number until it gets to the catch all. A rule with a lower rule number will be processed before another rule with a higher rule number.
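
A minimal sketch of NACL evaluation order: rules sorted by rule number, first match wins, implicit deny at the end. The `Rule` fields are a simplification of the real rule fields, and the rules and addresses are hypothetical:

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    number: int
    protocol: str  # "tcp", "udp" or "all"
    port_from: int
    port_to: int
    cidr: str
    action: str  # "allow" or "deny"


def evaluate(rules, protocol: str, port: int, source_ip: str) -> str:
    """Process rules from the lowest rule number up; the first match decides."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if (
            rule.protocol in ("all", protocol)
            and rule.port_from <= port <= rule.port_to
            and ip in ipaddress.ip_network(rule.cidr)
        ):
            return rule.action  # processing stops at the first match
    return "deny"  # the implicit * rule


inbound = [
    Rule(100, "tcp", 443, 443, "0.0.0.0/0", "allow"),
    Rule(90, "tcp", 443, 443, "198.51.100.7/32", "deny"),
]
print(evaluate(inbound, "tcp", 443, "198.51.100.7"))  # deny (rule 90 is checked first)
print(evaluate(inbound, "tcp", 443, "203.0.113.9"))   # allow (rule 100)
print(evaluate(inbound, "tcp", 22, "203.0.113.9"))    # deny (implicit * rule)
```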

Security Groups


Network Address Translation (NAT) Gateway

A set of different processes that can adjust IP packets by changing their source or destination addresses.

IP masquerading hides a CIDR block behind one IP. This allows many IPv4 addresses to use one public IP for outgoing internet access. Incoming connections don't work, but outgoing connections can get a response returned.
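
A toy model of IP masquerading (many private addresses sharing one public IP, told apart by port), showing why replies work but unsolicited inbound connections have nowhere to go. The public IP, starting port and class are hypothetical; this is not how the NAT gateway is implemented:

```python
import itertools

PUBLIC_IP = "203.0.113.50"  # hypothetical public IP shared by the whole CIDR block


class Masquerade:
    def __init__(self):
        self._ports = itertools.count(1024)  # hypothetical pool of public ports
        self._out = {}   # (private ip, private port) -> public port
        self._back = {}  # public port -> (private ip, private port)

    def outbound(self, src_ip: str, src_port: int):
        """Rewrite an outgoing connection to the shared public IP and a unique port."""
        key = (src_ip, src_port)
        if key not in self._out:
            public_port = next(self._ports)
            self._out[key] = public_port
            self._back[public_port] = key
        return PUBLIC_IP, self._out[key]

    def inbound(self, public_port: int):
        """Only replies to connections started from inside have an entry."""
        return self._back.get(public_port)


nat = Masquerade()
print(nat.outbound("10.16.32.10", 40000))  # ('203.0.113.50', 1024)
print(nat.outbound("10.16.32.11", 40000))  # ('203.0.113.50', 1025)
print(nat.inbound(1025))                   # ('10.16.32.11', 40000): a reply
print(nat.inbound(5000))                   # None: unsolicited inbound is dropped
```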

NATGW cannot do port forwarding or be a bastion server. In that case it might be necessary to run a NAT EC2 instance instead.


EC2 provides Infrastructure as a Service (IaaS Product)

Virtualization 101

Servers are configured in three sections without virtualization.

Emulated Virtualization - Software Virtualization

Host OS operated on the HW and included a hypervisor (HV). SW ran in privileged mode and had full access to the HW. Guest OS wrapped in a VM and had devices mapped into their OS to emulate real HW. Drivers such as graphics cards were all SW emulated to allow the process to run properly.

The guest OSes still believed they were running on real HW and tried to take control of it. The areas they saw were not real; they were only space allocated to them for the moment.

The HV performs binary translation: privileged calls are intercepted and translated in SW on the way. The guest OS needs no modification, but this is very slow.


Guest OSes are modified and run in HV containers, but they do not use slow binary translation. Instead, the parts of the OS that would make privileged calls to the HW are modified to call the HV directly, using hypercalls.

Hardware Assisted Virtualization

The physical HW itself is virtualization aware: the CPU has specific functions that let the HV support guests directly. When a guest OS tries to run privileged instructions, they are trapped by the CPU and do not halt the process; the HW redirects them to the HV.

What matters for a VM is input and output operations such as network transfer and disk IO. The problem is that multiple OSes try to access the same piece of hardware and contend over sharing it.

SR-IOV (Single Root I/O Virtualization)

Allows a network card (or any other card) to present itself as many mini cards. As far as each guest OS is concerned, these are real cards dedicated to its use. No translation needs to be done by the HV; the physical card handles it all. In EC2 this feature is called enhanced networking.

EC2 Architecture and Resilience

EC2 instances are virtual machines run on EC2 hosts.


EC2 host contains

EC2 Networking (ENI)

When an instance is provisioned within a specific subnet of a VPC, a primary elastic network interface is provisioned in that subnet, which maps to the physical hardware on the EC2 host. Subnets are also within one specific AZ. Instances can have multiple network interfaces, even in different subnets, so long as they're within the same AZ.

An instance runs on a specific host. If you restart the instance it will stay on that host until either:

The instance will be relocated to another host in the same AZ. Instances cannot move to different AZs. Everything about their hardware is locked within one specific AZ. A migration is taking a copy of an instance and moving it to a different AZ.

In general instances of the same type and generation will occupy the same host. The only difference will generally be their size.

EC2 Strengths

Long running compute needs. Many other AWS services have run time limits.

Server style applications

EC2 Instance Types

Naming Scheme

R5dn.8xlarge: the whole string is the instance type. When in doubt, give the full instance type.
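
A sketch that splits an instance type into its parts: family (R, memory optimized), generation (5), additional capabilities (d for local NVMe instance storage, n for network optimized) and size (8xlarge). The regex is a simplification that fits common names like this one, not every instance type AWS offers:

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str
    generation: int
    capabilities: str
    size: str


# Simplified pattern: letters, digits, optional capability letters, "." size.
PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.([a-z0-9]+)$")


def parse_instance_type(name: str) -> InstanceType:
    match = PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognised instance type: {name}")
    family, generation, capabilities, size = match.groups()
    return InstanceType(family, int(generation), capabilities, size)


print(parse_instance_type("R5dn.8xlarge"))
# InstanceType(family='r', generation=5, capabilities='dn', size='8xlarge')
```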

Storage Refresher

Three types of storage

Storage Performance

Block Size * IOPS = Throughput

This is a simplification; it isn't the only part of the chain. A system might have a throughput cap, and the achievable IOPS might decrease as the block size increases.
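
The formula as a small helper, with an optional cap standing in for a system throughput limit. The block sizes, IOPS and cap values are hypothetical examples:

```python
from typing import Optional


def throughput_mib_s(block_size_kib: float, iops: float, cap_mib_s: Optional[float] = None) -> float:
    """Block size * IOPS = throughput, optionally limited by a throughput cap."""
    throughput = block_size_kib * iops / 1024  # KiB/s -> MiB/s
    return throughput if cap_mib_s is None else min(throughput, cap_mib_s)


print(throughput_mib_s(16, 16_000))                   # 250.0
print(throughput_mib_s(1024, 16_000, cap_mib_s=250))  # 250: the cap wins
```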

Elastic Block Store (EBS)

General Purpose SSD (gp2)

Uses a performance bucket architecture based on the IOPS it can deliver. The gp2 bucket starts full with 5,400,000 I/O credits, all available instantly.

You can consume the credits quickly or slowly over the life of the volume. The bucket is refilled at a rate based on the volume size, with a minimum of 100 I/O credits added back per second.

Above that minimum, the refill rate is 3 IOPS per GiB of volume size, up to a maximum of 16,000 IOPS. This is the baseline performance.

Default for boot volumes and should be the default for data volumes. Can only be attached to one EC2 instance at a time.
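
A sketch of the gp2 bucket arithmetic using the figures above. The 3,000 IOPS workload in the example is a hypothetical demand level, and the model ignores details such as burst limits:

```python
MAX_CREDITS = 5_400_000  # size of the gp2 I/O credit bucket


def gp2_baseline_iops(size_gib: int) -> int:
    """3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000."""
    return min(max(100, 3 * size_gib), 16_000)


def seconds_until_empty(size_gib: int, demand_iops: int) -> float:
    """How long a full bucket lasts when demand exceeds the refill (baseline) rate."""
    net_drain = demand_iops - gp2_baseline_iops(size_gib)
    if net_drain <= 0:
        return float("inf")  # refill keeps up, so the bucket never empties
    return MAX_CREDITS / net_drain


print(gp2_baseline_iops(10))           # 100
print(gp2_baseline_iops(1000))         # 3000
print(gp2_baseline_iops(8000))         # 16000
print(seconds_until_empty(100, 3000))  # 2000.0
```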

Provisioned IOPS SSD (io1)

You pay for capacity and for the IOPS set on the volume. This is good if your volume size is small but you need a lot of IOPS.

50:1 IOPS to GiB ratio. 64,000 is the max IOPS per volume, assuming 16 KiB I/O.

Good for latency-sensitive workloads such as MongoDB. Multi-attach allows io1 volumes to be attached to multiple EC2 instances at once.
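
The 50:1 ratio and the 64,000 cap as a small calculation; the volume sizes are hypothetical examples:

```python
def io1_max_iops(size_gib: int) -> int:
    """Provisioned IOPS are capped at 50 per GiB and at 64,000 per volume."""
    return min(50 * size_gib, 64_000)


print(io1_max_iops(100))   # 5000
print(io1_max_iops(2000))  # 64000
print(64_000 // 50)        # 1280: smallest size (GiB) that can reach the cap
print(64_000 * 16 / 1024)  # 1000.0 MiB/s at 16 KiB per I/O
```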

HDD Volume Types

Two types

EBS Exam Power Up

EC2 Instance Store

Each instance has a collection of volumes that are locked to that specific host. If the instance moves, the data doesn't.

Instances can move between hosts for many reasons:

The number, size, and performance of instance store volumes vary based on the type of instance used. Some instances do not have any instance store volumes at all.

Instance Store Exam PowerUp

EBS vs Instance Store

If the read/write requirements can be handled by EBS, EBS should be the default.

When to use EBS

When to use Instance Store

EBS Snapshots, restore, and fast snapshot restore

Snapshots are incremental volume copies to S3. The first is a full copy of the data on the volume, which can take some time. EBS performance won't be impacted, but the copy takes time in the background. Future snapshots are incremental: they consume less space and are quicker to perform.

If you delete an incremental snapshot, it moves data to ensure subsequent snapshots will work properly.

Volumes can be created (restored) from snapshots. Snapshots can be used to move EBS volumes between AZs. Snapshots can be used to migrate data between volumes.

Snapshot and volume performance

Fast Snapshot Restore (FSR) allows for immediate restoration. When you enable it, you pick the specific snapshot and the AZ that you want to be able to do instant restores to. Each combination of snapshot and AZ counts as one FSR set, and you can have 50 FSR sets per region. FSR is not free and can get expensive with lots of different snapshots.

Snapshot Consumption and Billing

Billed using a GB-month metric: 20 GB stored for half a month represents 10 GB-months.

This is used data, not allocated data. If you have a 40 GB volume but only use 10 GB, you will only be charged for the 10 GB used. This is not how EBS volumes themselves are billed.

Because data is stored incrementally, taking a snapshot every 5 minutes will not necessarily cost more than taking one every hour; you pay for the changed data that is stored.
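
A sketch of both points: the GB-month arithmetic, and a toy incremental model where each snapshot stores only the blocks changed since the previous one. The block-level model and the write patterns are hypothetical simplifications:

```python
def gb_months(gb: float, days_stored: float, days_in_month: float = 30) -> float:
    """Size multiplied by the fraction of the month it was stored."""
    return gb * days_stored / days_in_month


def incremental_storage(writes_per_minute, interval_minutes: int) -> int:
    """Blocks stored when snapshotting every interval_minutes (changed blocks only)."""
    stored = 0
    changed = set()
    for minute, blocks in enumerate(writes_per_minute, start=1):
        changed |= blocks
        if minute % interval_minutes == 0:
            stored += len(changed)
            changed = set()
    return stored


print(gb_months(20, 15))  # 10.0

distinct = [{m} for m in range(60)]  # a new block written every minute
same = [{0} for _ in range(60)]      # the same block rewritten every minute
print(incremental_storage(distinct, 5), incremental_storage(distinct, 60))  # 60 60
print(incremental_storage(same, 5), incremental_storage(same, 60))          # 12 1
```

The first pair shows frequent snapshots costing nothing extra when each change is new data; the second shows the "not necessarily" case, where rewriting the same block is captured once per snapshot.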

EBS Encryption

Provides at rest encryption for block volumes and snapshots.

When you don't have EBS encryption, the volume is not encrypted. The physical hardware itself may be performing at rest encryption, but that is a separate thing.

When you set up an EBS volume initially, EBS uses KMS and a customer master key (CMK). This can be the AWS managed default CMK for EBS, referred to as aws/ebs, or a customer managed CMK that you manage yourself.

That key is used by EBS when an encrypted volume is created: the CMK generates an encrypted data encryption key, which is stored with the volume on the physical disk. This key can only be decrypted by KMS when a role with the proper permissions makes the request.

When the volume is first used, EBS asks KMS to decrypt the key and stores the decrypted key in memory on the EC2 host while it's being used. At all other times it's stored on the volume in encrypted form.

When the EC2 instance is using the encrypted volume, the host can use the decrypted data encryption key to move data on and off the volume. It is used for all cryptographic operations when data is moved to and from the volume.

When data is stored at rest, it is stored as ciphertext.

If the EC2 instance ever moves to another host, the decrypted key is discarded.

If a snapshot is made of an encrypted EBS volume, the same data encryption key is used for that snapshot. Anything made from this snapshot is also encrypted in the same way.

Every time you create a new EBS volume from scratch, it creates a new data encryption key.

EBS Encryption Exam Power Up

EC2 Network Interfaces, Instance IPs and DNS

An EC2 instance starts with at least one ENI - elastic network interface. An instance may have ENIs in separate subnets, but everything must be within one AZ.

When you launch an instance with security groups, they are attached to the network interface, not the instance.

Elastic Network Interface (ENI)

Has these properties

Secondary interfaces function in all the same ways as primary interfaces, except that you can detach them and move them to other EC2 instances.

ENI Exam PowerUp

The public DNS name for a given instance resolves to its primary private IP address from within the VPC. If you have instance-to-instance communication within the VPC, it never leaves the VPC and does not need to touch the internet gateway.

Amazon Machine Image (AMI)

Images of EC2 instances that can be used to launch more EC2 instances.

AMI Lifecycle

  1. Launch: EBS volumes are attached to the EC2 instance using block device IDs.

    • BOOT /dev/xvda
    • DATA /dev/xvdf
  2. Configure: customize the instance, for example by installing applications or changing volume sizes.

  3. Create Image or AMI

    • AMI contains:
      • Permissions: who can use it, is it public or private
      • EBS snapshots are created from attached EBS volumes
      • Snapshots are referenced inside the AMI using block device mapping.
      • A table of data that links each snapshot ID created when making the AMI to the device ID that the original volume had on the EC2 instance.
  4. Launch: When launching an instance, the snapshots are used to create new EBS volumes in the AZ of the EC2 instance and contain the same block device mapping.

AMI Exam PowerUps

EC2 Pricing Models

On-Demand Instances

Spot Instances

Up to 90% off on-demand, but this depends on spare capacity. You can set a maximum hourly rate in a certain AZ in a certain region. If the max price you set is above the spot price, you pay only the spot price for the duration that you consume the instance. As the spot price increases, you pay more. Once the spot price increases past your maximum, the instance is terminated. Great for data analytics when the processing can happen later, at a time of lower usage.
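
A sketch of that billing behaviour over a hypothetical series of hourly spot prices: you pay the spot price (not your maximum) each hour, until the spot price exceeds your maximum and the instance is terminated. `run_spot` is a hypothetical helper and the real spot market has more nuance:

```python
def run_spot(max_price: float, hourly_spot_prices):
    """Return (hours run, total cost) until the spot price passes max_price."""
    cost = 0.0
    hours = 0
    for price in hourly_spot_prices:
        if price > max_price:
            break  # spot price passed the maximum: the instance is terminated
        cost += price  # you pay the current spot price, not your maximum
        hours += 1
    return hours, round(cost, 4)


prices = [0.03, 0.04, 0.06, 0.12, 0.05]  # hypothetical $/hour
print(run_spot(0.10, prices))  # (3, 0.13)
```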

Reserved Instance

Up to 75% off on-demand. The trade-off is commitment: you're buying capacity in advance for 1 or 3 years, with flexibility in how you pay.

The best discounts are for 3 years, all up front. Reserved in a region, or in an AZ with a capacity reservation. Reserved instances take priority for AZ capacity. You can make a scheduled reservation when you can commit to specific time windows.

Great if you have known steady-state usage, such as email or a domain server. The cheapest option for workloads with no tolerance for disruption.

Instance Status Checks and Autorecovery

Every instance has two high level status checks

Autorecovery can kick in and help:

Horizontal and Vertical Scaling

Vertical Scaling

As customer load increases, the server may need to grow to handle more data. The server can increase in capacity, but this will require a reboot.

Horizontal Scaling

As the customer load increases, horizontal scaling adds additional capacity. Instead of one running copy of an application, you can have multiple copies running across multiple servers. This requires a load balancer: when customers access the application, the load balancer spreads the load evenly across the servers.

Instance Metadata

The EC2 service provides data to instances. It is accessible inside all instances.


Meta-data contains information about the environment the instance is in. You can find out about the networking or user data, among other things. It is not authenticated or encrypted: anyone who can gain access to the instance can see the meta-data. Access can be restricted by a local firewall.


Intro to Containers

Virtualization Problems

Using an EC2 virtual machine with Nitro Hypervisor, 4 GB ram, and 40 GB disk, the OS can consume 60-70% of the disk and much of the available memory. Containers leverage the similarities of multiple guest OS by removing duplicate resources. This allows applications to run in their own isolated environments.

Image Anatomy

A Docker image is composed of multiple layers, not a monolithic disk image. Each line of a Dockerfile creates a new filesystem layer on top of the previous one. Images are created from scratch or from a base image. Images contain read-only layers, and images are layered onto other images.

A Docker container is the same as a Docker image, except that it has an additional READ/WRITE layer of its own.

If you have lots of containers with very similar base structures, they share the parts that overlap: the read-only layers are reused between containers, and only each container's read/write layer is unique.
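
A toy count of why layer sharing saves space: three containers built from two images that share base layers need only the distinct read-only layers once, plus one read/write layer each. The image and layer names are hypothetical:

```python
# Hypothetical images described as ordered lists of read-only layer IDs.
images = {
    "web": ["base-os", "python", "web-app"],
    "worker": ["base-os", "python", "worker-app"],
}
containers = {"web-1": "web", "web-2": "web", "worker-1": "worker"}

naive_copies = sum(len(images[image]) for image in containers.values())
shared_read_only = set()
for image in containers.values():
    shared_read_only.update(images[image])
rw_layers = {name: f"{name}-rw" for name in containers}  # one per container

print(naive_copies)           # 9 layers if nothing were shared
print(len(shared_read_only))  # 4 shared read-only layers on the host
print(len(rw_layers))         # 3 read/write layers, one per container
```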

Container Registry

A registry, or hub, of container images. A Dockerfile is used to create a container image, which can then be stored in a container registry.

Docker hosts can run many containers based on one or more images. A single image can generate Containers on many different Docker hosts.

Container Key Concepts

Elastic Container Service (ECS) Concepts

ECS Service is configured via Service Definition and represents how many copies of a task you want to run for scaling and HA.

ECS Cluster Types

ECS Cluster manages:

EC2 mode

ECS cluster is created within a VPC. It benefits from the multiple AZs that are within that VPC. You specify an initial size which will drive an auto scaling group.

ECS using EC2 mode is not a serverless solution, you need to worry about capacity for your cluster.

The container instances are not delivered as a managed service, they are managed as normal EC2 instances. You can use spot pricing or prepaid EC2 servers.

Fargate mode

Removes more of the management overhead from ECS, no need to manage EC2.

With Fargate's shared infrastructure, all customers draw from the same pool of resources.

Fargate deployment still uses a cluster with a VPC where AZs are specified.

For ECS tasks, they are injected into the VPC. Each task is given an elastic network interface which has an IP address within the VPC. They then run like a VPC resource.

You only pay for the container resources you use.

EC2 vs ECS(EC2) vs Fargate

If you already are using containers, use ECS.

EC2 mode is good for a large workload if you are price conscious. This allows for spot pricing and prepayment.

Fargate is great if you:


Bootstrapping EC2 using User Data

Bootstrapping is a process where scripts or other config steps can be run when an instance is first launched. This allows an instance to be brought to service in a particular configured state.

In systems automation, bootstrapping allows the system to self configure. In AWS this is EC2 Build Automation.

This could perform some software installs and post install configs.

Bootstrapping is done using user data and is injected into the instance in the same way that meta-data is. It is accessed using the meta-data IP.

Anything you pass in is executed by the instance OS only once on launch!

EC2 doesn't validate the user data. You can tell EC2 to pass in trash data and the data will be injected. The OS needs to understand the user data.

Bootstrapping Architecture

An AMI is used to launch an EC2 instance in the usual way, creating an EBS volume that is attached to the instance based on the block device mapping inside the AMI.

Now the EC2 service provides some user data through to the EC2 instance. There is SW within the OS designed to look at the metadata IP for any user data. If it sees any user data, it executes this on launch of that instance.

This is treated like any other script the OS runs. At the end of running the script, the instance will be in:

User Data Key Points

EC2 doesn't know what the user data contains, it's just a block of data. The user data is not secure, anyone can see what gets passed in. For this reason it is important not to pass passwords or long term credentials.

User data is limited to 16 KB in size. For anything larger, pass in a script that downloads the larger set of data.
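
A minimal sketch of preparing user data: check the 16 KB limit on the raw bytes, then base64-encode it, since APIs commonly expect user data base64-encoded. The bootstrap script and `encode_user_data` helper are hypothetical:

```python
import base64

USER_DATA_LIMIT = 16 * 1024  # 16 KB, checked on the raw bytes before encoding


def encode_user_data(script: str) -> str:
    raw = script.encode("utf-8")
    if len(raw) > USER_DATA_LIMIT:
        raise ValueError(
            f"user data is {len(raw)} bytes; pass a small script that downloads the rest"
        )
    return base64.b64encode(raw).decode("ascii")


bootstrap = "#!/bin/bash\nyum -y update\n"  # hypothetical bootstrap script
encoded = encode_user_data(bootstrap)
print(base64.b64decode(encoded).decode("utf-8") == bootstrap)  # True
```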

User data can be modified if you stop the instance, change the user data, then restart the instance. However, the new user data won't be executed, because the instance has already been launched.


How quickly after you launch an instance is it ready for service? This includes the time for EC2 to configure the instance and any software downloads that are needed for the user. When looking at an AMI, this can be measured in minutes.

AMI baking front-loads that time by configuring as much as possible in the AMI in advance.


cfn-init is a helper script installed on EC2 OS. This is a simple configuration management system.

It is executed like any other command, by being passed into the instance as part of the user data. It retrieves its directives from the CloudFormation stack; you define this data in a section of the CloudFormation template called AWS::CloudFormation::Init.

cfn-init explained

It starts off with a CloudFormation template containing a logical resource that creates an EC2 instance. This resource has a specific section called Metadata. The cfn-init command is passed in as UserData, and CloudFormation substitutes variables (such as the stack name and region) into that user data so cfn-init knows where to fetch its directives from.

It knows the desired state and can work towards that final configuration. It can also be used to watch the metadata and apply changes when the metadata changes.

CreationPolicy and Signals

If you pass in user data, there is no way for CloudFormation to know if the EC2 instance was provisioned properly. It may be marked as complete, but the instance could be broken.

A CreationPolicy is something which is added to a logical resource inside a CloudFormation template. You create it and supply a timeout value.

This waits for a signal from the resource itself before moving to a create complete state.

EC2 Instance Roles

IAM roles are the best practice ways for services to be granted permissions. EC2 instance roles are roles that an instance can assume and anything running in that instance has the permissions that role grants.

Starts with an IAM role with a permissions policy. EC2 instance role allows the EC2 service to assume that role.

The instance profile is the item that allows the permissions to get inside the instance. When you create an instance role in the console, an instance profile is created with the same name.

When IAM roles are assumed, you are provided temporary credentials based on the permissions assigned to that role. These credentials are passed through instance meta-data.

EC2 and the Security Token Service (STS) ensure the credentials are renewed automatically before they expire, so the instance always has valid credentials.

Key facts

AWS System Manager Parameter Store

Passing secrets into an EC2 instance is bad practice because anyone who has access to the meta-data has access to the secrets.

Parameter store allows for storage of configuration and secrets

Parameter Store:

System and Application Logging on EC2

CloudWatch and CloudWatch Logs cannot natively capture data inside an instance.

The CloudWatch Agent is required for OS-visible data; it sends this data into CW. For this to function, the agent needs configuration and permissions in addition to being installed. It needs to know what information to inject into CW and CW Logs.

The agent also needs some permissions to interact with AWS. This is done with an IAM role as best practice. The IAM role has permissions to interact with CW logs. The IAM role is attached to the instance which provides the instance and anything running on the instance, permissions to manage CW logs.

The requested data is then injected into CW Logs. There is one log group for each individual log we want to capture, and within each log group there is one log stream for each instance being monitored.

We can use parameter store to store the configuration for the CW agent.

EC2 Placement Groups

Cluster Placement

Pack instances close together

Achieves the highest level of performance available with EC2.

Best practice is to launch all of the instances within the group at the same time. If you launch 9 instances and AWS places them somewhere with capacity for 12, you are now limited in how many you can add.

Cluster placement groups must be within a single AZ. Their instances are generally on the same rack, and can even be on the same EC2 host.

All members have direct connections to each other. They can achieve 10 Gbps single stream vs 5 Gbps normally. They also have the lowest latency and max PPS possible in AWS.

If the shared hardware fails, the entire cluster can fail.

Cluster Placement Exam PowerUp

Spread Placement

Keep instances separated

This provides the best resilience and availability. Spread groups can span multiple AZs. Instances are placed on distinct racks, each with its own network and power supply. There is a limit of 7 instances per AZ, so the more AZs in a region, the more instances a spread placement group can hold.

Spread Placement Exam PowerUp

Use case: a small number of critical instances that need to be kept separated from each other, for example several mirrors of an application.

Partition Placement

Groups of instances spread apart

If a problem occurs with one rack's networking or power, it will at most take out one partition.

The main difference from spread placement is that you can launch as many instances in each partition as you desire.

When you launch instances into a partition placement group, you can let AWS decide the partition for each instance, or you can specify it yourself.

Partition Placement Exam PowerUp

EC2 Dedicated Hosts

An EC2 host allocated to you in its entirety. You pay for the host itself, which is designed for a family of instances; there are no instance charges. You can pay for a host on-demand or with a reservation on 1 or 3 year terms.

The host hardware has physical sockets and cores. This dictates how many instances can be run on the HW.

Hosts are designed for a specific size and family. If you purchase one host, you configure what type of instances you want to run on it. With older (non-Nitro) hosts you cannot mix and match sizes; the newer Nitro system allows different instance sizes to be mixed on the same host.

Dedicated Hosts Limitations

Enhanced Networking

Enhanced networking uses SR-IOV. The physical network interface is aware of the virtualization. Each instance is given exclusive access to one part of a physical network interface card.

There is no charge for this, and it is available on most EC2 types. It allows for higher I/O and lower host CPU usage, which provides more bandwidth and higher packets per second. In general it also provides lower latency.

EBS Optimized

Historically network on EC2 was shared with the same network stack used for both data networking and EBS storage networking.

EBS optimized instance means that some stack optimizations have taken place and dedicated capacity has been provided for that instance for EBS usage.

Most new instances support this and have this enabled by default for no charge.


Public Hosted Zones

A hosted zone is a DNS database for a given section of global DNS data. A public hosted zone is a type of R53 hosted zone which is hosted on R53 provided public DNS name servers. When creating a hosted zone, AWS provides at least 4 DNS name servers which host the zone.

This is a globally resilient service due to the multiple DNS servers.

Hosted zones are created automatically when you register a domain using R53.

Hosted zones can also be created separately. If you register a domain elsewhere and want to use R53 to host the zone file and records for that domain, you can create a hosted zone yourself and point the externally registered domain at that zone. There is a monthly fee to host each hosted zone within R53, plus a fee for any queries made against it.

Hosted Zones are what the DNS system references via delegation and name server records. A hosted zone, when referenced in this way by the DNS system, is known as being authoritative for a domain. It becomes the single source of truth for a domain.

Route 53 Health Checks

Route 53 health checks periodically check the health of your servers. If one of the servers fails its health check (for example, because of a bug), it is removed from the list of records returned to clients.

If the bug gets fixed, the health check will pass and the server will be added back into a healthy state.

Health checks are separate from, but are used by records inside R53. You don't create health checks inside records themselves.

Health checks are performed by a fleet of global health checkers. If you mistake them for bots and block them, the health checks will fail, which could cause alarms.

Checks occur every 30 seconds by default. This can be reduced to every 10 seconds at additional cost. The interval applies to each health checker; because there are many checkers, an endpoint is checked every few seconds on average with the 30 second option, and more than once per second on average with the 10 second option.
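To see why the interval is "per checker", here is a small arithmetic sketch. It assumes 15 health checkers purely for illustration (the count is not a documented constant here) and treats every checker as probing exactly once per interval.

```python
# Average probe rate seen by one endpoint from Route 53 health checkers.
# ASSUMPTION: 15 checkers is an illustrative figure, not a documented constant.

def requests_per_second(checkers: int, interval_seconds: int) -> float:
    """Each checker probes once per interval, so rates add up across checkers."""
    return checkers / interval_seconds


def average_gap_seconds(checkers: int, interval_seconds: int) -> float:
    """Average time between consecutive probes from any checker."""
    return interval_seconds / checkers


if __name__ == "__main__":
    for interval in (30, 10):
        print(
            f"{interval}s interval: {requests_per_second(15, interval):.1f} req/s, "
            f"a probe every {average_gap_seconds(15, interval):.2f}s on average"
        )
```

These are averages only: the checkers don't coordinate with each other, so real probes tend to arrive in bursts rather than evenly spaced.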

There could be one of three checks

It will be deemed healthy or unhealthy.

There are three types of checks.

Route 53 Routing Policies Examples


Database Refresher

Systems to store and manage data.

Relational (SQL)

Every row in a table must have a value for the primary key. There must be a value stored for every attribute in the table.

SQL systems are relational, so we generally define relationships between tables as well. A many-to-many relationship is often defined with a join table. A join table has a composite key, which is a key formed of two parts. Each part can repeat on its own, but the two parts together must be unique.
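A minimal sketch of a join table with a composite key, using Python's built-in `sqlite3` module as a stand-in relational engine. The tables (`customers`, `cars`, `customer_cars`) and their rows are hypothetical examples.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE cars (car_id INTEGER PRIMARY KEY, model TEXT NOT NULL);
    -- Join table: the composite key (customer_id, car_id) must be unique as a pair.
    CREATE TABLE customer_cars (
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        car_id INTEGER NOT NULL REFERENCES cars(car_id),
        PRIMARY KEY (customer_id, car_id)
    );
    """
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Julie"), (2, "Owen")])
conn.executemany("INSERT INTO cars VALUES (?, ?)", [(10, "Civic"), (20, "Model 3")])

# Each part of the key may repeat on its own...
conn.executemany("INSERT INTO customer_cars VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])


def try_insert(customer_id: int, car_id: int) -> bool:
    """Return True if the pair was inserted, False if it violated the composite key."""
    try:
        conn.execute("INSERT INTO customer_cars VALUES (?, ?)", (customer_id, car_id))
        return True
    except sqlite3.IntegrityError:
        return False


# ...but the same pair cannot appear twice.
print(try_insert(2, 20))  # True: a new combination
print(try_insert(1, 10))  # False: duplicate composite key
```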

Keys in different tables are how the relationships between the tables are defined.

The table schema and relationships must be defined in advance, which can be hard to do.

Non-Relational (NoSQL)

NoSQL is not a single thing; it is a catch-all for everything else. There is generally no schema, or only a weak one.

Key-Value databases

This is just a list of key-value pairs. As long as every key is unique, there is no real schema or structure needed. These databases are really fast and highly scalable, and are also used for in-memory caching.

Wide Column Store

DynamoDB is an example of a wide column store database.

Each row or item has one or more keys. One key is called the partition key. You can have additional keys other than the partition key called the sort or range key.

It can be single key (only partition key) or composite key (partition key and sort key).

Every item in a table can also have attributes, but they don't have to be the same between items. The only requirements are that every item inside the table uses the same key structure and has a unique key.
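The rules above (same key structure for every item, unique keys, free-form attributes) can be sketched as a toy in-memory table. This is an illustration only, not the DynamoDB API; the class and attribute names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class WideColumnTable:
    """Toy model of a wide column store (DynamoDB-style), for illustration only.

    Every item must use the same key structure: a partition key and, optionally,
    a sort key. All other attributes are free-form and can differ between items.
    """

    def __init__(self, partition_key: str, sort_key: Optional[str] = None) -> None:
        self.partition_key = partition_key
        self.sort_key = sort_key
        self._items: Dict[Tuple[Any, Any], Dict[str, Any]] = {}

    def _key(self, item: Dict[str, Any]) -> Tuple[Any, Any]:
        if self.partition_key not in item:
            raise ValueError(f"missing partition key {self.partition_key!r}")
        if self.sort_key is not None and self.sort_key not in item:
            raise ValueError(f"missing sort key {self.sort_key!r}")
        sort_value = item[self.sort_key] if self.sort_key else None
        return (item[self.partition_key], sort_value)

    def put(self, item: Dict[str, Any]) -> None:
        """Store an item; the (partition, sort) key must be unique in this sketch."""
        key = self._key(item)
        if key in self._items:
            raise ValueError(f"duplicate key {key!r}")
        self._items[key] = dict(item)

    def query(self, partition_value: Any) -> List[Dict[str, Any]]:
        """Return all items sharing a partition key, ordered by sort key."""
        matches = [(k, v) for k, v in self._items.items() if k[0] == partition_value]
        return [v for _, v in sorted(matches, key=lambda kv: kv[0][1])]


if __name__ == "__main__":
    table = WideColumnTable("sensor_id", "timestamp")  # hypothetical key names
    table.put({"sensor_id": "s1", "timestamp": 2, "temp": 21.5})
    table.put({"sensor_id": "s1", "timestamp": 1, "humidity": 40})  # different attributes
    print(table.query("s1"))
```

One deliberate simplification: real DynamoDB `PutItem` replaces an existing item with the same key unless a condition expression prevents it, whereas this sketch rejects duplicates to make the uniqueness rule visible.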


Documents are generally formatted using JSON or XML.

A document database is an extension of a key-value store: each document is accessed via an ID that's unique to that document, but the document contents are exposed to the database, allowing you to interact with them.

Good for order databases, collections, or contact-style databases.

Great for nested data items within a document structure such as user profiles.

Row Database (MySQL)

Often called OLTP (Online Transaction Processing) databases.

If you need to read the price of one item, you read that item's row. If you want to query the size of every order, you have to read every row.

Great for things which deal in rows and items where they are constantly accessed, modified, and removed.

Column Database (Redshift)

Instead of storing data in rows on disk, a column database stores it based on columns. The data is the same, but it's grouped together on disk by column: every order value is stored together, as is every product item, color, size, and price.

This is bad for transactional-style processing, but great for reporting, or when all values for a specific attribute (such as size) are required.
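The difference in access patterns can be sketched with plain Python lists standing in for the on-disk layout. The orders below are hypothetical sample data.

```python
# The same hypothetical orders, laid out row-wise and column-wise.
orders = [
    {"order": 1, "product": "shirt", "color": "red", "size": "M", "price": 20},
    {"order": 2, "product": "hat", "color": "blue", "size": "S", "price": 15},
    {"order": 3, "product": "shirt", "color": "black", "size": "L", "price": 22},
]
columns = list(orders[0].keys())

# Row store (OLTP-style): all fields of one record sit together.
row_store = [tuple(o[name] for name in columns) for o in orders]

# Column store (OLAP-style): all values of one column sit together.
column_store = {name: [o[name] for o in orders] for name in columns}


def read_order_row(order_id):
    """Row store: one lookup returns the whole order."""
    return next(row for row in row_store if row[0] == order_id)


def all_sizes_from_rows():
    """Row store: every row must be read just to collect one attribute."""
    size_index = columns.index("size")
    return [row[size_index] for row in row_store]


def all_sizes_from_columns():
    """Column store: the attribute is already stored contiguously."""
    return column_store["size"]
```

Both layouts hold identical data; what changes is how much has to be read for a given query, which is why row stores suit transactions and column stores suit reporting.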


Relationships between things are formally defined and stored in the database itself, along with the data. They are not calculated each and every time you run a query. These databases are great for relationship-driven data.

Nodes are objects inside a graph database. They can have properties.

Edges are relationships between the nodes. They have a direction.

Relationships themselves can also have attached data, such as name-value pairs. For example, we might want to store the start date of an employment relationship.

Can store massive amounts of complex relationships between data or between nodes in a database.

Databases on EC2

It is generally a bad idea to do this.

Reasons EC2 Database might make sense

Reasons why you really shouldn't run a database on EC2

Relational Database Service (RDS)

Amazon Aurora is so different from normal RDS that it is treated as a separate product.

RDS Database Instance

An RDS instance runs one of a few types of database engine and can contain multiple user-created databases. You create one when you provision the instance, and more can be created afterwards.

When you create a database instance, you access it using a database hostname (a CNAME), which resolves to the database instance itself.

RDS uses standard database engines so you can access an RDS instance using the same tooling as if you were accessing a self-managed database.

The database can be optimized for:

db.m5 - general purpose
db.r5 - memory optimized
db.t3 - burstable

Each instance also has an associated size, and you select the AZ it runs in.

When you provision an instance, you provision dedicated storage for that instance. This is EBS storage located in the same AZ, so a single-AZ RDS instance is vulnerable to failures in that AZ.

The storage can be allocated with SSD or magnetic.

io1 - lots of IOPS and consistent low latency
gp2 - same burst pool architecture as on EC2, used by default
magnetic - mostly for compatibility with long-term historic uses

Billing is per instance, at an hourly rate for that compute, plus a charge for the storage allocated.

RDS Multi AZ (High-Availability)

This is an option that you can enable on RDS instances. Secondary hardware is allocated inside another AZ; this is referred to as the standby replica (or standby replica instance). The standby replica has its own storage in the same AZ as the replica.

RDS enables synchronous replication from the primary instance to the standby replica.

RDS is accessed ONLY via the database CNAME, which points at the primary instance. You cannot access the standby replica directly for any reason.

The standby replica cannot be used for extra capacity.

Synchronous Replication means:

  1. Database writes happen.
  2. Primary database instance commits changes.
  3. At the same time, the write is replicated to the standby replica.
  4. Standby replica commits writes.

If an error occurs with the primary instance, AWS detects this and fails over to the standby replica within 60 to 120 seconds, by changing the database CNAME to point at the standby.

This provides high availability, not fault tolerance, because there is some impact while the failover happens.

RDS Exam PowerUp

RDS Backup and Restores

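RPO - Recovery Point Objective: the maximum amount of data, measured as time, that the business can afford to lose in an incident.

RTO - Recovery Time Objective: the maximum time allowed between an incident and full recovery of the service.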

RDS Backups

The first snapshot is a full copy of the data used on the RDS volume. From then on, snapshots are incremental and only store the data that has changed.

When any snapshot occurs, there's a brief interruption to the flow of data between the compute resource and the storage. If you are using single AZ, this can impact your application. If you are using Multi-AZ, the snapshot occurs on the standby replica.

Manual snapshots don't expire; you have to clean them up yourself. Automated backups can be configured to make this easier.

In addition to automated backups, database transaction logs are saved to S3 every 5 minutes. Transaction logs record the operations that change data inside the database. This allows a database to be restored to a point in time, often with 5 minute granularity.

The automated backup retention period can be anywhere from 0 to 35 days. This means you can restore to any point in that time frame; a restore uses both the snapshots and the transaction logs.
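As a rough model of the restore window, the sketch below checks whether a target time is restorable. It is not an AWS API: it assumes the latest restorable point lags "now" by one 5 minute log interval and that the window starts at the retention limit or the database's creation, whichever is later.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# ASSUMPTION: transaction logs are shipped every 5 minutes, so the latest
# restorable point is taken to be "now" minus one interval (a conservative model).
LOG_INTERVAL = timedelta(minutes=5)


def restore_window(
    now: datetime, retention_days: int, db_created: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Return (earliest, latest) restorable points, or None if backups are disabled."""
    if not 0 <= retention_days <= 35:
        raise ValueError("retention must be between 0 and 35 days")
    if retention_days == 0:
        return None  # a retention period of 0 disables automated backups
    earliest = max(now - timedelta(days=retention_days), db_created)
    latest = now - LOG_INTERVAL
    return earliest, latest


def can_restore_to(
    target: datetime, now: datetime, retention_days: int, db_created: datetime
) -> bool:
    window = restore_window(now, retention_days, db_created)
    return window is not None and window[0] <= target <= window[1]
```

For example, with 7 days of retention, a request made at noon can restore to any point from noon a week earlier up to about five minutes ago.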

When you delete the database, automated backups can be retained, but they will still expire based on their retention period.

The only way to keep backups beyond that is to create a final snapshot, which will not expire automatically.

RDS Backup Exam PowerUp

RDS Read-Replicas

Kept in sync using asynchronous replication

Data is written fully to the primary (and the standby, if Multi-AZ) first. Once it's stored on disk, it is pushed to the read replica, which means there can be a small lag. Read replicas can be created in the same region or in a different region; the latter is known as cross-region replication. AWS handles all of the encryption, configuration, and networking without intervention.
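To contrast the synchronous Multi-AZ replication listed earlier with the asynchronous read replica replication described here, the toy model below acknowledges a write only once the primary and standby both have it, while changes for the read replica are queued and shipped later. It is an illustration only; the class and method names are hypothetical.

```python
from collections import deque


class Primary:
    """Toy model: synchronous standby vs asynchronous read replica."""

    def __init__(self):
        self.data = {}
        self.standby = {}        # Multi-AZ standby: updated synchronously
        self.replica = {}        # read replica: updated asynchronously
        self._pending = deque()  # committed changes not yet shipped to the replica

    def write(self, key, value):
        # Synchronous: the write is acknowledged only once primary AND standby have it.
        self.data[key] = value
        self.standby[key] = value
        # Asynchronous: the change is queued and shipped to the replica later.
        self._pending.append((key, value))
        return "ack"

    def ship_to_replica(self, max_changes=1):
        """Apply up to max_changes queued changes to the read replica."""
        for _ in range(min(max_changes, len(self._pending))):
            key, value = self._pending.popleft()
            self.replica[key] = value

    def replica_lag(self):
        """Number of committed changes the replica has not applied yet."""
        return len(self._pending)
```

Right after a write, the standby already matches the primary but the replica may not, which is the "small lag" described above.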

Why do these matter

READ performance

Read Replicas provide near 0 RPO

Amazon Aurora

Aurora architecture is VERY different from RDS.

It uses a cluster which is:

Aurora cluster functions across a number of availability zones.

There is a primary instance and a number of replicas. Read operations from applications can use the replicas.

The cluster uses shared storage of up to 64 TiB, which is used by all instances in the cluster. Data is stored as 6 copies across AZs.

All instances have access to these storage nodes. This replication happens at the storage level. No extra resources are consumed during replication.

By default, the primary instance is the only one that can write. The replicas have read access.

Aurora automatically detects hardware failures on the shared storage. If there is a failure, it immediately repairs that area of disk, recreating the data from the other copies without corruption.

With Aurora you can have up to 15 replicas and any of them can be a failover target. The failover operation will be quicker because it doesn't have to make any storage modifications.

Aurora Endpoints

Aurora clusters like RDS use endpoints, so these are DNS addresses which are used to connect to the cluster. Unlike RDS, Aurora clusters have multiple endpoints that are available for an application.

Minimum endpoints


Aurora Restore, Clone and Backtrack

Backups in Aurora work in the same way as RDS. Restores create a brand new cluster.

Backtrack must be enabled on a per-cluster basis. It allows you to roll your database back to a previous point in time, which helps you recover from data corruption.

You can adjust the window backtrack will work for.

Fast clones create a new database much faster than copying all of the data. A clone references the original storage and only writes the differences between the two: it stores only data that has changed in the clone, or changed in the original after the clone was made, so it uses very little storage.

Aurora Serverless

Provides a version of Aurora database product without managing the resources. You still create a cluster, but it uses ACUs or Aurora Capacity Units.

For a cluster, you set a minimum and maximum number of ACUs, and the cluster scales between them based on load. It can even scale down to 0 and be paused, in which case you are only billed for the storage consumed.

Billing is based on resources used on a per-second basis.
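The scaling range and per-second billing can be combined into a small cost sketch. The price per ACU-hour and the demand figures are hypothetical, and the model ignores how capacity steps up and down in practice; it only shows that capacity is clamped to the min/max range and billed per second of use.

```python
# ASSUMPTION: a hypothetical price per ACU-hour; check current Aurora Serverless pricing.
PRICE_PER_ACU_HOUR = 0.06


def target_acus(demand: float, min_acu: float, max_acu: float) -> float:
    """Clamp the capacity the workload wants into the cluster's [min, max] ACU range."""
    return max(min_acu, min(demand, max_acu))


def compute_cost(samples, min_acu: float, max_acu: float) -> float:
    """samples: list of (seconds, demanded_acus). Compute is billed per second of ACU use.

    With min_acu = 0 and no demand, the cluster is paused and compute costs nothing
    (storage is still billed, but is not modelled here).
    """
    acu_seconds = sum(secs * target_acus(demand, min_acu, max_acu) for secs, demand in samples)
    return acu_seconds / 3600 * PRICE_PER_ACU_HOUR
```

For instance, one hour at 4 ACUs followed by half an hour paused costs 4 ACU-hours of compute under this model.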

Same resilience as Aurora (6 copies across AZs).

ACUs are stateless and shared across many AWS customers and have no local storage. They can be allocated to your Aurora Serverless cluster rapidly when required. Once ACUs are allocated to a cluster, they have access to cluster storage in the same way as an Aurora Provisioned cluster.

There is a shared proxy fleet. When an application interacts with the data, it is actually communicating with the proxy fleet, which brokers connections between the application and the ACUs. This means capacity can scale in and out without the application having to worry about it. The proxy fleet is managed by AWS on your behalf.

Aurora Serverless - Use Cases

Aurora Global Database

Introduces the idea of secondary regions, each with up to 16 read-only replicas. Replication from the primary region to secondary regions happens at the storage layer and typically completes within one second.

Aurora Multi-Master Writes

Allows an aurora cluster to have multiple instances capable of reads and writes.

Single-master Mode

Aurora multi-master has no cluster endpoint and no load balancing. An application can connect to one or both of the instances inside a multi-master cluster.

When one of the R/W nodes receives a write request from the application, it immediately proposes that the data be committed to all of the storage nodes in the cluster. Each storage node then either confirms or rejects the proposed change; it rejects it if the change conflicts with something already in flight.

The writing instance is looking for a quorum of nodes to agree. If the quorum rejects the change, the write is cancelled and an error is returned. If the quorum accepts it, the change is committed and replicated to all storage nodes in the cluster.

The committed change is also updated in the in-memory caches of the other nodes.
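The propose/accept/commit flow can be sketched as a two-phase toy model. The node count (6) and quorum (4) are illustrative assumptions, conflict detection is reduced to "another writer already has this key in flight", and the in-memory cache update is not modelled.

```python
class StorageNode:
    """One storage node in a toy multi-master cluster."""

    def __init__(self):
        self.in_flight = {}  # key -> id of the writer whose proposal is pending
        self.committed = {}

    def propose(self, key, writer):
        owner = self.in_flight.get(key)
        if owner is not None and owner != writer:
            return False  # reject: conflicts with a write already in flight
        self.in_flight[key] = writer
        return True

    def release(self, key, writer):
        if self.in_flight.get(key) == writer:
            del self.in_flight[key]


def propose(nodes, key, writer):
    """Phase 1: ask every storage node to accept the change; return the vote count."""
    return sum(node.propose(key, writer) for node in nodes)


def finish(nodes, key, value, writer, votes, quorum=4):
    """Phase 2: commit on all nodes if a quorum agreed, otherwise cancel with an error."""
    for node in nodes:
        node.release(key, writer)
    if votes < quorum:
        raise RuntimeError(f"write by {writer} to {key!r} rejected ({votes} votes)")
    for node in nodes:
        node.committed[key] = value
```

If two writers propose a change to the same key at the same time, the second proposal is rejected by every node that already holds the first one in flight, so only one of them can reach quorum.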

If a writer goes down in a multi-master cluster, the application will shift all future load over to a new writer with little if any disruption.

Database Migration Service (DMS)

A managed database migration service. DMS uses a replication instance, which runs on top of an EC2 instance. The replication instance runs one or more replication tasks, which is where the configuration for migrating the databases is defined.

You need to define source and destination endpoints, which point at the physical source and target databases. One of these endpoints must be on AWS.

Full load migration is a one-off process which transfers everything at once. It requires that the source database is not changed (effectively down) during the process, which might take several days.

Full Load + CDC (Change Data Capture) instead performs a full load transfer while capturing any changes that happen during it. The captured changes are then applied to the target.
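A toy model of Full Load + CDC, with tables held as Python dicts: the target is loaded from the snapshot taken when the task starts, then the changes captured during the load are replayed in order. This only illustrates the idea; it is not how DMS is invoked.

```python
def apply_change(rows: dict, op: str, key, value=None) -> None:
    """Apply one captured change (insert/update/delete) to a table held as a dict."""
    if op in ("insert", "update"):
        rows[key] = value
    elif op == "delete":
        rows.pop(key, None)
    else:
        raise ValueError(f"unknown operation {op!r}")


def full_load_plus_cdc(snapshot: dict, captured: list) -> dict:
    """Copy the snapshot, then replay the changes captured while copying."""
    target = dict(snapshot)  # 1. full load of the data as it was when the task started
    for change in captured:  # 2. CDC: apply changes made during the load, in order
        apply_change(target, *change)
    return target
```

After the replay, the target matches the live source even though the source kept changing during the copy, which is why this approach avoids the long outage a plain full load needs.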

CDC-only migration is useful if you have a vendor solution that can perform the bulk transfer quickly, and only the changes need to be captured and applied by DMS.

Schema Conversion Tool or SCT can perform conversions between database types.


EFS Architecture

EFS moves the instances closer to being stateless.

Elastic File System Explained

EFS runs inside a VPC. Inside EFS you create file systems and these use POSIX permissions. EFS is made available inside a VPC via mount targets. Mount targets have IP addresses taken from the IP address range of the subnet they're inside. For HA, you need to make sure that you put mount targets in each AZ the system runs in.

You can use hybrid networking to connect to the same mount targets.

EFS Exam PowerUp


Load Balancing Fundamentals

Using one server is risky because that server can have performance issues or be completely unavailable, thus bringing down an application.

A better solution is to use multiple servers. Without load balancing, this could bring additional problems.

Load Balancers Architecture

The user connects to a load balancer that is set to listen on ports 80 and 443.

Within AWS, the configuration for which ports the load balancer listens on is called a listener.

The user is connected to the load balancer and not the actual server.

Behind the load balancer are the application servers. At a high level, when the user connects to the load balancer, it distributes that load across the application servers. The user's client thinks it is talking directly to the application server.

The LB runs health checks against all of the servers. If one of the servers fails, the load balancer detects this and stops sending connections to that server. From the user's client, the application always works.

As long as 1+ servers are operational, the LB is operational. Clients shouldn't see errors that occur with one server.
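A minimal sketch of that behaviour: a round-robin balancer that skips any target whose latest health check failed. Round robin is an assumption chosen for simplicity, and the target names are hypothetical.

```python
import itertools


class LoadBalancer:
    """Round-robin load balancer that skips targets failing health checks (toy model)."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = {t: True for t in self.targets}
        self._cycle = itertools.cycle(self.targets)

    def record_health_check(self, target, passed):
        self.healthy[target] = passed

    def route(self):
        """Return the next healthy target; raise if none is available."""
        for _ in range(len(self.targets)):
            target = next(self._cycle)
            if self.healthy[target]:
                return target
        raise RuntimeError("no healthy targets")
```

As long as one target is healthy, `route()` keeps returning a server, so the client never sees the failed one.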

LB Exam PowerUp

Application Load Balancer (ALB)

ALB is a layer 7 (Application Layer) load balancer. It is capable of inspecting the data that passes through it. It understands the application layer protocols HTTP and HTTPS and can take actions based on things in those protocols, like paths, headers, and hosts.

Capacity that you have as part of an ALB increases automatically based on the load which passes through that ALB. This is made of multiple ALB nodes each running in different AZs. This makes them scalable and highly available.

Load balancers can be internet-facing or internal. The difference is whether the nodes of the LB (the things which run in the AZs) have public IP addresses or not.

Internet facing LB is designed to be connected to, from public internet based clients, and load balance them across targets.

Internal load balancer is not accessible from the internet and is used to load balance inside a VPC only.

Load balancer sits between a client and one or more servers. Front end or listening side, accepts connections from a client. Back end is used for distribution to the targets.

LBs are billed at an hourly rate plus Load Balancer Capacity Units (LCUs). The LCUs you consume are based on whichever of the individual measured dimensions is highest, and you pay for that number of LCUs based on your load over each hour.
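The "highest dimension wins" rule can be shown with a short calculation. The per-LCU allowances below are the commonly quoted ALB figures, included here as assumptions; verify them against current AWS pricing before relying on them.

```python
# ASSUMED per-LCU allowances for an ALB; check the current ALB pricing page.
LCU_DIMENSIONS = {
    "new_connections_per_sec": 25,
    "active_connections_per_min": 3000,
    "processed_gb_per_hour": 1.0,
    "rule_evaluations_per_sec": 1000,
}


def lcus_for_hour(usage: dict) -> float:
    """LCUs billed for an hour: the maximum across dimensions, not the sum."""
    per_dimension = {
        name: usage.get(name, 0) / limit for name, limit in LCU_DIMENSIONS.items()
    }
    return max(per_dimension.values())
```

With 50 new connections per second (2 LCUs) but 3 GB processed in the hour (3 LCUs), the hour is billed at 3 LCUs.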

Cross zone load balancing

With cross-zone load balancing, each node of the load balancer can distribute load across all registered instances in all AZs, even those not in the node's own AZ. This is what allows a balanced distribution of connections behind a load balancer.
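A worked example of why this matters, assuming clients are spread evenly across the LB nodes (one node per AZ) and every AZ has at least one instance.

```python
def per_instance_share(instances_per_az: dict, cross_zone: bool) -> dict:
    """Fraction of total traffic each instance in each AZ receives.

    Assumes traffic is split evenly across LB nodes, one node per AZ,
    and that every AZ has at least one registered instance.
    """
    total = sum(instances_per_az.values())
    if cross_zone:
        # Every node spreads its share across all instances in all AZs.
        return {az: 1 / total for az in instances_per_az}
    # Each node only uses instances in its own AZ.
    node_share = 1 / len(instances_per_az)
    return {az: node_share / count for az, count in instances_per_az.items()}
```

With 2 instances in one AZ and 8 in another, each instance in the smaller AZ gets 25% of all traffic without cross-zone load balancing (versus 6.25% in the larger AZ), but every instance gets 10% with it enabled.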

It can also provide health checks on the target servers. If all instances are shown as healthy, it can distribute evenly.

ALBs support a wide array of targets. Targets are grouped into target groups, and an individual target can be a member of multiple groups. ALBs distribute connections to target groups, not to individual targets. You can create rules to direct traffic to different target groups based on the requested host name (host-based routing) or the path.

ALB Exam PowerUp

Launch Configuration and Templates

They are documents which allow you to configure an EC2 instance in advance. Anything you usually define when launching an instance can be selected in a Launch Configuration (LC) or Launch Template (LT).

LTs are newer and provide more features than LCs like versioning.

Neither is editable. You define a configuration once and it is locked. If you need to adjust it, you must create a new one (with LTs, a new version of the template) and use that.

LTs can be used to save time when provisioning EC2 instances from the console UI / CLI.

Autoscaling Groups

ASGs provision or terminate instances to keep the number of instances at the desired capacity. Scaling policies can adjust the desired capacity based on metrics.

Autoscaling Groups will distribute EC2 instances to try and keep the AZs equal.
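A simplified sketch of how an ASG might reconcile towards its desired capacity: the desired value is clamped to the min/max range, new instances go into the AZ with the fewest, and terminations come from the AZ with the most. Real ASGs also apply termination policies and other rules that this toy model ignores.

```python
def clamp(desired: int, minimum: int, maximum: int) -> int:
    """Desired capacity can never go outside the group's min/max."""
    return max(minimum, min(desired, maximum))


def reconcile(instances: dict, desired: int, minimum: int, maximum: int) -> dict:
    """instances: AZ -> running count. Return counts after scaling to desired capacity."""
    counts = dict(instances)
    target = clamp(desired, minimum, maximum)
    while sum(counts.values()) < target:
        az = min(counts, key=lambda z: (counts[z], z))  # launch in the emptiest AZ
        counts[az] += 1
    while sum(counts.values()) > target:
        az = max(counts, key=lambda z: (counts[z], z))  # terminate from the fullest AZ
        counts[az] -= 1
    return counts
```

For example, scaling an empty group across three AZs to 5 instances gives a 2/2/1 split, keeping the AZs as equal as possible.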

Scaling Policies

Manual Scaling - manually adjust the desired capacity
Scheduled Scaling - time-based adjustments
Dynamic Scaling

The cooldown period is how long to wait at the end of a scaling action before scaling again (300 seconds by default). It avoids rapidly launching and terminating instances, which matters because there is a minimum billable duration for an EC2 instance.

Self healing occurs when an instance has failed and AWS provisions a new instance in its place. This will fix most problems that are isolated to one instance.

ASGs can use load balancer health checks rather than EC2 status checks. ALB health checks can be much richer than EC2 checks because they can monitor the status of HTTP and HTTPS requests. This makes them more application aware.

Network Load Balancer (NLB)

Part of AWS Version 2 series of load balancers. NLBs are Layer 4, only understand TCP and UDP.

NLBs can't interpret HTTP or HTTPS, but this makes them much faster, with lower latency. If an exam question mentions latency and HTTP or HTTPS is not involved, default to an NLB.

There is nothing stopping an NLB from load balancing HTTP traffic by just routing the data. It does this very fast and can handle millions of requests per second.

NLB is the only member of the load balancing family that can be given a static IP. There is one interface per AZ, and each can use an Elastic IP, which makes NLBs the right choice when clients need to whitelist IP addresses.

Can perform SSL pass through.

NLB can load balance non HTTP/S applications, doesn't care about anything above TCP/UDP. This means it can handle load balancing for FTP or things that aren't HTTP or HTTPS.

SSL Offload and Session Stickiness

Bridging - Default mode

One or more clients make one or more connections to a load balancer. The load balancer's listener is configured to use HTTPS, so SSL connections are made between the client and the load balancer.

The load balancer then needs an SSL certificate that matches the domain name that the application uses. AWS has access to this certificate. If you need to be careful of where your certificates are stored, you may have a problem with this system.

The ELB then initiates a new SSL connection to the backend instances. Because it has decrypted the traffic, it can take actions based on the content of the HTTP.

The application load balancer requires an SSL certificate because it needs to decrypt any data encrypted by the client. Once decrypted, it interprets the data and then creates new encrypted sessions between itself and the backend EC2 instances. The EC2 instances will also need matching SSL certificates.

This mode needs compute for the cryptographic operations: every EC2 instance must perform them, and this overhead can be significant.

The main benefit is the elastic load balancer gets to see the unencrypted HTTP and can take actions based on what's contained in this plain text protocol.

Pass-through - Network Load Balancer

The client connects, but the load balancer passes the connection along without decrypting the data at all. The instances still need the SSL certificates, but the load balancer does not. Specifically it's a network load balancer which is able to perform this style of connection.

The load balancer is configured for TCP. It can see the source and destination, but it never touches the encrypted connection, so the certificate never needs to be seen by AWS.

The downside is that you don't get any load balancing based on HTTP, because that is never exposed to the load balancer. The EC2 instances still have the cryptographic compute overhead.


Clients connect to the load balancer using HTTPS, and the connections are terminated on the load balancer. The LB needs an SSL certificate to decrypt the data, but on the backend the data is sent via HTTP. While a certificate is required on the load balancer, none is needed on the backend instances.

Data is in plaintext form across AWS's network. Not a problem for most.

Connection Stickiness

Without stickiness, each request from a customer can go to a different server. If session state is stored on a particular server, the customer loses it (for example, they appear logged out) whenever a request lands on a different server, so sessions can't be load balanced across multiple servers.

There is an option available within elastic load balancers called Session Stickiness. And within an application load balancer this is enabled on a target group. If enabled, the first time a user makes a request, the load balancer generates a cookie called AWSALB with a duration. A valid duration is between one second and seven days. For this time, sessions will be sent to the same backend instance. This will happen until:

This can cause uneven load on the backend, because a user is always forced to the same server no matter how the load is distributed. Applications should be designed to hold session state somewhere other than EC2.
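A toy model of session stickiness: the balancer issues a cookie naming the chosen backend and an expiry, and keeps honouring it until it expires or that backend becomes unhealthy. The real `AWSALB` cookie is opaque and encrypted; representing it as a `(target, expiry)` tuple is an assumption made for illustration.

```python
import itertools
import time

MAX_DURATION = 7 * 24 * 3600  # seven days, in seconds


class StickyBalancer:
    """Round-robin balancer with cookie-based stickiness (toy model)."""

    def __init__(self, targets, duration_seconds):
        if not 1 <= duration_seconds <= MAX_DURATION:
            raise ValueError("duration must be between 1 second and 7 days")
        self.targets = list(targets)
        self.duration = duration_seconds
        self.healthy = set(self.targets)
        self._cycle = itertools.cycle(self.targets)

    def _next_healthy(self):
        for _ in range(len(self.targets)):
            target = next(self._cycle)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")

    def route(self, cookie=None, now=None):
        """Return (target, cookie); cookie is a hypothetical (target, expiry) pair."""
        now = time.time() if now is None else now
        if cookie is not None:
            target, expires = cookie
            if now < expires and target in self.healthy:
                return target, cookie  # stick to the same backend
        target = self._next_healthy()
        return target, (target, now + self.duration)  # issue a new cookie
```

The user is pinned to one backend no matter how busy it is, which is the source of the unevenness described above.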


Architecture Evolution


This is the least cost effective way to architect systems.


Evolving with Queues

Event Driven Architecture

AWS Lambda

Lambda Architecture

Best practice is to keep each function very small and very specialized. Lambda function code, when executed, is known as being invoked. When invoked, it runs inside a runtime environment that matches the language the script is written in. The runtime environment is allocated a certain amount of memory and an appropriate amount of CPU. The more memory you allocate, the more CPU it gets, and the more the function costs to invoke per second.
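The memory/cost relationship is usually expressed in GB-seconds. The sketch below uses a hypothetical price per GB-second and ignores per-request charges and billing-granularity rounding.

```python
# ASSUMPTION: hypothetical price per GB-second; check current AWS Lambda pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Compute cost of one invocation: memory (GB) x duration (s) x price."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND
```

Doubling memory doubles the per-second cost, but because it also increases CPU, a CPU-bound function may finish sooner; if it halves its duration, the cost per invocation stays the same.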

Lambda functions can be given an IAM role or execution role. The execution role is passed into the runtime environment. Whenever that function executes, the code inside has access to whatever permissions the role's permission policy provides.

Lambda can be invoked in an event-driven or manual way. Assume that each time you invoke a Lambda function, the environment provided is new. Never store anything inside the runtime environment; it is ephemeral.

By default, Lambda functions are public services and can access public websites. By default they cannot access private VPC resources, but they can be configured to do so if needed. Once configured to run inside a VPC, they can only access resources within that VPC: unless the VPC has everything needed for public internet access or access to the AWS public space endpoints, the Lambda will not have that access.

The Lambda runtime is stateless, so you should always use AWS services for input and output. Something like DynamoDB or S3. If a Lambda is invoked by an event, it gets details of the event given to it at startup.

Lambda functions can run up to 15 minutes. That is the max limit.

Key Considerations

CloudWatch Events and EventBridge

Delivers a near real-time stream of system events that describe changes in AWS products and services. EventBridge will replace CW Events and can also handle events from third parties. Both share the same underlying architecture, and AWS is now encouraging a migration to EventBridge.

CloudWatch Events Key Concepts

They let you express: if X happens, or at Y time(s), do Z.

EventBridge is basically CloudWatch Events V2 that uses the same underlying APIs and has the same architecture, but with additional features. Things created in one can be visible in the other for now.

Both systems have a default Event bus for a single AWS account. A bus is a stream of events which occur for any supported service inside an AWS account. In CW Events, there is only one bus (implicit), this is not exposed. EventBridge can have additional event buses for your applications or third party applications and services. These can be interacted with in the same way as the default bus.

In both services, you create rules and these rules pattern match events which occur on the buses and when they see an event which matches, they deliver that event to a target. Alternatively you can have schedule based rules which match a certain date and time or ranges of dates and times.

Rules match incoming events or schedules. The rule matches an event and routes that event to one or more targets as you define on that rule.

Architecturally at the heart of event bridge is the default account event bus. This is a stream of events generated by supported services within the AWS account. Rules are created and these are linked to a specific event bus or the default event bus. Once the rule completes pattern matching, the rule is executed and moves that event that it matched through to one or more targets. The events themselves are JSON structures and the data can be used by the targets.
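A simplified sketch of rule matching and routing. The event shape follows the EC2 state-change events EventBridge delivers; the rule names, targets, and the custom `com.example.orders` source are hypothetical. Real event patterns support more operators (prefix, numeric, exists and so on) than the exact-value lists handled here.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style matching.

    Every field in the pattern must exist in the event. Leaf values in the
    pattern are lists of allowed values; nested dicts are matched recursively.
    """
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


rules = [  # hypothetical rules on one bus
    {
        "pattern": {
            "source": ["aws.ec2"],
            "detail-type": ["EC2 Instance State-change Notification"],
            "detail": {"state": ["stopped", "terminated"]},
        },
        "targets": ["notify-ops"],
    },
    {"pattern": {"source": ["com.example.orders"]}, "targets": ["orders-queue"]},
]


def route(event: dict, rules: list) -> list:
    """Deliver the event to the targets of every rule whose pattern matches."""
    return [t for rule in rules if matches(rule["pattern"], event) for t in rule["targets"]]
```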

Application Programming Interface (API) Gateway

Great during an architecture evolution because the endpoints don't change.

  1. Create a managed API and point at the existing monolithic application.
  2. Using API Gateway allows the business to evolve slowly along the way. This might move some of the data to a Fargate and Aurora architecture.
  3. Move to a full serverless architecture with DynamoDB.


Serverless is not one single thing; you manage few, if any, servers. It aims to remove overhead and risk as much as possible. Applications are a collection of small, specialized functions that do one thing really well and then stop.

These functions are stateless and run in ephemeral environments. Every time they run, they obtain the data that they need, they do something and then optionally, they store the result persistently somehow or deliver the output to something else.

Generally, everything is event driven. Nothing is running until it's required. While not being used, there should be little to no cost.

Should use managed services when possible.

Aim is to consume as a service whatever you can, code as little as possible, and use function as a service for any general purpose compute needs, and then use all of those building blocks together to create your application.

Example of Serverless

A user wants to upload videos to a website for transcoding.

  1. User browses to a static website that is running the uploader. The JS runs directly from the web browser.
  2. A third-party auth provider (Google in this case) authenticates the user and issues a token.
  3. AWS services can't use third-party tokens directly, so Cognito is called to swap the third-party token for temporary AWS credentials.
  4. The uploader uses these temporary credentials to upload a video to an S3 bucket.
  5. The bucket generates an event once the upload has completed.
  6. A Lambda function is triggered to transcode the video as needed. The transcoder gets the location of the original video in the S3 bucket and uses it for its workload.
  7. Output will be added to a new transcode bucket and will put an entry into DynamoDB.
  8. User can interact with another Lambda to pull the media from the transcode bucket using the DynamoDB entry.
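Steps 5 to 7 can be sketched as a Lambda handler that reads the S3 event notification it is given at startup. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event format, and object keys arrive URL-encoded. The output bucket name, `.mp4` output format, and item fields are hypothetical, and a real function would start the transcode and write to DynamoDB (for example with boto3) rather than just returning the plan.

```python
from urllib.parse import unquote_plus

OUTPUT_BUCKET = "example-transcoded-videos"  # hypothetical bucket name


def lambda_handler(event, context):
    """Build one transcode job per uploaded object in an S3 event notification.

    This sketch only derives inputs and outputs; it does not call AWS.
    """
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        base = key.rsplit(".", 1)[0]
        jobs.append(
            {
                "source": f"s3://{bucket}/{key}",
                "destination": f"s3://{OUTPUT_BUCKET}/{base}.mp4",
                "dynamodb_item": {"video_id": key, "status": "TRANSCODING"},
            }
        )
    return {"jobs": jobs}
```

Everything the function needs comes from the event, and everything it produces goes to S3 and DynamoDB, which fits the stateless model described above.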

Simple Notification Service (SNS)


AWS Step Functions

Many of Lambda's limitations (such as the 15 minute maximum run time) can be solved with a state machine. A state machine is a workflow: it has a start point, an end point, and states in between. States are the things inside a state machine which do work: they can take in data, modify data, and output data.

State machine is designed to perform an activity or workflow with lots of individual components and maintain the idea of data between those states.

Maximum duration for a state machine execution is 1 year.

Two types of workflow

Started via API Gateway, IoT Rules, EventBridge, or Lambda. Generally used for back-end processing.

State machines are defined using a JSON-based template language called Amazon States Language (ASL). Once a state machine is configured to your liking, you can export it as ASL and use it as a template to create others.
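A small ASL definition, built as a Python dict and serialized to JSON. `StartAt`, `States`, `Type`, `Resource`, `Next`, and `Seconds` are standard ASL fields; the workflow itself, the account ID, and the Lambda ARN are hypothetical. The helper only follows `Next` links as a sanity check; it is not a full ASL validator and ignores `Choice` branches.

```python
import json

definition = {
    "Comment": "Hypothetical order workflow",
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessOrder",
            "Next": "WaitForDispatch",
        },
        "WaitForDispatch": {"Type": "Wait", "Seconds": 3600, "Next": "Done"},
        "Done": {"Type": "Succeed"},
    },
}


def check_transitions(asl: dict) -> list:
    """Walk Next links from StartAt and return the visited state names."""
    states = asl["States"]
    order, current = [], asl["StartAt"]
    while current is not None:
        if current not in states:
            raise KeyError(f"transition to undefined state {current!r}")
        if current in order:
            raise ValueError(f"Next-only cycle at {current!r} never reaches an end state")
        order.append(current)
        current = states[current].get("Next")
    return order


if __name__ == "__main__":
    print(json.dumps(definition, indent=2))  # the JSON form of the state machine
```

The Wait state pauses for an hour without any compute running, which is one way a state machine works around Lambda's maximum run time.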

State machines are provided permission to interact with other AWS services via IAM roles.

Step Function States

States are the things inside a workflow, the things which occur. These states are available.

Simple Queue Service (SQS)

Public service that provides fully managed highly available message queues.

SQS is billed on requests, not messages. A request is a single call to SQS, and one request can return 0 to 10 messages, up to 64 KB of data in total. Since requests can return 0 messages, frequently polling an SQS queue that is often empty is less cost-effective.
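A rough request count for one consumer polling an empty queue for an hour. `WaitTimeSeconds` is the real `ReceiveMessage` parameter (0 for short polling, up to 20 for long polling); the extra pause between calls is a hypothetical consumer setting, and network latency is ignored.

```python
import math


def empty_queue_requests_per_hour(wait_time_seconds: int, poll_gap_seconds: float = 0.0) -> int:
    """Receive requests made in one hour against an EMPTY queue by one consumer.

    wait_time_seconds: 0 = short polling, 1-20 = long polling.
    poll_gap_seconds: hypothetical pause the consumer adds between calls.
    """
    if not 0 <= wait_time_seconds <= 20:
        raise ValueError("WaitTimeSeconds must be between 0 and 20")
    seconds_per_request = wait_time_seconds + poll_gap_seconds
    if seconds_per_request <= 0:
        raise ValueError("short polling with no gap would be a tight loop")
    return math.ceil(3600 / seconds_per_request)


def requests_to_drain(messages: int, batch_size: int = 10) -> int:
    """Minimum receive requests needed for a backlog, at up to 10 messages per request."""
    if not 1 <= batch_size <= 10:
        raise ValueError("a receive request returns between 1 and 10 messages")
    return math.ceil(messages / batch_size)
```

Short polling once a second against an empty queue costs 3,600 requests an hour, while long polling with a 20 second wait costs 180.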

Two ways to poll

Messages can live on an SQS queue for up to 14 days. SQS supports server-side encryption at rest using KMS, and data is encrypted in transit between SQS and any clients.

Access to a queue is based on identity policies or a queue policy. Only a queue policy can allow access from an outside account, because it is a resource policy.


Kinesis data records (up to 1 MB each) are stored across shards and are the blocks of data for a stream.

Kinesis Data Firehose connects to a Kinesis stream. It can move the data from a stream onto S3 or another service.

SQS vs Kinesis




Architecture Basics

Caching Optimization

Parameters can be passed in the URL as query string parameters, for example ?language=en and ?language=es.

If query string parameters are used for caching, each combination is cached as a separate object, so the two examples above are stored as two different objects. A later request must use the same parameters to retrieve a cached object; if you remove them, the request refers to a different object, and if that object is not cached it must first be fetched from the origin.

If the application doesn't use query string parameters, you can choose not to forward them to the origin (and not to cache based on them).

If the application does use query string parameters, you can use all of them for caching or just selected ones.
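A simplified model of how the query string setting changes the cache key. The three modes mirror the options above (none, all, or selected parameters); this is not CloudFront's actual key format, and parameter order is kept as received rather than normalized.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, mode: str = "none", allowed=()) -> str:
    """Build a simplified cache key from a URL path and its query string.

    mode: "none" ignores query strings, "all" uses every parameter,
    "whitelist" uses only the parameter names in `allowed`.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if mode == "none":
        kept = []
    elif mode == "all":
        kept = params
    elif mode == "whitelist":
        kept = [(k, v) for k, v in params if k in allowed]
    else:
        raise ValueError(f"unknown mode {mode!r}")
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")
```

With "none", ?language=en and ?language=es share one cached object; with "all" or a whitelist containing `language`, they are cached separately.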

AWS Certificate Manager (ACM)

Origin Access Identity (OAI)

  1. An OAI can be associated with a CloudFront distribution.
  2. The edge locations gain this identity.
  3. Create or adjust the bucket policy on the S3 origin. Add an explicit allow for the OAI and remove any other explicit allows from the bucket policy. This leaves the implicit deny for everything else.

As long as requests come from the edge locations, S3 knows they are from the OAI and allows them. Any direct access attempts don't use the OAI and only get the implicit deny.

Best practice is to create one OAI per CloudFront distribution to manage permissions.
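A sketch of the bucket policy from step 3, with the OAI as the only explicitly allowed principal. The OAI ID and bucket name are hypothetical:

```python
import json

# Hypothetical OAI ID and bucket name, for illustration only.
OAI_ID = "E2EXAMPLE1234"
BUCKET = "example-origin-bucket"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontOAIRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {OAI_ID}"
            },
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

# No other allow statements: anything not coming via the OAI hits the implicit deny.
print(json.dumps(bucket_policy, indent=2))
```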

AWS Global Accelerator


VPC Flow Logs

Egress-Only Internet Gateway

VPC Endpoints (Gateway)

Gateway endpoints allow a private-only resource inside a VPC, or any resource inside a private-only VPC, to access S3 and DynamoDB.

Normally, to access a public service from a VPC you need infrastructure: you create an IGW and attach it to the VPC. Resources inside either need public IP addresses, or you implement one or more NAT gateways, which allow instances with private IP addresses to access these public services.

When you associate a gateway endpoint with a subnet, a route with a prefix list as its destination is added to the route table, and the target is the gateway endpoint. Any traffic destined for S3 goes via the gateway endpoint. The gateway endpoint is highly available across all AZs in a region by default.
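A toy model of the route table lookup. It assumes the prefix list simply expands into a set of CIDRs and that the most specific matching route wins; the S3 CIDRs below are documentation ranges standing in for the real AWS-managed prefix list:

```python
import ipaddress

# Hypothetical CIDRs: a real S3 prefix list (pl-...) holds AWS-managed ranges.
S3_PREFIX_LIST = ["198.51.100.0/24", "203.0.113.0/24"]

ROUTE_TABLE = [
    ("10.0.0.0/16", "local"),
    *[(cidr, "vpce-s3-gateway") for cidr in S3_PREFIX_LIST],
    ("0.0.0.0/0", "nat-gateway"),
]


def route_target(dest_ip: str) -> str:
    """Pick the most specific (longest prefix) route that matches dest_ip."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTE_TABLE
        if ip in ipaddress.ip_network(cidr)
    ]
    _, target = max(matches, key=lambda m: m[0].prefixlen)
    return target


print(route_target("198.51.100.25"))  # vpce-s3-gateway
print(route_target("10.0.3.7"))       # local
print(route_target("192.0.2.10"))     # nat-gateway
```

The application doesn't change. Only the route chosen for S3-bound traffic does.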

With a gateway endpoint, you choose which subnets will use it and the routes are configured automatically. A gateway endpoint is a VPC gateway object. An endpoint policy controls what the endpoint can be used to connect to.

Gateway endpoints can only be used to access services in the same region. They can't access cross-region services.

S3 buckets can be made private-only by allowing access ONLY from a gateway endpoint. The implicit deny applies to everything else.

They are only accessible from inside that specific VPC.
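A sketch of a bucket policy that allows access only through one gateway endpoint, using the `aws:SourceVpce` condition key. The bucket name and endpoint ID are hypothetical:

```python
import json

# Hypothetical bucket name and gateway endpoint ID.
BUCKET = "example-private-bucket"
VPCE_ID = "vpce-0123456789abcdef0"

private_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowOnlyFromGatewayEndpoint",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            # Requests that didn't arrive via this endpoint don't match this allow.
            "Condition": {"StringEquals": {"aws:SourceVpce": VPCE_ID}},
        }
    ],
}
print(json.dumps(private_bucket_policy, indent=2))
```

This relies on the implicit deny. Identities in the bucket's own account whose identity policies grant S3 access could still reach the bucket another way, so an explicit Deny with `StringNotEquals` on `aws:SourceVpce` is the stricter variant.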

VPC Endpoints (Interface)

Gateway Endpoints vs Interface Endpoints

Gateway endpoints work using prefix lists and route tables, so they do not need changes to the applications. The application thinks it's communicating directly with S3 or DynamoDB; all a gateway endpoint does is influence the route the traffic takes. Instead of going via the IGW, traffic goes via the gateway endpoint and can use private IP addressing. Gateway endpoints are highly available.

Interface endpoints use DNS and a private IP address for the interface endpoint. You can either use the endpoint-specific DNS names or enable PrivateDNS, which overrides the default DNS and allows unmodified applications to access the services using the interface endpoint. This uses only DNS, not routing. Interface endpoints are not highly available by default, because each endpoint interface lives in a single subnet, and so a single AZ.

VPC Peering

Direct encrypted network link between two and only two VPCs. Peering connection can be in the same or cross region and in the same or across accounts.

When you create a VPC peer, you can enable an option so that public hostnames of services in the peered VPC resolve to their private internal IPs. This means you can use the same DNS names whether or not the services are in peered VPCs. If you resolve the public DNS hostname of an EC2 instance in the peered VPC, it resolves to the private IP address of that instance.

Peered VPCs in the same region can reference each other's security groups by security group ID. You can do the same efficient referencing and nesting of security groups that you can do inside a single VPC. This feature only works with VPC peers in the same region.

In different regions, you can utilize security groups, but you'll need to reference IP addresses or IP ranges. If VPC peers are in the same region, then you can do the logical referencing of an entire security group.

VPC peering connects ONLY TWO VPCs.

VPC Peering does not support transitive peering. If you want to connect 3 VPCs, you need 3 connections. You can't route through interconnected VPCs.

VPC Peering Connections CANNOT be created with overlapping VPC CIDRs.
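The two peering rules above can be checked with a little arithmetic. Because peering isn't transitive, a full mesh of n VPCs needs n(n-1)/2 connections, and any pair with overlapping CIDRs can't be peered. A sketch with hypothetical VPC CIDRs:

```python
import ipaddress
from itertools import combinations


def full_mesh_connections(vpc_count: int) -> int:
    """Peering is not transitive, so every pair of VPCs needs its own connection."""
    return vpc_count * (vpc_count - 1) // 2


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Peering connections can't be created between VPCs with overlapping CIDRs."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical VPCs: C overlaps A.
vpcs = {"A": "10.0.0.0/16", "B": "10.1.0.0/16", "C": "10.0.128.0/17"}
print(full_mesh_connections(3))  # 3
for (name_a, cidr_a), (name_b, cidr_b) in combinations(vpcs.items(), 2):
    print(name_a, name_b, can_peer(cidr_a, cidr_b))
# A B True
# A C False
# B C True
```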


AWS Site-to-Site VPN

AWS Direct Connect (DX)

Has one physical cable with no high availability and no encryption. DX port provisioning is likely quick, but the cross-connect takes longer. It can take weeks or months for the physical cable to be installed. Generally, use a VPN first, then bring in DX and leave the VPN as a backup.

DX provides NO ENCRYPTION, so encryption needs to be managed on a per-application basis. There is a common way around this limitation. The public VIF allows connections to AWS public services. Inside the VPC we already have a virtual private gateway, because it is used for any private VIFs running over the Direct Connect. Creating a virtual private gateway creates endpoints located inside the AWS public zone with public IP addresses. These endpoints already exist, so we can create a VPN that uses the public VIF running over Direct Connect as the transit network instead of the public internet.

By running an IPSEC VPN over the public VIF on the Direct Connect connection, you get all the benefits of Direct Connect, such as high speeds, plus all the benefits of IPSEC encryption.

AWS Transit Gateway (TGW)

Storage Gateway

Snowball / Edge / Snowmobile

Designed to move large amounts of data IN and OUT of AWS. Physical storage the size of a suitcase or truck. Ordered from AWS, use, then return.


Snowball Edge


Snowmobile

Portable data center within a shipping container on a truck. This is a special order and is not available in high volume. Ideal for a single location where 10 PB+ is required. Max is 100 PB per Snowmobile.

AWS Directory Service

Directories store objects such as users, groups, computers, servers, and file shares in a structure called a domain / tree. Multiple trees can be grouped into a forest.

Devices can join a directory so laptops, desktops, and servers can all have a centralized management and authentication. You can sign into multiple devices with the same username and password.

One common directory is Active Directory by Microsoft and its full name is Microsoft Active Directory Domain Services or AD DS.

Directory Modes

AWS DataSync

AWS DataSync Components

FSx for Windows File Server

Words to look for


AWS Secrets Manager

Secrets Manager Example

  1. The Secrets Manager SDK retrieves database credentials.
  2. SDK uses IAM credentials to retrieve the secrets.
  3. Application uses the secrets to access the database.
  4. Periodically, a Lambda function is invoked to rotate the secrets.
  5. The Lambda uses an execution role to get permissions.

Secrets are secured using KMS so you never risk any leakage via physical access to the AWS hardware and KMS ensures role separation.

AWS Shield and WAF (Web Application Firewall)

Provides protection against DDoS (distributed denial of service) attacks on AWS resources. It's normally not possible to block them by individual IP addresses, because without detailed analysis the traffic looks like normal requests to your website.

Example of Architecture

Shield Standard automatically inspects traffic before it gets past Route 53. The user is then directed to the closest CloudFront location, where Shield Standard inspects the traffic again before it moves on.

WAF rules are defined and included in a web ACL (WEBACL), which is associated with a CloudFront distribution and deployed to the edge.

Shield Advanced can then intercept traffic when it reaches the load balancer. By the time traffic reaches the VPC, it has already been filtered at Layers 3, 4, and 7.

Layer 7 filtering is only provided by WAF.


KMS is the key management service within AWS. It is used for encryption within AWS and integrates with other AWS products. It can generate and manage keys and integrate with other services for encryption. The problem is that KMS is a shared service: other accounts within AWS also use it. Although the permissions are strict, AWS still manages the hardware for KMS. KMS is built on hardware security modules (HSMs), industry-standard pieces of hardware designed to manage keys and perform cryptographic operations.

You can run your own HSM on premises. CloudHSM is a true "single tenant" hardware security module (HSM) hosted within the AWS cloud. AWS provisions the hardware, but it is impossible for AWS to help you with it: there is no way to recover data from it if access is lost.

CloudHSM is fully FIPS 140-2 Level 3 (KMS is Level 2 overall, with some Level 3). If you require Level 3 overall, you MUST use CloudHSM.

With KMS, all actions are performed with the AWS CLI and IAM roles.

CloudHSM does not integrate with AWS by design and uses industry-standard APIs.

KMS can use CloudHSM as a custom key store, CloudHSM integrates with KMS.

A single CloudHSM is not highly available and runs within one AZ. To be HA, you need at least two HSMs, one in each AZ you use. Once HSMs are in a cluster, they replicate all policies automatically and stay in sync.

CloudHSM needs an endpoint in a subnet of the VPC to allow resources to access the cluster.

AWS has no access to the HSM appliances which store the keys.

Cloud HSM Use Cases


DynamoDB Architecture

NoSQL Database as a Service (DBaaS)

Dynamo DB Tables

In DynamoDB, capacity means speed. If you choose the on-demand capacity model, you don't have to manage capacity; you only pay for the operations on the table. If you choose provisioned capacity, you must set it on a per-table basis.

Capacity is set in WCUs and RCUs.

1 WCU means you can write 1KB per second to that table.

1 RCU means you can perform one strongly consistent read of up to 4KB per second from that table.

Dynamo DB Backups

On-demand Backups: Similar to manual RDS snapshots. A full backup of the table that is retained until you manually remove it. It can be used to restore data in the same region or cross-region. When restoring, you can adjust indexes or encryption settings.

Point-in-time Recovery: Must be enabled on each table and is off by default. It keeps a continuous record of changes for 35 days, allowing you to restore to any point in that window at 1-second granularity.

Dynamo DB Considerations

Access to Dynamo is from the console, CLI, or API. You don't have SQL access.

Billing is based on RCU, WCU, storage, and the features enabled on the table.

Can purchase reserved capacity with a cheaper rate for a longer term commit.

DynamoDB Operations, Consistency, and Performance

DynamoDB Reading and Writing

On-Demand: For unknown or unpredictable load on a table. Also good when you want as little admin overhead as possible. You pay a price per million read or write units, which can be as much as 5 times the price of provisioned capacity.

Provisioned: RCU and WCU set on a per table basis.

Every operation consumes at least 1 RCU/WCU

1 RCU = 1 x 4KB strongly consistent read operation per second. Read sizes round up to the next 4KB.

1 WCU = 1 x 1KB write operation per second. Write sizes round up to the next 1KB.

Every single table has a WCU and RCU burst pool. This is 300 seconds of RCU or WCU as set by the table.
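A simplified model of the burst pool. It assumes unused capacity is banked up to 300 seconds' worth and spent when demand goes above the provisioned rate. This isn't DynamoDB's exact algorithm, and AWS doesn't guarantee burst capacity will be available:

```python
def simulate_burst(provisioned: int, demand_per_second: list, window_seconds: int = 300) -> list:
    """Return the capacity units served each second under a simple burst-pool model.

    Assumption: unused capacity accumulates up to window_seconds * provisioned and
    is spent when demand exceeds the provisioned rate (not AWS's exact algorithm).
    """
    pool_max = provisioned * window_seconds
    pool = 0
    served = []
    for demand in demand_per_second:
        if demand <= provisioned:
            pool = min(pool_max, pool + provisioned - demand)  # bank unused capacity
            served.append(demand)
        else:
            extra = min(demand - provisioned, pool)  # spend banked capacity
            pool -= extra
            served.append(provisioned + extra)
    return served


# 10 WCU table: 3 idle seconds bank 30 units, then a spike of 40 per second.
print(simulate_burst(10, [0, 0, 0, 40, 40]))  # [0, 0, 0, 40, 10]
```

The second second of the spike is throttled back to the provisioned 10 because the pool is empty.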


A Query must start by picking one Partition Key (PK) value.

For example, the PK could be the sensor unit and the Sort Key (SK) the day of the week you want to look at.

Query accepts a single PK value and optionally a SK or range. Capacity consumed is the size of all returned items. Further filtering discards data, but capacity is still consumed.

In this example you can only query for one weather station.

If you query a PK, it returns all matching items with all their attributes. Since every operation rounds up to a whole unit, it is more efficient to pull everything you need in one query than to use many small queries.

You must query for one PK value and are charged for the size of the response to that query operation.

If you filter data and only look at one attribute, you will still be charged for pulling all the attributes against that query.


Scan is the least efficient way to pull data from DynamoDB, but the most flexible.

Scan moves through the table item by item, consuming capacity for every item. Even if a filter returns less than the whole table, you are charged for everything scanned. It adds up the size of all items scanned and rounds up.
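A sketch comparing what a Query and a Scan consume on the weather example. It assumes a hypothetical table of 3 stations with 24 readings of 2.5KB each and treats each operation as a single page (real results are paginated at 1MB, and each page is rounded separately):

```python
import math


def read_rcu(item_sizes_kb, strongly_consistent=True):
    """RCU for one Query/Scan page: total size of items read, rounded up to the next 4KB.

    Filters are applied after the read, so filtered-out items still count.
    Eventually consistent reads cost half as much.
    """
    units = max(1, math.ceil(sum(item_sizes_kb) / 4))
    return units if strongly_consistent else units / 2


# Hypothetical weather table: 3 stations (PKs) x 24 hourly readings, 2.5KB each.
table = {f"station-{s}": [2.5] * 24 for s in range(3)}

# Query reads only one PK; a filter keeping 1 item wouldn't reduce this.
query_cost = read_rcu(table["station-0"])
# Scan reads every item in the table.
scan_cost = read_rcu([size for items in table.values() for size in items])
print(query_cost, scan_cost)  # 15 45
```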

DynamoDB Consistency Model

Eventually Consistent: easier to implement and scales better.

Strongly (Immediately) Consistent: more costly to achieve.

Every piece of data is replicated between storage nodes. There is one Leader storage node and every other node follows.

Writes are always directed to the leader node. Once the write on the leader completes, the data is consistent on the leader. It then starts the process of replication. This typically takes milliseconds and assumes there are no faults on the storage nodes.

Eventually consistent reads can return stale data if a node is read before replication completes. You get a discount for this risk: an eventually consistent read costs half as much as a strongly consistent one.

A strongly consistent read always uses the leader node and is less scalable.

Not every application can tolerate eventual consistency. If you have a stock database or medical information, you must use strongly consistent reads. If you can tolerate eventual consistency, you get cost savings and better scaling.

WCU Example Calculation
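A worked example, assuming a hypothetical load of 10 items written per second at 2.5KB each. Each write rounds up to the next 1KB, so each item costs 3 WCU:

```python
import math


def wcu_required(items_per_second: int, item_size_kb: float) -> int:
    """Each write rounds up to the next 1KB; 1 WCU = one 1KB write per second."""
    return items_per_second * max(1, math.ceil(item_size_kb))


print(wcu_required(10, 2.5))  # 30 (2.5KB rounds up to 3KB -> 3 WCU x 10 items)
```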

RCU Example Calculation
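A worked example, assuming a hypothetical load of 10 items read per second at 2.5KB each. Each read rounds up to the next 4KB, so each item costs 1 RCU strongly consistent, or half that eventually consistent:

```python
import math


def rcu_required(items_per_second: int, item_size_kb: float, strongly_consistent: bool = True) -> float:
    """Each read rounds up to the next 4KB; 1 RCU = one 4KB strongly consistent read per second.

    An eventually consistent read costs half as much.
    """
    units = items_per_second * max(1, math.ceil(item_size_kb / 4))
    return units if strongly_consistent else units / 2


print(rcu_required(10, 2.5))         # 10
print(rcu_required(10, 2.5, False))  # 5.0
```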

DynamoDB Streams and Triggers

DynamoDB stream is a time ordered list of changes to items in a DynamoDB table. A stream is a 24 hour rolling window of the changes. It uses Kinesis streams on the backend.

This is enabled on a per-table basis and records inserts, updates, and deletes.

Different view types influence what is in the stream.

There are four view types that it can be configured with: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, and NEW_AND_OLD_IMAGES.

The pre- or post-change state may be empty for inserts or deletes.

Trigger Concepts

Allow for actions to take place in the event of a change in data

An item change generates an event that contains the data which was changed. The specifics depend on the view type. The action is taken using that data. Triggers combine the capabilities of Streams and Lambda: Lambda performs some compute based on the trigger.

This is great for reporting and analytics when data changes, such as stock levels, and for data aggregation in stock or voting apps. It can also provide messages or notifications and eliminates the need to poll databases.
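A sketch of a Lambda trigger for the stock-level case. It assumes the stream uses the NEW_IMAGE or NEW_AND_OLD_IMAGES view type, and a hypothetical table with a `sku` key and a numeric `stock` attribute. Stream records carry `eventName` (INSERT, MODIFY, or REMOVE) and attribute values in DynamoDB JSON, such as `{"N": "3"}`:

```python
LOW_STOCK = 5  # hypothetical alert threshold


def lambda_handler(event, context):
    """Flag items whose stock dropped below LOW_STOCK in a batch of stream records."""
    alerts = []
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # REMOVE records have no NewImage
        new_image = record["dynamodb"]["NewImage"]
        stock = int(new_image["stock"]["N"])  # numbers arrive as strings in DynamoDB JSON
        if stock < LOW_STOCK:
            alerts.append(record["dynamodb"]["Keys"]["sku"]["S"])
    return {"low_stock": alerts}


# Hypothetical event with one update and one delete.
sample_event = {
    "Records": [
        {"eventName": "MODIFY",
         "dynamodb": {"Keys": {"sku": {"S": "A-1"}},
                      "NewImage": {"sku": {"S": "A-1"}, "stock": {"N": "3"}},
                      "OldImage": {"sku": {"S": "A-1"}, "stock": {"N": "9"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"sku": {"S": "B-2"}},
                      "OldImage": {"sku": {"S": "B-2"}, "stock": {"N": "0"}}}},
    ]
}
print(lambda_handler(sample_event, None))  # {'low_stock': ['A-1']}
```

Nothing polls the table: the function only runs when the stream delivers changes.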

DynamoDB Local (LSI) and Global (GSI) Secondary Indexes

Local Secondary Indexes (LSI)

Global Secondary Index (GSI)

LSI and GSI Considerations

Use GSIs by default and only use LSIs when strong consistency is required.

Indexes are designed for when data in a base table needs an alternative access pattern. This is great for a security team or data science team that needs to look at attributes other than those the table was originally designed around.

DynamoDB Global Tables

DynamoDB Accelerator (DAX)

This is an in memory cache for Dynamo.

Traditional Cache: The application needs to access some data and checks the cache. If the cache doesn't have the data, this is known as a cache miss. The application then loads the data directly from the database and updates the cache with it. Subsequent queries load the data from the cache as a cache hit, which is faster.
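A minimal sketch of the traditional (cache-aside) pattern, using plain dicts as hypothetical stand-ins for the cache and the database. The point to notice is that the application owns two call paths: one to the cache and one to the database.

```python
class CacheAsideReader:
    """Traditional cache: the application checks the cache, and on a miss reads
    the database itself and then updates the cache."""

    def __init__(self, load_from_db):
        self.load_from_db = load_from_db  # the application's own database call
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.load_from_db(key)  # separate call path to the database
        self.cache[key] = value         # the application fills the cache itself
        return value


db = {"user#1": {"name": "Ana"}}  # hypothetical table contents
reader = CacheAsideReader(db.__getitem__)
reader.read("user#1")  # miss: loaded from the database
reader.read("user#1")  # hit: served from the cache
print(reader.hits, reader.misses)  # 1 1
```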

DAX: The application instance has the DAX SDK added. From the application's point of view, DAX and DynamoDB are one and the same. The application uses the DAX SDK and makes a single call for the data, which is returned by DAX. If DAX has the data, it is returned directly. If not, DAX talks to DynamoDB, gets the data, and caches it for future use. The benefit of this system is that there is only one set of API calls using one SDK. It is tightly integrated and has much less admin overhead.

DAX Architecture

This runs from within a VPC and is designed to be deployed to multiple AZs in that VPC. Must be deployed across AZs to ensure it is highly available.

DAX is a cluster service where nodes are placed into different AZs. There is a primary node, which is the read and write node. It replicates out to the other nodes, which are replica nodes and function as read replicas. In this architecture, an EC2 instance runs an application and the DAX SDK, which communicates with the cluster. On the other side, the cluster communicates with DynamoDB.

DAX maintains two different caches. The first is the item cache, which caches individual items retrieved via the GetItem or BatchGetItem operations. These operate on single items and must specify the item's partition key (and sort key, if the table has one).

There is a query cache which holds data and the parameters used for the original query or scan. Whole query or scan operations can be rerun and return the same cached data.

Every DAX cluster has an endpoint which will load balance across the cluster. If data is retrieved from DAX directly, then it's called a cache hit and the results can be returned in microseconds.

Cache misses, when DAX has to consult DynamoDB, are generally returned in single-digit milliseconds. When writing data to DynamoDB, DAX can use write-through caching, so data is written into DAX at the same time as it is written into the database.

If a cache miss occurs while reading, the data is retrieved from DynamoDB and also written to the primary node of the cluster. It is then replicated from the primary node to the replica nodes.

When writing data to DAX, it can use write-through. Data is written to the database, then written to DAX.
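A toy model of write-through, with dicts as hypothetical stand-ins for the DynamoDB table and the DAX cache. In real DAX the SDK hides all of this behind the DynamoDB-style API, so this only illustrates the ordering of the writes:

```python
class WriteThroughCache:
    """Write-through: every write goes to the database and then to the cache."""

    def __init__(self, database: dict):
        self.database = database  # stands in for the DynamoDB table
        self.cache = {}           # stands in for the DAX item cache

    def put(self, key, value):
        self.database[key] = value  # written to the database first
        self.cache[key] = value     # then written to the cache

    def get(self, key):
        if key in self.cache:
            return self.cache[key], "hit"
        value = self.database[key]  # miss: read from the database
        self.cache[key] = value     # and keep it for next time
        return value, "miss"


store = WriteThroughCache({})
store.put("item#1", {"qty": 4})
print(store.get("item#1"))  # ({'qty': 4}, 'hit')
```

Because the write also lands in the cache, the first read after a write is already a hit.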

DAX Considerations

Amazon Athena

Athena Explained

The source data is stored on S3 and Athena reads from it. In Athena you define how to read the original data and how it should appear for what you want to see.

Tables are defined in advance in a data catalog and data is projected through when read. It allows SQL-like queries on data without transforming the data itself.

This can be saved in the console or fed to other visualization tools.
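A small illustration of schema-on-read: the raw data is never transformed, and a separately stored table definition is applied only when rows are read. The CSV content and schema are hypothetical, and this is plain Python, not Athena's engine:

```python
import csv
import io

# Raw data as it might sit in S3 (hypothetical CSV); it is never modified.
RAW = "2024-01-01,station-1,21.5\n2024-01-01,station-2,19.0\n"

# Table definition kept separately, like a data catalog entry: column names and types.
TABLE_SCHEMA = [("day", str), ("station", str), ("temp_c", float)]


def read_table(raw_text: str, schema):
    """Project raw rows through the schema only at read time (schema-on-read)."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


# Roughly equivalent to: SELECT station FROM weather WHERE temp_c > 20
hot_stations = [r["station"] for r in read_table(RAW, TABLE_SCHEMA) if r["temp_c"] > 20]
print(hot_stations)  # ['station-1']
```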

You can optimize the original data set to reduce the amount of space used for the data and reduce the cost of querying it.