At Spotinst, we drink our own champagne before shipping to customers. As a cost-focused company, we work hard to run every service we possibly can on Spot Instances with Elastigroup.

This week I had a chance to chat with my colleague Alex Friedman, who leads our DevOps, and I asked him how we run our large Elastic (ElasticSearch) cluster on Spot Instances. We use Elastic's built-in features for high availability, store the data on EBS disks to preserve Elastic data and state, and use Elastigroup for smooth EC2 Spot migration and management. This helps us cut costs dramatically, scale faster, and even improve availability. (I don't know how many of you have ever lost an Elastic node and managed to recover within minutes with no performance impact or data loss — Alex did!)

I asked Alex to put together a guide on how we use it internally, and I'm happy to share it with you.
Let's get into the details. Here are the high-level steps we will cover:

- Deploy an Elastigroup of master nodes for ElasticSearch (and start cerebro on one of the masters for monitoring).
- Deploy an Elastigroup of data nodes for ElasticSearch (using the master nodes' IPs in the configuration).

Also check out our guide on running Elasticsearch on Kubernetes.
Elastigroup for the ElasticSearch Master Nodes
Create a new Stateful Elastigroup for the master nodes.
Important configurations:
- AZs/Subnets: when selecting multiple AZs, be aware of cross-AZ traffic costs, especially with ES replication enabled.
- Instance types: memory-oriented types such as R3/R4 are preferable due to ES's high memory consumption.
- AMI: in this example I used Amazon Linux 2 LTS Candidate 2.
- Elastigroup stateful settings: look up an available private IP in your VPC/subnet and set it under the "specify private IP" option (since we want to keep static addresses for the master nodes).
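If you manage the group as JSON via the Spotinst API rather than the console, the stateful options above map to the group's persistence block. The field names below are from memory of the Spot API and should be verified against a console-exported group configuration:

```json
"strategy": {
  "persistence": {
    "shouldPersistPrivateIp": true,
    "shouldPersistBlockDevices": true,
    "blockDevicesMode": "reattach"
  }
}
```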
Master user-data:

- Installs ES and Java from internet repositories.
- The variable `device="/dev/xvdc"` should match the EBS `deviceName` configured below.
```bash
#!/usr/bin/env bash

device="/dev/xvdc"
data_path="/var/lib/elasticsearch"
sleep_sec=10
elastic_pckg="elasticsearch-6.2.4-1.noarch"
logical_volume="lv_elastic01"
volume_group="vg_elastic01"
device_lv="/dev/${volume_group}/${logical_volume}"

function install_java {
    echo "Install jdk"
    rpm_file_src="http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.rpm"
    rpm_file_dest="/tmp/jdk-8u131-linux-x64.rpm"
    wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" -O "$rpm_file_dest" "$rpm_file_src"
    rpm -Uvh "$rpm_file_dest"
}

function install_es {
    echo "Install elasticsearch"
    cat > /etc/yum.repos.d/elasticsearch.repo << 'EOF'
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
    yum install $elastic_pckg -y
    echo "Update elasticsearch config"
    cat > /etc/elasticsearch/elasticsearch.yml << EOF
# ======================== Elasticsearch Configuration =========================
# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
cluster.name: es-test
# ------------------------------------ Node ------------------------------------
node.name: ${HOSTNAME}
node.master: true
node.data: false
node.ingest: false
search.remote.connect: false
# ----------------------------------- Paths ------------------------------------
# Path to directory where to store the data (separate multiple locations by comma):
path.data: /var/lib/elasticsearch
# Path to log files:
path.logs: /var/log/elasticsearch
# ---------------------------------- Network -----------------------------------
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 0.0.0.0
# --------------------------------- Discovery ----------------------------------
# Pass an initial list of hosts to perform discovery when a new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.zen.ping.unicast.hosts: ["10.14.11.70", "10.14.11.71", "10.14.11.72"]
# Prevent "split brain" by configuring the majority of nodes
# (total number of master-eligible nodes / 2 + 1):
discovery.zen.minimum_master_nodes: 2
EOF
    # Update jvm Xmx/Xms according to inst. type
    # grep Xm /etc/elasticsearch/jvm.options
}

function install_cerebro {
    echo "Install Docker"
    kernel_installed=$(uname -r)
    if [[ "$kernel_installed" =~ .*amzn.* ]]; then
        yum install docker-17.06.2ce-1.102.amzn2.x86_64 -y
    else
        echo "Amazon Linux is required, not installing docker/cerebro"
    fi
}

function handle_mounts {
    echo "Wait until we have the EBS attached (new or reattached)"
    ls -l "$device" > /dev/null
    while [ $? -ne 0 ]; do
        echo "Device $device is still NOT available, sleeping..."
        sleep $sleep_sec
        ls -l "$device" > /dev/null
    done
    echo "Device $device is available"
    echo "Check if the instance is new or recycled"
    lsblk "$device" --output FSTYPE | grep LVM > /dev/null
    if [ $? -ne 0 ]; then
        echo "Device $device is new, creating LVM & formatting"
        pvcreate "$device"
        pvdisplay
        vgcreate vg_elastic01 "$device"
        vgdisplay
        lvcreate -l 100%FREE -n "$logical_volume" "$volume_group"
        lvdisplay
        mkfs -t ext4 $device_lv
    else
        echo "Device $device was reattached"
    fi
    echo "Add an entry to fstab"
    UUID=$(blkid $device_lv -o value | head -1)
    echo "UUID=$UUID $data_path ext4 _netdev 0 0" >> /etc/fstab
    echo "Make sure mount is available"
    mount -a > /dev/null
    while [ $? -ne 0 ]; do
        echo "Error mounting all filesystems from /etc/fstab, sleeping..."
        sleep 2
        mount -a > /dev/null
    done
    chown -R elasticsearch:elasticsearch "$data_path"
    echo "Mounted all filesystems from /etc/fstab, proceeding"
}

function start_apps {
    echo "Start elasticsearch"
    systemctl start elasticsearch.service
}

function main {
    ## Installations can be offloaded to AMI
    install_java
    install_es
    install_cerebro
    handle_mounts
    start_apps
}

main
```
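The `discovery.zen.minimum_master_nodes` value follows the quorum formula in the config comment; with the three master IPs listed, that works out as (a trivial arithmetic check, not part of the original user-data):

```shell
# Quorum to prevent split-brain: (master-eligible nodes / 2) + 1
masters=3
quorum=$(( masters / 2 + 1 ))
echo "$quorum"   # with 3 masters this prints 2
```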
Add an EBS volume of the desired size at the final stage of the Elastigroup creation:
```json
"blockDeviceMappings": [
  {
    "deviceName": "/dev/xvdc",
    "ebs": {
      "deleteOnTermination": false,
      "volumeSize": 50,
      "volumeType": "gp2"
    }
  }
],
```
Click "Create" and wait for the masters to launch.
Elastigroup for the ElasticSearch Data Nodes
Use the same steps as above, with slight differences in the user-data script due to the master/data node differences. There is no need to maintain specific private IPs.
Data node user-data:
```bash
#!/usr/bin/env bash

device="/dev/xvdc"
data_path="/var/lib/elasticsearch"
sleep_sec=10
elastic_pckg="elasticsearch-6.2.4-1.noarch"
logical_volume="lv_elastic01"
volume_group="vg_elastic01"
device_lv="/dev/${volume_group}/${logical_volume}"

function install_java {
    echo "Install jdk"
    rpm_file_src="http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.rpm"
    rpm_file_dest="/tmp/jdk-8u131-linux-x64.rpm"
    wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" -O "$rpm_file_dest" "$rpm_file_src"
    rpm -Uvh "$rpm_file_dest"
}

function install_es {
    echo "Install elasticsearch"
    cat > /etc/yum.repos.d/elasticsearch.repo << 'EOF'
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
    yum install $elastic_pckg -y
    echo "Update elasticsearch config"
    cat > /etc/elasticsearch/elasticsearch.yml << EOF
# ======================== Elasticsearch Configuration =========================
# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
cluster.name: es-test
# ------------------------------------ Node ------------------------------------
node.name: ${HOSTNAME}
node.master: false
# ----------------------------------- Paths ------------------------------------
# Path to directory where to store the data (separate multiple locations by comma):
path.data: /var/lib/elasticsearch
# Path to log files:
path.logs: /var/log/elasticsearch
# ---------------------------------- Network -----------------------------------
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 0.0.0.0
# --------------------------------- Discovery ----------------------------------
# Pass an initial list of hosts to perform discovery when a new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.zen.ping.unicast.hosts: ["10.14.11.70", "10.14.11.71", "10.14.11.72"]
# Prevent "split brain" by configuring the majority of nodes
# (total number of master-eligible nodes / 2 + 1):
discovery.zen.minimum_master_nodes: 2
EOF
    # Update jvm Xmx/Xms according to inst. type
    # grep Xm /etc/elasticsearch/jvm.options
}

function handle_mounts {
    echo "Wait until we have the EBS attached (new or reattached)"
    ls -l "$device" > /dev/null
    while [ $? -ne 0 ]; do
        echo "Device $device is still NOT available, sleeping..."
        sleep $sleep_sec
        ls -l "$device" > /dev/null
    done
    echo "Device $device is available"
    echo "Check if the instance is new or recycled"
    lsblk "$device" --output FSTYPE | grep LVM > /dev/null
    if [ $? -ne 0 ]; then
        echo "Device $device is new, creating LVM & formatting"
        pvcreate "$device"
        pvdisplay
        vgcreate vg_elastic01 "$device"
        vgdisplay
        lvcreate -l 100%FREE -n "$logical_volume" "$volume_group"
        lvdisplay
        mkfs -t ext4 $device_lv
    else
        echo "Device $device was reattached"
    fi
    echo "Add an entry to fstab"
    UUID=$(blkid $device_lv -o value | head -1)
    echo "UUID=$UUID $data_path ext4 _netdev 0 0" >> /etc/fstab
    echo "Make sure mount is available"
    mount -a > /dev/null
    while [ $? -ne 0 ]; do
        echo "Error mounting all filesystems from /etc/fstab, sleeping..."
        sleep 2
        mount -a > /dev/null
    done
    chown -R elasticsearch:elasticsearch "$data_path"
    echo "Mounted all filesystems from /etc/fstab, proceeding"
}

function start_apps {
    echo "Start elasticsearch"
    systemctl start elasticsearch.service
}

function main {
    ## Installations can be offloaded to AMI
    install_java
    install_es
    handle_mounts
    start_apps
}

main
```
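Both scripts leave JVM sizing as a manual step ("update Xmx/Xms according to instance type"). A hypothetical helper sketching the common rule of half the RAM, capped below 32 GB — `heap_for_kb` is our illustration, not part of the original user-data:

```shell
# Hypothetical helper: pick an ES heap of half the RAM, capped at 31 GB
# (staying under 32 GB keeps compressed object pointers enabled)
heap_for_kb() {
    local heap_mb=$(( $1 / 1024 / 2 ))
    if [ "$heap_mb" -gt 31744 ]; then heap_mb=31744; fi
    echo "-Xms${heap_mb}m -Xmx${heap_mb}m"
}

# e.g. an instance with 30 GiB of RAM:
heap_for_kb $(( 30 * 1024 * 1024 ))   # -> -Xms15360m -Xmx15360m
```

On a live node you would feed it `MemTotal` from /proc/meminfo and write the result into /etc/elasticsearch/jvm.options.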
Post-installation
Cerebro UI
Run Cerebro on one of the master nodes:
```shell
$ systemctl start docker.service
$ docker run -d -p 9000:9000 --name cerebro yannart/cerebro:latest
```
Grant access to Cerebro on port 9000.
Enable delayed shard allocation
A key part of the story is enabling delayed shard allocation, to better support shard reallocation when Spot instances are replaced.
```shell
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "7m"
  }
}
'
```
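The request body is a small JSON document that is easy to truncate when copy-pasting, and an unbalanced brace only fails once it reaches the cluster. A convenience check of ours (assuming python3 is on the box) that validates the payload locally before sending:

```shell
# Validate the settings payload before PUT-ing it to the cluster
body='{"settings": {"index.unassigned.node_left.delayed_timeout": "7m"}}'
echo "$body" | python3 -m json.tool > /dev/null && echo "payload ok"
```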