
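This is a Ruby client for statsd (https://github.com/statsd/statsd). It provides a lightweight way to track and measure metrics in your application.

We call statsd by sending data over a UDP socket. UDP sockets are fast but unreliable: there is no guarantee that your data will ever arrive at its destination. In other words, fire and forget. This is a good fit for this use case because it means your code doesn't get bogged down trying to log statistics. We send data to statsd several times per request and haven't noticed a performance hit.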
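
To make the fire-and-forget behavior concrete, here is a minimal sketch that uses only Ruby's standard library. It is not how this library sends metrics internally: the helper name send_datagram is hypothetical, and the counter line assumes the plain StatsD wire format ("<key>:<value>|c").

```ruby
require 'socket'

# Hypothetical helper (not part of statsd-instrument): sends a single
# datagram over UDP and returns immediately without waiting for a reply.
def send_datagram(host, port, datagram)
  socket = UDPSocket.new
  socket.send(datagram, 0, host, port) # returns the number of bytes handed to the OS
ensure
  socket&.close
end

# A counter increment in the plain StatsD wire format: "<key>:<value>|c".
# Nothing needs to be listening on the port for this call to return.
send_datagram('127.0.0.1', 8125, 'admin.logins.api.success:1|c')
```

Because the sender never reads a response, a missing or slow StatsD server does not slow down the calling code; the datagram is simply lost.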

For more information about StatsD, see the README of the StatsD project.

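Configuration

It's recommended to configure this library by setting environment variables. The following environment variables are supported: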

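- STATSD_ADDR: (default: localhost:8125) The address to send the StatsD UDP datagrams to.
- STATSD_IMPLEMENTATION: (default: datadog) The StatsD implementation you are using. statsd and datadog are supported. Some features are only available on certain implementations.
- STATSD_ENV: The environment StatsD will run in. If this is not set explicitly, it is determined from other environment variables, such as RAILS_ENV or ENV. The library behaves differently depending on the environment:
  - In the production and staging environments, the library actually sends UDP packets.
  - In the test environment, it swallows all calls but allows you to capture them for testing purposes. See the section on writing tests below.
  - In development and all other environments, it writes all calls to the log (StatsD.logger, which by default writes to STDOUT).
- STATSD_SAMPLE_RATE: (default: 1.0) The default sample rate to use for all metrics. This can be used to reduce the network traffic and CPU overhead that using this library generates. It can be overridden in a metric method call.
- STATSD_PREFIX: The prefix to apply to all metric names. It can be overridden in a metric method call.
- STATSD_DEFAULT_TAGS: A comma-separated list of tags to apply to all metrics. (Note: tags are not supported by all implementations.)
- STATSD_BUFFER_CAPACITY: (default: 5000) The maximum number of events that may be buffered before emitting threads start to block. Increasing this value may help applications that generate spikes of events. However, if the application emits events faster than they can be sent, increasing it won't help. If set to 0, batching is disabled and each event is sent in its own UDP packet, which is much slower.
- STATSD_FLUSH_INTERVAL: (default: 1) Deprecated. Setting this to 0 is equivalent to setting STATSD_BUFFER_CAPACITY to 0.
- STATSD_MAX_PACKET_SIZE: (default: 1472) The maximum size of UDP packets. If your network is properly configured to handle larger packets, you may try increasing this value for better performance, but most networks can't handle larger packets.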
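
The variables and defaults above can be summarized in a short sketch. This is an illustration only, not the library's own configuration code: the helper name statsd_settings, the mode symbols, and the fallback to "development" when no environment variable is present are all assumptions.

```ruby
# Hypothetical helper (not the library's own code): reads the documented
# environment variables from a Hash-like object and applies the defaults
# listed above.
def statsd_settings(env = ENV)
  # Assumption: fall back to "development" when no environment variable is set.
  environment = env['STATSD_ENV'] || env['RAILS_ENV'] || env['ENV'] || 'development'

  mode =
    case environment
    when 'production', 'staging' then :udp # actually send UDP packets
    when 'test' then :capture              # swallow calls, keep them for assertions
    else :log                              # write calls to the logger
    end

  {
    addr: env.fetch('STATSD_ADDR', 'localhost:8125'),
    implementation: env.fetch('STATSD_IMPLEMENTATION', 'datadog'),
    environment: environment,
    mode: mode,
    sample_rate: Float(env.fetch('STATSD_SAMPLE_RATE', '1.0')),
    prefix: env['STATSD_PREFIX'],
    default_tags: env.fetch('STATSD_DEFAULT_TAGS', '').split(','),
    buffer_capacity: Integer(env.fetch('STATSD_BUFFER_CAPACITY', '5000')),
    max_packet_size: Integer(env.fetch('STATSD_MAX_PACKET_SIZE', '1472')),
  }
end

# Passing a plain Hash instead of ENV keeps the example deterministic.
settings = statsd_settings('STATSD_ENV' => 'production', 'STATSD_DEFAULT_TAGS' => 'env:prod,region:us')
```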

StatsD keys

StatsD keys look like 'admin.logins.api.success'. Dots are used as namespace separators.

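Usage

You can either use the basic methods to submit stats over StatsD, or you can use the metaprogramming methods to instrument your methods with some basic stats (call counts, successes and failures, and timings).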

StatsD.measure

Lets you measure how long the execution of a specific method takes.

# You can pass a key and a ms value
StatsD.measure('GoogleBase.insert', 2.55)

# or more commonly pass a block that calls your code
StatsD.measure('GoogleBase.insert') do
  GoogleBase.insert(product)
end
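
As a rough illustration of what measuring a block involves, the sketch below times a block with a monotonic clock and formats the result as a timing datagram. The helper name measure_ms is hypothetical and this is not the library's implementation; the "<key>:<milliseconds>|ms" line assumes the plain StatsD wire format.

```ruby
# Hypothetical helper (not the library's implementation): runs the block,
# measures its wall-clock duration with a monotonic clock, and returns
# both the block's result and the elapsed time in milliseconds.
def measure_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  [result, elapsed_ms]
end

result, elapsed_ms = measure_ms { sleep(0.01); :inserted }

# A timing in the plain StatsD wire format: "<key>:<milliseconds>|ms"
datagram = format('GoogleBase.insert:%.2f|ms', elapsed_ms)
```

A monotonic clock is used here so that system clock adjustments cannot produce negative or inflated durations.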

StatsD.increment

Lets you increment a key in statsd to keep a count of something. If the specified key doesn't exist, it will be created for you.

# increments default to +1
StatsD.increment('GoogleBase.insert')
# you can also specify how much to increment the key by
StatsD.increment('GoogleBase.insert', 10)
# you can also specify a sample rate, so only 1/10 of events
# actually get to statsd. Useful for very high volume data
StatsD.increment('GoogleBase.insert', sample_rate: 0.1)
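
To show what a sample rate means in practice, here is a minimal sketch of client-side sampling. The helper name sampled_counter and its injectable random source are hypothetical, and this is not the library's implementation; the "|@<rate>" suffix assumes the plain StatsD wire format, where it tells the server how to scale the count back up.

```ruby
# Hypothetical helper (not the library's implementation): decides whether a
# call is sampled and, if so, builds a counter datagram. Returns nil when
# the call is dropped by sampling.
def sampled_counter(key, value = 1, sample_rate: 1.0, random: Random.new)
  return nil if sample_rate < 1.0 && random.rand >= sample_rate

  datagram = "#{key}:#{value}|c"
  datagram += "|@#{sample_rate}" if sample_rate < 1.0
  datagram
end

sampled_counter('GoogleBase.insert')                   # always emitted
sampled_counter('GoogleBase.insert', 10)               # always emitted, increments by 10
sampled_counter('GoogleBase.insert', sample_rate: 0.1) # emitted for roughly 1 in 10 calls
```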

StatsD.gauge

A gauge is a single numerical value that tells you the state of the system at a point in time. A good example is the number of messages in a queue.

StatsD.gauge('GoogleBase.queued', 12, sample_rate: 1.0)

Normally, you shouldn't update this value too often, so there is no need to sample this kind of metric.

StatsD.set

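A set keeps track of the number of unique values that have been seen. This is a good fit for keeping track of the number of unique visitors. The value can be a string.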

# Submit the customer ID to the set. It will only be counted if it hasn't been seen before.
StatsD.set('GoogleBase.customers', "12345", sample_rate: 1.0)

Because you are counting unique values, using a sample rate lower than 1.0 can lead to unexpected, hard-to-interpret results.

StatsD.histogram

Builds a histogram of numeric values.

StatsD.histogram('Order.value', order.value_in_usd.to_f, tags: { source: 'POS' })

As with sets, using a sample rate lower than 1.0 can lead to unexpected, hard-to-interpret results.

Note: This is only supported by the beta datadog implementation.

StatsD.distribution

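A modified gauge that submits a distribution of values over a sample period. Arithmetic and statistical calculations (percentiles, averages, etc.) on the data set are performed on the server side rather than on the client side, as they are for a histogram.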

StatsD.distribution('shipit.redis_connection', 3)

Note: This is only supported by the beta datadog implementation.

StatsD.event

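An event is a (title, text) tuple that can be used to correlate metrics with something that occurred within the system. This is a good fit for, say, correlating a change in response times with a deploy of new code.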

StatsD.event('shipit.deploy', 'started')

Note: This is only supported by the datadog implementation.

Events support additional metadata such as date_happened, hostname, aggregation_key, priority, source_type_name, and alert_type.

StatsD.service_check

A service check is a (check_name, status) tuple that can be used to monitor the status of services your application depends on.

StatsD.service_check('shipit.redis_connection', 'ok')

Note: This is only supported by the datadog implementation.

Service checks support additional metadata such as timestamp, hostname, and message.

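Metaprogramming methods

As mentioned, it's most common to use the provided metaprogramming methods. This lets you define all of your instrumentation in one file and not litter your code with instrumentation details. You enable a class for instrumentation by extending it with the StatsD::Instrument module.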

GoogleBase.extend StatsD::Instrument

Then use the methods provided below to instrument methods in your class.

statsd_measure

This measures how long a method takes to run and submits the result to the given key.

GoogleBase.statsd_measure :insert, 'GoogleBase.insert'
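
For readers curious how a class-level macro like this can wrap an existing method without touching its body, here is a minimal sketch built on Module#prepend. It is not the library's implementation: the TimingMacro module, measure_method, and the in-memory timings array are hypothetical stand-ins, and the GoogleBase class is defined only so the example runs on its own.

```ruby
# Hypothetical sketch (not the library's implementation): a class-level
# macro that wraps an existing instance method with timing code.
module TimingMacro
  # Stand-in for a StatsD backend: collects [key, milliseconds] pairs.
  def timings
    @timings ||= []
  end

  def measure_method(method_name, key)
    recorder = timings
    wrapper = Module.new do
      define_method(method_name) do |*args, &block|
        started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        begin
          super(*args, &block) # call the original method
        ensure
          # Record the timing even when the original method raises.
          elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
          recorder << [key, elapsed * 1000.0]
        end
      end
    end
    prepend(wrapper)
  end
end

# Stand-in class so the example is self-contained.
class GoogleBase
  def insert(product)
    "inserted #{product}"
  end
end

GoogleBase.extend TimingMacro
GoogleBase.measure_method :insert, 'GoogleBase.insert'
GoogleBase.new.insert('shoes') # runs the original method and records one timing
```

Prepending an anonymous module keeps the original method intact and reachable through super, which is one common way to add instrumentation around a method.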

statsd_count

This increments the given key even if the method doesn't finish (i.e. it raises an exception).

GoogleBase.statsd_count :insert, 'GoogleBase.insert'

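Note that the 'GoogleBase.insert' key was used above when measuring this method and is reused here when counting the method calls. StatsD automatically separates these two kinds of stats into namespaces, so there won't be a key collision here.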

statsd_count_if

This increments the given key only if the method executes successfully.

GoogleBase.statsd_count_if :insert, 'GoogleBase.insert'

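So now, if GoogleBase#insert raises an exception or returns false (i.e. result == false), the key is not incremented. If you want to define what success means for a given method, you can pass a block that takes the result of the method.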

GoogleBase.statsd_count_if :insert, 'GoogleBase.insert' do |response|
  # Net::HTTP responses expose the status code as a String, so normalize it first
  response.code.to_i == 200
end

In the above example, the key is incremented in statsd only if the block returns true. Here the method returns a Net::HTTP response, and the block checks its status code.

statsd_count_success

Similar to statsd_count_if, except that it increments one key on success and another key on failure.

GoogleBase.statsd_count_success :insert, 'GoogleBase.insert'

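So if this method fails to execute (raises an exception or returns false), the failure key ('GoogleBase.insert.failure') is incremented; otherwise the success key ('GoogleBase.insert.success') is incremented. Note that the given key is modified before it is sent to statsd.

Again, you can pass a block to define what success means.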

GoogleBase.statsd_count_success :insert, 'GoogleBase.insert' do |response|
  # Net::HTTP responses expose the status code as a String, so normalize it first
  response.code.to_i == 200
end
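
The success and failure rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: the helper count_success, its success_if parameter, and the counts hash standing in for a StatsD backend are all hypothetical.

```ruby
# Hypothetical helper (not the library's implementation): runs the block and
# increments "<key>.success" or "<key>.failure" in the given counts hash.
# An optional predicate replaces the default "result is not false" rule.
def count_success(counts, key, success_if: nil)
  result = yield
  succeeded = success_if ? success_if.call(result) : result != false
  counts["#{key}.#{succeeded ? 'success' : 'failure'}"] += 1
  result
rescue StandardError
  # An exception counts as a failure and is then re-raised to the caller.
  counts["#{key}.failure"] += 1
  raise
end

counts = Hash.new(0)
count_success(counts, 'GoogleBase.insert') { true }
count_success(counts, 'GoogleBase.insert') { false }
count_success(counts, 'GoogleBase.insert', success_if: ->(code) { code.to_i == 200 }) { '404' }
# counts now holds 1 under "GoogleBase.insert.success" and 2 under "GoogleBase.insert.failure"
```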

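Instrumenting class methods

You can instrument class methods, just like instance methods, using the metaprogramming methods. You only need to configure the instrumentation on the singleton class of the class you want to instrument.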

AWS::S3::Base.singleton_class.statsd_measure :request, 'S3.request'

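Dynamic metric names

You can use a lambda function instead of a string to set the name of the metric dynamically. The lambda function must accept two arguments: the object the function is being called on and the array of arguments passed.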

GoogleBase.statsd_count :insert, lambda{|object, args| object.class.to_s.downcase + "." + args.first.to_s + ".insert" }
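
To see what such a lambda evaluates to, the sketch below calls it directly with an object and an argument array. The resolve_metric_name helper is hypothetical (the library's internal mechanism may differ), and the empty GoogleBase class is a stand-in so the example runs on its own.

```ruby
# Stand-in class for illustration only; the real GoogleBase is whatever
# class your application instruments.
class GoogleBase; end

name_lambda = lambda { |object, args| object.class.to_s.downcase + "." + args.first.to_s + ".insert" }

# Hypothetical resolver: accepts either a fixed string or a lambda.
def resolve_metric_name(name, object, args)
  name.respond_to?(:call) ? name.call(object, args) : name
end

resolve_metric_name(name_lambda, GoogleBase.new, [:shoes, 'sneaker']) # => "googlebase.shoes.insert"
resolve_metric_name('GoogleBase.insert', GoogleBase.new, [])          # => "GoogleBase.insert"
```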

Testing

This library comes with two modules, StatsD::Instrument::Assertions and StatsD::Instrument::Matchers, to help you write tests that verify StatsD is called properly.

minitest

class MyTestcase < Minitest::Test
  include StatsD::Instrument::Assertions

  def test_some_metrics
    # This will pass if there is exactly one matching StatsD call
    # it will ignore any other, non matching calls.
    assert_statsd_increment('counter.name', sample_rate: 1.0) do
      StatsD.increment('unrelated') # doesn't match
      StatsD.increment('counter.name', sample_rate: 1.0) # matches
      StatsD.increment('counter.name', sample_rate: 0.1) # doesn't match
    end

    # Set `times` if there will be multiple matches:
    assert_statsd_increment('counter.name', times: 2) do
      StatsD.increment('unrelated') # doesn't match
      StatsD.increment('counter.name', sample_rate: 1.0) # matches
      StatsD.increment('counter.name', sample_rate: 0.1) # matches too
    end
  end

  def test_no_udp_traffic
    # Verifies no StatsD calls occurred at all.
    assert_no_statsd_calls do
      do_some_work
    end

    # Verifies no StatsD calls occurred for the given metric.
    assert_no_statsd_calls('metric_name') do
      do_some_work
    end
  end

  def test_more_complicated_stuff
    # capture_statsd_calls will capture all the StatsD calls in the
    # given block, and returns them as an array. You can then run your
    # own assertions on it.
    metrics = capture_statsd_calls do
      StatsD.increment('mycounter', sample_rate: 0.01)
    end

    assert_equal 1, metrics.length
    assert_equal 'mycounter', metrics[0].name
    assert_equal :c, metrics[0].type
    assert_equal 1, metrics[0].value
    assert_equal 0.01, metrics[0].sample_rate
  end
end

RSpec

RSpec.configure do |config|
  config.include StatsD::Instrument::Matchers
end

RSpec.describe 'Matchers' do
  context 'trigger_statsd_increment' do
    it 'will pass if there is exactly one matching StatsD call' do
      expect { StatsD.increment('counter') }.to trigger_statsd_increment('counter')
    end

    it 'will pass if it matches the correct number of times' do
      expect {
        2.times do
          StatsD.increment('counter')
        end
      }.to trigger_statsd_increment('counter', times: 2)
    end

    it 'will pass if it matches argument' do
      expect {
        StatsD.measure('counter', 0.3001)
      }.to trigger_statsd_measure('counter', value: be_between(0.29, 0.31))
    end

    it 'will pass if there is no matching StatsD call on negative expectation' do
      expect { StatsD.increment('other_counter') }.not_to trigger_statsd_increment('counter')
    end
  end
end

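Notes

Compatibility

The library is tested against Ruby 2.3 and higher. We do not test on Ruby implementations other than MRI, but we expect it to work on other implementations as well.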

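Reliance on DNS

Out of the box, StatsD is set up to be unidirectional fire-and-forget over UDP. Configuring the StatsD host as anything other than an IP address triggers a DNS lookup (i.e. a synchronous network round trip). This can be particularly problematic in clouds that have a shared DNS infrastructure, such as AWS.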

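- Using a hardcoded IP address avoids the DNS lookup but generally requires an application deploy to change.
- Hardcoding the DNS/IP pair in /etc/hosts allows the IP to change without redeploying your application, but it doesn't scale as the number of servers grows.
- Installing caching software such as nscd that honors the DNS TTL avoids most DNS lookups but makes the exact moment of a change indeterminate.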
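
A further variation on these options is to resolve the host name once at startup and reuse the resulting address for every send. The sketch below shows the idea with Ruby's standard library; the helpers ip_literal? and resolve_once are hypothetical and not part of this library, and the approach shares the trade-off of a hardcoded IP: a later DNS change is not picked up until the process restarts.

```ruby
require 'ipaddr'
require 'resolv'

# Returns true when the host is already an IP literal, so no lookup is needed.
def ip_literal?(host)
  IPAddr.new(host)
  true
rescue IPAddr::Error
  false
end

# Hypothetical helper: resolves a host name once so that later sends can reuse
# the address. The resolver is injectable; by default it uses Resolv from the
# standard library, which performs the (blocking) DNS lookup.
def resolve_once(host, resolver: Resolv.method(:getaddress))
  ip_literal?(host) ? host : resolver.call(host)
end

resolve_once('10.0.0.12') # already an IP address, returned unchanged without a lookup
```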

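Links

This library was developed for shopify.com and is MIT licensed.

- API documentation
- The changelog covers the changes between releases.
- See the contributing notes if you are interested in contributing to this library.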
