Byte Friendly

Random thoughts on programming and related topics.

How to Work With Large YAML Files and Not Go Crazy

Large Rails apps have large locale .yml files. Some are so large that it isn’t feasible to simply open them in an editor and work with them. Sure, you can edit them just fine, but you can’t efficiently search them.

Say you’re implementing a new page. It has a “Balance” field on it (for example). You think: “Hm, we have several other pages with this field. Surely it has been I18n’d before. Let’s look in the locale files.” You open a 10 kLOC en-US.yml file, start searching, and find 15 occurrences of the string “balance:” at varying levels of indentation. What are the full names of these keys? You have no idea.

I googled for quite a while and, to my surprise, couldn’t find a YAML browser with search that would show me the fully qualified name (FQN) of a key. Here’s a little script I wrote to help me navigate these ginormous files:

#! /usr/bin/env ruby

require 'yaml'
require 'colorize'

filename = ARGV[0]
pattern_text = ARGV[1]

unless filename && pattern_text
  puts "Usage: grep_yaml.rb filename pattern"
  exit(1)
end

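# Case-insensitive match. Regexp::IGNORECASE is the documented option flag;
# the original :nocase only worked because any truthy value enables it.
pattern = Regexp.new(pattern_text, Regexp::IGNORECASE)
p pattern # echo the compiled pattern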

hash = YAML.load_file(filename)

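# Walks nested hashes depth-first and yields [dotted_path, value] for every
# string leaf whose full key path or value matches the pattern.
def recurse(obj, pattern, current_path = [], &block)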
  case obj
  when String
    path = current_path.join('.')
    if obj =~ pattern || path =~ pattern
      yield [path, obj]
    end
  when Hash
    obj.each do |k, v|
      recurse(v, pattern, current_path + [k], &block)
    end
  end
end

recurse(hash, pattern) do |path, value|
  line = "#{path}:\t#{value}"
  line = line.gsub(pattern) {|match| match.green }
  puts line
end
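
To make the idea concrete without a big locale file at hand, here is a minimal, self-contained sketch of the same traversal. The inline YAML, its keys, and the helper name `matching_keys` are all made up for illustration; it collects results into an array instead of yielding them, which is just one possible variation on the script above.

```ruby
require 'yaml'

# Hypothetical sample data, not taken from any real locale file.
sample_yaml = <<~DOC
  en:
    account:
      balance: "Balance"
      title: "Your account"
    invoices:
      summary:
        balance: "Outstanding balance"
DOC

# Collects [dotted_path, value] pairs for every string leaf whose
# full key path or value matches the pattern.
def matching_keys(obj, pattern, path = [], found = [])
  case obj
  when String
    full = path.join('.')
    found << [full, obj] if obj =~ pattern || full =~ pattern
  when Hash
    obj.each { |key, value| matching_keys(value, pattern, path + [key], found) }
  end
  found
end

data = YAML.safe_load(sample_yaml)
matching_keys(data, /balance/i).each do |path, value|
  puts "#{path}:\t#{value}"
end
```

On this sample it prints the two full key names, en.account.balance and en.invoices.summary.balance, each followed by its value, and skips en.account.title because neither its path nor its value matches.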

Example usage (grepping one of the locale files from the Discourse project):

Much more comfortable, isn’t it?

The source is uploaded as a gist, for your copying/forking convenience. I should probably make a proper rubygem out of it. But, you know, once the immediate problem is solved, there are always more important things to work on. :/

Hope you find it useful.
