Byte Friendly

Random thoughts on programming and related topics.

How to Freeze a Capybara Integration/Feature Spec

| Comments

I normally don’t write a lot of feature specs, because the process is usually a bit of a pain, especially when you’re trying to cover an existing legacy page. Here’s one little trick that made my work a lot easier; hopefully it will help you too :)

The problem is that sometimes the spec fails and you get only a vague idea why (an element can’t be found by its selector, or something like that), but by the time the error is printed, the browser window is long gone (you see the browser if you use the Selenium Capybara driver, for example). So you want to pause the test and dig around to find out what goes wrong.

Capybara provides you with a page object which you can use to interact with the browser (click buttons, find elements, etc.), but I found it very inconvenient for quick poking around. Well, it turns out (I didn’t know this before) that you can pause a test with something like binding.pry or sleep. Then you can inspect the page under test in the browser, which will be left open and not frozen (the browser is a separate process, so when the main test process sleeps, the browser is unaffected). The developer tools of modern browsers are fantastic, so it’s a shame not to use them.
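Here’s a minimal sketch of what I mean (the page, the selectors and the expected text are all made up; any feature spec of yours will do):

require 'rails_helper' # or 'spec_helper', depending on your setup

feature 'Checkout', js: true do
  scenario 'applying a discount code' do
    visit '/checkout'
    fill_in 'Discount code', with: 'SAVE10'
    click_button 'Apply'

    # The test pauses here, but the browser window stays open and alive,
    # so you can poke around with the developer tools.
    binding.pry

    expect(page).to have_content('Discount applied')
  end
end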

Oh yes, you can also use this technique for repeatable test setup. Say, the business reports a bug to you; you want to test it, but you don’t want to keep a permanent feature spec (they’re dog slow). What you can do is create a temporary test that sets everything up (creates three different types of accounts, two projects and whatnot) and just put a binding.pry at the end. From there you can tail log/test.log and test the site manually. After you’re done with the bug, you just delete the test and commit the changes.

Anyway, I hope this is googlable and will save somebody some time.

How to Make Custom Commands in Atom Extensions

| Comments

A couple of days ago I finally got my invite to the Atom editor. A bit late, but I was still excited.

It makes an overall positive impression and has some essential things bundled (saving on lost focus, trimming trailing whitespace, etc.). But some things are still lacking, so I decided to check out its praised extensibility.

Again, the docs look good and cover the first steps. But I spent a good couple of hours today trying to invoke my custom command.

module.exports =
  activate: (state) ->

  process: ->
    console.log 'doing custom action'
    editor = atom.workspace.getActiveEditor()
    editor.insertText('hello')

process in the listing above is a simple action. It gets the current editor and inserts the string “hello” at the cursor. I defined a keyboard shortcut for it, like this:

'.editor':
  'cmd-ctrl-shift-e': 'myext:process'

where myext is the name of my extension. But upon pressing the hotkey, nothing happened. A couple of hours later I got the solution. My mistake was in assuming that Atom would somehow discover and dynamically call that exported method process. It’s exported for a reason, right? Well, no. It turns out that you have to bind commands manually:

module.exports =
  activate: (state) ->
    atom.workspaceView.command "myext:process", => @process()

  process: ->
    console.log 'doing custom action'
    editor = atom.workspace.getActiveEditor()
    editor.insertText('hello')

Next time, we’ll do something useful. :)

How to Work With Large YAML Files and Not Go Crazy

| Comments

Large Rails apps have large locale .yml files. Some have files so large that it is not feasible to simply open them in an editor and work with them. Sure, you can edit them just fine, but you can’t search them efficiently.

Say you’re implementing a new page. It has, for example, a “Balance” field on it. You think: “Hm, we have several other pages with this field. Surely it has been I18n’d before. Let’s look in the locale files.” You open a 10 kLOC en-US.yml file, start searching and find 15 occurrences of the string “balance:”, with varying levels of indentation. What are the full names of these keys? You have no idea.
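To illustrate, both “balance:” entries below look identical to a plain-text search, yet their full key names are completely different (the structure here is made up, but it’s typical):

en-US:
  users:
    show:
      balance: "Balance"
  invoices:
    summary:
      balance: "Balance"

One is en-US.users.show.balance, the other is en-US.invoices.summary.balance, and you can’t tell which one you’re looking at without scrolling up and counting indentation.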

I googled for quite a while and, to my surprise, didn’t find a YAML browser with search that would show me the fully qualified name (FQN) of a key. Here’s the little script that I wrote to help me navigate these ginormous files:

#! /usr/bin/env ruby

require 'yaml'
require 'colorize'

filename = ARGV[0]
pattern_text = ARGV[1]

unless filename && pattern_text
  puts "Usage: grep_yaml.rb filename pattern"
  exit(1)
end

pattern = Regexp.new(pattern_text, Regexp::IGNORECASE)

hash = YAML.load_file(filename)

# Walks the structure recursively, keeping track of the current key path.
# Yields [full_key_path, value] for every string that matches the pattern,
# either by value or by its full key path.
def recurse(obj, pattern, current_path = [], &block)
  if obj.is_a?(String)
    path = current_path.join('.')
    if obj =~ pattern || path =~ pattern
      yield [path, obj]
    end
  elsif obj.is_a?(Hash)
    obj.each do |k, v|
      recurse(v, pattern, current_path + [k], &block)
    end
  elsif obj.is_a?(Array)
    obj.each_with_index do |v, i|
      recurse(v, pattern, current_path + [i.to_s], &block)
    end
  end
end

recurse(hash, pattern) do |path, value|
  line = "#{path}:\t#{value}"
  line = line.gsub(pattern) {|match| match.green }
  puts line
end

Example usage (grepping one of the locale files of the Discourse project) might look like this (the exact path is illustrative):
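  ./grep_yaml.rb config/locales/server.en.yml balance

Each match is printed as the full dotted key path, followed by the translation text, with the matching parts highlighted in green.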

Much more comfortable, isn’t it?

The source is uploaded as a gist, for your copying/forking convenience. I should probably make a proper rubygem out of it. But, you know, when the immediate problem is solved, there are always more important things to work on. :/

Hope you find it useful.

Using OSX Notification Center in Your Programs

| Comments

There is this good feature in OSX, called Notification Center (which Apple may have stolen from Growl, who knows). It has an API, so you can post your events there. If you’re using Xcode and Apple frameworks, you can stop reading now. However, if you’re programming in, say, ruby, you’re in a much less fortunate position. There’s no official gem or library, so you’re left on your own.

But don’t despair, we have you covered. There’s a cocoa app called terminal-notifier that serves as a bridge between Notification Center and your app. It is accessible via the command line and is quite configurable. There’s also a ruby gem that wraps this tool, but the main value is precisely the command-line accessibility, which means that you can use it from whatever language you want. For example, I use it in my stackoverflow question poller, which is written in Go.
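And if you do happen to be in ruby, the gem makes it a one-liner (a minimal sketch; the message and title are arbitrary):

require 'terminal-notifier'

# Pops up a notification in the OSX Notification Center
TerminalNotifier.notify('Build finished', :title => 'My App')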

Dash - Now With Cheatsheets

| Comments

Just a couple of days ago I blogged about Dash. And now they have released a new major version, with (at least one) new feature: cheatsheets. One of the things I just can’t hold in my memory is HTTP status codes. I know 200 and 404, that’s it. All the others I have to look up, every single time. And Dash has a cheatsheet for this, so I am covered here.

There is no cheatsheet for Vim yet, not even for its movements. But the good news is that you can create your own: cheatset.

I shall probably create one for TextMate key shortcuts (I should know most of them by now, I guess). We’ll see.

Useful Documentation Lookup Tools

| Comments

Today I just wanted to share some tools that I use to read documentation.

The first one is Dash. It’s an offline documentation browser which covers many different topics. It has ruby, rails, CSS, HAML, jQuery, Go, Haskell and everything else you might want. It can also index locally installed rubygems. And did I mention that it is offline? If you’re anything like me, you need to consult documentation every 30 seconds. And with this tool you can even work on a plane! Can’t recommend it enough.

Another one, for ruby only, is OmniRef. These guys have indexed every ruby gem there is and cross-linked the documentation. On the site you can easily switch between versions of a gem (or of ruby itself) and see how the documented behaviour and/or code changed over time. It also allows leaving notes (user generated content?), but I don’t see many notes at the moment. Potentially, these comments could be an excellent complement to the official documentation (we see this with, say, the MySQL docs).

And, of course, the most important tool for me these days is StackOverflow. More than half of my google searches lead to this chest of collective programming wisdom. The funniest moments are when I struggle with something and google points me to stackoverflow, where I find an answer that I posted myself some time ago. Can you believe this? :)

Limitations of MongoDB

| Comments

MongoDB is becoming more and more popular, and more people want to learn about it. I was preparing a seminar for a company and had to compile a list of MongoDB limits. I never knew there were so many! Some of them are reasonable, some are weird. Anyway, it’s good to know them. Here’s the list of MongoDB limits as of version 2.4.9:

  • Max document size: 16 MB (we all knew this one, right?)
  • Max document nesting level: 100 (documents inside documents inside documents…)
  • Namespace is limited to ~123 chars (namespace is db_name + collection_name (or index_name))
  • DB name is limited to 64 chars
  • Default .ns file can store about 24000 namespaces (again, a namespace is referring to a collection or an index)
  • If you index some field, that field can’t contain more than 1024 bytes
  • Max 64 indexes per collection
  • Max 31 fields in a compound index
  • fulltext search and geo indexes are mutually exclusive (you can’t use both in the same query)
  • If you set a limit of documents in a capped collection, this limit can’t be more than 2**32. Otherwise, number of documents is unlimited.
  • On linux, one mongod instance can’t store more than 64 TB of data (128 TB without journal)
  • On windows, mongod can’t store more than 4 TB of data (8 TB without journal)
  • Max 12 nodes in a replica set
  • Max 7 voting nodes in a replica set
  • You can’t automatically rollback more than 300 MB of data. If you have more than this, manual intervention is needed.
  • The group command doesn’t work in a sharded cluster.
  • db.eval() doesn’t work on sharded collections. Works on unsharded ones, though.
  • $isolated, $snapshot, geoSearch don’t work in a sharded cluster.
  • You can’t refer to the db object in $where functions.
  • If you want to shard a collection, it must be smaller than 256 GB, or else it will likely fail to shard.
  • Individual (non-multi) updates/removes in a sharded cluster must include the shard key. Multi versions of these commands don’t have to include it.
  • Max 512 bytes for shard key values
  • You can’t change shard key for a collection once it’s sharded.
  • You can’t change value of a shard key of a document.
  • aggregate/$sort produces an error if sorting takes more than 10 percent of RAM.
  • You can’t use $or in 2d geo queries
  • You better not use queries with multiple $in parts. If they result in more than 4 million combinations, you get an error.
  • Database names are case-sensitive (even on case-insensitive file systems)
  • Forbidden characters in database names: linux - /\. " (note that the space is one of them), windows - same plus *<>:|?
  • Collection names can’t contain the $ sign or start with the “system.” prefix
  • Field names can’t contain . or $
  • Hashed index can’t be unique
  • Max connection number is hardcoded to 20k.

Hope this was useful to you.

Kata: Convert Numbers to Roman Numerals

| Comments

Here is an interesting problem: write a program that converts numbers into Roman numerals. The Romans didn’t use Arabic numerals. Instead they used symbols of the Latin alphabet that represented different values. It’s a simple system. “I” stands for 1, “V” stands for 5, “X” for 10 and so on. To represent 2 you use “II” (1 + 1), to represent 7 you use “VII” (5 + 1 + 1). Simple, right? Well, no. Here’s the twist:

* 4 is "IV" (not "IIII")
* 9 is "IX" (not "VIIII")
* 40 is "XL" (not "XXXX")
* 49 is "XLIX" (not "XXXXVIIII")

Weird, huh? Maybe they didn’t like 4 identical symbols in a row, who knows. Anyway, how can we solve this?

After trying out several “smart” solutions and failing, I came up with this simple one: store the special cases along with the normal ones and keep subtracting from the number until it reaches zero.

def romanize(number)
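  # Note: this hash is ordered largest-value-first. Ruby (1.9+) hashes
  # preserve insertion order, so `each` below walks it in that order.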
  reductions = {
    1000 => 'M',
    900 => 'CM',

    500 => 'D',
    400 => 'CD',

    100 => 'C',
    90 => 'XC',

    50 => 'L',
    40 => 'XL',

    10 => 'X',
    9 => 'IX',

    5 => 'V',
    4 => 'IV',

    1 => 'I',
  }

  result = ''

  while number > 0
    reductions.each do |n, subst|
      if number / n >= 1 # if number contains at least one of n
        result << subst  # push corresponding symbol to result
        number -= n
        break            # break from each and start it anew 
                         # so that the largest numbers are checked first again.
      end
    end
  end

  result
end

A test:

test_mapping = {
  1 => 'I',
  2 => 'II',
  3 => 'III',
  4 => 'IV',
  5 => 'V',
  6 => 'VI',
  7 => 'VII',
  8 => 'VIII',
  9 => 'IX',
  10 => 'X',
  11 => 'XI',
  12 => 'XII',
  13 => 'XIII',
  14 => 'XIV',
  15 => 'XV',
  19 => 'XIX',
  22 => 'XXII',
  29 => 'XXIX',
  30 => 'XXX',
  40 => 'XL',
  49 => 'XLIX',
  99 => 'XCIX',
  950 => 'CML',
  2014 => 'MMXIV',
}


test_mapping.each do |k, v|
  res = romanize(k)
  sign = res == v ? '.' : 'F'

  puts "#{sign} #{k} => #{res.inspect}"
end

It took me a little over 2 hours to come up with this (simple) solution. Too bad that we programmers often don’t have time to look for simple solutions and go for the easiest one instead.

Hope this helps someone.

What Is UTF-8 (for the Dummies)

| Comments

I must admit, I didn’t really understand what Unicode (more specifically, its flavor “UTF-8”) actually is. All I knew was that it’s a good encoding and it’s compatible with ASCII. Beyond that - no clue. This video made it crystal clear: UTF-8 is a variable-width encoding where plain ASCII characters take a single byte and everything else takes two to four, which is exactly why it’s backward compatible with ASCII.

Pipeline Processing in Go

| Comments

Pipeline processing is a very powerful design idiom. You have some simple building blocks that you can arrange in different combinations to perform complex tasks. A classic example is the unix command line. Each tool is very simple and does only one job. But it still regularly amazes me what you can achieve by simply combining those tools into a pipeline and piping data through it.

Say, you’re building an RSS reader that shows new posts live. Implementing it in a regular procedural manner is easy. Something like this (pseudo-code):

  loop {
    fetch posts
    for each post {
      if we have not yet seen this post {
        mark post seen
        show it to user
      }
    }
  }

Say, now we want to do focused reading, meaning we only want to see a subset of posts which satisfy some arbitrary criteria (filter by tags, etc.). No problemo, we just add one conditional in there.

  loop {
    fetch posts
    for each post {
      if post is interesting {
        if we have not yet seen this post {
          mark post seen
          show it to user
        }
      }
    }
  }

Now it’s getting a little bit difficult to read and it will get worse. All business rules are in the same pile of code and it may be difficult to tell them apart.

But if you think about it, there is a pipeline here, consisting of several primitive segments, something like this:
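  fetch posts -> select unseen -> mark all seen -> show to user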

Each individual segment is completely separate from the others (well, except that “select unseen” and “mark all seen” might use the same storage). If we were to, say, remove caching (and stop serving cached content), we’d only have to take out those two segments. If we want to change how content is presented to the user (print to terminal, send to a text-to-speech engine, …), we only have to replace the final segment. The rest of the pipeline stays untouched. And the aforementioned tag filtering - we just insert a segment after the fetcher, something like this:
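  fetch posts -> filter interesting -> select unseen -> mark all seen -> show to user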

In the pipeline, each segment can either swallow a message or pass it on (optionally applying some transformation first). Two simple rules, infinite flexibility.

The Go language allows for a natural expression of this design: goroutines as segments, connected by channels.

Here’s an intentionally primitive example, which filters and transforms a stream of integers.

package main

import (
  "fmt"
  "time"
)

// type alias for our pipeline segment
// segment is a function that reads from a stream of integers
//  and writes to a stream of integers
type pipelineSegment func(in, out chan int)

func main() {
  // construct our pipeline. Put generator first, then filters/modifiers
  out := makePipeline(generator, onlyOdd, plusOne, squared, plusOne)

  for v := range out {
    fmt.Printf("Resulting value: %d\n", v)
  }
}

// This simply generates sequential integers in infinite loop.
func generator(_, out chan int) {
  i := 0

  for {
    out <- i
    i++
    time.Sleep(100 * time.Millisecond)
  }
}

// Filter. Selects only odd integers. Even integers are swallowed.
func onlyOdd(in, out chan int) {
  defer close(out)
  for val := range in {
    if val%2 == 1 {
      out <- val
    }
  }
}

// Modifier. Adds 1 and passes on.
func plusOne(in, out chan int) {
  defer close(out)
  for val := range in {
    out <- val + 1
  }
}

// Modifier. Passes a square of incoming integer.
func squared(in, out chan int) {
  defer close(out)
  for val := range in {
    out <- val * val
  }
}

// Builds the pipeline out of individual segments.
// Returns an "exhaust pipe", from which fully processed integers can be read.
func makePipeline(segments ...pipelineSegment) chan int {
  currentInput := make(chan int)
  var currentOutput chan int

  for _, seg := range segments {
    currentOutput = make(chan int)
    go seg(currentInput, currentOutput)

    // the output of this segment becomes the input of the next one
    currentInput = currentOutput
  }

  return currentOutput
}

Produced output:

  Resulting value: 5
  Resulting value: 17
  Resulting value: 37
  Resulting value: 65
  Resulting value: 101
  Resulting value: 145
  Resulting value: 197
  ...

The first generated integer is 0. It’s even (well, certainly not odd), so it does not make it past the filter.

The next one is 1. It passes the filter. Then it gets +1, so it’s now 2. Then it’s squared and becomes 4. And finally, one more +1, which results in 5. There are no more segments, so this value is read from the output pipe and printed to the terminal.

Hope this was useful.