Code as Craft

Plurals at Etsy

If you’ve ever been shopping online and encountered a message like “1 items added to cart” or even “1 item(s) added to cart,” you’ve been bitten by a pluralization bug. It might have been a little annoying, but you likely shrugged it off and moved on. An engineer looking at the problem would have no trouble coming up with a one-line fix:
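As a rough illustration, such a fix might look like the following Python sketch. The function name `cart_message` is hypothetical, and this is not Etsy's actual code:

```python
def cart_message(count: int) -> str:
    # Append "s" for every count except exactly 1 (English-only logic).
    return f"{count} item{'' if count == 1 else 's'} added to cart"


print(cart_message(1))  # 1 item added to cart
print(cart_message(3))  # 3 items added to cart
```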

While it would scale poorly if we tried formatting all messages like this, it would work - but only for English. Other languages have very different plural requirements, and with 11 different languages supported on Etsy.com, it becomes very important to have a well-understood system in place to get it right every time.

In addition to standardizing character encodings for the world's writing systems, Unicode provides a set of resources to support translation and localization in what it calls the Common Locale Data Repository, or CLDR. Among many other things, the CLDR maintains a comprehensive set of plural rules that define machine-readable ways of handling plurals across languages. These are the rules we rely on here at Etsy, and in this article we'll explore them a little and show you how we translate with them.

Plural Categories

Plural rules define how to handle plurals of nouns (item vs. items) or unit expressions (hour vs. hours) for a given language. In the CLDR every plural rule has one of three types, depending on the value associated with the unit expression:

  • Cardinals: such as 1, 2, or 3 (e.g., you have 2 items in your cart)
  • Ordinals: such as 1st, 2nd, or 3rd (e.g., this is your 3rd order)
  • Ranges: such as 2-4 or 1-2 feet (e.g., your order will arrive in 1-2 weeks)

Etsy currently only supports cardinals, which are usually the simplest type of plural rule. While this occasionally limits the copy we can use on the site, it does simplify our translation code quite a bit, and we rarely have issues coming up with alternate copy that will work with the cardinal type. For simplicity, cardinals are the only plural rules we'll be talking about from here on.

The CLDR defines a total of six plural categories:

  • Zero: most commonly used when we have exactly 0 items, but may also apply when the number of items ends with a 0 (e.g., 10, 20, 100), depending on the language
  • One: most commonly used when we have exactly 1 item, but may also apply when the number of items ends with a 1 (e.g., 11, 21, 101), depending on the language
  • Two: most commonly used when we have exactly 2 items, but may also apply when the number of items ends with a 2 (e.g., 12, 22, 102), depending on the language
  • Few: used for the smallest numbers not already covered by the categories above (e.g., if a language already has plural categories for “zero”, “one”, and “two”, “few” would apply starting at 3 items)
  • Many: used for the next smallest numbers not already covered (e.g., if a language already has plural categories for “zero”, “one”, “two”, and “few”, and “few” applies to 3 and 4 items, the “many” category would apply starting at 5 items)
  • Other: used for all numbers not included in the other plural categories

Note that these categories aren't grammatical or linguistic; they exist solely to support translation automation. A given language can use just one plural category, or it might require all six. Let’s take a look at a few examples.

In English, we have two plural categories - “one” and “other”:

  • “One”: You have 1 item in your cart
  • “Other”: You have X items in your cart

Here’s what the CLDR plural categories look like for English: Link

You can see the category definitions in the leftmost column, with some examples in the middle two columns, while the rightmost column defines the rules for each category. Some helpful nomenclature:

  • N refers to the source number, or the number that is associated with the noun or unit expression in the sentence. For example, in the sentence “You have 1 item in your cart,” N is “1.”
  • i refers to the integer digits of N. For example:
    • If N = 1, i = 1
    • If N = 2.1, i = 2
  • v refers to the number of visible fraction digits in N (including trailing zeros). For example:
    • If N = 1, v = 0
    • If N = 1.5, v = 1
    • If N = 1.500, v = 3

You can see the rule for the “one” case is “i = 1 and v = 0” – this means that the rule applies strictly to the case where N = 1. The rules section for the “other” case will always be empty, since it applies to all cases not covered by the other rules.
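To make the i and v operands concrete, here is a minimal Python sketch that derives them from a number and applies the English “one” rule. The function names are hypothetical. The number is passed as a string because a float would lose the trailing zeros that v depends on, and the sketch assumes a non-negative number written with a "." decimal separator:

```python
def plural_operands(number: str) -> tuple[int, int]:
    """Return (i, v) for a non-negative decimal string such as "1.500"."""
    integer_part, _, fraction_part = number.partition(".")
    # i: the integer digits; v: how many fraction digits are visible.
    return int(integer_part), len(fraction_part)


def english_category(number: str) -> str:
    """Apply the CLDR English cardinal rule: "one" if i = 1 and v = 0."""
    i, v = plural_operands(number)
    return "one" if i == 1 and v == 0 else "other"


print(plural_operands("1.500"))  # (1, 3)
print(english_category("1"))     # one
print(english_category("1.0"))   # other
```

Note that “1.0” falls into “other” because v is 1, which matches English usage (“1.0 items”).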

For a more complex example, here’s what the CLDR plural categories look like for Polish: Link

Some languages like Japanese only use the “other” category, while languages like Arabic use all six. Furthermore, the rules for the "few" and "many" categories are defined on a per-language basis. This means that one language might apply the “few” category when the cardinality is 3-10, while another might apply the “few” category for all integers ending in 2.

So how do we take this data and convert it into something that we can integrate into our translation infrastructure?

Plural Category Rule Formatting

The exact origin of how we format plural category rules at Etsy has been lost to time, but it seems to borrow from the Unicode Locale Data Markup Language (LDML), which defines an XML format for providing machine-readable locale data. What we ended up with is something that uses pretty familiar operators and somewhat resembles a regex. The common operators that we use are:

  • ^ – separates rule definitions for different plural categories
  • ‘’ – (empty string) represents the “other” or “else” category
  • % – modulo
  • | – OR
  • ~ – NOT
  • - – inclusive range (as in 2-4, used in the examples below)
  • , – list of values (as in 0,1, used in the examples below)

The best way to illustrate how we use the CLDR plural categories and these operators to define a plural rule is with some examples.

Japanese

Link

Japanese has just one plural category: “other”. So its plural rule is defined by the empty string.

English

Link

English, as we've noted, has two plural categories: “one” and “other”. We know what the “other” category looks like already (the empty string), and the “one” category is also pretty simple: it's just "1". So our plural rule for English is a simple two-character string: 1^

French

Link

French has three plural categories: “one”, “many”, and “other”, but we don’t currently support the “many” category since it only applies to very large cardinalities. The “one” category for French is slightly different from what we saw for English: its rule is "i = 0,1", which we can write as “0,1”. Appending the (empty) rule for the “other” case gives us our French formatting string: 0,1^

Polish

Link

A more challenging example is Polish, with four plural categories: “one”, “few”, “many”, and “other.” The “one” and the “other” rules we already know how to form, so let’s consider “few.” The CLDR data tells us that the “few” category covers cases where i % 10 = 2..4 (that’s an inclusive range, so it means 2, 3, or 4) and i % 100 != 12..14.

Breaking this down:

  • i % 10 = 2..4: to write this as a plural rule, we start with the range, written with a hyphen (2-4), then add the modulus (2-4%10)
  • i % 100 != 12..14: similar to the above rule, but negated, so we wrap the term in parentheses and prefix it with ~ to come up with ~(12-14%100)

Combining them gives us the rule for the “few” category: 2-4%10~(12-14%100)

Now let’s look at the “many” category: the rules say i != 1 and i % 10 = 0..1, or i % 10 = 5..9, or i % 100 = 12..14. Breaking this down:

  • i != 1: this is technically already covered in the rule for the “one” case, so we don’t need to address it here (though for completeness, it would look like this: ~1)
  • i % 10 = 0..1: gets written as 0-1%10
  • i % 10 = 5..9: gets written as 5-9%10
  • i % 100 = 12..14: gets written as 12-14%100

Combining them, with the appropriate joining operators, gives us a complete rule for the “many” case: 5-9%10|12-14%100|0-1%10

Putting it all together gives us the final (big honking) plural rule string for Polish:

1^2-4%10~(12-14%100)^5-9%10|12-14%100|0-1%10^
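To show how a string like this can be evaluated, here is a minimal Python sketch of a rule matcher. It is not Etsy's implementation. The grammar is inferred from the examples in this article, and the sketch assumes that categories are tried left to right, that `A~(B)` means "A and not B", that commas list alternative values, and that only non-negative integers are handled. All function names are hypothetical:

```python
import re

# One term: comma-separated values or hyphenated ranges, optionally "%modulus".
_TERM = re.compile(r"(?P<values>\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*)(?:%(?P<mod>\d+))?")


def _term_matches(term: str, n: int) -> bool:
    parsed = _TERM.fullmatch(term)
    if parsed is None:
        raise ValueError(f"unsupported term: {term!r}")
    value = n % int(parsed["mod"]) if parsed["mod"] else n
    for part in parsed["values"].split(","):
        low, _, high = part.partition("-")
        if int(low) <= value <= int(high or low):
            return True
    return False


def _alternative_matches(alternative: str, n: int) -> bool:
    # "A~(B)" is read here as "A and not B" (an assumption).
    positive, _, negative = alternative.partition("~")
    if not _term_matches(positive, n):
        return False
    return not (negative and _term_matches(negative.strip("()"), n))


def category_index(rule: str, n: int) -> int:
    """Return the position of the first category in `rule` that matches n."""
    for index, category_rule in enumerate(rule.split("^")):
        if category_rule == "":  # the empty string is the catch-all "other"
            return index
        if any(_alternative_matches(a, n) for a in category_rule.split("|")):
            return index
    raise ValueError("rule has no catch-all category")


POLISH_RULE = "1^2-4%10~(12-14%100)^5-9%10|12-14%100|0-1%10^"
POLISH_CATEGORIES = ["one", "few", "many", "other"]
print(POLISH_CATEGORIES[category_index(POLISH_RULE, 22)])  # few
```

The rule string itself carries no category names, so the sketch returns a position and leaves the mapping to names (here `POLISH_CATEGORIES`) to the caller.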


Tagging Plural Strings

So now we can build our machine-readable plural rules. How do we make sure our engineers specify the strings in a way that allows us to use these rules effectively? At Etsy, when we add translatable content to our codebase, we call it “message tagging.” Engineers need to specify at least 3 components when tagging a message:

  • Content: The actual message; e.g. "Hello World!"
  • Description: Provides translators with a little context; e.g., "Site welcome message"
  • Project: Helps us organize messages into related groups, both for our internal tools and for our translators.

In a PHP class, we tag messages in static arrays called message catalogs, like so:
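As a language-neutral stand-in for the PHP, the three components can be modeled with a Python dictionary. This is only a sketch of the shape of the data: the key `welcome_message`, the field names, the project name `homepage`, and the helper `missing_fields` are all hypothetical and not Etsy's actual format:

```python
REQUIRED_FIELDS = {"content", "description", "project"}

# A message catalog: lookup key -> tagged message.
MESSAGE_CATALOG = {
    "welcome_message": {
        "content": "Hello World!",
        "description": "Site welcome message",
        "project": "homepage",
    },
}


def missing_fields(message: dict) -> set:
    """Return the required components that a tagged message lacks."""
    return REQUIRED_FIELDS - set(message)


print(missing_fields(MESSAGE_CATALOG["welcome_message"]))  # set()
```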


Note: the array key is used only to look up message catalog entries in code; it is not part of the translation system.

Tagging plural messages is a little more complicated. For starters, we’re writing our translatable content in English, so we need to specify the 2 different plural categories that are supported in English: "one" and "other". Let’s go back to the "N item(s) in your cart" example we've already used.

We could tag the singular and plural forms of the cart message independently, and that would be mostly okay. But without some formal indication that the two messages are related (i.e. they’re being used in the same place on the site), translators might end up handling them differently. There might be more than one appropriate translation for the word "item" or the word "cart," and we could end up with a discrepancy between the "one" and the "other" messages. It’s likely our international users would still understand the translation, but it would be a less well localized experience for them. Imagine if you were shopping on a website and they kept swapping between using the word “soda” and “pop” – wouldn’t that be a little annoying? In order to avoid this situation, we specify a plural type message that differs from a singular message in two key ways:

  • Content: This field is now an array, with 2 options specified
    • 1 – this is the "one" plural category
    • ‘’ (empty string) – this is the "other" plural category
  • Nature: By default, the value of this field is "single," but for plural messages, we mark it as "pluralize"

Here’s what a plural message would look like in our PHP message catalog:
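As a rough Python stand-in for the PHP, a plural entry might be shaped like the sketch below. The content keys "1" and "" follow the plural rule fragments described above. Everything else (the key `cart_item_count`, the field names, the `{count}` placeholder, and the project name) is hypothetical:

```python
PLURAL_MESSAGE_CATALOG = {
    "cart_item_count": {
        "content": {
            "1": "You have 1 item in your cart",        # "one" category
            "": "You have {count} items in your cart",  # "other" category
        },
        "description": "Number of items in the shopper's cart",
        "project": "cart",
        "nature": "pluralize",  # a singular message would use "single"
    },
}

message = PLURAL_MESSAGE_CATALOG["cart_item_count"]
print(sorted(message["content"]))  # ['', '1']
```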


Using this format, we can tag plural messages effectively and provide our translators with the additional context they need to offer a robust translation for all plural options.

Multiplexing/Demultiplexing for Translations

So now we know how to build plural rules, and how to tag plural messages, but so far we’ve only seen the English plural categories that are defined when source content is added to our codebase. How does this translate into supporting the plural categories for other languages? To understand that we need to do a quick review of the translation workflow at Etsy.

  1. First things first, translatable content is added to our codebase (aka message tagging).
  2. Next, our extraction process (which runs nightly) parses the codebase for these tagged messages, organizes them into logical chunks, and uploads them to our third-party translation service, XTM.
  3. Once the content is uploaded to XTM, our translators get to work.
  4. When the translations are completed, we download them via our continuously running dump process.

Our plural multiplexer comes into play in step #2 – the extraction process. Whenever it encounters a plural message, the multiplexer takes the two plural categories defined for the English source content, parses the plural rules for each of our supported languages, and outputs a messages array with an entry for each plural category we need to support.

Figure 1: The multiplexing process.


Multiplexing doesn’t involve any kind of translating itself – we’re just setting things up so that the translators have space for their translations in all the possible plural categories.
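The idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Etsy's multiplexer: it assumes slots are keyed by the target language's rule fragments and seeded with the English source text (singular where the fragment matches the English "1" key, plural otherwise). The function name and the `{count}` placeholder are hypothetical:

```python
def multiplex(english_content: dict, target_rule: str) -> dict:
    """Create one slot per plural category in the target language's rule."""
    plural_source = english_content[""]
    return {
        category_rule: english_content.get(category_rule, plural_source)
        for category_rule in target_rule.split("^")
    }


ENGLISH_CONTENT = {
    "1": "You have 1 item in your cart",
    "": "You have {count} items in your cart",
}
POLISH_RULE = "1^2-4%10~(12-14%100)^5-9%10|12-14%100|0-1%10^"

for category_rule, source_text in multiplex(ENGLISH_CONTENT, POLISH_RULE).items():
    print(repr(category_rule), "->", source_text)
```

For Polish this yields four slots, one for each of “one”, “few”, “many”, and “other”, while a Japanese rule (the empty string) yields a single slot.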

When we download the translations for this string during the dump process, they might look something like this:

Demultiplexing happens when we attempt to fetch the translated content to be displayed on the page – we request the tagged message and provide the cardinality (e.g., how many items are in the cart), and our translation infrastructure automatically selects the correct version of the string based on our plural rules.
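A minimal Python sketch of that selection step for Polish follows. For readability it keys the translations by category name and hard-codes the CLDR integer rules rather than parsing a rule string. The function names, the `{count}` placeholder, and the Polish strings are illustrative assumptions, not Etsy's actual code or copy:

```python
def polish_category(n: int) -> str:
    """CLDR cardinal category for a non-negative integer in Polish."""
    if n == 1:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"  # every remaining integer; "other" only covers fractions


# Illustrative translations of "You have N item(s) in your cart".
TRANSLATIONS = {
    "one": "Masz 1 przedmiot w koszyku",
    "few": "Masz {count} przedmioty w koszyku",
    "many": "Masz {count} przedmiotów w koszyku",
    "other": "Masz {count} przedmiotu w koszyku",
}


def demultiplex(translations: dict, count: int) -> str:
    """Pick the translation for `count` and fill in the number."""
    return translations[polish_category(count)].format(count=count)


print(demultiplex(TRANSLATIONS, 3))   # Masz 3 przedmioty w koszyku
print(demultiplex(TRANSLATIONS, 12))  # Masz 12 przedmiotów w koszyku
```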

Summary

While it may seem trivial at first glance – does an "s" go at the end of this word or not? – a lot of thought and a lot of care goes into handling plurals at Etsy. By using a well-defined standard, converting its rules into machine-readable formatting, and making that the basis for our workflows, we simplify the everyday work of our engineers and translators and help set them up for success. The goal, as with all our localization efforts, is an Etsy experience that is equally seamless and welcoming for all our buyers and sellers, both in the US and abroad.