Friday, February 21, 2014

YAML - YAML Ain't a Markup Language

Hello again! Today I'd like to talk a little about YAML. YAML is a general-purpose data serialization language, most commonly used for configuration files, data persistence and messaging between applications. As its name states, YAML is not a markup language: instead of wrapping data in tags, it relies on indentation and a little punctuation, which makes files much easier for humans to read.

Example XML code:

Equivalent YAML code:

Very Brief Introduction to YAML

This is a short but focused introduction. After reading it, you should be able to create and edit simple YAML files (it's not that hard, but it takes a few minutes to get the hang of the syntax). For a complete overview, go to yaml.org and read the specification; you'll be amazed at how flexible it can be (or inflexible, sometimes).

YAML is designed to be easily readable by humans, and it can store any type of data (even binary data, through the !!binary tag, which holds base64-encoded content).

YAML has three basic structures: scalars, sequences and mappings. A document is made of nodes (or objects, if you prefer): a scalar node holds a value, while map and sequence nodes hold other nodes. If you think of the document as a tree, maps and sequences are branch nodes and scalars are leaf nodes.
  • Scalars: as the name hints, these are simple values such as strings or numbers, sometimes with an identifying name (a key), sometimes without.

  • Sequences: ordered lists of nodes without any identifying names; their elements are accessed by index.

  • Mappings: also called hashes or dictionaries, these relate identifiers (keys) directly to their values. Keys are usually simple string names, and values can be of any type: numbers, strings or even other maps and sequences. A mapping is comparable to a std::map; note that the YAML specification requires the keys within one mapping to be unique, so std::multimap is not a good analogy.

Of course, these can be combined into increasingly complex documents that store any kind of information. Just don't build a password vault with it: YAML is plain text, so anything you store is readable by anyone who can open the file.

The language itself has some neat features, such as complex keys (introduced by "? ", which let a key be a sequence or a map instead of a plain string) and anchors and aliases, marked by the '&' and '*' characters, which work much like variables. Composite keys make good GUIDs for serializing any type of asset or object data, but I personally don't like YAML's syntax for them (actually I hate it, but well, them's the breaks). In my example, the key is a sequence (denoted by "[]"), which later lets me use, for example, [sound, zombie1] or [data, zombie1] when I want to store different types of data about the same object. This sequence-key feature is extremely useful.

Anchors and aliases are also very handy, especially when the same value is shared by several items. In my example, if I move the folder where I keep my spritesheets, I only need to change that one line instead of editing every object that references it (or resorting to a global variable in my engine).

YAML syntax is tricky to get right at first; there are lots of gotchas, and your first 10 to 20 tries will probably have invalid syntax. Worry not! Here are some of the gotchas:
  • No tabs allowed. Blocks in YAML are defined by indentation, and tabs are banned from indentation because different editors and systems display them differently.
  • Leading whitespace is meaningful: the number of spaces at the start of a line determines which block the line belongs to.
  • ": " isn't the same as ":". While "name: Dejaime" means that name is a key whose value is the scalar "Dejaime", name:Dejaime (no space) is read as a single scalar whose text is name:Dejaime; used as a key, it would have a null value. This is what allows colons inside values, as in "time_created: 19:03".
For those who want to try YAML, there's an online syntax checker called yamllint (yamllint.com). It validates your text and then prints a version optimized for Ruby, which is of no use to us; the important part is just checking whether the syntax is valid.

This should be enough for this article, but if you want to go deeper into the fancy stuff, dive into the official specification.

yaml-cpp

The yaml-cpp library currently has two major versions, 0.3 and 0.5, both stable enough for use. We'll be using 0.5 (0.5.1, as of this writing), since it has a revamped API that, in my view, makes better use of C++. It is available at http://code.google.com/p/yaml-cpp/ under the X11 (MIT) license, so no worries there.

Building yaml-cpp

The compilation is simple, but you'll probably want to use the Boost library. On Linux, building Boost only takes ./bootstrap.sh && ./b2. You may also want to run sudo ./b2 install so Boost is installed on your system (under /usr/local). You can install Boost with apt-get or Synaptic instead, but you'll get a slightly outdated version. Note that building the entire Boost library can take a long time, but you can build only the libraries you're interested in (I personally always build it all).

With Boost installed, building yaml-cpp is just as simple: create a Build folder in the yaml-cpp root directory and run cmake .. && make inside it. If you want, sudo make install will then install yaml-cpp on your system. If you chose not to install Boost, you may need to point the Boost variables in yaml-cpp's CMakeLists file to the correct path (or set them with ccmake or the CMake GUI, if you prefer).

If you have problems or questions in this regard, please refer to the official build guides.

Our Example Problem

We want to load sprites and their definitions, including all of their animations, frame durations, spritesheet location and anything else necessary, from a single data file. We'll be using the following (public domain) spritesheet:
duotone SpriteSheet

Available here: http://opengameart.org/content/dutone-tileset-objects-and-character

We will assume that all frames of an animation lie side by side on the same horizontal line, with no border between them (just like in the sheet above), and that each frame may have its own duration. We'll also assume that each animation has a name and a constant frame size, while a sprite can switch between several animations (taking on each one's size).

All right. Now that we have our definitions and assumptions, we need to design our YAML file structure so it can store everything needed for a whole sprite. Let's list the basic information we need to store:

Sprite:
  • Unique Name
  • Spritesheet ID
  • Animation List
    • Animation Name
      • Initial SpriteSheet Offset
      • Animation Size
      • Frame List
        • Frame Duration
That's basically all the information we want in our YAML file. It forms a fairly complex map, so let's build it up piece by piece.

The first piece of information we need is the identifying name of the sprite (it can be a number, if you prefer):

[code]Player:[/code]

Notice how I didn't use Name: Player, but simply wrote Player: directly. This means we have something called Player, not a key called Name whose value is Player.

Now we add the spritesheet reference. This can be a numeric ID, the path to the spritesheet file or something else. I will be using the spritesheet file path, since we won't be using any file loader that handles UIDs (we're newbies :D, or not, whatever). This gives us the next line of our file.

We now have the basic information for the sprite and need to describe the animations themselves. Since each animation has its own name, let's list these names. In our example, the player sprite has four animations, all added to a sequence called Anim_Names so we can look them up later.

Since we know the names of our sprite's animations in advance, we can use those names as keys to map each animation's data with no problem!

Each animation also has its own specific information: its frame size and its offset in the spritesheet.

Each animation also needs to know how many frames it has, as well as the duration of each frame. We'll do the same thing we did for the animation names and use a sequence, with one entry per frame. In our example, we have an animation with six frames, each lasting 80 ms.

That is the last piece of information we need for the animation, and for the sprite itself, which completes our Player sprite configuration.

The remaining sprites are added in the same way.

Now we have a small problem. Since the entire file is a map, we need to know the unique names of our sprites (in this case, Player, Monster and Gem) in advance, and there's no way to access them by a numeric index. That won't be a problem once we have some sort of level definition that specifies all of its objects and their respective sprites by name, referencing our sprite definitions file. But even then, this line won't hurt:

[code]Sprites_List: [Player, Monster, Gem][/code]

So, this is our final Sprites.yaml file:


Parsing the File

Since we're getting into the code part, I must put a license on it.


CC0
To the extent possible under law, Dejaime Antônio de Oliveira Neto has waived all copyright and related or neighboring rights to YAML-CPP C++ Example Code. This work is published from: Brazil.
The complete header is available here: https://gist.github.com/dejaime/9129611.

Now that we have our file and understand how it was built, we can go ahead and write our parser. First, we need to define the structures that will hold that information in our code.

Starting with the Sprite itself: it has members to hold the name of the sprite, the path to the spritesheet and a vector of animations. The Sprite::Load(string, string) function takes the name of the sprite and the path to the .yaml file (not to be confused with the spritesheet). It can be called directly or through Sprite::LoadAll(string, vector<Sprite*>), which creates, loads and pushes all sprites in the file into the given sprite vector.

Our sprites also depend on animations, which are nothing but a fancy struct.
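A minimal sketch of these two structures, assuming field names that match the data listed earlier (the original header is in the gist linked above). The bool return type of Load is an assumption, and frameX is a hypothetical helper that applies the same-row, no-border assumption to find where frame i starts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One animation: a horizontal strip of equally sized frames on the sheet.
struct Animation {
    std::string name;
    int offsetX = 0, offsetY = 0;  // top-left corner of the first frame, in pixels
    int width = 0, height = 0;     // size of every frame in this animation
    std::vector<int> durations;    // one entry per frame, in milliseconds

    // Left edge of frame i: frames sit side by side with no border.
    int frameX(std::size_t i) const { return offsetX + static_cast<int>(i) * width; }
};

class Sprite {
public:
    std::string name;
    std::string spriteSheet;           // path to the spritesheet image
    std::vector<Animation> animations;

    // Declared as described in the text; their bodies are the subject
    // of the Load function section.
    bool Load(const std::string& spriteName, const std::string& yamlPath);
    static void LoadAll(const std::string& yamlPath, std::vector<Sprite*>& out);
};
```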

All the information necessary for any sprite that follows our initial assumptions can be stored in these structures. If you have special needs, just adapt them: you could move the spritesheet into the animation so that animations can live in separate textures, or add more information to every frame, such as its size, so an animation can change size from frame to frame. Another useful piece of information could be the origin of each frame, used as a reference point when rendering. You get the idea; this example takes a deliberately minimalist approach.


The Load function


Now that we have structures to hold the information, we can retrieve it from the file and store it in our objects. The first thing to do is open our .yaml file. For that, we need a YAML node to hold the root node of the file. If you don't know what a node is, go back to our YAML introduction or look at the specification.

The yaml-cpp library lives in the YAML namespace and represents a node with the YAML::Node type (I guess I didn't really need to say that...). Anyway, we need to declare a YAML::Node and assign the root node of our file to it; in yaml-cpp 0.5, YAML::LoadFile(path) parses a file and returns its root node.

Now that we've opened the file and have its root node in baseNode, we need to find the node for our sprite. The name of our sprite was passed to us as an argument, and that's what we're going to use. This is where the library author used C++ operator overloading to give us a really nice API: we simply use our string as the index into baseNode, and it finds the correct node. If the key is not found, we get back a node that is not defined, which evaluates to false in a boolean test, so it's worth checking before reading values from it.

With the sprite found, we can start loading its information, beginning with the name. Of course, there's no need to look the name up, since we just received it as an argument, but we still need to look up the spritesheet path.

With the sprite-level data in place, the next step is to add all animations to the animation vector. Our code needs to know how many animations there are, but that's no problem: Anim_Names is a sequence, and YAML::Node::size() returns its number of elements.

Every animation is a node, so we now retrieve each one by its name.

With the animation node retrieved, we can now create the animation and populate it with the information from the file.

The last thing we need to do now is to get the time duration for our animation's frames.

And Voilà! All our animations are now inside our Sprite object!

The most important functions we used here were:

Want to try?

Start by creating a valid YAML file; you can test your syntax at yamllint.com. It will probably take a few tries, but once you've learned it, you'll mostly get it right the first time. A single entry, such as your personal info, will do just fine. After that, create a simple structure to hold that information and load it manually, directly inside your main.cpp if you prefer. Then move the loading procedure into a class that can load the information itself and, lastly, create several entries and load them independently. Move on to more complex documents and you'll master it before you know it.

Thanks to jbeder for the yaml-cpp library! It is so convenient I'm getting lazy.

Over and Out.