The Google Tokyo Tree

This was a project from my time at Thinkwell Group. Of all the projects I worked on there, it was probably the one that mixed the most interesting technologies: Unity, C#, C++, OpenCV, Arduino, OSC, a Beaglebone (not a Raspberry Pi), an ultrasonic sensor, infrared cameras, and infrared light projectors, all to create one seamless experience.

The goal of the task

The user journey we were given was for a tree that dispensed knowledge: a guest would stand in front of the tree, ask questions, and the tree would answer. We knew that guests would come as a group, and I immediately thought that, if they did, it would be pleasant to simulate the airflow of their movement, and that Unity, with its physics simulation, should be able to do that. It ended up being one of the defining features of our experience.

The tree went through many variations and a lot of testing. It is, as far as I know, currently in the offices of Google Tokyo. This is a description of the final version as of 2016.

Asking

When the guests approach the tree, it is just a stylized tree, with ambient sound and music (written by Thadeus Reed) playing. The wall is shaped like the 3D image itself. There is a shape projected onto the floor inviting guests to come forward in a nonverbal way. This is an image of the tree with testing lines in the Tokyo building:

Here’s a wider-angle view, showing the lobby itself:

When a guest stands in the projected circle, they are detected by an ultrasonic sensor mounted on an Arduino board. The sensor simply reports how far away the object in front of it is. Usually, that object is the floor, but when someone steps onto the circle, the distance gets much shorter. The Arduino board is connected to a Beaglebone system-on-chip computer that runs an extremely simple program that just forwards this information via OSC to Unity. When Unity detects that the value has dropped below a given threshold, it can safely assume that someone is standing on the circle, and the shape of the projected circle changes. The guest is then explicitly asked, on the wall in front of them: “Please tell me what image you want to see”, either in English or in Japanese. This eventually got changed to a sequence of more specific questions to keep the guests engaged. There is also a logo that explicitly tells them that the tree is in audio capture mode, in effect, listening.
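
As an illustration, here is a minimal sketch of what the Unity side of that check could look like. Everything here is hypothetical (the class, the handler name, the threshold values); it only assumes that some OSC layer delivers the latest ultrasonic reading to a method:

    using UnityEngine;

    // Hypothetical sketch: an OSC layer (Beaglebone -> Unity) is assumed to call
    // OnDistanceReceived() with the latest ultrasonic reading in metres.
    public class PresenceDetector : MonoBehaviour
    {
        public float presenceThreshold = 1.5f;  // readings below this mean someone is on the circle
        public float debounceSeconds = 0.3f;    // how long the reading must stay low before we react

        private float belowSince = -1f;
        public bool GuestPresent { get; private set; }

        // Wire this up as the handler for the OSC address the Beaglebone sends to.
        public void OnDistanceReceived(float metres)
        {
            if (metres < presenceThreshold)
            {
                if (belowSince < 0f) belowSince = Time.time;
                // Only flip to "present" once the reading has been stable for a
                // moment, so a single noisy sample does not trigger the prompt.
                if (!GuestPresent && Time.time - belowSince >= debounceSeconds)
                {
                    GuestPresent = true;
                    // Change the projected circle and show the listening logo here.
                }
            }
            else
            {
                belowSince = -1f;
                GuestPresent = false;
            }
        }
    }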

Seamless input, to an extent

Then the guest speaks to the tree. That ended up being a lot trickier than we expected. Among the many unexpected situations were:

  • How should the system react if the guest steps away before the search is complete? (This happened almost every time, even though we assumed it wouldn’t.)
  • What should happen if the guest changes their mind in the middle of a query?
  • What should happen if the guest does not understand what is expected of them?
  • What should be done if none of the guests step up to the tree? (Should someone from the building invite them to?)

When the guest does make their query, the words are transcribed to text in real time by a Google speech recognition system. We had a hard time testing it in Los Angeles because it was surprisingly difficult to find Japanese speakers there. When we did get to Tokyo, the microphone installation, which had been chosen so that it would be completely invisible, had an echo. I remember it was a big problem, but since I’m not an audio person, I don’t remember all the details.

Here’s a tester making a request to the tree. Notice how the pattern on the floor has changed:

What I do remember is that the guests were split into four categories: male English-speaking guests, female English-speaking guests, male Japanese-speaking guests, and finally female Japanese-speaking guests. It was very difficult for the system to understand the last of those: female Japanese speakers did not always get their words recognized. Of course, they were the category that had been tested the least in Los Angeles.

Answers

Once a request is made, it is sent to the Google Images API via Unity. Some of the leaves change their textures to the top ten results. The leaves always flutter (not really, more about that later) in the airflow caused by the guests walking by them, but from that point on, if the guests walk by too quickly, the leaves can actually fall down and flutter to the ground, still influenced by the airflow of the guests’ motion.
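
As a rough illustration only (the actual request and parsing layer is not shown, and all names here are made up), this is how the top results could be downloaded and swapped onto leaf materials in Unity once the search has returned a list of image URLs:

    using System.Collections;
    using UnityEngine;
    using UnityEngine.Networking;

    // Illustrative only: assumes the image search has already produced result URLs.
    public class LeafTextureSwapper : MonoBehaviour
    {
        public Renderer[] leaves;  // the leaves chosen to display the results

        public void ShowResults(string[] imageUrls)
        {
            int count = Mathf.Min(imageUrls.Length, leaves.Length);
            for (int i = 0; i < count; i++)
                StartCoroutine(ApplyTexture(imageUrls[i], leaves[i]));
        }

        private IEnumerator ApplyTexture(string url, Renderer leaf)
        {
            using (UnityWebRequest request = UnityWebRequestTexture.GetTexture(url))
            {
                yield return request.SendWebRequest();
                if (request.result != UnityWebRequest.Result.Success)
                    yield break;  // keep the leaf's original texture on failure

                // Swap the downloaded image onto this leaf's material.
                leaf.material.mainTexture = DownloadHandlerTexture.GetContent(request);
            }
        }
    }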

Here is the tree in Tokyo showing flowers:

The original user journey had the guests being able to kick the leaves once they were on the ground. Unfortunately, that was not possible. The problem was that the leaves projected on the ground came from projectors on the ceiling, so when guests leaned forward to look at them, the leaves were hidden by the guests’ shadows. Also, our system was not good at detecting specific motions in depth.

Here’s Ellie from Google demonstrating the query feature.

Airflow

When the leaves have changed to images, the modified leaves respond to people moving in front of the tree as if a virtual airflow were created by their motion.

Here are testers moving by the tree to make the leaves fall:

Here’s Ellie from Google demonstrating the airflow feature.

To do this, the wall is filmed with an infrared camera. The camera has to be infrared, otherwise the projection itself would be filmed, creating an infinite loop. With an infrared camera, we project infrared light and we only see the guests themselves and their motion. We then feed these images to OpenCV and analyze them with the Farneback optical flow algorithm in C++. This produces a vector matrix that tells us exactly how much the content of each image has shifted between two frames, in other words, what the airflow is. This is then sent to Unity using C#’s marshalling abilities, basically sharing a C array between C# and C++. We then adapt the raw data in the way explained by Ellie above so that the leaves respond to the motion in a more cartoony way. If we just followed real-life physics, the leaves would barely respond to our motion at all.
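
Here is a sketch of what the C# side of that bridge could look like. The plugin name, function signature, and grid size are hypothetical; it only illustrates the pattern of handing a C# array to native code and then exaggerating the resulting flow vectors:

    using System.Runtime.InteropServices;
    using UnityEngine;

    // Sketch of the C#/C++ bridge, not the actual project code. It assumes a
    // native plugin (here called "opticalflow") exposing a function that fills
    // a caller-provided float array with the latest Farneback flow field:
    // one (x, y) displacement pair per cell of a downsampled grid.
    public class OpticalFlowBridge : MonoBehaviour
    {
        // Hypothetical native entry point; the real plugin name and signature
        // would depend on how the C++ side was built.
        [DllImport("opticalflow")]
        private static extern void GetLatestFlow(float[] buffer, int width, int height);

        public int gridWidth = 64;
        public int gridHeight = 36;
        public float gain = 20f;  // exaggerate the flow so the leaves react in a cartoony way

        private float[] flow;     // x0, y0, x1, y1, ... for each grid cell

        void Start()
        {
            flow = new float[gridWidth * gridHeight * 2];
        }

        void Update()
        {
            // The array is pinned and handed to C++ for the duration of the call;
            // this is the "sharing a C array between C# and C++" part.
            GetLatestFlow(flow, gridWidth, gridHeight);
        }

        // Returns the exaggerated flow vector at a normalized wall position (0..1).
        public Vector2 FlowAt(float u, float v)
        {
            int x = Mathf.Clamp((int)(u * gridWidth), 0, gridWidth - 1);
            int y = Mathf.Clamp((int)(v * gridHeight), 0, gridHeight - 1);
            int i = (y * gridWidth + x) * 2;
            return new Vector2(flow[i], flow[i + 1]) * gain;
        }
    }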

Flutter

Before falling down, the leaves flutter as the guests move by. I could never get the leaves themselves to flutter correctly. Then a teammate had a better idea: just move the leaves’ normals around. It worked extremely well. Wobbling the normals made the light bounce off the leaves as if they were fluttering, without our having to find the right motion. The edges didn’t move, but in this case that was a good thing, because the leaves needed to keep their overall assembled shape. It was an elegantly simple solution to what could have been a very messy problem.
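
For illustration, a minimal sketch of the idea as a Unity script, with made-up parameter values: the vertices stay put, only the normals sway, so the lighting shimmers as if the leaf were fluttering:

    using UnityEngine;

    // Sketch of the normals trick: the vertices stay put, only the normals sway.
    [RequireComponent(typeof(MeshFilter))]
    public class NormalFlutter : MonoBehaviour
    {
        public float amplitude = 0.3f;  // how far the normals tilt
        public float speed = 2f;        // how fast they sway

        private Mesh mesh;
        private Vector3[] baseNormals;
        private Vector3[] normals;

        void Start()
        {
            // Accessing .mesh instantiates a copy, so the shared asset is untouched.
            mesh = GetComponent<MeshFilter>().mesh;
            baseNormals = mesh.normals;
            normals = new Vector3[baseNormals.Length];
        }

        void Update()
        {
            float t = Time.time * speed;
            for (int i = 0; i < baseNormals.Length; i++)
            {
                // Per-vertex phase offset so the whole leaf does not tilt in lockstep.
                Vector3 wobble = new Vector3(Mathf.Sin(t + i), Mathf.Cos(t * 0.7f + i), 0f);
                normals[i] = (baseNormals[i] + wobble * amplitude).normalized;
            }
            mesh.normals = normals;
        }
    }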

3D wall

The pictures show that the wall the tree was projected on was shaped to match the projection itself. This was a nice idea in theory, but in practice it meant that the projection and the physical wall would always drift out of alignment after a short while. To keep them lined up, a maintenance system was set up. This is what it looked like:

Conclusion

The project was a creative challenge and very rewarding. It was definitely an unusual use of the Unity game engine and a mix of technologies that don’t often come together. Still, we did manage to make a smooth, seamless experience in the end, one that welcomes everyone at the Google Partner Plex Tokyo.

SEECQUEL at the ETC

Back in my Fall 2012 semester at the Entertainment Technology Center at Carnegie Mellon University, I took part in a project that we called SEECQUEL, because it was the second part of the SEEC project, SEEC being the Schrader Environmental Education Center at Oglebay, in West Virginia.

The Goal

Our task was to use an electronic device to encourage children to depend less on their electronic devices, an essentially paradoxical task. We used a heavily ruggedized Android tablet: a military-grade device that children could supposedly drop on any surface without breaking it. It was expensive, so we never actually tested that claim. The software platform we used was the Unity game engine with Scaleform for the interface; Scaleform allows a Flash layer on top of Unity.

At the end of the project, these were our testers, who were in fact very respectful of all the hardware that was handed to them.

Augmented Reality before its time

Devices like the HoloLens did not exist at the time, but in the context of the period, what we had built really was augmented reality. We added content to a natural space. We did not just add messages and items with content; we actually had a device that filmed the park itself and superimposed visual content onto it.

The experience

The object

We offered the guests the chance to become “junior naturalists”, an imaginary title that came with a real, physical insignia carved out of local wood. Since the title corresponds to no real-life rank, we tried to give it emotional importance.

Sharing

The children going through our experience were supposed to share the tablet. But “hand the tablet to player one” was not an acceptable way of formulating that instruction. We decided that each participant was going to touch a “spirit stone”, an actual local Native American traditional artifact, to discover their spirit animal. Then, each participant’s spirit animal would, in turn, be called to take part in the experience.

Our spirit stone

When we tested the experience and I asked the guests how the spirit stone knew which spirit animal was theirs, they said they didn’t know. When I asked them if they would believe me if I said it was random, they were very surprised. For that, I credit my script, into which I put a lot of effort.

The script

I wrote the whole script for the experience. While it’s not long, it is very precise. For example, the first version of the spirit animal selection was made while I was away for a couple of days. It was “You are the hawk”, “You are the stag”, and so forth. That did not seem compelling enough to me. People are not hawks or stags, and why would they be one more than the other? I thought about it and came to the conclusion that the stone should be able to feel a connection between the guests and their spirit animal. That connection should come from a guest’s innate quality, and that quality should be general and positive enough that anyone could identify with it. The lines ended up being (as per the script linked above):

  • Your inner strength channels the spirit of the stag.
  • The wisdom in your heart channels the spirit of the owl.
  • Your clever cunning channels the spirit of the fox.
  • Your persistent nature channels the spirit of the turtle.
  • The glee in your heart channels the spirit of the frog.
  • Your sense of adventure channels the spirit of the hawk.

Everybody hated the frog, but no one could think of another animal that was native to the area.

Similarly, passing the tablet had to be formulated in a different way each time, in order for the action to feel natural and not to feel repetitive:

  • For this activity, pass the tablet to the stag.
  • The owl should hold the tablet for this activity.
  • It’s now the fox’s turn to hold the tablet.
  • The tablet should be held by the turtle for this activity.
  • Pass the tablet to the frog for this activity.
  • The tablet should be passed to the hawk for this activity.

Yes, the stag and the frog carry the same instruction in a different order; I hoped the phrasing would still carry enough variety. This replaced a placeholder “It’s the [animal name]’s turn.” That kind of flavor text cannot just inform, it must keep the guest engaged. Ideally, we would have had several different ways of phrasing the call to each spirit animal. That being said, the experience as a whole was short enough that one per animal was actually enough. This level of variety gives the impression that the narrator is a mysterious person with an unknown agenda.

We were lucky enough to find a fellow student with an amazing crystalline voice for the narrator. Everything the narrator said sounded ethereal and otherworldly. It added a lot to the abstract quality of that non-diegetic character.

The activities

To launch an activity, the tablet would scan one of five podiums in the park and our avatar, AB Brooks, based on a real-life naturalist, would come up and suggest the activities to the guests. The first ones required active use of the tablet.

The second podium offered guests the chance to measure the angle of the branches of a tree in relation to its trunk. From this, one can deduce how much exposure to the sun the tree has had overall. It was probably the podium whose design changed the most during the project’s lifetime.

Unfortunately, I no longer have access to the accompanying interactive interface overlay. It allowed users to interactively position lines on the trunk and a branch and get a reading of the angle between them.
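
Since that overlay is gone, here is only a sketch of the underlying math in C#, with hypothetical names: each line is defined by the two endpoints the guest positions, and the reading is the angle between the branch line and the trunk line.

    using UnityEngine;

    // Hypothetical sketch of the measurement itself: each line is defined by two
    // endpoints that the guest drags into place over the live camera image.
    public static class BranchAngle
    {
        public static float Measure(Vector2 trunkBottom, Vector2 trunkTop,
                                    Vector2 branchStart, Vector2 branchEnd)
        {
            Vector2 trunkDirection = trunkTop - trunkBottom;
            Vector2 branchDirection = branchEnd - branchStart;
            // Vector2.Angle returns the unsigned angle in degrees (0..180).
            return Vector2.Angle(trunkDirection, branchDirection);
        }
    }

    // Example: a vertical trunk and a branch leaning out to the side
    // gives an angle of 45 degrees.
    // float degrees = BranchAngle.Measure(new Vector2(0, 0), new Vector2(0, 5),
    //                                     new Vector2(0, 3), new Vector2(2, 5));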

This activity did not require the use of the tablet at all. It was less high-tech. Still, it is important to remember that enjoying nature without technology was the main goal of the whole experience.

Our trailer

A lot more went on that semester, but those are the highlights of SEECQUEL, along with our video trailer.