How I Did That: Within A Frame

Noir, written by Kalle Macrides, was first produced by Adhesive Theatre Project at the New York City College of Technology as part of the Entertainment Technology Department's residency program. A multimedia/dance/theatre production, Noir utilizes a computer vision system to enhance the delivery of both prerecorded video content and live video feed. The system, which I helped to devise, uses a collection of borrowed technology and software that, taken together, forms a very stable and robust platform. This article focuses on the methodology of building the physical hardware system and my exploration of the cv.jit software developed by Jean-Marc Pelletier for the Cycling '74 Max/MSP/Jitter platform. My chief aim was to track the movement of the performers on stage and deliver video content onto moving targets.

My interest in tracking performers on stage did not begin with the production of Noir. I initially wanted to employ this technology with puppetry. I am ultimately interested in creating a technological puppet that would allow a live performer's face to be "adhered" to a moving three-dimensional puppet head. For instance, in a Bunraku-style puppet, the head puppeteer could transfer his own face to the puppet and perform the puppet's role, or one performer in multiplicity could eerily play an army of robots in a production of R.U.R. This technology would allow for myriad possibilities, including a live performer that can be reduced or enlarged to any size, have the body of a fish, or be unfettered by gravity.

The Specific Needs Of Noir
Noir is about a Hollywood filmmaker, Eddy Chandler, who has just been shot. As he rewinds the moments that led to this tragedy, he uses his filmmaking techniques of manipulating time and image to uncover the truth about his would-be murderer. It is both an investigation of the aesthetics of film noir and an exploration of translating cinematic camera techniques for the stage.

I created a set made up of 24 movable frames, some made of steel, with white fabric, colored scrim, or paper within them so they could be used as projection surfaces. It was my intention that these frames would be manipulated by the performers, much in the same way a puppeteer would manipulate a puppet to assist the storytelling. For instance, the first scene transitioned from an outside view of a cityscape into the interior of a movie lot. The performers became movie technician characters and moved the frames not only to create the film set, but to manipulate one frame like a boom microphone, another like a grip's light, and yet another like a movie camera that framed the film's actors. There were several moments when the addition of video to these framed surfaces furthered the performers' interaction with these frames and helped to illuminate a particular aspect of the narrative. I'll get to that.

In one scene in a train yard, our protagonist and a quick-witted reporter are pursued by some gangsters and a crooked cop. Our heroes manage to get away by outrunning the trains that barrel towards them and jumping onto a boxcar. Of course, it would have been possible to suggest a train just through the use of the rolling frames. With the addition of some sound effects and lighting, I could have further enhanced this staging. However, since this production was also about film and the exploration of cinematic movement on stage, it was important to create a visual landscape that would reflect this. I wanted to create three moments, with the use of stock footage, where the characters would interact with these trains that both moved within the frame and moved through physical space on stage.

I used this scene as my proof of concept for the system I created for tracking objects on stage. Because of certain constants, I was able to take many of the variables out of the equation and focus on just one of the aspects of the motion-tracking. For instance, the frames were rectangles with a fixed size and shape, so I was able to maintain a two-dimensional plane. Furthermore, because they were on casters and kept the same relationship to the floor, I was able to eliminate the up-down axis.

The system I used was camera-based. The camera became the eye for the computer. The software then analyzed the image for specific parameters and manipulated the video content based on those parameters. For this system to work, the projector had to cover a much larger surface, the whole stage for instance, so the software could place smaller images within a field of black. The other necessity for this system was that the camera and the projector needed to be calibrated to one another in some way; the camera had to "see" the same area onto which the projector threw. This could be done by simply placing the camera and projector at the same position and angle, projecting the blue screen of the projector, and making sure the camera could see all four edges of the blue with no other space in the field. This also required that the camera and the projector be set to the same aspect ratio. The calibration did not need to be exact, but the closer we could make it, the easier the calculations within the software became.
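To make the calibration concrete, here is a minimal sketch in Python with OpenCV of the coordinate math it buys you (this is not the Max/MSP/Jitter patch we actually used): once the camera and projector cover the same rectangle at the same aspect ratio, converting a tracked camera position into a projector position is a straight rescale, and if the alignment is only approximate, four measured reference points and a perspective transform can absorb the difference. The resolutions and corner values below are placeholders rather than production numbers.

    # Mapping a tracked camera position to projector pixels.
    # Python/OpenCV sketch; resolutions and reference points are
    # placeholder values, not numbers from the production.
    import numpy as np
    import cv2

    CAM_W, CAM_H = 640, 480       # camera capture size (assumed)
    PROJ_W, PROJ_H = 1024, 768    # projector raster size (assumed)

    def cam_to_proj_simple(x, y):
        """If camera and projector see exactly the same rectangle,
        a straight rescale of the coordinates is enough."""
        return (x / CAM_W * PROJ_W, y / CAM_H * PROJ_H)

    # If the alignment is only approximate, note where the four corners
    # of the projected raster land in the camera frame and build a
    # perspective transform from them instead.
    cam_corners = np.float32([[12, 8], [628, 14], [631, 474], [9, 470]])
    proj_corners = np.float32([[0, 0], [PROJ_W, 0],
                               [PROJ_W, PROJ_H], [0, PROJ_H]])
    H = cv2.getPerspectiveTransform(cam_corners, proj_corners)

    def cam_to_proj(x, y):
        """Map a camera pixel to a projector pixel through the homography."""
        pt = cv2.perspectiveTransform(np.float32([[[x, y]]]), H)
        return tuple(pt[0, 0])

The closer the physical calibration, the closer that transform is to a simple scale, which is why the alignment did not need to be exact.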

Once the camera and the projector were aligned, we had to figure out what specific parameters we wanted to detect. In a normal theatrical environment, there are stage lights of various colors and intensities, as well as many moving bodies and objects, not to mention that, once you start to deliver projection to the stage, the camera will pick up this information. In order to remove these distractions from my image, I started by using a Sony camera with nightshot mode. Although all cameras can read infrared light, in nightshot mode the camera has increased sensitivity to the infrared spectrum. I then used a visible-light filter that blocked out all light that was not in the infrared spectrum. Luckily, projectors already remove the infrared light through the glass lens. Since theatrical lights do emit in the infrared wavelengths, I added additional heat shield filters to a few of the lights that I would be using during the tracking, which reduced the IR spill from these lights.

I then had a very limited range of light going to my computer. I had initially purchased twenty $10 infrared flashlights but found that one 10¢ IR LED on a 3V battery was a brighter light source and easily reached the 40' distance from the stage to the camera. Also, a single IR LED lamp was virtually invisible to the audience. I used two lamps and two batteries in each tracking instance for redundancy in case a battery died. By changing the threshold of the camera image within Max/MSP/Jitter, I could see a single white dot moving through a field of black, no matter what else was happening on stage—a very easy thing to track.
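For anyone who wants to try that thresholding step outside of Jitter, here is a rough Python/OpenCV equivalent (our show did all of this inside Max/MSP/Jitter): convert each camera frame to grayscale, discard everything below a threshold, and take the centroid of whatever bright spot remains. The camera index and threshold are assumptions to be tuned for your own rig.

    # Reducing the IR camera image to one bright dot and locating it.
    # Python/OpenCV sketch; the production did this in Max/MSP/Jitter.
    import cv2

    cap = cv2.VideoCapture(0)      # camera index is an assumption
    THRESH = 200                   # raise until only the IR LED survives

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, THRESH, 255, cv2.THRESH_BINARY)
        m = cv2.moments(mask)
        if m["m00"] > 0:           # something bright is in view
            cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
            print("LED at", cx, cy)  # feed this to the projection mapping
        cv2.imshow("threshold", mask)
        if cv2.waitKey(1) == 27:   # Esc to quit
            break
    cap.release()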

In 2006, Golan Levin wrote an article, "Computer Vision for Artists and Designers: Pedagogic Tools and Techniques for Novice Programmers" (Journal of Artificial Intelligence and Society, Vol. 20.4, Springer-Verlag, 2006). It surveys many of the artists working with computer vision, going as far back as "Myron Krueger's legendary Videoplace, developed between 1969 and 1975." The article also lays out some of the basic principles of how computers analyze video images through a set of specific algorithms and data sets and offers a solid reference of many, if not all, of the software packages being written for the use of computer vision by artists. Needless to say, it was a fundamental starting point for my research.

I use Max/MSP/Jitter primarily because it is taught at Brooklyn College's Performance and Interactive Media Arts Program, which I attended during the development of this production. There are several extension libraries that have been developed for Max/MSP/Jitter, including Eric Singer's Cyclops and David Rokeby's SoftVNS. I chose to explore Jean-Marc Pelletier's cv.jit library primarily because it was the only extension library that was freeware.

CV.Jit.Shift
There are basically two types of analysis of video images that I explored: blobs and mean-shift. Blob searches analyze the image for connected regions of similar pixels, such as the biggest bright object, and you can modify your parameters around that. For example, you might choose to ignore or select that blob. The mean-shift compares the average of one frame against the next to see how much change (shift) there is, an approach closely related to frame differencing. (These are both oversimplifications of the algorithms and processes.) After trying all of the patches in Pelletier's library, I discovered the cv.jit.shift object could track the white dot the computer was receiving from the camera. With several tweaks of his patch, and with the help of Scott Fitzgerald, I was able to make a very robust and simple solution. I called this system "one-point tracking."
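To give a flavor of the mean-shift idea in code, here is a Python/OpenCV sketch of the same principle, not Pelletier's cv.jit.shift object itself: hand the thresholded one-white-dot image to mean-shift as its probability map, and it keeps nudging a small search window toward the bright pixels, frame after frame. The starting window and threshold are placeholder values.

    # Mean-shift tracking of the thresholded IR dot.
    # Python/OpenCV sketch of the idea behind cv.jit.shift.
    import cv2

    cap = cv2.VideoCapture(0)                  # assumed camera index
    track_window = (300, 220, 40, 40)          # x, y, w, h starting guess
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
        # The binary mask serves as the "probability" image: mean-shift
        # moves the window toward the densest bright region each frame.
        _, track_window = cv2.meanShift(mask, track_window, criteria)
        x, y, w, h = track_window
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("one-point tracking", frame)
        if cv2.waitKey(1) == 27:
            break
    cap.release()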

As I mentioned, this was my proof of concept and only one of the vision systems I explored. The second was masking. For this, we used a Luma Key tool. Essentially, it takes the darker or lighter values of an image and makes those values transparent, which allows one to place another image or video into those regions. Since the projection screens themselves were so much brighter than the regions around them, I thought I could raise the threshold on the camera image enough to get a binary image and project only onto the screens and not the surrounding areas. If, say, one were to project a photo, then as the screens moved, it would give the visual effect of opening a smaller window onto that photo that could scan through the image.
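The luma key itself is simple enough to sketch outside of Max as well (Python/OpenCV below; in the production we used a Luma Key tool inside Jitter): everything the camera sees above a brightness threshold becomes a window that reveals the content video, and everything darker stays black. The file name and threshold are stand-ins.

    # Luma-key masking: show content only where the camera sees bright
    # screens. Python/OpenCV sketch, not the Jitter patch.
    import cv2

    cam = cv2.VideoCapture(0)                        # assumed camera index
    content = cv2.VideoCapture("stock_footage.mov")  # placeholder file

    while True:
        ok_c, cam_frame = cam.read()
        ok_v, video_frame = content.read()
        if not (ok_c and ok_v):
            break
        video_frame = cv2.resize(video_frame,
                                 (cam_frame.shape[1], cam_frame.shape[0]))
        gray = cv2.cvtColor(cam_frame, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)
        # Keep the content video only inside the bright (screen) regions.
        output = cv2.bitwise_and(video_frame, video_frame, mask=mask)
        cv2.imshow("projector output", output)
        if cv2.waitKey(1) == 27:
            break
    cam.release()
    content.release()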

Since I was looking for reflected light, I decided to use a regular camera, not one with nightshot mode. This led to all of the issues I mentioned before—changing light intensities, moving performers, the projected video interacting with the computer vision camera, etc.—and, in turn, created some messy, yet interesting, video output. For instance, when performers moved across the stage, trails of video would follow them. Also, the projected video would be read by the camera and be projected again in degradation, ad infinitum, like a feedback loop.

Although these were interesting effects, the images were too muddy for my needs. Furthermore, they were dependent on how much stage light was being bounced back. If the stage was dark, there was no video image. With light on the stage, the projection was stronger, but that same light would wash out the projected images. Luckily, we had two IR stage lights in stock that we were able to hang above the house and flood the stage. By returning to the nightshot camera with the visible-light filter, we were able to reduce the video artifacts and use this technique even in complete darkness.

Horn And Schunck
Horn and Schunck created an equation for determining optical flow, or the change from one video frame to the next, on a two-dimensional plane. Using this calculation, one can see the difference between values for vertical movement and horizontal movement. In Pelletier's patch, cv.jit.HSflow, this change is represented by a color shift. If there is upward movement, the color is blue; down is yellow, left is green, and right is red. This patch actually inspired the final moments of the play.
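The direction-to-color mapping is easy to approximate in code. The sketch below substitutes OpenCV's Farnebäck optical flow for Horn and Schunck's algorithm (which is what cv.jit.HSflow implements) and paints the same palette: blue for up, yellow for down, green for left, red for right.

    # Optical flow painted by direction: blue up, yellow down,
    # green left, red right. Python/OpenCV sketch using Farneback
    # flow as a stand-in for the Horn-Schunck method in cv.jit.HSflow.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)          # assumed live-feed camera
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        fx, fy = flow[..., 0], flow[..., 1]
        out = np.zeros_like(frame)
        out[fy < -1] = (255, 0, 0)     # upward motion -> blue (BGR)
        out[fy > 1] = (0, 255, 255)    # downward motion -> yellow
        out[fx < -1] = (0, 255, 0)     # leftward motion -> green
        out[fx > 1] = (0, 0, 255)      # rightward motion -> red
        cv2.imshow("flow colors", out)
        prev_gray = gray
        if cv2.waitKey(1) == 27:
            break
    cap.release()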

When the playwright and I were discussing possible locations for the final scene, we hit upon a movie set for House of Wax, the original 3D version released in 1953 and starring Vincent Price. Although our production of Noir took place in 1949, we could easily justify that production on House of Wax had started several years earlier but, due to an unfortunate shooting on the set (as seen in our play), had been postponed. It was a fantastically scary setting for the gun shoot-out finale, as well as a jarring visual departure from the black-and-white aesthetic in the rest of the play. I wanted to play with the unsettling feeling of watching a 3D movie without the 3D glasses on. By using a live-feed camera on stage and processing the image through the optical flow horizontal movement filter, we were able to get disjointed trails of red and green.

A Steering Wheel And Eric Singer's Miditron
Another video effect we tried to employ was operated by a performer on stage and communicated to our computer via Eric Singer's wireless Miditron, a device that can accept sensors and send their values over a radio frequency. Singer suggests that the Miditron can communicate over several hundred feet and through walls. However, we kept losing the signal after about 20'. Jared Mezzocchi, who devised this system, had many conversations with Singer, but we could not resolve the issue. We extended the receiver out into the house, but it was still having issues. It wasn't until our first performance that assistant technical director Chuck Eberle suggested we extend the antenna on the Miditron itself. This boosted the range, and we were able to communicate with the computer.

The Miditron was connected to a potentiometer, which was attached to a makeshift steering wheel. As the actress turned the wheel, the video projection on stage would move to the left or right. We projected video of a moving background behind a car, and the imagery reacted to the performer's steering.
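Once the sensor value arrives as MIDI, the mapping itself is simple. Here is a rough Python sketch using the mido library (the show handled this inside Max/MSP/Jitter): a control-change value of 0 to 127 from the potentiometer becomes a horizontal offset for the background video. The controller number and pixel range are assumptions.

    # Turning a potentiometer's MIDI control-change values into a
    # horizontal video offset. Python/mido sketch; the production
    # handled this in Max/MSP/Jitter. CC number and range are assumed.
    import mido

    WHEEL_CC = 1           # controller number sent by the Miditron (assumed)
    MAX_OFFSET = 400       # pixels of travel left or right (assumed)

    def cc_to_offset(value):
        """Map 0..127 to -MAX_OFFSET..+MAX_OFFSET, centered at 64."""
        return int((value - 64) / 64.0 * MAX_OFFSET)

    with mido.open_input() as port:    # first available MIDI input
        for msg in port:
            if msg.type == "control_change" and msg.control == WHEEL_CC:
                offset = cc_to_offset(msg.value)
                print("shift background by", offset, "pixels")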

The One That Got Away: Four-Point Tracking
The system that I have been intermittently working on for the past year is a four-point tracking system. Instead of following one point, the four points would give me the locations of the corners of my projection surface and allow me to project an image between those points. With a single camera, this system would allow tracking of the projection surface's three-dimensional movement, pitch and yaw as well as rotation. As opposed to the one-point tracking system, the screens could traverse both the depth and the height of the stage, as well as spin or pivot, all the while still maintaining the video projection.

As I was devising my system, I was also inspired by Johnny Chung Lee's "projector calibration" and Wii remote projects (Lee, Johnny Chung. Johnnylee.net. May 2009, johnnylee.net/projects/thesis). I tried to solicit Johnny to work on this project, but he is doing much more lucrative research for Microsoft.

I did receive a lot of help on this patch, first from John Jannone and Jared Mezzocchi and later from Fitzgerald, who really helped push it toward a working model. We started with Pelletier's cv.jit.blobs.centroids patch, which locates blobs (in this case, four IR LEDs, one placed in each corner of our screen) and labels them so they can be distinguished from each other. We then took the location of each of the four points and drew a polygon from that information. Once we had a polygon, we could have our video output within that shape. Unfortunately, if the software loses a point or gets confused as to the order of the points, the projected video cannot maintain its shape and displays in a bowtie configuration, twisting like a helix. I believe this is a problem that could be easily overcome but would require a bit of C++ programming that is beyond my ability.
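The ordering problem is worth showing concretely. In the Python/OpenCV sketch below, which is one possible way to handle it rather than the patch we actually built, the four detected LED positions are sorted by angle around their centroid before the perspective warp is computed; keeping the corners in a consistent cyclic order is exactly what prevents the bowtie. The coordinates and file names are placeholders.

    # Warping content between four tracked corner points, sorting them
    # first so the quad never folds into a bowtie. Python/OpenCV sketch;
    # the corner coordinates are placeholder values.
    import cv2
    import numpy as np

    def order_corners(pts):
        """Sort four (x, y) points by angle around their centroid so they
        come back in a consistent cyclic order (top-left first for a
        roughly upright frame)."""
        pts = np.float32(pts)
        center = pts.mean(axis=0)
        angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
        return pts[np.argsort(angles)]

    content = cv2.imread("frame_content.png")        # placeholder image
    h, w = content.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

    # Four blob centroids as they might arrive, in arbitrary order.
    detected = [(612, 95), (118, 80), (590, 430), (140, 455)]
    dst = order_corners(detected)

    H = cv2.getPerspectiveTransform(src, dst)
    stage = cv2.warpPerspective(content, H, (1024, 768))  # projector raster (assumed)
    cv2.imwrite("projected.png", stage)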

Another device that could solve some, but not all, of the three-dimensional issues is an accelerometer attached to a frame, wirelessly transmitting data, and used in tandem with the one-point tracking. Depth of field would still be a hurdle requiring an additional solution. Suffice it to say, I was unable to create a stable image in time for the run of Noir and had to use another solution for telling that moment of the story.

Where Do We Go Now?
I intend to continue researching smooth projections onto 3D moving targets. Ultimately, any research in this field is ephemeral at best and possibly pointless. There is probably only a small window of time before any system devised along these lines would be considered an outdated, clunky relic. The first generation of mini-projectors, with a footprint the size of a quarter, is already available, and although they do not have the brightness necessary for a theatrical presentation, it is only a matter of time. Other technologies, like flexible displays, might become customizable and cheap enough to build into a costume or a puppet and receive video content over WiFi. Hopefully, Johnny Chung Lee will start selling his "calibrated video projector" and solve all of the tracking issues.

Whatever the next technology is that is cheap enough to enter consumer culture, geeky theatre artists like me will tinker, hack, and appropriate. Regardless of the inanity, I will continue to explore the technology at hand and offer my findings to anyone willing to listen, hopefully continuing to make interesting and thoughtful works of art.

Cory Einbinder is the co-director of Adhesive Theatre Project with his wife, Kalle Macrides. In addition to directing and designing, he teaches stagecraft at the New York City College of Technology (CUNY).