Here is an extended documentation of my installation. Thanks to everyone involved.
I modified the Python script to allow multiple image capture; here’s an example with three CCTV cameras.
I found a workaround for streaming webcams into PD GEM: a Python script downloads an image every second, and at the same time the PD patch refreshes the image every second. The framerate is not smooth, but it’s live.
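The workaround above can be sketched roughly as follows. This is a minimal illustration, not the original script: the function names, the one-second default and the atomic file swap are my assumptions, and the camera URL is whatever snapshot address the webcam exposes.

```python
import os
import tempfile
import time
import urllib.request


def fetch_bytes(url, timeout=5.0):
    """Download the raw bytes of one image (for example a JPEG snapshot)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_atomically(data, target_path):
    """Write to a temporary file first, then swap it into place.

    Assumption: swapping the file in one step keeps the PD patch from
    loading a half-written image while it refreshes.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp_file:
        tmp_file.write(data)
    os.replace(tmp_path, target_path)


def capture_loop(url, target_path, interval=1.0, max_frames=None, fetch=fetch_bytes):
    """Refresh target_path every `interval` seconds; return the frames saved."""
    saved = 0
    attempts = 0
    while max_frames is None or attempts < max_frames:
        attempts += 1
        try:
            save_atomically(fetch(url), target_path)
            saved += 1
        except OSError as error:  # network errors (URLError) are OSError subclasses
            print("skipped frame:", error)
        time.sleep(interval)
    return saved
```

Calling `capture_loop(camera_url, "cam.jpg")` with no `max_frames` runs until interrupted; the PD patch then only needs to reload `cam.jpg` on its own one-second metro.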
The main work relating to the portfolio is Rear Window, an homage to Hitchcock’s film, which aesthetically influenced my work. It is a POV experience in voyeurism: the user looks at the back yard of an apartment building, trying to look through the seemingly opaque windows. The footage behind the opaque windows is sourced from insecure CCTV cameras looking into people’s own homes, revealing the Trojan Horse of using surveillance to boost security.
Scouting for a suitable location to film was time-consuming. In the end several videos were shot in different locations chosen according to my daily routines, such as the POV out of my window. As discussed in chapter 6, the need for engaging content prompted testing with several different scenes.
A continuous long-take video from a static point of view plays as the main footage, but usually only a relatively small crop is seen; users control, with body movement, which part of the whole image is shown on screen.
A Kinect camera tracks the user’s movements on the x, y and z axes, and this data controls the panning, tilting and zooming of the footage in PD. Movement on the x axis controls the panning, the y axis the tilting and the z axis the zooming. When the user zooms in close enough on the opaque windows, the data coming from the z axis also controls the alpha blending with the footage underneath: the closer the user gets, the more of the footage is revealed.
The PD patch uses the ‘pix_freenect’ and ‘pix_openni’ PD externals developed by Matthias Kronlachner (available at https://github.com/kronihias ). The ‘pix_crop’ PD object was used for controlling the crop size but unfortunately had issues with layered videos of different sizes, mainly because for ‘pix_crop’ x0,y0 is the bottom left corner, whereas x0,y0 in the GEM window is the centre of the canvas; so the ‘translateXYZ’ object was preferred instead.
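The mapping described above can be written out as a small Python sketch. It is an illustration rather than a transcription of the PD patch: it assumes the tracked position has already been scaled to 0..1 on each axis, and the names `closeness`, `reveal_start` and `max_zoom`, as well as their default values, are hypothetical.

```python
def clamp(value, low, high):
    """Keep value inside the range low..high."""
    return max(low, min(high, value))


def view_from_position(x, y, closeness, reveal_start=0.7, max_zoom=4.0):
    """Map a tracked position to (pan, tilt, zoom, alpha).

    x, y      : position scaled to 0..1 (0.5 is the centre)
    closeness : 0 = far from the screen, 1 = as close as possible
    alpha     : stays at 0 until closeness passes reveal_start (assumed
                threshold), then rises linearly to 1
    """
    x = clamp(x, 0.0, 1.0)
    y = clamp(y, 0.0, 1.0)
    closeness = clamp(closeness, 0.0, 1.0)

    pan = (x - 0.5) * 2.0   # -1 (left) .. 1 (right)
    tilt = (y - 0.5) * 2.0  # -1 .. 1
    zoom = 1.0 + closeness * (max_zoom - 1.0)

    if closeness <= reveal_start:
        alpha = 0.0
    else:
        alpha = (closeness - reveal_start) / (1.0 - reveal_start)
    return pan, tilt, zoom, alpha


def bottom_left_to_centre(px, py, width, height):
    """Convert a point measured from the bottom left corner (as in
    'pix_crop') to an offset from the centre of the canvas (as in GEM)."""
    return px - width / 2.0, py - height / 2.0
```

The second function shows why layering videos of different sizes was awkward: the same pixel has different coordinates under the two origins, so every crop offset has to be shifted by half the width and height of its own layer.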
Three main problems needed to be overcome: optimising video for PD, smoothing and scaling the data coming from the Kinect, and implementing multi-user interaction.
Working with HD footage in PD was always going to be tricky. Finding a balance between resolution, compression and file size was going to be the key to smooth playback, and various tests were carried out on all three. The first thing I looked into was compression, using a Mac Mini. The H.264 codec provides higher compression and smaller file sizes at the expense of CPU usage; next the Apple ProRes codec was tested, a very popular choice for NLE editing. File sizes were significantly larger but the codec was far less CPU-demanding. Out of the several flavours of ProRes, 422 LT proved to offer the best balance between file size and playback smoothness. The PD GEM window could handle smooth playback at 720p resolution at 25 FPS. Upscaling to 1080p resulted in less smooth playback, at around 16-18 FPS.
Adding the CCTV footage behind the windows led to a significant drop in playback smoothness and in the general responsiveness of the patch. M. Kronlachner suggests that for “complicated applications performance problems using PD may occur. A lower level language like openFrameworks or Cinder could be a solution for CPU intensive applications” (Kronlachner 2013, p. 39). An additional option could be to use the ‘gemframebuffer’ object and render the small videos into a buffer. A drastic measure, but one which could improve CPU performance, would be to use a photograph (‘pix_image’) rather than video.
The next problem to be solved was smoothing and scaling the data stream coming from the Kinect. ‘pix_openni’ can output either real-world coordinates or values normalised from -1 to 1. The normalised values seem more compatible with the ‘autoscale’ object. Smoothing was done using the ‘line’ object.
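To make the scaling and smoothing step concrete, here is a rough Python sketch of the two ideas. It only approximates the behaviour I rely on in the patch (rescaling by the running minimum and maximum, then ramping linearly to each new value); it is not how the ‘autoscale’ or ‘line’ objects are implemented, and the class and function names are mine.

```python
class AutoScale:
    """Rescale incoming values to out_min..out_max using the smallest and
    largest inputs seen so far (an approximation of an auto-scaling object)."""

    def __init__(self, out_min=0.0, out_max=1.0):
        self.out_min = out_min
        self.out_max = out_max
        self.in_min = None
        self.in_max = None

    def __call__(self, value):
        if self.in_min is None:
            self.in_min = self.in_max = value
        self.in_min = min(self.in_min, value)
        self.in_max = max(self.in_max, value)
        span = self.in_max - self.in_min
        if span == 0:
            return self.out_min  # only one distinct value seen so far
        ratio = (value - self.in_min) / span
        return self.out_min + ratio * (self.out_max - self.out_min)


def ramp(start, target, steps):
    """Return `steps` evenly spaced values leading from start to target,
    similar to a linear ramp sampled at a fixed rate."""
    return [start + (target - start) * i / steps for i in range(1, steps + 1)]
```

Feeding each scaled reading through `ramp(previous, current, steps)` spreads a sudden jump over several frames, which is what removes the jitter from the panning and zooming.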
Another problem in the communication between PD and the Kinect arose when a user moved out of range of the depth camera: ‘pix_openni’ would then send a value of 0.5 for each axis. This was solved with the ‘change’ object, which keeps the last value from before the user was lost.
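The intended behaviour, holding the last good position when the tracker reports a lost user, looks like this in Python. This is a sketch of the behaviour only, not of the ‘change’ object itself; treating exactly (0.5, 0.5, 0.5) as the "lost" signal follows the description above, and the class name and initial value are my own.

```python
class HoldLast:
    """Pass positions through, but repeat the last good one whenever the
    tracker sends its 'user lost' value."""

    def __init__(self, lost_value=(0.5, 0.5, 0.5), initial=(0.0, 0.0, 0.0)):
        self.lost_value = tuple(lost_value)
        self.last = tuple(initial)  # returned if the user is lost before any reading

    def __call__(self, position):
        position = tuple(position)
        if position != self.lost_value:
            self.last = position
        return self.last
```

One caveat of this approach: a user standing at exactly the centre of all three axes would be indistinguishable from a lost user, although in practice the noisy sensor data makes that very unlikely.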
Implementing multi-user interaction was problematic because of how the content relates to the theoretical concerns. The original film tells the story from one point of view, and the whole set design was built around this idea. I struggled to find a rationale for adding several users other than adding a multiplier for controlling the transparency (z data from one user can only go to 0.3, and adding z data from two other users gets to 0.9, almost total transparency), thus encouraging some kind of collaboration between users in order to fully reveal the CCTV footage behind the window.
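The arithmetic behind this multiplier is simple enough to show in a few lines of Python. The 0.3 cap per user and the limit of three users come from the description above; the function name and the assumption that each user's z reading is already scaled to 0..1 are mine.

```python
def combined_alpha(z_values, per_user_cap=0.3, max_users=3):
    """Add up each user's contribution to the transparency.

    Every user can contribute at most per_user_cap, so a single user
    reaches 0.3 and three users together reach 0.9.
    """
    contributions = [
        max(0.0, min(per_user_cap, z)) for z in list(z_values)[:max_users]
    ]
    return min(1.0, sum(contributions))
```

With one user fully zoomed in the footage is only 30% revealed; it takes all three users moving close to the screen to reach 90%.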
Constable: The Great Landscapes X-Ray Examination Tate Britain August 2006
The X-Ray installation is a life-size projection of “Salisbury Cathedral from the Meadows” (1831) and consists of layered video of the painting with an X-Ray examination of the same painting. Movement of the audience in front of the painting triggers the revelation of the X-Ray under-layer. It was designed for a multi-user experience: the more people were in front of the painting, the more of the X-Ray layer was revealed. The interaction was simple and engaging, and that was the basis of the success of the piece: “The success of the X-Ray Examination primary arises from the intuitive form of engagement facilitated by the gestural interface.” (Lehn, Hindmarsh 2007, 1486)
Here’s a look at several media and artworks that inspired the projects in one way or another according to the three main concerns: theoretical, aesthetic and technical.
Alex May Shadows of Light
Alex May’s Shadows of Light video installation uses Kinect and Processing to create slow silhouette portraits of audience members. It “explores the concept of ‘slow interaction’: rather than responding to quick movement, it requires viewers to slow down and stand still, when it will slowly take the viewers silhouette and uses it as a digital stencil” (May, 2009, online). I enjoyed the idea of slow interaction, as it denies the instant gratification that seems so pervasive in our culture.
Kyle McDonald Exhausting a crowd
McDonald writes that his work “speaks to the potential of a perfectly automated future of surveillance, enabled by a distributed combination of machine and human intelligence. A beautiful record of the energy present in shared space, as well as a disturbing look into the potential for control in a dystopian environment.” (McDonald 2015, online) It is a 12-hour-long real-time video of Piccadilly Circus in London. The user can zoom into the footage at any point and add their own comments, as an exercise in guessing and interpreting the actions of the people in the footage. Although shot in 4K, it certainly has the look of CCTV footage, especially when zoomed in. It feels like the digital media version of John Smith’s film “The Girl Chewing Gum”.
A. Hitchcock Rear Window
Film critic Roger Ebert describes how the film’s protagonist “is trapped in a wheelchair, and we’re trapped, too–trapped inside his point of view, inside his lack of freedom and his limited options. When he passes his long days and nights by shamelessly maintaining a secret watch on his neighbours, we share his obsession.” (Ebert, 2000, online)
One thing I have in common with the film’s protagonist is my background in photography. Generally in Hollywood films, the representation of photographers is negative, ranging from voyeurs to murderers. BlackLab’s video “Photographers” (Vimeo, online) is a collage of clips from various films, and it doesn’t paint a flattering picture. In Rear Window the photographer is a voyeur, and the whole film is presented from his point of view; the audience shares it as he uses the camera’s telephoto lens to spy on his neighbours. Ebert goes on to say: “It’s wrong, we know, to spy on others, but after all, aren’t we always voyeurs when we go to the movies? Here’s a film about a man who does on the screen what we do in the audience–look through a lens at the private lives of strangers.” (Ebert, 2000, online).
Further testing with Kinect and PD. I used two participants, with user tracking through ‘pix_openni’ and the ‘expr’ object controlling a synth. My approach is to start simple, test thoroughly, then try more complicated setups. Unfortunately, due to the narrow space, movement was constricted, especially on the x axis; for future tests I will look for a larger space and maybe three participants.
OpenNI supports the output of 24 different joints. The NiTE middleware skeleton tracking supports just 15 joints. The skeleton output will therefore have additional joints with duplicated coordinates.
Here’s a list of all the joints available for tracking:
1 /skeleton/joint/head 5 0.376254 0.158162 1.31012 1
6 /skeleton/joint/l_shoulder 5 0.442317 0.298091 1.39435 1
10 /skeleton/joint/l_fingertip 5 0.502907 0.580862 1.37264 0 (duplicate! not valid)
11 /skeleton/joint/r_collar 5 0.502907 0.580862 1.37264 0 (duplicate! not valid)
12 /skeleton/joint/r_shoulder 5 0.316621 0.302097 1.31258 1
16 /skeleton/joint/r_fingertip 5 0.243468 0.58301 1.26445 0 (duplicate! not valid)
22 /skeleton/joint/r_knee 5 0.3335 0.777346 1.32825 1
24 /skeleton/joint/r_foot 5 0.348461 0.954826 1.55574 1
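If the joint messages are logged as text, the duplicated joints can be filtered out with a short Python script. This is a sketch under my reading of the lines above: after the joint address come a user id, the x, y and z coordinates, and a final flag that is 1 for a tracked joint and 0 for a duplicate. The leading list number and the trailing note are ignored, and the names `Joint`, `parse_joint` and `valid_joints` are hypothetical.

```python
from typing import NamedTuple, Optional


class Joint(NamedTuple):
    name: str
    user_id: int
    x: float
    y: float
    z: float
    valid: bool


def parse_joint(line: str) -> Optional[Joint]:
    """Parse one logged joint line; return None if it has no joint address."""
    tokens = line.split()
    for index, token in enumerate(tokens):
        if token.startswith("/skeleton/joint/"):
            break
    else:
        return None
    fields = tokens[index + 1:index + 6]  # user id, x, y, z, flag
    if len(fields) < 5:
        return None
    return Joint(
        name=tokens[index].rsplit("/", 1)[-1],
        user_id=int(fields[0]),
        x=float(fields[1]),
        y=float(fields[2]),
        z=float(fields[3]),
        valid=fields[4] == "1",
    )


def valid_joints(lines):
    """Keep only the joints whose flag marks them as actually tracked."""
    joints = (parse_joint(line) for line in lines)
    return [joint for joint in joints if joint is not None and joint.valid]
```

Running `valid_joints` over the list above would drop the fingertip and collar entries and keep the head, shoulders, knee and foot.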
M. Kronlachner, „The Kinect distance sensor as human-machine-interface in audio-visual art projects“, project report, Institute of Electronic Music and Acoustics, University of Music and Performing Arts, Graz, Austria, January 2013.
Building on previous research, I found this article on posture tracking and emotion recognition, using a Kinect depth camera. This may solve the problem identified previously, eliminating the need for direct connection between user and computer through a pulse sensor or GSR sensor, allowing for a more natural interaction.
“Intelligent User Interfaces can benefit from having knowledge on the user’s emotion. However, current implementations to detect affective states, are often constraining the user’s freedom of movement by instrumenting her with sensors. This prevents affective computing from being deployed in naturalistic and ubiquitous computing contexts. ”
“In this paper, we present a novel system called mASqUE, which uses a set of association rules to infer someone’s affective state from their body postures. This is done without any user instrumentation and using off-the-shelf and non-expensive commodity hardware: a depth camera tracks the body posture of the users and their postures are also used as an indicator of their openness. By combining the posture information with physiological sensors measurements we were able to mine a set of association rules relating postures to affective states. ”
“An analysis of the user evaluation showed that mASqUE is suitable for deployment in ubiquitous computing environments as its rich, extensive range of emotion representations (i.e. affective states) is able to inform intelligent user interfaces about the user’s emotion. This is especially important for evaluating user experience in ubiquitous computing environments because the spontaneous affective response of the user can be determined during the process of interaction in real-time, not the outcome of verbal conversation. ”
Chiew Seng Sean Tan, Johannes Schöning, Kris Luyten, and Karin Coninx. 2013. Informing intelligent user interfaces by inferring affective states from body postures in ubiquitous computing environments. In Proceedings of the 2013 international conference on Intelligent user interfaces (IUI ’13). ACM, New York, NY, USA, 235-246. DOI=10.1145/2449396.2449427 http://doi.acm.org/10.1145/2449396.2449427