Binh Nguyen's Blog: More Automated Research/Analysis

Wednesday, October 31, 2012

More Automated Research/Analysis

I've been examining the 'Automated Research/Analysis' concept further (especially in the context of real-time analysis) so that I can design and build a prototype. One of the problems I've faced is extracting the actual content from the underlying file format itself. As I've seen on a number of occasions, recovering the original data can be difficult, if not impossible, depending on how the developers created the format (it may be compressed, packed, encrypted, proprietary, or have had something else strange done to it...). Moreover, depending on the nature of the program, data corruption or simple misalignment (a better word may be inconsistency) of data is possible, whether through inadequate data collection, incompetence, inadequate survey design, etc. In the context of both quantitative and qualitative analysis, 'outliers' are always a possibility, but as the field of statistics (and predictive analysis) is reasonably well explored (although I admit it's still very much evolving), I don't see too much of a problem in most circumstances. (Still fleshing this out... My thoughts may change after seeing more data.)
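As a rough illustration of how routine outlier screening is, here is a minimal Python sketch of Tukey's interquartile-range fences, one common rule among many. The multiplier k = 1.5 is the conventional default, and the readings are made-up sample values, not data from this project.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sensor readings with one obvious glitch.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 57.0, 10.1]
print(iqr_outliers(readings))  # [57.0]
```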

Data collection can be done by any means. I've been playing around with speech recognition technology for a long while and it has come a long way (look at what they've done with Echelon). Video surveillance technology has made significant strides (there is a group of researchers in Adelaide working on technology that allows you to scan and track a target in real time across multiple cameras, though there are still some criticisms of facial recognition technology). Oscillators and data acquisition cards can be had for less than four (sometimes three) figures from any number of sources and electronics stores.

http://en.wikipedia.org/wiki/Echelon_(signals_intelligence)
http://en.wikipedia.org/wiki/Onyx_(interception_system)
http://en.wikipedia.org/wiki/Frenchelon
http://en.wikipedia.org/wiki/ADVISE
http://arstechnica.com/security/2007/09/the-demise-of-advise-dhs-data-mine-boarded-up/
http://www.fas.org/irp/world/russia/soud/index.html
http://www.cvni.net/radio/nsnl/nsnl021/nsnl21ewar.html
http://en.wikipedia.org/wiki/Facial_recognition_system#Criticisms

After that it's a question of extracting usable/malleable data from the digital representation of the physical phenomena. One part of this may involve an intermediate data format. For instance, search engines often convert data to a text format, or to a performance-optimised, resilient binary format, that allows you to determine whether two text strings have similar taxonomy/meaning/context (whether across languages or within a single language). In the context of facial recognition and images, you may use particular landmarks, shapes, and ratios to determine whether you have something of interest...

http://en.wikipedia.org/wiki/Data_mining
https://en.wikipedia.org/wiki/Stemming
http://en.wikipedia.org/wiki/Conflation
http://en.wikipedia.org/wiki/Facial_recognition_system#Traditional
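A toy version of the text side of this pipeline might look like the following minimal sketch: lower-case the text, strip a few English suffixes as a crude stand-in for real stemming, then compare the resulting term sets with Jaccard similarity. The suffix list, `naive_stem`, and the choice of Jaccard similarity are all assumptions for illustration; this only captures shared word stems within one language, not meaning or cross-language matches.

```python
import re

# HYPOTHETICAL, deliberately naive suffix list; real systems use a proper
# stemmer (e.g. the Porter algorithm) instead.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_terms(text):
    """Intermediate format: the set of stemmed, lower-cased words in the text."""
    return {naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())}


def jaccard(a, b):
    """Fraction of distinct terms the two texts share (0.0 to 1.0)."""
    ta, tb = to_terms(a), to_terms(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


print(jaccard("Cameras tracked the target", "camera tracking targets"))  # 0.75
```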

From the quantitative analysis side (and even the qualitative side, if we factor in taxonomy and semantic variations), a lot of the concepts we require already exist. Determining relationships between sets of data is done manually at high school level via algebra, with more complex analysis of curves via calculus at later high school and university level (graphing calculators, which could automatically derive limited relationships between sets of data, were often used).

http://en.wikipedia.org/wiki/Student%27s_t-test
http://vassarstats.net/textbook/ch11pt1.html
http://en.wikipedia.org/wiki/Analysis_of_variance
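For the Student's t-test linked above, a minimal sketch of the Welch variant (which does not assume the two groups share a variance) using only the standard library. It stops at the t statistic and approximate degrees of freedom; converting those into a p-value needs the t distribution (e.g. from SciPy), which is omitted here. The sample values are arbitrary.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variance (n - 1 denominator) divided by sample size, per group.
    va = statistics.variance(sample_a) / len(sample_a)
    vb = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df


print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # (-1.0, 8.0)
```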

http://en.wikipedia.org/wiki/Linear_algebra
http://en.wikipedia.org/wiki/Quadratic_equation
http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29
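The "define a relationship between two sets of data" step that graphing calculators automate is, in its simplest form, ordinary least squares. A minimal sketch for a straight line y = a + b*x (higher-order curves such as quadratics follow the same idea with more terms):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

On Python 3.10+, `statistics.linear_regression` returns the same slope and intercept.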

http://en.wikipedia.org/wiki/Fourier_transform
http://en.wikipedia.org/wiki/Fast_Fourier_transform
http://en.wikipedia.org/wiki/Fourier_series
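To make the Fourier links concrete, a minimal sketch that finds the strongest periodic component in a sampled signal. It uses the naive O(n^2) discrete Fourier transform for readability; an FFT gives identical results much faster. The 5 Hz test tone and 64 Hz sample rate are arbitrary choices.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (an FFT computes the same result faster)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest non-DC component."""
    spectrum = dft(samples)
    n = len(samples)
    # For real-valued input only bins 1..n/2 are unique; bin 0 is the mean (DC).
    k = max(range(1, n // 2 + 1), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / n


rate = 64
signal = [math.sin(2 * math.pi * 5 * t / rate) for t in range(rate)]
print(dominant_frequency(signal, rate))  # 5.0
```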

Even if it isn't possible to determine a theory which works for the entire range of data, it may still be possible to put together theories in series, with boundaries at the points where the data doesn't 'quite correlate'. For example, in physics there is the 'Grand Unified Theory', which attempts to model supposedly independent interactions, symmetries, and coupling constants within a single theory.

http://en.wikipedia.org/wiki/Grand_Unified_Theory
http://certifiedwaif.livejournal.com/389422.html
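One simple way to put two "theories in series" is segmented regression: fit separate lines on either side of a boundary and choose the boundary that minimises the total squared error. The sketch below assumes exactly two linear segments, x values already sorted, and a brute-force search over split points; all of that, and the sample data, is for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # guard against a segment with identical x values
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_breakpoint(xs, ys, min_points=3):
    """Split sorted data into two line segments at the point that minimises total error.

    Returns (total_sse, first x of the right-hand segment, (a, b) left, (a, b) right),
    or None if there are too few points for two segments.
    """
    best = None
    for i in range(min_points, len(xs) - min_points + 1):
        a1, b1, sse1 = fit_line(xs[:i], ys[:i])
        a2, b2, sse2 = fit_line(xs[i:], ys[i:])
        total = sse1 + sse2
        if best is None or total < best[0]:
            best = (total, xs[i], (a1, b1), (a2, b2))
    return best


xs = list(range(10))
ys = [0, 1, 2, 3, 4, 20, 22, 24, 26, 28]  # made-up data with a jump at x = 5
total, split_x, left, right = best_breakpoint(xs, ys)
print(split_x, left, right)  # 5 (0.0, 1.0) (10.0, 2.0)
```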

From this base it should be clear that we can build on it for all sorts of automated analysis and research.

If we look at law enforcement/surveillance, we have a history of real-time facial recognition systems (which have had their fair share of criticism). But if we think about this further, we don't necessarily require real-time analysis or perfect facial recognition (I'm thinking about automated crime reporting rather than tracking people). If we are able to capture particular movements (literally and figuratively), then we can have a general idea of where a suspect is and what crime they have committed. For instance, if we look at the human body and examine a punching movement, we have an arm (generally about 1/2 to 1/3 the length of the body, measured from the tip of the hand to the top of the shoulder, with a hand that will generally be flesh coloured) moving at certain critical velocities with reference to both its own body and the target body (a punching movement will generally go up or across). Clothes are generally of a single colour and of very similar shapes, which allows you to distinguish the body, while the head is generally uncovered, which will allow you to correctly identify most people (unless they are nude, though there are laws against that, or they are wearing flesh-coloured clothing). Then it's simply a matter of periodically watching for specific relationships to show up in particular data sets. For instance, if the punching movement were described by the equation y = jx^2 + b + a^5cbz^3 and we found this relationship showing up at multiple points in our data set, then we can be fairly sure that this event occurred. Obviously, we can extend the concept further, with unique equations representing other actions as well. As quantum and high-performance computing technology progresses, the possibility of real-time analysis, and of a machine which roughly replicates 'The Machine' from 'Person of Interest', quickly becomes a reality.
All you have to do is integrate surveillance, GPS, and time-based information and you'd be away. Time to think some more and flesh out other details...
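A minimal sketch of "watching for a specific relationship to show up" in a data set: slide a window along a per-frame measurement and flag the places where it closely follows a model curve. The equation in the post is illustrative, so `punch_model` here is a made-up stand-in (wrist-to-shoulder distance in arm lengths over five frames), and the sketch assumes some upstream pose tracker has already produced that trace. The window length and RMS tolerance are hypothetical parameters.

```python
import math


def punch_model(t):
    """HYPOTHETICAL template: wrist-to-shoulder distance (arm lengths) at frame t of a 5-frame punch."""
    return 0.3 + 0.7 * (t / 4) ** 2


def find_matches(trace, model, window, tolerance):
    """Return start frames where the trace follows model(0..window-1) within an RMS tolerance."""
    template = [model(t) for t in range(window)]
    matches = []
    for start in range(len(trace) - window + 1):
        segment = trace[start:start + window]
        # Root-mean-square difference between the observed segment and the template.
        rms = math.sqrt(sum((s - m) ** 2 for s, m in zip(segment, template)) / window)
        if rms <= tolerance:
            matches.append(start)
    return matches


# Simulated trace: arm at rest, one punch, arm at rest again.
idle = [0.3] * 3
trace = idle + [punch_model(t) for t in range(5)] + idle
print(find_matches(trace, punch_model, window=5, tolerance=0.05))  # [3]
```

Requiring matches from several independent traces (for example, more than one camera) before reporting an event would be one way to act on the "multiple points" idea.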

http://www.businessweek.com/articles/2012-09-13/watch-out-google-facebook-s-social-search-is-coming#r=shared

- As usual, thanks to all of the individuals and groups who purchase and use my goods and services
http://sites.google.com/site/dtbnguyen/
http://dtbnguyen.blogspot.com.au/
