Job Market Recovery

February 13, 2009 by Peyman Khorsand

By now, most of the people following the financial crisis closely have seen the following graph (left).

It shows the job recovery during all the financial recessions in US after WWII. As you can see the unemployment curve is symmetric. Looking at the current crisis data, if the historic analysis is valid, we can conclude we are not even half way through. An interesting trend in the graph is that the time to recovery has been increasing. This should be due to the more effective govermental policies to control the crisis and its damage to job market. These policies effectively has flatten and elongated the unemployment curve. To see this trend we plotted the number of months that it took the job market to recover vs the year that recession started (right). The linear fit is not a good fit but if it is of any clue, in the current crisis the job market will take around 36 months to recover.

Thermodynamics of Small Systems

November 22, 2008 by Peyman Khorsand

In a small system fluctuations can not be ignored, experiments can not be repeated. However, it is still desirable to have some understanding over the non-equilibrium behavior of the system.

Control Parameters vs Fluctuation Variables: For small systems the equation of state and the spectrum of fluctuations are fully determined by so-called control parameters.

The total change in the energy of the system has two components

dU = \sum_i {\partial U\over \partial x_i}|_{X} dx_i + \sum_a {\partial U\over \partial X_a}|_{x} dX_a

The first term can be interpretes as internal energy, dQ while the second term can be thought as work dW. For a specific experiment the value of \Delta Q and \Delta W are not reproducible due to the fluctuations in the system, although we can think of their probability distributions.

Probability Distributions: The work and heat probability distributions P(W) and P(Q) characterize the work and heat collected over an infinite number of experiments.

Fluctuation Theorems: The rate at which the system exchange heat with the bath is called the “entropy production“. It is convenient to define \sigma = Q/Tt where t is the interval of time over which the system exchange the heat Q. Associated with the entropy production is a time-dependent probability distribution P_t(\sigma). Under very general condition the following relation holds (Gallavotti and Cohen) for systems in steady states

\lim_{t\rightarrow \infty} {k_B \over t} \ln \left( P_{t}(\sigma) \over P_t(-\sigma) \right) =\sigma

Although the relation holds for infinite limit, it is considered as a good approximation for finite time. Note that the ratio P_t(\sigma)/P_t(-\sigma) for a system in equilibrium with the thermal bath is equal to $1$. This shows that steady state systems are more likely to deliver heat to the bath than it is to absorb heat form bath (Time reversal microscopic laws give birth to non time reversal macroscopic phenomenon, an answer to Loschmidt’s paradox).

The Jarzynski Equality: In a system in contact with thermal bath at temperature T, \Delta G the difference in free energy between equilibrium state A and B with x_A and x_B control parameter.

\exp \left(- {\delta G \over k_B T}\right)=\langle \exp \left(- {W \over k_B T}\right) \rangle,

the average being taken over infinite number of non-equilibrium processes.

The Crooks Fluctuation Theorem:

Non-Gaussian in the heat pobability distribution:

Generalized JE to arbitrary transition between non-equilibrium steady states:

for more information Ref to “The Nonequilibrium Thermodynamics of Small Systems”, by C. Bustamante, J. Liphardt and F. Ritort

Foam

November 7, 2008 by Peyman Khorsand

It is interesting to have a statistical description of foam. In doing so few simple characteristics may come to mind,

Network Properties:

A network can be associated to a foam structure by replacing every cell(bubble) with a node and attach these nodes to their physical neighbors. The statistical properties of this network can be interesting.

Bubble size Distribution:

Various empirical formulae have been proposed by fitting data, including

P(r) \propto {R \over (1+\beta R^2)^4}

P(r) \propto {R^2 e^{-\beta R^2}}

Here r is the radius of the bubble assuming they have spherical shapes and R is the relative size of bubbles to average radius of the population R=r/{\bar {r}}.

Dynamical Aging:

1) Average bubble sizes increase.

2) The probability distribution broadens.

3) Total volume decreases

4) Average life time of a bubble with radius r, we may need to know all the neighbors radii too.

In the case of a single hemispherical bubble at the surface of liquid and gas the bubble burst because of drainage and thinning effect at the top of the buuble. The life time in this case inversely depends on r

\tau \propto {1 \over r}

5) Transition probability of the network after a bubble burst.

Information Maximization

October 21, 2008 by Peyman Khorsand

Taken from: “An Information Maximization Approach to Blind Separation and Blind Deconvolution” by A.J. Bell and T.J. Sejnowski

The idea is to maximize the mutual information between the output Y of a neural network and its input X. Mutual information is defined as I(Y,X)= H(Y) - H(Y | X).

There are divergences due to the continuous variables. As a result it is better to work with gradient of mutual information with respect to some parameter, w. They considered the systems that have additive noise, N,  where H(Y | X)= H(N). Therefore to maximize the mutual information it is sufficient to maximize entropy of the output. Assuming that the input probability distribution is f_{x}(x) then probability distribution of the output is

f_{y}(y)= {f_{x}(x) \over |\partial y /\partial x |}

the entropy in the output can be written as

H(y) = E \left[ \ln \left| {\partial y \over \partial x} \right| \right] -E \left[ \ln f_{x} (x) \right]

A stochastic gradient descent learning rule can be implemented as

\Delta w\propto {\partial H(y)\over \partial w}= \left({\partial y\over \partial x} \right)^{-1}

In the case of logistic transfer function y= 1/ (1+ e^{-(wx+w_0)} or tanh function y= \tanh(wx + w_0). The learning rule has two component an Anti-Hebbian, \Delta w \propto -2 x y, and an Anti-Decay component, \Delta w \propto 1/w.

Multi-component network:

This result can be generalized to multi-component network, where {\mathbf y}= g({\mathbf W x +w_{0}}). The result will be

\Delta w_{ij} \propto {\mathrm {cof} \, w_{ij} \over \det \bf{W} } + x_i (1 - 2 y_j)

Casual Filters:

Furthermore it can be generalized to casual filters where y(t)=g[u(t)]=g[w(t) * x(t) ], or equivalently {\mathbf Y} =g({\mathbf W X} ). Now {\mathbf W} is lower triangular matrix.

W=\left(\begin{array}{lccccc} w_L & 0 &\cdots & 0 & & 0\\ w_{L-1} & w_L &0& \cdots& &0\\ \vdots & & & & &\vdots\\ w_1&\cdots & w_L & &\cdots &0\\ 0 & w_1 & \cdots & w_L & \cdots &0\\ \vdots & & & & & \vdots\\ 0 & \cdots & & w_1& \cdots& w_L\\ \end{array}\right)

The probability distribution function of X and Y are f_{Y}(Y)= f_{X}(X)/|J|

J=\det \left[ \partial y(t_i) \over \partial x(t_{j}) \right]=(\det W) \prod_{t=1}^{M} {\partial y(t) \over \partial u(t)}

The gradient descend learning rule is

\Delta w_L \propto \sum_{t=1}^M \left( {1 \over w_L} - 2 x_t y_{t} \right)
\Delta w_{L-j} \propto \sum_{t=1}^M \left( - 2 x_{t-j} y_{t}\right)

The delay weights w_{t-j} keep shrinking and try to de-correlate the past signal from future. It is very interesting to see what kind of w will be created depending on the input statistics.

Time Delay:

y(t)=g[w\, x(t-d)]

following the same set of steps we find

\Delta d \propto 2 w {\partial x \over \partial t} y

Generalizing the transfer function:

{dy\over d u}= y^p (1 - y)^r

\Delta w\propto {1\over w} +x[p(1-y) - ry]

\Delta w_0 \propto p(1-y )-ry

Blind Separation: A set of source signals, s_1(t), s_2 (t), \ldots, s_n(t) is mixed together linearly, with a matrix {\mathbf A}. We ought to find a square matrix {\mathbf W} that is inverse of {\mathbf A} up to a permutation. The problem reduces to minimize the mutual information between the inputs(Independent Component Analysis). Obviously, if the matrix {\mathbf A} is singular one can not solve the problem.

Blind De-convolution: A signal s(t) is corrupted with a linear filter, a(t)x(t)= [a(t)* s(t)]. We have to find a filter w_1, w_2 , \ldots , w_L to recover s(t) from x(t). The problem reduces to remove statistical dependency across time (Whitening of x(t)).

The fundamental question is how can we reduce mutual information between two outputs by information maximization. The joint entropy of two output signals is

H(y_1,y_2)=H(y_1) + H(y_2) -I(y_1, y_2)

The algorithm discussed above maximize H(y_1,y_2) and this is mostly done by minimizing I(y_1,y_2). When I(y_1,y_2) is zero the probability distribution of y’s is separable. Can we introduce a constraint and keep the total \sum H(y_i) conserved?

It was briefed up to section 5.

Accuracy vs. Correlated Noise

October 16, 2008 by Peyman Khorsand

Taken from: “The Effect of Correlated Variability on the Accuracy of a Population Code” by L.F. Abbott and P. Dayan

They answer to the following questions for three different class of models: “(1) Does correlation increase or decrease the accuracy with which the value of an encoded quantity can be extracted from a population of N neurons? (2) Does this accuracy approach a fixed limit as N increases?”

The average firing rate is noted as f_i(s)=\langle r_i(s) \rangle. The correlation matrix Q_{ij}(s)= \langle (r_i (s)- \langle r_i (s)\rangle)(r_j(s) - \langle r_j (s)\rangle ) \rangle is used to calculate Fisher information I_{\mathrm F}.  The Fisher information can be calculated assuming Gaussian character of correlation

P[\mathbf {r}|s] = {\cal N} \exp \left[-{1\over 2}\sum_{i,j}(r_i - f_i)Q_{ij}^{-1} (r_j - f_j) \right]

I_{\mathrm F}(s)=\sum_{i,j} {d f_{i}\over ds} Q^{-1}_{ij} {d f_{j}\over ds} + {1\over 2}\sum_{i,j,k,l} {d Q_{ij} \over ds} Q_{jk}^{-1} {dQ_{kl} \over ds } Q_{li}^{-1}
I_{\mathrm F}(s)= {d \mathbf{f}^{\mathrm T} \over ds} \mathbf{Q}^{-1} {d\mathbf{f} \over ds}+ {1\over 2} \mathbf {Tr} \left[{d\mathbf{Q} \over ds} \mathbf{Q}^{-1}{d \mathbf{Q}\over ds}\mathbf{Q}^{-1} \right]

The models being studied are:

Additive Noise Model:

Q_{ij}= \sigma^2 \left[ \delta_{ij} + c(1 - \delta_{ij})\right]

A nice example of collective quantities R= {1\over N} \sum r_i and \tilde R={1\over N} \sum (-1)^i r_i and the N dependence of their respective variance, \sigma^2_R and \sigma^2_{\tilde R}, is presented.

\lim _{N\rightarrow \infty} I_{\mathrm F}= {N [F_1(s)- F_{2}(s)]\over \sigma^2 (1-c)}

where F_{1}(s)= {1\over N} \sum_i \left(d f_i(s) \over ds\right)^2 and F_{2}(s)=\left( {1\over N} \sum_i {d f_i(s) \over ds} \right)^2. It worths mentioning that when F_1=F_2 the Fisher information fails to grow linearly. This will put a constraint on the individual neurons tuning curves among the population

F_1=F_2 \Rightarrow f_i(s)=p(s)+q_i

Multiplicative Noise Model:

Q_{ij}(s)= \sigma^2 \left[ \delta_{ij} + c(1-\delta_{ij})\right] f_i (s) f_{j}(s)

\lim_{N \rightarrow \infty}I_{\mathrm F}= {N [G_1(s)- G_{2}(s)] \over \sigma^2 (1-c)} +{N [(2-c)G_1(s)-c G_{2}(s)]\over (1-c)}

Limited-Range Correlation Model:

Q_{ij}=\sigma^2 \rho^{|i-j|}

I_{\mathrm F}={N(1-\rho)F_{1}(s)\over\sigma^2 (1+\rho)} +O(N^{1-2/D})

D is the number of encoded variables. “The main conclusion of this paper is that neurons should have different selectivity to the quantities they are encoding. In particular their tuning curves should not be additively or multiplicatively separable”.

Potential Energy Landscapes

October 14, 2008 by Peyman Khorsand

The configuration space of any system with conserved energy (time-independent Hamiltonian) can be broken into subspaces surrounding potential energy minima, which the minima will be reached by the steepest descent path of the potential. This hyper-volume around a potential minimum is called basin. The structure based on these neighboring basins is called as the “Inherent Structure”. The statistical behavior of a complex system can be studied by analyzing the characteristics of this structure. The partition function of a system of N particles can be written as

Z(\Omega,N,T) = {1\over N! \lambda^{3N}} \int_{\Omega} d^{3N}x e^{-\beta V(x)}

We characterize each basin by its minimum potential E_i. The total volume \Omega is partitioned by these basins and so does the above integral.

Z(\Omega,N,T) ={1\over N! \lambda^{3N}} \sum_i \int_{\omega_i} d^{3N}x e^{-\beta V(x)}

Defining an average free energy for basins f_{b}(E_i,T,\Omega)

-\beta f_{b}(E_i,T,\Omega)=\ln \langle {\int_{\omega_i} d^{3N}x e^{-\beta V(x)} \over \lambda^{3N}} \rangle

Now if we know the probability distribution of energy minima \rho(E) the partition function can be written easily as,

Z(\Omega,N,T)=\int dE \rho(E) e^{-\beta f_{b}(E,T,\Omega)}

General Properties of PEL’s:

The number of total minima is exponentially depends on the number of underlying degrees of freedom (e.g. number of atoms),

N_{min} \sim e^{N_{deg}}

The energy distribution of minima follows Guassian distribution. This can be understood through central limit theorem for a large system with independent subsystems.

P(E) = {\cal{N}} e^{-(E-E_{0})^2/2 \sigma_e^2}

Also on average an exponential relation between the minimas’ energy and their corresponding volume was observed

A \sim e^{-\alpha E}

The probability distributions, P(A), of the volume, A, of these basins for various dynamical systems (Binary Lennard-Jones, Dzugutov Liquids and Amorphous Silicon) show power law behavior

P(A) \sim A^{-2}

There is a similarity between this phenomenon and “Apollonian Packing” of an arbitrary volume, where in the limit of large dimensionality of the target space obeys the same power law. Moreover, the network consist of connecting neighboring minima is a scale free network, which means

There is a strong correlation (a power law) between the volume of the basin and its number of neighbors, k_i,

k_i \sim (A_i)^\beta

This relation subsequently dictates a relation between the minimum energy and number of neighbors.

As a reference look through the papers by J. P. K. Doye and C. P. Massen.

Fluctuations in Network Dynamics

October 11, 2008 by Peyman Khorsand

Taken from: “Fluctuations in Network Dynamics”, by M. Argollo de Menezes and A.-L. Barabasi

The flow through a node in a network is time dependent. This time dependence can be partly be described by mean flow, f_{\mathrm i}(t) and the fluctuation around this mean \sigma_{\mathrm i}(t).
In most natural networks, random, scale-free or small-world there is a functional dependence between mean and fluctuation of the flow through the nodes. In this paper it is claimed that the relation between f and \sigma in different networks (only scale-free and random network) fall into two different categories and characterized by their \alpha-exponent

\sigma \sim \langle f \rangle^{\alpha}

In all the networks they studied \alpha is equal to 1/2 or 1. In the networks with \alpha=1/2 the internal noise is responsible for flow fluctuations, while it was claimed that in networks with \alpha=1 the external noise is responsible for the fluctuations. Two different models are proposed.

Model 1: At any time step, W number of walkers are placed randomly on the network nodes, the preform M step walks.

Model 2: At any time step W random pairs of nodes are selected and they are connected through the shortest path between them (degeneracy problem is not discussed).

In both cases we observe \alpha=1/2 if W is fixed and \alpha=1 if W has large fluctuations. In general the fluctuations on a given nodes can be decompose into internal and external components
\sigma_{\mathrm i}^2=(\sigma_{\mathrm i}^{\mathrm {int}})^2 +(\sigma_{\mathrm i}^{\mathrm {ext}})^2
\sigma_{\mathrm i}^2 = a_{\mathrm i}^2 \langle f_{\mathrm i} \rangle +\left[ {\sigma_{\mathrm {dr}}\over \langle W(t)\rangle } \langle f_{\mathrm i} \rangle \right]^2

where \sigma_{\mathrm{dr}}=\sigma_{\mathrm{dr}}(\Delta W), represent the external driving force in the noise magnitude. Moreover, by increasing \Delta W a transition form \alpha =1 /2 to \alpha=1 behavior can be seen (some issues regarding the fitting process should be considered.).

In conclusion: “The \alpha=1/2 captures an endogenous behavior determined by the system’s internal fluctuations” while “The \alpha=1 exponent describes driven systems, in which the fluctuations of individual nodes are dominated by the time dependent changes in the external driving forces.”

Optimal Sampling of Natural Images

October 10, 2008 by Peyman Khorsand

This post is taken from the following paper,

“Optimal sampling of Natural Images: A design Principle for the Visual Systems?” by W. Bialek, D.L. Ruderman and A. Zee

If the natural scene images can be characterized by \phi(\mathbf {x}) and the cells are indexed by integers n and positioned respectively at \mathbf{x}_n. The output of each cell, Y_n, is a linear filter (receptive field) with an additional noisy component

Y_n=\int d^2 \mathbf{x}F( \mathbf{x}-\mathbf{x}_n)\phi(\mathbf{x})+\eta_n

The goal is to find a kernel F that maximze the information content of output, Y_n, about the input, \phi. The information content of Y_n assuming that \phi(\mathbf{x}) is Gaussian is

I={1\over 2\ln 2}Tr\ln\left[ \delta_{nm} + {1 \over (2 \pi)^2 \sigma^2} \int d^2\mathbf{k} e^{i \mathbf{k}(\mathbf{x}_n -\mathbf{x}_m)} |\tilde{F}(\mathbf{k})|^2 S(\mathbf{k})\right]

here S(\mathbf{k}) is the power spectrum of the signal. We can approximate I in large noise regime as

I \approx {N \over 2 (2\pi)^2\sigma^2 \ln 2} \int d^2\mathbf{k} |\tilde{F}(\mathbf{k})|^2 S(\mathbf{k})

We need to put extra constraints to get solutions that are physically realistic

1) We should fix the filters gain,

\int d^2 \mathbf{k} |\tilde{F} (\mathbf{k}) |^2 =1

2) There should be a cost for “long-range interactions in spatial space” or “sharp fluctuations in momentum space”,

C = \alpha \int d^2 \mathbf{k} \mathbf{k}^2 |\tilde{F}(\mathbf{k}) |^2

Using variational methods they found that filter should satisfy the schrodinger like equation in \mathbf{k}-space

-{\alpha\over 2}{\nabla_k}^2\tilde{F}(k)-{1\over 2 \sigma^2\ln 2} S(k)\tilde{F}(k)=\Lambda \tilde{F}(k)

the {h}^2/\alpha playes the role of the mass M and - S(k)/ ( 2 \sigma^2 \ln 2 ) of potential V in the schrodinger equation.

In the case that they are interested the power spectrum of natural images has a power law S(k)\sim 1/k^2 and as a result an accidental symmetry. This accidental symmetry allows them to recombine various angular momentum eigenstates and create orientation selective eigenstates (filters).

What interests me the most is that for any linear filter this formalism holds and, the power spectrum of the input signal appears as the potential of the schrodinger equation.

Statistical Mechanics of Four Letter Words

October 6, 2008 by Peyman Khorsand

Taken from the paper by G.J. Stephens and W. Bialek.

There are 26^4=456976 possible four letter words, the total entropy for such a sample is S_{0}= 4 \log_2 (26)=18.802 bits. Out of this large number of possibilities only a small fraction of 1/3700 is in use in English language. By looking at large database of American literature, \sim 800 words and their frequency of occurrence is found (the database had \sim 2\times 10^7 words all the four letter words which they have appeared more than 100 times are considered.). The entropy of this set is approximately S_{full} = 6.92\pm 0.003. The paper is trying to regenerate legal words using statistical properties of their building blocks (the letters).

Independent Letters Approximation:

Not all the letters appear with the same frequencies. Considering this point the probability of having a four letter words is P(l_1, l_2, l_3, l_4)= \prod_i l_i. The entropy of a such an ensemble is S_{ind} = 14.083 \pm 0.001 bits.

Pairwise Interaction Approximation:

We assume that P(l_1,l_2,l_3,l_4) can be written as a Boltzmann factor of pairwise interacting between different letters

P^{(2)}(l_1,l_2,l_3,l_4)= {1 \over Z} \exp \left[ - \sum_{i,j} V_{i,j}(l_i,l_j) \right]

V_{i,j}(l_i,l_j) are the pairwise potential and have 6 \times (26^2 -1)=4050 free parameters. They can be determined by solving 6 \times 26^2 coupled equations (pairwise marginal distribution).

\rho_{i_1,i_2}(l_1,l_2) = \sum^{\prime} P^{(2)}(k_1,k_2,k_3,k_4)

The ensemble built in this construction has an entropy of 7.48 bits. The pairwise potentials V_{i,j} define an energy landscape. There are 136 local minima in this landscape (changing any letter increases the energy) out of them 118 are real English words, that capture 63.5 of the probability distribution.

The induced probability for each real word using pairwise interactions P_2 reproduce the observed probability of that word P_{full} especially for frequent words (The plot of P_2 vs P_{full} is almost diagonal).

The independent letter approximation reduced the number of possibilites by a factor of \sim 1/26, a further reduction factor of \sim 1/100, the higher order interaction only contribute for a factor of 1/1.5.

Zipf’s Law:

Plotting the probability vs the rank of different words shows an approximate power law.The four letter words have a cut off tail. The pairwise model removes some weight from the bulk of the distribution and reassigns it to the tail.

Thermodynamics of Natural Images

September 25, 2008 by Peyman Khorsand

This is a brief extraction from Mora et.al. paper.

The idea originates from an observation by Field that the spatial spectrum of images of natural scenes follow a 1/f^2 behavior. Now if this behavior was observed in a statistical system, the first conclusion was that the system is in its critical point.

Here the authors to show the same result assume that the probability of an image \sigma is related to an energy function E(\sigma) through Boltzmann relation

P_T(\sigma)= {1 \over  Z} e^{-E(\sigma)/T} where Z = \sum_{\sigma} P(\sigma)

An L\times L pixel black and white image has 2^{L^2} possible configurations. In order to make this number they studied the system for L=2 , 3 and 4 using coarse graining ideas and block averaged the natural images.

Using their set of images they

1) assumed T=1 in natural images and found P(\sigma)

2) the probability at any other temperature can be found by

P_T(\sigma) = {P(\sigma)^{1/T}     \over  \sum_{\sigma} P(\sigma)^{1/T} }

3) the entropy at any other temperature can be derived by

S(T) = - \sum_{\sigma} P_T(\sigma)  \log  P_T(\sigma)

4) calculated the specific heat

C(T)=T {\partial S(T) \over \partial T}

5) repeated all these steps for L=2,3 and 4.

They observed that there is a peak in normalized specific heat C(T,L)/L^2 and as L increase the peak becomes sharper and approaches T=1. This is a clear indication for critical behavior in natural scenes (divergence of the specific heat at T=1).

In the macrocanonical ensemble, we study the statistical system under the fixed energy constraint. Under this conditions we can define the partition function as an integral over the density of states with constant energy \rho (E),

Z(T) = \int dE \rho(E) e^{-E/T}

Entropy will be defined as logarithm of density of states i.e. \rho(E)=e^{S(E)} ,

since both entropy and energy are extensive it is more suitable to work with \epsilon = E/N and s(\epsilon)=S(\epsilon)/N. In the large N limit,

C={N \over T} \left[  - {d^2 s(\epsilon)  \over d\epsilon^2}  \right].

When the S/N vs E/N curve for a system is become linear we should expect critical behavior. For natural images this can be done by looking at the probability of occurrence of each configuration, P(\sigma), then calculating P=\exp (-  E /T) of each configuration up to a constant. The density of states \rho (E) is the histogram of the configurations’ energy. Finally, the entropy is logarithm of density of states.

This calculation has been done for 8 to 50 pixel sample and they all shown a linear S , E behavior at low energy limit, where the sampling methods are more reliable (more occurrence).

They also prove that in a system which follows generalized Zipf’s law entropy has a linear dependence on energy.

Note: Generalized Zipf’s law i.e. p_r \propto 1/r^{\alpha}, where it  relates the ranking of a state r to its probability.

Energy Landscape:

Finally they studied the energy landscape of the natural images by looking at the local minima of the energy. They examined all the 4 \times 4 pixel images. The local minima are defined as configurations that flipping any pixel will increase the energy of the configuration. They found approximately 100 of these minima. The local minima are important in answering the question that if the natural images are at the critical point of T=1 what will happen to system at T=0?, and what order parameter can be defined for this phase transition?

Most of the local minima are representing an edge between black and white regions or a strip. This is consistent with the visual system that the response triggered averages of neurons in response to natural scenes has a prominent edge detection component.