Version 3. Translation.

Henri was surprised to learn that I had not yet been to the Louvre. I told him I thought it was too expensive to get in. Henri said it was free on Sundays, and he ordered me to go. I took a bus toward the rue de Rivoli. I approached the massive building, and as I walked toward it I felt as small as a grain of sand on a beach. I was overwhelmed by its scale.

"You American?" I heard a man's voice behind me as I approached one of the museum's many entrances.

I nodded, embarrassed.

The voice belonged to an American tourist with a Southern drawl. He tapped me lightly on the arm. "That's where they keep the Mona Lisa, huh?"

"There's a lot more in there than the Mona Lisa," I said, and I quickly moved away from the tall man in the yellow plastic poncho and Western boots, trying to put as much distance as I could between myself and the ugly American.

I opened my bag to let the guard rummage through it, then stepped into the palace of stolen treasures. They had stolen my ancestors from Africa, stolen their language, erased their history and their names; they had stolen crucifixes from European churches, jewels from Chinese dynasties, tombs from the Egyptian deserts. In the United States there had been a movement to recover the bones of indigenous peoples from around the world so that they could have a true burial. I wondered whether, and when, France would release its African bones. I wandered the marble corridors and arena-sized galleries for hours. Time changed as I looked at paintings so rich in detail that I felt I could walk straight into the dense woods, the intimate rooms, the dramatic battlefields, and the grand weddings. When I caught sight of the loud American in the yellow poncho, I slipped into the next room to escape him and started looking for an exit. The fresh air was like wine to me. I sat on a park bench and let all the colors and shapes settle into a map in my mind.


Technique: Fiber Photometry Calcium Imaging

"Solitude vivifies; isolation kills."

Joseph Roux

 

Paper: Dorsal Raphe Dopamine Neurons Represent the Experience of Social Isolation

Fiber photometry calcium imaging is a method for measuring neural activity in a defined population of neurons in freely moving mice by combining genetic targeting with fluorescent recording. First, a fluorescent calcium indicator (GCaMP6m in this paper) is genetically expressed in the neurons of interest. Next, an optic fiber is surgically implanted above the population of neurons expressing the indicator. The optical fiber then works bidirectionally: the same fiber delivers excitation light and collects the emitted fluorescence, which is separated by a dichroic mirror and measured by a photodetector. More specifically, the fiber excites GCaMP6m with light at about 470 nm, and the resulting signal is compared to a calcium-independent control channel excited at around 410 nm or 430 nm. Note that these measurements reflect the summed activity of the entire labeled population rather than of individual neurons. Fiber photometry is particularly attractive when we want to record the activity of freely moving mice, unlike other methods that require animals to be head-fixed for calcium recording. In general, the technique is useful for dissecting neural circuits because it allows scientists to implant multiple fibers and record the activity of a particular set of neurons while other populations are excited as well. The real-time recording ability of fiber photometry lets researchers compare the activity of different neural populations and see which neurons are involved in orchestrating a particular behavior.
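As a rough illustration of how such recordings are typically processed, the sketch below converts a calcium-dependent trace and a calcium-independent control trace into a motion-corrected ΔF/F signal. This is a generic, minimal example in Python, not the analysis code used by Matthews et al.; the function name, the linear fit, and the synthetic traces are my own assumptions.

```python
import numpy as np

def delta_f_over_f(signal_470, control_410):
    """Convert raw photometry traces into a motion-corrected dF/F.

    signal_470: calcium-dependent GCaMP6m fluorescence (~470 nm excitation)
    control_410: calcium-independent control channel (~410 nm excitation)

    The control channel is linearly fit to the signal channel; the fitted
    trace serves as the baseline F0, so dF/F = (F - F0) / F0.
    """
    signal_470 = np.asarray(signal_470, dtype=float)
    control_410 = np.asarray(control_410, dtype=float)

    # Least-squares fit of the control channel to the signal channel,
    # scaling the control so it can act as a calcium-independent baseline.
    slope, intercept = np.polyfit(control_410, signal_470, deg=1)
    f0 = slope * control_410 + intercept
    return (signal_470 - f0) / f0

# Example with synthetic traces (1,000 samples).
t = np.linspace(0, 100, 1000)
control = 1.0 + 0.05 * np.sin(0.1 * t)       # slow drift shared by both channels
signal = 1.2 * control + 0.3 * (t > 50)      # a "calcium transient" after t = 50
dff = delta_f_over_f(signal, control)
print(dff.min(), dff.max())
```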

The figure below (Figure 2 A, B) shows a schematic and a microscopic view of how fiber photometry was set up in the freely moving mice in this experiment. The researchers were trying to study how social isolation affected the activity of dorsal raphe nucleus (DRN) dopamine (DA) neurons. To do this, they had two experimental groups: one group of mice was socially isolated, while the other was housed with other mice. These mice had previously been injected with adeno-associated viral vectors to express GCaMP6m in DA neurons of the DRN. Individual mice in the fiber photometry setup were then presented with a juvenile mouse. The fiber photometry recordings showed that isolated mice had a significantly greater fluorescent response to this new mouse (Figure 2 C). Furthermore, to rule out possible confounds, the authors compared this neural activity to a control: a new object instead of a new mouse in the same experimental setup. Indeed, the photometry response of isolated mice was greater for the new mouse than for the new object. The authors concluded that, after experiencing social isolation, DA neurons in the DRN show significantly greater activity when the animal is exposed to a social stimulus. Of course, this is just a correlation, because fiber photometry does not allow us to draw conclusions about causation; optogenetic manipulations would be necessary to establish causation.

Figure 2 A, B. Schematic and microscopic view of the fiber photometry setup (Matthews et al., 2016).

 

Reference

Matthews, G. A., E. H. Nieh, C. M. Vander Weele, S. A. Halbert, R. V. Pradhan, A. S. Yosafat, G. F. Glober, E. M. Izadmehr, R. E. Thomas, G. D. Lacy, C. P. Wildes, M. A. Ungless and K. M. Tye (2016). “Dorsal Raphe Dopamine Neurons Represent the Experience of Social Isolation.” Cell 164: 617-631.

Paper of the Day: Text Recognition in the Human Brain and Computational Systems

Heankel Oliveros

Final Paper CS332

Text Recognition

 

This paper analyzes the system presented in Reading Text in the Wild with Convolutional Neural Networks by Jaderberg et al. (2016). In that article, the authors propose an end-to-end method for text spotting. The first task, text detection, is performed by weak detectors, and the resulting proposals are then filtered and refined. The second task, word recognition, is accomplished by a deep convolutional neural network. In the following sections, I explain the details of these two stages, as well as their results in comparison with other text spotting methods. Before that, however, I present an introduction to word processing in the human brain. Understanding how the brain recognizes text is fundamental to appreciating why the system developed by Jaderberg and colleagues is superior to previous text spotting methods.

Text Recognition in the Human Brain

Text is a relatively recent invention and, given such a short time span, a brain region devoted specifically to text recognition is unlikely to have evolved. Modern Homo sapiens is thought to be about 200,000 years old, while writing was first invented only around 5,000 years ago (Reinhardt, 2005). Moreover, most human populations were not widely exposed to text until relatively recently, after the invention of the printing press. Studies using fMRI, however, have reported a stronger response to alphabetic strings than to other visual stimuli (e.g., faces, houses, checkerboards) in the left occipitotemporal cortex, a region adjacent to the fusiform gyrus known as the visual word form area (VWFA; Figure 1) (Cohen, Dehaene et al. 2000, Cohen, Lehericy et al. 2002, Hasson, Levy et al. 2002). This activation is present in literate subjects across different languages and writing systems (Bolger, Perfetti et al. 2005). Moreover, the VWFA is equally activated by real words and readable pseudowords, suggesting that this area is tuned to the orthographic regularities of the language (Cohen, Lehericy et al. 2002). These results with the Roman alphabet have also been replicated with real and pseudo characters in Chinese (Liu, Zhang et al. 2008).

 

figure 1

Figure 1. The visual word form area (Cohen, Lehericy et al. 2002). The left panel shows the VWFA, a region of the left-hemispheric fusiform cortex more responsive to letters and words than to control stimuli in fMRI studies. The green squares come from individual subjects, whereas the yellow squares represent group analysis. The right panel shows the average BOLD signal for words and consonants versus checkerboards in both visual fields.

 

If the visual word form area is not encoded in our genome, how can we understand its consistency across individuals and cultures? Recent studies suggest that, despite the structural constraints of cortical organization in the visual system, experience-driven plasticity can lead to a specialization process. For instance, Baker, Liu et al. (2007) tested the experience dependence of the VWFA by showing that this region responds more strongly to Hebrew words in readers than in nonreaders of that language. In addition, subjects’ orthographic familiarity appears to be correlated with a stronger blood-oxygen-level-dependent (BOLD) response in the VWFA (Binder, Medler et al. 2006). Other studies using fMRI rapid adaptation techniques suggest that neurons in the VWFA respond selectively to individual real words (i.e., words known by the subjects) (Glezer, Jiang et al. 2009).

Because the VWFA results from a functional reorganization of the visual system driven by experience, it is not surprising that word recognition is in line with the principles of object recognition in the visual cortex (Riesenhuber and Poggio 2002). Beyond the specificity of the VWFA, consider, for example, how it recognizes words at different levels, from characters (Baker, Liu et al. 2007) and syllables or letter combinations (Binder, Medler et al. 2006) up to “whole real words” (Glezer, Jiang et al. 2009). Furthermore, in fMRI studies, the response of the VWFA to words shows invariance across visual features such as letter case, size, orientation, and font. For instance, the VWFA is equally activated (versus fixation) by words whether they are presented as “pure-case words” (e.g., hello world) or as “alternating-case words” (e.g., hElLo WoRlD) (Polk and Farah 2002). The case insensitivity of the VWFA has also been reported in studies of word masking and unconscious repetition priming: fMRI and event-related potential (ERP) responses in the VWFA were sensitive to word repetition regardless of changes in letter case (Dehaene, Naccache et al. 2001).

Taken together, this evidence from neuroimaging reveals three key ideas about word recognition in the human brain. First, the specialization of the visual word form area results from a functional reorganization of the visual cortex driven by experience. Second, the response of the VWFA is sensitive to spelling and to the reader’s experience in a particular writing system, but it is invariant across other visual features such as letter case. Third, there is evidence of a hierarchical organization in the process of word recognition (i.e., recognizing letters, combinations of letters, and whole words). The third idea will be especially important when discussing Jaderberg and colleagues’ system for text recognition in computer vision, and I will use these concepts to discuss how their proposed text recognition system is similar to word processing in the human brain and superior to previous methods.

Text Recognition Using Convolutional Neural Networks

In computer vision, the detection and recognition of words in natural scene images is a problem with important applications. In the modern urban world, text is present almost everywhere: traffic signs, labels, digital screens, and billboards, to mention just a few examples. In this context, an automatic text spotting system could have valuable applications in assisting visually impaired people, translating text from images, and analyzing or retrieving textual content from video or image databases. Recognizing text in the wild, however, is not an easy task. Unlike text in black-and-white documents, text in scene images varies greatly in visual features such as lighting, occlusion, size, alignment, orientation, and noise. This is why the challenges for text spotting in the wild are greater than those handled by standard document text recognition techniques (e.g., OCR). In the next sections, I explain Jaderberg and colleagues’ end-to-end system for text spotting (figure 2). First, I describe their method for text detection, then the convolutional neural network (CNN) for text recognition, and finally, I discuss their results.

 

figure 2

Figure 2. An overview of the text spotting system (Jaderberg, Simonyan et al. 2016). A) The first step of text detection uses weak detectors to generate proposals. B) Those proposals are filtered with a stronger classifier. C) The bounding box of the word proposals is refined using regression in a CNN. D) Word recognition is performed using a CNN. E) The outputs of the CNN are merged and ranked in order to eliminate false positives and duplicates. F) The proposals that pass the threshold are taken as the final results.

 

Text Detection

Before performing the laborious task of text recognition, this system identifies, filters, and refines the text proposals that will go into the CNN (figure 2: a, b, c). Early in text detection, a tradeoff between precision and recall (the fraction of true word instances found) was necessary to reduce the complexity and time devoted to this task. That is, the authors chose a fast detection method with high recall and low precision (many false positives). To achieve this tradeoff, they selected two weak classifiers whose proposals were then filtered and refined before text recognition.
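To make this tradeoff concrete, here is a minimal sketch of how precision and recall are computed from a detector’s output. The counts are invented purely for illustration and are not taken from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# A fast, permissive detector of the kind used here: it finds nearly every
# word (high recall) but also proposes many non-word boxes (low precision).
print(precision_recall(true_positives=95, false_positives=400, false_negatives=5))
# -> (0.1919..., 0.95)
```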

Weak classifiers for proposal generation

Edge boxes is a weak detector developed by Zitnick and Dollár (2014). The idea behind this method is simple: the number of edges wholly enclosed in a box indicates how likely that box is to contain an object (figure 3). Because words are combinations of letters with clearly defined contours, they are detected well by this method. Following Zitnick and Dollár’s method, Jaderberg et al. used a sliding window at different scales to evaluate the probability of each box b containing text. Each box is then scored and ranked, and boxes that overlap a higher-ranked box are removed. The boxes with scores above a threshold are taken as the candidate bounding boxes Be.

 

figure 3

Figure 3. Edge boxes: a weak detector for text. The number of edges wholly enclosed in a box indicates how likely the box is to contain an object, in this case, text.
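The sketch below is a much-simplified illustration of this idea: instead of grouping edges into contours and counting only the groups wholly contained in a box (as Edge Boxes actually does), it just scores each candidate box by the density of edge pixels it encloses and keeps the boxes above a threshold. The edge map, box format, and threshold are placeholders of my own.

```python
import numpy as np

def score_boxes(edge_map, boxes):
    """Score each box by the density of edge pixels it encloses.

    edge_map: 2D binary array (1 where an edge was detected)
    boxes: list of (x1, y1, x2, y2) in pixel coordinates
    Note: real Edge Boxes only counts edge *groups* wholly inside the box.
    """
    scores = []
    for x1, y1, x2, y2 in boxes:
        patch = edge_map[y1:y2, x1:x2]
        scores.append(patch.sum() / max(patch.size, 1))
    return np.array(scores)

# Toy edge map with a dense, word-like cluster of edges in the top-left corner.
edges = np.zeros((100, 200), dtype=int)
edges[20:40, 10:80] = np.random.rand(20, 70) > 0.6

boxes = [(10, 20, 80, 40), (120, 60, 190, 90)]               # candidate windows
scores = score_boxes(edges, boxes)
candidates = [b for b, s in zip(boxes, scores) if s > 0.2]   # thresholded -> Be
print(scores, candidates)
```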

 

The second weak classifier is a trained detector called the “aggregate channel features” (ACF) detector. Jaderberg and coworkers used the efficient ACF structure presented by Dollar et al. (2014). The ACF detector is a sliding-window method that applies an adaptively boosted (AdaBoost) classifier over a collection of channel features (figure 4). The channel features are designed to extract and condense information from a given image. In this case, Jaderberg et al. used the normalized gradient magnitude, the histogram of oriented gradients (HOG), and a grayscale version of the image. To reduce the information in these channels, the authors divided each channel into blocks, smoothed them, summed the pixels in each block, and smoothed them again, yielding the aggregate channel features. Next, the trained AdaBoost algorithm was applied. Similar to the Viola and Jones face detector (2004), AdaBoost creates an accurate classification rule for detecting text by combining weak and simple features. These features were applied with a sliding window at multiple scales to capture words of different lengths, and the proposals above a threshold were taken as the final box proposals Bd. Finally, the candidate bounding boxes identified by both weak classifiers (edge boxes and ACF) were passed on as the proposals for the next stage.

 

figure 4

Figure 4. Overview of the aggregate channel features detector (Dollar et al., 2014). The ACF detector takes an image and extracts channels such as HOG, normalized gradient magnitude, and grayscale. The image is then divided into blocks, and the pixels in each block are summed. The result is transformed into a vector, and the boosted classifier is applied.
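As a rough sketch of what aggregate channel features look like in code, the snippet below computes grayscale, gradient-magnitude, and crude oriented-gradient channels for one window, sums them over small blocks, and concatenates the result into a feature vector (which would then be fed to the boosted classifier). The block size, number of orientation bins, and the omission of the smoothing steps are simplifications of my own, not the exact recipe of Dollar et al.

```python
import numpy as np

def aggregate_channel_features(gray, block=4, n_orient=6):
    """Simplified ACF-style features for a single grayscale window.

    Channels: grayscale, gradient magnitude, and n_orient magnitude-weighted
    orientation channels (a crude stand-in for HOG). Each channel is summed
    over block x block pixel blocks and everything is concatenated.
    Assumes the window's height and width are divisible by `block`.
    """
    gray = gray.astype(float)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientation in [0, pi)

    channels = [gray, mag]
    bin_width = np.pi / n_orient
    for k in range(n_orient):
        in_bin = (ang >= k * bin_width) & (ang < (k + 1) * bin_width)
        channels.append(mag * in_bin)                # magnitude-weighted orientation bin

    h, w = gray.shape
    feats = [c.reshape(h // block, block, w // block, block).sum(axis=(1, 3)).ravel()
             for c in channels]
    return np.concatenate(feats)

# Example: a random 32 x 64 sliding-window patch.
window = np.random.rand(32, 64)
print(aggregate_channel_features(window).shape)      # (8 channels * 8 * 16,) = (1024,)
```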

 

Filtering and Refinement of Proposals

In Jaderberg and coworkers’ text spotting pipeline, filtering and refinement occur in two different stages. First, a classifier stronger than edge boxes and ACF is used to reject false positives among the candidate bounding boxes. For this, Jaderberg et al. opted for a random forest classifier (Breiman, 2001). This is a binary classifier (i.e., word/no-word) acting on the HOG features of each bounding box (figure 5, a, b). The forest classifier simply rejects the proposals that fall below a certain threshold, keeping the candidates that are most likely to be words. Once most of the false positives have been rejected, it is important to refine the bounding box of each proposal in order to prepare the whole-word images that will be taken as input by the text recognition CNN. Because the detectors used an overlap ratio of 0.5, the proposals often overlap only half of the groundtruth (figure 5, c); that is, a bounding box may be accurate in width but not in height, and vice versa. The solution proposed by the authors was a CNN that regresses the groundtruth bounding box from each candidate bounding box (figure 5, d). In short, the input to this network is an image of fixed width and height containing the bounding box at its center. The bounding box is inflated by a factor of two, and its coordinates are encoded relative to the cropped image in order to provide enough context to the CNN for predicting the refined proposal. The network is trained with example pairs of input images and groundtruth bounding boxes. After filtering with the random forest classifier and refining with the CNN bounding box regression, the proposals are finally ready for the most computationally expensive task: text recognition.

 

figure 5

Figure 5. Filtering and Refinement of Proposals. A) An example of the HOG used in the stronger classifier. B) The random forest classifier separates words from no-words according to the HOG. C) Image showing the problem in the bounding box word because of the 0.5 overlap ratio used in the weaker detectors. D) Before and after the CNN bounding box regression. Green shows the groundtruth bounding box. Red is the regressed box.
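A minimal sketch of this filtering stage is shown below, using scikit-image’s HOG descriptor and scikit-learn’s random forest as stand-ins for the authors’ implementation. The HOG parameters, the 32x100 patch size, the random placeholder training data, and the 0.5 probability threshold are all assumptions of mine for illustration.

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier

def hog_features(patch):
    """HOG descriptor for a proposal cropped/resized to a fixed 32 x 100 patch."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Placeholder training data: "word" patches (label 1) vs. background (label 0).
rng = np.random.default_rng(0)
patches = rng.random((200, 32, 100))
labels = rng.integers(0, 2, size=200)

X = np.stack([hog_features(p) for p in patches])
word_filter = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Keep only the proposals whose "word" probability exceeds the threshold.
threshold = 0.5
keep = word_filter.predict_proba(X)[:, 1] > threshold
print(f"kept {keep.sum()} of {len(keep)} proposals")
```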

 

Text Recognition

The CNN for text recognition is composed of five convolutional layers and three fully connected layers. Each output neuron in the CNN corresponds to a word from an English dictionary of 90K words, whereas each input is one of the generated proposals. The network recognizes which word w in the dictionary corresponds to the input image by ranking the probability of each word for a given input image x (figure 6). The dictionary word with the highest probability is the best match for the given input. A minor limitation of using a CNN is that the input image must have a pre-defined size. This condition, however, did not hurt the performance of the network, since the horizontal distortion provided information about the length of the word.

 

figure 6

Figure 6. Overview of the text recognition CNN (Jaderberg, Simonyan et al. 2016). The network is composed of five convolutional layers and three fully connected layers. The final layer corresponds to the words of the dictionary used for recognition. The input is a whole-word image. The bounding box proposal is associated with the dictionary word that has the highest probability of being its match.
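To make the architecture concrete, here is a PyTorch sketch of a whole-word recognition network of this shape: five convolutional layers, three fully connected layers, and one output unit per dictionary word. The specific channel widths, kernel sizes, and the 32x100 grayscale input are my own assumptions for illustration, not the exact configuration reported by Jaderberg et al.

```python
import torch
import torch.nn as nn

class WholeWordCNN(nn.Module):
    """Sketch: five conv layers + three fully connected layers, ending in a
    softmax over a fixed word dictionary (90K words in the actual system)."""
    def __init__(self, dictionary_size):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 4 * 12, 4096), nn.ReLU(),   # 32x100 input -> 4x12 feature map
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, dictionary_size),           # one logit per dictionary word
        )

    def forward(self, x):                               # x: (batch, 1, 32, 100)
        return self.classifier(self.features(x))

# Small dictionary here to keep the demo light; the real system uses ~90K words.
model = WholeWordCNN(dictionary_size=1000)
probs = torch.softmax(model(torch.randn(2, 1, 32, 100)), dim=1)
print(probs.argmax(dim=1))                              # index of the best-matching word
```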

Given that the CNN takes a whole-word image as its input, the CNN had to be trained on word images. Although some street-view text databases are available (Wang et al., 2011), the size and variety of the word images in these datasets are quite limited. Because of this constraint, other text recognition methods generally approach the problem by developing character classifiers. The authors’ solution to this problem was a synthetic data set that they developed themselves. Their premise was that most text in natural scenes is created with the range of fonts available on computers. In addition, other text features such as alignment, texture, and lighting effects could be imitated. Considering the variation in these features, Jaderberg et al. created single-word image samples, each composed of three image layers: background, foreground, and a border/shadow layer. The generation of synthetic data consisted of six steps (figure 7):

  1. Font rendering: choosing a random font from a catalog of 1400 fonts
  2. Border/shadow rendering: altering the border size and the shadow
  3. Base coloring: changing the color of the layers in the context of natural images
  4. Projective distortion: distorting the view of the sample to simulate the 3D world
  5. Natural data blending: mixing the samples with textures from natural scenes
  6. Noise: introducing Gaussian noise and other artefacts to the image

Overall, the large synthetic data set created in this process produced a diverse range of samples without the need for real-world data. In addition, the authors had the flexibility to choose the words from the dictionary used in the CNN. This rich synthetic data set allowed the authors to train the CNN on whole-word image samples.

 

figure 7

Figure 7. Synthetic Training Data (Jaderberg, Simonyan et al. 2016). A) The process of creating word image samples from words in the dictionary. B) Examples of images used in the synthetic training data set generated by the authors.
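As a toy illustration of this kind of data generation, the sketch below renders a single word image with a background layer, a foreground text layer with a crude shadow, blur, and Gaussian noise, using Pillow and NumPy. It covers only a few of the six steps (font rendering, border/shadow, coloring, and noise); projective distortion and natural-image blending are omitted, and Pillow’s default font replaces the 1400-font catalog.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFilter, ImageFont

def synth_word_image(word, size=(100, 32)):
    """Render a toy synthetic word sample: background, shadowed text, blur, noise."""
    rng = np.random.default_rng()
    font = ImageFont.load_default()                         # stand-in for a 1400-font catalog
    img = Image.new("L", size, color=int(rng.integers(120, 200)))        # background layer
    draw = ImageDraw.Draw(img)
    draw.text((7, 11), word, fill=90, font=font)            # crude shadow layer
    draw.text((6, 10), word, fill=int(rng.integers(0, 60)), font=font)   # foreground layer
    img = img.filter(ImageFilter.GaussianBlur(0.5))         # soften edges
    arr = np.asarray(img, dtype=float)
    arr += rng.normal(0, 8, arr.shape)                      # Gaussian pixel noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

sample = synth_word_image("hello")
sample.save("synthetic_hello.png")
```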

 

Merging and Ranking

At this point in the pipeline, I have described how Jaderberg and coworkers’ system generates, filters, and refines word bounding boxes, and how these word images are then matched to their most likely words by the CNN. However, some duplicates and false positives must be eliminated before yielding a final answer. The authors performed merging and ranking according to the requirements of the text recognition task, that is, whether the task is text spotting (general word search) or image retrieval (specific word search). In the case of text spotting, there were two major problems: first, multiple candidate outputs for the same word (duplicates), and second, different words with some overlap. To reject duplicates and to find the actual word among overlapping candidates, the authors performed non-maximum suppression (NMS). The key idea is that this method works as a “positional voting” for a specific word, so the candidate with the best score is taken as the real output. In the case of image retrieval, the system computes the probability of an image containing the query word, and the images with the best scores are retained. This allows the system to retrieve images rapidly from large databases.
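A minimal sketch of greedy non-maximum suppression is shown below; the intersection-over-union helper is the same overlap measure used for the 0.5-overlap evaluation criterion discussed in the Results. The box format and threshold are illustrative assumptions, not the authors’ exact merging procedure (which also merges and re-scores candidates per word).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop candidates that overlap it
    by more than iou_threshold, and repeat on the remaining boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Two near-duplicate detections of the same word plus one distinct detection.
boxes = [(10, 10, 110, 40), (12, 11, 112, 42), (200, 50, 300, 80)]
scores = [0.9, 0.7, 0.8]
print(non_max_suppression(boxes, scores))   # -> [0, 2]
```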

 

Results

Now that I have explained the process of proposal generation, filtering, bounding box regression, CNN recognition, and merging and ranking of candidate words, it is time to evaluate the performance of the system in text spotting[1]. According to the standards in the field (Wang et al., 2011), text spotting algorithms should disregard words containing non-alphanumeric characters and words shorter than three characters. In addition, a result is considered valid only if the bounding box has at least 0.5 overlap with the groundtruth. Following these rules, Jaderberg and colleagues compared their system with previous end-to-end text spotting methods on different databases. Across all datasets, their pipeline was far superior to all previous methods (Figure 8). Furthermore, the performance of their system was slightly better when the required overlap with the groundtruth was reduced to 0.3.

 

figure 8

Figure 8. Text recognition using a CNN on whole-word images is superior to previous text spotting methods in natural scenes. The proposed method is superior across all the datasets; most of the previous methods were focused on character recognition. In addition, decreasing the required overlap with the groundtruth box improves the measured performance of the system.

When looking at the previous methods discussed by Jaderberg and coworkers, I realized that they were mostly focused on character recognition as a route to identifying words. For instance, Jaderberg et al. (2014) had essentially the same pipeline using a CNN, but for character classification instead of whole-word recognition. Similarly, Neumann and Matas (2013) developed a method for character detection and recognition by combining a sliding window with an algorithm that works on “strokes of specific orientations”, which involves convolving the image gradient field with a set of oriented bar filters. Alsharif and Pineau (2014) developed an end-to-end text recognition method with hybrid HMM maxout models, which combines the character and word recognition problems by starting with character recognition and then proceeding to word recognition. However, none of these methods comes close to performing as well as the proposed text recognition method using whole words in a CNN.

Jaderberg and colleagues’ whole-word text spotting system has proven superior to character recognition and hierarchically dependent models. I think this draws an interesting parallel between this computer vision system and word processing in the brain. As I mentioned before, there are neurons that respond preferentially to whole words, particularly real words. Although the VWFA is also responsive to characters, behavioral studies suggest that people with more reading experience tend to recognize words as whole entities rather than letter by letter (Grainger, Lete et al. 2012). As in humans, it seems that the whole-word approach with a CNN could lead to more efficient computer systems for text recognition. This approach was particularly advantageous when detecting disjoint, occluded, and blurry word images (Figure 9). On the other hand, the system usually failed when it encountered slanted or vertical text, which makes sense because the authors did not model such instances in their framework. In addition, sub-words or multiple adjacent words tended to generate false-positive results.

 

figure 9

Figure 9. Text spotting results (Jaderberg, Simonyan et al. 2016). Some examples of the text recognition results in the proposed method. The red bounding boxes are the groundtruth, whereas the green boxes represent the bounding boxes that the algorithm predicted. Notice the small and blurry text recognized by the system in the first image.

 

Conclusion

In this paper, I had the opportunity to study word processing in the brain and in a computer vision system. I learned that, despite writing being a relatively recent invention, literate humans have a brain region (the VWFA) that responds preferentially to characters and words, particularly whole real words. With respect to the end-to-end text reading pipeline of Jaderberg et al., I learned that it is possible to detect and recognize whole words in natural scenes using a CNN and synthetic training data. I was also able to appreciate how the complexity of the pipeline increased as we moved from text detection to word recognition. I realized that the same simple weak detectors that we studied in class (e.g., for face and object detection) were also useful for text detection, and it was interesting to see how CNNs were employed in multiple tasks, such as text recognition and bounding box regression. Finally, this text recognition pipeline could be improved to handle unknown words, words in the same alphabet but a different language, and even arbitrary strings. New models could fall back to a lower level of character recognition when the whole word is not recognized by the CNN. By combining these approaches, the problems of unrecognized new words and vertical text might be solved.

 

 

References

Alsharif, O., & Pineau, J. (2014). End-to-end text recognition with hybrid HMM maxout models. In International conference on learning representations.

Baker, C. I., J. Liu, L. L. Wald, K. K. Kwong, T. Benner and N. Kanwisher (2007). “Visual word processing and experiential origins of functional selectivity in human extrastriate cortex.” Proc Natl Acad Sci U S A 104(21): 9087-9092.

Binder, J. R., D. A. Medler, C. F. Westbury, E. Liebenthal and L. Buchanan (2006). “Tuning of the human left fusiform gyrus to sublexical orthographic structure.” Neuroimage 33(2): 739-748.

Bolger, D. J., C. A. Perfetti and W. Schneider (2005). “Cross-cultural effect on the brain revisited: universal structures plus writing system variation.” Hum Brain Mapp 25(1): 92-104.

Cohen, L., S. Dehaene, L. Naccache, S. Lehericy, G. Dehaene-Lambertz, M. A. Henaff and F. Michel (2000). “The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients.” Brain 123 ( Pt 2): 291-307.

Cohen, L., S. Lehericy, F. Chochon, C. Lemer, S. Rivaud and S. Dehaene (2002). “Language-specific tuning of visual cortex? Functional properties of the Visual Word Form Area.” Brain 125(Pt 5): 1054-1069.

Dehaene, S., L. Naccache, L. Cohen, D. L. Bihan, J. F. Mangin, J. B. Poline and D. Riviere (2001). “Cerebral mechanisms of word masking and unconscious repetition priming.” Nat Neurosci 4(7): 752-758.

Glezer, L. S., X. Jiang and M. Riesenhuber (2009). “Evidence for highly selective neuronal tuning to whole words in the “visual word form area”.” Neuron 62(2): 199-204.

Grainger, J., B. Lete, D. Bertrand, S. Dufau and J. C. Ziegler (2012). “Evidence for multiple routes in learning to read.” Cognition 123(2): 280-292.

Hasson, U., I. Levy, M. Behrmann, T. Hendler and R. Malach (2002). “Eccentricity Bias as an Organizing Principle for Human High-Order Object Areas.” Neuron 34(3): 479-490.

Jaderberg, M., K. Simonyan, A. Vedaldi and A. Zisserman (2016). “Reading Text in the Wild with Convolutional Neural Networks.” International Journal of Computer Vision 116(1): 1-20.

Liu, C., W. T. Zhang, Y. Y. Tang, X. Q. Mai, H. C. Chen, T. Tardif and Y. J. Luo (2008). “The Visual Word Form Area: evidence from an fMRI study of implicit processing of Chinese characters.” Neuroimage 40(3): 1350-1361.

Polk, T. A. and M. J. Farah (2002). “Functional MRI evidence for an abstract, not perceptual, word-form area.” J Exp Psychol Gen 131(1): 65-72.

Riesenhuber, M. and T. Poggio (2002). “Neural mechanisms of object recognition.” Curr Opin Neurobiol 12(2): 162-168.

Viola, P. and M. J. Jones (2004). “Robust real-time face detection.” International journal of computer vision 57(2): 137-154.

 

 

Commentary on “A Cortical Circuit for Sexually Dimorphic Oxytocin-Dependent Anxiety Behaviors” (Li et al., 2016)

NEUR 315

Heankel Oliveros

Article: Li, K., M. Nakajima, I. Ibanez-Tallon and N. Heintz (2016). “A Cortical Circuit for Sexually Dimorphic Oxytocin-Dependent Anxiety Behaviors.” Cell 167(1): 60-72.e11.

The authors of this paper investigated a group of cortical neurons that modulate both social and emotional states differently in male and female mice. Previously, these researchers identified oxytocin receptor interneurons (OXtrINs) in the medial prefrontal cortex (mPFC) and discovered that these neurons facilitate social behavior in females during their estrus phase by acting on layer 5 pyramidal neurons. In contrast, despite males and females having an equivalent number and distribution of OXtrINs, the photostimulation of OxtrINs or administration of OXT in the mPFC had no effect on the social behavior of male mice.

In this study, the researchers focused on the regulation of anxiety-related behaviors by OXtrINs in males and females. They approached this question by photostimulating OXtrINs expressing channelrhodopsin, which they transfected by bilateral stereotactic injection of a Cre-dependent AAV encoding channelrhodopsin into the mPFC of Oxtr-Cre mice. These mice were then tested in three behavioral tasks. The first, the three-chamber social interaction test, examined the animals’ preference for spending time with a novel object or a new mouse. The other two tests, the open field (OF) and the elevated plus maze (EPM), assayed anxiety-like behavior. Consistent with previous experiments, females displayed social preference when OXtrINs were activated with blue light, whereas stimulation of OXtrINs produced no change in social preference in male mice. On the other hand, when OXtrINs were stimulated during the EPM and OF tests, males showed a decrease in anxiety-related behavior: they spent more time in the center of the open field and explored the open arms more frequently. Females, however, did not display any change in behavior upon OXtrIN activation.

Once the authors confirmed that OXtrINs were candidates for the modulation of anxiety-related behaviors in male mice, they proceeded to determine whether the anxiolytic effect of these neurons required OXT/OXTR signaling. To test the necessity of oxytocin signaling, the authors injected an AAV expressing Cre to delete the OXT receptor gene (Oxtr); controls were injected with an AAV expressing GFP. After behavioral testing, the authors observed a strong anxiogenic effect in male mice, whereas controls and females did not show any changes in anxiety-related behavior.

In the second half of the article, the authors investigated the pathway by which OXtrIN activation modulates local neurons to produce anxiolytic effects. They approached this question with whole-cell recordings from neurons in layers 2/3 and 5 during optogenetic photostimulation of OXtrINs. The authors found that males had larger inhibitory postsynaptic currents (IPSCs) in layer 2/3 pyramidal neurons, whereas females showed larger IPSCs in layer 5 neurons. In terms of excitatory postsynaptic currents (EPSCs), layer 2/3 neurons were similar in males and females, whereas layer 5 neurons displayed larger amplitudes in females. The conclusion from this experiment was that the activation of GABAergic OXtrINs has different inhibitory effects on local circuit activity in females and males.

Next, the authors gained more insight into the signaling pathways in these circuits by examining the translated mRNAs in OXtrINs. To analyze OXtrIN-specific mRNAs, they used a technique called TRAP (Translating Ribosome Affinity Purification). In short, TRAP works by isolating EGFP-tagged ribosomes and examining the mRNAs bound to them; cell-specific ribosomes were labeled by expressing EGFP under a conditional promoter in Oxtr-Cre mice. After sequencing the obtained RNA, the authors focused on the ten most highly enriched translated mRNAs in OXtrINs. From these top candidates, the researchers decided to focus on corticotropin-releasing factor-binding protein (CRHBP) because it binds corticotropin-releasing hormone (CRH) with high affinity and inactivates its action on CRH receptors.

After discovering high expression of Crhbp in OXtrINs, the authors hypothesized that the anxiogenic action of CRH on the mPFC could be regulated by the production of CRHBP when OXtrINs are activated in males. To test this hypothesis, the authors performed electrophysiological recordings from layer 2/3 pyramidal neurons and bath-applied CRH to male and female slices. The results showed that male slices treated with CRH had an increase in action potentials, and these CRH-generated spikes were suppressed by co-application of a CRHR1 antagonist. In contrast, the spiking activity of neurons in female slices showed only a very small increase that was insensitive to the CRHR1 antagonist. In addition, layer 5 neurons were insensitive to CRH treatment in both males and females. This experiment demonstrated that male layer 2/3 neurons are more sensitive to CRH than female neurons and that activation of the Crhr1 receptor is responsible for this neural activity. Next, the authors recorded again from layer 2/3 neurons, but this time they stimulated OXtrINs optogenetically. The recordings showed that the CRH-induced activity in male slices strongly decreased during OXtrIN photostimulation. Taken together, these results indicate that the activation of OXtrINs can decrease the response of layer 2/3 neurons to CRH.

Finally, the authors performed a conditional knockdown of Crhbp in OXtrINs by injecting a lentiviral construct expressing shRNAs against Crhbp in Cre-positive OXtrINs only; controls expressed EGFP instead of the shRNA cassette. The animals were then tested in the OF and EPM. The results showed that anxiety-like behaviors were not affected in transfected female mice, whereas males did show an increase in anxiety-like behavior compared to controls. In addition, female sociosexual behavior was not changed by this shRNA knockdown. These results suggest that CRHBP synthesis by OXtrINs regulates anxiety in males only. Considering the electrophysiological recordings, the anxiogenic action of CRH on layer 2/3 neurons is likely to be dampened by the production of CRHBP from OXtrINs in male mice. The authors conclude by suggesting that the higher CRH levels in females compared to males might be responsible for the insensitivity of these cortical neurons to the production of CRHBP from OXtrINs.

In general, I think that this article includes good controls and behavioral tests that support their cortical model for oxytocin-dependent anxiety behaviors. For instance, they tested females during different phases of their estrous cycle to make sure that the anxiety-related behaviors regulated by this cortical circuitry were not dependent on other hormonal levels. In addition, they used two tests for anxiety-related behaviors to support their conclusions and they also examined the sociosexual behavior of animals whenever they did a genetic manipulation.  Moreover, their CRH hypothesis is supported by their experiments (electrophysiology and TRAP/RT-PCR) because they showed that 1) CRH did not change the neural activity of pyramidal neurons in the mPFC of females, and 2) CRH levels are higher in females. The only limitation that I see in this study is that the anxiolytic effects of this circuitry were not tested in fear conditioning. One could argue that the findings in this study are not related to anxiety, but rather to an impairment in risk assessment, so the animals are more likely to explore the open arms and the center of the open field because they do not process the risk that such actions imply. In fact, the mPFC is also involved in risk assessment (Xue et al., 2009). By testing this circuitry in fear conditioning, we could eliminate the confounding variable of risk assessment because animals have already learned to fear a stimulus. However, the involvement of CRH in this circuit is a strong indicator that this is indeed an anxiety-related neural circuit.

In future experiments, the authors could investigate how CRH levels might mediate the sex differences in this circuit. One approach would be to regulate CRH levels by targeting Crh in the paraventricular hypothalamus, which was reported to have very high CRH expression in females. The authors could perform a conditional knockdown of this gene using an shRNA strategy similar to the one in this paper, placing the Crh shRNA under a tissue-specific promoter for the targeted hypothalamic cells. They could then test females (OF and EPM) early in development and in adulthood to see whether there is a critical period for the formation of this sexually dimorphic anxiolytic circuit. This type of experiment would allow us to explore how CRHBP can have different regulatory actions on cortical circuits that show no evident differences in neuroanatomy or in CRH-related mRNA profiles.

 

References

Li, K., M. Nakajima, I. Ibanez-Tallon and N. Heintz (2016). “A Cortical Circuit for Sexually Dimorphic Oxytocin-Dependent Anxiety Behaviors.” Cell 167(1): 60-72.e11.

Xue, G., Z. Lu, I. P. Levin, J. A. Weller, X. Li and A. Bechara (2009). “Functional dissociations of risk and reward processing in the medial prefrontal cortex.” Cereb Cortex 19: 1019-1027.