<h1>Split-Merge HMMs</h1>
<p><em>2020-05-19</em></p>
<h2 id="hmm-the-early-years">HMM: the early years</h2>
<p>Back in Spring 2015, when I had just started as a postdoc at Princeton, <a href="http://jchenlab.johnshopkins.edu/">Janice Chen</a> was wrestling with a data analysis problem in a <a href="http://rdcu.be/nwPj">new kind of dataset</a> she had collected. She had fMRI data from subjects watching an hour-long movie and then freely recalling (over tens of minutes) the narrative. She wanted to see whether people’s brains during recall looked like they were replaying activity patterns during movie-watching - was it possible to track which part of the movie people were thinking about during each moment of recall? I was absolutely captivated by this experiment, which broke all the rules about how you were supposed to collect fMRI data, especially since people were talking while being scanned (which conventional wisdom said should ruin your data). So I volunteered to help with the analysis, which started as a side project and eventually turned into the main focus of my years at Princeton.</p>
<p>What we came up with was a Hidden Markov Model (HMM), which models brain activity while experiencing or remembering a story as proceeding through an ordered sequence of states, each corresponding to some event in the story. It turned out that in addition to movie-recall alignment, this model could do a bunch of other things as well, such as figuring out how to divide a story into events or detecting anticipation of upcoming events in the story, and along with the <a href="http://www.dpmlab.org/papers/Neuron17.pdf">paper describing the results</a> we also released Python code for the HMM as part of the <a href="https://brainiak.org/tutorials/12-hmm/">brainIAK toolbox</a>. My lab and others have continued finding uses for this model, like this recent super-exciting preprint <a href="https://www.biorxiv.org/content/10.1101/2020.03.26.008714v1">from James Antony</a>.</p>
<p>My blog (when I remember to actually post things) is usually intended to give non-technical explanations of research, but in this post I’m going to dig deeper into a) how the HMM finds event boundaries, and b) a recent update I made to the brainIAK code that improves how this fitting process works.</p>
<h2 id="how-the-hmm-fits-data">How the HMM fits data</h2>
<p>Let’s look at a tiny simulated dataset, with 10 voxels and 20 timepoints:
<img src="/blog/public/sim1.png" alt="" /></p>
<p>You can see visually where the event boundaries are - these are the transitions (from timepoint 7 to 8, and from 16 to 17) where the spatial pattern of activity across voxels suddenly shifts to a new stable pattern.</p>
<p>The HMM uses a probabilistic model to try to estimate a) what the pattern for each event looks like, and b) which event each timepoint belongs to. This is a chicken-and-egg problem, since it is hard to cluster timepoints into events without knowing what the event patterns look like (and the boundaries between events are usually much less obvious in real datasets than in this toy example). The way the HMM gets started is by using its prior estimate of where events are likely to be. Let’s plot these prior probabilities as black lines, on top of the data:</p>
<p><img src="/blog/public/sim1_prior.png" alt="" /></p>
<p>The HMM is certain that the first timepoint is in the first event and the last timepoint is in the last event, and the timepoints in the middle are most likely to be in the second event. This prior distribution comes from summing over all possible sets of event boundaries - if we wrote down every possible way of slicing up these 20 timepoints into 3 events, timepoint 10 would be in the second event in about half of them.</p>
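<p>If you want to look at this prior yourself, here is a minimal sketch of how to compute it with brainIAK’s <code class="highlighter-rouge">EventSegment</code> class (using the same <code class="highlighter-rouge">model_prior</code> call that appears in the full script at the end of this post); the plotting is just standard matplotlib.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import matplotlib.pyplot as plt
from brainiak.eventseg.event import EventSegment

n_timepoints, n_events = 20, 3

# Prior probability of each timepoint belonging to each event, before
# seeing any data (marginalizing over all possible boundary placements)
prior = EventSegment(n_events).model_prior(n_timepoints)[0]  # (timepoints x events)

plt.plot(prior)
plt.xlabel('Timepoint')
plt.ylabel('P(event)')
plt.legend(['Event 1', 'Event 2', 'Event 3'])
plt.show()
</code></pre></div></div>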
<p>Now that we have this initial guess of which event each timepoint belongs to, we can make a guess about what each event’s pattern looks like. We can then use these patterns to make a better assignment of timepoints to events, and keep alternating until our guesses aren’t getting any better. Here is an animation showing this fitting procedure, with the event probability estimates on the top and the event voxel pattern estimates on the bottom:</p>
<p><img src="/blog/public/fit.gif" alt="" /></p>
<p>We can see that the HMM can perfectly find the true boundaries, shifting the prior distributions to line up with the underlying data. Note that the HMM doesn’t explicitly try to find “event boundaries,” it just tries to figure out which event each timepoint is in, but we can pull out event boundaries from the solution by looking for where the event label switches.</p>
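<p>In code, fitting the model and reading off the implied boundaries takes just a few lines. Here is a minimal, self-contained sketch (it simulates its own toy data inline rather than using the <code class="highlighter-rouge">generate_data</code> helper from the full script below):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from brainiak.eventseg.event import EventSegment

# Simulate 20 timepoints x 10 voxels with three stable event patterns
event_labels = np.array([0]*8 + [1]*9 + [2]*3)
np.random.seed(0)
patterns = np.random.rand(3, 10)
data = patterns[event_labels] + 0.1 * np.random.rand(len(event_labels), 10)

es = EventSegment(3)
es.fit(data)

# es.segments_[0] is a (timepoints x events) matrix of event probabilities;
# boundaries are the timepoints where the most likely event changes
event_of_t = np.argmax(es.segments_[0], axis=1)
bounds = np.where(np.diff(event_of_t))[0]
print(bounds)  # index of the last timepoint of each event (except the final one)
</code></pre></div></div>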
<h2 id="how-to-confuse-the-original-hmm">How to confuse the (original) HMM</h2>
<p>This original HMM has been shown empirically to work well on a number of different datasets, as mentioned above. The fitting procedure, however, isn’t guaranteed to find the best solution. One thing the original HMM has trouble with is when the true event lengths are very far from the prior, with some events much smaller than others. For example, here is another simulated dataset:</p>
<p><img src="/blog/public/sim2_prior.png" alt="" /></p>
<p>Here the first event is very long, which means that starting with the prior as our initial guess is not great. The HMM thinks that timepoint 13, for example, is much more likely to be in event 2 or 3 instead of event 1. When we start with this initial guess and run the fitting, here’s what happens:</p>
<p><img src="/blog/public/fit_noms.gif" alt="" /></p>
<p>The HMM correctly figured out that there is an event boundary between timepoints 13 and 14, but missed the other transition between 16 and 17. The problem is that the patterns for events 1 and 2 accidentally latch onto the same underlying event, forcing event pattern 3 to cover the last two events. Once this starts happening, the model has no way to recover and re-allocate its event patterns. How can we give the HMM a way to escape from its bad decisions?</p>
<h2 id="split-merge-hmm-to-the-rescue">Split-Merge HMM to the rescue</h2>
<p>In the new version of brainIAK, I’ve now added a <code class="highlighter-rouge">split_merge</code> option to the EventSegment class. If enabled, this forces the HMM to try reallocating its events at every step of fitting, by finding a) neighboring pairs of events with very similar patterns, indicating that they should be merged, and b) events that could be split in half into two very different-looking events. It checks to see if it can find a better solution by simultaneously merging one of the pairs from (a) and splitting one of the events from (b), to keep the same number of events overall. The number of different combinations the HMM tries is controlled by a <code class="highlighter-rouge">split_merge_proposals</code> parameter (defaults to 1).</p>
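<p>Turning this on is just an extra argument to the constructor. Here is a quick sketch of the two variants side by side (the <code class="highlighter-rouge">split_merge</code> and <code class="highlighter-rouge">split_merge_proposals</code> arguments are the ones described above; everything else matches the full script at the end of the post):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from brainiak.eventseg.event import EventSegment

# Original fitting procedure with 3 events
es = EventSegment(3)

# Same model, but trying split-merge proposals at every step of fitting;
# split_merge_proposals controls how many merge/split combinations are tried
es_sm = EventSegment(3, split_merge=True, split_merge_proposals=1)

# Both are fit the same way, on a (timepoints x voxels) array:
# es.fit(data) and es_sm.fit(data)
</code></pre></div></div>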
<p>This will come at a cost of extra computational time (which will increase even more with more <code class="highlighter-rouge">split_merge_proposals</code>) - does this extra flexibility lead to better solutions? Let’s try fitting the simulated data with very uneven events again:</p>
<p><img src="/blog/public/fit_ms.gif" alt="" /></p>
<p>Near the end of fitting the HMM realizes that the first two events can be merged, freeing up an extra event to split the final six timepoints into two separate events, as they should be. You can also see the event patterns for events 2 and 3 jump rapidly when it performs this split-merge.</p>
<h2 id="testing-on-real-data">Testing on real data</h2>
<p>This proof-of-concept shows that using split-merge can help on toy datasets, but does it make a difference on real fMRI data? I don’t have a conclusive answer to this question - if you are interested, try it out and let me know!</p>
<p>I did try applying both HMM variants to some real fMRI data from <a href="https://brainiak.org/tutorials/12-hmm/">the brainIAK tutorial</a>. This is group-average data from 17 subjects <a href="https://doi.org/10.1038/nn.4450">watching 50 minutes of Sherlock</a>, downsampled into 141 coarse regions of interest. Fitting the original and split-merge HMMs using 60 events and then comparing to human-annotated boundaries, the original HMM is able to find 18 out of 53 boundaries (p=0.01 by permutation test), while the split-merge HMM is able to find 21 (p=0.002). Using split-merge seems to help a bit at the beginning of the show, where observers label many short events close together. Here is a plot of the first 12 minutes of the data, comparing the HMM boundaries to the human-annotated ones:</p>
<p><img src="/blog/public/bounds.png" alt="" /></p>
<p>Both of the HMM variants are doing a decent job of finding human-annotated boundaries, but the split-merge HMM is able to find some extra event boundaries (black boxes) marking events that are perhaps too short for the original HMM to find.</p>
<p>More extensive testing will need to be done to understand the kinds of data for which this can help improve fits, but if you are willing to spend a little more time fitting HMMs then give split-merge a try!</p>
<p>Code to produce the figures in this post:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#%% Imports
</span><span class="kn">from</span> <span class="nn">brainiak.eventseg.event</span> <span class="kn">import</span> <span class="n">EventSegment</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="nn">matplotlib.animation</span> <span class="kn">import</span> <span class="n">FuncAnimation</span>
<span class="kn">import</span> <span class="nn">matplotlib.patches</span> <span class="k">as</span> <span class="n">patches</span>
<span class="kn">import</span> <span class="nn">deepdish</span> <span class="k">as</span> <span class="n">dd</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">stats</span>
<span class="k">def</span> <span class="nf">generate_data</span><span class="p">(</span><span class="n">event_labels</span><span class="p">,</span> <span class="n">noise_sigma</span><span class="o">=</span><span class="mf">0.1</span><span class="p">):</span>
<span class="n">n_events</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">event_labels</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">n_voxels</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">event_patterns</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="n">n_events</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">event_labels</span><span class="p">),</span> <span class="n">n_voxels</span><span class="p">))</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">event_labels</span><span class="p">)):</span>
<span class="n">data</span><span class="p">[</span><span class="n">t</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="n">event_patterns</span><span class="p">[</span><span class="n">event_labels</span><span class="p">[</span><span class="n">t</span><span class="p">],</span> <span class="p">:]</span> <span class="o">+</span>\
<span class="n">noise_sigma</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="n">n_voxels</span><span class="p">)</span>
<span class="k">return</span> <span class="n">data</span>
<span class="k">def</span> <span class="nf">plot_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">prob</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">event_patterns</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">create_fig</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
<span class="k">if</span> <span class="n">create_fig</span><span class="p">:</span>
<span class="k">if</span> <span class="n">event_patterns</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
<span class="k">if</span> <span class="n">event_patterns</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">data_z</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">zscore</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">data_z</span><span class="p">,</span> <span class="n">origin</span><span class="o">=</span><span class="s">'lower'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Time'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Voxels'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">19</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">yticks</span><span class="p">([])</span>
<span class="k">if</span> <span class="n">prob</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="mf">9.5</span><span class="o">*</span><span class="n">prob</span><span class="o">/</span><span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">prob</span><span class="p">),</span> <span class="n">color</span><span class="o">=</span><span class="s">'k'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">event_patterns</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">stats</span><span class="p">.</span><span class="n">zscore</span><span class="p">(</span><span class="n">event_patterns</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">),</span>
<span class="n">origin</span><span class="o">=</span><span class="s">'lower'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Events'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Voxels'</span><span class="p">)</span>
<span class="n">n_ev</span> <span class="o">=</span> <span class="n">event_patterns</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">n_ev</span><span class="p">),</span>
<span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">n_ev</span><span class="o">+</span><span class="mi">1</span><span class="p">)])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">yticks</span><span class="p">([])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">clim</span><span class="p">(</span><span class="n">data_z</span><span class="p">.</span><span class="nb">min</span><span class="p">(),</span> <span class="n">data_z</span><span class="p">.</span><span class="nb">max</span><span class="p">())</span>
<span class="k">def</span> <span class="nf">animate_fit</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">fname</span><span class="p">):</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">frames</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">unique</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">round</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">logspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="mi">20</span><span class="p">)))</span>
<span class="n">anim</span> <span class="o">=</span> <span class="n">FuncAnimation</span><span class="p">(</span><span class="n">plt</span><span class="p">.</span><span class="n">gcf</span><span class="p">(),</span> <span class="n">f</span><span class="p">,</span> <span class="n">frames</span><span class="o">=</span><span class="n">frames</span><span class="p">,</span> <span class="n">interval</span><span class="o">=</span><span class="mi">300</span><span class="p">)</span>
<span class="n">anim</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="n">fname</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">80</span><span class="p">,</span> <span class="n">writer</span><span class="o">=</span><span class="s">'imagemagick'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">human_match</span><span class="p">(</span><span class="n">bounds</span><span class="p">,</span> <span class="n">human_bounds</span><span class="p">,</span> <span class="n">nTR</span><span class="p">,</span> <span class="n">nPerm</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">threshold</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
<span class="n">event_counts</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">diff</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">(([</span><span class="mi">0</span><span class="p">],</span> <span class="n">bounds</span><span class="p">,</span> <span class="p">[</span><span class="n">nTR</span><span class="p">])))</span>
<span class="n">perm_bounds</span> <span class="o">=</span> <span class="n">bounds</span>
<span class="n">match</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">nPerm</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">nPerm</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
<span class="k">for</span> <span class="n">hb</span> <span class="ow">in</span> <span class="n">human_bounds</span><span class="p">:</span>
<span class="k">if</span> <span class="n">np</span><span class="p">.</span><span class="nb">any</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">perm_bounds</span> <span class="o">-</span> <span class="n">hb</span><span class="p">)</span> <span class="o"><=</span> <span class="n">threshold</span><span class="p">):</span>
<span class="n">match</span><span class="p">[</span><span class="n">p</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">perm_counts</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">permutation</span><span class="p">(</span><span class="n">event_counts</span><span class="p">)</span>
<span class="n">perm_bounds</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">cumsum</span><span class="p">(</span><span class="n">perm_counts</span><span class="p">)[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="k">return</span> <span class="n">match</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">match</span> <span class="o">>=</span> <span class="n">match</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="n">t</span><span class="p">):</span>
<span class="n">plt</span><span class="p">.</span><span class="n">clf</span><span class="p">()</span>
<span class="n">es</span> <span class="o">=</span> <span class="n">EventSegment</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">n_iter</span><span class="o">=</span><span class="n">t</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">plot_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">es</span><span class="p">.</span><span class="n">segments_</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">es</span><span class="p">.</span><span class="n">event_pat_</span><span class="p">,</span> <span class="n">create_fig</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">fit_split_merge</span><span class="p">(</span><span class="n">t</span><span class="p">):</span>
<span class="n">plt</span><span class="p">.</span><span class="n">clf</span><span class="p">()</span>
<span class="n">es</span> <span class="o">=</span> <span class="n">EventSegment</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">n_iter</span><span class="o">=</span><span class="n">t</span><span class="p">,</span> <span class="n">split_merge</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">plot_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">es</span><span class="p">.</span><span class="n">segments_</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">es</span><span class="p">.</span><span class="n">event_pat_</span><span class="p">,</span> <span class="n">create_fig</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">plot_bounds</span><span class="p">(</span><span class="n">bounds</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
<span class="n">w</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">b</span> <span class="ow">in</span> <span class="n">bounds</span><span class="p">:</span>
<span class="n">plt</span><span class="p">.</span><span class="n">gca</span><span class="p">().</span><span class="n">add_patch</span><span class="p">(</span><span class="n">patches</span><span class="p">.</span><span class="n">Rectangle</span><span class="p">(</span>
<span class="p">(</span><span class="n">b</span><span class="o">-</span><span class="n">w</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">n</span><span class="p">),</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'C%d'</span> <span class="o">%</span> <span class="n">n</span><span class="p">))</span>
<span class="c1">#%% Simulation #1
</span><span class="n">event_labels</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">generate_data</span><span class="p">(</span><span class="n">event_labels</span><span class="p">)</span>
<span class="n">plot_data</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="c1">#%% Plot prior
</span><span class="n">es_prior</span> <span class="o">=</span> <span class="n">EventSegment</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="n">prior</span> <span class="o">=</span> <span class="n">es_prior</span><span class="p">.</span><span class="n">model_prior</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">event_labels</span><span class="p">))[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">plot_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">prior</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">8.8</span><span class="p">,</span> <span class="s">'Event 1'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="mf">8.2</span><span class="p">,</span> <span class="mf">4.3</span><span class="p">,</span> <span class="s">'Event 2'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="mf">15.5</span><span class="p">,</span> <span class="mf">8.8</span><span class="p">,</span> <span class="s">'Event 3'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="c1">#%% Fitting simulation #1
</span><span class="n">animate_fit</span><span class="p">(</span><span class="n">fit</span><span class="p">,</span> <span class="s">'fit.gif'</span><span class="p">)</span>
<span class="c1">#%% Simulation #2
</span><span class="n">event_labels</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">generate_data</span><span class="p">(</span><span class="n">event_labels</span><span class="p">)</span>
<span class="n">es_prior</span> <span class="o">=</span> <span class="n">EventSegment</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="n">prior</span> <span class="o">=</span> <span class="n">es_prior</span><span class="p">.</span><span class="n">model_prior</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">event_labels</span><span class="p">))[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">plot_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">prior</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">8.8</span><span class="p">,</span> <span class="s">'Event 1'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="mf">8.2</span><span class="p">,</span> <span class="mf">4.3</span><span class="p">,</span> <span class="s">'Event 2'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="mf">15.5</span><span class="p">,</span> <span class="mf">8.8</span><span class="p">,</span> <span class="s">'Event 3'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="c1">#%% Fitting simulation #2
</span><span class="n">animate_fit</span><span class="p">(</span><span class="n">fit</span><span class="p">,</span> <span class="s">'fit_noms.gif'</span><span class="p">)</span>
<span class="c1">#%% Fitting simulation #2 with merge/split
</span><span class="n">animate_fit</span><span class="p">(</span><span class="n">fit_split_merge</span><span class="p">,</span> <span class="s">'fit_ms.gif'</span><span class="p">)</span>
<span class="c1">#%% Real fMRI data
</span><span class="n">sherlock</span> <span class="o">=</span> <span class="n">dd</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="s">'sherlock.h5'</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">sherlock</span><span class="p">[</span><span class="s">'BOLD'</span><span class="p">].</span><span class="n">mean</span><span class="p">(</span><span class="mi">2</span><span class="p">).</span><span class="n">T</span>
<span class="n">human_bounds</span> <span class="o">=</span> <span class="n">sherlock</span><span class="p">[</span><span class="s">'human_bounds'</span><span class="p">]</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">3</span><span class="p">))</span>
<span class="n">data_z</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">zscore</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">data_z</span><span class="p">[:</span><span class="mi">20</span><span class="p">,:</span><span class="mi">100</span><span class="p">],</span> <span class="n">origin</span><span class="o">=</span><span class="s">'lower'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Time'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Regions'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">10</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">yticks</span><span class="p">([])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="c1">#%% Fitting real fMRI data
</span><span class="n">es</span> <span class="o">=</span> <span class="n">EventSegment</span><span class="p">(</span><span class="mi">60</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">no_ms_bounds</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">diff</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">es</span><span class="p">.</span><span class="n">segments_</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)))[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">es_ms</span> <span class="o">=</span> <span class="n">EventSegment</span><span class="p">(</span><span class="mi">60</span><span class="p">,</span> <span class="n">split_merge</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">es_ms</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">ms_bounds</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">diff</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">es_ms</span><span class="p">.</span><span class="n">segments_</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)))[</span><span class="mi">0</span><span class="p">]</span>
<span class="c1">#%% Plots and stats
</span><span class="k">print</span><span class="p">(</span><span class="n">human_match</span><span class="p">(</span><span class="n">no_ms_bounds</span><span class="p">,</span> <span class="n">human_bounds</span><span class="p">,</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
<span class="k">print</span><span class="p">(</span><span class="n">human_match</span><span class="p">(</span><span class="n">ms_bounds</span><span class="p">,</span> <span class="n">human_bounds</span><span class="p">,</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span><span class="mi">3</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">480</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">plot_bounds</span><span class="p">(</span><span class="n">human_bounds</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">plot_bounds</span><span class="p">(</span><span class="n">ms_bounds</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">plot_bounds</span><span class="p">(</span><span class="n">no_ms_bounds</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Timepoints'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">yticks</span><span class="p">([</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">],</span> <span class="p">[</span><span class="s">'Human'</span><span class="p">,</span> <span class="s">'Split-Merge HMM'</span><span class="p">,</span> <span class="s">'Original HMM'</span><span class="p">])</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">human_bounds</span><span class="p">)):</span>
<span class="k">if</span> <span class="n">np</span><span class="p">.</span><span class="nb">any</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">ms_bounds</span> <span class="o">-</span> <span class="n">human_bounds</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o"><=</span> <span class="mi">3</span><span class="p">)</span> <span class="ow">and</span> \
<span class="ow">not</span> <span class="n">np</span><span class="p">.</span><span class="nb">any</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">no_ms_bounds</span> <span class="o">-</span> <span class="n">human_bounds</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o"><=</span> <span class="mi">3</span><span class="p">)</span> <span class="ow">and</span> \
<span class="n">human_bounds</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">></span> <span class="mi">140</span><span class="p">:</span>
<span class="n">hb</span> <span class="o">=</span> <span class="n">human_bounds</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">plt</span><span class="p">.</span><span class="n">gca</span><span class="p">().</span><span class="n">add_patch</span><span class="p">(</span><span class="n">patches</span><span class="p">.</span><span class="n">Rectangle</span><span class="p">(</span>
<span class="p">(</span><span class="n">hb</span> <span class="o">-</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'k'</span><span class="p">,</span> <span class="n">fill</span><span class="o">=</span><span class="bp">False</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<h1>Death by puppies - tenure-track year one</h1>
<p><em>2019-06-06</em></p>
<p>The end of this month will mark the end of my first year as a tenure-track assistant professor. I don’t know if I have much helpful advice to give, since I’m still new enough to the job that it’s hard for me to know what I’ve been doing right or wrong, and I’m very grateful to my collaborators and colleagues in my department for bearing with me as I’ve bumbled my way through my new responsibilities this year. But I do think I can shed some light on what junior faculty life is like, especially for grad students or postdocs who are, like I was, spending a lot of time trying to figure out whether academia is a career path worth attempting. (Though I admire those who were able to easily make up their minds - in grad school I asked a fellow student whether he wanted to pursue a tenure-track job, and he responded with a string of profanity that I will just translate here as “No.”)</p>
<p>I think this video sums up pretty well how this year has gone:</p>
<video controls="">
<source src="https://media.giphy.com/media/XDv6KlexaGatW7WuO8/giphy.mp4" type="video/mp4" />
</video>
<p>Almost all of the responsibilities in this job are things I love doing - there are just a <em>lot</em> of them, usually way too many to possibly be completed by one person. Being buried by tons of great things is quite a nice place to be, as long as you don’t mind the chaos:</p>
<p><img src="/blog/public/puppies.png" alt="Puppies Static" /></p>
<h3 id="mentoring-trainees">Mentoring trainees</h3>
<p>I devote the biggest chunk of my time to working with the trainees in lab, which basically means that I get to brainstorm experiments, debug analyses, and talk about science with super-smart people for most of the day. Recruiting lab members seemed like a huge gamble when I started last year (can I know if I want to work closely with someone for 5+ years after meeting them for one day?), but I’m amazed at how much my mentees have accomplished: two were accepted into top PhD programs, we developed multiple new experimental paradigms from scratch, submitted abstracts, drafted a review article, and set up half a dozen pieces of new equipment.</p>
<h3 id="teaching">Teaching</h3>
<p>Groaning about (and trying to get out of) teaching is a favorite pastime for research-oriented faculty, but honestly I’ve loved teaching more than I expected. I’ve been able to design two of my own seminars and modify an existing course, so most of the time I get to talk about things that I care about, and I’ve been able to convince students to care about them too! Also, positive teaching evaluation comments are some of the most meaningful and validating bits of praise I’ve ever gotten in my career.</p>
<h3 id="facultytrainee-recruiting">Faculty/trainee recruiting</h3>
<p>As a well-known research institution, we get bombarded with applications from prospective PhD students, postdocs, and faculty, and I’ve spent a great deal of time interviewing candidates and attending job talks. Speaking with all these enthusiastic current and future scientists is bittersweet, since more extremely-well-qualified people apply for positions than we could ever accommodate. This is especially true for faculty searches - multiple times this year I’ve been interviewing candidates who are objectively more accomplished researchers than I am, and most didn’t end up with an offer.</p>
<h3 id="grants">Grants</h3>
<p>Writing grants was always presented to me as the major downside of a faculty position, and there is certainly a lot of stress involved - suddenly I am responsible for running a small business on which multiple people depend for their salaries, and the process by which grants are evaluated is unpredictable at best. But I’ve found putting together grants to actually be quite useful for thinking more long-term about my research goals, and for providing opportunities to build new collaborations and connect with other faculty members.</p>
<h3 id="research">Research</h3>
<p>I’m still trying to carve out some time each week to do at least a little of my own research, writing some analysis code or testing out ideas that I could pass on to trainees if they seem promising. Looking at senior faculty it seems like this will probably get squeezed out of my schedule at some point, but right now I still look forward to spending some time with my headphones on fiddling with python code.</p>
<h3 id="talks-and-writing">Talks and writing</h3>
<p>At the end of the day, the primary way I’ll be evaluated is based on how productively I get my lab’s research out into the world in talks and papers. Effective speaking and writing is a hard, time-consuming process, and even as I’ve become much better at it over the years I still don’t know how to do it quickly. It is some consolation to me that even professional authors haven’t found any other way to communicate ideas aside from repeatedly writing the wrong thing and crossing it out, until finding something that works.</p>
<h3 id="administration-and-departmental-service">Administration and departmental service</h3>
<p>Not having a boss in the traditional sense is great in many, many ways, but the downside is that it means that a lot of paperwork tends to flow in my direction. There are also a bunch of advisory and committee jobs in the department that need to get done, but which no one particularly wants to do - luckily my department has been pretty good about insulating junior faculty from these, so I haven’t had much of this dumped on my plate so far.</p>
<p>Back to the dog pile!</p>
<h1>Building the present from the past</h1>
<p><em>2018-11-08</em></p>
<blockquote>
<p>All airports, he had long ago decided, look very much the same. It doesn’t actually matter where you are, you are in an airport: tiles and walkways and restrooms, gates and newsstands and fluorescent lights. This airport looked like an airport.</p>
</blockquote>
<blockquote>
<p>-Neil Gaiman, American Gods</p>
</blockquote>
<p>Each scene of a movie (or paragraph of a story) generates a pattern of activity in the viewer’s brain, and I showed in <a href="http://www.dpmlab.org/papers/Neuron17.pdf">my last paper</a> that changes in these activity patterns correspond to new events happening in the story. But it is still mysterious exactly what information the brain is keeping track of in these activity patterns. We know that movie and audiobook versions of the same story <a href="https://docs.wixstatic.com/ugd/b75639_46411e5d6ddb49c7bc47ed7660b2cf67.pdf">generate similar patterns</a> in many brain regions, which means that the information must be pretty abstract. Can we push this even farther? Can we find pattern similarities between <em>different</em> narratives that describe events from the same “template”?</p>
<p>In <a href="http://www.dpmlab.org/papers/9689.full.pdf">my new paper</a> we studied a kind of template called an event script, which describes a typical sequence of events that occurs in the world. For example, when you walk into a restaurant you have detailed expectations about what is going to happen next - you are going to be seated at a table, then given menus, then order your food, and then the food will come. Our hypothesis was that some brain regions would track this script information, and should look similar for <em>any</em> story about restaurants regardless of the specific characters and storyline of the particular narrative.</p>
<p>We showed subjects movies and audiobooks of stories taking place in restaurants and stories taking place in airports. Here are examples of two of the restaurant movies (all of the stories are <a href="https://figshare.com/articles/Event_Schema_Stimuli/5760306/3">publicly available here</a>):</p>
<video controls="" width="720">
<source src="https://ndownloader.figshare.com/files/10148874/preview/10148874/video_preview.mp4" type="video/mp4" />
</video>
<video controls="" width="720">
<source src="https://ndownloader.figshare.com/files/10148880/preview/10148880/video_preview.mp4" type="video/mp4" />
</video>
<p>We then looked for brain regions that seemed to be tracking the restaurant or airport script during all of the stories, and found a whole network of regions:</p>
<p><img src="/blog/public/schemanet.png" alt="SchemaRegions" /></p>
<p>An especially important region here is the medial prefrontal cortex, which is the part of your brain right behind the middle of your forehead. We found that this region only tracked the script when it made logical sense - if we scrambled the order of the script (e.g. showed people getting food before they ordered) then it no longer bothered to track what was going on.</p>
<p><img src="/blog/public/mPFC.png#center" alt="mPFC" /></p>
<p>I also used some of the analysis tools I previously developed for matching up brain data during perception and recall of the same story to find correspondences between <em>different</em> stories with the same script template. For example, I can use brain activity from the superior frontal gyrus to figure out which parts of the “Up in the Air” story and the “How I Met Your Mother” story take place in the same places in the airport. Here are snippets of the stories that show similar patterns of brain activity:</p>
<table>
<thead>
<tr>
<th>Script event</th>
<th>Up in the Air</th>
<th>How I Met Your Mother</th>
</tr>
</thead>
<tbody>
<tr>
<td>Enter airport</td>
<td>Ryan looked up from his boarding pass and sighed. There was his new partner Natalie, awkwardly climbing out of a taxi at the curb of the airport.</td>
<td>Barney and Quinn walked into the airport pulling their matching pink tiger-stripe suitcases. Barney leaned over and kissed her on the cheek.</td>
</tr>
<tr>
<td>Airport security</td>
<td>Ryan shrugged. “Look at the other lines. I never get behind people traveling with infants - I’ve never seen a stroller collapse in less than twenty minutes. Old people are worse - their bodies are littered with hidden metal and they never seem to appreciate how little time they have left on earth.”</td>
<td>The guard motioned to several others for backup. “Sir, you need to open this box.” “Oh, I can’t do that. Magician’s code. A magician never reveals his tricks. The only person I could possibly reveal the trick to is another magician.”</td>
</tr>
<tr>
<td>Boarding gate</td>
<td>He stopped halfway to their gate and pointed at a luggage store. “If you’re going to be flying with me, you need to get a carry-on bag. You know how much time you lose by checking in?”</td>
<td>She tried to interrogate him as they sat in front of the gate, but he refused to spill the beans. “I told you, magician’s code.”</td>
</tr>
<tr>
<td>On plane</td>
<td>“I like my own stuff. Don’t you like feeling connected to home?” Ryan laughed. “This is where I live. All the things you probably hate about traveling - the recycled air, the artificial lighting - are warm reminders that I am home.”</td>
<td>In its center was a diamond ring, which Barney plucked from the flower and held out to Quinn. “Quinn, will you marry me?”</td>
</tr>
</tbody>
</table>
<p>I’m currently analyzing some additional data from these subjects, collected while they tried to retell all 16 stories from memory. Our hypothesis is that these script templates should also be useful when trying to remember events, since they give us clues about what kinds of events to search for in our memories. I’m also fascinated by how these scripts get learned, and am hoping to study this learning process both in adults (who are learning new scripts in the lab) and in children (who are learning real scripts over the course of years).</p>
<h1>The three ingredients of reproducible research</h1>
<p><em>2018-03-02</em></p>
<p>Much of the conversation about research methods in science has focused on the <a href="https://digest.bps.org.uk/2015/08/27/this-is-what-happened-when-psychologists-tried-to-replicate-100-previously-published-findings/">“replication crisis”</a> - the fact that many classic studies (especially in psychology) are often not showing the same results when performed carefully by independent research groups. Although there are some debates about <a href="https://news.harvard.edu/gazette/story/2016/03/study-that-undercut-psych-research-got-it-wrong/">exactly how bad</a> the problem is, a consensus is emerging about how to improve the ways we conduct experiments and analyses: <a href="https://www.theguardian.com/science/head-quarters/2014/may/20/psychology-registration-revolution">pre-registering</a> study hypotheses before seeing the data, using <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205186">larger sample sizes</a>, being more willing to <a href="https://www.psychologicalscience.org/publications/replication">publish (informative) null results</a>, and maybe being <a href="https://imai.princeton.edu/research/files/significance.pdf">more conservative</a> about what evidence counts as “proof.”</p>
<p>But there is actually an even simpler problem we haven’t fully tackled, which is not “replicability” (being able to get the same result in new data from a new experiment) but “reproducibility” - the ability to demonstrate how we got the result from the original data in the first place. Being able to trace and record the exact path from data to results is important for documenting precisely how the analysis works, and allows other researchers to examine the details for themselves if they are skeptical. It also makes it much easier for future work (either by the same authors or others) to keep analyses comparable across different experiments.</p>
<p>Describing how data was analyzed is of course supposed to be one of the main points of a published paper, but in practice it is almost impossible to recreate the exact processing pipeline of a study just from reading the paper. Here are some real examples that I have experienced firsthand in my research:</p>
<ul>
<li>Trying to replicate results from papers that used a randomization procedure called phase scrambling, I realized that there are actually at least two ways of doing this scrambling, and papers usually don’t specify which one they use (one example of this kind of fork is sketched just after this list)</li>
<li>Confusion over exactly what probability measure was calculated in a published study set off a minor panic when the study authors started to think their code was wrong, before realizing that their analysis was actually working as intended</li>
<li>Putting the same brain data into different versions of AFNI (a neuroimaging software package) can produce different statistical maps, due to a change in the way the False Discovery Rate is calculated</li>
<li>A collaborator was failing to reproduce one of my results even with my code - turned out that the code worked in MATLAB versions 2015b and 2017b but not 2017a (for reasons that are still unclear)</li>
</ul>
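<p>To make the phase-scrambling example concrete: one fork that often goes unstated is whether each timecourse gets its own random phases, or whether a single set of random phases is applied to every timecourse (which preserves the correlations between them). The sketch below shows both variants; it is meant to illustrate the general ambiguity, not to reproduce the exact discrepancy I ran into.</p>
<pre><code class="language-python">
import numpy as np

def phase_scramble(data, shared_phases=False, rng=None):
    """Fourier phase-scramble each column of a (timepoints x voxels) array.

    shared_phases=False: every voxel gets its own random phases.
    shared_phases=True:  one set of phases is applied to all voxels,
                         preserving the correlations between voxels.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = data.shape[0]
    F = np.fft.rfft(data, axis=0)
    shape = (F.shape[0], 1) if shared_phases else F.shape
    phases = rng.uniform(0, 2 * np.pi, size=shape)
    phases[0] = 0  # leave the DC (mean) component untouched
    return np.fft.irfft(np.abs(F) * np.exp(1j * phases), n=T, axis=0)
</code></pre>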
<p>These issues show that reproducible research actually requires three pieces:</p>
<ol>
<li>Publicly available data</li>
<li>Open-source code</li>
<li>A well-defined computing environment</li>
</ol>
<p>The first two things we know basically how to do, at least in theory - data can be uploaded to a number of services that are typically free to researchers (and standards are starting to emerge for complex data formats like <a href="http://bids.neuroimaging.io/">neuroimaging data</a>), and code can be shared (and version-controlled) through platforms like GitHub. But the last piece has been mostly overlooked - how can we take a “snapshot” of all the behind-the-scenes infrastructure, like the programming language version and all the libraries the code depends on? This is honestly often the biggest barrier to reproducing results - downloading data and code is easy, but actually getting the code to run (and run exactly as it did for the original analysis) can be a descent into madness, especially on a highly-configurable Linux machine.</p>
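<p>Even short of a full solution, it helps to at least record the environment next to the results, so that “which versions produced this figure?” has a recorded answer. Here is a bare-bones sketch of that idea - it only logs versions, and does not recreate the environment the way a container or the service described next does:</p>
<pre><code class="language-python">
import json, platform, sys
from importlib import metadata

# Write the interpreter, OS, and installed package versions alongside the results
snapshot = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": sorted(f"{d.metadata['Name']}=={d.version}"
                       for d in metadata.distributions()),
}
with open("environment_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
</code></pre>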
<p>For my <a href="https://www.biorxiv.org/content/early/2018/01/24/252718">recent preprint</a>, I tried out a possible solution to this problem: an online service called CodeOcean. This platform allows you to create an isolated “capsule” that contains your data, your code, and a description of the programming environment (set up with a simple GUI). You can then execute your code (on their servers), creating a verified set of results - the whole thing is then labeled with a DOI, and <a href="https://doi.org/10.24433/CO.a27d1d90-d227-4600-b876-051a801c7c20">is publicly viewable</a> with just a browser. Interestingly the public capsule is still <em>live</em>, meaning that anyone can edit the code and click Run to see how the results change (any changes they make affect only their own view of the capsule). Note that I wouldn’t recommend blindly clicking Run on my capsule since the analysis takes multiple hours, but if you’re interested in messing with it you can edit the run.sh file to only conduct a manageable subset of the analyses (e.g. only on a single region of interest). CodeOcean is still under development, and there are a number of features I haven’t tried yet (including the ability to run live Jupyter Notebooks, and a way to create a simple GUI for exposing parameters in your code).</p>
<p>For now this is set up as a post-publication (or post-preprint) service and isn’t intended for actually working on the analyses (the computing power you have access to is limited and has a quota), but as cloud computing continues to become more convenient and affordable I could eventually see entire scientific workflows moving online.</p>
Live-blogging SfN 20172017-11-16T00:00:00-05:00http://blog.chrisbaldassano.com//2017/11/16/sfn<p>[I wrote these posts during the Society for Neuroscience 2017 meeting, as one of the Official Annual Meeting Bloggers. These blog posts originally appeared on SfN’s Neuronline platform.]</p>
<h2 id="supereeg-ecog-data-breaks-free-from-electrodes">SuperEEG: ECoG data breaks free from electrodes</h2>
<p>The “gold standard” for measuring neural activity in human brains is ECoG (electrocorticography), using electrodes implanted directly onto the surface of the brain. Unlike methods that measure blood oxygenation (which have poor temporal resolution) or that measure signals on the scalp (which have poor spatial resolution), ECoG data has both high spatial and temporal precision. Most of the ECoG data that has been collected comes from patients who are being treated for epileptic seizures and have had electrodes implanted in order to determine where the seizures are starting.</p>
<p>The big problem with ECoG data, however, is that each patient typically only has about 150 implanted electrodes, meaning that we can only measure brain activity in 150 spots (compared to about 100,000 spots for functional MRI). It would seem like there is no way around this - if you don’t measure activity from some part of the brain, then you can’t know anything about what is happening there, right?</p>
<p>Actually, you can, or at least you can guess! <a href="http://www.context-lab.com/">Lucy Owen, Andrew Heusser, and Jeremy Manning</a> have developed a new analysis tool called SuperEEG, based on the idea that measuring from one region of the brain can actually tell you a lot about another unmeasured region, if the two regions are highly correlated (or anti-correlated). By using many ECoG subjects to learn the correlation structure of the brain, we can extrapolate from measurements in a small set of electrodes to estimate neural activity across the whole brain.</p>
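<p>The statistical heart of this idea can be written compactly: given a full-brain covariance structure learned across many patients, the activity at unmeasured locations can be estimated as the conditional mean given the measured electrodes. The snippet below is only meant to convey that flavor - the actual SuperEEG estimator and the way it learns the covariance are more sophisticated than this toy version.</p>
<pre><code class="language-python">
import numpy as np

def estimate_unmeasured(full_cov, measured_idx, unmeasured_idx, measured_activity):
    """Estimate activity at unmeasured brain locations from measured electrodes.

    full_cov: full-brain covariance learned across patients
    measured_activity: (timepoints x n_measured) recordings from one patient
    Returns a (timepoints x n_unmeasured) array of estimated activity.
    """
    S_mm = full_cov[np.ix_(measured_idx, measured_idx)]    # measured-measured
    S_um = full_cov[np.ix_(unmeasured_idx, measured_idx)]  # unmeasured-measured
    # Conditional mean of a zero-mean Gaussian: activity @ S_mm^-1 @ S_um^T
    return measured_activity @ np.linalg.solve(S_mm, S_um.T)
</code></pre>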
<p><img src="/blog/public/SuperEEG.png" alt="Super EEG" />
Figure from <a href="https://app.box.com/s/j1w3985f3gc4t7pkb60rd9y5w7la210t">their SfN poster</a></p>
<p>This breaks ECoG data free from little islands of electrodes and allows us to carry out analyses across the brain. Not all brain regions can be well-estimated using this method (due to the typical placement locations of the electrodes and the correlation structure of brain activity), but it works surprisingly well for most of the cortex:</p>
<p><img src="/blog/public/SuperEEG2.jpg" alt="Super EEG2" /></p>
<p>This could also help with the original medical purpose of implanting these electrodes, by allowing doctors to track seizure activity in 3D as it spreads through the brain. It could even be used to help surgeons choose the locations where electrodes should be placed in new patients, to make sure that seizures can be tracked as broadly and accurately as possible.</p>
<h2 id="hippocampal-subregions-growing-old-together">Hippocampal subregions growing old together</h2>
<p>To understand and remember our experiences, we need to think both big and small. We need to keep track of our spatial location at broad levels (“what town am I in?”) all the way down to precise levels (“what part of the room am I in?”). We need to keep track of time on scales from years to fractions of a second. We need to access our memories at both a coarse grain (“what do I usually bring to the beach?”) and a fine grain (“remember that time I forgot the sunscreen?”).</p>
<p>Data from both rodents and humans has suggested that different parts of the hippocampus keep track of different levels of granularity, with posterior hippocampus focusing on the fine details and anterior hippocampus seeing the bigger picture. Iva Brunec and her co-authors recently <a href="https://www.biorxiv.org/content/early/2017/08/24/179655">posted a preprint</a> showing that temporal and spatial correlations change along the long axis of the hippocampus - in anterior hippocampus all the voxels are similar to each other and change slowly over time, while in posterior hippocampus the voxels are more distinct from each other and change more quickly over time.</p>
<p>In their latest work, they look at how these functional properties of the hippocampus change over the course of our lives. Surprisingly, this anterior-posterior distinction actually increases with age, becoming the most dramatic in the oldest subjects in their sample.</p>
<p><img src="/blog/public/Iva1.png" alt="Iva1" />
The interaction between the two halves of the hippocampus also changes - while in young adults activity timecourses in the posterior and anterior hippocampus are uncorrelated, they start to become anti-correlated in older adults, perhaps suggesting that the complementary relationship between the two regions has started to break down. Also, their functional connectivity with the rest of the brain shifts over time, with posterior hippocampus decoupling from posterior medial regions and anterior hippocampus increasing its coupling to medial prefrontal regions.</p>
<p><img src="/blog/public/Iva2.png" alt="Iva2" />
These results raise a number of intriguing questions about the cause of these shifts, and their impacts on cognition and memory throughout the lifespan. Is this shift toward greater coupling with regions that represent coarse-grained schematic information compensating for degeneration in regions that represent details? What is the “best” balance between coarse- and fine-timescale information for processing complex stimuli like movies and narratives, and at what age is it achieved? How do these regions mature before age 18, and how do their developmental trajectories vary across people? By following the analysis approach of Iva and her colleagues on new datasets, we should hopefully be able to answer many of these questions in future studies.</p>
<h2 id="the-science-of-scientific-bias">The Science of Scientific Bias</h2>
<p>This year’s David Kopf lecture on Neuroethics was given by Dr. Jo Handelsman, entitled <a href="http://www.abstractsonline.com/pp8/#!/4376/presentation/1298">“The Fallacy of Fairness: Diversity in Academic Science”</a>. Dr. Handelsman is a microbiologist who recently spent three years as the Associate Director for Science at the White House Office of Science and Technology Policy, and has also led some of the most well-known <a href="http://www.pnas.org/content/109/41/16474.abstract">studies of gender bias in science</a>.</p>
<p><img src="/blog/public/IMG_1255.JPG" alt="IMG_1255.JPG" /></p>
<p>She began her talk by pointing out that increasing diversity in science is not only a moral obligation, but also has major potential benefits for scientific discovery. Diverse groups have been shown to produce more effective, innovative, and well-reasoned solutions to complex problems. I think this is especially true in psychology - if we are trying to create theories of how all humans think and act, we shouldn’t be building teams composed of a thin slice of humanity.</p>
<p>Almost all scientists agree in principle that we should not be discriminating based on race or gender. However, the process of recruiting, mentoring, hiring, and promotion relies heavily on “gut feelings” and subtle social cues, which are highly susceptible to implicit bias. Dr. Handelsman covered a wide array of studies over the past several decades, ranging from observational analyses to randomized controlled trials of scientists making hiring decisions. I’ll just mention two of the studies she described which I found the most interesting:</p>
<ul>
<li>
<p>How is it possible that people can make biased decisions, but still believe they were objective when they reflect on those decisions? A fascinating study by <a href="http://journals.sagepub.com/doi/abs/10.1111/j.0956-7976.2005.01559.x">Uhlmann & Cohen</a> showed that subjects rationalized biased hiring decisions after the fact by redefining their evaluation criteria. For example, when choosing whether to hire a male candidate or a female candidate, who both had (randomized) positive and negative aspects to their resumes, the subjects would decide that the positive aspects of the male candidate were the most important for the job and that he therefore deserved the position. This is interestingly similar to the way that <a href="https://fivethirtyeight.com/features/science-isnt-broken/">p-hacking</a> distorts scientific results, and the solution to the problem may be the same. Just as <a href="https://cos.io/prereg/">pre-registration</a> forces scientists to define their analyses ahead of time, Uhlmann & Cohen showed that forcing subjects to commit to their importance criteria before seeing the applications eliminated the hiring bias.</p>
</li>
<li>
<p>Even relatively simple training exercises can be effective in making people more aware of implicit bias. Dr. Handelsman and her colleagues created a set of short videos called <a href="https://academics.skidmore.edu/blogs/vids/">VIDS (Video Interventions for Diversity in STEM)</a>, consisting of narrative films illustrating issues that have been studied in the implicit bias literature, along with expert videos describing the findings of these studies. They then ran <a href="https://academics.skidmore.edu/blogs/vids/files/2017/02/Pietrietal_2017.pdf">multiple experiments</a> showing that these videos were effective at educating viewers, and made them more likely to notice biased behavior. I plan on making these videos required viewing in my lab, and would encourage everyone working in STEM to watch them as well (the narrative videos are only 30 minutes total).</p>
</li>
</ul>
<p><img src="/blog/public/IMG_1261.JPG" alt="IMG_1261.JPG" /></p>
<h2 id="drawing-out-visual-memories">Drawing out visual memories</h2>
<p>If you close your eyes and try to remember something you saw earlier today, what exactly do you see? Can you visualize the right things in the right places? Are there certain key objects that stand out the most? Are you misremembering things that weren’t really there?</p>
<p>Visual memory for natural images has <a href="http://jov.arvojournals.org/article.aspx?articleid=2191865">typically been studied</a> with recognition experiments, in which subjects have to recognize whether an image is one they have seen before or not. But recognition is quite different from freely recalling a memory (without being shown it again), and can involve different neural mechanisms. How can we study visual recall, testing whether the mental images people are recalling are correct?</p>
<p>One option is to have subjects give verbal descriptions of what they remember, but this might not capture all the details of their mental representation, such as the precise relationships between the objects or whether their imagined viewpoint of the scene is correct. Instead, NIMH researchers Elizabeth Hall, Wilma Bainbridge, and Chris Baker had subjects draw photographs from memory, and then analyzed the contents of those drawings.</p>
<p><img src="/blog/public/drawing.jpg" alt="drawing.JPG" /></p>
<p>This is a creative but challenging approach, since it requires quantitatively characterizing how well the drawings (all 1,728!) match the original photographs. They crowdsource this task using Amazon Mechanical Turk, getting high-quality ratings that include: how well the original photograph can be identified based on the drawing, which objects were correctly drawn, which objects were falsely remembered as being in the image, and how close the objects were to their correct locations. There are also “control” drawings made by subjects with full information (who get to look at the image while they draw) or minimal information (just a category label) that were rated for comparison.</p>
<p>The punchline is that subjects can remember many of the images, and produce surprisingly detailed drawings that are quite similar to those drawn by the control group that could look at the pictures. They reproduce the majority of the objects, place them in roughly the correct locations, and draw very few incorrect objects, making it very easy to match the drawings with the original photographs. The only systematic distortion is that the drawings depicted the scenes as being slightly farther away than they actually were, which nicely replicates previous results on <a href="http://www.scholarpedia.org/article/Boundary_extension">boundary extension</a>.</p>
<p>This is a neat task that subjects are remarkably good at (which is not always the case in memory experiments!), and could be a great tool for investigating the neural mechanisms of naturalistic perception and memory. Another <a href="http://www.abstractsonline.com/pp8/#!/4376/presentation/26803">intriguing SfN presentation</a> showed that it is possible to have subjects draw while in an fMRI scanner, allowing this paradigm to be used in neuroimaging experiments. I wonder if this approach could also be extended into drawing comic strips of remembered events that unfold over time, or to illustrate mental images based on stories told through audio or text.</p>
Reality, now in extra chunky2017-08-02T00:00:00-04:00http://blog.chrisbaldassano.com//2017/08/02/chunking<p>Our brains receive a constant stream of information about the world through our senses. Sci-fi depictions of mind-reading or memory implants often show our experiences and memories as a continuous, unbroken filmstrip.</p>
<p><img src="/blog/public/finalcut.gif" alt="Final Cut" />
<em>From <a href="https://en.wikipedia.org/wiki/The_Final_Cut_(2004_film)">The Final Cut, 2004</a></em></p>
<p>But if I ask you to describe what has happened to you today, you will usually think in terms of <em>events</em> - snippets of experience that make sense as a single unit. Maybe you ate breakfast, and then brushed your teeth, and then got a phone call. You divide your life into these separate pieces, like how separate memory orbs get created in the movie Inside Out.</p>
<p><img src="/blog/public/insideout.gif" alt="Inside Out" />
<em>From <a href="https://en.wikipedia.org/wiki/Inside_Out_(2015_film)">Inside Out, 2015</a></em></p>
<p>This grouping into events is an example of <em>chunking</em>, a common concept in cognitive psychology. It is much easier to put together <a href="/blog/blog/2016/04/13/configural/">parts into wholes</a> and then think about only the wholes (like objects or events), rather than trying to keep track of all the parts separately. The idea that people automatically perform this kind of event chunking has been relatively well studied, but there are lots of things we don’t understand about how this happens in the brain. Do we directly create event-level chunks (spanning multiple minutes) or do we build up longer and longer chunks in different brain regions? Does this chunking happen within our perceptual systems, or are events constructed afterwards by some separate process? Are the chunks created during perception the same chunks that get stored into long-term memory?</p>
<p>I have a <a href="https://authors.elsevier.com/a/1VUOe3BtfGld9H">new paper</a> out today that takes a first stab at these questions, thanks to the help of an all-star team of collaborators: <a href="http://jchenlab.johnshopkins.edu/">Janice Chen</a>, Asieh Zadbood (who also has <a href="http://www.biorxiv.org/content/early/2017/01/30/081208">a very cool and related preprint</a>), <a href="http://pillowlab.princeton.edu/">Jonathan Pillow</a>, <a href="http://www.hassonlab.com/">Uri Hasson</a>, and <a href="https://compmem.princeton.edu/">Ken Norman</a>.</p>
<p>The basic idea is simple: if a brain region represents event chunks, then its activity should go through periods of stability (within events) punctuated by sudden shifts (at boundaries between events). I developed an analysis tool that is able to find this kind of structure in fMRI data, determining how many of these shifts happen and when they happen.</p>
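<p>The tool is the HMM event segmentation model released in brainIAK. Applying it to a region’s (timepoints x voxels) data looks roughly like this - a toy sketch assuming brainIAK’s EventSegment interface, with the number of events chosen by hand rather than by the model-comparison procedure described in the paper:</p>
<pre><code class="language-python">
import numpy as np
from brainiak.eventseg.event import EventSegment

# Toy data: three stable spatial patterns, 7 timepoints each, plus noise
rng = np.random.default_rng(0)
patterns = rng.standard_normal((3, 10))
bold = np.vstack([np.tile(p, (7, 1)) for p in patterns])
bold += 0.1 * rng.standard_normal(bold.shape)

hmm = EventSegment(3)   # fit a 3-event model
hmm.fit(bold)

# Most likely event for each timepoint; boundaries are where the label changes
event_labels = np.argmax(hmm.segments_[0], axis=1)
boundaries = np.where(np.diff(event_labels))[0] + 1
print(boundaries)       # should recover shifts near timepoints 7 and 14
</code></pre>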
<p>The first main result is that we see event chunking in lots of brain regions, and the length of the events seems to build up from short events (seconds or less) in early sensory regions to long events (minutes) in higher-level regions. This suggests that events are an intrinsic part of how we experience the world, and that events are constructed through multiple stages of a hierarchy.</p>
<p><img src="/blog/public/Timescales_nocallout.png" alt="Timescales" /></p>
<p>The second main result is that right at the end of these high-level events, we see lots of activity in brain regions that store long-term memories, like the hippocampus. Based on some additional analyses, we argue that these activity spikes are related to storing these chunks so that we can remember them later. If this is true, then our memory system is less like a DVR that constantly records our life, and more like a library of individually-wrapped events.</p>
<p>There are many (<em>many</em>) other analyses in the paper, which explains why it took us about two years to put together in its entirety. One fun result at the end of the paper is that people who already know a story actually start their events a little earlier than people hearing a story for the first time. This means that if I read you a story in the scanner, I can actually make a guess about whether or not you’ve heard this story before by looking at your brain activity. This guessing will not be very accurate for an individual person, so I’m not ready to go into business with <a href="http://www.noliemri.com/">No Lie MRI</a> just yet, but maybe in the near future we could have a scientific way to detect <a href="https://media.netflix.com/en/press-releases/netflix-cheating-is-on-the-rise-globally-and-shows-no-signs-of-stopping">Netflix cheaters</a>.</p>
Parenting the last human generation2017-06-05T00:00:00-04:00http://blog.chrisbaldassano.com//2017/06/05/parent<p>For most of human history, parents had a pretty good idea of the kind of world they were preparing their children for. Children would be trained to take over their parents’ business, or apprentice in a local trade, or aim for a high-status marriage. Even once children began to have more choice in their futures, it was easy to predict what kind of skills they would need to succeed: reading and handwriting, arithmetic, basic knowledge of science and history.</p>
<p>As technological progress has accelerated, this predictability is starting to break down. Internet search companies didn’t even exist when most of Google’s 70,000 employees were born, and there is no way their parents could have guessed the kind of work they would eventually be doing. Some of the <a href="https://en.wikipedia.org/wiki/Skrillex">best-known musicians in the world</a> construct songs using software, and don’t play any of the instruments that would have been offered to them in elementary school.</p>
<p>Given this uncertainty, what kinds of skills and interests should I encourage for my own children? Practicing handwriting, as I spent hours doing in school, would almost certainly be a waste. Same goes for mental math beyond small numbers or estimation, now that everyone carries a calculator. Given how computers are slowly seeping into every object in our house, programming seems like a safe answer, until you hear that researchers are currently building systems that can <a href="https://research.googleblog.com/2017/05/using-machine-learning-to-explore.html">design themselves</a> based on training examples.</p>
<p>Maybe in a couple decades, being creative and artistic will be more important than having STEM skills. Artificial intelligence is still pretty <a href="https://arstechnica.com/the-multiverse/2016/06/an-ai-wrote-this-movie-and-its-strangely-moving/">laughably bad</a> at writing stories, and AI-based <a href="http://genekogan.com/works/style-transfer/">art tools</a> still require a human at the helm. Even if that changes by the time my kids are starting their careers, there could still be a market for “artisan,” human-made art. Having good emotional intelligence also seems like it will always be helpful, in any world where we have to live with others and with ourselves.</p>
<p>As confusing as this is for me, it will be immensely harder for my children to be parents. I think of this current generation of toddlers as the last human generation - not because humanity is likely to wipe itself out within the next 20 years (though things are looking <a href="https://www.washingtonpost.com/news/politics/wp/2017/06/01/all-the-reasons-that-trumps-withdrawal-from-the-paris-climate-agreement-doesnt-make-sense/?utm_term=.f76783e873a8">increasingly worrying</a> on that front), but because I expect that by then humans and technology will start to become inseparable. Even now, being separated from our cell phones feels disconcerting - we have offloaded so much of our thinking, memory, and conversations to our devices that we feel smaller without them. By the time my grandchildren are teenagers, I expect that being denied access to technology will be absolutely crippling, to the point that they no longer have a coherent identity <a href="http://nautil.us/issue/28/2050/dont-worry-smart-machines-will-take-us-with-them">as a human alone</a>.</p>
<p>When a software update could potentially make any skill obsolete, what skills should we cultivate?</p>
Connecting past and present in visual perception2016-10-25T00:00:00-04:00http://blog.chrisbaldassano.com//2016/10/25/twonet<blockquote>
<p>There are two kinds of people in the world—those who divide everything in the world into two kinds of things and those who don’t.<br />
Kenneth Boulding</p>
</blockquote>
<p>Scientists love dividing the world into categories. Whenever we are trying to study more than 1 or 2 things at a time, our first instinct is to sort them into boxes based on their similarities, whether we’re looking at animals, rocks, stars, or diseases.</p>
<p>There have been many proposals on how to divide up the human visual system: regions processing coarse vs. fine structure, or small objects vs. big objects, or <a href="/blog/blog/2016/05/20/corner/">central vs. peripheral</a> information. In my new paper, <a href="http://eneuro.org/content/3/5/ENEURO.0178-16.2016">Two distinct scene processing networks connecting vision and memory</a>, I argue that regions activated by photographic images can be split into two different networks.
<br />
<br /></p>
<p><img src="/blog/public/visnet.png" alt="Visual network" />
<img src="/blog/public/visnet_ex.png" alt="Visual network example" /></p>
<p>The first group of scene-processing regions (near the back of the brain) care only about the image that is currently coming in through your eyes. They are looking for visual features like walls, landmarks, and architecture that will help you determine the structure of the environment around you. But they don’t try to keep track of this information over time - as soon as you move your eyes, they forget all about the last view of the world.</p>
<p><br />
<br /></p>
<p><img src="/blog/public/memnet.png" alt="Memory navigation network" />
<img src="/blog/public/photosynth_5fps.gif" alt="Memory network example" /></p>
<p>The second group (a bit farther forward) uses the information from the first group to build up a stable model of the world and your place in it. They care less about exactly where your eyes are pointed and more about where you are in the world, creating a 3D model of the room or landscape around you and placing you on a map of what other places are nearby. These regions are strongly linked to your long-term memory system, and show the highest activity in familiar environments.</p>
<p>I am very interested in this second group of regions that integrate information over time - what exactly are they keeping track of, and how do they get information in and out of long-term memory? I have a new manuscript with my collaborators at Princeton (currently working its way through the publication gauntlet) showing that these regions build <a href="http://biorxiv.org/content/early/2016/10/14/081018">abstract representations of events in movies and audio narration</a>, and am running a new experiment looking at how event templates we learn over our lifetimes are used to help build these event representations.</p>
How deep is the brain?2016-07-08T00:00:00-04:00http://blog.chrisbaldassano.com//2016/07/08/deep<p>Recent AI advances in <a href="https://research.googleblog.com/2015/08/the-neural-networks-behind-google-voice.html">speech recognition</a>, <a href="https://deepmind.com/alpha-go">game-playing</a>, <a href="http://www.nytimes.com/2014/11/18/science/researchers-announce-breakthrough-in-content-recognition-software.html">image understanding</a>, and <a href="https://research.googleblog.com/2015/07/how-google-translate-squeezes-deep.html">language translation</a> have all been based on a simple concept: multiply some numbers together, set some of them to zero, and then repeat. Since “multiplying and zeroing” doesn’t inspire investors to start throwing money at you, these models are instead presented under the much loftier banner of “deep neural networks.” Ever since the first versions of these networks were invented by Frank Rosenblatt in 1957, there has been controversy over how “neural” these models are. The New York Times proclaimed these first programs (which could accomplish tasks as astounding as distinguishing shapes on the left side versus shapes on the right side of a paper) to be “the first device to think as the human brain.”</p>
<p><img src="/blog/public/nytimes_crop.png" alt="NYT" /></p>
<p>Deep neural networks remained mostly a fringe idea for decades, since they typically didn’t perform very well, due (in retrospect) to the limited computational power and small dataset sizes of the era. But over the past decade these networks have begun to rival human capabilities on highly complicated tasks, making it more plausible that they could really be emulating human brains. We’ve also started to get much better data about how the brain itself operates, so we can start to make some comparisons.</p>
<p>At least for visual images, a consensus started to emerge about what these deep neural networks were actually doing, and how it matched up to the brain. These networks operate as a series of “multiply and zero” filters, which build up more and more complicated descriptions of the image. The first filter looks for lines, the second filter combines the lines into corners and curves, the third filter combines the corners into shapes, etc. If we look in the visual system of the brain, we find a similar layered structure, with the early layers of the brain doing something like the early filters of the neural networks, and later layers of the brain looking like the later filters of the neural networks.</p>
<p><img src="/blog/public/layers.png" alt="NN and brain layers" />
Zeiler & Fergus 2014, Güçlü & van Gerven 2015</p>
<p>It seemed like things were mostly making sense, until two recent developments:</p>
<ol>
<li>The best-performing networks started requiring a <em>lot</em> of filters. For example, one of the <a href="http://arxiv.org/abs/1603.05027">current state-of-the-art networks</a> uses 1,001 layers. Although we don’t know exactly how many layers the brain’s visual system has, it is almost certainly less than 100.</li>
<li>These networks actually don’t get that much worse if you randomly remove layers from the <em>middle</em> of the chain. This makes very little sense if you think that each filter is combining shapes from the previous filter - it’s like saying that you can skip one step of a recipe and things will still work out fine.</li>
</ol>
<p>Should we just throw up our hands and say that these networks just have way more layers than the brain (they’re “deeper”) and we can’t understand how they work? <a href="http://arxiv.org/abs/1604.03640">Liao and Poggio</a> have a recent preprint that proposes a possible solution to both of these issues: maybe the later layers are all doing the same operation over and over, so that the filter chain looks like this:</p>
<p><img src="/blog/public/unrolled.png" alt="Feedforward NN" /></p>
<p>Why would you want to repeat the same operation many times? Often it is a lot easier to figure out how to make a small step toward your goal and then repeat, instead of going directly to the goal. For example, imagine you want to set a microwave for twelve minutes, but all the buttons are unlabeled and in random positions. Typing 1-2-0-0-GO is going to take a lot of trial and error, and if you mess up in the middle you have to start from scratch. But if you’re able to find the “add 30 seconds” button, you can just hit it 24 times and you’ll be set. This also shows why skipping a step isn’t a big deal - if you hit the button 23 times instead, it shouldn’t cause major issues.</p>
<p>But if the last layers are just the same filter over and over, we can actually just replace them with a single filter in a loop, that takes its output and feeds it back into its input. This will act like a deep network, except that the extra layers are occurring in <em>time</em>:</p>
<p><img src="/blog/public/rnn.png" alt="Recurrent NN" /></p>
<p>So Liao and Poggio’s hypothesis is that <strong>very deep neural networks are like a brain that is moderately deep in both space and time.</strong> The true depth of the brain is hidden, since even though it doesn’t have a huge number of regions it gets to run these regions in loops over time. Their paper has some experiments to show that this is plausible, but it will take some careful comparisons with neuroscience data to say if they are correct.</p>
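<p>The weight-tying idea is easy to state in code: a deep chain whose layers all share the same weights computes exactly the same thing as a single layer run in a loop. Here is a toy numpy version of that equivalence (just an illustration of the concept, not their actual model):</p>
<pre><code class="language-python">
import numpy as np

def step(h, W):
    """One 'filter': a residual update with a ReLU nonlinearity."""
    return h + np.maximum(W @ h, 0.0)

def deep_feedforward(x, Ws):
    """A chain of layers, one weight matrix per layer."""
    h = x
    for W in Ws:
        h = step(h, W)
    return h

def recurrent(x, W, n_steps):
    """A single layer whose output is fed back into its input."""
    h = x
    for _ in range(n_steps):
        h = step(h, W)
    return h

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((32, 32))
x = rng.standard_normal(32)

# A 10-layer chain with shared weights is exactly a 10-step recurrent loop
assert np.allclose(deep_feedforward(x, [W] * 10), recurrent(x, W, 10))
</code></pre>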
<p>Of course, it seems inevitable that at some point in the near future we will in fact start building neural networks that are “deeper” than the brain, in one way or another. Even if we don’t discover new models that can learn better than a brain can, computers have lots of unfair advantages - they’re not limited to a 1500 cm<sup>3</sup> skull, they have direct access to the internet, they can instantly teach each other things they’ve learned, and they never get bored. Once we have a neural network that is similar in complexity to the human brain but can run on computer hardware, its capabilities might be advanced enough to design an even more intelligent machine on its own, and so on: maybe the “first ultraintelligent machine is the <em>last</em> invention that man need ever make.” <a href="http://mindstalk.net/vinge/vinge-sing.html">(Vernor Vinge)</a></p>
The corner of your eye2016-05-20T00:00:00-04:00http://blog.chrisbaldassano.com//2016/05/20/corner<p>We usually think that our eyes work like a camera, giving us a sharp, colorful picture of the world all the way from left to right and top to bottom. But we actually only get this kind of detail in a tiny window right where our eyes are pointed. If you hold your thumb out at arm’s length, the width of your thumbnail is about the size of your most precise central (also called “foveal”) vision. Outside of that narrow spotlight, both color perception and sharpness drop off rapidly - doing high-precision tasks like reading a word is almost impossible unless you’re looking right at it.</p>
<p>The rest of your visual field is your “peripheral” vision, which has only imprecise information about shape, location, and color. Out here in the corner of your eye you can’t be sure of much, which is used as a constant source of fear and uncertainty in horror movies and the occult:</p>
<blockquote>
<p>What’s that in the mirror, or the corner of your eye?<br />
What’s that footstep following, but never passing by?<br />
Perhaps they’re all just waiting, perhaps when we’re all dead,<br />
Out they’ll come a-slithering from underneath the bed…. <br />
<a href="https://www.youtube.com/watch?v=vJiJZiOaJFQ">Doctor Who “Listen”</a></p>
</blockquote>
<p>What does this peripheral information get used for during visual processing? It was shown over a decade ago (by one of my current mentors, <a href="http://www.weizmann.ac.il/neurobiology/labs/malach/sites/neurobiology.labs.malach/files/2002_04_Hasson_neuron.pdf">Uri Hasson</a>) that flashing pictures in your central and peripheral vision activate different brain regions. The hypothesis is that peripheral information gets used for tasks like determining where you are, learning the layout of the room around you, and planning where to look next. But this experimental setup is pretty unrealistic. In real life we have related information coming into both central and peripheral vision at the same time, which is constantly changing and depends on where we decide to look. Can we track how visual information flows through the brain during natural viewing?</p>
<p>Today a new paper from me and my PhD advisors (<a href="http://vision.stanford.edu/feifeili/">Fei-Fei Li</a> and <a href="http://www.psychology.illinois.edu/people/dmbeck">Diane Beck</a>) is out in the Journal of Vision: <a href="http://jov.arvojournals.org/article.aspx?articleid=2524115">Pinpointing the peripheral bias in neural scene-processing networks during natural viewing (open access)</a>. I looked at fMRI data (collected and shared generously by Mike Arcaro, Sabine Kastner, Janice Chen, and Asieh Zadbood) while people were watching clips from movies and TV shows. They were free to move their eyes around and watch as you normally would, except that they were inside a huge superconducting magnet rather than on the couch (and had less popcorn). We can disentangle central and peripheral information by tracking how these streams flow out of their initial processing centers in visual cortex to regions performing more complicated functions like object recognition and navigation.</p>
<p>We can make maps that show where foveal information ends up (colored orange/red) and where peripheral information ends up (colored blue/purple). I’m showing this on an “inflated” brain surface where we’ve smoothed out all the wrinkles to make it easier to look at:</p>
<p><img src="/blog/public/periphery.png" alt="Foveal and Peripheral regions" /></p>
<p>This roughly matches what we had previously seen with the simpler experiments: central information heads to regions for recognizing objects, letters, and faces, while peripheral information gets used by areas that process environments and big landmarks. But it also reveals some finer structure we didn’t know about before. Some scene processing regions care more about the “near” periphery just outside the fovea and still have access to relatively high-resolution information, while others draw information from the “far” periphery that only provides coarse information about your current location. There are also detectable foveal vs. peripheral differences in the frontal lobe of the brain, which is pretty surprising, since this part of the brain is supposed to be performing abstract reasoning and planning that shouldn’t be all that related to where the information is coming from.</p>
<p>This paper was my first foray into the fun world of movie-watching data, which I’ve become obsessed with during my postdoc. Contrary to what everyone’s parents told them, watching TV doesn’t turn off your brain - you use almost every part of your brain to understand and follow along with the story, and answering questions about videos is such a challenging problem that even the latest computer AIs are pretty terrible at it (though some of <a href="http://vision.stanford.edu/publications.html">my former labmates</a> have started making them better). We’re finding that movies drive much stronger and more complex activity patterns compared to the usual paradigm of flashing individual images, and we’re starting to answer questions raised by cognitive scientists in the 1970s about how complicated situations are understood and remembered - stay tuned!</p>
Cutups and configural processing2016-04-13T00:00:00-04:00http://blog.chrisbaldassano.com//2016/04/13/configural<div class="message">
“The love of complexity without reductionism makes art; the love of complexity with reductionism makes science.” — E.O. Wilson
</div>
<p>In the 1950s William S. Burroughs popularized an art form called the “cut-up technique.” The idea was to take existing stories (in text, audio, or video) and cut them up into pieces, and then recombine them into something new. His creations are a juxtaposition of (often disturbing) imagery, chosen to fit together despite coming from different sources. Here’s a sample from <em>The Soft Machine</em>:</p>
<div class="message">
Police files of the world spurt out in a blast of bone meal, garden tools and barbecue sets whistle through the air, skewer the spectators - crumpled cloth bodies through dead nitrous streets of an old film set - grey luminous flakes falling softly on Ewyork, Onolulu, Aris, Ome, Osteon - From siren towers the twanging notes of fear - Pan God of Panic piping blue notes through empty streets as the berserk time machine twisted a tornado of years and centuries-
</div>
<p>The cut-ups aren’t always coherent in the sense of having an understandable plot - sometimes Burroughs was just aiming to convey an emotion. He attributed an almost mystical quality to cut-ups, saying they could help reveal the hidden meanings in text or even serve as prophecy, since “when you cut into the present the future leaks out.” His experimental film <em>The Cut-Ups</em> was predictably polarizing, with some people finding it mesmerizing and others demanding their money back.</p>
<iframe title="The Cut-Ups" width="480" height="390" src="http://www.youtube.com/embed/Uq_hztHJCM4" frameborder="0"></iframe>
<p>If you jump through the video a bit you’ll see that it isn’t quite as repetitive as it seems during the first minute. (I also think Burroughs would heartily approve of jumping through the movie rather than watching it from beginning to end.)</p>
<p>This idea of combining parts to create something new is alive and well on the internet, especially now that we are starting to amass a huge library of video and audio clips. It’s painstaking work, but there is a whole genre of videos in which clips from public figures are put together to <a href="https://www.youtube.com/watch?v=7CYJ73pVpVc">recreate</a> or <a href="https://www.youtube.com/watch?v=A7vTkbnjJcQ">parody</a> existing songs, or to create <a href="https://www.youtube.com/watch?v=Gg5SwyTvAHw">totally original</a> compositions.</p>
<p>Since the whole can have a meaning that is more than the sum of its parts, our brains must be somehow putting these parts together. This process is referred to as “configural processing,” since understanding what we’re hearing or seeing requires looking not just at the parts but at their configuration. Work from <a href="http://media.wix.com/ugd/b75639_179a3f18a5774393b1bf37e9c1d1cb11.pdf">Uri Hasson’s lab</a> (before I joined as a postdoc) has looked at how meaning gets pieced together throughout a story, and found a network of brain regions that help join sentences together to understand a narrative. They used stimuli very similar to the cut-ups, in which sentences were cut out and then put back together in a random order, and showed that these brain regions stopped responding consistently when the overall meaning was taken away (even though the parts were the same).</p>
<p>Today I (along with my PhD advisors, <a href="http://vision.stanford.edu/feifeili/">Fei-Fei Li</a> and <a href="http://www.psychology.illinois.edu/people/dmbeck">Diane Beck</a>) have a new paper out in Cerebral Cortex, titled <a href="http://cercor.oxfordjournals.org/cgi/reprint/bhw077?ijkey=iUBNzaBCkEO1hN4&keytype=ref">Human-object interactions are more than the sum of their parts (free-access link)</a>. This paper looks at how things get combined across space (rather than time) in the visual system. We were looking specifically at images containing either a person, an object, or both, and tried to find brain regions where a <em>meaningful</em> human-object interaction looked different from just a sum of person plus object.</p>
<p><img src="/blog/public/sum_of_parts.png" alt="SumOfParts" /></p>
<p>In the full paper we look at a number of different brain regions, but some of the most interesting results come from the superior temporal sulcus (an area right behind the top of your ears). This area couldn’t care less about objects by themselves, and doesn’t even care much about people if they aren’t doing anything. But as soon as we put the person and object together in a meaningful way, it starts paying attention, and we can make a better-than-chance guess about what action the person is performing (in the picture you’re currently looking at) just by reading your brain activity from this region. Our current theory about this region is that it is involved in understanding the actions and intentions of other people, as I described in <a href="/blog/blog/2015/02/24/conchords/">a previous post</a>.</p>
<p>Next month I’ll be presenting at <a href="http://memory.psych.upenn.edu/CEMS_2016">CEMS 2016</a> on some new work I’ve been doing with Uri and Ken Norman, where I’m trying to figure out exactly which pieces of a story end up getting combined together and how these combined representations get stored into memory. Working with real stories (like movies and TV shows) is challenging as a scientist, since usually we like our stimuli to be very tightly controlled, but these kinds of creative, meaningful stimuli can give us a window into the most interesting functions of the brain.</p>
<div class="message">
<p>Interviewer: In view of all this, what will happen to fiction in the next twenty-five years?
<p>Burroughs: In the first place, I think there's going to be more and more merging of art and science. Scientists are already studying the creative process, and I think the whole line between art and science will break down and that scientists, I hope, will become more creative and writers more scientific. [...] Science will also discover for us how association blocks actually form.
<p>Interviewer: Do you think this will destroy the magic?
<p>Burroughs: Not at all. I would say it would enhance it.
<p><a href="http://www.theparisreview.org/interviews/4424/the-art-of-fiction-no-36-william-s-burroughs">William S. Burroughs, The Art of Fiction No. 36</a>
</div>
Just how Amazing is The Amazing Race?2015-12-01T00:00:00-05:00http://blog.chrisbaldassano.com//2015/12/01/race<h2 id="the-amazing-race">The Amazing Race</h2>
<p>The <a href="https://en.wikipedia.org/wiki/The_Amazing_Race">Amazing Race</a> is one of the few reality TV shows that managed to survive the bubble of the early 2000s, with good reason. Rather than just trying to play up interpersonal dramas (though there is some of that too), it is set up like a traditional game show with a series of competitions between teams of two, who travel to different cities throughout the world over the course of the show. Eleven teams start out the race, and typically the last team to finish each day’s challenges gets booted from the show until only three teams are left. These three teams then have a final day of competition, with the winner being awarded $1 million.</p>
<p>Winning first place on any day before the last one doesn’t matter much (though you get a small prize and some bragging rights), which is interesting, since it means that it is possible for the winning team to have never come in first place before the final leg. This got me wondering: if we think of the Race as an experiment which is trying to identify the best team, how good is it? What if we just gave teams a point for every first place win, and then saw which one got the most points, like a baseball series?</p>
<h2 id="modeling-the-race">Modeling the Race</h2>
<p>To try to answer this question, I built a simple model of the Race. I assume that each team has some fixed skill level (sampled from a standard normal distribution), and then on each leg their performance is the sum of this intrinsic skill and some randomness (sampled from another normal with varying width). So every leg, the ranking of the teams will be their true skill ranking, plus some randomness (and there can be a lot of randomness in the race). Fans of the show will know that this is a very simplified model of the race (the legs aren’t totally independent, the teams can strategically interfere with each other, etc.) but this captures the basic idea. I ran simulated races 10,000 times for each level of randomness.</p>
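<p>The simulation itself is only a few lines. The code linked at the bottom of this post is MATLAB, so treat this as a rough Python re-sketch of the model rather than the original script; it returns the true-skill rank of the winning team, which is the measure described next.</p>
<pre><code class="language-python">
import numpy as np

rng = np.random.default_rng(0)

def simulate_race(n_teams=11, sigma=2.2):
    """One simulated season: per-leg performance = fixed skill + Gaussian leg noise."""
    skill = rng.standard_normal(n_teams)   # higher = more skilled team
    remaining = list(range(n_teams))

    # The last-place team is eliminated each leg until only three are left
    while len(remaining) > 3:
        perf = skill[remaining] + sigma * rng.standard_normal(len(remaining))
        remaining.pop(int(np.argmin(perf)))

    # Final leg: the best performer among the last three wins the million
    perf = skill[remaining] + sigma * rng.standard_normal(len(remaining))
    winner = remaining[int(np.argmax(perf))]

    # Return the winner's true-skill rank (1 = most skilled team in the field)
    return int(np.sum(skill > skill[winner])) + 1

ranks = [simulate_race(sigma=2.2) for _ in range(10000)]
print("average true-skill rank of the winner:", np.mean(ranks))
</code></pre>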
<p>We can measure how effective the Race was at picking a winner, by seeing what true skill rank the winning team had. So if the team with the highest skill (number 1) wins, that means the race did a good job. If a team with a low skill rank (like 10) wins, then the race did a very bad job of picking the million-dollar winner. This plot shows the rank of the winning team, compared to chance ((1+11)/2=6).</p>
<p><img src="/blog/public/race/elim_rank.png" alt="Winning rank using loser-elimination" /></p>
<p>This actually looks surprisingly good! Even with lots of leg randomness (more than the actual skill differences between the teams), a team with a relatively high rank tends to win. Once the randomness gets to be an order of magnitude bigger than the differences between teams, the winner starts getting close to random.</p>
<h2 id="improving-the-race">Improving the Race</h2>
<p>But how good is this relative to a simpler kind of competition, where the winner is the team with the most first-place wins? Rather than eliminating teams, all teams race all 9 legs, and the team coming in first the most wins the prize (ties are broken based on which team won most recently). Would this do better or worse?</p>
<p><img src="/blog/public/race/elim_first_rank.png" alt="Winning rank using loser-elimination or first place wins" /></p>
<p>Turns out this is a little bit better! In general the rank of the winning team tends to be higher, meaning that a “more deserving” team won the money. But the size of the gap depends on how much randomness there is in each leg of the race. Which point along these curves corresponds to the actual TV show?</p>
<p>To answer this, I took the per-leg rankings from the <a href="http://amazingrace.wikia.com/wiki/Main_Page">Amazing Race Wikia</a> from the past 10 seasons. Yes, there are people way more obsessed with this show than me, who have put together databases of stats from each season. I measured how consistent the rankings were from each leg of the race. If there wasn’t any randomness, we’d expect these to have a perfect (Kendall) correlation, while if each leg is just craziness for all teams then the correlation should be near zero. I found that this correlation varied a bit across seasons, but had a mean of 0.0992. Comparing this to the same calculation from the model, this corresponds to a noise level of about sigma=2.2.</p>
<p><img src="/blog/public/race/tau_rank.png" alt="Leg ranking correlations for each noise level" /></p>
<p>At this level of randomness, there is about a 10% advantage for the count-first-place-wins format: 37.4% of the time it picks a better team to win the money, while 28.5% of the time the current elimination setup picks a better team (they pick the same team 34.1% of the time).</p>
<p><img src="/blog/public/race/frac_compare.png" alt="Comparison of methods at sigma=2.2" /></p>
<p>Of course there are some disadvantages to counting first place wins: it requires all teams to run all legs (which is logistically difficult and means we get to know each team less) and the winner might be locked-in before the final leg (ruining the suspense of the grand finale they usually have set up for the final tasks). This is likely a general tradeoff in games like this, between being fair (making the right team more likely to win) and being exciting (keeping the winner more uncertain until the end). As a game show, The Amazing Race probably makes the right choice (entertainment over fairness) but for more serious matters (political debate performance?) maybe we should pay attention to the winner of each round rather than the loser.</p>
<p>All the MATLAB code and ranking data is available on <a href="https://bitbucket.org/cbaldassano/amazing-race">my bitbucket</a>.</p>
The Big Data bust of the 1500s: lessons from the first scientific data miner2015-05-11T00:00:00-04:00http://blog.chrisbaldassano.com//2015/05/11/bigdata<p><img src="/blog/public/geocentric.png" alt="Geoheliocentric Model" /></p>
<p>The Big Data craze is in full swing within many scientific fields, especially neuroscience. Since we can’t hope to understand the tangled web of the brain by looking at only one tiny piece, groups have started to amass huge datasets describing brain <a href="http://neurosynth.org/">function</a> and <a href="http://www.nytimes.com/2015/01/11/magazine/sebastian-seungs-quest-to-map-the-human-brain.html">connections</a>. The idea is that, if we can get enough careful measurements all together, then we can have computers search for patterns that explain as much of the data as possible.</p>
<p>This approach would have made perfect sense to <a href="http://en.wikipedia.org/wiki/Tycho_Brahe">Tycho Brahe</a>, a Danish astronomer born in 1546. Although people have been studying the skies since the dawn of civilization, Brahe was the first to make a detailed catalog of stellar and planetary positions.</p>
<p><a href="http://www.fis.cinvestav.mx/~lmontano/sciam/scientificamerican0114-72.pdf">Scientific American’s description</a> of his research program makes it clear that this was one of the first Big Data science projects:</p>
<blockquote>
<p>Brahe was a towering figure. He ran a huge research program with a castlelike observatory, a NASA-like budget, and the finest instruments and best assistants money could buy. […] Harvard University historian Owen Gingerich often illustrates Brahe’s importance with a mid-17th-century compilation by Albert Curtius of all astronomical data gathered since antiquity: the great bulk of two millennia’s worth of data came from Brahe.</p>
</blockquote>
<p>Brahe then announced a model of planetary motion that fit his vast dataset exactly. You could use it to predict precisely where the stars and planets would be in the sky tomorrow. It relied on a fancy <a href="http://mathworld.wolfram.com/ProsthaphaeresisFormulas.html">prosthaphaeresis</a> algorithm that allowed for the computation of a massive number of multiplications. The only problem was that it was deeply, fundamentally wrong.</p>
<p>It was called the Geoheliocentric Model, since it proposed that the sun orbited the stationary Earth and the other planets orbited the sun. It was attractive on philosophical, scientific, and intuitive grounds (of course the Earth isn’t moving, what could possibly power such a fast motion of such a heavy object?). And it illustrates a critical problem with the data-mining approach to science: just because you have a model that predicts a pattern doesn’t mean that the model corresponds to reality.</p>
<p>What might be needed is not just more data, or more precise data, but new hypotheses that drive the collection of entirely different types of data. It doesn’t mean that Big Data isn’t going to be part of the solution (most neuroimaging datasets have been laughably small so far), but simply performing pattern recognition on larger and larger datasets doesn’t guarantee that we’re getting closer to the truth. The geoheliocentric model was eventually brought down not with bigger datasets, but by a targeted experiment looking at small annual <a href="http://en.wikipedia.org/wiki/Aberration_of_light">patterns of stellar motion</a>.</p>
<p>Interestingly, there is a clear counterexample to my argument in the work of <a href="http://web.mit.edu/yamins/www/">Dan Yamins</a>, a postdoc with <a href="http://dicarlolab.mit.edu/">Jim DiCarlo</a>. Dan has shown that a neural network model that learns to label objects in a large set of images ends up looking a lot like the visual processing regions of the brain (in terms of its functional properties). This is surprising, since you could imagine that there might be lots of other ways to understand images.</p>
<p>I wonder if this works because the brain is itself a Big Data mining machine, trained up through childhood to build models of our experiences. Then a system that finds the strongest patterns in big datasets of experiences (images, videos, audio) might end up with the same solution as the brain. Or maybe our neural network models are starting to approximate the broad functional properties of the brain, which makes them a good hypothesis-driven model for finding patterns (rather than just blind data mining). As <a href="http://www.psychology.ucsd.edu/people/profiles/jwixted.html">John Wixted</a> stressed at the <a href="http://memory.psych.upenn.edu/CEMS_2015">CEMS</a> conference last week, hypothesis-free data analysis has a seductive purity, but the true value of datasets (regardless of their size) comes only through the lens of carefully constructed ideas.</p>
Seven things I learned as a PhD student2015-03-31T00:00:00-04:00http://blog.chrisbaldassano.com//2015/03/31/phdadvice<p>Doing great research is tough. There are so many factors outside of your control: experiments not panning out, unfair reviewers, competing labs, limited funding sources. I’ve tried to distill down some of the strategies that worked well for me and my labmates (these are most relevant to my background in science/engineering, but some might apply to other fields as well):</p>
<h2 id="get-your-hands-dirty">Get your hands dirty</h2>
<p><img src="/blog/public/frizzle.jpg" alt="Magic School Bus" /></p>
<p>Some early grad students get stuck in a loop of doing a lot of general talking about the kind of things they want to work on, but never really get started. Taking time to learn and plan your experiments is great, but there are a lot of things you can’t learn without diving into real data. You’re almost certainly going to mess things up the first one or two (or twenty) times, so start making mistakes as soon as possible. Having a deeper understanding of the data you’re dealing with will be invaluable in driving the kinds of questions you’ll ask and the design of your experiments.</p>
<h2 id="investigate-things-that-dont-make-sense">Investigate things that don’t make sense</h2>
<p>When you’re looking at the results of an analysis, often there will be something that just doesn’t quite line up: there’s one value of 1.01 when the maximum measurement should be 1, two data points are exactly on top of one another, 1% of the data points are giving “NaN”s. It’s easy to just brush these under the rug (“it’s just a couple datapoints”), but getting to the bottom of these oddities is critical. Often they will reveal some flaw in your analysis that might mean all your results are invalid, or (if you’re lucky!) they might point to some new science hiding behind an unexpected pattern in the data.</p>
<h2 id="explore-then-replicate">Explore, then replicate</h2>
<p>The best way to approach an unfamiliar problem is to first collect (or extract from your full data) a pilot dataset, and start looking for patterns. You don’t need to be rigorous about statistics, multiple comparisons, or model complexity - what are the strongest signals you can find? Are there ways of transforming the data that make it more amenable to your models? Then, once you’ve optimized your analysis, you apply it to new (or held-out) data, and meticulously measure how well it performs. If you do your exploratory playing directly on your final dataset, it’s very easy to start fooling yourself about what’s really there and what you just want to be there.</p>
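<p>As a concrete sketch of what this split-first workflow might look like in code (with stand-in data and a made-up 50/50 split - the right proportions will depend on your dataset):</p>
<pre><code>import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))       # stand-in data: 200 samples, 50 features
y = rng.integers(0, 2, size=200)     # stand-in labels

# Carve off an exploration set and a held-out set ONCE, before any analysis
perm = rng.permutation(len(y))
explore, holdout = perm[:100], perm[100:]

# Phase 1: play freely with X[explore], y[explore] - transforms, models,
# thresholds - without worrying about statistical rigor.

# Phase 2: freeze the analysis, then evaluate it exactly once on the
# held-out set, e.g. final_score = frozen_model.score(X[holdout], y[holdout])
</code></pre>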
<h2 id="realize-that-youre-in-the-big-leagues">Realize that you’re in the big leagues</h2>
<p>Throughout school, you’ve always been measured against your peers - your kindergarten macaroni crafts earned you a gold star because they were impressive for your experience level, not because they were competitive with a typical exhibit at the Louvre. In your first year of grad school, you are now competing with professional scientists who have been in the field for 40 years. This is intimidating (and one of the reasons why you start out being mentored by senior students and faculty members), but also exciting. You are on the front lines of scientific knowledge, answering real problems that no one has ever figured out before.</p>
<h2 id="know-more-than-your-advisor">Know more than your advisor</h2>
<p>This might sound contradictory to the previous point, since your advisor has a many-year head start in understanding your field, and you can’t hope to have more total knowledge by the end of your PhD. But for the particular project you’re working on, you should be finding papers on your own and reading everything you can. Publishing a paper that advances the field is going to require knowing more about that topic than anyone in the world, including your advisor.</p>
<h2 id="keep-an-end-of-the-day-journal">Keep an end-of-the-day journal</h2>
<p>Completing a PhD requires extremely long-term, self-guided planning, and it’s easy to lose track of what you should be working on and what the critical next steps are. Different people have different solutions for this, but my favorite strategy was to take 10 minutes at the end of the day and write (in a physical, dead-tree notebook) a couple bullet points about what I did that day and what the next steps should be. This forces you to take stock of your current goals, gives you a little morale boost (especially when you can look back over the past week and remind yourself that you really did make progress), and lets you pick up where you left off when you come back to your projects (possibly days later).</p>
<h2 id="drink-water">Drink Water</h2>
<p>Taking care of your physical health is often the first thing to go when stress sets in, but this is a sure way to completely derail your research career. Drinking more water is an easy fix for most grad students - you can avoid kidney stones (I learned that the hard way), you’ll eat less junk, and having more bathroom breaks makes sure you take some breaks from your chair (no fitbit required). Some other no-brainers are to make sure you have an ergonomic typing setup (I know multiple PhDs that had to go on leave due to RSI) and keep a reasonable sleep schedule.</p>
A simple model for growing a brain2015-03-23T00:00:00-04:00http://blog.chrisbaldassano.com//2015/03/23/networkmodel<p>Today I had a chance to read <a href="http://www.ncbi.nlm.nih.gov/pubmed/25368200">a paper by Song, Kennedy, and Wang</a> about their model for explaining how brains are wired up at a high level (explaining how areas are connected, not the detailed wiring of neural circuits). It’s very simple, but manages to capture some complicated aspects of brain structure.</p>
<p>The goal of the model is to offer some explanation for area-to-area connection patterns across species (humans, monkeys, mice). For example, the human connection matrix looks like this (from their supplementary material):</p>
<p><img src="/blog/public/connmat.png" alt="Connection Matrix" /></p>
<p>Gray squares show pairs of regions that (we think) are connected to each other. This looks complicated, and it is - every region has a different connection pattern, some are similar to each other (neighboring rows), some are very dissimilar, some regions (rows) have lots of connections and others have few, etc. The Song et al. paper starts by discussing the features of these matrices that seem predictable and similar across species, but the part I found more exciting was their proposed model for how you would grow a brain with these properties.</p>
<p><img src="/blog/public/connmodel.jpg" alt="Connection Model" /></p>
<p>This figure from their paper shows the setup. You first randomly choose a bunch of points as the centers of brain regions and randomly create a bunch of neurons, assigning each neuron to belong to the closest region center (A-C). Then, to grow a connection out from a neuron, you pick a direction as a weighted average of the directions to all region centers (with closer regions weighted more heavily), and then grow in that direction a random amount (with short connections more likely than long connections) (D-G). That’s it! Every neuron grows without talking to any other neuron, and they are not even really aiming anywhere in particular.</p>
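<p>Here is a toy implementation of my reading of that recipe, in 2D. The specific choices below (a square arena, inverse-distance weights on the region directions, an exponential distribution over connection lengths) are my own stand-ins for illustration, not necessarily the parameterization used in the paper:</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)
n_regions, n_neurons = 10, 500

# (A-C) Random region centers and random neurons, each neuron assigned
# to its closest region center
centers = rng.uniform(-1, 1, size=(n_regions, 2))
neurons = rng.uniform(-1, 1, size=(n_neurons, 2))
dists_to_centers = np.linalg.norm(neurons[:, None] - centers[None], axis=2)
region_of = np.argmin(dists_to_centers, axis=1)

# (D-G) Grow one connection out of each neuron
conn = np.zeros((n_regions, n_regions), dtype=int)
for i, pos in enumerate(neurons):
    vecs = centers - pos                  # directions to every region center
    d = np.linalg.norm(vecs, axis=1) + 1e-9
    weights = 1.0 / d                     # closer regions pull harder (assumed form)
    direction = (weights[:, None] * vecs / d[:, None]).sum(axis=0)
    direction /= np.linalg.norm(direction)
    length = rng.exponential(0.3)         # short connections more likely (assumed form)
    tip = pos + length * direction
    target = np.argmin(np.linalg.norm(centers - tip, axis=1))
    conn[region_of[i], target] = 1        # record a region-to-region connection

print(conn.sum(), "region pairs connected")
</code></pre>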
<p>This simple set of instructions is enough to produce some of the structures in real connectivity matrices, like relationships between how similar the connections are for two regions and how likely they are to be connected to one another. One of the take-aways is that spatial position is a very important feature for wiring brain regions - just adding a bias to connect close regions rather than distant regions is enough to explain a lot of the brain data. This sounds sort of obvious, but spatial position is often ignored in many analysis methods, and I’ve been recently proposing ways of incorporating spatial information for understanding connectivity at both the <a href="http://www.ncbi.nlm.nih.gov/pubmed/25737822">whole-brain</a> and <a href="http://www.ncbi.nlm.nih.gov/pubmed/22846660">region</a> level.</p>
<p>There is a lot missing from this paper - it makes a literal <a href="http://en.wikipedia.org/wiki/Spherical_cow">“spherical cow”</a> assumption that the brain is a solid ellipse, assumes that connections shoot out from neurons in straight lines like a laser, and doesn’t account for how brain size changes during development. But in some ways this makes the result even more impressive, since they can explain a lot of the data without using any of these details.</p>
The Hero's Powerpoint2015-03-01T00:00:00-05:00http://blog.chrisbaldassano.com//2015/03/01/herostalk<p>What makes a good story? It turns out that, even among stories about very different things (romance, aliens, animated fluffy critters saving the world) there are some elements that are often the same. This topic is fascinating in its own right, but I find that it is also of great help for preparing academic talks. The goals of a talk (to keep the listener engaged and excited) are the same as those of a storyteller, and so it can be helpful to borrow some tips from the successful stories of the past few thousand years.</p>
<p>The idea of a repeated story structure that has existed since antiquity (the “hero’s journey”) was first popularized by <a href="http://en.wikipedia.org/wiki/Joseph_Campbell">Joseph Campbell</a>, and then later championed by <a href="http://www.skepticfiles.org/atheist2/hero.htm">Chris Vogler</a>. What follows is my personal summary of the major parts of this framework, as an amateur cinephile. I’ve illustrated each part with images from Frozen, Iron Man, and The Matrix, which all stick closely to this formula.</p>
<h2 id="structure-of-the-journey">Structure of the journey</h2>
<h3 id="the-ordinary-world">The ordinary world</h3>
<p><img src="/blog/public/hero/setting.png" alt="The ordinary world" />
We’re introduced to the protagonist’s typical life. Things have been going the same way for some time, and the world is familiar. However, there are some questions or tensions lurking beneath the surface.</p>
<h3 id="the-breaking-point">The breaking point</h3>
<p><img src="/blog/public/hero/breaking.png" alt="The breaking point" />
The minor or unseen issues in the world burst to the forefront, disrupting the protagonist’s world in a catastrophic way. Life can no longer continue as it has in the past.</p>
<h3 id="into-the-wild">Into the wild</h3>
<p><img src="/blog/public/hero/wild.png" alt="Into the wild" />
As a result, the hero is forced out into an unfamiliar world (physically or metaphorically). Their previous expertise and relationships are useless here, and they have no clear plan of action.</p>
<h3 id="the-mentor">The mentor</h3>
<p><img src="/blog/public/hero/mentor.png" alt="The mentor" />
The protagonist meets someone in the wild who begins to guide their path. This is someone that they never would have met in their ordinary world, and whose advice they might have previously dismissed. They begin to gain insight about themselves and their new environment.</p>
<h3 id="formulating-the-plan">Formulating the plan</h3>
<p><img src="/blog/public/hero/plan.png" alt="Formulating the plan" />
With the mentor’s help, the protagonist understands their final goal and formulates a plan to achieve it. They often embark on this quest alone, without the mentor’s help, but with new knowledge gained from their time in the wild.</p>
<h3 id="the-failure-of-the-plan">The failure of the plan</h3>
<p><img src="/blog/public/hero/failure.png" alt="The failure of the plan" />
Despite their careful planning, the plan fails. The hero was missing critical information, about their own abilities or the loyalties of their supposed allies. All appears lost, the hero is dead or near death, and the final goal appears impossible to recover.</p>
<h3 id="seizing-the-sword">Seizing the sword</h3>
<p><img src="/blog/public/hero/seize.png" alt="Seizing the sword" />
Miraculously, the hero is able to recover and continues the fight. A new plan is put into action, and the protagonist discovers new sources of strength. The enemy is defeated and goal achieved, albeit at great cost.</p>
<h3 id="the-return">The return</h3>
<p><img src="/blog/public/hero/ending.png" alt="The return" />
The hero returns to their ordinary world reborn, and we see the consequences of their quest. The old stability is replaced with a new one, and the hero sees old relationships in a new light.</p>
<h2 id="why-we-love-the-journey">Why we love the journey</h2>
<p>This story structure is so common because it has a number of nice features that help keep the audience engaged:</p>
<ul>
<li><strong>Background information is only required at the beginning.</strong> In order to follow the story and empathize with the hero, we need to be able to understand their feelings and actions. Doing this in the “ordinary world” is difficult, since the hero knows many things that the audience does not know. By clearly designating that most of the story takes place in unfamiliar territory, we share the hero’s confusions and assumptions, and the audience doesn’t need to be constantly brought up to speed.</li>
<li><strong>There is a clear call to action.</strong> The problem faced by the hero is brought sharply to a breaking point, and there is no choice but to face it. Keeping things as they are is literally not an option.</li>
<li><strong>Help from an unlikely source.</strong> There are always “typical” ways of dealing with problems, such as just getting a bigger army or working harder. The mentor teaches an entirely new approach that runs counter to traditional wisdom in some way.</li>
<li><strong>The quest is near impossible.</strong> The failure of the hero’s plan emphasizes that the problem being solved is extremely difficult, and that even with all their new knowledge and training they couldn’t do it on their first try. When they do accomplish the task, it’s only through some incredible force of will.</li>
<li><strong>The story has an impact beyond the protagonist.</strong> The stakes go beyond the selfish desires of the hero, and the ordinary world is no longer the same after the events of the story.</li>
</ul>
<h2 id="takeaways-for-presentations">Takeaways for presentations</h2>
<p>If we want to make a talk into a good story, we should respect these same rules. Specifically:</p>
<ul>
<li>Background information needs to be given up front, and limited to what is strictly necessary. The audience needs to understand the current landscape as quickly as possible, and the rest of the talk should be new information that you are adding to the field.</li>
<li>Talks need to have a sense of urgency - it’s not enough that something simply hasn’t been studied before. Why is there an immediate, desperate need for a breakthrough in this area?</li>
<li>It’s more interesting when insights come from a surprising source, such as a disconnected field or forgotten research from the past. What did you bring to the table that no one has previously tried?</li>
<li>Talks shouldn’t shy away from showing how difficult a result was to achieve. For example, if presenting a complicated system you engineered, walk the audience through your thought process - what did you try first, and why did it fail? Problems that can be solved on the first try are generally not that interesting.</li>
<li>Spend time in your conclusion talking about the impact of these results on the field in general. How does this change the way people have been thinking about a problem, or enable a new line of research?</li>
</ul>
<p>Of course not all talks can use all of these elements, and this is only a starting point (since it says nothing about how to communicate the specific technical content). But we should all strive to take the audience on a journey with us during our presentation - make it an exciting one!</p>
Are you thinking what I'm thinking?2015-02-24T00:00:00-05:00http://blog.chrisbaldassano.com//2015/02/24/conchords<p><img src="/blog/public/fotc.jpg" alt="Jemaine and Bret" /></p>
<p>Flight of the Conchords has been one of my favorite comedy bands for years, mixing general silliness with some insightful satire on life and relationships. They have a whole catalog of great songs (<a href="https://www.youtube.com/watch?v=BNC61-OOPdA">one of which</a> is highly relevant to the title of this blog), but one of my favorite exchanges is at 1:00 into their song “We’re Both in Love with a Sexy Lady”:</p>
<iframe width="420" height="315" src="https://www.youtube.com/embed/OOvQ1EvMy5k?start=60" frameborder="0" allowfullscreen=""></iframe>
<blockquote>
<p>Bret: Are you thinking what I’m thinking?</p>
</blockquote>
<blockquote>
<p>Jemaine: No, I’m thinking what I’m thinking.</p>
</blockquote>
<blockquote>
<p>Bret: So you’re not thinking what I’m thinking?</p>
</blockquote>
<blockquote>
<p>Jemaine: No, ‘cause you’re thinking I’m thinking what you’re thinking.</p>
</blockquote>
<p>Both of Jemaine’s lines touch on deep concepts in cognitive science, so let’s take them one at a time:</p>
<h2 id="im-thinking-what-im-thinking">I’m thinking what I’m thinking</h2>
<p>Jemaine’s first objection is that of course he’s not thinking Bret’s thoughts, he’s thinking his own! It is actually very unclear exactly what it means for two people to have “the same” thought. For systems with a standardized architecture like computers, it makes perfect sense to say that a file on my computer is the same as a file on your computer - each file consists of an ordered list of 0s and 1s, and we can just compare these two lists to see if they are the same. For neural systems, however, we can’t hope to do this same kind of comparison. It seems implausible that two people could have a one-to-one correspondence between the neurons in their brains and have the same pattern of firing on these neurons. Is there a way to summarize the current state of the brain in a way that abstracts away from the architecture of a particular person’s brain, and allows us to compare thoughts between people?</p>
<p>The answer is, of course, that humans invented such a system about a hundred thousand years ago, called <em>language.</em> The fact that we are able to convert our thoughts into words, and decode others’ words into our own thoughts, is an incredible feat. In a sense, we are all bilingual - our internal thoughts are represented by patterns of brain activity unique to our own brains, and we can translate these to and from a language that is understood by others. This sentence is some dynamic state in my brain which I have compressed into a string of characters, which your brain is then uncompressing to create a corresponding state in your brain. Language is not perfect, and we can sometimes struggle to translate our thoughts into words, but considering the complexity of the human brain (with 100 trillion connections between neurons) we can do an impressive job of copying thoughts between brains using a one-dimensional channel of text or speech.</p>
<p>There has been some early work on building computer systems which can perform this same kind of task, producing a natural language description of some complex internal representation of information. For example, <a href="http://www.nytimes.com/2014/11/18/science/researchers-announce-breakthrough-in-content-recognition-software.html?ref=science">both Stanford and Google</a> have built artificial neural network systems that can look at an image, produce some numerical vector representation of its content, and then translate that representation into a sentence that humans can understand. My labmate Andrej Karpathy has put up a <a href="http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/">cool demo</a> of sample sentences (you can refresh to get new ones) to show how well this is currently working. It’s clearly far below human performance, but describes a surprising number of images correctly.</p>
<p>Going back to human brains: though we can’t expect a neuron-level correspondence between brains, might there be a more coarse similarity at the millimeter scale? Maybe the average activity of a clump of ~10,000 neurons in my brain when I think about dogs looks like the average activity in a similarly-positioned clump of neurons in your brain. There has been <a href="http://hlab.princeton.edu/Papers/Hasson_et_al_TiCS_2012_F.pdf">a great deal</a> of interesting neuroimaging work testing this hypothesis, and it seems that there are a number of brain regions with this property. In fact, the more in sync a listener’s brain is with the speaker’s, the better the listener comprehends what the speaker is saying.</p>
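<p>(For the technically inclined, the “in sync” measure in these studies is essentially a correlation between the speaker’s and the listener’s regional activity timecourses. A minimal sketch of that computation on made-up data - real analyses also handle temporal lags, noise, and proper statistics:)</p>
<pre><code>import numpy as np

rng = np.random.default_rng(1)
n_timepoints, n_regions = 300, 5

# Stand-in regional timecourses for a speaker and a (partially coupled) listener
speaker = rng.normal(size=(n_timepoints, n_regions))
listener = 0.4 * speaker + rng.normal(size=(n_timepoints, n_regions))

# Speaker-listener coupling: correlate the two timecourses, region by region
coupling = np.array([np.corrcoef(speaker[:, r], listener[:, r])[0, 1]
                     for r in range(n_regions)])
print(coupling)   # higher values = more "in sync" regions
</code></pre>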
<h2 id="youre-thinking-im-thinking-what-youre-thinking">You’re thinking I’m thinking what you’re thinking.</h2>
<p>Jemaine’s second objection is that Bret is currently thinking about a model of Jemaine’s mind, and since Jemaine isn’t thinking about his own mind their thoughts can’t be the same. The ability to make these kinds of statements about the mental states of others is called having a <a href="http://en.wikipedia.org/wiki/Theory_of_mind">Theory of Mind</a>. In fact, understanding Jemaine’s sentence is a fourth-order theory of mind task for us as listeners: Jemaine thinks that Bret thinks that Jemaine is thinking what Bret is thinking. The fact that we can even comprehend this sentence is remarkable. No other animal (as far as we know) has the ability to build these high-order models of the thoughts of others, and even humans require years of practice.</p>
<p>If you want to test how good your theory of mind skills are, take a look at one of the questions used in <a href="https://www.staff.ncl.ac.uk/daniel.nettle/liddlenettle.pdf">studies of schoolchildren</a>:</p>
<blockquote>
<p>There was a little girl called Anna who had a big problem on her mind. It was her mums birthday the very next day. Anna wanted to get her mum a birthday present, but she just didn’t know what to buy. She thought and she thought. Tomorrow was nearly here! Anna remembered that her brother, Ben, had already asked mum what
mum would like most of all for her birthday. Ben was out riding his bike so Anna decided to look around his room to see if she could find what present he had got for mum. Anna went in and found a big bunch of beautiful flowers with a little card that said: ‘Happy Birthday Mum, love from Ben.’ Anna thought to herself ‘mum must want flowers for her birthday!’ Just as Anna was leaving the room, Ben was coming up the stairs, but he didn’t see Anna leaving his room. Anna didn’t want Ben to know that she had been snooping round his room, so she said to Ben: “Ben, have you got mum a birthday present?” Ben thought for a minute, he didn’t want Anna to copy him and get mum the same present, so he said: “Yes, Anna, I have. I have got mum some perfume. What have you got for her?” Anna replied: “Erm, nothing yet, Ben” and she walked away.</p>
</blockquote>
<blockquote>
<p>Which is true?</p>
</blockquote>
<blockquote>
<p>a) Ben thinks that Anna believes that he knows that Mum wants perfume for her birthday.</p>
</blockquote>
<blockquote>
<p>b) Ben thinks that Anna knows that he knows that mum wants flowers for her birthday.</p>
</blockquote>
<p>11-year-olds are unable to answer this question, while most adults can. It requires tracking the knowledge states of multiple people, and understanding how each is trying to deceive the other. (In case you’re still struggling, the answer is (a).)</p>
<p>There have been some efforts to incorporate basic theory of mind into AI assistants like Siri, Google Now, and Cortana, in the sense that they can keep track of basic context: if you ask about the weather and then say “what about this weekend?”, these systems will understand that you’re still thinking about the weather and interpret your question in this context. However, I don’t know of any systems that really try to build a deep understanding of the user’s mental state or tackle higher-order theory of mind tasks. I predict that in the near future we’ll start seeing assistants that keep track not only of what you should know, but also what you currently do and don’t know, so that they can deliver information only when you need it. If my wife emails me to say that she’s moved up the time of our dinner reservation, and my personal AI sees that I haven’t read the email and haven’t left for the restaurant, it should guess that I am mistaken about my wife’s plans and interrupt what I’m working on. Perhaps a more useful measure of an AI’s conversational abilities than the Turing test is a theory-of-mind test: can an AI understand what we know, how we’re feeling, and what we want? Feeding it some Flight of the Conchords as training data might be a good place to start.</p>
Jerry Kaplan: The Law of Artificial Intelligence2015-02-19T00:00:00-05:00http://blog.chrisbaldassano.com//2015/02/19/kaplan<p><img src="/blog/public/kaplan.jpg" alt="Kaplan" /></p>
<p>Jerry Kaplan has been an interesting force in the Stanford AI community over the past couple years. He’s been a major player in Silicon Valley since the 80s, was one of the early pioneers of touch-sensitive tablets and online auctions, and wrote a book about his <a href="http://www.amazon.com/Startup-A-Silicon-Valley-Adventure/dp/0140257314">1987 pen-based operating system company</a> (which was ahead of its time, unfortunately for the company). Recently, however, he seems to have a new mission of fostering a broader discussion of the history and philosophy of AI, especially on the Stanford campus. He has been giving a number of talks on these topics, and <a href="http://web.stanford.edu/class/cs22/">taught a class</a> that brought in other AI speakers.</p>
<p>His most recent talk, through the <a href="https://www.law.stanford.edu/organizations/programs-and-centers/codex-the-stanford-center-for-legal-informatics">Stanford CodeX center</a>, was partially a plug for his upcoming book, <a href="http://yalepress.yale.edu/yupbooks/book.asp?isbn=9780300213553">“Humans Need Not Apply: A Guide to Wealth and Work in the Age of Artificial Intelligence”</a> but focused specifically on how the legal system should handle the rise of autonomous systems. He draws a distinction between a system being “conscious” (which is far off for machines, and more of a philosophical question) and a system being a “moral agent” that is legally considered as an “artificial person.” Arguably corporations already fall under this definition, since they have rights (e.g. free speech) and can be charged with legal action independently from their employees (e.g. in the BP Deepwater Horizon spill).</p>
<p>Can an AI be a moral agent? Jerry argues that systems like autonomous cars are able to predict the consequences of their actions and are in control of those actions (without requiring human approval), so they meet at least a basic definition of moral agency. As has been <a href="http://www.theatlantic.com/technology/archive/2013/10/the-ethics-of-autonomous-cars/280360/">discussed by others</a>, self-driving cars will have to make decisions analogous to the philosophical “trolley problem,” in which every possible action results in harm to humans and value judgments must be made. Since AIs can (implicitly or explicitly) have some encoded moral system, they should bear some responsibility for their actions.</p>
<p>He proposed a few ways of actually enforcing this in practice. The simplest would be some type of licensing system, in which every AI system meeting some threshold of complexity would need to be registered with the government. If an AI is misbehaving in some way, such as a driverless taxi driving recklessly fast in hopes of a better tip, then its license would be revoked. The AI might just be destroyed, or if we think that the problem is fixable then it could be “rehabilitated” e.g. with new training data. There are many possible complications here (I asked him about how this would work if the AI is partially distributed in the cloud), but this general approach makes sense.</p>
<p>I’m happy that we’re handing over more and more control to AIs, freeing up humans from mundane tasks and probably outperforming them in many areas (though this will require some restructuring of the economy, which is a topic for another post). There are, however, some very practical problems that we need to address sooner than later - I’m glad to see that technically-minded people like Jerry are diving into these issues.</p>
Michael Gazzaniga: Tales from Both Sides of the Brain2015-02-13T00:00:00-05:00http://blog.chrisbaldassano.com//2015/02/13/gazzaniga<p><img src="/blog/public/gazzaniga.jpg" alt="Gazzaniga" /></p>
<p>Michael Gazzaniga’s new book is <a href="http://www.amazon.com/Tales-Both-Sides-Brain-Neuroscience/dp/0062228803">Tales from Both Sides of the Brain: A Life in Neuroscience</a>, which is a memoir about his scientific study of split-brain patients. If you’re unfamiliar with this work, people who have had their two brain hemispheres surgically separated (for medical reasons) show some <a href="http://www.nature.com/news/the-split-brain-a-tale-of-two-halves-1.10213">amazingly interesting behaviors</a>, which raise deep questions about consciousness and free will. Gazzaniga set out to simply write a standard popular science book about his research, but ended up publishing a much more autobiographical book that discusses the actual process of scientific discovery. As he writes in the preface:</p>
<blockquote>
<p>Most attempts at capturing the history of a scientific saga describe the seemingly ordered and logical way an idea developed… I want to present a different picture: science carried out in friendship, where discoveries are deeply embedded in the social relations of people from all walks of life. It is a wonderful way of life, spending one’s years with smart people, puzzling over the mysteries and surprises of nature.</p>
</blockquote>
<p>This is a fact of scientific research that isn’t often highlighted - personalities and relationships are central to the scientific community, even though it rarely looks that way to the public.</p>
<hr />
<p>One of the main takeaways from the talk for me was how young the field of neuroscience is, compared to most other hard sciences. Gazzaniga started his work in the early 1960s, when <a href="http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(15)60224-0/fulltext">the term “neuroscience” barely even existed</a>. The fact that the founders of the field are still around should remind us that neuroscience is in its infancy, and we still know very little. Instead of being depressing, it’s kind of liberating - it means that most of the big ideas and unifying theories are still out there to be discovered, and we should take everything we “know” so far with a grain of salt.</p>
<p>Gazzaniga said that when he started at Caltech, there were basically two rules:</p>
<ol>
<li>Do important things: don’t do an experiment just because it’s a new thing to try.</li>
<li>If it’s important, just do it: don’t spend too much time planning and worrying about exactly the right way to test something. In my experience, it’s often hard to even tell ahead of time what the hard parts are going to be, and trying things out is a much faster way to make the experiment better.</li>
</ol>
<p>Given the early state of the field, neuroscientists should take these to heart. We’re still looking for the big principles of the brain and mind. Now is not the time to be polishing up the details of our models - we’re still brainstorming.</p>
Hello World2015-01-27T00:00:00-05:00http://blog.chrisbaldassano.com//2015/01/27/helloworld<p><img src="/blog/public/NG2-flare.jpg" alt="Greg Dunn Glial Flare" /></p>
<div class="message">
I visualize a time when we will be to robots what dogs are to humans. And I am rooting for the machines. -Claude Shannon, Omni Magazine (1987)
</div>
<p>I’m a soon-to-be PhD who has spent the last few years at the intersection of machine learning (i.e. building systems to model complex data) and human neuroscience (i.e. trying to collect data on a complex system). I believe that both of these fields are going to have a profound impact in the coming decades, in terms of how we interact with technology, how we communicate, and how we see ourselves. In contrast to my <a href="http://www.chrisbaldassano.com">published papers</a>, in which I have to actually support my points with hard evidence, this blog is an opportunity to speculate about where we are and what’s coming next.</p>