Interview

Transcript: Dr. Juergen Hahn interview

The following is a transcript of our October 3, 2025 interview with Dr. Juergen Hahn. The transcript has been edited for clarity.

The Poly: So, I wanted to start off with some of your prior studies. One paper I was particularly interested in was the metabolite work on microbiota transplants. Can you talk a little more about that paper?

Dr. Hahn: Yeah, sure. Let me start with the motivation for the paper. As you know, my group focuses very heavily on the data science aspects of autism, and what you find is that people usually don't have just an autism diagnosis and that's it. They often have many other co-occurring conditions that go along with it. One thing that is very common is GI issues. Not everybody with autism has them, but a lot more people do than among those without an autism diagnosis. What you find is that these issues are not easy to treat, so clearly there's something going on where standard treatments don't seem to work so well. The study we were involved in is actually run out of Arizona State; we just run the data analysis. It looks at whether we can transplant, or replace, the microbiome somebody has with somebody else's microbiome, in the hope that the new microbiome will then help with some of the GI issues. Now, why are we looking at this approach? Well, the first thing is that GI issues are overrepresented. The second is that with most of the standard treatments, you don't really seem to be able to do a whole lot. And the microbiome produces a lot of the metabolites in your body. One example, which maybe you've all heard about, is serotonin: 90% of your body's serotonin is produced by the microbiome in your gut. There are many others. It's not that your body produces everything it needs; sometimes it actually gets things from the microbiome. So if there are disruptions in the microbiome, you would also expect disruptions in the metabolites in the individual's blood. What we are looking at with this study is: will people improve if you replace the microbiome?
Now, in all fairness, my collaborators at Arizona State do all the heavy lifting: they're recruiting people, and they're working with a group at the University of Minnesota that produces the microbiome material. We get the data and then analyze it, looking for differences both before and after the treatment started, as well as at several time points, to see how the microbiome changes over time. Everything related to the microbiome is a pretty hot research topic these days, and because it's a very complicated issue, we don't understand all the details of it. But the results seem to be pretty promising. It isn't just one paper; there are a number of studies going on for people of different ages and people with somewhat different conditions as well, and they all look pretty promising.

What is your approach to applying your computational methods to big data sets like that? I was going to ask: is it specifically tailored to look for certain markers related to ASD, or is it broader than that?

Let me take a step back there, because that's where people often start to see the value of our work. We really bring value by using what I call multivariate techniques. A simple example: if I go to the doctor and get my cholesterol checked, they take a blood sample and measure how much HDL and how much LDL I have. For HDL, you look at the reference range and ask, "Am I in that range or outside of it?" Then you do the same with the LDL reference range. So in this case there are two measurements, but I use them one at a time. With cholesterol, people also look at the ratio of one to the other. Now you're looking at two variables together, and the ratio may be more meaningful than what you get from each individually. That is where it stops when it comes to cholesterol screening. What we've been doing, particularly in the autism realm, is this: if we're measuring a number of metabolites, rather than looking at them one by one to see where the differences are, we look at them together to find patterns that are unusual. So you may find patterns where metabolite 1 is a little too high, but not so high that you would say it's outside the normal range. Metabolite 2 may be a little too high. Metabolite 3 may be a little too low. When you look at them one by one, you may not find anything unusual. But together, you say, "Oh, this combination is very different," for example, for somebody who has autism versus somebody who doesn't. Bringing these multivariate techniques to the field of autism research is what our group has done over and over again. In some other fields, the methods were more advanced earlier, but in this field, things seem to have lagged a little behind.
So this is where our main contributions come from.
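To illustrate the general idea of values that each look normal on their own but are unusual together, here is a minimal sketch using the Mahalanobis distance, one standard multivariate measure. It is an illustration only, not the method used in the group's papers; the two "metabolites," their assumed correlation of 0.9, and the ±2 standard deviation "normal range" are all hypothetical.

```python
import math

# Hypothetical setup: two metabolites, each expressed as a z-score
# (standard deviations from the reference mean), with correlation RHO
# in the reference population.
RHO = 0.9
NORMAL_LIMIT = 2.0       # "normal range" taken as within +/-2 SD (assumption)
CHI2_2DF_95 = 5.991      # 95th percentile of chi-squared with 2 degrees of freedom


def univariate_flags(zs, limit=NORMAL_LIMIT):
    """One-at-a-time check: is each value outside its own normal range?"""
    return [abs(z) > limit for z in zs]


def mahalanobis_sq_2d(z1, z2, rho):
    """Squared Mahalanobis distance of (z1, z2) from the mean for two
    standardized variables with correlation rho."""
    return (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)


sample = (1.5, -1.5)  # metabolite 1 a bit high, metabolite 2 a bit low
print(univariate_flags(sample))            # [False, False]
d2 = mahalanobis_sq_2d(*sample, RHO)
print(round(d2, 1), d2 > CHI2_2DF_95)      # 45.0 True
```

Because the two values normally move together, one being somewhat high while the other is somewhat low is rare in the reference population, even though neither value is out of range by itself.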

For me personally, what's been drilled into me since high school is the scientific method, hinging on falsification. Correct? So my other question is, could you direct us to any research that has attempted to falsify the research you've been doing?

I mean, falsify in what sense? Like, if people…

Repeat it.

Oh, okay. So verification.

Yeah, verification.

Yeah, of course. Look, there are several papers out there. For example, one of our papers that got the most press is the one where we looked at differences in metabolite concentrations in children with and without an autism diagnosis, and we could predict with very high accuracy, meaning over 95%, whether a sample came from the group with the autism diagnosis or the group without. That was one of our more highly cited papers. First of all, when we published this, there was a group out of Princeton that basically said, "These results are great. Let's see if we can do this better." But for them, better meant faster, not necessarily better results. Using the data sets we published, they found that they came up with the exact same results. They were able to speed up the process, which to me wasn't the most important thing, because you only do this once. I don't care if it takes five minutes or 30 seconds; if it takes years, it's a different story. In the end, their algorithm worked faster than ours, no argument, but they came to the same conclusion about which metabolites are important, and that is what our paper was all about. So clearly, that was a nice verification. Separately, there was a group in China that actually collected their own data. They said, "No, we're going to recruit people." They had to make some slight modifications to what they could measure, but for the most part, they used our measurement set. And they also got good accuracy. They didn't get 95%, but they were over 80%, and 80% is usually viewed as kind of the threshold. The third thing is what we did ourselves. We said, let's see if we can get more data, but running these trials is expensive.
So we looked at other studies. In this case, these were autism treatment studies, which did a test before treatment and then after. We said, we don't care about after, because then everything changes, but let's take the samples from before treatment: can I predict that this data really comes from somebody with autism? Everybody who was recruited for these trials had an autism diagnosis. And we still got good accuracy; we predicted about 88% correctly. There was no fitting; it was really just testing, using this as a test set. So there's a good amount of verification out there. I'm sure you could always have more, but that's actually more than most papers get. The other thing is that for our work, we published the raw data sets. I've had some undergrads revisit them and run their own analyses. It's out there for everybody to test. So far, nobody has found anything that contradicts what we found.
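The distinction drawn above, between fitting a model and only testing it on an independent set, can be sketched as follows. The linear rule, its weights and threshold, and the sample values are all hypothetical; the point is only that the rule is frozen before the new samples are scored, so accuracy on them is a genuine out-of-sample test.

```python
from typing import Callable, List, Sequence

# Hypothetical classifier, frozen after training on the original data set.
# Nothing below changes these values when new samples arrive.
WEIGHTS = (0.8, -0.5)
THRESHOLD = 0.2


def frozen_rule(x: Sequence[float]) -> int:
    """Return 1 (predicted autism diagnosis) or 0 (no diagnosis)."""
    score = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return int(score > THRESHOLD)


def holdout_accuracy(predict: Callable[[Sequence[float]], int],
                     samples: List[Sequence[float]],
                     labels: List[int]) -> float:
    """Fraction of independent samples the frozen rule labels correctly."""
    correct = sum(predict(x) == y for x, y in zip(samples, labels))
    return correct / len(labels)


# Hypothetical pre-treatment samples from a separate treatment study.
# Every participant had an autism diagnosis, so every label is 1.
new_samples = [(1.0, 0.0), (0.5, 0.1), (0.9, 0.2), (0.1, 0.5)]
new_labels = [1, 1, 1, 1]
print(holdout_accuracy(frozen_rule, new_samples, new_labels))  # 0.75
```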

So, about this project, the contract: I was at the Senate meeting on Tuesday, and you mentioned confounding variables in autism research. Could you elaborate on the most common confounding variables you see, and some obscure ones you weren't expecting?

Well, the thing is, whenever you're looking at two data sets and you want to find out whether there are differences, what is really important is to make sure you're comparing apples to apples, to the degree you can, rather than apples to oranges. For example, if group A is in a certain age range, you want to make sure that the group B you're comparing it to is in the same age range. Otherwise, you may find that age suddenly plays a major role in what you're seeing, and the difference is not due to the condition you're looking at but to age. That's a very common one. Another one, obviously, is that you need to make sure you have a similar male-to-female ratio. That's particularly difficult for autism, because three to four times as many males have an autism diagnosis as females, while the group I compare them to is typically balanced, usually 50-50. So you have to make sure this does not affect what you're seeing. Another variable that often comes into play is socioeconomic status. That's particularly important in any type of epidemiology, because you will find that, depending on people's income and whether they live in a more rural or a more urban setting, they may have different access to health care and different patterns of health care utilization. So those are some very common ones. Others aren't obscure, but they're very hard to deal with. For example, autism often runs in families: if you already have one child with autism in the family, the chance that any younger sibling has autism is much higher than if you just randomly pick somebody with no autism in the family. The latest number for the general prevalence of autism is a little over 3%. But if you have a sibling with autism, there's about a 20% chance that another sibling has it.
So it's roughly seven times as high. It's a little over 20%; I would have to look up the exact number. So clearly, if certain things run in families, you need to account for that as a confounding factor, because it can change things dramatically. But often in a study, you don't know whether a child has an older sibling with autism. Yes, if I design a study where I can ask that question, I can take it into account, and that's fine. But in some types of studies, you're stuck with data that other people have collected, and if that isn't part of it, you can't use it. This is a pretty important factor, because the difference is so big. Often these confounding factors have some effect, but it's not huge. If you have something where the autism rate changes sevenfold, though, it can be a game changer for your results: you may find one thing statistically if you don't take it into account, and something different if you do. And then the key question is: can you take it into account? If not, then you know that's a limitation of your study, right? Every study has limitations. There is no study that will ever completely answer a question. There never has been, and there never will be.
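The "seven times as high" figure is simply the ratio of the two rates quoted above. A quick check, using the approximate values from the interview (a little over 3% general prevalence, roughly 20% sibling recurrence):

```python
general_prevalence = 0.03   # "a little over 3%" (approximate)
sibling_recurrence = 0.20   # "a 20% chance" for a sibling (approximate)

# Relative risk: how many times more likely autism is for a child
# with an affected sibling than for a child picked at random.
relative_risk = sibling_recurrence / general_prevalence
print(round(relative_risk, 1))  # 6.7, i.e. roughly sevenfold
```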

Has the geographic location of a sample ever affected the results in any way?

I mean, it's generally understood that it does. But it isn't just location. For example, if you want to get any treatment or drug approved, you usually have to do a multi-site study. You can start out saying, yes, we used one clinic to get samples, but at some point you'll need a clinic somewhere else as well. For starters, if I run a study in Southern California, I'm going to have a very different ethnic makeup than if I run a study somewhere in Kansas. So it's understood that you need to use multiple sites. The other thing is that people at different sites may have slightly different practices in how they go about things. To give you one simple example, in our study comparing children with and without an autism diagnosis, a number of the metabolites we measure are not very stable; they have very short half-lives. So it's important that the sample gets frozen quickly and then shipped. Some clinics may freeze it an hour later. And if the half-life of a compound you're interested in is only about half an hour, then letting the sample sit out for an hour has a major effect on what you're measuring. So this is more about practice than location, but it's generally understood that you have to take it into account. This is why clinical trials have phase one, phase two, and phase three, where you add more and more variables to take this into account. In phase one, you may start with one site, and add other sites later on.
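The effect of a delay before freezing follows from exponential decay: after time t, a fraction (1/2)^(t / half-life) of the compound remains. A minimal sketch, using the half-hour half-life and one-hour delay from the example above, and assuming simple first-order decay (real metabolites vary and may not degrade this cleanly):

```python
def fraction_remaining(elapsed_min: float, half_life_min: float) -> float:
    """Fraction of a compound left after elapsed_min minutes,
    assuming simple first-order (exponential) decay."""
    return 0.5 ** (elapsed_min / half_life_min)


half_life = 30.0  # minutes, as in the interview's example
for delay in (5, 30, 60):
    print(delay, round(fraction_remaining(delay, half_life), 3))
# 5 0.891
# 30 0.5
# 60 0.25
```

Under this assumption, a sample left out for an hour keeps only about a quarter of the original amount, compared with nearly 90% if it is frozen within five minutes.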

RPI was given a no-bid contract, not a grant, from the CDC to conduct this research, and it's for a year, if I'm not wrong?

That's correct.

Okay, so with a 12-month contract, limited funding, and the fact that it is a contract, what is your timeline for getting the project done?

Yeah, look, we plan to get this done in a year; otherwise, we wouldn't have agreed to it in the first place. What will help us get started quickly is that we have some closely related prior work, where we identified cohorts that we'll use as a starting point. The other thing, just to talk a little about grant versus contract: there are differences, but in the end a lot depends on the terms you work out. There are some contracts where what you're doing is very tightly controlled, and others where you have a lot more flexibility. Given that we're a university, we made it very clear from the beginning that if it's a contract, it has to be one where we can do the work independently and decide how the publication process works. So the differences in this case are actually not as big as people may think. There's a bit more work, and yes, I have to write more than I would for a regular grant. The payment schedule is also a bit different: for a grant, you usually get a certain amount upfront, or maybe the whole thing, while for a contract, you get paid for meeting certain milestones, but those milestones may just be that I submitted my quarterly report. It's a somewhat tight timeline; I'm aware of that. But given that we have some good results we can build on, I have no doubt we can do this once we fully get started. We're still in the process of assembling everything we need to get started. I wouldn't have agreed to it if I didn't think we could do this. I try to deliver, all right? And by deliver, I don't necessarily mean the results people want; I mean what I promised, and that means we're going to look at this and we're going to have some results.

Could you expand on the impact the data from your past studies will have on the project, and how it will change how you're able to do the research compared with other universities?

Yeah, so in particular, there are two data sets we will build on. One is based on medical claims data. In our case, we want what is, in quotation marks, a complete set of records from birth to at least age five, and we built a cohort where we can really see that the data set is complete. The reason I insist on complete is that sometimes you have data for somebody for two and a half years, and then they disappear for six months. Disappearing doesn't mean they didn't do anything. Medical claims data usually go through a health insurer, and if the family changed insurance providers, they disappear from that data, which means I don't know what happened in between. Nothing may have happened, but I can't use somebody like that. So I needed children with continuous insurance coverage for at least five years. When we did this, we built a cohort of over 275,000 children with complete sets of records. That's 275,000 children, not medical records; we had numerous medical records for each child. Out of those, we identified how many were diagnosed with autism, and we still found 3,200-something, which is a very good number for this. This covered a time span from, all right, now it gets a little complicated, 2000 until 2012 or 2015, because we have different cutoffs for different things. So that is one cohort we're going to build from. In past work, we used that same cohort, for example, to look at how many of the children also had GI issues, which, as you can see, is a very different type of study, but GI issues really do keep coming back. We were also interested in how many of them got an epilepsy diagnosis,
because that's also overrepresented in children with autism, and in a few other questions. We can use the same cohort, except that now we need to pull in additional data on the children that we weren't interested in before. But we know whom to look at. The second study we did, which we're also going to build on, took these children and asked: can we link them up with their mothers and see what information we had from the mothers during pregnancy? We looked at the 12 months prior to birth for the mothers (I know a pregnancy doesn't last 12 months) and at the five years after birth for the children. Connecting the child to the mother wasn't always easy, so we have fewer people: we ended up with over 120,000 child-mother pairs. Careful, though: it was over 100,000 mothers but over 120,000 child-mother pairs, because some children had the same mother. Since we tracked them over several years, some of them were siblings. And that's still a very good number. In that study, we were interested in what health records we had for the mothers during pregnancy, and how they affect the probability of an autism diagnosis in the child a few years later. So those are the two cohorts we're going to start working with. The first thing I would like to do is update the cohorts so that we have data that's closer to 2025. That's doable. In order to do this, by the way, interrupt me if I get too technical.

No, that’s okay.

We had some newer data last time, too. It's just that in 2012 the autism diagnostic manual changed from DSM-IV to DSM-V. For the previous study, the autism diagnosis had to be before 2012, because I didn't want to mix the two. The other thing is that, since we're dealing with medical claims, you have these ICD codes. A procedure that's done, or a diagnosis that's given, gets a code; procedures have codes separate from diagnoses. For a diagnosis, you get a certain ICD code that says, for example, "Yes, you have autism." But ICD-9 was used until 2015, and in the fall of 2015 it was changed to ICD-10. So this is why we put the cutoffs where we did and said, "Look, we're not tracking across the change from DSM-IV to DSM-V, so the autism diagnosis needed to be before 2012, and I'm happy to track other medical conditions, but not across the move to ICD-10."

So there are two cutoffs.

Yes. And the other thing was that we wanted to make sure we also had some data from after the autism diagnosis, for the co-occurring conditions. We had a few more years of data afterwards, but not enough that we'd say, "Yes, let's really make an effort to match this across the transition." Now that we're in 2025, it may make sense to ask: how do I handle the transition from DSM-IV to DSM-V and from ICD-9 to ICD-10, so that we get newer data and can use it as well? To be honest, that's probably the hardest part of what we're proposing. The analysis is not going to be trivial, but it's not that hard; extending the cohorts is. For the work, you can start with the original cohorts we have and get some initial results, but we also want to bring in some of the newer data, because medical practices have changed over the last 10 years, and so have autism rates, and that will be reflected in the data. So it's important to use some new data as well, rather than just historical data.
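The continuous-coverage requirement described earlier (no gaps in insurance enrollment from birth to age five) can be sketched as a simple interval check. Representing enrollment as hypothetical (start, end) month spans since birth, and the two example children, are assumptions for illustration; real claims databases have their own enrollment tables and rules for handling short gaps.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) in months since birth; end is exclusive


def has_continuous_coverage(spans: List[Span], required_months: int = 60) -> bool:
    """True if the enrollment spans cover months [0, required_months) with no gap."""
    covered_until = 0
    for start, end in sorted(spans):
        if start > covered_until:
            return False  # a gap, e.g. the family switched insurers
        covered_until = max(covered_until, end)
        if covered_until >= required_months:
            return True
    return covered_until >= required_months


# Hypothetical enrollment histories.
children: Dict[str, List[Span]] = {
    "child_A": [(0, 24), (24, 61)],   # continuous through age five
    "child_B": [(0, 30), (36, 72)],   # six-month gap after 2.5 years
}
cohort = [cid for cid, spans in children.items() if has_continuous_coverage(spans)]
print(cohort)  # ['child_A']
```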

What does the process look like when you have to find new children for the cohorts?

Well, basically, you have a big database that you have access to. You have to define criteria: inclusion criteria as well as exclusion criteria. One inclusion criterion, for example, was that we want the complete medical record from birth to age five. Then you have exclusion criteria, saying, for example, if you don't have socioeconomic data, I don't want the person. That doesn't mean I couldn't use them somehow, but if I'm talking about confounding factors, then I couldn't use them. The other thing you have to be somewhat careful about is that there are other conditions, some of which overlap with autism. We wanted to make sure we weren't accidentally including or excluding somebody based on that, and sometimes that was best done by just saying, okay, if somebody also has this diagnosis, leave them out. Another criterion was that they needed to have a code showing they were diagnosed with autism after two years of age, on at least two separate occasions. We didn't want somebody who was diagnosed very early on, and then the doctor said, "You know what? It wasn't exactly what we thought after all." That's not what we want. So there are a number of criteria. If you look through my papers, they describe the criteria in detail. In any type of good science, you have to describe exactly what you're dealing with, so that other people can reproduce it.
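The inclusion and exclusion rules described above can be expressed as a small classification function. This is a hypothetical sketch, not the published criteria: the Child fields, the requirement that the two qualifying diagnoses fall on different dates, and the January 1, 2012 cutoff (standing in for the pre-DSM-V requirement mentioned earlier) are illustrative assumptions, and the exclusion of overlapping conditions is omitted.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Child:
    """Hypothetical per-child summary derived from claims data."""
    birth: date
    complete_birth_to_5: bool   # continuous records from birth to age five
    has_ses_data: bool          # socioeconomic data available for confounding
    asd_dx_dates: List[date] = field(default_factory=list)  # autism-diagnosis claim dates


def age_in_years(birth: date, when: date) -> float:
    return (when - birth).days / 365.25


def classify(child: Child, dx_cutoff: date = date(2012, 1, 1)) -> str:
    """Return 'case', 'control', or 'excluded' under illustrative criteria."""
    # Exclusion: incomplete records or missing socioeconomic data.
    if not child.complete_birth_to_5 or not child.has_ses_data:
        return "excluded"
    if not child.asd_dx_dates:
        return "control"
    # Count distinct diagnosis dates at age two or later and before the cutoff.
    qualifying = {d for d in child.asd_dx_dates
                  if age_in_years(child.birth, d) >= 2 and d < dx_cutoff}
    if len(qualifying) >= 2:
        return "case"
    # A single, or only very early, diagnosis is ambiguous, so leave it out.
    return "excluded"


born = date(2005, 1, 1)
print(classify(Child(born, True, True, [date(2008, 3, 1), date(2008, 9, 15)])))  # case
print(classify(Child(born, True, True, [date(2006, 7, 1), date(2008, 3, 1)])))   # excluded
print(classify(Child(born, True, True)))                                         # control
print(classify(Child(born, True, False)))                                        # excluded
```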

Is there public access to the datasets?

Well, it depends. For these epidemiology studies, most people can get access to it, but we don't own the data. In our case, the data came from Optum Labs, the Optum Labs Data Warehouse. If you have access to the Optum Labs Data Warehouse, you can get access to it, but obviously, if you don't, you don't. It's a company that provides access to people interested in using the data for their studies, so other people should be able to get access to this. From our side, we're happy to publish the code we used to build the cohorts. That's not a problem.

Okay, thank you.

Do you know of any other research group that has access to this data?

Yeah, they have numerous customers, many of them listed on their website, including several major universities and hospitals. Look, Optum Labs is part of a major corporation. Their business model is to use this healthcare data for studies on, you know, which drugs work, what risk factors there are, and a number of other things. What is nice about them is that, as far as I know, this is the biggest dataset out there. They have data on literally 160 million patients, which is huge. Now, you might ask, why am I talking about hundreds of thousands of kids? You start out with 160 million, but then…

Filter, yeah.

Ages run from 0 to 80, roughly, and we're only interested in the first five years, so I'm already at one-sixteenth. Now I'm down to 10 million, which is still a lot. But for many of those 10 million, I don't have data from birth to age five, and this is where I lose a significant number: I'm down to a few hundred thousand, but that's still a lot. There are plenty of other studies where people work with a few thousand. So this is what got us interested: they have large enough numbers.
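The back-of-the-envelope filtering above can be written out explicitly. This assumes, as in the interview, that patients are spread roughly evenly over ages 0 to 80. The final figure of about 275,000 children is the cohort size quoted earlier, which also reflects date cutoffs and other criteria, so the last ratio is only indicative.

```python
total_patients = 160_000_000      # patients in the data warehouse (from the interview)
share_first_five_years = 5 / 80   # ages 0-80, keep the first five years: 1/16

under_five = total_patients * share_first_five_years
print(f"{under_five:,.0f}")       # 10,000,000

complete_cohort = 275_000         # children with complete birth-to-five records
print(f"{complete_cohort / under_five:.2%}")  # 2.75%
```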

Coming back to the earlier question, what kind of conversation did you have with the office that awarded the contract about preserving the integrity of the results that come out of this?

Yeah, look. Part of it is, first, obviously there was some interest in what we can do, because you can read some of these papers. But then, for example: can you get access to the data? By the way, in my case, I have an application pending with Optum Labs; they still have to sign off on this. They're not interested in talking about hypothetical projects. They want the project first, and then they negotiate the terms of access with RPI, because obviously it needs to be confidential, among many other things. So we have discussions there. The other thing is that I wanted to know what they are interested in, what questions they want answered, because clearly they have a huge number of questions they want answered. But given that we're talking about a one-year project, I would rather answer one or two specific questions than a huge number where I'm not sure we could get through all of them. Then you have to make a plan for how to go about this, and it doesn't hurt to touch base regularly on where things stand, from their side as well as ours. But we are the ones performing the work here. Now, in all fairness, there are some people at the CDC we will have virtual meetings with, because I also want to get some of their feedback on this, but we are going to do the analysis here with my grad students.

So that's not been decided yet.

What has not been decided?

You just mentioned that you want to do the analysis here with your two grad students.

Yes, well, we will do the analysis here. But the thing is, first of all, I need access to the data from Optum Labs. Before that, I can't do anything, right? The way this works, you write a proposal to them about what you want to do, because they have a huge data warehouse and they're not going to give you access to all of it. They create a sandbox for you to play in, so we need to build that sandbox. The good news is that from our previous studies, I know what our sandbox needs to look like, for the most part.

Yeah.

But no, it has been decided that we will do the work here. My students, Halil and Henry, are the ones who are going to implement this. They know about it; they were part of it when this whole thing was discussed. At the same time, I said there's some information where I would actually love to get feedback from the CDC, because, I mean, I have a lot of experience with autism, but given that we're talking about childhood vaccinations, they have a lot more information on childhood vaccinations than I do. For example, when it comes to these ICD codes, there are often subcodes, and it doesn't hurt to better understand the medical practice of how they code these things. It's helpful to get feedback from people who are much more involved in this than we are, rather than trying to reinvent the wheel from scratch. But we are the ones doing the analysis here. That was decided as part of this, and that's not going to change. We haven't started, though, because I first need access to the data before I can start anything.

Another hypothetical: depending on what the analysis of the dataset you get yields, is there a possibility for the research question to change slightly?

I mean, yes and no. First of all, in any research, you may change the question, because sometimes you find, oh my God, something that's a lot more interesting than what you started out with, and then it would be silly not to consider pursuing it. But what we're not going to do is answer a question, not like the results, and say, "Ah, let's just ignore it and do something else." That's not what we're going to do. But obviously, if something much more interesting comes up that we just can't foresee yet, we would be foolish not to look at it.

I'm trying to resolve the distinction between a grant and a contract: a contract is usually for a specific research question, but there is flexibility?

There's a lot of flexibility.

Okay.

The things we're trying to answer are pretty easy to describe, and there are many different ways to go about them. For example, this whole question of which confounding factors to include and how they affect what you get: all of those are important questions. You can make a plan that you think is the right plan, but you should also deviate from your plan to see how that changes things. I'm not saying deviate from your plan for your final results, but part of research is looking at a number of possibilities, rather than saying, "This is the procedure we're going to follow, and then we'll have one result." The really important part of this type of work is that you have to understand how robust your results are. That means I don't just need a number; I need to know the confidence interval around that number. If you look at whether something is associated or not, it may very well be that the confidence interval runs from not associated to associated. That means you can't say anything with statistical significance. So just getting one number from one experiment may not get you all the way.
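To illustrate why the interval matters more than a single number, here is a minimal sketch that computes an odds ratio with a Wald 95% confidence interval, one common textbook approach and not necessarily what this project will use. The 2x2 counts are invented: both tables give the same odds ratio, but only the larger one yields an interval that excludes 1, the "no association" value.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table:
        exposed:   a with the outcome, b without
        unexposed: c with the outcome, d without
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts: same proportions, ten times the sample size.
for table in [(30, 970, 25, 975), (300, 9700, 250, 9750)]:
    or_, low, high = odds_ratio_ci(*table)
    significant = not (low <= 1.0 <= high)
    print(f"OR={or_:.2f}  95% CI=({low:.2f}, {high:.2f})  significant={significant}")
```

The point estimate is about 1.21 in both cases; with the smaller table the interval runs from below 1 to above 2, so the data are compatible with no association, while ten times as many observations narrow it to an interval entirely above 1.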

At the student senate meeting on Tuesday, some concerns were raised about the role of politics in public health research. How do you ensure that the research you and your graduate students are doing will be siloed from the media and public discussions going on, or do you want to engage with them?

Yeah, this is a tough question. For the most part, there are two parts to it. Number one, I obviously can’t ignore what the public says, okay. We don't want to be the scientist who ignores the implications of the work. We're not naive; we understand that part. At the same time, you have to be able to do your work objectively, without whatever influence you get from the public. Especially when you look at important and somewhat controversial questions, there’s always some group that doesn't like what you're doing, and they're worried about what you may or may not find. That should not influence what we're doing. At the same time, as I said, I can’t ignore what implications our work might have, and you have to take this into account. So, I'm not interested in talking to the media about work in progress where we really have nothing to report. That, by the way, is one of the reasons why we have declined all interviews so far, even with major newspapers: we didn't even have a contract in place. And again, for us, it was important to end up with a contract that gives us a lot of flexibility; the contract document looks very similar to a grant, and that was important. But I certainly wasn't going to discuss that with a newspaper while we were still talking about what this thing might look like. It's the same thing when we start doing the work: let us do the work, and let’s try to put those things aside. But at the same time, yes, we have to understand what implications there might be, because we're not disconnected from this. Obviously, I don’t want anybody to take my work and then use it for their own purposes. You can't always completely prevent that, but the best thing you can do is to try to do your work in as open and objective a way as possible.
Make sure that you publish your work in a way that's reproducible and that other people have access to it. Now, in our case, if we get access here to Optum Labs to implement our work, other people can get access to it as well, and if they do, our results should be reproducible. That's really important, because otherwise, having results that only your group can find and nobody else can reproduce is completely useless at best, and maybe even misleading at worst.

That’s a good segue to the next part. Besides the technical part that we spoke about in this interview, how would you explain to the broader student body here what the research is and what this specific project is actually trying to find, in a four- to five-sentence summary?

So, the question we're trying to address is this: we're going to look at a cohort of children. One group has not received vaccines during the first two years of life, and the other has received the standard recommended ones. We're going to look at whether there's any difference in the rates of autism and other neurodevelopmental conditions over the first five years of life. That's, in a nutshell, the question we're trying to answer, okay? Does that answer it?

Yes—

That is the scientific question, yeah. The one thing I can add: we’ll also separately want to look at a mother-child cohort. This is why we needed that cohort: we want to look at the role of the mother's vaccinations during pregnancy with respect to autism rates or, more generally, neurodevelopmental conditions. Okay, those are the two questions we're trying to answer.

Are there any significant deviations in this type of study from previous studies that have looked at correlations between autism and vaccines?

First of all, nobody’s ever used this data set. Okay, so that's the first thing. The second thing is that the cohort we're looking at is very recent data. One thing that often gets cited is the data from the Scandinavian countries, and they have very good records. But also keep in mind that these are sometimes older records, and I’m going to use autism as an example. The autism rate is now a little over 3%, but if I go back to the period from 2000 to 2010, it was less than 1%. So the rates have clearly changed, and our medical practices have changed, so having access to newer data is important. This is also one of the reasons why, when people say, “well, you don't need to revisit this question,” I'd say: if your rates and your practices have changed, then you should revisit it, even if only to see, “well, is there anything new here, or is everything the same?” So that is one big item. The other thing is, I think it's great that the Scandinavian countries have all this data, but we also need some data on US cohorts. The US population does not look like the population in Sweden or in Denmark. We have a much more diverse country, with many more people from different backgrounds. Our healthcare system and our healthcare delivery are different from other countries'. So it is important to do studies with recent, US-based data. It isn’t always that somebody says, “oh my God, what other people did is all wrong” (usually it’s never that), or “they made all these assumptions I disagree with” (fine, maybe some of them). Sometimes it's also that they say, “look, you need to use more modern data sets and more modern algorithms.” Now, in our case, the algorithms are not that different, so I don't want to claim that. But for some studies, that is important. And also data sets that may be of more interest to us.

Are you aware of any studies on autism in the United States since the DSM was updated to the DSM-5 and the ICD to the ICD-10? I was looking on the CDC website, and the most recent one I found there was, I think, from 2014.

Well, this gets very detailed. First of all, are there other studies on autism since the update? Yes, but there's no study that tries to do exactly what we are proposing here. One example: the CDC publishes public data on autism rates. Every two years, they publish a new paper on this, and they have an entire network from which they pull data for it. By the way, it’s a paper we often cite. But what they do is collect data at a certain number of sites and then extrapolate to the whole country. So it’s not like they have millions of records; they usually have a few tens of thousands of records, and the assumption is that the sites are spread out geographically across the country well enough that you can extrapolate from them. So it is certainly a topic that people look at. But look, if somebody had already done more or less the same study we’re proposing, there’d be no point in doing the exact same thing; that would be a total waste for everyone.

On the question of how students at RPI perceive this research, what would your response be to a student who disagrees that controversial research like this should be done at all?

Look, this is a tough question. I'd say two things. First of all, I would argue that, oftentimes, there is a bit of a misunderstanding of what research is all about. I'm not claiming that we're gonna answer a question once and for all, okay? That never happens in any one study, and it is not going to happen in our study, either. In fact, when it comes to any type of diagnosis or treatment, people usually look at many, many different papers, and then somebody writes a meta-analysis over all of them. And then usually they look at multiple meta-analyses before anything happens. So, consider this idea of “don't do this paper because I may not like the results.” Ultimately, your paper shows that, for the data set you looked at and the assumptions you made, you found this, and this, and this. That’s what it says. No more, and no less. So people have to understand that there are always assumptions involved. Some people may disagree with the assumptions; you can try to come up with a reasonable set, but even then, there's no universal agreement on these things. What I would suggest is, first of all, don’t overinterpret what the research means, okay? Research is important, but at the same time, it is not like we're setting policy. Policy should be based on good research, but that usually means a number of studies, not one, okay? The second thing is, look, if you look through history, sometimes things that were controversial weren't so controversial a few years later, and other things that were not controversial are highly controversial a few years later. Things change, okay, and oftentimes they change because our understanding changes. And so that's why we have to be able to say, “you know what, there's research I may not see much use in, right? But just because I don't see a lot of use in it doesn't mean it shouldn't be done.” Because somebody else may find use in it, or maybe a few years down the road we find, “oh my God, this was super useful!” Okay?
So I would say be open-minded, okay? People should be doing research as long as it’s high-quality. And yes, they should be aware of the potential implications of the work. But just because somebody could misuse your research doesn't mean you shouldn't be doing the work. That would be my suggestion. The other thing is, being a student also means you're part of a group. We have a lot of students here with a lot of different opinions, and you also have to be able to interact with people who may disagree with you from time to time. I mean, my wife and I disagree all the time, and we’ve been married for 25 years. So clearly, you have to be able to interact with people you disagree with from time to time. The world isn't black and white. As for this idea of “let's not do this research at all because I may not like the findings,” I don’t even know what the findings are at the moment; I haven’t done the work, right? Even if you don't like what we find, or don't find, whatever comes out of it, understand that this is one piece of research and no more than that.

I think that helps. That will help a lot of students wrap their minds around it.

Speaking of the implications of certain results in this research, what kind of conversation do you want people to have about what comes out of it? Is there any shift in public health systems you would like to see if the research yields a certain kind of result?

Here’s the thing. First of all, let’s see what we find. There are basically two potential outcomes: maybe there's an association, maybe there's not, and of course there's some gray area in between. If you find that there is no association, then you can say, “well, that fits nicely with the majority of studies that are already out there, and now we've looked at more recent data, US-based data, and it also seems to show that.” And hopefully people will look at that and say, okay, that's something we think is important. If you find that there is some association, then maybe people have to say, okay, we have a lot of other studies that show there’s nothing there, but maybe we need to look at more modern sources of data as well and see whether we find something too. Lastly, people may also find that the assumptions we’re making are somewhat different from what others are doing, because everybody has slightly different assumptions, right? We're trying to keep most of them the same so we can compare things, but there are always some differences. And then people can revisit and say, “well, if he had made this assumption differently, he might have gotten different results.” And that is fine, too. All of that is part of research. I don’t expect anything major to change based on one study. But I do hope we'll convince some people that whatever we found is important, one way or the other.

This is a more academic question. In your opinion, has medical research, biomedical research specifically, kind of stalled a bit? Maybe not over the last decade, but in the last five years, do you think it is stalling a little bit?

I don’t think it’s stalling. Different areas develop at very different speeds, okay. To give one example, we talk about the cancer moonshot. Look, when I grew up, if somebody had cancer, it was terrible: it was incurable and painful, and you had to plan for the end of your life in the near future, or the medium term at best. Nowadays, there are a number of cancers you can treat pretty well. Some of them you can even cure if they’re caught early. All of that is because of academic research. One area that is a bit more controversial is autism research, where I think we could have done a lot more.

Yes.

And I think that's actually where a lot of people are frustrated. When you look at how much money we are spending on particular conditions, there is very little money spent on autism research by the federal government. There are other conditions that get 10 to 20 times more money per patient than this field, and so I think there's a lot more that could be done here. But that's one particular field; overall, we've made tremendous progress with everything. We've made some progress here, too. I don’t want to say it’s nothing, but people are frustrated with it. The number of people getting diagnosed with autism rises all the time, and our progress is there, but it is not at the speed we’d like to see.

Yes, I phrased my question a little differently, but I meant autism research specifically. Sorry.

I mean, if you want a comment off the record, I can make one, but don't quote me on that.

Maybe after. Okay, I have a final question. What responsibilities do you think the university has when this kind of research becomes part of public debate? And President Schmidt sent a response on Monday to the petition that was launched. Do you think that was an appropriate response?

Yes, I think that was an appropriate response. Look, as a university, we have a commitment to doing quality, meaningful research, okay. And part of that is that sometimes your research will be a bit more controversial. If you want to move a field forward in one form or another, you have to be able to look at controversial topics. Otherwise, everybody just works on things nobody cares about. Nobody gets upset about it, but we're not, you know, making progress with anything. And working only on [research] where 100% of people agree is kind of hard to do these days. So no, I think President Schmidt’s response was spot on: that it’s important and that it needs to be done. And I think the university has really taken very good care to make sure it's done the right way. I can tell you that from all the negotiations they had regarding the contract and its terms, to make sure that we have independence, that it’s our say how we go about publishing results, and that there is transparency. Likewise, there are data-related questions, like asking, “hey, what data are you asking for?” The university wants to make sure that whatever we're doing, we do it by the book. They're very serious about that, and that's as it should be.

Sorry, one more question. The project is for one year, but are you planning to publish within that year, or will publishing probably happen afterward?

It’s probably a mix. I don’t think we’re gonna get peer review done within the year.

Yeah, that's right.

The key is, we want to have documents ready. What we’ll most likely do is submit to a journal where the paper gets posted in an open archive before peer review, so people can see it. Because I know people want to see what’s going on. But at the same time, peer review may take a while. Especially when you’re dealing with a controversial topic, you just need one reviewer who doesn’t like whatever your findings are, and then you go back to the drawing board. That's most likely our approach. I'm not sure we'll have a publication within a year, but we'll have a document summarizing the results that people should be able to access. That’s the plan.

Thank you so much.