Training Volume and Muscle Growth: Part 3

Ok, let’s finish this thing up. So far I’ve looked at the 7 current studies (as of this article’s writing in October of 2018) in often excessive detail in Part 1 and Part 2, and now it’s time to put them all together to see how training volume and muscle growth relate. As noted in Part 1, I’m throwing the Radaelli paper into the trash. I consider the results too random and nonsensical to be worth considering.

There is simply no world where growth in triceps in beginners doesn’t start until 45 sets per week but 18 sets for biceps is effective and where LBM gains are higher for calisthenics than low-volume weight training.  So it’s out.  Agree or not, I put my reasoning up front and looked at it in detail to explain why I think it’s garbage so it wasn’t just a hand-wave like most would do.

That leaves 6 studies in trained individuals (minimum 1 year training experience and a usual range of 1-4 years) looking at different volumes of training and the muscle growth response. Yes, they used varying methodologies: some only used body composition methods via DEXA, some used DEXA and Ultrasound, and one used DEXA, Ultrasound and muscle biopsy (of quads only).

As I stated at the outset, I’m going to simply take them at face value for the time being.  It’s the data we have and with the qualifications I was sure to make as I went, it’s what we have to build the model on at this point.  Yes, future data may change the model.   When it arrives, the model will have to be updated with it.

Building the Model

First, let me put the 6 remaining studies together to see if a pattern shows up.

Yes, this is a terrible chart and I probably got at least one of the numbers typed in wrong since I type fast and frequently don’t take the time to check after the fact.  The true guru will dismiss the entire article based on a typo.   But guru gon’ guru and there’s nothing I can do about that.

Instead let’s focus more on the generalities of the data and less on my lack of proofreading.   As no values went down, all numbers represent an increase from the beginning of the study.  Percentage change is a percentage change and any absolute numbers are mm changes in muscle thickness.  I’ve shown the best response in red.

Results by paper and muscle. Values are listed as Low / Mod / High volume groups, followed by the volume of the best response (sets/wk) and notes.

Ostrowski
Quads: 6.7% / 5% / 13.3%. Best response: 12 sets/wk. Nothing higher tested.
Triceps: 2.3% / 4.7% / 4.8%. Best response: 14 sets/wk. 28 sets no better than 14.

Amirthalingam (two groups only, listed as lower volume / higher volume)
Total LBM: 2.7% / 1.9%. Lower volume better than higher.
Trunk LBM: 4.1% / 1%. Lower volume better than higher.
Arm LBM: 7.8% / 3.4%. Lower volume better than higher.
Leg LBM: 0.5% / 2.4%. Higher volume better than lower.
Tri MT: 2.3 / 4.5
Bi MT: 2.4 / 0.3
Ant Thigh MT: 2.6 / 1.1
Post Thigh MT: 1.2 / 2.2
Best response: 18-19 sets/wk triceps, 12-18 sets/wk quads.
Notes: For triceps, 26-28 sets upper no better than 18-19. Leg volumes of 12-13 and 16-18 are so similar that I am considering them as part of a full range.

Hackett (DEXA only)
Volumes compared: 18-19 vs. 26-28 upper, 12-13 vs. 16-18 lower.
Notes: Essentially identical to Amirthalingam with moderate volume superior for overall LBM gains and upper body and the slightly higher volume slightly better for lower body. Differences were overall small and moderate and high volumes were more or less identical. Very small number of subjects weakens statistical power. Lack of direct measure of muscle thickness is not ideal.

Haun
LBM changes: Gains up to 20 sets/week. No further from 20-32.
By muscle thickness via Ultrasound, triceps showed growth up to 20 sets and then SHRANK.
By biopsy, quads shrank up to 20 sets and grew after that. Unclear why Ultrasound did not pick up quad growth but biopsy did. Total summed changes in triceps and quads a whopping 1.8 mm. Minuscule changes overall.
Over 20 sets per week, water retention measured by TBW and ECW significantly increased, making LBM gains above that point insignificant.
Two conclusions: cap to useful volume of ~20 sets per week. Volume without tension is shit for growth because volume is NOT the primary driver of hypertrophy and never will be.

Schoenfeld
Biceps: 0.7% / 2.1% / 2.9%. Best response: 18 sets.
Triceps: 0.6% / 1.4% / 2.6%. Best response: 18 sets.
RF: 2.0% / 3.0% / 6.8%. Best response: 27 sets.
VL: 2.9% / 4.6% / 7.2%. Best response: 27 sets.
Notes: By their own statistical methods, the highest volume was weakly/insignificantly better than moderate (described as ‘not worth mentioning’ in stats texts).
Conclusion: 18 sets upper and 27 lower optimal with higher volumes showing no meaningful differences (and certainly not for doubling the volume).
RF and VL changes ADDED together for poor comparison to Ostrowski.
Data in question because of an outright LIE in the discussion regarding Ostrowski triceps data.

Heaselgrave
Biceps: 1 / 3 / 2. No statistical difference but a trend for 18 sets better than 9 and 27 no better than 18 (or even worse).

Ok, now let me summarize that horrible chart by showing where each study finds that their optimal results fall in terms of sets per muscle group per week.

Study | Optimal Volume
Ostrowski | 12 sets lower body (no higher tested), 14 sets upper body
Amirthalingam/Hackett* | 12-18 sets lower body, 18-19 sets upper body
Haun | 20 sets upper as a cap, possibly 20+ for lower body (needs more study)
Schoenfeld | 18 sets upper (compared to 6 with no middle value for comparison); 27 sets lower (compared to 9 with no middle value for comparison)
Heaselgrave | Trend for 18 better than 9 but 27 no better than 18.

*I grouped the two studies together since they used an identical methodology and only differed by length, and I’m getting tired of writing and making tables as it’s a pain in the ass in WordPress.

Looked at this way, a pattern starts to show up: a moderate volume tends to beat out either lower or very high volumes under basically every circumstance in trained individuals (again defined in most studies as 1-4 years of training or a minimum of 1 year regular training). Or rather, a set count somewhere from 10-12 to 20 sets/week provides about the optimal results in all cases, at least within the limitations of the data available. Only Schoenfeld’s leg data exceeds this but this is from a low volume of 9 compared to 27 with no middle value for comparison. We can’t know what would have happened between those numbers.
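As a quick sanity check on that pattern, here is a minimal Python sketch that takes the optimal volumes from the summary table above and flags any that fall outside a 10-20 sets/week window. The data layout and names are mine, purely for illustration. Where the table gives a range (e.g. 12-18) I store both ends, and I leave out the "possibly 20+" lower body entry for Haun since there is no number to enter.

```python
# Optimal weekly sets per muscle group, copied from the summary table above.
# Each row is (study, body part, low end, high end); single values repeat the
# same number for both ends. Layout and names are illustrative only.
OPTIMAL = [
    ("Ostrowski", "lower", 12, 12),
    ("Ostrowski", "upper", 14, 14),
    ("Amirthalingam/Hackett", "lower", 12, 18),
    ("Amirthalingam/Hackett", "upper", 18, 19),
    ("Haun", "upper", 20, 20),
    ("Schoenfeld", "upper", 18, 18),
    ("Schoenfeld", "lower", 27, 27),
    ("Heaselgrave", "upper", 18, 18),
]


def outside_window(rows, low=10, high=20):
    """Return the (study, body part) pairs whose optimal volume leaves the window."""
    return [(study, part) for study, part, lo, hi in rows if lo < low or hi > high]


print(outside_window(OPTIMAL))
```

With these inputs the only entry flagged is the Schoenfeld lower body value of 27 sets, which is the exception noted above.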

Let me note that even IF you prefer the conclusion that Brad’s highest volume groups gave a trend towards higher growth, it STILL contradicts the broader body of literature. He still can’t explain why he needed to use 2X or 4X the volume to achieve the SAME growth as Ostrowski. He can’t (read: won’t) explain a damn thing, especially when the data disagrees with him. Let me note again that James Krieger made the explicit point that you have to look at all of the data and not focus on one study. Yup. And what all of the data except Brad’s study says is that 10-12 to 20 sets/week is the right number and Brad’s numbers are wrong. Gotcha, James. You played yourself, too.

Spoiler: My conclusion above is EXACTLY what Eric concluded as well in his MASS piece, that 10-20 sets/week was about optimal (and of course he did because it is what the MAJORITY of data supports). This was after he desperately tried to make Brad’s numbers and study not be total bullshit by dismissing the endless problems with it methodologically, by playing the “I do science” card and all other manner of silliness. Which makes you wonder why he tried so hard to defend it with such pitiful arguments and reasoning. He doesn’t even think the numbers are right or he’d have drawn a different conclusion. And yet he keeps trying to defend it with weaksauce defenses (leg extension volume load hahahahahaha. I will never stop laughing at this). I guess when seminar appearances are on the line you have to toe that line…

But this is kind of interesting because it does actually agree with Brad’s original meta-analysis (I am giving him the benefit of the doubt that it’s worth a shit to begin with and I question that with every passing day) which concluded only that 10+ sets gave the best growth response with no ability at the time to determine an upper cap. So that passes the first reality check. In all cases, up to 10 sets there is a clear improvement in growth response. Above that, TO A POINT, there is a greater growth response but it shows a clear cap where higher volumes do NOT generate a greater response. In most cases, it’s the same; in the case of Haun and triceps, it was worse. As this is the only study showing a worse response at the highest volumes, no global conclusions can be drawn here in terms of more volume being detrimental. It simply isn’t any better.

But despite Brad’s attempts to make huge volumes better, the broader body of work (5 of 6 studies) supports a cap of about 20 sets for upper body, if that, and possibly more for the legs (for which we need far more systematic research).  Not 45 sets/week more.  But possibly more than 20.  This goes along with endless anecdotal beliefs that legs need more training volume but more systematized studies need to be done to show where an optimum might fall and whether or not upper and lower body truly have different optimal volume levels in terms of their growth response.

Now I could cut this article here, going a really long way to reach the above conclusion.  But that would be the easy way out and I’m not done yet.  Because I now want to return to an issue I brought up in Part 1 and said I’d revisit.

The Set Count Issue Redux

I want to return to the issue of how sets should be counted that I mentioned in Part 1. When the Heaselgrave study came out Brad responded with the following to try to make the study fit his conclusions. Or he may have been talking about both that paper and the modified GVT paper as they are the only two that used isolation movements and he needed to dismiss the fact that they contradict his results (never mind that his own study used leg extensions).

He was jumping around a lot and it was tough to tell what he was trying to dismiss to make his own (incorrect) results happen.  Guru speak can be tough since the goalposts change with every post, sometimes including leg extension load volume data when nothing else will work….sorry can’t resist.

His assertion was that perhaps volume requirements are different for compound and isolation movements and that that changed how sets should be counted (which means that his numbers could still be right).   Or rather, that isolation movements should be counted differently (basically trying to dismiss Heaselgrave’s set count accuracy).

And honestly, this is just total guru bullshit in the sense that it is a change in argument when he needed it. For years now, in every study he’s done and every meta-analysis, Brad and his group have counted sets on a 1:1 basis. They’ve always done training with compound movements and measured peripheral muscles and treated the total set count for the compounds as applying to those muscles on a 1:1 basis.

It’s always been 1:1.

If someone does 1 set of bench press, that’s 1 set for every muscle involved by bench, so one for shoulders and one for triceps. It has to be because even in his own paper he was measuring bicep and tricep size changes in response to compound work. He wasn’t looking at chest and back thickness (again I question why pec isn’t done more since it is clearly technically possible and wonder if back can be measured). If you’re going to look at bicep/tricep growth in response to only compound work (and note the odd little bench press study I described that looked at growth in pecs and triceps in response to bench only which means that pec CAN be measured) and count those sets in total, you’re calling it 1:1 for compound and isolation. Brad and his group have treated it as such from the get go. It’s also how he reported the ‘findings’ of his recent study too. He didn’t say it was 30 and 45 sets but this has to be kept in the context of measuring triceps and biceps with only compound training movements. He said 30 and 45 sets was best for growth with NO qualification whatsoever (well, he didn’t qualify anything until he got backed into a corner).

It’s been 1:1 from the get go for him.

Or it was UNTIL a study/studies came out that he wanted to dismiss. Suddenly, it’s no longer accurate to count it 1:1 which is just terribly convenient. Because you do not after the fact get to decide that the most recent study (or the GVT study) was different due to the isolation work and therefore does not contradict your results. Even here the argument is totally worthless. The Heaselgrave study did row, pulldown and curls. Two compounds and one isolation so at most you count the third exercise differently. Brad’s leg workout was 2 compounds and one isolation too so what’s the difference except that he doesn’t like data that contradicts him? The Ostrowski study used isolation work too and Brad was happy to lie about the numbers there without considering set count. He considered it 1:1 when it was convenient and then lied about the data to change the conclusion on top of it all and then decided it wasn’t 1:1 when it was no longer convenient to do so.

Pure guru shenanigans. The argument changes when it needs to.

I said back in Part 1 that I only used that convention for consistency to what they had been doing and that I didn’t agree with it at face value.  I’ve said the same thing for years.  Brad only said it when he needed it to defend his paper.  But even so, let’s go with this logic and see where it leads us.

Because NOW Brad seems to be arguing that isolation work counts differently towards the training response than compound work. Presumably it’s worth more since it’s, well, direct. That is, nobody is denying that bench works triceps and delts. The question is to what degree in terms of tension and volume overload it works them and how it compares to direct work in that regard.

Let me add for honesty: in the discussion section of his most recent study, Brad does acknowledge in the limitations that the use of compounds and measurements of isolation might alter the set count conclusion and he used this as an argument for why his change of attitude wasn’t just a convenient excuse.  Which might fly except that this was NEVER brought up until he was backed into a corner and needed to bring it up to dismiss a study that contradicted his.  So he can say all he wants that he considered it but he knows that the majority don’t read the discussion (maybe why he thought he’d get away with his lie about the Ostrowski data).  But when every post he makes crowing about his results IGNORES it, he’s just bullshitting after the fact so far as I’m concerned.  An honest scientist mentions the limitations of their work UP FRONT when they present it whether in research OR PUBLICLY.  Like I have done in this article series for each paper, addressing the potential limitations (small subject number, DEXA only) as I went.  I’m not waiting to get backed into a corner to change my argument or magically find new data (cough cough leg extension load volume…..fucking seriously?)  He did just like James Krieger did.  You can ask me any question about this article series, and nothing I report will change from what I’ve already written unless I made an explicit mistake (which I will then fix).

But let’s go from the assumption (which I have felt from the get-go) that you need to count volume differently for compound and isolation work in terms of determining the growth response to training (note that Eric said this was also true in MASS even if he hemmed and hawed about the ratios involved). I always have in practice and I’d say anyone with real-world training/coaching experience does as well. We don’t consider a set of compound chest work to be 100% a triceps exercise (and nobody counts pec deck as a triceps movement although it does involve a little biceps although absolutely nobody counts that). Most people aren’t even aware that triceps long head is involved in rowing due to its function as a shoulder extensor but nobody on the planet would count that towards triceps volume.

It might be conditionally true for trainees with very specific levers who can just bench and build big tris (and even perfectly built benchers do extra triceps work) but we don’t generally count it that way. Well I don’t and neither does anybody else I know with actual training or coaching experience. If you look at every workout routine I’ve ever written, total sets of chest (which always includes a compound movement which might be followed by a second compound or an isolation movement depending) are higher than for direct arm work because I’m counting some of the compound chest (or back) work towards arms in terms of daily and weekly totals. Delts is always funky but I do the same thing kind of. Shoulders are complicated since it’s three heads with different functions and one is pushing, one is pulling and one is either neither or both depending on how you look at it (it’s really humeral abduction but let’s not get too entrenched in this).

How Do We Count Sets?

The question is then how should you count the sets of a compound exercise towards smaller muscles.  I don’t know but I’m going to start with an assumption of a 0.5:1 relationship.  That is, I will count one set of bench or row as one half of a set for triceps or biceps.  Is this right?  Doesn’t really matter, ignore the specifics and follow the logic.  You can math it out for a different value if you’d like.  Call it 1/3rd or 2/3rds.  Call it 3/4ths.   Whatever fits your personal bias. This is my assumption but it’s only that.  Make your own and do the math based on it.  It will only change the specifics but won’t change the general conclusions that come out of the exercise.

Let me note that the ratio must be lower than 1:1 since nobody would EVER count a compound movement as MORE than 1 set for the smaller muscles (ok, I know how people read my articles and someone will make a strawman about poorly done rows being more biceps and that’s fine. Let’s define this as the movement being done properly in a technical sense).  The question is simply how much lower.  Pick your ratio and break out the calculator.

Just don’t call it 1:1 until it’s convenient not to do so like Brad did.

Re-Analyzing the Volume and Hypertrophy Data

Because if Brad is now going to say that compound and isolation movements count differently in terms of sets then EVERY OTHER ONE OF BRAD’S STUDIES AND ALL THE REST have to be recalculated in terms of their effective set count including his original meta-analysis AND his most recent paper and all of the others I’ve examined. His study used only compound movements (with leg extensions for quads) but looked only at single joint muscles and many studies seem to do this. If he NOW thinks isolation exercises count differently then his effective bodypart volume changes. And so do the set counts on every other study (the set counts on his meta-analysis will also change but I don’t know what body of literature it used and what proportion used compounds versus compounds and isolations).

So let me recalculate them based on my assumption that a compound set counts as 1/2 a set for smaller muscles and a direct exercise counts as a full set (i.e. bench press is 0.5 sets for triceps, triceps extension is 1 set for triceps). Again, this is my starting assumption and nothing more. Use whatever ratio makes you happy so long as it’s less than 1:1. And don’t play silly buggers at the extremes, we all know it’s not 0.9:1 or 0.1:1. It’s probably not 1/4 to one or 4/5ths to one either but somewhere clustered around the middle depending on the movement. Maybe 1/3, 1/2, 2/3, 3/4…again I don’t know for sure and nobody else does either. So I’m using 1/2 and literally splitting the middle.

So all I did was go back and count the sets.  If it was compound exercises, I cut the number of sets in half (10 sets per week becomes 5).  If isolation, I didn’t (5 sets per week = 5 sets per week).  If there was a mix I counted compounds as half and isolations as one and added them (10 sets compound chest = 5 sets + 5 sets isolation triceps = 10 total sets for triceps, down from 15 originally).  Only two used a mixture so I probably didn’t screw the math up too badly.  Even if I got a number slightly wrong it doesn’t change the overall conclusions.
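For anyone who wants to redo the count with their own ratio, the procedure above can be written as a few lines of Python. This is only a sketch of the arithmetic; the function name and arguments are mine, and the 0.5 default is my working assumption and nothing more.

```python
def effective_sets(compound_sets, isolation_sets, ratio=0.5):
    """Weekly sets credited to a smaller muscle (e.g. triceps).

    compound_sets:  sets of compound work involving the muscle (e.g. bench)
    isolation_sets: sets of direct work for the muscle (e.g. triceps extension)
    ratio:          credit per compound set; 0.5 is the working assumption here,
                    and 1.0 reproduces the old 1:1 convention
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be greater than 0 and at most 1")
    return compound_sets * ratio + isolation_sets


print(effective_sets(10, 0))  # compound only: 10 sets -> 5.0
print(effective_sets(0, 5))   # isolation only: 5 sets -> 5.0
print(effective_sets(10, 5))  # mix: 15 sets -> 10.0
```

Those three calls are the three cases described above: compound only, isolation only, and a mix.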

Now let me be clear again, I am NOT saying that this is a perfect analysis or that a 0.5:1 estimate is right so spare me the strawman arguments that I’m trying to force a set of data. I’m simply saying that if Brad is going to dismiss a study result he doesn’t like based on isolation vs. compound needing to be counted differently, that opens the door for this type of analysis. Beating a (very) dead horse, you can redo it assuming compound is worth 2/3rds of a set or 3/4. The numbers will change slightly and that’s fine and they’ll be marginally higher than my 1/2 assumption. If you go with 1/3rd they will be marginally lower. But for rational set counts, the differences aren’t even that much. Focus on the principles, not the specifics, folks.

But they will ALL go DOWN from what Brad was claiming them to be originally. EVERY SINGLE STUDY. That means in his meta, in his own study and in his examination of previous studies the numbers will all decrease. Even in Ostrowski which he lied about in his discussion. All of the set counts decrease. None of the studies I examined used only isolation movements so there is NO situation where the numbers don’t go down. And since nobody would ever count a compound movement as MORE than one set, they can’t EVER go up.

Yes I am beating a dead horse but I know how people read my articles. At least one person will claim “Lyle said all compound movements are worth one half a set for the other muscles involved” which I am not saying in the least.  I am saying this is my working assumption for lack of a better one and that’s all it is.  Again, use your own number that is lower than 1:1.  Just follow the logic here.

Yes, we need data on how to compare the exercises to see what the best counting approach would be. I am aware of one study that compared pulldown to biceps curls for recovery and the biceps curls took longer to recover from than the pulldowns so clearly it’s NOT 1:1. The pulldowns didn’t hit the biceps as much as direct arm work did. Dadoi. Until more data exists, we make assumptions.

It might even and probably will turn out that different movements should be counted differently. An undergrip pulldown is more biceps involvement than overgrip and a parallel grip is halfway in between (with more brachialis). A high bar squat is more quads than low bar and a close grip bench is close to a compound triceps exercise but a flared elbow bench is more pec specific and how you’d count those towards triceps would likely differ (I’d call close grip almost 1:1 for triceps but flared elbow as 0.5:1), etc. Back gets super complicated as we’re dealing with the traps (with multiple sections), rhomboids, teres, lats (which have two segments with slightly different orientations) and back movements work them to varying degrees based on movement, grip, bar, etc.

Coaches make adjustments for this based on experience. If I were using overgrip pulldowns with a trainee, I’d give them slightly more direct biceps work to compensate compared to if they were doing undergrip pulldowns which would have worked the biceps more. If they did V-bar rows, I’d make adjustments to biceps compared to doing an undergrip row. I’d give a low bar squatter who sits back more direct quad work than one squatting high bar for example. Everybody with any real-world experience does this in practice to some degree. We do this based on 1/2 guesswork, 1/2 experience, 1/2 science, and 1/2 intuition (and sometimes 1/2 luck).
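If you wanted to take the per-movement idea seriously, the counting could be sketched like this in Python. Only the two bench values come from what I wrote above (close grip as almost 1:1, flared elbow as 0.5:1), and rounding "almost 1:1" to exactly 1.0 is a simplification. The exercise names, the table layout and the two example programs are hypothetical and only there to show the arithmetic.

```python
# Credit toward triceps per set, by exercise. The two bench values follow the
# text (close grip "almost 1:1", flared elbow 0.5:1); everything else about
# this table is a hypothetical layout for illustration.
TRICEPS_CREDIT = {
    "close_grip_bench": 1.0,   # "almost 1:1", rounded to 1.0 here
    "flared_elbow_bench": 0.5,
    "triceps_extension": 1.0,  # direct work counts in full
}


def weekly_triceps_sets(program):
    """program maps exercise name -> weekly sets; unknown names raise KeyError."""
    return sum(TRICEPS_CREDIT[name] * sets for name, sets in program.items())


# Two hypothetical programs with the same raw set count (12 sets/week).
print(weekly_triceps_sets({"flared_elbow_bench": 8, "triceps_extension": 4}))  # 8.0
print(weekly_triceps_sets({"close_grip_bench": 8, "triceps_extension": 4}))    # 12.0
```

Same raw set count, different credited triceps volume, which is the kind of adjustment being made by feel in the examples above.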

Again, let’s not get too mired in the specifics here (as I do that very thing).  Follow the logic.

Rebuilding the Model: Part 1

And with my assumption of a 0.5:1 relationship, here are the re-mathed set counts for each of the 6 studies I’ve included. I’ve shown the original set counts in parentheses next to the re-mathed value and I probably messed at least one of these up because math is hard, my brain is tired, and I don’t bother to run it twice.  And this is total sets per week.  I’ve indicated in red which group did best (based on the analysis above) and might have even gotten it mostly right.  I am quite sure that anybody wishing to dismiss my conclusions based on a single typo will make me aware of that typo and I will change it because that’s the intellectually honest thing to do.

Study | Muscle | Low | Moderate | High
Ostrowski | Lower | 2 (3) | 4 (6) | 8 (12)
Ostrowski | Upper | 5 (7) | 8 (14) | 20 (28)
Amirthalingam/Hackett* | Triceps | two groups only: 10-11 (16-18) vs. 16 (26-28)
Amirthalingam/Hackett* | Quads | two groups only: 7-8.5 (11-13) vs. 8.5-9 (16-18)
Haun | Increasing from 5-16 sets with a cap on growth at 10 sets for upper and possibly more (up to 16) for lower.
Schoenfeld | Upper | 3 (6) | 9 (18) | 15 (30)
Schoenfeld | Lower | 6 (9) | 18 (27) | 30 (45)
Heaselgrave | Biceps | 6 (9) | 12 (18) | 18.5 (27)

*I grouped the two studies together since they used an identical methodology and only differed by length, and I’m getting tired of writing and making tables as it’s a pain in the ass in WordPress.

And this brings the results into even starker view.  Ostrowski fits with all other data which shows a clear dose response relationship up to 10 sets for legs although the lack of data above 8 sets limits this finding and we can’t know if more would generate more growth. For upper, 20 sets (down from 28) wasn’t better than 8 down from 14.   Since 8 and 20 got the same growth, it seems unlikely that a middle value would get different results although it’s possible that 14 would but 20 was too much for some reason.  Without data, this is a guess and we need studies examining different intermediate values to know for sure.  Test 8, 14 and 20 next time.

For the GVT studies 10-11 sets per week as a mix of compounds and isolation was as good as 16 sets/week for upper body. Inasmuch as the differences were minuscule, 8.5-9 sets per week for legs was better than 7.5-8.5 but at this point, we’re looking at a single set difference when it’s re-mathed and that alone would explain why the results were essentially identical (the stimulus was essentially identical). You wouldn’t expect 1 set to matter but maybe if it were 7-8.5 vs. 15-16 it would. That group has already done the same study twice, now do it a third time with real differences in lower body volumes (give them a second leg day).

Schoenfeld’s data becomes a lot less idiotic now and at least starts to pass the reality check, in line with the other studies. 9 sets was as good as 15 in terms of triceps growth (because his stats did NOT show that the highest volume was more than insignificantly superior to the moderate). Even if you believe that his highest volume was superior, it’s cut to a realistic 15 sets per week from an absolutely moronic 30 sets per week. This starts to fit the reality check and is still well within the realm of 10-20 sets. Like I said, big picture, whether you accept my contention that moderate was as good as high or a potential trend for the highest to be superior, when you count the sets rationally, it stops mattering, at least for upper body where both moderate and high fall within 10-20 sets/week.

We still lack data on chest growth per se and it might require more volume or it might not so whether or not the original values matter is unknown (i.e. does the chest somehow need 30 sets of direct work…I doubt it). Until it’s measured, we don’t know.  No study can address that yet  and the bench press only study I referenced in Part 1 didn’t compare different volumes although it would be hard to see how the 9 sets that worked for 6 months suddenly needed to be tripled after one year.   But he doesn’t get to count chest volume and then measure triceps to draw conclusions about optimal sets for all muscle groups (which he essentially did) and then decide that you have to count volume differently for isolation exercises after the fact (which he actually did).

For lower body the 18 sets was as good as 30, again passing the reality check. Here, if you take his higher volume claims as better that’s a pretty high set count (30 sets/week) although there might very well be a plateau value between 18 and 30 sets (we don’t know) which would be consistent with Haun (maybe) and anecdote. Maybe. If more than 20 sets IS optimal for legs (and this is still in the IF stage), a third group at 24 sets might have done better. Testing 18 vs. 24 vs. 30 sets would be very informative but it has to be a lab that isn’t Brad’s. The stats and strength gains still don’t support it and the fact that he lied about data should make his study inadmissible on fundamental grounds.
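To show where the Schoenfeld numbers in the table come from, here is a minimal Python sketch of the recount. The upper body work was all compound, and the leg workout was two compounds plus one isolation movement. I am assuming the weekly sets were split evenly across those three leg exercises; that assumption is mine, but it does reproduce the table values. Function names are illustrative.

```python
RATIO = 0.5  # credit per compound set, the working assumption in this article


def recount_upper(total_sets):
    """All upper body work was compound, so every set gets partial credit."""
    return total_sets * RATIO


def recount_lower(total_sets, n_compound=2, n_isolation=1):
    """Two compound leg exercises plus one isolation (leg extension).

    Assumes the weekly sets were split evenly across the three exercises.
    """
    per_exercise = total_sets / (n_compound + n_isolation)
    return per_exercise * n_compound * RATIO + per_exercise * n_isolation


for original in (6, 18, 30):
    print("upper", original, "->", recount_upper(original))
for original in (9, 27, 45):
    print("lower", original, "->", recount_lower(original))
```

This gives 3, 9 and 15 sets for upper body and 6, 18 and 30 sets for lower body, matching the Schoenfeld rows of the re-mathed table.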

There’s still that pesky ECW issue to worry about above 20 sets per week which now ONLY the highest volume leg work in Brad’s study crosses (maybe that explains the almost significantly higher leg extension load volume. Hahahahaha. I’m never gonna stop laughing at that shit). Then again, Haun was using pure compounds so that probably doesn’t make any sense as I think about it since I’m now comparing a compound only study to remathed sets. So yeah, forget that bit, it’s wrong. Based on initial volume, several groups in Schoenfeld cross 20 sets/week. And that means ECW might be playing a role or artificially increasing the results. I’d only note that with a spread of 18 to 30 sets, we don’t know if a middle value (i.e. 24 sets) would be superior until it’s directly tested. Finally, there is Heaselgrave, which found that 12 sets was better than 6 but no better than 18.5.

Rebuilding the Model: Part 2

Recreating the chart from above with the new numbers we get the following optimal volumes per week.

Study | Optimal Volume
Ostrowski | 8 sets lower body (no higher tested), 8 sets upper body
Amirthalingam/Hackett* | 10-11 sets upper, 8.5-9 sets lower (the huge drop in set count is due to the single leg day)
Haun | Increasing from 5-16 sets with a cap on growth at 10 sets for upper and up to 16 for lower.
Schoenfeld | 9 sets for upper body, 18 sets for lower body (15 and 30 if you accept the highest volumes)
Heaselgrave | 12 better than 6 but 18.5 no better than 12

So we get systematically lower numbers here, as expected. And again, if you disagree with my 0.5:1 and use a different value, the numbers change slightly but they still all go down (i.e. if you use 3/4:1, Ostrowski’s leg data might be 10 sets instead of the original 12 or the 8 from my 0.5:1 assumption). Basically, for any moderate set count, the differences in remathed sets just aren’t that significant. I mean, consider a group that did 10 sets/week of compound. If I assume 0.5:1 that goes to 5 sets. Assume 1/3rd and it goes to about 3.3. Assume 2/3rds and it goes to about 6.7. Assume 3/4 and it goes to 7.5 or whatever and we are looking at a roughly 4 set spread. Use the lower ratio and it’s a little lower, use a higher ratio and it’s a little higher. And it all more or less stays in the same overall range we’re looking at here.

But it’s ALWAYS less than the original value which is my point here.
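If you want to check the ratio arithmetic yourself, here is a short Python sketch for a group doing 10 sets/week of compound work under the four ratios I keep mentioning. The names are illustrative and the ratios are just the candidate values from the text, not measured quantities.

```python
from fractions import Fraction


def credited(compound_sets, ratio):
    """Sets credited to the smaller muscle for a given compound:isolation ratio."""
    return compound_sets * ratio


original = 10  # sets/week of compound work
ratios = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)]
results = {str(r): float(credited(original, r)) for r in ratios}

for label, value in results.items():
    print(f"{label}:1 -> {value:.1f} sets")

spread = max(results.values()) - min(results.values())
print(f"spread: {spread:.1f} sets")

# Any ratio below 1 credits fewer sets than the original count.
assert all(value < original for value in results.values())
```

The credited sets come out to roughly 3.3, 5.0, 6.7 and 7.5, a spread of about 4 sets, and every one of them is below the original 10.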

Looking at the new numbers, Ostrowski’s upper body optimal volume is 8 sets/week. Lower body matches 8 sets/week with no higher values tested so we can’t know what would happen above that. The two GVT studies are 10-11 sets for upper and 8.5-9 sets for lower with no higher volumes of lower body tested. Haun finds a cap on upper body of 10 sets (down from 20) and possibly up to 16 for lower (down from 32). Brad’s numbers stop being totally moronic when you don’t count them in an ass-backwards way with 9 sets for upper and 18 sets for lower body, generally matching the results of Haun. Heaselgrave is at 12 sets for biceps but 18.5 was no better. And all of this basically agrees with the original 10+ set meta-analysis even remathed (though its conclusions should probably change if it is remathed) except that we now have a much better idea of the upper caps on weekly set volume. There’s a dearth of leg data at higher volumes and more study is needed here.

So my next to final comment: whether you look at the original unadjusted data or the semi-adjusted data for set count, you still see a general optimal range of 10-20 (original count)/8-16 (remathed count) sets per week per muscle group which is all close enough for government work.   Let’s just call it 8-20 sets/week and move on with our lives.  And again, this is consistent with Eric Helms’ own conclusions in MASS of 10-20 sets/week after his pitiful defense of Brad’s paper.

There is still the slight indication that *maybe* more sets/week for legs would be better but it’s understudied and any conclusions would be tentative as hell.  But there is no way on god’s green earth to justify the 30 and 45 sets Schoenfeld et al. is so desperate to prove as optimal.  His own data doesn’t support it, his stats don’t support it, the bullshit apologism by everyone involved doesn’t support it, a rational re-analysis of the set count doesn’t support it and neither does the broad body of literature, his lie about the Ostrowski data notwithstanding.  Nothing supports it except his burning desire to support it with guru games and because he’s believed all along in these types of high volumes.

Now We Refine the Model

Because like I said, this is how science works: you take all the available data and you make a model.  You don’t fixate on individual studies and it’s the overall body of literature that is relevant (again, I thank James Krieger for making my point for me on this).   And ignoring Radaelli, I have presented 5 studies showing that, in the aggregate, moderate volumes somewhere between 8-20 sets per week provide the maximal growth response and 1 that fails the reality check so hard it hurts, where data was lied about in the discussion (a fact that NOBODY has yet addressed directly for me) and which should be dismissed on that fact alone.  Legs maybe need a bit higher but we need more data.

Now, it’s possible that more work will change that and I’ll change my model and opinion when and if it does.  But unless we do find out that Ultrasound doesn’t measure muscle growth or something and all of these studies go into the junk pile I won’t hold my breath.  They match one another despite varying methodologies and they match (for what it’s worth) with real-world training practices.  They pass the reality check is what I’m saying.  At best we’ll refine the above numbers with more targeted research.

That is, future research might start from the idea that 10 to 20 sets/week is optimal and determine what specific volume is optimal within that range.  Or more systematically compare lower body and upper body.  Perhaps look at 10,15,20 for upper body exercises and 15,20,25 for lower. But stop doing 9,18,30 where the variance is just too huge to know what happens in the middle ranges.  Is 12 the same as 18, is 24 the same as 30?  Stop focusing on sets per exercise.  If you want to test sets per week, set it up to do that in a rational way.

Feel free to contact me for help with the study design.  I can also probably figure out how to pre-register the study, describe the randomization in the methods (and randomize the subjects in a blinded fashion to do the Ultrasound) and efficiently write the discussion with accurate data representation for anybody who just doesn’t have time…..

But expectationally based on the broad body of literature, optimal results are likely to be found between 8-10 to 20 sets per muscle group per week.  Once again, Eric drew this same conclusion after his desperate efforts to make Brad’s paper not be shit.   And I just read something by Bret Contreras of all people saying to stop doing insane volumes and focus on intensity.  Good lord, when Bret is the rational one and Mike Israetel is the intellectually honest one in all of this, Mercury must be in retrograde.  But Bret was on Brad’s paper and he better be careful or Brad will kick him out of the paper publishing circle jerk or prevent him from getting seminar appearances for not toeing the party line.

Ok, two more comments and I’m done.

The Generic Bulking Routine

With the above in mind, a rough volume of perhaps 10-20 (or 8-16 depending on the analysis) sets per week as an optimal growth number, I want to look at what I have presented for years as my Generic Bulking Routine.  This was an intermediate program I drew up absolute ages ago that has proven to work for intermediates for over a decade.  I report this only anecdotally and nothing more, I’m not James Krieger who thinks anecdote counts as ‘science’.  But if we’re going to pretend to integrate science and practice, then it is always nice when practice actually matches up with the science.

It was an Upper/Lower routine done 4 times per week with each day having the general structure shown below and was meant to be done as 2 weeks of a submaximal run up and then 6 weeks of trying to make progressive weight increase (progressive tension overload being the PRIMARY driver on growth with sufficient volume within being optimal) prior to backcycling the weights and starting over with the goal of ending up stronger over time.   It was meant to be an intermediate program used from about the 1-1.5 year mark of consistent training to maybe the 3 year mark before my specialization routines were implemented.

Weights didn’t HAVE to be increased every week or workout, that was simply the goal (as Dante Trudell put it in his Doggcrapp system, you should be trying to beat the log book at each workout).  In my experience, so long as folks were eating and recovering well and started submaximally, they could do so over relatively short time periods like this (over a longer training cycle, I’d do different things).  Women perhaps less so than men for unrelated reasons but no matter, Volume 2 is coming eventually…

I’m going to provide specific exercises in the template but just think of them as either compound or isolation for the muscles involved since exercise selection is highly individually dependent.  RI is rest interval and note that I use fairly long ones to ensure quality of training with real weights and ideally all sets are at the same heavy weight (oh yeah, in ‘Merkun a single apostrophe is minutes and a double is seconds and I am told this is the opposite of the rest of the world).  Big compound movements get 3 minutes, smaller muscles get 2 minutes and high rep work gets 90 seconds since it’s meant to be more of a fatigue stimulus to begin with.

But for big movements, a 90 second rest interval is bullshit and means that you’re probably squatting with 95 lbs on the bar by your fifth set ‘to failure’.  Better to do fewer sets and give yourself long enough to do quality work.  In that vein, after the submax run up, the goal RIR was maybe 2-3 for the initial set which would likely drop to 1 or even near failure by the last set.  The goal is progressive TENSION overload over time (meaning multiple training cycles).  When your workouts don’t use stupid volumes, you’re in the gym the same amount of time but can actually do quality work, unlike when you’re trying to fit in 45 fucking sets and get done before tomorrow.

Upper             SetsXReps (RI)      Lower               SetsXReps (RI)
Flat Bench        3-4X6-8 (3′)        Squat               3-4X6-8 (3′)
Row               3-4X6-8 (3′)        RDL or Leg Curl     3-4X6-8 (3′)
Incline DB Bench  2-3X10-12 (2′)      Leg Press           2-3X10-12 (2′)
Pulldown          2-3X10-12 (2′)      Another leg curl    2-3X10-12 (2′)
Lateral Raise     3-4X6-8 (3′)        Calf Raise          3-4X6-8 (3′)
Rear delt         3-4X6-8 (3′)        Seated Calf Raise   2-3X10-12 (2′)
Direct Triceps    1-2X12-15 (90″)     Abs                 Couple of heavy sets
Direct Biceps     1-2X12-15 (90″)     Low Back            Couple of heavy sets

The exercises are for example only and the other two workouts per week could be a repeat of the same movement or different within that general structure (exercises can also be changed with each succeeding training cycle).  So start with incline bench and pulldown for the sets of 6-8 and do flat bench and row for the sets of 10-12 or whatever.

Now let’s add up the set count:

Compound Chest: 5-7 sets twice/week for 10-14 sets/week (counted as 5-7 sets for tris at 0.5:1)
Compound Back: 5-7 sets twice/week for 10-14 sets/week (counted as 5-7 sets for bis at 0.5:1)
Side delts: 6-8 sets per week.
Rear delts: 6-8 sets per week (gets hit somewhat by pulling but hard to math out)

Note: Effective delt volume is likely a bit higher than this but it’s a pain in the ass to estimate how much side delts do or do not get hit by compound pushing.  Or rear delts via compound pulling.  It might math out to 8-10 sets/week or less or maybe more.  Again, hard to say but most report just fine delt growth from the above (and no shoulder problems which is why it’s an upper/lower to begin with).

Bis/Tris: 1-2 direct sets added to compound work = 2-4 sets/week + 5-7 indirect sets/week = 7-11 sets/week.  Add a third or even fourth set if you like to get 8-12 sets/week or 10-14 sets/week of combined indirect and direct arm work.  I certainly agree that if you do heavy pushing and pulling you don’t need a lot of arm work.  I simply do NOT agree that the sets count 1:1.  But my workout designs usually have proportionally less direct arm work since I partially count the compound pushing/pulling and always have and always will.
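The tally above is easy to reproduce. Here is a rough Python sketch for the pushing side of the template (chest and triceps); the helper names `add` and `scale` are made up for illustration, and the 0.5:1 indirect ratio is the assumption from the analysis above rather than an established value.

```python
# Per-workout set ranges are (low, high) tuples taken from the upper-body template.
WORKOUTS_PER_WEEK = 2   # each upper workout is done twice per week
INDIRECT_RATIO = 0.5    # assumption: compound pushing counts 0.5:1 for triceps

def add(a, b):
    """Add two (low, high) set ranges."""
    return (a[0] + b[0], a[1] + b[1])

def scale(r, k):
    """Multiply a (low, high) set range by a constant."""
    return (r[0] * k, r[1] * k)

flat_bench = (3, 4)
incline_db_bench = (2, 3)
direct_triceps = (1, 2)

# Compound chest work per week.
chest_weekly = scale(add(flat_bench, incline_db_bench), WORKOUTS_PER_WEEK)

# Triceps: indirect sets from pressing plus direct sets.
triceps_indirect = scale(chest_weekly, INDIRECT_RATIO)
triceps_direct = scale(direct_triceps, WORKOUTS_PER_WEEK)
triceps_weekly = add(triceps_indirect, triceps_direct)

print("chest sets/week:", chest_weekly)
print("triceps sets/week (indirect + direct):", triceps_weekly)
```

The back and biceps numbers work out the same way since the pulling side of the template mirrors the pushing side.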

Let me comment before moving forwards that while this might seem like a low per workout volume to some (and high to others), it matches the set count data based on my analysis above.  As well, I have contended for years that if you can’t get a proper stimulus to your muscles with that number of sets, volume is not the problem.  Rather, you are.  Whether it’s due to suboptimal intensity, focus, technique sucking, etc. you are the problem with your workout.  Doing more crap sets will never top doing a moderate amount of GOOD sets.

Regardless, looking at it now, with 15 years of experience with it and the data analysis I just did, I might bump up the side delt volume a bit.   As noted above, the contribution from chest work is tough to really establish here and the delt has three heads with differing functions.  But no matter.  Let’s focus on generalities.  Which are that my general set count for this workout is and has always been right in the range of what the analysis of the majority of the training studies found to be optimal.

This template could be adjusted in various ways.  The second chest and back movement could be isolation which would reduce the indirect set count on arms, necessitating an increase in direct work.   So if someone did 4 sets of flat bench and 3 sets of incline flye that’s still 14 sets/week for chest but reduces indirect arm work to only 4 sets/week (8 sets of compound pushing divided by 2) so you bump direct arms to 3-4 sets per workout to get to 6-8 direct sets per week and 4 indirect sets for 10-14 sets per week.  I think that makes sense.  The point being that I am looking at total set counts per week (actually I was counting reps but it all evens out) and adjusting volumes for smaller bodyparts based on exercise selection.  If you use more isolation movements for chest or back that decreases the indirect set count for bis and tris so I’d add more direct work there.

The same holds for legs where quads are worked for 5-7 sets twice weekly or 10-14 sets, same for hams and calves.  I might bump this up slightly although high volumes of truly HEAVY leg work is pretty brutal, add a third movement like leg extension and another leg curl for a couple of higher rep (12-15 rep) sets apiece.  Now it’s 7-9 sets twice a week or 14-18 sets.  Towards the higher end of volume but until we know for sure that it’s 20+, I’m not changing much here.  And, again, a workout with 20+ heavy sets of legs (including quads, hams and calves) is gruelling.

But overall the routine comes in at somewhere between 7-14 sets for upper body muscles and 10-14 for legs.  Again, intermediate program from like 1.5-3 years or so.

Those numbers look so very familiar.

The Wernbom Meta-Analysis

Let me finish by revisiting the original Wernbom analysis that looked at intensity, volume and frequency in terms of optimal growth.  It’s become pretty fashionable these days to dump on it for various reasons.  It’s fairly old, there is more data now and there was simply very little work done on intermediate much less advanced trainees at the time.

Irrespective of that, within moderate intensities (the typical ‘hypertrophy zone’ of perhaps 70-85% 1RM), it concluded that a volume of 40-70 repetitions twice/weekly was optimal for growth with triceps and quads being the muscles of interest.   I honestly think using reps per week is a better approach than sets since obviously 10 sets of 1 and 10 sets of 10 are not the same stimulus.  That said, since almost all of the work on this topic stays in the 8-12 range or so, set counts are at least conditionally appropriate.  Within any rationally accepted repetition range, it just all sort of balances out.

If you add up the reps on my GBR you can see where my numbers come from. I use a combination of heavy 6-8’s for tension and 10-12 or 12-15 for more fatigue which is why I mix them but you end up with roughly that number of reps for every muscle group (you can count reps on compound chest/back/legs as half the reps for arms but it should all math out more or less correctly because that’s how I set it up).

No, Wernbom wasn’t on well trained subjects but none of the above studies used elite guys either because a 1.1 bodyweight bench is not elite in men, it’s advanced noob.  Wernbom was working from a limited data set with, at best, limited work on even intermediate trainees (again, just like the above studies) and still concluded 40-70 contractions twice a week gave optimal growth compared to lower and higher values.

So we double 40-70 and that’s 80-140 repetitions per week per muscle group.   Some quick maths.

At 10 reps per set 80-140 reps per week yields 8-14 sets per week.
At 8 reps per set 80-140 reps per week yields 10-17.5 sets per week.
A mix of 4X8 (32 reps) and 3X12-15 (36-45 reps) for 68-77 reps twice a week is 14 sets/week.
A mix of say 5X5 (25 reps) and 3-4X10-12 (30-36 reps) for 55-71 reps twice a week is 16-18 sets/week.

So for any rational workout design an optimal repetition count of 40-70 reps/workout done twice per week for 80-140 total reps per week puts us somewhere in the realm of 8-18 sets/week for the optimal growth response.
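The conversion from reps to sets is just division, and a small Python sketch makes it easy to try other rep ranges. The function name `sets_per_week` is hypothetical; the 40-70 reps per workout and twice-weekly frequency are the Wernbom figures quoted above.

```python
def sets_per_week(weekly_reps, reps_per_set):
    """Convert a (low, high) weekly rep range into a (low, high) weekly set range."""
    low, high = weekly_reps
    return (low / reps_per_set, high / reps_per_set)

reps_per_workout = (40, 70)   # Wernbom's per-workout range
frequency = 2                 # sessions per week per muscle group

# Total reps per week per muscle group.
weekly_reps = (reps_per_workout[0] * frequency, reps_per_workout[1] * frequency)

# Straight sets at a single rep count.
for reps in (10, 8):
    low, high = sets_per_week(weekly_reps, reps)
    print(f"{reps} reps/set: {low:g}-{high:g} sets/week")
```

Mixed rep schemes like the 4X8 plus 3X12-15 example can be checked by hand the same way: add up the reps per workout, confirm they land in the 40-70 window and double the set count.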

Well whaddya know about that?

A Challenge to Brad Schoenfeld and Others

So I had originally said I would leave this be, that this wasn’t a rap battle, after writing my last detailed criticism of the recent Brad Schoenfeld study.  Well clearly that’s not the case.

More on the Statistics

First let me point readers to a thorough analysis of the statistics used in Brad’s paper by Brian Bucher.  Basically he takes them apart and shows that none of the THREE metrics supports their strongly worded conclusions.

None of them.

In this vein, here’s something interesting.

Brad and his group have NEVER used Bayesian statistics until this paper.  I searched on my folder of his papers and the term Bayesian shows up 4 times.  Three are papers that Menno Henselmans was on and it’s his email address.  The fourth is one of James Krieger’s meta analyses.  At best James has used them before.

Now I find this interesting because there is no way to know if Brad and James had planned to use this approach ahead of time.  James has asserted that they did but this cannot be proven.  Here’s why: in research it is common to register trials before doing them.  This is required in medical research by the Declaration of Helsinki (note that not all journals choose to follow this).

Basically you outline your goal, hypothesis, methodology, what statistical methods you intend to use.  This prevents researchers from gaming it after the fact, using different statistical methods to try and make an outcome happen.  This is common in research, you just keep throwing different statistical methods at your data until something says it’s significant. Registration is just another way to reduce bias or, in scientific terms, shenanigans.

Brad Schoenfeld appears to have never registered a single trial of his.  Again, apparently he is above scientific standards.  But this allows for the following potential to occur.

They gather their data, colored by every issue I already described.  Now they run standard P value stats and find that there’s no difference between moderate and high volume.  That’s what Brian’s analysis showed as the subscript on the 3 and 5 set groups is identical.   They were NOT different from each other by that method.  Both were better than low volume but moderate and high were IDENTICAL statistically.

In the absence of registration, there is NOTHING to stop them from applying another method, say Bayesian to try and make their desired outcome happen.  Which they did.  Even here, the Bayesian factors were weak, approaching not worth mentioning by standard interpretations.  This didn’t prevent them from making a strong conclusion in the paper or online.  Brad is still crowing about it despite the simple fact that the statistics do not support the conclusion.

But there is no way to prove one way or another that they did or did not do this.  But without registration of the trial, they can’t prove that they didn’t. And the onus is on THEM to do so.  Somehow I doubt that they will provide said proof.  But registering the trial would have prevented yet another criticism of their paper.  More below.

More on the Individuals Involved

Before I get into anything else, I want to examine how three individuals involved in this responded to both my original and more recent criticisms.

Brad Schoenfeld: Brad ducked them completely.  He said he wouldn’t respond because I insulted him.  You know who else told me that?  Layne Norton when I took him to task over reverse dieting years ago.  Then Brad left my FB group and blocked me on Facebook.  Because it’s always easiest to win an argument when you don’t allow for dissenting opinions.  Gary Taubes, Noakes, Fung and Layne do the same thing.  It’s standard guru operating procedure.  Just make any criticism you can’t address disappear.  If anyone had done it to Brad you’d have never heard the end of it.  But he’s above the law.

I’d note that I never removed Brad from my group.  He left voluntarily, presumably because he got tired of seeing people ask him to address my criticism when he couldn’t.  So he just punked out completely.  I’d also note that he played the “you don’t even science” card on a couple of critics.  That’s a Layne Norton tactic too.  Typical guru approach.

I’d add that Brad also blocked Lucas Tufur (who wrote an excellent article on their paper) as well for the mere act of “suggesting Brad was misinterpreting a study somewhat”.  Typical guru behavior and don’t pretend it’s anything else.

Science is based on discourse and debate and that is how it progresses.  Honest scientists embrace debate because it gives them the opportunity to defend their work (flatly: if you can’t address criticism, perhaps your work is not as strong as you think).

When Brad writes letters to the editor about papers he doesn’t like, he expects a response.  But just as with blinding and randomization and Cochrane guidelines, Brad is clearly above the scientific method.  He gets to guru out. Others do not.  The behavior he would never allow anyone else to engage in is acceptable for him and him alone.  Well and other gurus like Layne Norton who built himself up as the “anti-guru” until he became one himself.  The standards he held ALL OTHERS to stopped mattering when he was selling reverse dieting (seminars on which bought him a mansion).

Eric Helms: Now in a sense Eric Helms has no dog in this fight in that he wasn’t involved with the paper.  Except that he does because of how he dealt with this issue.  In email I had asked him about it and told him my issues and he debated back and forth before telling me he hadn’t even read the paper.  But he was already defending Brad.

In my group (and note that I tagged him, forcing him to get involved) I asked him about Brad’s misrepresentation of the Ostrowski data.  His response, a total deflection was “But you’ve done it too.”  I asked him when.  And now get this: he referred me to a NINE YEAR OLD blog post I did on FFMI.   NINE YEARS OLD.

In it, I looked at some data from the Pope paper on previous Mr. America winners and had stated that only one or two exceeded the FFMI cutoff.  The real number was closer to 6.   But it’s of no relevance.  My mistyping didn’t change my conclusion: While there are exceptions to the FFMI cutoff, overall it is a good cutoff in 99% of cases.  But it was an error, yes.  I admitted it and changed it immediately (because intellectually honest individuals admit a mistake and fix it, something more in this field should try).  I’d have changed it sooner had I known before a week or so ago.

Regardless, what Eric did was to compare a nine-year old blog post (where the error changed nothing) to a scientist lying about data (in such a way as to change its conclusion) in a published peer-reviewed journal. Said scientist using that lie to support a paper’s conclusion and increase his own visibility (and presumably seminar visits at $5k per appearance).

I’m sorry but does Eric really think a NINE YEAR OLD blog post can or should be held to the same standard as a published scientific paper?  Apparently so but only because it allowed him to completely avoid addressing my issue with Brad’s paper.  It was a guru deflection and nothing more.   He has never since addressed a single criticism I have levied against Brad’s paper.  NOT ONE.

More than that it was a blindside.  Eric has been a colleague and I guess friend for many years. I edited his books, he contributed a lot to The Women’s Book including an appendix on peak week and making weight.  He could have told me about this error any time in the last half decade.  Instead, he apparently saved it up for ammunition against me in case he ever needed it.  It would be like me leaving one of the myriad errors in his books in when I edited it to use against him if need arose.  But I didn’t do that.  Because I have intellectual integrity.

Eric has since blocked me on FB and left my group.  Again it’s easy to win an argument when someone can’t defend themselves.  He has apparently claimed it was due to me ‘impugning his integrity’. Sorry, Eric I can’t impugn something that doesn’t exist.  He has acted unprofessionally and, in an effort to defend Brad (with whom he also does seminars) blindsided a different colleague entirely.

A man with integrity would not deflect a real criticism with a blindside.  A man with integrity doesn’t have to crow online about how he has integrity.  A man with integrity shows that he has integrity by his actions.  By being intellectually honest and not applying a pathetic double standard when it suits him.

Eric has not shown integrity in this matter.

Whatever, I will show him the meaning of true integrity shortly.

James Krieger: And finally James himself. First let me point readers to a FB thread where James is getting kicked around by Lucas Tufur over his recent post. You can watch James mis-reference and mis-represent studies front, back and sideways while Lucas points out his errors and he just moves the goalposts.  Maybe that will tell you what’s going on.  It’s just desperation at this point, he can’t admit that the study was methodologically unsound, the statistics didn’t support the conclusion or say they were wrong.  So it’s pure guru behavior.

Now I will continue to give James credit in that he was the only one with the balls to even attempt a defense.  Brad punked out and deflected and Eric did too which is simply pathetic.  James at least tried even if he used the same guru tactics, deflections and obfuscations in doing so.  He still doesn’t understand and can’t defend Bucher’s analysis of his stats but even then I’d question why someone with an MS in nutrition and ex. phys is doing the stats in the first place.  That’s what mathematicians are for.  I have a cousin with an MS in applied mathematics and she runs the stats on big medical trials.  Why is James doing it?

Well I think it’s simple: he’s good enough at it (mind you computer programs do most of the work at this point), knows how to use Bayesian statistics to obfuscate stuff and, most importantly, shares Brad’s bias about volume.  An unbiased statistician wouldn’t play silly buggers like James did.  Note again my comment above: in an unregistered trial there is NO evidence that James didn’t run the frequentist methods and then, when they didn’t support the conclusion they wanted, use other stats in a feeble attempt to make it happen.  This happens a LOT in science.  That’s why you register papers, to eliminate the potential for, or the accusation of, that happening.  It’s why you randomize and blind to reduce the RISK of bias.

After I wrote my last criticism, James first attempted to address the criticisms to some degree before writing HIS final response.  You can find it here.  Note that the intellectually honest individual shows both sides of the story, something I doubt he has done.  Now I’m sure he made some good points.  But he also made some gross misrepresentations, ones most won’t catch.  Some highlights.

Lyle Doesn’t Even Science

James asserts that I simply don’t have enough experience doing science (there it is again, “Lyle doesn’t even science”) to understand the realities of doing it.  And yet I know that proper randomization, blinding, trial registration, data reporting etc. are good practices to reduce the risk of bias.  Maybe Brad and James should do some remedial work since I seem to know a lot more about good scientific practices than they do.  Seriously, if a bunch of randos on the Internet are having to educate ‘professional’ researchers about basic methodology….

Cost and Funding

He also blathers about the cost and funding involved and how many of the methodological issues aren’t realistic financially.  I never said science was easy and I know it’s expensive so there’s his strawman.   According to Google Scholar, Brad has published about forty seven papers already this year.  That’s 5 per month since it’s only September.  Most researchers do maybe 1 a year because that is how long data gathering and analysis usually takes.  Funding is clearly not an issue and perhaps Brad should do one GOOD study per year instead of putting his name on 5 per month.  Or maybe that’s why they are using a non-mathematician to do the stats: he can’t afford an actual statistician because he’s using his funding on too many poor quality papers.

Let me add: Describing the randomization of a study is free.  Registering a trial is free.  Blinding might increase the costs for technical reasons but, as above, rather than doing endless sub-par studies, why not put the funding towards fewer QUALITY STUDIES?  The same fancy computer that James uses to run his stats can randomize subjects to the different groups at no cost.  Certainly getting a second Ultrasound tech might cost money, perhaps Brad is the only one on campus trained in it.  He could still be blinded to who he is measuring which, as per Cochrane, reduces the risk of bias from high to low.
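To show how little effort this takes, here is a minimal Python sketch of simple randomization into three volume groups. Everything in it is hypothetical: the subject IDs, the group labels, the seed and the function name `randomize` are placeholders, and this is plain shuffled allocation rather than any specific protocol used in the studies discussed.

```python
import random

def randomize(subject_ids, groups, seed):
    """Shuffle subjects with a fixed seed and deal them into groups round-robin."""
    rng = random.Random(seed)      # fixed seed so the allocation can be reproduced and audited
    shuffled = list(subject_ids)   # copy so the original list is left untouched
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(subject)
    return allocation

subjects = [f"S{n:02d}" for n in range(1, 31)]   # 30 hypothetical subject IDs
groups = ["low", "moderate", "high"]             # hypothetical volume groups

allocation = randomize(subjects, groups, seed=2018)

for group, members in allocation.items():
    print(group, len(members), members)
```

Recording the seed and the procedure in the methods section is what lets someone else verify the allocation after the fact. Blinding the person taking the measurements is a separate step: they would only ever see the subject IDs, not the group labels.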

Blinding of Ostrowski

He also babbles something about whether or not Ostrowski was blinded or why I didn’t mention it above.  This is a pure deflection.  Essentially he’s arguing that since other studies might be methodologically unsound, it’s ok for theirs to be.  This is like arguing in court that “Yes, this man may have murdered someone.  But how do we know YOU haven’t murdered someone?” to deflect attention from the issue at hand.  The methodology of Ostrowski is not at question here, the methodology and discussion of Brad’s paper is.  Regardless, it’s irrelevant.

Whether or not the Ostrowski study was blinded doesn’t matter because I’m not the one holding it up as providing evidence.  I agree that there is a trend as claimed by Brad but I’m not using the data per se as evidence.  If James is saying it should be dismissed for not being blinded, then Brad can’t use it in the discussion to support his conclusions.  And in using it in his discussion, Brad is saying it’s valid.  Which means that how it was represented is all that is at issue here.  And it was misrepresented completely, a point that James finally acknowledged himself.

Basically, either the data is valid or it’s not and James can’t have it both ways.   And my actions don’t impact on that.  Only Brad’s does.

And that is still just a deflection from the fact that, whether the Ostrowski data is good or not, BRAD LIED ABOUT WHAT IT SAID to change its conclusions from contradicting to agreeing with him.  Of course, James has still failed to address that so far as I can tell, at least not directly.  He even said it was a misrepresentation. Ok, so why is it still not an issue that has to be addressed?

Put differently, why does Brad Schoenfeld get to lie about data in a published paper and nobody blinks?  I make an inconsequential error in a 9 year old BLOG post and I’m at fault.

Edema Studies

James also argues that the studies on edema timing aren’t relevant since it was a new stimulus to the trainees.   The Ogasawara study cited in Brad’s paper was in beginners while the Ahtiainen study I cited in my last piece was a long-term training study in strength-trained men.  So James is not only trying to have it both ways but he’s factually wrong.  The study THEY used is in untrained individuals, the study showing edema is in trained individuals.   So this is just more of his endless deflection.  I refer you back to the link above where Lucas Tufur is kicking James around on this topic and you can see James continue to defend what is indefensible.

Oh yeah, James Krieger has now blocked me on FB as well, right after publishing his article.  But it’s always easier to win an argument when the person you’re arguing with or attacking can’t argue back isn’t it?   I’d note again that I left all three of them in my FB group to give them the opportunity to address criticisms and all three voluntarily left.  I did not and would not have blocked or booted them so that I could win by default.  They left by choice.  And then blocked me so that I don’t even have the opportunity to respond to them.

On his blog, James has asserted that he blocked me due to me sending them nasty emails and calling them mean names.  All true.   And?  First and foremost, Layne Norton played the same card.  Said he didn’t have to address my criticisms since I made his wife cry (I wonder if she cried more when he left her for an Australian bikini chick).  Second, this is just disingenuous posturing.  I’ve been emailing Brad, Eric and James for a  couple of weeks calling them guru shitbags and more (my creativity for insults is quite well developed after so many years).

And it wasn’t until the day he posted his last response that he blocked me (or that I noticed, he had been in my group for that previous week).  He waited to do it to ensure I couldn’t respond to him.  So just another, well let’s call it what it is: a lie.  He’s lied in his articles about studies and data, and he’s lying now.

Finally, what is with this industry and otherwise big muscular dudes being such insecure children?  Do words on a screen really hurt them that much?  Or is this just more pathetic guru behavior to avoid my criticisms? Hint: it’s the latter. Seriously, these guys need to eat less tilapia: their skin is too thin.  But I digress.

The Anecdote

Oh yeah, if you look at James’ article he put up a picture of one of his clients who does the high volume bicep work as supposed proof of concept.  But the last time I looked, anecdote (i.e. one individual) doesn’t count as science.  It never has and never will.

This is like the cancer quack holding up a SINGLE survivor from their program and ignoring everybody who died.  This is like Layne Norton, who pretends to be the anti-guru, saying he’s got 100’s of emails so science doesn’t matter.  So James, either pretend to be a scientist or don’t.  Don’t play silly buggers where it’s science until it’s not.  Even if it’s bad science in this case which it is.

If I were a different person, I’d do the same and put up a picture of someone who got big using moderate volumes.  But that’s not an argument that has any validity so I won’t.  I’ll leave such nonsense to gurus and pay attention to the scientific facts.  If you want to science, stick with the science.  If you want to use anecdote, that’s fine.  But don’t pretend it’s science.  James wants it both ways, just like he does in most of his discussion above.

The Guru Crew

So add James Krieger to the guru group of Brad Schoenfeld and Eric Helms.  Their actions are no different than endless others before them: Tim Noakes, Dr. Fung, Gary Taubes, Layne Norton, individuals that most reading this piece see as stupid gurus while they give Brad, James and the rest a pass for identical behaviors.

Individuals who just block and ignore criticism rather than address it, usually after endless deflections and obfuscations.   Brad ducked every criticism, Eric deflected it and blindsided me and James used a mix of deflection and obfuscation.  Standard guru operating manual.

If that doesn’t tell you everything you need to know about the conclusions of this study, no amount of in-depth analysis by me or Lucas Tufur or Brian Bucher will help.  They have played nothing but guru games from the get go.  They shouldn’t get a pass when others do not.  And yet here we are.

Others at Play

I am told that others have joined the circle jerk.  Greg Nuckols of course but he writes for MASS with Eric Helms and of course he has to agree.  Also, that pesky seminar circuit he has to keep himself on to make the big bucks.  And all are, of course, saying my criticisms have no weight or that I have no weight.

Nevermind that Brad routinely shared my previous research reviews, Eric thought I was good enough to contribute to my book, James and I have done a webinar together.  All I have done is ask them to address specific criticisms which, for the most part they have not.  So like many before them, they resort to simple ad hominem attacks against me.  Now I don’t even science, now I don’t have any weight.  They didn’t mind when they were making money off of me.  But that’s par for this course isn’t it?

This is every guru in the history of ever.  Food babe, the Snake diet guy, more than I can think to name.  The true guru ignores criticism, deflects, obfuscates, blocks critics and then attacks them in a forum where they can’t defend themselves.  Don’t tell yourself it’s anything else.  Don’t let the fact that you like them and think I’m a foul-mouthed asshole color it. Simply ask yourself why this group of individuals is being allowed to engage in behavior that, if it were anybody else, would be criticized and destroyed?  It’s that simple.

Ask yourself why they get a pass.

A Challenge to the Whole Crew

And now my challenge.  All of the individuals involved claim to blend science and practice in terms of their training recommendations.  I have done the same for nearly 25 years (and note that most of these people came up reading MY books to begin with).  I’ve been in the weight room since I was 15, I’ve been involved in the science of performance since college.  It’s 35 years of training and nearly 30 years of being a training and nutrition nerd.  I know that science is good but limited, so is anecdotal evidence.  So you have to see how they fit together.  Sometimes science supports what the bros knew (protein is good), sometimes it contradicts it (meal frequency doesn’t matter).  It’s useful to compare.

But if the crew is going to say that they blend science and practice, well that has an implication, which is that new data, such as Brad’s volume data, should be incorporated into their training advice or how they train themselves.   But will it be?  Somehow I doubt it.

I doubt that any are going to move their trainees to the workout that Brad’s data would suggest as optimal.  And that tells you all of what you want to know.  If they believe in this study’s results, they should all adapt their own personal training and that of their trainees to it.  When/if they do not, then they either don’t believe it or are being disingenuous in saying they blend science and practice.  It’s that simple.

They can’t have it both ways.


Ignore the study, ignore the fundamentally flawed methodology, ignore Brad’s lie about the Ostrowski data, ignore the back and forth between the nerds and everything else and just ask yourself this question:

If Brad et al. think this data is valid, why aren’t they implementing it in their trainees?  Yes, I know, Brad has prattled on about using high volume overreaching cycles with folks.  Huge volumes for a couple of weeks.  Fine, I’ve no issue with that.  But this study was 8 weeks straight of volumes no human, juiced or otherwise, has done with good result.  If they’ve done it at all.

Now, if I saw data that I found applicable, I would implement it into the advice I give.  If I found data I didn’t agree with, I wouldn’t.  I don’t think this study is worth a damn and won’t change my approach to training or my recommendations based on it.  I’m finishing up an analysis of the 7 extant volume papers to present in a week or two when it’s done.  And the gross data supports not only what I’ve always recommended but will continue to recommend.

But I bet that neither will they.  Yet they are defending a piece of data that I can almost assure readers they will not apply.  Why?  Because they know it’s bullshit.  They know it’s not right.  Because if it were, they’d have moved their trainees to that style of training 2 weeks ago.  I even asked them via email when they would be changing their own or their clients’ training.


They know this finding doesn’t mean shit.  It also goes against 5 of the 7 total papers on the topic.  One is garbage, 5 say moderate volume beats out high and Brad’s paper is the outlier that’s left.  In science, you build models of the total of the data, not the single study.  Again note that James stated this explicitly, how you can’t draw a conclusion from a single paper, implying that I was doing that.  Except that I wasn’t.  Brad et al. are the ONLY ones drawing strong conclusions and it was yet another deflection by James, attempting to project THEIR behaviors onto me.  I love it when other people make my argument for me.  I will be examining all 7 current studies on the topic in a week or two and you’ll see what falls out of the model.  And it’s not Brad’s conclusions.

But whatever, I’m back into nerd mode.

Do the Workout

So back to the challenge, either to them or anybody who thinks they are more right than I am (which is fine, I never said everybody had to agree with me, I just asked that they address my criticisms honestly which has not been done).

Do the workout.  If you think the results are valid then DO THE WORKOUT.

And because I am a helper, I have drawn one up based on THEIR findings. Recall that it suggested 30 sets per muscle group per week for upper body and 45 for lower body as providing optimal growth.  This was in individuals with a minimum of 1 year of training and their strength levels were advanced noob at best.  So if you’re not a rank beginner, and believe their data, it applies to you.  One year or more of regular training and at least a 1.2 bodyweight squat and 1.1 bodyweight bench and you can DO THE WORKOUT.

But also keep in mind that it only used compound movements (ok, leg extension for quads) and looked at quads, biceps and triceps.  We have no data on pecs or back or delts or glutes or hamstrings.  So those have to exist independently of the other muscle groups, especially for lower body.  And the below is what that implies in terms of a practical workout scheme based around the performance of 30 sets/week for each upper body muscle (all sets to failure) and 45 sets/week for each lower body muscle.

I’ve provided two options below.  The first is a non-split routine training either upper or lower body on each of 3 days/week to get the total volume in.  All sets are on 90 seconds rest and taken to concentric failure. If you get more than 15 reps on any given set, add 3-5%.

Option 1: Non split routine

Mon/Wed/Fri: Lower body
Squat: 5X8-12RM
Leg press: 5X8-12RM
Leg extension: 5X8-12RM
RDL: 5X8-12RM
Lying leg curl: 5X8-12RM
Seated leg curl: 5X8-12RM
Standing calf raise: 5X8-12RM
Leg press calf raise: 5X8-12RM
Seated calf raise: 5X8-12RM

So that’s a 45 set workout for just legs.  If I wanted to get pedantic, I’d suggest another 15 sets for glutes to make Bret Contreras happy*.  I leave that to the individual trainee but that would take it to a 60 set workout three times weekly.  Have fun.

Tue/Thu/Sat: Upper Body
Flat bench: 5X8-12RM
Incline bench: 5X8-12RM
Cable row: 5X8-12RM
Undergrip pulldown: 5X8-12RM
Shoulder press: 5X8-12RM
Lateral raise: 5X8-12RM
Rear delt on pec deck: 5X8-12RM
Face pull: 5X8-12RM
Barbell curl: 5X8-12RM
Preacher or incline DB curl: 5X8-12RM
Close grip bench: 5X8-12RM
Triceps pushdown: 5X8-12RM

Now we might quibble over the above.  Brad’s study did 30 sets/week of compound pushing and pulling. But they measured biceps and triceps.  Should we take out the direct arm work or is it 30 sets of compound pushing and pulling and 30 more sets of direct arm work?  I leave it to you to decide and I’ll address the odd way of counting sets (which is fundamentally wrong in my opinion) in a future article.

Split Routine

This is a 6 day/week split routine with each muscle hit once/week.    That means that all 30 sets for upper or 45 for lower muscle groups have to be done on that single day.  Note that even Arnold and his ilk did 20 sets per muscle once a week and not all sets were remotely close to failure.  With drugs.    Brad is suggesting 1.5 times that for upper body and a little over double for legs.  Enjoy.
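For anyone who wants to check that comparison, here is a minimal sketch of the arithmetic.  The 20-set benchmark and the 30 and 45 set figures come from the text above; the variable and function names are just mine for illustration.

```python
# Per-muscle weekly sets: the old-school benchmark vs. the study's top volumes.
BENCHMARK_SETS = 20  # roughly what Arnold-era splits used per muscle, per the text
STUDY_SETS = {"upper body": 30, "lower body": 45}

def multiple_of_benchmark(sets, benchmark=BENCHMARK_SETS):
    """How many times the benchmark volume a given weekly set count represents."""
    return sets / benchmark

for region, sets in STUDY_SETS.items():
    print(f"{region}: {sets} sets = {multiple_of_benchmark(sets):.2f}x the benchmark")
```

That gives 1.5 times for upper body and 2.25 times for lower body, all on a single day in this split.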

Monday: Quads
Squat: 15X8-12RM
Leg Press: 15X8-12RM
Leg extension: 15X8-12RM

You could technically pick more movements but this is what they used in their paper so I’m using it too. If you want to do 5 sets of 9 different movements that target quads, go to town.  But it’s 45 sets of 8-12 to positive failure with 90 seconds rest for quads no matter what.

Tuesday: Chest/back
Flat bench press: 5X8-12RM
Flat DB press: 5X8-12RM
Cable crossover/pec deck: 5X8-12RM
Incline bench: 5X8-12RM
Incline DB press: 5X8-12RM
Incline flye: 5X8-12RM
Narrow grip Cable row: 5X8-12RM
DB row: 5X8-12RM
Shrugback: 5X8-12RM
Undergrip lat pulldown: 5X8-12RM
Medium grip lat pulldown: 5X8-12RM
Cable pullover: 5X8-12RM

Wednesday: Calves
Standing calf raise: 15X8-12RM
Leg press calf raise: 15X8-12RM
Seated calf raise: 15X8-12RM

Thursday: Delts
DB overhead press: 5X8-12RM
Barbell overhead press: 5X8-12RM
Hammer overhead press: 5X8-12RM
DB Lateral raise: 5X8-12RM
Cable lateral raise: 5X8-12RM
Machine lateral raise: 5X8-12RM
Pec deck rear delt: 10X8-12RM
DB bent over rear delt: 10X8-12RM
Face pull: 10X8-12RM
I can’t think of more rear delt movements.

Friday: Hamstrings
RDL: 15X8-12RM
Lying leg curl: 15X8-12RM
Seated leg curl: 15X8-12RM

Saturday: Arms
Barbell curl: 5X8-12RM
DB curl: 5X8-12RM
Preacher curl: 5X8-12RM
1-arm preacher curl: 5X8-12RM
Incline DB curl: 5X8-12RM
Cable curl: 5X8-12RM
Close grip bench: 5X8-12RM
Triceps pushdown: 5X8-12RM
1-arm triceps pushdown: 5X8-12RM
Barbell nosebreaker: 5X8-12RM
French press: 5X8-12RM
Cable French press: 5X8-12RM

Again we might quibble over the set count on arms.  Does it count separately or does the compound pushing and pulling get it done? Another topic for another day.

But there ya’ go.  The applied Schoenfeld et al. workout routine.

Believe their data? Think I’m full of shit? Then do the workout.  When you get overuse injuries and overtrain and get tendinitis, you can buy my injury nutrition recovery book.  Regardless, if you think they are in the right it’s simple:


But make your own decision: If you believe them, do the workout above and report back on your results.

I love to be proven wrong and will always say I was wrong when that happens.

I also love to tell people “I told you so” and have them tell me “You were right.”

Time will tell which will occur.

See you in 8 weeks.

* I mentioned Bret Contreras above and wanted to add this comment.  Bret is listed as like the third author on the paper, right.  But what did he actually do on the paper?  Mind you, this is another issue: in most papers, the contributions of each individual author are commonly listed.  Is Bret even in the same location as Brad?  What did he contribute to the study?  As importantly, why has he remained completely quiet on the issue (so far as I can tell)?  He’s not defending it nor promoting it and one has to wonder why.

A Response to James Krieger

So there’s a war brewing in online fitness land.  About three weeks ago, Brad Schoenfeld et al. released a paper purporting to show that more volume meant more growth with 30 sets per week for upper body and 45 sets for lower body outperforming lower and more moderate volumes.  To say there has been a shitstorm, much of which is driven by myself, is a bit of an understatement.

I wrote a fairly critical piece about that paper (that had issues, see below) and brought up several other problems with it (including one I will finish this piece with).  My questions to Brad and James went completely unanswered with any number of deflections and obfuscations occurring throughout. Even when others, not me, asked similar questions, they went unanswered or were deflected with the kind of behaviors only the best gurus use.

Then, a few days ago, James Krieger wrote an article explaining why different studies find different results to “address criticisms being leveled at the study by certain people (like Lyle McDonald)”.  Let me note that I’m far from the only person critical of this paper.  But as is so typically the case, I’m the only one mentioned by name.  And his article was basically just a tedious attempt to ignore the actual questions that I and others have been asking.

So I’m writing a response directly to and at James (others involved in this will get mentioned as well).

For those who want to read it, go here.  Let me note that I am providing this link because I feel that it is important for individuals to see both sides of the argument (not that his article has anything to do with what I have asked him.  Individual variance within or between studies has jack squat to do with anything I am going to talk about today).  This is called intellectual honesty, something more people in this industry should experiment with.  Because somehow, I doubt James will link to my piece any time soon because that might allow people to see MY side of it or see the issues he is so steadfastly trying to pretend don’t exist.

Now some may notice that my original criticism/research review of the study has been unpublished on this site.  Here’s why: many of my criticisms were shown to my satisfaction to be incorrect.  Let me rephrase that: various individuals showed to me that what I had based my conclusions on was incorrect.  I had originally added an addendum to that article and finally just took it down for that reason.

This is also called intellectual honesty.  I was wrong, I admit I was wrong, I took down my article.

See how that works?

Again, others in the industry should try it sometime: admitting that they are wrong rather than obfuscating, deflecting and guru-ing to hold onto their belief system or avoid criticisms of their work.  Because that is mainly what is going on here.  It’s really all that is going on here.

I accept that James’s statistics were done correctly.  I originally said the opposite but I WAS WRONG.  See how it works?  When you’re wrong, be a man and admit it.  Like I said, others should try it sometime.

But let me add, proper statistical analysis of bad data still doesn’t prove anything.  The statistics can be done perfectly but if the data is poor to begin with, any conclusions or interpretations can still be poor.  Especially when the person (often and in this case NOT the statistician) interpreting the study is trying to (apparently) prove something.

However, these are far from the only problems with the study.  Far from it.  And what’s interesting is that every time I have shown these criticisms to James, he simply goes back to the statistics.  The ones I no longer disagree with and haven’t for weeks now.

It’s like when you argue with someone, perhaps a girlfriend, and have conceded a point 30 minutes ago. But they keep harping on it to avoid the points you’re making now.  “Well what you said 30 minutes ago was wrong…”  That’s what James (and others) is doing.  And the only reason he WON’T address these issues (Brad himself has dodged them constantly or used the most amusing of deflections to avoid them), is because he CAN’T. Because if he could have, he would have by now.

I mean, c’mon.  Who doesn’t want to straight up own big meaniehead Lyle McDonald?  If Brad had any defense for what I’m going to describe below, he’d have come in and stomped me flat.  Instead he guru’d out of it by saying he wouldn’t address my questions because I insulted him.

It’s a deflection plain and simple.  Others have done the same telling me to “Learn statistics, idiot.”  while ignoring all of the issues I am about to bring up. But the statistics haven’t been the issue I have had with this paper for several weeks.   I already said they were ok.  Let’s move on.

So in this piece, as a response to James, I am going to once again list my methodological issues with this study along with other issues I have with it.  I will make my point, make a bunch of blathering commentary as usual and that will be it.  I’m not getting into a back and forth, this isn’t the new rap war.  It is however very long. But this is the last thing I’ll say about it.

He made his non-points, I’m making my yet to be addressed points.  He hasn’t addressed them, Brad hasn’t addressed them and they clearly won’t/can’t address them or they would have.  A few days back, James hilariously hit me with a laundry list of questions in my Facebook group but that’s not how this works.  He doesn’t get to ignore my questions and then expect me to answer his.

But he wrote his piece and called me out and I’m responding to it and doing the same.

I don’t expect this to change Brad or James’s stance on this paper.  They’ve got to keep defending and deflecting and can’t back down now.   I’ve seen it before and I won’t be shocked when they either ignore or try to dismiss or obfuscate everything I’ve said in this piece and that’s not my goal.

This is to get all of these issues out there to let everyone else make their own decision about the situation.   About the paper itself AND the individuals involved.

A Brief Sidestep: The Cochrane Guidelines

Doing scientific experiments is not easy and nobody would claim that it was.  I’d note that I was involved in a lab in college doing research by Drs. Whipp and Ward.  Specifically they were examining the VO2 max response in carotid body resected patients.  I was at UCLA and the hospital had the largest and possibly only population of subjects.  I was involved in exercise testing, setting up the breath-by-breath monitors along with data analysis after the fact.  So you can spare me the “Does Lyle even science?” bullshit.  I also helped with data analysis although I had NOTHING  to do with the statistics.

But I saw first hand that doing research is not easy.  In the past, a lot of studies were very poorly done in a methodological sense.  It still happens today, I even recently saw a paper studying women that didn’t control for menstrual cycle.  This is indefensible in 2018. But so are a lot of other research practices: they are indefensible in the modern era.

Because there are actually guidelines in existence regarding good and bad methodology for studies or rather they exist to review if a given study is well done or suffers from various types of bias.  Discussion of all of this can be found in the Cochrane Handbook for Systematic Reviews of Interventions which represents the current status of what is considered good and bad study design.  I’m mentioning it here as I will come back to it below somewhat repeatedly (I do love to beat a dead horse).

The Study’s Methodology

For completeness and since I took my original review down, first let me look at what the study did.  First it recruited 45 resistance-trained men who had an average of 4.4 +- 3.9 years of resistance training experience.  It was also stated that the men needed to have been lifting for a minimum of 3 times per week for at least one year which is consistent with those standard deviations but let’s not pretend that they were well trained if 1 year minimum was their required training experience.   One year is an advanced newbie at best.

The men were randomized to the same workout program which was done three times weekly and did 1, 3 or 5 sets per exercise for 8 weeks.    The exercises done were flat bench press, military press, wide grip pulldown, seated cable row, barbell back squat, machine leg press and unilateral leg extension.  This yielded training volumes of 6, 18 and 30 upper body pushing or upper body pulling sets and 9, 27 and 45 lower body (ok quadriceps) sets per week.
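Those weekly numbers fall straight out of the design: sets per exercise, times the number of exercises for that movement category, times three sessions.  Here’s a minimal sketch of that arithmetic.  The grouping of the exercises into pushing, pulling and quads follows the list above, and the names are mine rather than anything from the paper.

```python
# Exercises per category, grouped from the study's exercise list.
EXERCISES = {
    "upper push": ["flat bench press", "military press"],
    "upper pull": ["wide grip pulldown", "seated cable row"],
    "quads": ["barbell back squat", "machine leg press", "unilateral leg extension"],
}
SESSIONS_PER_WEEK = 3

def weekly_sets(sets_per_exercise):
    """Weekly sets per category for a given number of sets per exercise."""
    return {
        category: sets_per_exercise * len(moves) * SESSIONS_PER_WEEK
        for category, moves in EXERCISES.items()
    }

for sets in (1, 3, 5):
    print(f"{sets} set(s) per exercise -> {weekly_sets(sets)}")
```

Note that this counts every compound pushing set as a triceps set and every pulling set as a biceps set, which is how the paper arrives at its per-muscle volumes.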

Of the original 45 subjects eleven dropped out leaving a total of 34 subjects which, as the authors admit, made the study slightly statistically underpowered (they had calculated that they needed 36 subjects).  Basically it didn’t have enough people to draw strong conclusions (my note: this didn’t stop Brad and others from doing just that online and it’s funny that James Krieger is harping on the statistics when the study itself was underpowered statistically).  The men had an average weight of 82.5+-13.8 kg although, oddly, individual data for each group was not presented as would normally be the case.

Speaking to their training status, the average 1RM squat was 104.5+-14.2 kg for the 1 set group, 114.9+-26.0 kg for the 3 set group and 106+-24 kg for the 5 set group.  Let’s call it 110 kg for everyone and, divided by bodyweight, that is a 1.3 bodyweight squat 1RM. That’s 242 lbs at 181 lbs which by standards online is somewhere between novice and intermediate. For bench the numbers were 93.6+-16.1, 96.4+-21.2 and 91.1+-20.9 kg so call it 93 kg average and that’s a 1.1 bodyweight bench.     By strength standards online that puts them between novice and intermediate.   Like I said, this is hardly well trained.  Mind you, if they didn’t do a lot of low repetition training this would be skewed away from a high 1RM.   Still….
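For those who want to check the bodyweight multiples, here is a rough sketch.  It uses the rounded 110 kg squat and 93 kg bench from above and the overall average bodyweight of 82.5 kg, since per-group bodyweights were not reported.  Treat it as a back-of-the-envelope figure, not something from the paper.

```python
# Rounded group-wide figures from the text; per-group bodyweight was not reported.
BODYWEIGHT_KG = 82.5
SQUAT_1RM_KG = 110.0
BENCH_1RM_KG = 93.0

def relative_strength(lift_kg, bodyweight_kg=BODYWEIGHT_KG):
    """1RM expressed as a multiple of bodyweight."""
    return lift_kg / bodyweight_kg

print(f"Squat: {relative_strength(SQUAT_1RM_KG):.2f} x bodyweight")
print(f"Bench: {relative_strength(BENCH_1RM_KG):.2f} x bodyweight")
```

The squat works out to about 1.33 times bodyweight and the bench to about 1.13 times.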

Their diet was uncontrolled and relied on food records which are notoriously terrible.  As well, the reported dietary intakes were more or less impossible as they indicated that each group was in a severe calorie deficit and severely undereating protein. But I don’t think that data means jack squat.  It just reflects that diet records are terrible and unreliable.  But few studies can afford to control diet, especially over the length of a study like this so this is just a reality of the research.

A number of different things were measured throughout the study including maximum strength improvements and muscular endurance along with changes in muscle thickness.    Looking at the latter, the men underwent Ultrasound measurement for biceps, triceps, vastus lateralis and rectus femoris (the latter two being combined into a single measurement and taken as indicative of quadriceps growth which seems a little odd to begin with).  The Ultrasound measurements were done at the beginning of the study and again 48-72 hours after the last workout “In an effort to ensure that swelling in the muscles from training did not obscure results….”   I will come back to this.

The study applied several different statistical methods to the data although, once again, I’m not going to get into that in detail since it’s not my area of expertise.  In terms of results, the study, at least by statistical analysis suggested that there was a dose response from lower to higher volumes for 3 of the 4 measured muscles.  Or that was the paper’s conclusion.

Triceps shows no significant difference between groups which seems at odds with the rest of the study.  Most likely this is due to the study being slightly underpowered statistically.    The absolute changes in triceps thickness were 0.6 mm, 1.4 mm and 2.6 mm which would seem real-world significant, but the 2 mm difference didn’t reach statistical significance.  The other data is actually similar and you can read another examination of this study here which shows the full changes in each muscle.  Note also the large variance in results and the large standard deviations.

For the other muscles measured (biceps, RF and VL), what is called a frequentist analysis (one of the statistical methods used) showed that there was insufficient evidence to conclude that the difference in growth for the 1 and 3 set groups was statistically significant.  Basically the low- and moderate- volume groups got statistically identical growth.  This seems difficult to take at face value but probably represents the fact that the paper was underpowered.

Only the highest volume group (30 and 45 sets respectively, remember) showed a statistically significant difference from the 1 set group, based on pairwise comparisons. The Bayesian statistics (which I will not pretend to understand but will let a statistician friend address below) described this as “…weak evidence in favor of 3 sets compared to 1 set (BF10 = 1.42) and 5 sets compared to 3 sets (BF10 = 2.25).”  This means exactly what it sounds like, by a different statistical method, 3 sets was weakly better than 1, and 5 sets was weakly better than 3.

The Strength Gains

I’d note that all three groups made the identical improvements in 1RM squat and bench strength (nobody cares about muscular endurance).  Do you believe that?  That the group doing a mere 13 minute workout three times per week got the identical strength gains as a group doing 5 times that much?    Man, powerlifters would be THRILLED.  This goes against literally every study on trained individuals and every meta-analysis ever done on the topic.  It fails the reality check so hard it hurts.  Especially given the KNOWN relationship between muscle size and strength.

If one group truly gained more muscle than the other, how can they not have gained more strength?  Now, ok, the rest intervals were a paltry 90 seconds (chosen out of convenience despite being known to be inadequate) and I’d question who in the hell can squat 8-12RM for 5 sets with 90 second rest to begin with.  They also didn’t do any low rep work.  Still, this result is at odds with basically all previous research, reality, and common sense.  A bigger muscle generates more force and the lack of a difference in strength gains suggests/implies that there were not actually differences in muscular gains.

The Conclusion

Regardless, the paper’s rather strongly worded conclusion was this:

The present study shows that marked increases in strength can be attained by resistance-trained individuals with just three, 13-minute sessions per week, and that gains are similar to that achieved with a substantially greater time commitment when training in a moderate loading range (8-12 repetitions per set). This finding has important implications for those who are time-pressed, allowing the ability to get stronger in an efficient manner, and may help to promote greater exercise adherence in the general public.  Alternatively, we show that increases in muscle hypertrophy follow a dose-response relationship, with increasingly greater gains achieved with higher training volumes. Thus, those seeking to maximize muscular growth need to allot a greater amount of weekly time to achieve this goal.

They don’t say that it may or that it might.  They state that it does show this relationship.  Shows.  As in the case is closed.  Before looking at the rest of my issues, let’s look at that strongly worded claim first and see if it is warranted.

A Quick Look at the Statistics Above

I’ve made it abundantly clear that I am no statistician and my earlier mistake was talking outside of my area of knowledge which won’t happen again.  But I happen to know a very good statistician (he has asked both James and Brad pointed questions about the paper with about the same non-response I have gotten).  I asked him to interpret the above and this is what he said:

In order to support their unqualified claim of “increasingly greater gains achieved with higher training volumes” instead of, for example, leaving open the possibility of a plateau between 3SET and 5SET, they would need to have substantially good evidence. In the Frequentist, or classical, analysis, none of the three metrics with pairwise comparisons demonstrated a statistically significant difference between 3 SET and 5 SET groups, and triceps didn’t even make it to pairwise comparisons. If the analysis stopped at the classical analysis as reported in most research, there would be zero statistical evidence for 5 SET over 3 SET.

In the Bayesian analysis, the evidence from the triceps results “favored the null” (weakly), meaning no difference in groups, and the other three metrics favored 5 SET over 3 SET, also being “weak evidence.” So, what does “weak evidence” really mean?  Well, Harold Jeffreys originally developed some guidelines (not to be used as hard thresholds), and he described Bayes Factors between 1 and 3 as:

“not worth more than a bare mention”

MY NOTE: the Bayes Factors were 1.42 for 3 vs. 1 set and 2.25 for 5 vs. 3 sets.

Others have shortened this description (weak, anecdotal), but the point remains the same. Trying to use this weak level of evidence to support an unqualified conclusion is a hell of a lot more than “a bare mention.”

Basically, the people who developed these statistical methods take a weak difference to be not worth more than a bare mention.  And certainly not worth the strong conclusion made in Brad’s paper that a dose-response relationship in favor of the highest volume of training existed.
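To make that concrete, here is a minimal sketch that puts the two reported Bayes Factors against the 1 to 3 band described above.  The function name and the coarse labels outside that band are my own simplification, since the finer cutoffs aren’t given here, and as noted these are guidelines, not hard thresholds.

```python
# Reported Bayes Factors (BF10) from the paper, as quoted above.
REPORTED_BF10 = {"3 sets vs 1 set": 1.42, "5 sets vs 3 sets": 2.25}

def rough_label(bf10):
    """Coarse verbal label for a BF10 value; only the 1-3 band comes from the text."""
    if bf10 < 1:
        return "favors the null"
    if bf10 <= 3:
        return "not worth more than a bare mention"
    return "more than a bare mention"

for comparison, bf10 in REPORTED_BF10.items():
    print(f"{comparison}: BF10 = {bf10} -> {rough_label(bf10)}")
```

Both comparisons land squarely inside the bare mention band.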

So with that taken care of, let’s look at other aspects of this study that I think are problematic, absolutely NONE of which have to do with the statistics.

So let’s stop bringing that up as a deflection, shall we James?

A Lack of Body Composition or Even Weight Data

As I mentioned above, the study provided initial anthropometrics in terms of height, weight and age.  As I noted above, this data was not presented, at least in the pre-publication paper for each individual group (i.e. weight, height and age for the 1-set vs the 3-set or 5-set group) which is atypical.  Usually you show individual values for each group or at least mention if they were not different between groups.

Far more oddly, post-study weight was not measured, or at least not provided in the results.  This is a bizarre oversight and I can’t recall any study in recent times that did not present pre- and post-study weight along with showing how it changed for the different groups.  I mean, put ’em on a scale before and after, boom, done.  Takes about 30 seconds before you do the Ultrasound.   But they only did it before apparently.  No reason is given for this.

Nor was any measure of body composition done.  Now, this could have been purely a money issue or a time issue but most studies will put the subjects into a DEXA machine to get body comp so they can examine changes in lean body mass/fat free mass (LBM/FFM) and body fat percentage.  Let me make it clear that changes in LBM/FFM are NOT a good indicator of muscle growth.  We all know that you can carb-load or carb-deplete someone or just have a big poo and that changes LBM.  However, it does provide some useful information on other aspects of the study.

For example, it can tell us indirectly about diet.  So if one group gains weight and fat and another loses weight or loses fat, you can infer that the first group was in a surplus and the second wasn’t.  In the context of a muscle growth study, and given the absolutely terrible accuracy of diet reports (and the literally impossible food records of the subjects in this study), this is good data to have.  If one group was eating enough and the other wasn’t that can color the results pretty enormously.   But this study did not have it.  Again, might have been technical, time or financial and this isn’t a deal breaker per se.

The total lack of post-study bodyweight data is however baffling (a search for ‘weight’ on the PDF turns up 4 hits and only one deals with bodyweight which was the initial numbers).  Even if the only data it provides was that the 1 set group lost weight and the 5 set group gained it, that would give us SOME indication of what was going on diet wise.   Given the impossible to believe diet reporting records, this would have been good information to have.  Nope, nada.  Again, an absolutely baffling omission given how unbearably simple it is to measure and report.

A Primer on Ultrasound

While DEXA or other body composition measurements were often used in earlier studies, the current optimal method to measure changes in muscle thickness is Ultrasound, which this study did use.  It’s a much more direct measurement of thickness: it bounces sound waves off of tissue and provides a visual representation (it’s how they check on fetal development, for example).  The thing is, it’s not perfect.  Yes, there are specific guidelines for where to place the probe anatomically, and folks get trained on it as well, but Ultrasound has a bigger issue, which is that the interpretation is subjective because the image being obtained is something like this:

Muscle Thickness Ultrasound

And as anyone who does Ultrasound in a hospital will tell you, there is a huge subjective component and two Ultrasound techs might come up with completely different results (which is why you have two people do it, to compare results). Some of this improves with experience or they wouldn’t use it in a hospital setting but often you are getting sort of a vague image that you have to interpret.  This introduces the potential for bias.

That is, say the ultrasound operator knows what they want to find.  They might interpret the image differently than someone who is just looking without that bias.  So an alcoholic goes to a hospital and gets an Ultrasound on their liver.  If the tech expects to find a damaged liver, it can color what they “see” on the Ultrasound monitor. If they are just told to do the Ultrasound, their interpretation may be less biased.

I’m not saying that this happened in this study or that the Ultrasound tech was necessarily looking for a given result.  Well, I’m not saying it yet anyhow.  I am saying that there is the potential for it to occur due to the subjective nature of Ultrasound to begin with.  This problem can be avoided in numerous ways, the simplest being to have two people measure the same subject so they can see if their results match up.  But this becomes a real problem when combined with the next issue.

The Study was Unblinded

So let me explain what blinding a study means and why it’s important.  To blind a study means that someone, either the subjects, the researchers or both do not know who is in which group.   If you only blind one group it’s called single blind, if you blind both it’s called double blind (if you poke everybody in the eye, that’s called triple blind and yes that’s a joke).  You can actually blind more folks than that.  The folks doing the statistics might be blinded, for example.  Or any number of people involved in the study. But why does this matter?

Blinding the Subjects

Imagine a drug study that is studying drug X and a placebo and you want to measure some outcome.  You make the pills look identical (so folks can’t compare or know which is which) and then randomize people to one group or another and keep a sealed record of it.  So someone knows that subject 1 got the drug, subject 2 got placebo, subject 3 got placebo or whatever.  You just give everybody a number and someone has a super-secret handbook that says who is who.
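The bookkeeping described above (give everybody a number, keep a sealed record of who got what) can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function name allocate_blinded, the "S001"-style labels and the two arm names are hypothetical and not taken from any real trial protocol.

```python
import random

def allocate_blinded(subject_ids, arms=("drug", "placebo"), seed=None):
    """Randomly assign subjects to arms.

    Returns (public_labels, sealed_key):
      public_labels maps each subject to an opaque code everyone may see,
      sealed_key maps each code to the arm and is kept locked away
      until the data have been collected.
    All names here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    # Balanced assignment: cycle through the arms, then shuffle the order.
    assignments = [arms[i % len(arms)] for i in range(len(subject_ids))]
    rng.shuffle(assignments)
    # Opaque codes: shuffled numbers that carry no hint about the arm.
    codes = list(range(1, len(subject_ids) + 1))
    rng.shuffle(codes)
    public_labels = {}
    sealed_key = {}
    for subject, code, arm in zip(subject_ids, codes, assignments):
        label = f"S{code:03d}"
        public_labels[subject] = label
        sealed_key[label] = arm
    return public_labels, sealed_key
```

Anyone handing out pills or taking measurements only ever sees the public labels; the sealed key is opened after the last measurement is recorded.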

But the subjects ideally don’t know if they got the drug or the placebo.  This is critical because if the person knows what the drug is or is supposed to do, that can color what happens.  We’ve all seen that episode of a TV show where kids get drunk on non-alcoholic drinks.  Or pretend to be.  Their expectations make it happen.  If I give you an aspirin and tell you it’s the most potent German steroid ever, I bet you’ll be stronger in the gym. Pure expectation.

If you’re doing a study where the drug is supposed to improve health, if one group knows they got the drug and the other a placebo, they might adopt (even subconsciously) other health promoting behaviors that skew the results.  If someone enters a drug trial hoping to get the newest drug for a given situation and knows that they didn’t get the drug, they might not take the pill or care about the results.  You blind the subjects to the drug to prevent that from happening and hopefully have everyone take the pill without expectations or change in their behaviors so that you’re measuring the effect of the drug itself.

Blinding the Researchers

But why would you blind the researchers?  For the same reason: expectations and bias.  Let’s say I’m overseeing a study to see if caffeine improves performance over placebo during an exercise test.  First we bring the subjects in and exercise test them.  Usually some amount of encouragement goes on here to get a truly maximum effort, you yell and holler at them (we did this with the carotid body resected patients I helped to study at UCLA) until they give up.  Now let’s say that they are blinded to whether they got caffeine or the placebo.  They don’t know what they got.  So they have no expectations to color their effort and hopefully everyone goes all out.

But let’s say I know who got which compound.  That is, they are blinded but I’m not.  Let’s also say that I already believe that caffeine improves performance (I do and it does).   Now we re-exercise test them.  I’m in the room and I know who got which compound.  Now my pre-existing belief or expectation can subconsciously or consciously make me act differently.  Perhaps I give the caffeine group a little extra encouragement or the placebo group a little less.  I might not even notice I’m doing it but we are just humans after all and we’re all subject to this.   If/when this happens, I may simply end up proving what I already believe to be true because my motivation may get them to improve by 3%. But it will look like the caffeine.  I probably won’t even know I’m doing it and I’m not saying it will be deliberate.  But humans are great at deluding themselves into thinking we are free of bias and purely objective.

In contrast, if I don’t know who got what (I am blinded to who got what) everybody gets the same treatment during the exercise test and I can reasonably assume that the data is good.  Alternately, if I’m the head researcher, maybe I stay out of the exercise room and have someone else who is blinded administer the test.  They might not even know what the study is about so they just scream and holler equally at everyone.   They can’t have expectations if they don’t even know what we’re studying or if they don’t know who got what compound.  We get the data, break open the chart showing who got what and boom, we have good data.  Caffeine did this, placebo did that.  And it was double blinded so nobody had any expectations.  Not the subjects, not the researchers.

Blinding Everybody

We might additionally blind the people doing data analysis so they don’t know which data came from which individual.  Often data in research studies is messy; it was in the VO2 max data that was collected on the carotid body resected patients when I was at UCLA.  If the data analysts aren’t blinded to who is in which group, THEIR expectations might color how they analyze the data.  They know what the data is “supposed to look like” and that can color their judgement.

Get it?  Blinding is critical to avoid a risk of bias in studies, bias of the researcher, subjects, data analysts, etc. coloring how the data is gathered or examined.  I am not saying that not blinding makes any of the above happen.  I am saying that it can.  And that blinding reduces that risk.

Blinding the Reviewers

Hell, once the paper is done and goes into peer review (where experts in a given field are allowed to critique the paper and even disallow it until problems are fixed), you could (and probably should) blind the reviewers from the researchers.  That is, the reviewers shouldn’t know who wrote the paper.

To understand this let me explain peer review.  Basically research papers, at least in reputable journals, are put to the test by a variety of peer reviewers, presumably experts in that area, to look at the paper critically and identify problems or areas that should be fixed.

For example, months back Brad asked me if I’d be a peer reviewer on a paper about PCOS, elevated testosterone and training.  Keep this factoid in mind if Brad attempts to dismiss my experience or knowledge in the field since he contacted me specifically to be involved in this.  Given my work on The Women’s Book, I was a logical choice.  And in this case, I saw a problem with the paper (having to do with the levels of testosterone in the subjects) and reported it to the researchers.  The researchers responded and, since their response (with research provided) was sufficient to allay my doubts, I gave my approval to the paper.  Other reviewers do the same and the paper has to be done or written to the approval of all reviewers to pass peer review and get accepted and published.

But in an ideal world, the reviewers probably shouldn’t know who wrote the paper as it could bias them either positively or negatively.  So consider a researcher who has established themselves as a name, publishing often (and presumably well).  Any work they did might be looked at less critically than someone who was either unknown or who the peer reviewer actively disliked for some reason.

Ideally, the paper’s author should probably be blinded from the peer reviewers, that is they should be anonymous.  There is an immense amount of politics in research and if an author found out that a given individual was especially critical of their paper, they might retaliate in some way.

This apparently does occur from time to time but I do not know how prevalent it is.  Honestly, it should probably be standard operating practice.  A previously well published and meticulous researcher can still produce poor work and their work should be assessed on the work rather than their name.  That requires blinding of the reviewers.  But this is sort of a side point for completeness about the idea that various types of blinding are critical to reduce bias in all manners of ways.  I will only focus on blinding during the study itself going forwards.

Back to Cochrane

And I spent all this time explaining how blinding works and why it is necessary because a lack of proper blinding during a study is listed in the Cochrane guidelines as providing a high risk of bias.  This is a partial image from this link (note, clicking will download a PDF) and shows that a lack of blinding of participants and research personnel gives a high risk of bias according to Cochrane standards.

Blinding Bias Cochrane

Blinding the Subjects in Brad’s Study

Now in a training study, you can’t completely blind the subjects to what is going on.  Clearly everyone knows whether they are doing 1, 3 or 5 sets per exercise or doing continuous or interval exercise.  That’s just a reality that you can’t get around (you can in drug trials by making both pills identical).  At best you might partially blind them.

You might not tell them the purpose of the study (i.e. to measure muscle growth) or you could tell them that you were studying something else.  If they know it’s a study on muscle growth, maybe they change their diet or supplement regimen, for example.  Studies do this fairly often, deceiving the subjects about the purpose of the study to try to eliminate their own expectations or subconscious behavior changes.

Alternately, you might not tell them that there are two other groups doing different volumes within the same study. This could avoid a situation where someone in the 1-set group simply didn’t work as hard since they weren’t in the “real training” group doing 5-sets or something.  Given that Brad’s training studies typically have the workouts overseen by trained individuals, this is unlikely to have occurred.  But maybe they only believe in high-volume training and don’t pay as much attention to their diet.  I am not saying this happened.  I am saying that it CAN happen when you don’t blind a study to the best of your ability.

Mind you, it’s a bigger pain in the ass to do this.  You’d need to meet with each group individually, give them different instructions and train them at different times in the gym which adds another variable.  But it could be done with a little bit of work.  It wasn’t but it could have been.

Blinding the Research Personnel in Brad’s Study

But that doesn’t mean you can’t blind the researchers in this case. And, quite in fact, you should for all of the reasons I listed above.  And this was not done in Brad’s study (a PDF search for the word “blind” or “blinded” turns up zero hits).   It’s possible that the study was blinded and it wasn’t mentioned in the methods but this would be utterly atypical.  It turns out that the study wasn’t blinded and Brad admitted online that it wasn’t.

Basically all of the researchers, including the person doing the Ultrasound were not blinded to which subject was in which group.  They knew that this subject was in the 1 set group and this other subject was in the 5 set group beforehand.  If the Ultrasound tech had a pre-conceived belief about what results they expected (or ahem, wanted), this introduces a high risk of bias according to Cochrane guidelines.  They might measure or interpret the 1-set group a little bit differently than the 5-set group.  When you’re dealing with a subjective method this is a real issue.

Am I saying this caused a bias in the measurements?  Well, not yet.  But a lack of blinding, as shown in what represents the highest tier of research analysis, introduces a high risk of bias.   Simply, blinding a study reduces that risk and is what good scientists do.  Could bias still occur?  Sure, probably. But that’s a non-argument, a hand-wave that could be leveled at any study you didn’t like.  Not blinding a study gives a high risk of bias, blinding it does not.  A good scientist trying to gather good data would blind the study to the greatest degree possible to reduce the risk of bias to the greatest degree possible.

And Brad and the others did not do this leading to a high risk of bias.

Adding to this is the following issue.

Who Did the Ultrasound?

So that nobody can accuse me of misrepresenting anything, here is a screenshot from the pre-publication text regarding the Ultrasound measurement and who it was performed by.

Ultrasound Technician Screen Shot

It states clearly that the lead researcher did the Ultrasound.  And who was the lead researcher?

Brad Schoenfeld.

He and he alone did Ultrasound on every subject before and after the study.

Let me say that again, louder.

Not only was the Ultrasound done by exactly one technician (meaning that results couldn’t be compared between two different people for accuracy) it was done by Brad himself.

Who, due to the unblinded nature of the study, KNEW who was in which training group.  When he did the measurements, he knew when he was measuring someone from the 1-set, 3-set or 5-set group.  Am I saying this biased him?  Well, not yet.  I’m saying that having a single researcher, who is the lead researcher, doing the measurements in an unblinded situation yields a staggeringly high risk of bias.  Especially if that lead researcher has a pre-conceived belief about the relationship of training volume and muscle growth.

All Brad had to do was follow the Cochrane guideline, a guideline that all good studies try to implement, and that would have eliminated even the possibility of bias.  And he didn’t.   You can’t claim he didn’t know, he’s far too experienced a researcher to not know that proper blinding reduces the risk of bias.  So I have to conclude that it was a very deliberate choice.  There’s no other logical reason not to have blinded it at the minimum and had other blinded techs do the measurement at maximum.

But it gets worse.

Responding to Online Criticism

When Brad was asked about this online, why the study was unblinded and he did the measurement himself and didn’t that potentially color the results, he basically said “Oh, it’s fine, you can trust me.”   First off, this confirms that the study wasn’t blinded to begin with or he would have said “Oh, no it was blinded.”  The study wasn’t blinded by his own admission and he knew who was in which group.  By the highest standards of research critique, this introduces a high risk of bias.

And he says that’s fine because you can trust him which, well…really?  Because I’ve looked at that chart from Cochrane a bunch of times and nowhere does it say that a lack of blinding is a high risk of bias UNLESS YOU’RE BRAD SCHOENFELD WHO CAN BE TRUSTED.

You can read it yourself above and tell me if I missed it.

But this raises important questions about science and the scientific method:

Is Brad the only trustworthy researcher in the world?  Why can’t we just trust everybody else?  Why blind studies at all if it’s as simple as trusting the guy doing the research?    Why have guidelines of any sort to begin with?  Why bother with Cochrane?   Researchers never try to push an agenda or publish false data.  You know, like Andrew Wakefield who did that fake vaccine and autism study that is still causing problems now.

Nah, Brad is different.  Brad is above the scientific method.  You can trust him.  He said so.

Brad further tried to argue that you can trust him because he disproves his hypotheses all the time.   I’m not going to bother trying to explain what this means since it’s a non-argument, just another obfuscation from someone who got caught out.

Because it still wouldn’t explain why Brad thinks he is above Cochrane guidelines, literally THE standards for scientific inquiry and study review.  Basically he’s saying that he and he alone doesn’t have to follow scientific standards, that he is above the law that all other good scientists follow.  It’s a shame that all of those other scientists don’t live up to his standards.  Oh wait, I have that backwards.  He doesn’t live up to theirs because they know to blind their studies and he thinks he doesn’t have to.

So am I accusing Brad of bias?  Not yet.  I’m saying that by the standards of scientific inquiry, a lack of blinding raises a high risk of bias.  If someone else had done the Ultrasound unblinded, it would have been bad enough.  Brad doing it himself is worse.  Brad’s attempt to defend not blinding his study by saying you can trust him is just a joke.  But not the ha-ha kind.

But that’s not all and now it gets a little technical.

The Time Point of the Ultrasound Measurement

As I stated above, the post-study Ultrasound (done by Brad who knew which subjects were in which group) was done at 48-72 hours after the last workout.   It’s unclear if this was done for practical or other reasons (i.e. insufficient time in a single day) but it introduces another potential variable to the study.

So why not get two techs and do it all on the same day?  And maybe blind them to who was in which group.  Ideally you get both to measure all subjects and compare the two values and now you have data that is not at a high risk of bias.  Because measuring subjects on two different days after the final workout introduces another set of problems and another variable.  Because now we have to worry about whether or not the time of measurement might be impacting the measurement itself.

As I stated above, the paper chose to wait 48 to 72 hours after the last workout to do Ultrasound in an attempt to ensure that edema and swelling from the workout had dissipated (said edema showing up on the Ultrasound as growth).  In support of this, they cite the following paper (although they got the reference number wrong).

Ogasawara R et al. Time course for arm and chest muscle thickness changes following bench press training. Interv Med Appl Sci. 2012 Dec; 4(4): 217–220.

This is an interesting little paper which had trainees do 3 sets of nothing but bench press 3 times per week for 6 months and measured growth in the pecs and triceps by Ultrasound (this makes me wonder why pecs aren’t measured more often since it can clearly be done and would give more information than doing compound chest work and measuring triceps in isolation and inferring anything about training volumes).

In the discussion from this paper it is stated that:

“Measurements were taken a week prior to the start of training, before the training session on every Monday, and 3 days after the final training session. Pilot data from our laboratory suggest that the acute increase in MTH (~12%) following bench press returns to pre-exercise levels within 24 h and is maintained for up to 48 h after the session. This suggests that the measured MTH is unaffected by the exercise-induced acute inflammatory response although it is acknowledged that is an indirect marker of muscle damage. The test–retest reliability for this method was less than 1% for the biceps and triceps [10]”

So this seems to support the claim that edema is gone at 48 hours but there is a problem.  This study only had 3 sets for bench in the final workout and there is exactly zero indication that the same would hold for higher volumes than this.  I’m not saying they will or they won’t, there’s not much data.   Regardless, the Ogasawara study can’t be cited or used automatically to assume that a higher volume doesn’t generate swelling that lasts longer.  Of some interest in this regard, in his own textbook, Brad states the following:

If you didn’t click the picture, it says “Although swelling is diminished over time with regimented exercise via the repeated bout effect, substantial edema can persist even in well-trained subjects for at least 48 hours post-workout.”  The phrase “at least” means it might last longer but we know for a fact that it’s still present at 48 hours after an intense workout.  IN TRAINED INDIVIDUALS.  But it was certainly present at that time point.  This means that Brad measured (in an unblinded fashion by himself) muscle thickness at a time point that, per his own textbook, edema might still be present.  Said edema skewing Ultrasound values upwards artificially.  Hmm….

Other Edema Data

There is other data of relevance here and it’s interesting that it wasn’t cited.  In a 1994 study, Kazunori and Clarkson subjected beginners to 24 eccentric contractions of the biceps (this is a heavy load to be sure and I’m not saying it’s comparable to relatively trained individuals not doing pure eccentrics) and then measured muscular swelling at 1, 3, 6, 10, 23, 31 and 58 days afterwards.   Perhaps shockingly, they found that muscle swelling was the lowest (equal to or lower than post workout) the day after training and only started to increase again from day 2 through day 5.  I’ve presented their data below.

Nosaka Biceps Circumference

This actually suggests that the BEST time to measure muscle thickness would be the day immediately after the last workout and that waiting longer than that allows inflammation and swelling to increase, continuing to increase until day 5.  Of course this was in beginners and there’s no way to know if it would apply to trained individuals.  Also pure eccentric work is not the same as regular training so this is no more directly comparable to Brad’s study than the paper he cited.

Speaking to this, Ahtiainen et al. exposed trained men to 9 heavy sets of leg training (5X10RM leg press and 4X10RM squats) while measuring swelling and thickness in the Vastus Lateralis at various time points.  They found that swelling had increased at 24 hours and was still clearly present at 48 hours (the swelling caused an increase in apparent muscle thickness of the vastus lateralis of +2 mm at both time points).  That is with only 9 sets of heavy legs, which can be contrasted to Brad’s highest volume group at 15 sets, and it’s clear that edema is still present at the 48 hour mark.

Ahtiainen Swelling Data

Therefore it’s pretty safe to assume that doing Ultrasound at 48 hours was not only LIKELY to have been but absolutely WAS impacted by edema and swelling in the tissue.  Unfortunately, none of the above is dose response so we still don’t know if the amount or time course for edema varies for different volumes.  The bench press study was only 3 sets, the eccentric study might not be applicable and the above study was 9 sets.   Will 12 sets cause more edema?  What about 15?  Is there an accumulation of edema with multiple workouts per week for the same muscle group?  We don’t know.

Now, if the increase in edema is identical irrespective of volume, it’s at least a consistent error and would have applied to all training groups. Mind you, this seems unlikely, that 3 sets and 15 sets for quads would generate identical levels of swelling.  But we just don’t know.

Regardless, 48-72 hours is the wrong time to try to measure real-world changes in muscle thickness (and Brad’s own textbook supports that). Edema is known to be present and you can’t claim that you waited 48-72 hours to let edema dissipate when it is clearly still present and cite a study that isn’t really relevant due to the low volume of training used.  And ignore a different study in your own textbook that says that edema is still present at 48 hours.

Ultimately, even with the limited data there is absolutely no evidence in the literature to conclude that swelling would be absent or insignificant at 48 h for all but the lowest volume of training.  Measuring any group with Ultrasound at this time point will be impacted although we still don’t know if volume has any effect on how much swelling is present.

There is another point worth noting, which is that in the Ahtiainen et al. study the magnitude of change in muscle thickness due to swelling is similar to the overall magnitude of change seen in Brad’s study to begin with.   Given that edema was likely present, it’s impossible to conclude that there was actual muscle growth in any of Brad’s groups.   Or that it was different in any case.  Measuring at a time when edema is clearly still present makes it impossible to know for sure.  But it makes the data questionable.

But you say, some people were measured at 72 hours.   Well as the Kazunori and Clarkson study shows, swelling actually keeps going up from 48 to 72 hours, at least in beginners (sadly the second study didn’t measure past 48 hours which would have been fantastic data to have).   So in either case, irrespective of the groups involved, doing Ultrasound at these time points makes the data potentially very unreliable.  All groups were measured when edema was still present and it’s possible that it was still increasing at 72 hours.  It’s difficult to understand why this would have been done to begin with given the (admittedly) limited literature on the topic.

I’ll reiterate that it’s completely unclear how volume impacts on this.  We certainly can’t assume that the amount or time course of the swelling is the same for higher and lower volumes of training.  I’m not saying that it is or it isn’t.  I’m saying that WE DON’T KNOW.  We certainly can’t assume that a study using a low volume of training necessarily applies to much higher volumes.  In Brad’s study the 1-set group did 2 total pushing sets, the 5-set group 10 sets. Think about the difference in pump you get from different volumes.   It’s no stretch to think that inflammation and swelling is still present.  Again, I’m not saying it is or it isn’t.  We simply don’t know. It’s no more accurate for me to say that it was than it would be for Brad to say it wasn’t.  We don’t know.  But until we know, any measurements taken at these time points are suspect.

The Haun et al. Study

I’d note in this vein a recent study by Haun et al. that came out about 3 days before Brad’s paper showing that, above about 20 sets per week, there was a huge increase in extracellular water compared to the lower volumes.  This threw off the supposed LBM gains.  But it does suggest that, above a certain volume, the body starts to retain water in tissues and this doesn’t occur at lower volumes.   I don’t know if that would impact the Ultrasound measurements or not (hi Andrew Vigotsky) but it does raise another consideration here.

Only the highest volume of upper body work crossed the 20 set threshold in Brad’s study although both the moderate and high volumes for lower body crossed it.  Without measuring ECW or correcting for it, there’s another potential problem with comparing set volumes in this fashion as this might have thrown off the values further.

If the higher volume group got more water retention/swelling than the lower volume group, that colors the supposedly greater increase in muscle size.  Again, I’m NOT saying this happened.  I am saying that we need to find out how much of the supposed changes are due to factors other than changes in muscle thickness before any more research is done or any more conclusions are drawn from measurements taken at a time point where edema is a known issue.

Do a Pilot Study

Honestly that is data that should be gathered before another study using Ultrasound is done.    Find out exactly when it’s safe to re-measure muscle thickness after the last workout to get an accurate reading and do it for different volumes of training.  Test a bunch of people on 1, 3, 5 and 10 sets per muscle and measure them every day for a week (or until the edema dissipates) to determine the time course of the swelling and when it is gone.  Because if it’s longer than 48 hours a lot of studies have a lot of potential problems.

This is what pilot studies are for, to examine issues of methodology for a research project BEFORE you do that project to make sure what you’re doing is valid.  Citing potentially inapplicable studies is not the sign of a detail-oriented researcher.  Doing measurements at a time where it’s clear that there IS still swelling is methodologically unsound.  Doing it at two different time points, when it’s totally unclear how much swelling does or does not change, is worse as it introduces another variable into the overall equation. As above, it might have been a purely practical issue.  But it is an issue.

But there is yet another issue.  The paper gives exactly no indication of who got measured at what time point.  Ideally this would have been randomized (as the subjects initially were to each group) but, if it was, this was not stated in the paper.  Given that the randomization of the subjects was mentioned explicitly, I have to infer that randomization was not done for the post-study Ultrasound.  I may be wrong about this and someone can correct me if I am.  This is simply my inference based on how the paper was written.

Why is this important?  Well, what if edema goes down at 72 hours, especially for lower volumes.  But doesn’t at 48 hours.  Or that at 48 hours the highest volume is still swollen but the lower volume isn’t.  That alone causes a problem because who got measured when changes the reliability of the results.

But to this we add that Brad knew who was in which group since it was unblinded.  What if, by accident or design, the higher set groups got measured at the 48 hour mark (when edema is KNOWN to still persist) and the lower volume groups got measured at 72 (when it might or might not for lower volumes)?  Then you get a systematic error in favor of who got measured at 48 hours and they will get larger apparent growth due to the edema.
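The arithmetic behind that systematic error is simple enough to sketch in a few lines of Python.  The numbers are hypothetical and chosen only to show the mechanism: both groups are given identical real growth, and the only thing that differs is how much swelling is left when the Ultrasound is done.

```python
def apparent_change_mm(true_growth_mm, residual_edema_mm):
    """What the Ultrasound would show: real growth plus whatever swelling is left."""
    return true_growth_mm + residual_edema_mm


TRUE_GROWTH_MM = 1.0            # assumed identical in both groups
EDEMA_AT_48H_HIGH_VOLUME = 1.5  # hypothetical residual swelling, mm
EDEMA_AT_72H_LOW_VOLUME = 0.0   # hypothetical: swelling already gone

high_volume = apparent_change_mm(TRUE_GROWTH_MM, EDEMA_AT_48H_HIGH_VOLUME)
low_volume = apparent_change_mm(TRUE_GROWTH_MM, EDEMA_AT_72H_LOW_VOLUME)

# Any difference here comes from measurement timing, not from training.
spurious_difference_mm = high_volume - low_volume

print(f"High volume, measured at 48 h: {high_volume} mm")
print(f"Low volume, measured at 72 h:  {low_volume} mm")
print(f"Apparent advantage with zero real difference: {spurious_difference_mm} mm")
```

Under these assumed values the high volume group appears to grow more even though, by construction, it didn’t.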

I’m not saying this happened.  I am (repeatedly) saying that when you don’t blind a study and don’t randomize who gets measured when, the risk of bias is enormous.   And saying “You can trust me” doesn’t make that bias go away, it makes you question the person who said it and why they think they are above current scientific gold standards.

You blind for a reason.  You randomize for a reason.  You don’t measure at two different time points when you don’t know what edema is like.  Good scientists pay attention to these factors.

An Easily Solvable Problem that Wasn’t

The entire issue could have been dealt with easily.  First, get two Ultrasound techs, neither of whom is Brad, and blind them to who is in which group.  If possible, have them measure everybody on the same day, eliminating the variable of 48 vs. 72 hours.  Ideally have them both measure all subjects to check inter-tester reliability.

If that’s not possible, at least randomize the subjects to the 48 or 72 hour time point so that there is a spread of who gets measured when.  At the very least, if Brad is doing one set of measurements, get a second tech to do a repeat measurement to test for reliability between the two.   Blind both and randomize the subjects. This provides more validity when you’re using a subjective method.  Seriously, this is research methods 101 and anyone trying to design a valid study knows to do this stuff.
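Randomizing who gets measured when is not hard.  Here is a minimal sketch in Python, assuming made-up subject IDs and group labels; the function name and the seed are also just for illustration.  It shuffles within each training group so that every group ends up split as evenly as possible between the 48 and 72 hour time points.

```python
import random


def assign_time_points(subjects_by_group, seed=0):
    """Randomly split each training group between the 48 h and 72 h slots.

    Shuffling within each group keeps the time points balanced across
    volumes, so no single group is measured mostly at one time point.
    With an odd group size the extra subject goes to the 72 h slot.
    """
    rng = random.Random(seed)  # fixed seed so the schedule can be reproduced
    schedule = {}
    for group, subjects in subjects_by_group.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for position, subject in enumerate(shuffled):
            schedule[subject] = 48 if position < half else 72
    return schedule


# Hypothetical subject IDs; group labels are placeholders.
groups = {
    "low": ["L1", "L2", "L3", "L4"],
    "mid": ["M1", "M2", "M3", "M4"],
    "high": ["H1", "H2", "H3", "H4"],
}

schedule = assign_time_points(groups, seed=42)
for subject, hours in sorted(schedule.items()):
    print(f"{subject}: measure at {hours} h")
```

The schedule would then be handed to a tech who doesn’t know what the group labels mean, which takes care of the blinding at the same time.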

But having the lead researcher, who is not blinded to who was in which group, measure everybody at two different time points introduces far too much bias.  Again, I’m not saying it was there or deliberate or conscious.  I am saying that Cochrane standards EXIST to help avoid bias.  A simple blinding of the study would fix it.  Without blinding, the risk of bias is high.  Add in all of the other variables and I’d say it’s almost guaranteed.

Oh yeah, when I emailed Brad about this he got really defensive telling me he was a “seeker of the truth”.  Methinks thou doth protest too much, sir.  If you want to seek scientific truth, follow the established scientific guidelines.  And when questioned on why you didn’t, don’t guru out of it and claim you’re above the law.

Moving on.

Are Brad and James Biased?

Before pulling the trigger and presenting the smoking gun on all of this, let me address something I’ve been hinting at above when I keep asking if I am accusing Brad (or anyone else involved in this) of bias.  I kept saying no, not yet and well, it’s time for me to do so directly.

In 2017, Brad wrote the following paper (actually a letter to the editor I believe) with Ogborn and, hey, what do you know about that, James Krieger is a co-author, titled:

Schoenfeld BJ, Ogborn D, Krieger JW. The dose-response relationship between resistance training volume and muscle hypertrophy: are there really still any doubts? J Sports Sci. 2017;35(20):1985-1987.

The title seems pretty self explanatory: ARE THERE REALLY STILL ANY DOUBTS?  Basically, Brad and James (and I don’t know the third guy) are already convinced that there is a relationship between volume and hypertrophy.  Admittedly this was based on an earlier review but this is their pre-existing belief system: more volume means more hypertrophy.

This spells potential bias.  You have research from a group that already believes that volume is king for hypertrophy who would appear to have set up a study, unblinded, unrandomized, using a subjective method at a time point where edema affects the results and who interpreted weak statistics as a strong conclusion.  The fact that all involved continue to deflect and obfuscate and ignore the above points would seem to make it that much more clear.  They have an agenda to push, designed a study to support it and got caught out on it.  Brad got caught out in a methodological disaster of a study and a bunch of criticisms he couldn’t address.  James tried to come to his defense but could only harp on about statistics that don’t really matter in the big scheme.  His statistics are fine, it’s the interpretations, the methodology, the conclusions that are the problem.  As I said above, good statistics on bad data or with bad conclusions are still bad.

And even if you don’t accept any of the above criticism, don’t give a damn about the blinding or anything else, there is still one last point to make.  The one that nobody can defend or argue against on any level.

The Ostrowski Data

And finally is the smoking gun, honestly the only part of this that really matters.  And the one that only James has had the balls to even attempt to address, even though he did it appallingly poorly.  Brad ignored it, others deflected it, one person I trusted blindsided me completely with something I wrote 9 freaking years ago on a blog post (said post has since been corrected and it would have been lovely for him to have let me know about this sometime in the past half decade instead of saving it as ammunition for when he needed it against me).  I’m choosing not to name him for my own reasons but he’s just as culpable in all of this mess since he’s doing the same type of defenses of this as the others.  He’s done other things, trying to draw distinctions between this paper and another based on training status but the subjects in this study were NOT well trained.  So it’s just another pathetic deflection.

Nobody involved has yet addressed the issue directly.  And even with me having written this very thing a bunch of times, they still won’t.

That’s because they can’t.  You’re about to see why.

In scientific studies, in the discussion, it is standard to examine other studies looking at the same topic whether they agree with you or not (in the introduction you usually give a precis of previous work).  You might note that in my books I will often mention when studies are contradictory as it’s important to present both sides of a given topic.  This is called being intellectually honest.  Doing this, looking at the body of literature on a topic allows good scientific models to be built.  So you look at previous studies and address them.

Did your study’s results agree with the broader body of literature or did it disagree?  Or rather, did a given study find results similar or in contrast to yours.    If the latter, it is incumbent upon you to address why the results might differ.  Often it’s methodological.  A training study on beginners and one on more advanced trainees should get different results based on training age.  Training to failure, frequency of training, total volume, a lot of factors can impact on why two training studies get different results.

But if one study on beginners gets results so far out of the realm of the dozens of other papers, odds are that that single paper has an issue (consider that if a paper came out tomorrow saying that gravity didn’t work, it would go against hundreds of years of data and is likely incorrect).  Now maybe your contrary paper is correct and represents a novel finding which should be incorporated into or improve the current scientific model.

But you have to address it in any case and try to at least speculate why your results were different from the broader body of literature if that was indeed the case.  If their results were similar, you can basically say that your results were similar which adds to the body of literature concluding that X leads to Y (i.e. dropping something off a building leads to it falling with the acceleration of gravity).

And more importantly, you have to report the data on a study, especially if it runs counter to your results, honestly.   Now in part of the discussion, Brad actually did this for the strength data, examining why his results (no strength gain difference between groups) ran counter to literally every other study and meta-analysis on the topic.   I’m not being hyperbolic here.

The relationship of training volume and strength gains is extremely well established and, except in rank beginners, higher volumes give better strength gains than lower at least up to a point.  Yet Brad’s 1 set group got the same strength gains as a group doing 5 times the volume over 8 weeks.   It stands in complete and utter contrast to the broader body of literature.  And after addressing that his study was different, Brad reached exactly zero conclusion as to why this was the case (this didn’t stop him from crowing to the New York Times about how little training is needed for strength gains.  ONLY 13 minutes!  Great soundbite.  Only true in beginners.).

Personally I think it’s super easy to explain.  Given the known relationship of muscle size and strength, the lack of difference in strength gains was due to there being no meaningful difference in muscle growth.  Occam’s Razor says the simplest explanation is the right one and the simplest explanation is that the higher volume group didn’t gain more muscle or they would have made better strength gains.   Regardless, Brad did honestly examine the difference between his study and literally 60 years of contrary data even if he drew zero useful conclusion beyond “We dunno.”

It’s on the muscle growth data that the real problem shows up.  In the discussion of their paper, Brad examines a previous paper on the dose-response of training and hypertrophy to compare it to their results (recall that Brad et al. claim that the highest volumes were the only ones that generated meaningful growth and did so strongly in their conclusion despite the Bayesian statistics absolutely not supporting that conclusion as they were ‘not worth a mention’).

Ostrowski et al.

The paper in question is by Ostrowski et al., titled The Effect of Weight Training Volume on Hormonal Output and Muscular Size and Function, and was published in the Journal of Strength and Conditioning Research in 1997.  In it, they recruited 35 men but 8 dropped out leaving 27.  This left it a bit statistically underpowered just like Brad’s study.  At least they were both statistically underpowered.  The loss of 8 subjects left 9 in each group so the study was still balanced in that regard.  The men had been lifting from 1-4 years and had an average 1.7 bodyweight squat but only a 1.1 bodyweight bench press.  This makes their squat much higher than in Brad’s study and bench press relatively similar relative to bodyweight and their training experience was essentially identical.  So that’s a wash.

Each did the same weekly workout which had a push, pull, legs and arm day with 3 exercises per muscle.  They tested 1, 2 or 4 sets per exercise so it was structurally similar to Brad’s study although they used a split routine rather than full-body three times weekly.  Assuming that each set that involves a given muscle counts as one set (an issue I’ll address in a later article), the volumes ended up being 7, 14 and 28 sets for triceps (due to both the pushing day and arm day hitting triceps) but only 3, 6 and 12 sets per week for the quadriceps (due to only the one leg day).  Muscle thickness was measured by Ultrasound and it’s interesting that the study explicitly stated the following:

[Image: Ultrasound Reliability]

Basically, since Ultrasound is subjective, they did pilot work to ensure that the tester would be consistent across measurements.  This is the mark of attentive scientists.  Before you use a subjective measurement make sure the people doing it are consistent.  If I’m doing tape measure measurements on someone, I need to show that I’m consistent day-to-day.  Otherwise I can pull the tape tighter one day than another and get results that aren’t real.

So they did the pre- and post-training Ultrasound measurements (and so far as I can tell, the time point at which this was done was not mentioned which is a methodological problem that I must honestly point out) to determine the changes in muscular thickness.

For quads (rectus femoris only) they found this:

                 | 3-sets   | 6-sets   | 12-sets
Quads (pre)      | 930 mm^2 | 940 mm^2 | 860 mm^2
Quads (post)     | 993 mm^2 | 987 mm^2 | 973 mm^2
Change           | 63 mm^2  | 47 mm^2  | 113 mm^2
%age Change      | 6.7%     | 5%       | 13.1%

Now, the researchers state that there was no statistical difference between groups although a trend is certainly seen for the highest volume group gaining more than the other two.  Likely the lack of statistical difference was due to the low number of subjects and the paper being statistically underpowered but gains in both an absolute and percentage sense more or less doubled from 3 to 12 sets.  This isn’t a shock given previous data which shows a pretty good relationship between training volume and growth up to about 10+ sets.  But nor is it particularly meaningful in that 12 sets is still pretty low in the big scheme (since it wasn’t tested we can’t know if more volume would have generated more growth).  We might ask why this study found growth from 12 sets per week while Brad’s needed FORTY-FIVE (again, in similarly trained subjects) with NONE at lower volumes but no matter.  The hand-waving explanation in the discussion was this.
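For anyone who wants to check the quad arithmetic, here is a short Python sketch that recomputes the changes from the pre and post values given above.  The variable and function names are mine; the numbers are the ones in the table.  Note that 63/930 works out to about 6.77%, which rounds to the 6.8% quoted in Brad’s paper rather than the 6.7% I typed.

```python
# Rectus femoris pre/post values (mm^2) from the table above,
# keyed by weekly sets.
quads_mm2 = {3: (930, 993), 6: (940, 987), 12: (860, 973)}


def percent_change(pre, post):
    """Percentage change from the pre-training value."""
    return (post - pre) / pre * 100


quad_growth = {}
for sets, (pre, post) in quads_mm2.items():
    quad_growth[sets] = percent_change(pre, post)
    print(f"{sets:>2} sets/week: +{post - pre} mm^2 ({quad_growth[sets]:.1f}%)")

# How much bigger was the response at 12 sets than at 3 sets?
absolute_ratio = (973 - 860) / (993 - 930)
percent_ratio = quad_growth[12] / quad_growth[3]
print(f"12 vs 3 sets: {absolute_ratio:.2f}x absolute, {percent_ratio:.2f}x by percentage")
```

Both ratios land a bit under 2, which is what I mean by gains more or less doubling from 3 to 12 sets.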

[Image: Ostrowski Quad Discussion]

Basically, our 9 set group was close to their 12 set group in terms of volume but we needed nearly 4 times as much to get growth but we dunno why.  Ok.

But it’s the triceps data that is more interesting.  Here is Ostrowski’s actual data.  If you’re confused about why the numbers are so much lower it’s because the quad data was in millimeters squared and triceps was in millimeters.

                 | 7-sets | 14-sets | 28-sets
Triceps (pre)    | 44 mm  | 43 mm   | 42 mm
Triceps (post)   | 45 mm  | 45 mm   | 44 mm
Change           | 1 mm   | 2 mm    | 2 mm
%age Change      | 2.3%   | 4.7%    | 4.8%

Now, again the researchers concluded that there was no statistically significant difference between groups (probably due to it being underpowered) but there is a trend where you can see that 14 sets got double the growth of 7 sets but 28 sets got no further growth.  Well, not unless you count doubling your total sets for 0.1% of growth to be significant.  Basically it shows a plateau in growth at the middle volume.  14 was better than 7 but 28 was no better than 14.  Mind you we are looking at 1 vs 2 mm here but even by percentage it’s about double.  Got it?  In contrast to the leg data, the triceps data showed a clear plateau in growth at 14 sets/week.  More volume was only better to a point.
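The plateau is easy to see if you work out how much extra growth each added set bought.  Here is a short Python sketch using the triceps values from the table above; the function and variable names are mine and are only there for illustration.

```python
# Triceps pre/post thickness (mm) from the table above, keyed by weekly sets.
triceps_mm = {7: (44, 45), 14: (43, 45), 28: (42, 44)}


def percent_change(pre, post):
    """Percentage change from the pre-training value."""
    return (post - pre) / pre * 100


growth = {sets: percent_change(pre, post) for sets, (pre, post) in triceps_mm.items()}


def gain_per_added_set(low_sets, high_sets):
    """Extra percentage points of growth per extra weekly set."""
    return (growth[high_sets] - growth[low_sets]) / (high_sets - low_sets)


step_7_to_14 = gain_per_added_set(7, 14)
step_14_to_28 = gain_per_added_set(14, 28)
endpoints_only = gain_per_added_set(7, 28)  # what you see if 14 sets is left out

for sets, pct in growth.items():
    print(f"{sets:>2} sets/week: {pct:.1f}%")
print(f"7 -> 14 sets:  {step_7_to_14:.3f} points per added set")
print(f"14 -> 28 sets: {step_14_to_28:.3f} points per added set")
print(f"7 -> 28 sets:  {endpoints_only:.3f} points per added set (endpoints only)")
```

Going from 7 to 14 sets buys a real amount of growth per set, going from 14 to 28 buys next to nothing, and a line drawn through only the 7 and 28 set points averages the two and hides the difference.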

Now, in his textbook, Brad actually reports Ostrowski’s original conclusion: that there is no significant difference in growth between groups.  Here’s the chart from his book (with the author’s name misspelled).

[Image: Schoenfeld Textbook Ostrowski Report]

Somehow, those previous non-significant results became significant now in the discussion of this most-recent paper, which seems awfully convenient.  Brad says they re-analyzed the data statistically but, hmm, why didn’t he do that before?  Just another oddity for a man “seeking the truth” who seems to have changed how a given study data was reported and analyzed but no matter.  Regardless, I won’t disagree that the data shows a trend towards the moderate volume getting better growth for triceps and the lack of a difference is likely a statistical issue due to it being underpowered.  It still doesn’t change the fact that 28 sets was no better than 14 sets.

There was a clear plateau in the response of hypertrophy to training volume.

But that’s just the beginning and now it’s time to pull the trigger on the smoking gun.  The fact that Brad can’t address, James addressed badly and nobody but me seems to have caught (including the peer reviewers for the journal).

Here is a screenshot from the discussion in Brad’s paper.

[Image: Ostrowski Misrepresentation]

I’m going to go in reverse and first look at the quad data.  Brad reports that the lowest volume group got 6.8% increase while the highest volume group got the 13.1% shown above.  It doesn’t really matter in this case but it’s a  bit of an oddity that the middle data point wasn’t mentioned at all.  I mean it doesn’t change the results meaningfully but what, were they trying to save ink by not writing “…the 6 set/week group got a 5% gain and the 12 set group got….”.  It seems an odd omission and is atypical for a study examining previous work.  Reporting all three data points wouldn’t have changed their conclusion so why didn’t they do it?

I suspect it was to set up a pattern of data reporting for the triceps data.

Here Brad states:

Changes in triceps brachii MT in Ostrowski et al. (19) study were 2.2% for the lowest volume condition (7 sets per muscle/week) and 4.7% for the highest volume condition (28 sets per muscle/week). Similarly, our study showed changes in elbow extensor MT of 1.1% versus 5.5% for the lowest (6 sets per muscle/week) versus highest (30 sets per muscle/week) volume conditions, respectively.

Basically he reports the Ostrowski data as agreeing with his where the highest volume group (28 sets/week similar to his 30 set group) got more growth than the lowest volume group (7 sets/week).  Now, in the strictest sense this is true that the 28 set group did outperform the 7 set group.

Except that it’s not actually true because the 14 set group did just as well as the 28 set group (or rather the 28 set group did no better than the 14 set group).  And here is why this matters:

In omitting the middle data point, the conclusion from Ostrowski is reversed.

What Ostrowski found for triceps was a clear plateau in growth at 14 sets.  Doubling that had no meaningful impact unless you count that extra 0.1% as meaningful.  28 sets was as good as 14 which was better than 7.  Yes I am beating a dead horse.  What Brad reported was that 28 sets was better than 7 which happened to match his paper’s conclusion (30 sets better than 6 sets).  If this is still unclear, look at this graphic I made.  The black line is what Ostrowski found, the red what Brad reported by not presenting the data for 14 sets/week.

[Image: Ostrowski Comparison]

See the difference?  Ostrowski found the black line, an increase from 7 to 14 sets and then a plateau at 14 sets.  Brad ignored the middle data point which makes it look like it was simply more is better.  If this is still unclear let me use a simple car analogy:

You come to my car lot and I have three cars that cost $70,000, $140,000 and $280,000.  The first goes 100 MPH, the second goes 200 MPH, the third goes 201 MPH.  You tell me you want the fastest car I have for the best purchase price.  I tell you that, well, this car is $70,000 but only goes 100 MPH but this baby is $280,000 but goes 201 and I don’t tell you about the middle car.  I tell you that the more you spend the faster you go and get the contract out for you to sign.

Is what I said true?  Well technically it is.  The more expensive car goes faster.

Is what I said actually true or HONEST?  No.  Because for half as much money you’d only go 1 MPH slower and there is a plateau in performance at $140,000.  I bet if you found out about the middle car later you’d feel ripped off or lied to.

That’s what Brad did here.  He ignored the middle data point that showed a plateau and which directly contradicted his conclusion.  And by doing so he made a paper that actually contradicted him magically agree with him, all through the highly scientific method of “misrepresenting the data”.

Now, James, god bless him, spent a lot of time trying to defend this in my Facebook group.  He was playing semantic silly buggers, based on how the discussion was written (very carefully) to make it seem like it wasn’t a misrepresentation of the data.  But it is.  Because the way the data was presented reverses what was actually found in the Ostrowski paper.  Nothing changes that fact.  The data was completely misrepresented to change what it actually said.

Brad and the rest took a paper that disagreed with their conclusions and made it agree with them by ignoring data points and misrepresenting the actual data from that study.  This is not science.  And, flatly, if anybody writing a paper Brad disliked had done it, Brad would have anti-gurued them faster than you can drink a protein shake and the letter to the editor would have been written before he had closed the PDF on the paper he didn’t like.  But just as Brad is apparently above Cochrane guidelines, it’s ok for HIM to misrepresent data.

When James couldn’t argue this anymore, he finally admitted that yes, it was a misrepresentation but that it wasn’t deliberate.  Yeah, seems like an uncannily convenient mistake to make, to ignore the data point that changes the conclusion of a paper from disagreeing with you to agreeing with you when your career has supposedly been built on the quest for truth and an attention to accuracy.

James is also wrong that it wasn’t deliberate. And I told him this and of course he told me I didn’t understand statistics again or something equally irrelevant.  Ok, James.

Because, in 2016, Brad and I discussed the Ostrowski paper.  He claimed it said that more volume was better and I looked at it and pointed out that the triceps data didn’t actually support that.  That the 28 set group was not meaningfully superior to the 14 set group unless 0.1% better gains is meaningful for twice the training volume.  That it showed a clear plateau.  And thankfully I use Gmail which saved the EMAIL I SENT HIM at that time which appears below.

[Image: Ostrowski Schoenfeld Email]

Now, I won’t share his private email response but in essence he said he’d look at it and think about it.  Clearly he did and decided to just ignore the middle data point when it was convenient to change that paper’s conclusion to agree with his.

And this means James is wrong.  Brad knows EXACTLY what Ostrowski found.   He knows for a fact that the triceps data shows a plateau because I told him about it 2 years ago.

Brad knew exactly what he was doing here.

That makes it deliberate.  James can hem and haw and deflect and obfuscate all he wants but he can’t address this or he would have.    Or any of my other points.  And statistics doesn’t have anything to do with it.

So let me sum up the issues I have that James, Brad and others have steadfastly failed to address:

  • Ultrasound is subjective. This is inherent to the method but can be addressed by having the tech do two tests on different days to see if they get reproducible results. Or having two techs take repeat measurements for comparison purposes.
  • The study wasn’t blinded, introducing, by Cochrane standards, THE standards of scientific review, a high risk of bias.  No exception is made for Brad Schoenfeld or anybody else.
  • Brad did the Ultrasound himself. Knowing who was in which group because it wasn’t blinded.
  • Brad did the Ultrasound, for at least some subjects at a time point where edema is known to still be present (48 hours).
  • Brad measured some people at 72 hours and lord knows what that changes.  We don’t know who got measured when since it wasn’t (apparently) randomized.
  • Brad and James have published a paper (several in fact) indicating that they believe a priori that more volume equals more growth.  That is their personal bias: more volume means more growth.
  • Most importantly and almost making the rest irrelevant:

Brad, or whoever was involved in writing the discussion, misrepresented data from a previous paper in such a way that the conclusions of that paper go from CONTRADICTING Brad’s results to SUPPORTING them.  If it was deliberate, and I think it was, that makes it a bald-faced lie.  Brad just happened to run into the one person who had already discussed Ostrowski with him and happened to mis-reference it in the prepublication paper which made me look more closely and see what he did.  It’s a shame I wasn’t on the peer review for this mess.

If it wasn’t deliberate, then it represents an indefensible oversight for someone claiming to be a meticulously detail driven scientist “seeking the truth”.  I mean, it’s not as if he didn’t re-read Ostrowski, re-grind the data and know that the triceps data didn’t support his paper’s conclusion.  He even said that he re-analysed the statistics and decided that they were statistically significant.  He knew that the moderate volume triceps group got the same results as the higher volume group.   He knew that the paper contradicted him. And his only recourse was to leave out the middle data point to make the conclusions seem opposite to what they were.

Even if you want to hand-wave #1-6 away as methodological nitpicking (note: the same type Brad himself engages in in papers he doesn’t like), #7 can’t be addressed or defended.  Period.  Scientists don’t get to misrepresent data when it suits them.  They don’t get to ignore data when it disagrees with them.

Yet here we are and Brad did just that very thing.  And, James Krieger, your statistics don’t make a damn bit of difference in that context.  Not blinding the study and misrepresenting data to change a paper’s conclusion have NOTHING to do with your statistics or pretty pictures or any other deflections or obfuscations you can come up with.


Simply, this paper was a methodological disaster.  Unblinded with the lead researcher (with a pre-conceived belief) doing the Ultrasound in a non-randomized fashion at a time point where edema is known to occur with a method that is impacted by it.  And with a discussion where a previous paper that contradicted his results was deliberately (in my opinion) misrepresented to reverse its conclusion to agree with them.

And which, in response to numerous people (not only myself) bringing up these issues repeatedly, the individuals involved have ignored them.  Or when they have addressed them they have done so with nothing of substance.  Rather they have addressed it with the worst of tactics.

With deflection.  With obfuscation.  With constant irrelevancies.  With pure guru speak.

Well, James, Brad….now is your chance to HONESTLY address my (and others’) criticisms if you can.

Ball’s in your court, James.

Addendum Saturday September 22nd, 2018: Since publishing the above piece, Brad Schoenfeld and Eric Helms (the unnamed individual above who is now being named) have both BLOCKED me on Facebook.   This is the behavior of people like Gary Taubes, Dr. Fung, Tim Noakes and other gurus who will not accept criticism of their work.  Draw your own conclusions.  Go check it out, ask them about this paper and the problems with it and I bet they block you too.

Addendum Saturday September 22nd, 2018: While Brad and Eric have both blocked me, James Krieger has attempted to defend and deflect with the following:

“Thus, be wary when someone places too much emphasis on the results of a single study, or tends to draw conclusions with high levels of certainty based on limited data. Each study is a very small piece of a larger puzzle, a puzzle of which you may not have an idea of what it looks like until you’ve gathered enough pieces to do so. And even when you do start to get enough pieces, you still only have an idea of what the picture might finally look like. Your conclusion remains tentative. And in science, you almost never have all the pieces of the puzzle. You make an educated guess as to what the puzzle ”

That’s actually pretty funny since this is EXACTLY what their group is doing.  Drawing a strong conclusion (which I quoted above) from a single paper based on shoddy methodology where data was LIED about in the discussion.  But no matter, in a week or two I will be examining all of the studies on this topic, about 7 of them at last count.  And we’ll see what the overall body of literature says about the topic.  And unlike Schoenfeld et al., I won’t be misrepresenting any data to do it.

Addendum Saturday September 22nd, 2018: Lucas Tafur has done his own in-depth analysis of the Schoenfeld et al. study.  He makes many points similar to mine, along with several others.  You can read it here.

Addendum Monday September 24th, 2018: Brian Bucher has written an extensive analysis of the statistics and the interpretation of them which can be found on Reddit.  Short version: none of the three statistical methods used in the paper support the claimed conclusions.  NONE OF THE THREE.

In a similar vein, James Krieger has written yet another semi-deflecting piece defending the paper, briefly examining Lucas Tafur’s piece but completely ignoring Brian’s analysis.  He makes many amusing arguments, accusing me of not having enough experience to understand how research is done (note: Layne said “You don’t even science” as well) but this is typical guru speak.  Somehow I know that studies should be blinded and randomized, something Brad Schoenfeld does not.  So maybe I know a bit more about proper study methodology than he thinks.

He also blathers about the cost and funding involved.  I never said science was easy and I know it’s expensive.  Yet Brad has published FORTY SEVEN papers this year (that’s 4 per month, most researchers do maybe 1 a year).  Funding is clearly not an issue and perhaps Brad should do one GOOD study per year instead of putting his name on 4 per month.

He also babbles something about whether or not Ostrowski was blinded or why I didn’t mention it above.  This is a pure deflection.  Essentially he’s arguing that since other studies might be methodologically unsound, it’s ok for theirs to be.  This is like arguing in court that “Yes, this man may have murdered someone.  But how do we know YOU haven’t murdered someone.” to deflect attention from the issue at hand.  The methodology of Ostrowski is not in question here, the methodology (or lack thereof) of Brad’s paper is.  Regardless, it’s irrelevant.

Whether or not the Ostrowski study was blinded doesn’t matter because I’m not the one holding it up as providing evidence.  If James is saying it should be dismissed for not being blinded, then Brad can’t use it in the discussion to support his conclusions.  Either the data is valid or it’s not.  In using it in his discussion, Brad is clearly taking the results as valid.  James can’t have it both ways.  And that is still just a deflection from the fact that, whether the Ostrowski data is good or not, BRAD LIED ABOUT WHAT IT SAID to change its conclusions from contradicting to agreeing with him.  Of course, James has still failed to address that so far as I can tell.  But neither has anybody else addressed it.  Apparently Brad Schoenfeld, unlike any researcher before or after, gets to flat out lie about data in a discussion and nobody even blinks.

James also argues that the studies on edema timing aren’t relevant since it was a new stimulus to the trainees.  Well the study Brad cited in his discussion was in beginners too.  And the second study I cited above was a long-term training study so the argument is wrong anyhow.  James can’t allow a study of untrained individuals as support for their claim and then say a novel stimulus (i.e. untrained subjects) doesn’t apply.  He can’t have it both ways, now can he?  And yet this is exactly what he’s trying to do.

But James, like Eric and Brad, has gone full guru mode at this point.  He has no other recourse.  They are backed into a corner without a leg to stand on and just have to keep arguing in circles.  Let me be clear, I give James points for at least trying to address *some* of the points that were brought up.  Neither Brad nor Eric Helms had the guts to do that.  Brad ignored every question and Eric defended him with pathetic deflections.

At least James tried.  He just didn’t do a very good job of it.

Oh yeah, James Krieger has now blocked me on FB as well, right after publishing his article.  But it’s always easier to win an argument when the person you’re arguing with or attacking can’t argue back isn’t it?   I’d note that I left all three of them in my FB group to give them the opportunity to address criticisms and all three voluntarily left.  I did not and would not have blocked or booted them so that I could win by default.  They punked out by choice.

So add James Krieger to the guru group of Brad Schoenfeld and Eric Helms.  Their actions are no different than endless others before them: Tim Noakes, Dr. Fung, Gary Taubes, Layne Norton.  Individuals who just block and ignore criticism rather than address it.  If that doesn’t tell you everything you need to know about the conclusions of this study, no amount of in-depth analysis will help.

Science is predicated on discourse and discussion of data.  No real scientist avoids discussion or uses guru tactics like Brad and Eric have.  Those actions are the mark of the guru and nothing more.

Protein Amount and Post Workout Protein Synthesis – Research Review

MacNaughton et al. The response of muscle protein synthesis following whole-body resistance exercise is greater following 40 g than 20 g of ingested whey protein.  Physiol Rep, 4 (15), 2016, e12893

The currently accepted amount of protein required to achieve maximal stimulation of myofibrillar protein synthesis (MPS) following resistance exercise is 20–25 g. However, the influence of lean body mass (LBM) on the response of MPS to protein ingestion is unclear. Our aim was to assess the influence of LBM, both total and the amount activated during exercise, on the maximal response of MPS to ingestion of 20 or 40 g of whey protein following a bout of whole-body resistance exercise. Resistance-trained males were assigned to a group with lower LBM (≤65 kg; LLBM n = 15) or higher LBM (≥70 kg; HLBM n = 15) and participated in two trials in random order. MPS was measured with the infusion of 13C6-phenylalanine tracer and collection of muscle biopsies following ingestion of either 20 or 40 g protein during recovery from a single bout of whole-body resistance exercise. A similar response of MPS during exercise recovery was observed between LBM groups following protein ingestion (20 g – LLBM: 0.048 ± 0.018%·h−1; HLBM: 0.051 ± 0.014%·h−1; 40 g – LLBM: 0.059 ± 0.021%·h−1; HLBM: 0.059 ± 0.012%·h−1). Overall (groups combined), MPS was stimulated to a greater extent following ingestion of 40 g (0.059 ± 0.020%·h−1) compared with 20 g (0.049 ± 0.020%·h−1; P = 0.005) of protein. Our data indicate that ingestion of 40 g whey protein following whole-body resistance exercise stimulates a greater MPS response than 20 g in young resistance-trained men. However, with the current doses, the total amount of LBM does not seem to influence the response.


So first let me thank Brad Schoenfeld for making me aware of this paper.  It’s quite timely given that I’m currently mired (yes, mired) in the around workout nutrition chapter of the woman’s book.  Now, in recent years, the whole post-workout nutrition thing (or more generally around workout or peri-workout nutrition) has become a little bit more confusing than it was originally.


A Comparison of Strength and Muscle Mass Increases During Resistance Training in Young Women

Chilibeck PD et al. A comparison of strength and muscle mass increases during resistance training in young women. Eur J Appl Physiol Occup Physiol. 1998;77(1-2):170-5.

Strength gains with resistance training are due to muscle hypertrophy and nervous system adaptations. The contribution of either factor may be related to the complexity of the exercise task used during training. The purpose of this investigation was to measure the degree to which muscle hypertrophy contributes to gains in strength during exercises of varying complexity. Nineteen young women resistance trained twice a week for 20 weeks, performing exercises designed to provide whole-body training. The lean mass of the trunk, legs and arms was measured by dual energy x-ray absorptiometry and compared to strength gains (measured as the 1-repetition maximum) in bench press, leg press and arm curl exercises, pre-, mid- (10 weeks) and post-training. No changes were found in a control group of ten women. For the exercise group, increases in bench press, leg press and arm curl strength were significant from pre- to mid-, and from mid- to post-training (P < 0.05). In contrast, increases in the lean mass of the body segments used in these exercises followed a different pattern. Increases in the lean mass of the arms were significant from pre- to mid-training, while increases in the lean mass of the trunk and legs were delayed and significant from mid- to post-training only (P < 0.05). It is concluded that a more prolonged neural adaptation related to the more complex bench and leg press movements may have delayed hypertrophy in the trunk and legs. With the simpler arm curl exercise, early gains in strength were accompanied by muscle hypertrophy and, presumably, a faster neural adaptation.


I haven’t done a research review in a fairly long time since I think I found it more useful to write articles and just link out.  Two weeks ago when I was babbling about neural adaptations to training, I mentioned a paper suggesting that more complex movements might cause slower increases in muscle growth.