
Zeus: Analyzing safety of smart contracts Kalra et al., NDSS’18

I’m sure many readers of The Morning Paper are also relatively experienced programmers. So how does this challenge sound? I want you to write a program that has to run in a concurrent environment under Byzantine circumstances where any adversary can invoke your program with any arguments of their choosing. The environment in which your program executes (and hence any direct or indirect environmental dependencies) is also under adversary control. If you make a single exploitable mistake or oversight in the implementation, or even in the logical design of the program, then either you personally or perhaps the users of your program could lose a substantial amount of money. Where your program will run, there is no legal recourse if things go wrong. Oh, and once you release the first version of your program, you can never change it. It has to be right the first time.

I don’t think there are many experienced programmers that would fancy taking on this challenge. But call it ‘writing a smart contract’ and programmers are lining up around the block to have a go! Most of them, it seems, get it wrong.

Zeus is a framework for verifying the correctness and fairness of smart contracts:

We have built a prototype of Zeus for Ethereum and Fabric blockchain platforms, and evaluated it with over 22.4K smart contracts. Our evaluation indicates that about 94.6% of contracts (containing cryptocurrency worth more than $0.5B) are vulnerable…. (however, we do not investigate the practical exploitability of these bugs).

What could possibly go wrong?

We’ve studied some of the issues involved in writing smart contracts before. The authors of Zeus also provide a short summary of some of the ways that smart contracts can be incorrect or unfair. Implementation challenges leading to incorrect behaviour include:

  • Reentrancy: multiple parallel external invocations are possible using the call family of constructs. If global state is not correctly managed, a contract can be vulnerable to reentrancy attacks. The related cross-function race condition can occur when two different functions operate on the same global state. (A toy re-enactment of the reentrancy pattern follows this list.)

  • Unchecked sends: upon a send call, a computation-heavy fallback function at the receiving contract can exhaust the available gas and cause the invoking send to fail. If this is not correctly handled it can lead to loss of Ether.

  • Failed sends: best practices suggest executing a throw upon a failed send in order to revert the transaction. But this practice also has its risks: the paper shows an extract in which the throw itself causes loss of money to the DAO.

  • Integer over- or underflow: there are over 20 different scenarios that require careful handling of integer operations to avoid overflow or underflow. (A toy illustration of unsigned wrap-around also follows this list.)

  • Transaction state dependence: contract writers can utilise transaction state variables such as tx.origin for managing control flow within a contract. A crafty attacker can manipulate tx.origin (for example, using social engineering or phishing techniques) to their advantage.
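To make the reentrancy pattern concrete, here is a toy Python re-enactment (my own sketch, not an example from the paper): the ‘contract’ pays out before updating its state, so a malicious callback can re-enter withdraw and drain more than the recorded balance. The standard fix is to update state before making any external call.

    # Toy sketch of a reentrancy bug (not from the paper). The external call
    # happens *before* the state update, so the callee can re-enter withdraw().
    class Vault:
        def __init__(self):
            self.balances = {"attacker": 100}
            self.paid_out = 0

        def withdraw(self, who, on_send):
            amount = self.balances[who]
            if amount > 0:
                self.paid_out += amount   # pay out first...
                on_send(self, who)        # ...external call (can re-enter!)
                self.balances[who] = 0    # ...state update arrives too late

    depth = [0]
    def malicious_fallback(vault, who):
        if depth[0] < 3:                  # re-enter a few times, then stop
            depth[0] += 1
            vault.withdraw(who, malicious_fallback)

    v = Vault()
    v.withdraw("attacker", malicious_fallback)
    print(v.paid_out)                     # 400 paid out against a balance of 100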
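Similarly, unsigned wrap-around is easy to mimic in Python by masking to 256 bits (again my own sketch, not one of the paper’s 20+ scenarios):

    # Toy sketch of uint256 underflow: Solidity arithmetic is modulo 2**256,
    # so a subtraction that "goes negative" silently wraps to a huge value.
    MASK = 2**256 - 1

    def sub_uint256(a, b):
        return (a - b) & MASK

    balance = sub_uint256(10, 20)
    print(balance)   # 2**256 - 10: an enormous balance instead of an error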

Contracts may also end up being unfair due to logical design errors:

  • Absence of required logic: for example, not guarding a call to selfdestruct with a check that only the owner of a contract is allowed to kill it.
  • Incorrect logic: “there are many syntactically legal ways to achieve semantically unfair behaviour.” For example, the HackersGold contract had a bug where the transferFrom function used =+ in place of += and thus failed to increment the balance transferred to the recipient. (15 unique contracts in the data set had copies of the function with the same bug, and held over $35,000 worth of Ether between them; a minimal illustration of the operator slip follows this list.)
  • Logically correct but unfair designs. For example, an auction house contract that does not declare whether it is ‘with reserve,’ meaning that sellers can also bid or withdraw an item before it is sold. Unsuspecting bidders (with no expertise in examining the source code) could lose money due to artificially increased bids or forfeit their participation fee. (My personal view: when the code is the contract, and there is no other recourse, you’d really better be able to read the code if you want to participate). “This contract… indicates the subtleties involved in multi-party interactions, where fairness is subjective.”
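The =+ slip translates directly into Python, where x =+ y likewise parses as x = (+y). A minimal sketch (mine, not the HackersGold source):

    # Minimal sketch of the =+ vs += slip: `=+` is assignment of unary plus,
    # so the previous balance is overwritten rather than incremented.
    balances = {"recipient": 100}

    balances["recipient"] += 50   # intended: 100 -> 150
    balances["recipient"] =+ 50   # bug: parses as = (+50), so the total is now 50
    print(balances["recipient"])  # 50 -- the prior balance was silently lost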

Finally, miners also have some control over the environment in which contracts execute. Smart contract designers need to be aware of:

  • Block state dependencies: many block state variables are determined from the block header, and thus vulnerable to tampering from the block miner.
  • Transaction order dependencies: miners can reorder transactions and hence potentially influence their outcome.

A high-level description of Zeus

Of Zeus itself, we get only a relatively high-level description.

…Zeus takes as input a smart contract and a policy (written in a specification language) against which the smart contract must be verified. It performs static analysis atop the smart contract code and inserts policy predicates as assert statements at correct program points. Zeus then leverages its source code translator to faithfully convert the smart contract embedded with policy assertions to LLVM bitcode. Finally, Zeus invokes its verifier to determine assertion violations, which are indicative of policy violations.

Smart contracts are modelled using an abstract language, whose grammar is given in the paper.

Programs are sequences of contract declarations, and each contract is viewed as a sequence of one or more method definitions in addition to the declaration and initialisation of persistent storage private to the contract. Contracts are identified by an Id, and the invocation of a publicly visible method on a contract is viewed as a transaction. The paper gives the execution semantics succinctly in a table (not reproduced here).


A source code translator translates Solidity code into the abstract contract language.

The policy language enables a user to specify fairness rules for a contract. These rules are specified as XACML-style five-tuples with Subject, Object, Operation, Condition, and Result.

Our abstract language includes assertions for defining state reachability properties on the smart contract. Zeus leverages the policy tuple to extract: (a) predicate (i.e., Condition) to be asserted, and (b) the correct control location for inserting the assert statements in the program source.
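As a rough illustration of the idea (a Python toy of my own, not Zeus’s actual machinery or syntax), consider the selfdestruct policy used later in the evaluation: the Condition predicate becomes an assert placed just before the state-changing operation, and the verifier then checks whether any execution can violate it.

    # Toy sketch of policy-as-assertion (not Zeus's implementation). The policy
    # "only the owner may kill the contract" becomes an assert inserted at the
    # control location just before the selfdestruct-like state change.
    class ToyContract:
        def __init__(self, owner):
            self.owner = owner
            self.alive = True

        def kill(self, msg_sender):
            # Inserted policy predicate (the Condition from the five-tuple):
            assert msg_sender == self.owner, "policy violation: non-owner kill"
            self.alive = False   # stands in for selfdestruct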

Given the policy-enhanced program, Zeus translates it into LLVM bitcode. The paper walks through a simple end-to-end example showing the various components.

Finally, Zeus feeds the LLVM bitcode representation to an existing verification engine (Seahorn) that leverages constrained Horn clauses (CHCs) to quickly ascertain the safety of the smart contract. Zeus is not tied to Seahorn though: in theory it could also be used with other verifiers that operate on LLVM bitcode, such as SMACK or DIVINE.

Evaluating Solidity-based contracts

We periodically scraped Etherscan, Etherchain, and Ethercamp explorers over a period of three months and obtained source code for 22,493 contracts at distinct addresses. We discounted 45 contracts that had an assembly block in their source, and obtained 1524 unique contracts (as per their sha256 checksum). In the remainder of this section, we present results only for these unique contracts, unless otherwise specified.

The collected contracts divide up across several categories (see the paper for the breakdown).

Zeus’s checks for all of the correctness and miner’s-influence issues are run across all of these contracts. Results are manually validated to determine the sets of false positives and false negatives. Zeus is also compared against the results from Oyente (‘Making smart contracts smarter’).

Here’s what Zeus finds:


  • 21,281 out of 22,493 contracts (94.6%), containing more than $0.5 billion worth of Ether, are vulnerable to one or more bugs. Across the unique contracts, 1194 out of 1524 were found to be vulnerable to one or more bugs.
  • There were zero false negatives across all seven of the bug classes.
  • Verification is fast: only 44 out of 1524 contracts (2.89%) timed out in at least one bug class, with the timeout threshold set at one minute.

To evaluate fairness the authors had to write some policies. As an example, for the CrowdFundDao the authors implemented policies that (a) blacklisted developers cannot participate in the scheme, and (b) an investment must be more than a threshold limit. Zeus is able to determine that neither of these checks is encoded in the contract. As a general-purpose policy, the team wrote a rule that selfdestruct should only be callable by the contract owner. 284 of the 1524 contracts included a selfdestruct, with 5.6% of them violating the policy.


Mosaic: Processing a trillion-edge graph on a single machine Maass et al., EuroSys’17

Unless your graph is bigger than Facebook’s, you can process it on a single machine.

With the inception of the internet, large-scale graphs comprising web graphs or social networks have become common. For example, Facebook recently reported their largest social graph comprises 1.4 billion vertices and 1 trillion edges. To process such graphs, they ran a distributed graph processing engine, Giraph, on 200 machines. But, with Mosaic, we are able to process large graphs, even proportional to Facebook’s graph, on a single machine.

In this case it’s quite a special machine – with Intel Xeon Phi coprocessors and NVMe storage. But it’s really not that expensive: the Xeon Phi used in the paper costs around $549, and a 1.2TB Intel SSD 750 costs around $750. How much do large distributed clusters cost in comparison, especially when using expensive interconnects and large amounts of RAM?

So Mosaic costs less, but it also consistently outperforms other state-of-the-art out-of-core (secondary storage) engines by 3.2x-58.6x, and shows comparable performance to distributed graph engines. At one-trillion-edge scale, Mosaic can run an iteration of PageRank in 21 minutes (after paying a fairly hefty one-off set-up cost).

(And remember, if you have a less-than-a-trillion edges scale problem, say just a few billion edges, you can do an awful lot with just a single thread too!).

Another advantage of the single-machine design is a much simpler approach to fault tolerance:

… handling fault tolerance is as simple as checkpointing the intermediate stale data (i.e., vertex array). Further, the read-only vertex array for the current iteration can be written to disk parallel to the graph processing; it only requires a barrier on each superstep. Recovery is also trivial; processing can resume with the last checkpoint of the vertex array.

There’s a lot to this paper. Perhaps the two most central aspects are design sympathy for modern hardware, and the Hilbert-ordered tiling scheme used to divide up the work. So I’m going to concentrate mostly on those in the space available.

Exploiting modern hardware

Mosaic combines fast host processors for concentrated memory-intensive operations, with coprocessors for compute and I/O intensive components. The chosen coprocessors for edge processing are Intel Xeon Phis. In the first generation (Knights Corner), a Xeon Phi has up to 61 cores with 4 hardware threads each and a 512-bit single instruction, multiple data (SIMD) unit per core. To handle the amount of data needed, Mosaic exploits NVMe devices that allow terabytes of storage with up to 10x the throughput of SSDs. PCIe-attached NVMe devices can deliver nearly a million IOPS per device with high bandwidth (e.g. Intel SSD 750) – the challenge is to exhaust this available bandwidth.

As we’ve looked at before in the context of datastores, taking advantage of this kind of hardware requires different design trade-offs:

We would like to emphasize that existing out-of-core engines cannot directly improve their performance without a serious redesign. For example, GraphChi improves the performance only by 2-3% when switched from SSDs to NVMe devices or even RAM disks.

Scaling to 1 trillion edges

That’s a lot of edges. And we also know that real-world graphs can be highly skewed, following a power-law distribution. A vertex-centric approach makes for a nice programming model (“think like a vertex”), but locality becomes an issue when locating outgoing edges. Mosaic takes some inspiration from COST here:

To overcome the issue of non-local vertex accesses, the edges can be traversed in an order that preserves vertex locality using, for example, the Hilbert order in COST using delta encoding.

Instead of a global Hilbert ordering of edges (one trillion edges remember), Mosaic divides the graph into tiles (batches of local graphs) and uses Hilbert ordering for tiles.

A Hilbert curve is a continuous fractal space-filling curve. Imagine a big square table (adjacency matrix) where the rows and columns represent source vertices and target vertices respectively. An edge from vertex 2 to vertex 4 will therefore appear in row 2, column 4. If we traverse this table in Hilbert-order (following the path of the curve), we get a nice property that points close to each other along the curve also have nearby (x,y) values – i.e., include similar vertices.

The paper includes an illustration showing a sample adjacency matrix, the Hilbert curve path through the table, and the first two tiles that have been extracted.

Also note that the edge-space is divided into partitions (labelled P_{11}, P_{12}, etc. in the paper’s illustration). It’s important to note that Mosaic does not process every single edge in Hilbert-order (that would require a global sorting step); instead it statically partitions the adjacency matrix into 2^16 × 2^16 blocks (partitions), and then processes the partitions themselves in Hilbert-order. The tile structure is populated by taking a stream of partitions as input:

We consume partitions following the Hilbert order, and add as many edges as possible into a tile until its index structure reaches the maximum capacity to fully utilize the vertex identifier in a local graph… This conversion scheme is an embarrassingly parallel task.
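For the curious, the cell-to-Hilbert-index computation is a short, well-known routine. Here is a Python port (a standard algorithm, not Mosaic’s code; Mosaic applies the ordering to 2^16 × 2^16 partitions, whereas this toy uses a 4 × 4 grid):

    # Standard Hilbert-curve indexing, ported from the classic C routine.
    def rot(n, x, y, rx, ry):
        # Rotate/flip a quadrant so sub-squares are traversed consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        return x, y

    def xy2d(n, x, y):
        # Distance along the Hilbert curve of cell (x, y) in an n x n grid,
        # where n is a power of two.
        d, s = 0, n // 2
        while s > 0:
            rx = 1 if (x & s) > 0 else 0
            ry = 1 if (y & s) > 0 else 0
            d += s * s * ((3 * rx) ^ ry)
            x, y = rot(n, x, y, rx, ry)
            s //= 2
        return d

    # Mosaic-style: order the *partitions* (not individual edges) by Hilbert index.
    parts = [(px, py) for px in range(4) for py in range(4)]
    hilbert_ordered = sorted(parts, key=lambda p: xy2d(4, p[0], p[1]))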

At runtime, Mosaic processes multiple tiles in parallel on four Xeon Phis. Each has 61 cores, giving 244 processing instances running in parallel.

Due to this scale of concurrent access to tiles, the host processors are able to exploit the locality of the shared vertex states associated with the tiles currently being processed, keeping large parts of these states in the cache.

When processing a tile, neighbouring tiles are prefetched from NVMe devices to memory in the background, following the Hilbert-order.

System components

The full Mosaic system comprises the following components.

The Xeon Phi coprocessors run local fetchers, edge processors, and reducers. Each Phi has one fetcher and one reducer, and an edge processor per core. The local reducer retrieves the computed responses from the coprocessors and aggregates them before sending them back for global processing.

The host processor runs global reducers that are assigned partitions of the global vertex state and receive and process input from the local reducers.

As modern systems have multiple NUMA domains, Mosaic assigns disjoint regions of the global vertex state array to dedicated cores running on each NUMA socket, allowing for large, concurrent NUMA transfers in accessing the global memory.

Programming model

Mosaic uses the numerous but slower coprocessor cores to perform edge processing on local graphs, and the faster host processors to reduce the computation results into global vertex states.

To exploit such parallelism, two key properties are required in Mosaic’s programming abstraction, namely commutativity and associativity. This allows Mosaic to schedule computation and reduce operations in any order.

The API is similar to the Gather-Apply-Scatter model, but extended to a Pull-Reduce-Apply (PRA) model.

  • Pull(e): for every edge (u,v), Pull(e) computes the result of the edge e by applying an algorithm-specific function to the value of the source vertex u and related data such as in- or out-degrees.
  • Reduce(v1,v2): takes two values for the same vertex and combines them into a single output. Invoked by edge processors on the coprocessors, and by global reducers on the host.
  • Apply(v): after reducing all local updates to the global array, apply runs on each vertex state in the array, allowing the graph algorithm to perform non-associative operations. Global reducers run this at the end of each iteration.

The paper shows how all of these pieces fit together in a dataflow figure.

The paper also gives PageRank expressed in the Mosaic programming model (figure not reproduced here).
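In its place, here is a rough Python sketch of one PageRank iteration phrased in the Pull-Reduce-Apply style (my own reconstruction from the API description above; the damping factor and the dictionary representation are assumptions):

    # Rough PRA-style PageRank sketch (a reconstruction, not the paper's code).
    DAMPING = 0.85

    def pull(rank_u, out_degree_u):
        # Pull(e): per-edge value computed from the source vertex's state.
        return rank_u / out_degree_u

    def reduce(v1, v2):
        # Reduce(v1, v2): commutative and associative, so Mosaic may apply it
        # in any order, on the coprocessors and on the host.
        return v1 + v2

    def apply(v, num_vertices):
        # Apply(v): the non-associative finishing step, run once per vertex.
        return (1 - DAMPING) / num_vertices + DAMPING * v

    def pagerank_iteration(edges, rank, out_degree):
        acc = {u: 0.0 for u in rank}
        for (u, v) in edges:              # edge processors work tile by tile
            acc[v] = reduce(acc[v], pull(rank[u], out_degree[u]))
        return {v: apply(acc[v], len(rank)) for v in acc}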


The team implemented seven popular graph algorithms and tested them on two different classes of machines – a gaming PC (vortex) and a workstation (ramjet), using six different real-world and synthetic datasets, including a synthetic trillion-edge graph following the distribution of Facebook’s social graph. The vortex machine has one Xeon Phi, whereas ramjet has four.

Execution times for each algorithm and dataset are tabulated in the paper.

Mosaic shows 686-2,978 M edges/sec processing capability depending on the dataset, which is even comparable to other in-memory engines (e.g. 695-1,390 M edges/sec for Polymer) and distributed engines (e.g. 2,770-6,335 M edges/sec for McSherry et al.’s Pagerank-only in-memory cluster system).

On ramjet, Mosaic performs one iteration of Pagerank in 21 minutes.

Overall, here’s how Mosaic stacks up against some of the other options out there:

  • Compared to GPGPU systems, Mosaic is slower by a factor of up to 3.3x, but can scale to much larger graphs.
  • Compared to distributed out-of-core systems Mosaic is approximately one order of magnitude faster.
  • Mosaic is performance competitive with distributed in-memory systems (beats GraphX by 4.7x-6.5x). These systems pay heavily for distribution.

Converting datasets to the tile structure takes 2-4 minutes for datasets up to about 30 GB. For hyperlink14, with 64 B edges and 480 GB of data, it takes 51 minutes. For the trillion-edge graph (8,000 GB) it takes about 30 hours. However, after only a small number of iterations of Pagerank, Mosaic is beating systems that don’t have a preprocessing step, showing a quick return on investment.

Finally, let’s talk about the COST. It’s very nice to see the authors addressing this explicitly. On uk2007-05, ramjet matches a single-threaded host-only in-memory implementation with 31 Xeon Phi cores (and ultimately can go up to 4.6x faster than the single-threaded solution). For the twitter graph, Mosaic needs 18 Xeon Phi cores to match single-thread performance, and ultimately can go up to 3.86x faster.


What TED Needs


Most people want, and gain value from, religious-like communities, strongly bonded by rituals, mutual aid, and implausible beliefs. (Patriotism and political ideologies can count here.) I once embraced that deeply and fully. But then I acquired a strong self-identity as an honest intellectual, which often conflicts with common religious practices. However, I get that my sort of intellectual identity is never going to be common. So religion will continue, even with ems. Realistically, the best widespread religion I’m going to get is one that at least celebrates intellectuals and their ideals, even if it doesn’t fully embrace them, and does so in a form that is accessible to a wide public.

I’ve given four TEDx talks so far, and will give another in two weeks. Ten days ago I had the honor of giving a talk on Age of Em at the annual TED conference in Vancouver (video not yet posted). And I have to say that the TED community seems to come about as close as I can realistically expect to my ideal religion. It is high status, accessible to a wide public, and has a strong sense of a shared community, and of self-sacrifice for community ideals. It has lots of ritual, music, and art, and it celebrates innovation and intellectuals. It even gives lip service to many intellectual virtues. If borderline religious elements sometimes make me uncomfortable, well that’s my fault, not theirs.

The main TED event differs from other TEDx events. Next year the price will be near $10K just for registration, and even then you have to submit an application, and some are rejected. At that high price the main attendees are investors and CEOs looking to network with each other. As a result, it isn’t really a place to geek out talking ideas. But that seems mainly a result of TED’s great success, and overall it does seem to help the larger TED enterprise. Chris Anderson deserves enormous credit for shepherding all this success.

The most encouraging talk I heard at TED 2017 was by David Brenner on his efforts to disinfect human places. Apparently there are frequencies of ultraviolet (UV) light that don’t penetrate skin past the top layer of dead skin cells, but still penetrate all the way through almost all bacteria and viruses in the air and on smooth-enough surfaces. So we should be able to use special UV lights to easily disinfect surfaces around humans. For example, we might cheaply sterilize whole hospitals. And maybe also airports during pandemics. This seems an obvious no brainer that should have been possible anytime in the last century (assuming they’ve done penetration-depth vs. frequency measurements right). Yet Brenner has been working on this for five years and still seems far from getting regulatory approval. This seems to me a bad case of civilization and regulatory failure. Even so, the potential remains great.

The most discouraging talk I heard was by Jim Yong Kim, President of the World Bank Group. He talked about how he fought the World Bank for years, because they insisted on using cost-effectiveness criteria to pick medical investments. He showed us pictures of particular people helped by less cost-effective treatments, daring us to say they were not worth helping. And he said people in poor nations have status-based “aspirations” for the same sort of hospitals and schools found in rich nations, even if they aren’t cost-effective, and who are we to tell them no. Now that he runs the World Bank (nominated by Obama in 2012), his priorities can win more. The audience cheered. 🙁

All strong religions seem to need some implausible beliefs, and perhaps for TED one of them is the idea we need only point out problems to good people, to have those problems solved. But if not, then what I think TED audiences most need to hear are basic reviews on the topics of market failure and regulatory costs.

At TED 2017 I heard many talks where speakers point out a way that our world is not ideal. For example, speakers talked about how tech firms compete to entice users to just pay attention to them, how cities seem to be spread out more than is ideal, and how inner city grocery stores have less fresh food. But speakers never attributed problems to a particular standard kind of market failure, much less suggest a particular institutional solution because it matched the kind of market failure it was to address. While speakers tend to imply government regulation and redistribution as solutions, they never consider the many ways that regulation and redistribution can go wrong and be costly.

It is as if TED audiences, who hear talks on a great many specialized areas of science and tech, were completely unaware of key long-established and strongly-relevant areas of scholarship. If TED audiences were instead well informed about institution design, market failures, and regulatory costs, then a speaker who pointed out a problem would be expected to place it within our standard classifications of ways that things can go wrong. They’d be expected to pick the standard kind of institutional solution to each kind of problem, or explain why their particular problem needs an unusual solution. And they’d be expected to address the standard ways that such a solution could be costly or go wrong. Perhaps even adjust their solution to deal with case-specific costs and failure modes.

None of this is about left vs. right, it is just about good policy analysis. But perhaps this is just a bridge too far. Until the wider public becomes informed about these things, maybe TED speakers must also assume that their audience is ignorant of them as well. But if TED wants to better help the world to actually solve its problems, this is what its audience most needs to hear.


kontextmaschine: I mean hashtag everyone I know on tumblr and especially all the atheists I respect...



I mean hashtag everyone I know on tumblr and especially all the atheists I respect have started casually/competitively invoking Christian and Jewish mythology like the Renaissance did Greek and Roman, that’s interesting

Interesting observation in context of this from Robin Hanson yesterday:

Religions often expose children to a mass of details, as in religious stories. Smart children can be especially engaged by these details because they like to show off their ability to remember and understand detail. Later on, such people can show off their ability to interpret these details in many ways, and to identify awkward and conflicting elements.

Even if the conflicts they find are so severe as to reasonably call into question the entire thing, by that time such people have invested so much in learning details of their religion that they’d lose a lot of ability to show off if they just left and never talked about it again. Some become vocally against their old religion, which lets them keep talking and showing off about it. But even in opposition, they are still then mostly defined by that religion.


A Kaggler's Guide to Model Stacking in Practice



Stacking (also called meta ensembling) is a model ensembling technique used to combine information from multiple predictive models to generate a new model. Oftentimes the stacked model (also called the 2nd-level model) will outperform each of the individual models, due to its smoothing nature and its ability to highlight each base model where it performs best and discredit each base model where it performs poorly. For this reason, stacking is most effective when the base models are significantly different. Here I provide a simple example and guide on how stacking is most often implemented in practice.

Feel free to follow this article using the related code and datasets here in the Machine Learning Problem Bible.

This tutorial was originally posted here on Ben's blog, GormAnalysis.



Suppose four people throw a combined 187 darts at a board. For 150 of those we get to see who threw each dart and where it landed. For the rest, we only get to see where the dart landed. Our task is to guess who threw each of the unlabelled darts based on their landing spot.


K-Nearest Neighbors (Base Model1)

Let’s make a sad attempt at solving this classification problem using a K-Nearest Neighbors model. In order to select the best value for K, we’ll use 5-fold Cross-Validation combined with Grid Search where K=(1, 2, … 30). In pseudo code:

  1. Partition the training data into five equal size folds. Call these test folds.
  2. For K = 1, 2, … 30
    1. For each test fold
      1. Combine the other four folds to be used as a training fold
      2. Fit a K-Nearest Neighbors model on the training fold (using the current value of K)
      3. Make predictions on the test fold and measure the resulting accuracy rate of the predictions
    2. Calculate the average accuracy rate from the five test fold predictions
  3. Keep the K value with the best average CV accuracy rate

With our fictitious data we find K=1 to have the best CV performance (67% accuracy). Using K=1, we now train a model on the entire training dataset and make predictions on the test dataset. Ultimately this will give us about 70% classification accuracy.
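In case it helps, here is what that loop looks like with scikit-learn doing the bookkeeping (a sketch that assumes train is a data frame holding the XCoord/YCoord features and the Competitor label used throughout this example):

    # Sketch of the CV + grid search above using scikit-learn.
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    param_grid = {"n_neighbors": list(range(1, 31))}   # K = 1, 2, ... 30
    search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
    search.fit(train[["XCoord", "YCoord"]], train["Competitor"])
    print(search.best_params_, search.best_score_)     # e.g. K=1, ~0.67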

Support Vector Machine (Base Model2)

Now let’s make another sad attempt at solving the problem using a Support Vector Machine. Additionally, we’ll add a feature DistFromCenter that measures the distance each point lies from the center of the board to help make the data linearly separable. With R’s LiblineaR package we get two hyper parameters to tune:

type – the type of classifier:

  1. L2-regularized L2-loss support vector classification (dual)
  2. L2-regularized L2-loss support vector classification (primal)
  3. L2-regularized L1-loss support vector classification (dual)
  4. support vector classification by Crammer and Singer
  5. L1-regularized L2-loss support vector classification

cost – inverse of the regularization constant

The grid of parameter combinations we’ll test is the cartesian product of the 5 listed SVM types with cost values of (.01, .1, 1, 10, 100, 1000, 2000). That is:

type cost
1    0.01
1    0.1
1    1
…    …
5    100
5    1000
5    2000

Using the same CV + Grid Search approach we used for our K-Nearest Neighbors model, here we find the best hyper-parameters to be type = 4 with cost = 1000. Again, we use these parameters to train a model on the full training dataset and make predictions on the test dataset. This’ll give us about 61% CV classification accuracy and 78% classification accuracy on the test dataset.

Stacking (Meta Ensembling)

Let’s take a look at the regions of the board each model would classify as Bob, Sue, Mark, or Kate.


Unsurprisingly, the SVM does a good job at classifying Bob’s throws and Sue’s throws but does poorly at separating Kate’s throws and Mark’s throws. The opposite appears to be true for the K-nearest neighbors model. HINT: Stacking these models will probably be fruitful.

There are a few schools of thought on how to actually implement stacking. Here’s my personal favorite applied to our example problem:

1. Partition the training data into five test folds


ID FoldID XCoord YCoord DistFromCenter Competitor
1 5 0.7 0.05 0.71 Sue
2 2 -0.4 -0.64 0.76 Bob
3 4 -0.14 0.82 0.83 Sue
183 2 -0.21 -0.61 0.64 Kate
186 1 -0.86 -0.17 0.87 Kate
187 2 -0.73 0.08 0.73 Sue

2. Create a dataset called train_meta with the same row Ids and fold Ids as the training dataset, with empty columns M1 and M2. Similarly create a dataset called test_meta with the same row Ids as the test dataset and empty columns M1 and M2


ID FoldID XCoord YCoord DistFromCenter M1 M2 Competitor
1 5 0.7 0.05 0.71 NA NA Sue
2 2 -0.4 -0.64 0.76 NA NA Bob
3 4 -0.14 0.82 0.83 NA NA Sue
183 2 -0.21 -0.61 0.64 NA NA Kate
186 1 -0.86 -0.17 0.87 NA NA Kate
187 2 -0.73 0.08 0.73 NA NA Sue


ID XCoord YCoord DistFromCenter M1 M2 Competitor
6 0.06 0.36 0.36 NA NA Mark
12 -0.77 -0.26 0.81 NA NA Sue
22 0.18 -0.54 0.57 NA NA Mark
178 0.01 0.83 0.83 NA NA Sue
184 0.58 0.2 0.62 NA NA Sue
185 0.11 -0.45 0.46 NA NA Mark

3. For each test fold {Fold1, Fold2, … Fold5}

3.1 Combine the other four folds to be used as a training fold

train fold1

ID FoldID XCoord YCoord DistFromCenter Competitor
1 5 0.7 0.05 0.71 Sue
2 2 -0.4 -0.64 0.76 Bob
3 4 -0.14 0.82 0.83 Sue
181 5 -0.33 -0.57 0.66 Kate
183 2 -0.21 -0.61 0.64 Kate
187 2 -0.73 0.08 0.73 Sue

3.2 For each base model
M1: K-Nearest Neighbors (k = 1)
M2: Support Vector Machine (type = 4, cost = 1000)

3.2.1 Fit the base model to the training fold and make predictions on the test fold. Store these predictions in train_meta to be used as features for the stacking model

train_meta with M1 and M2 filled in for fold1

ID FoldID XCoord YCoord DistFromCenter M1 M2 Competitor
1 5 0.7 0.05 0.71 NA NA Sue
2 2 -0.4 -0.64 0.76 NA NA Bob
3 4 -0.14 0.82 0.83 NA NA Sue
183 2 -0.21 -0.61 0.64 NA NA Kate
186 1 -0.86 -0.17 0.87 Bob Bob Kate
187 2 -0.73 0.08 0.73 NA NA Sue

4. Fit each base model to the full training dataset and make predictions on the test dataset. Store these predictions inside test_meta


ID XCoord YCoord DistFromCenter M1 M2 Competitor
6 0.06 0.36 0.36 Mark Mark Mark
12 -0.77 -0.26 0.81 Kate Sue Sue
22 0.18 -0.54 0.57 Mark Sue Mark
178 0.01 0.83 0.83 Sue Sue Sue
184 0.58 0.2 0.62 Sue Mark Sue
185 0.11 -0.45 0.46 Mark Mark Mark

5. Fit a new model, S (i.e the stacking model) to train_meta, using M1 and M2 as features. Optionally, include other features from the original training dataset or engineered features

S: Logistic Regression (From LiblineaR package, type = 6, cost = 100). Fit to train_meta

6. Use the stacked model S to make final predictions on test_meta

test_meta with stacked model predictions

ID XCoord YCoord DistFromCenter M1 M2 Pred Competitor
6 0.06 0.36 0.36 Mark Mark Mark Mark
12 -0.77 -0.26 0.81 Kate Sue Sue Sue
22 0.18 -0.54 0.57 Mark Sue Mark Mark
178 0.01 0.83 0.83 Sue Sue Sue Sue
184 0.58 0.2 0.62 Sue Mark Sue Sue
185 0.11 -0.45 0.46 Mark Mark Mark Mark

The main point to take home is that we’re using the predictions of the base models as features (i.e. meta features) for the stacked model. So, the stacked model is able to discern where each model performs well and where each model performs poorly. It’s also important to note that the meta features in row i of train_meta are not dependent on the target value in row i because they were produced using information that excluded the target_i in the base models’ fitting procedure.
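Here is a compact scikit-learn sketch of steps 1-6 (my own; the article itself uses R and LiblineaR). cross_val_predict produces exactly the out-of-fold predictions that fill train_meta, and refitting each base model on the full training set fills test_meta. Because the meta features are class labels, they are one-hot encoded before going into the stacker; train and test are assumed to be data frames with the columns shown in the tables above.

    # Sketch of the stacking procedure (not the article's R code).
    import pandas as pd
    from sklearn.model_selection import KFold, cross_val_predict
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import LinearSVC
    from sklearn.linear_model import LogisticRegression

    features = ["XCoord", "YCoord", "DistFromCenter"]
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    base_models = {"M1": KNeighborsClassifier(n_neighbors=1),
                   "M2": LinearSVC(C=1000)}

    train_meta = pd.DataFrame(index=train.index)
    test_meta = pd.DataFrame(index=test.index)
    for name, model in base_models.items():
        # Step 3: out-of-fold predictions become the training meta features.
        train_meta[name] = cross_val_predict(
            model, train[features], train["Competitor"], cv=folds)
        # Step 4: refit on the full training set for the test meta features.
        model.fit(train[features], train["Competitor"])
        test_meta[name] = model.predict(test[features])

    # Steps 5-6: one-hot encode the label-valued meta features, add
    # DistFromCenter as an extra stacker feature, then fit and predict.
    X_tr = pd.get_dummies(train_meta[["M1", "M2"]])
    X_te = pd.get_dummies(test_meta[["M1", "M2"]]).reindex(
        columns=X_tr.columns, fill_value=0)
    X_tr["DistFromCenter"] = train["DistFromCenter"]
    X_te["DistFromCenter"] = test["DistFromCenter"]
    stacker = LogisticRegression().fit(X_tr, train["Competitor"])
    final_predictions = stacker.predict(X_te)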

Alternatively, we could make predictions on the test dataset using each base model immediately after it gets fit to each test fold. In our case this would generate test-set predictions for five K-Nearest Neighbors models and five SVM models. Then we would average the predictions per model to generate our M1 and M2 meta features. One benefit to this is that it’s less time consuming than the first approach (since we don’t have to retrain each model on the full training dataset). It also helps that our train meta features and test meta features should follow a similar distribution. However, the test metas M1 and M2 are likely more accurate in the first approach since each base model was trained on the full training dataset (as opposed to 80% of the training dataset, five times in the 2nd approach).

Stacked Model Hyper Parameter Tuning

So, how do you tune the hyper parameters of the stacked model? Regarding the base models, we can tune their hyper parameters using Cross-Validation + Grid Search just like we did earlier. It doesn’t really matter what folds we use, but it’s usually convenient to use the same folds that we use for stacking. Tuning the hyper parameters of the stacked model is where things get interesting. In practice most people (including myself) simply use Cross Validation + Grid Search using the same exact CV folds used to generate the Meta Features. There’s a subtle flaw to this approach – can you spot it?

Indeed, there’s a small bit of data leakage in our stacking CV procedure. Consider the 1st round of Cross Validation for the stacked model. We fit a model S to {fold2, fold3, fold4, fold5}, make predictions on fold1 and evaluate performance. But the meta features in {fold2, fold3, fold4, fold5} are dependent on the target values in fold1. So, the target values we’re trying to predict are themselves embedded into the features we’re using to fit our model. This is leakage and in theory S could deduce information about the target values from the meta features in a way that would cause it to overfit the training data and not generalize well to out-of-bag samples. However, you have to work hard to conjure up an example where this leakage is significant enough to cause the stacked model to overfit. In practice, everyone ignores this theoretical hole (and frankly I think most people are unaware it even exists!).

Stacking Model Selection and Features

How do you know what model to choose as the stacker and what features to include with the meta features? In my opinion, this is more of an art than a science. Your best bet is to try different things and familiarize yourself with what works and what doesn’t. Another question is, what (if any) other features should you include for the stacking model in addition to the meta features? Again this is somewhat of an art. Looking at our example, it’s pretty evident that DistFromCenter plays a part in determining which model will perform well. The KNN appears to do better at classifying darts thrown near the center and the SVM model does better at classifying darts thrown away from the center. Let’s take a shot at stacking our models using Logistic Regression. We’ll use the base model predictions as meta features and DistFromCenter as an additional feature.

Sure enough the stacked model performs better than both of the base models – 75% CV accuracy and 86% test accuracy. Now let’s take a look at its classification regions overlaying the training data, just like we did with the base models.


The takeaway here is that the Logistic Regression Stacked Model captures the best aspects of each base model which is why it performs better than either base model in isolation.

Stacking in Practice

To wrap this up, let’s talk about how, when, and why you might use stacking in the real world. Personally, I mostly use stacking in machine learning competitions on Kaggle. In general, stacking produces small gains with a lot of added complexity – not worth it for most businesses. But stacking is almost always fruitful, so it’s almost always used in top Kaggle solutions. In fact, stacking is really effective on Kaggle when you have a team of people trying to collaborate on a model. A single set of folds is agreed upon and then every team member builds their own model(s) using those folds. Then each model can be combined using a single stacking script. This is great because it prevents team members from stepping on each other’s toes, awkwardly trying to stitch their ideas into the same code base.

One last bit. Suppose we have dataset with (user, product) pairs and we want to predict the probability that a user will purchase a given product if he/she is presented an ad with that product. An effective feature might be something like, using the training data, what percent of the products advertised to a user did he actually purchase in the past? So, for the sample (user1, productA) in the training data, we want to tack on a feature like UserPurchasePercentage but we have to be careful not to introduce leakage into the data. We do this as follows:

  1. Split the training data into folds
  2. For each test fold
    1. Identify the unique set of users in the test fold
    2. Use the remaining folds to calculate UserPurchasePercentage (percent of advertised products each user purchased)
    3. Map UserPurchasePercentage back to the training data via (fold id, user id)

Now we can use UserPurchasePercentage as a feature for our gradient boosting model (or whatever model we want). Effectively what we’ve just done is built a predictive model that predicts user_i will purchase product_x with a probability based on the percent of advertised products he purchased in the past, and used those predictions as a meta feature for our real model. This is a subtle but valid and effective form of stacking – one which I often implement in practice and on Kaggle.
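A hedged pandas sketch of those three steps (the column names user_id, purchased, and fold are my assumptions, with purchased as 0/1):

    # Fold-based, leakage-free UserPurchasePercentage (a sketch, not the
    # author's code). Each row's feature comes only from the *other* folds.
    import pandas as pd

    def add_user_purchase_pct(train):
        train = train.copy()
        train["UserPurchasePercentage"] = float("nan")
        for fold in train["fold"].unique():
            in_fold = train["fold"] == fold
            # Purchase rate per user, computed without the current fold, so
            # no row's feature depends on its own target value.
            rates = train.loc[~in_fold].groupby("user_id")["purchased"].mean()
            train.loc[in_fold, "UserPurchasePercentage"] = (
                train.loc[in_fold, "user_id"].map(rates))
        return train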



I’m Ben Gorman – math nerd and data science enthusiast based in the New Orleans area. I spent roughly five years as the Senior Data Analyst for Strategic Comp before starting GormAnalysis. I love talking about data science, so never hesitate to shoot me an email if you have questions. As of September 2016, I’m a Kaggle Master ranked in the top 1% of competitors world-wide.


The Parable Of The Talents


[Content note: scrupulosity and self-esteem triggers, IQ, brief discussion of weight and dieting. Not good for growth mindset.]


I sometimes blog about research into IQ and human intelligence. I think most readers of this blog already know IQ is 50% to 80% heritable, and that it’s so important for intellectual pursuits that eminent scientists in some fields have average IQs around 150 to 160. Since IQ this high only appears in 1/10,000 people or so, it beggars coincidence to believe this represents anything but a very strong filter for IQ (or something correlated with it) in reaching that level. If you saw a group of dozens of people who were 7’0 tall on average, you’d assume it was a basketball team or some other group selected for height (or something correlated with it), not a bunch of botanists who were all very tall by coincidence.

A lot of people find this pretty depressing. Some worry that taking it seriously might damage the “growth mindset” people need to fully actualize their potential. This is important and I want to discuss it eventually, but not now. What I want to discuss now is people who feel personally depressed. For example, a comment from last week:

I’m sorry to leave a self-absorbed comment, but reading this really upset me and I just need to get this off my chest…How is a person supposed to stay sane in a culture that prizes intelligence above everything else – especially if, as Scott suggests, Human Intelligence Really Is the Key to the Future – when they themselves are not particularly intelligent and, apparently, have no potential to ever become intelligent? Right now I basically feel like pond scum.

I hear these kinds of responses every so often, so I should probably learn to expect them. I never do. They seem to me precisely backwards. There’s a moral gulf here, and I want to throw stories and intuitions at it until enough of them pile up at the bottom to make a passable bridge. But first, a comparison:

Some people think body weight is biologically/genetically determined. Other people think it’s based purely on willpower – how strictly you diet, how much you can bring yourself to exercise. These people get into some pretty acrimonious debates.

Overweight people, and especially people who feel unfairly stigmatized for being overweight, tend to cluster on the biologically determined side. And although not all believers in complete voluntary control of weight are mean to fat people, the people who are mean to fat people pretty much all insist that weight is voluntary and easily changeable.

Although there’s a lot of debate over the science here, there seems to be broad agreement on both sides that the more compassionate, sympathetic, progressive position, the position promoted by the kind of people who are really worried about stigma and self-esteem, is that weight is biologically determined.

And the same is true of mental illness. Sometimes I see depressed patients whose families really don’t get it. They say “Sure, my daughter feels down, but she needs to realize that’s no excuse for shirking her responsibilities. She needs to just pick herself up and get on with her life.” On the other hand, most depressed people say that their depression is more fundamental than that, not a thing that can be overcome by willpower, certainly not a thing you can just ‘shake off’.

Once again, the compassionate/sympathetic/progressive side of the debate is that depression is something like biological, and cannot easily be overcome with willpower and hard work.

One more example of this pattern. There are frequent political debates in which conservatives (or straw conservatives) argue that financial success is the result of hard work, so poor people are just too lazy to get out of poverty. Then a liberal (or straw liberal) protests that hard work has nothing to do with it, success is determined by accidents of birth like who your parents are and what your skin color is et cetera, so the poor are blameless in their own predicament.

I’m oversimplifying things, but again the compassionate/sympathetic/progressive side of the debate – and the side endorsed by many of the poor themselves – is supposed to be that success is due to accidents of birth, and the less compassionate side is that success depends on hard work and perseverance and grit and willpower.

The obvious pattern is that attributing outcomes to things like genes, biology, and accidents of birth is kind and sympathetic. Attributing them to who works harder and who’s “really trying” can stigmatize people who end up with bad outcomes and is generally viewed as Not A Nice Thing To Do.

And the weird thing, the thing I’ve never understood, is that intellectual achievement is the one domain that breaks this pattern.

Here it’s would-be hard-headed conservatives arguing that intellectual greatness comes from genetics and the accidents of birth and demanding we “accept” this “unpleasant truth”.

And it’s would-be compassionate progressives who are insisting that no, it depends on who works harder, claiming anybody can be brilliant if they really try, warning us not to “stigmatize” the less intelligent as “genetically inferior”.

I can come up with a few explanations for the sudden switch, but none of them are very principled and none of them, to me, seem to break the fundamental symmetry of the situation. I choose to maintain consistency by preserving the belief that overweight people, depressed people, and poor people aren’t fully to blame for their situation – and neither are unintelligent people. It’s accidents of birth all the way down. Intelligence is mostly genetic and determined at birth – and we’ve already determined in every other sphere that “mostly genetic and determined at birth” means you don’t have to feel bad if you got the short end of the stick.

Consider for a moment Srinivasa Ramanujan, one of the greatest mathematicians of all time. He grew up in poverty in a one-room house in small-town rural India. He taught himself mathematics by borrowing books from local college students and working through the problems on his own until he reached the end of the solveable ones and had nowhere else to go but inventing ways to solve the unsolveable ones.

There are a lot of poor people in the United States today whose life circumstances prevented their parents from reading books to them as a child, prevented them from getting into the best schools, prevented them from attending college, et cetera. And pretty much all of those people still got more educational opportunities than Ramanujan did.

And from there we can go in one of two directions. First, we can say that a lot of intelligence is innate, that Ramanujan was a genius, and that we mortals cannot be expected to replicate his accomplishments.

Or second, we can say those poor people are just not trying hard enough.

Take “innate ability” out of the picture, and if you meet a poor person on the street begging for food, saying he never had a chance, your reply must be “Well, if you’d just borrowed a couple of math textbooks from the local library at age 12, you would have been a Fields Medalist by now. I hear that pays pretty well.”

The best reason not to say that is that we view Ramanujan as intellectually gifted. But the very phrase tells us where we should classify that belief. Ramanujan’s genius is a “gift” in much the same way your parents giving you a trust fund on your eighteenth birthday is a “gift”, and it should be weighted accordingly in the moral calculus.


I shouldn’t pretend I’m worried about this for the sake of the poor. I’m worried for me.

My last IQ-ish test was my SATs in high school. I got a perfect score in Verbal, and a good-but-not-great score in Math.

And in high school English, I got A++s in all my classes, Principal’s Gold Medals, 100%s on tests, first prize in various state-wide essay contests, etc. In Math, I just barely by the skin of my teeth scraped together a pass in Calculus with a C-.

Every time I won some kind of prize in English my parents would praise me and say I was good and should feel good. My teachers would hold me up as an example and say other kids should try to be more like me. Meanwhile, when I would bring home a report card with a C- in math, my parents would have concerned faces and tell me they were disappointed and I wasn’t living up to my potential and I needed to work harder et cetera.

And I don’t know which part bothered me more.

Every time I was held up as an example in English class, I wanted to crawl under a rock and die. I didn’t do it! I didn’t study at all, half the time I did the homework in the car on the way to school, those essays for the statewide competition were thrown together on a lark without a trace of real effort. To praise me for any of it seemed and still seems utterly unjust.

On the other hand, to this day I believe I deserve a fricking statue for getting a C- in Calculus I. It should be in the center of the schoolyard, and have a plaque saying something like “Scott Alexander, who by making a herculean effort managed to pass Calculus I, even though they kept throwing random things after the little curly S sign and pretending it made sense.”

And without some notion of innate ability, I don’t know what to do with this experience. I don’t want to have to accept the blame for being a lazy person who just didn’t try hard enough in Math. But I really don’t want to have to accept the credit for being a virtuous and studious English student who worked harder than his peers. I know there were people who worked harder than I did in English, who poured their heart and soul into that course – and who still got Cs and Ds. To deny innate ability is to devalue their efforts and sacrifice, while simultaneously giving me credit I don’t deserve.

Meanwhile, there were some students who did better than I did in Math with seemingly zero effort. I didn’t begrudge those students. But if they’d started trying to say they had exactly the same level of innate ability as I did, and the only difference was they were trying while I was slacking off, then I sure as hell would have begrudged them. Especially if I knew they were lazing around on the beach while I was poring over a textbook.

I tend to think of social norms as contracts bargained between different groups. In the case of attitudes towards intelligence, those two groups are smart people and dumb people. Since I was both at once, I got to make the bargain with myself, which simplified the bargaining process immensely. The deal I came up with was that I wasn’t going to beat myself up over the areas I was bad at, but I also didn’t get to become too cocky about the areas I was good at. It was all genetic luck of the draw either way. In the meantime, I would try to press as hard as I could to exploit my strengths and cover up my deficiencies. So far I’ve found this to be a really healthy way of treating myself, and it’s the way I try to treat others as well.


The theme continues to be “Scott Relives His Childhood Inadequacies”. So:

When I was 6 and my brother was 4, our mom decided that as an Overachieving Jewish Mother she was contractually obligated to make both of us learn to play piano. She enrolled me in a Yamaha introductory piano class, and my younger brother in a Yamaha ‘cute little kids bang on the keyboard’ class.

A little while later, I noticed that my brother was now with me in my Introductory Piano class.

A little while later, I noticed that my brother was now by far the best student in my Introductory Piano Class, even though he had just started and was two or three years younger than anyone else there.

A little while later, Yamaha USA flew him to Japan to show him off before the Yamaha corporate honchos there.

Well, one thing led to another, and right now if you Google my brother’s name you get a bunch of articles like this one:

The evidence that Jeremy [Alexander] is among the top jazz pianists of his generation is quickly becoming overwhelming: at age 26, Alexander is the winner of the Nottingham International Jazz Piano Competition, a second-place finisher in the Montreux Jazz Festival Solo Piano Competition, a two-time finalist for the American Pianist Association’s Cole Porter Fellowship, and a two-time second-place finisher at the Phillips Jazz Competition. Alexander, who was recently named a Professor of Piano at Western Michigan University’s School of Music, made a sold-out solo debut at Carnegie Hall in 2012, performing Debussy’s Etudes in the first half and jazz improvisations in the second half.

Meanwhile, I was always a mediocre student at Yamaha. When the time came to try an instrument in elementary school, I went with the violin to see if maybe I’d find it more to my tastes than the piano. I was quickly sorted into the remedial class because I couldn’t figure out how to make my instrument stop sounding like a wounded cat. After a year or so of this, I decided to switch to fulfilling my music requirement through a choir, and everyone who’d had to listen to me breathed a sigh of relief.

Every so often I wonder if somewhere deep inside me there is the potential to be “among the top musicians of my generation.” I try to recollect whether my brother practiced harder than I did. My memories are hazy, but I don’t think he practiced much harder until well after his career as a child prodigy had taken off. The cycle seemed to be that every time he practiced, things came fluidly to him and he would produce beautiful music and everyone would be amazed. And this must have felt great, and incentivized him to practice more, and that made him even better, so that the beautiful music came even more fluidly, and the praise became more effusive, until eventually he chose a full-time career in music and became amazing. Meanwhile, when I started practicing it always sounded like wounded cats, and I would get very cautious praise like “Good job, Scott, it sounded like that cat was hurt a little less badly than usual,” and it made me frustrated, and want to practice less, which made me even worse, until eventually I quit in disgust.

On the other hand, I know people who want to get good at writing, and make a mighty resolution to write two hundred words a day every day, and then after the first week they find it’s too annoying and give up. These people think I’m amazing, and why shouldn’t they? I’ve written a few hundred to a few thousand words pretty much every day for the past ten years.

But as I’ve said before, this has taken exactly zero willpower. It’s more that I can’t stop even if I want to. Part of that is probably that when I write, I feel really good about having expressed exactly what it was I meant to say. Lots of people read it, they comment, they praise me, I feel good, I’m encouraged to keep writing, and it’s exactly the same virtuous cycle as my brother got from his piano practice.

And so I think it would be too easy to say something like “There’s no innate component at all. Your brother practiced piano really hard but almost never writes. You write all the time, but wimped out of practicing piano. So what do you expect? You both got what you deserved.”

I tried to practice piano as hard as he did. I really tried. But every moment was a struggle. I could keep it up for a while, and then we’d go on vacation, and there’d be no piano easily available, and I would be breathing a sigh of relief at having a ready-made excuse, and he’d be heading off to look for a piano somewhere to practice on. Meanwhile, I am writing this post in short breaks between running around hospital corridors responding to psychiatric emergencies, and there’s probably someone very impressed with that, someone saying “But you had such a great excuse to get out of your writing practice!”

I dunno. But I don’t think of myself as working hard at any of the things I am good at, in the sense of “exerting vast willpower to force myself kicking and screaming to do them”. It’s possible I do work hard, and that an outside observer would accuse me of eliding how hard I work, but it’s not a conscious elision and I don’t feel that way from the inside.

Ramanujan worked very hard at math. But I don’t think he thought of it as work. He obtained a scholarship to the local college, but dropped out almost immediately because he couldn’t make himself study any subject other than math. Then he got accepted to another college, and dropped out again because they made him study non-mathematical subjects and he failed a physiology class. Then he nearly starved to death because he had no money and no scholarship. To me, this doesn’t sound like a person who just happens to be very hard-working; if he had the ability to study other subjects he would have, for no reason other than that it would have allowed him to stay in college so he could keep studying math. It seems to me that in some sense Ramanujan was incapable of putting hard work into non-math subjects.

I really wanted to learn math and failed, but I did graduate with honors from medical school. Ramanujan really wanted to learn physiology and failed, but he did become one of history’s great mathematicians. So which one of us was the hard worker?

People used to ask me for writing advice. And I, in all earnestness, would say “Just transcribe your thoughts onto paper exactly like they sound in your head.” It turns out that doesn’t work for other people. Maybe it doesn’t work for me either, and it just feels like it does.

But you know what? When asked about one of his most famous discoveries, a method of simplifying a very difficult problem to a continued fraction, Ramanujan described his thought process as: “It is simple. The minute I heard the problem, I knew that the answer was a continued fraction. ‘Which continued fraction?’ I asked myself. Then the answer came to my mind”.

And again, maybe that’s just how it feels to him, and the real answer is “study math so hard that you flunk out of college twice, and eventually you develop so much intuition that you can solve problems without thinking about them.”

(or maybe the real answer is “have dreams where obscure Hindu gods appear to you as drops of blood and reveal mathematical formulae”. Ramanujan was weird).

But I still feel like there’s something going on here where the solution to me being bad at math and piano isn’t just “sweat blood and push through your brain’s aversion to these subjects until you make it stick”. When I read biographies of Ramanujan and other famous mathematicians, there’s no sense that they ever had to do that with math. When I talk to my brother, I never get a sense that he had to do that with piano. And if I am good enough at writing to qualify to have an opinion on being good at things, then I don’t feel like I ever went through that process myself.

So this too is part of my deal with myself. I’ll try to do my best at things, but if there’s something I really hate, something where I have to go uphill every step of the way, then it’s okay to admit mediocrity. I won’t beat myself up for not forcing myself kicking and screaming to practice piano. And in return I won’t become too cocky about practicing writing a lot. It’s probably some kind of luck of the draw either way.


I said before that this wasn’t just about poor people, it was about me being selfishly worried for my own sake. I think I might have given the mistaken impression that I merely need to justify to myself why I can’t get an A in math or play the piano. But it’s much worse than that.

The rationalist community tends to get a lot of high-scrupulosity people, people who tend to beat themselves up for not doing more than they are. It’s why I push giving 10% to charity, not as some kind of amazing stretch goal that we need to guilt people into doing, but as a crutch, a sort of “don’t worry, you’re still okay if you only give ten percent”. It’s why there’s so much emphasis on “heroic responsibility” and how you, yes you, have to solve all the world’s problems personally. It’s why I see red when anyone accuses us of entitlement, since it goes about as well as calling an anorexic person fat.

And we really aren’t doing ourselves any favors. For example, Nick Bostrom writes:

Searching for a cure for aging is not just a nice thing that we should perhaps one day get around to. It is an urgent, screaming moral imperative. The sooner we start a focused research program, the sooner we will get results. It matters if we get the cure in 25 years rather than in 24 years: a population greater than that of Canada would die as a result.

If that bothers you, you definitely shouldn’t read Astronomical Waste.

Yet here I am, not doing anti-aging research. Why not?

Because I tried doing biology research a few times and it was really hard and made me miserable. You know how in every science class, when the teacher says “Okay, pour the white chemical into the grey chemical, and notice how it turns green and begins to bubble,” there’s always one student who pours the white chemical into the grey chemical, and it just forms a greyish-white mixture and sits there? That was me. I hated it, I didn’t have the dexterity or the precision of mind to do it well, and when I finally finished my required experimental science classes I was happy never to think about it again. Even the abstract intellectual part of it – the one where you go through data about genes and ligands and receptors in supercentenarians and shake it until discoveries come out – requires exactly the kind of math skills that I don’t have.

Insofar as this is a matter of innate aptitude – some people are cut out for biology research and I’m not one of them – all is well, and my decision to get a job I’m good at instead is entirely justified.

But insofar as there’s no such thing as innate aptitude, just hard work and grit – then by not being gritty enough, I’m a monster who’s complicit in the death of a population greater than that of Canada.

Insofar as there’s no such thing as innate aptitude, I have no excuse for not being Aubrey de Grey. Or if Aubrey de Grey doesn’t impress you much, Norman Borlaug. Or if you don’t know who either of those two people are, Elon Musk.

I once heard a friend, upon his first use of modafinil, wonder aloud if the way he felt on that stimulant was the way Elon Musk felt all the time. That tied a lot of things together for me, gave me an intuitive understanding of what it might “feel like from the inside” to be Elon Musk. And it gave me a good tool to discuss biological variation with. Most of us agree that people on stimulants can perform in ways it’s difficult for people off stimulants to match. Most of us agree that there’s nothing magical about stimulants, just changes to the levels of dopamine, histamine, norepinephrine et cetera in the brain. And most of us agree there’s a lot of natural variation in these chemicals anyway. So “me on stimulants is that guy’s normal” seems like a good way of cutting through some of the philosophical difficulties around this issue.

…which is all kind of a big tangent. The point I want to make is that for me, what’s at stake in talking about natural variations in ability isn’t just whether I have to feel like a failure for not getting an A in high school calculus, or not being as good at music as my brother. It’s whether I’m a failure for not being Elon Musk. Specifically, it’s whether I can say “No, I’m really not cut out to be Elon Musk” and go do something else I’m better at without worrying that I’m killing everyone in Canada.


The proverb says: “Everyone has somebody better off than they are and somebody worse off than they are, with two exceptions.” When we accept that, with one exception, we’re all in this “not Elon Musk” boat together, a lot of the status games around innate ability start to seem less important.

Every so often an overly kind commenter here praises my intelligence and says they feel intellectually inadequate compared to me, that they wish they could be at my level. But at my level, I spend my time feeling intellectually inadequate compared to Scott Aaronson. Scott Aaronson describes feeling “in awe” of Terence Tao and frequently struggling to understand him. Terence Tao – well, I don’t know if he’s religious, but maybe he feels intellectually inadequate compared to God. And God feels intellectually inadequate compared to John von Neumann.

So there’s not much point in me feeling inadequate compared to my brother, because even if I was as good at music as my brother, I’d probably just feel inadequate for not being Mozart.

And asking “Well what if you just worked harder?” can elide small distinctions, but not bigger ones. If my only goal is short-term preservation of my self-esteem, I can imagine that if only things had gone a little differently I could have practiced more and ended up as talented as my brother. It’s a lot harder for me to imagine the course of events where I do something different and become Mozart. Only one in a billion people reach a Mozart level of achievement; why would it be me?

If I loved music for its own sake and wanted to be a talented musician so I could express the melodies dancing within my heart, then none of this matters. But insofar as I want to be good at music because I feel bad that other people are better than me at music, that’s a road without an end.

This is also how I feel when some people on this blog complain they feel dumb for not being as smart as some of the other commenters on this blog.

I happen to have all of your IQ scores in a spreadsheet right here (remember that survey you took?). Not a single person is below the population average. The first percentile for IQ here – the one such that 1% of respondents are lower and 99% of respondents are higher – corresponds to the 85th percentile of the general population. So even if you’re in the first percentile here, you’re still pretty high up in the broader scheme of things.

At that point we’re back on the road without end. I am pretty sure we can raise your IQ as much as you want and you will still feel like pond scum. If we raise it twenty points, you’ll try reading Quantum Computing since Democritus and feel like pond scum. If we raise it forty, you’ll just go to Terence Tao’s blog and feel like pond scum there. Maybe if you were literally the highest-IQ person in the entire world you would feel good about yourself, but any system where only one person in the world is allowed to feel good about themselves at a time is a bad system.

People say we should stop talking about ability differences so that stupid people don’t feel bad. I say that there’s more than enough room for everybody to feel bad, smart and stupid alike, and not talking about it won’t help. What will help is fundamentally uncoupling perception of intelligence from perception of self-worth.

I work with psychiatric patients who tend to have cognitive difficulties. Starting out in the Detroit ghetto doesn’t do them any favors, and then they get conditions like bipolar disorder and schizophrenia that actively lower IQ for poorly understood neurological reasons.

The standard psychiatric evaluation includes an assessment of cognitive ability; the one I use is a quick test with three questions. The questions are – “What is 100 minus 7?”, “What do an apple and an orange have in common?”, and “Remember these three words for one minute, then repeat them back to me: house, blue, and tulip”.

There are a lot of people – and I don’t mean floridly psychotic people who don’t know their own name, I mean ordinary reasonable people just like you and me – who can’t answer these questions. And we know why they can’t answer these questions, and it is pretty darned biological.

And if our answer to “I feel dumb and worthless because my IQ isn’t high enough” is “don’t worry, you’re not worthless, I’m sure you can be a great scientist if you just try hard enough”, then we are implicitly throwing under the bus all of these people who are definitely not going to be great scientists no matter how hard they try. Talking about trying harder can obfuscate the little differences, but once we’re talking about the homeless schizophrenic guy from Detroit who can’t tell me 100 minus 7 to save his life, you can’t just magic the problem away with a wave of your hand and say “I’m sure he can be the next Ramanujan if he keeps a positive attitude!” You either need to condemn him as worthless or else stop fricking tying worth to innate intellectual ability.

This is getting pretty close to what I was talking about in my post on burdens. When I get a suicidal patient who thinks they’re a burden on society, it’s nice to be able to point out ten important things they’ve done for society recently and prove them wrong. But sometimes it’s not that easy, and the only thing you can say is “f#@k that s#!t”. Yes, society has organized itself in a way that excludes and impoverishes a bunch of people who could have been perfectly happy in the state of nature picking berries and hunting aurochs. It’s not your fault, and if they’re going to give you compensation you take it. And we had better make this perfectly clear now, so that when everything becomes automated and run by robots and we’re all behind the curve, everybody agrees that us continuing to exist is still okay.

Likewise with intellectual ability. When someone feels sad because they can’t be a great scientist, it is nice to be able to point out all of their intellectual strengths and tell them “Yes you can, if only you put your mind to it!” But this is often not true. At that point you have to say “f@#k it” and tell them to stop tying their self-worth to being a great scientist. And we had better establish that now, before transhumanists succeed in creating superintelligence and we all have to come to terms with our intellectual inferiority.


But I think the situation can also be somewhat rosier than that.

Ozy once told me that the law of comparative advantage was one of the most inspirational things they had ever read. This was sufficiently strange that I demanded an explanation.

Ozy said that it proves everyone can contribute. Even if you are worse than everyone else at everything, you can still participate in global trade and other people will pay you money. It may not be very much money, but it will be some, and it will be a measure of how your actions are making other people better off and they are grateful for your existence.

(in real life this doesn’t work for a couple of reasons, most notably the minimum wage, but who cares about real life when we have a theory?)
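
(A toy illustration of the law, with numbers I’ve made up: suppose Alice can dig 10 ditches or bake 10 loaves in a day, while Bob can dig only 2 ditches or bake 1 loaf. Alice is better at both. But a ditch costs Bob only half a loaf of forgone baking, while it costs Alice a full loaf, so Alice still comes out ahead paying Bob to dig while she bakes. Bob is worse than everyone at everything, and trade still rewards his existence.)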

After some thought, I was also inspired by this.

I’m never going to be a great mathematician or Elon Musk. But if I pursue my comparative advantage, which right now is medicine, I can still make money. And if I feel like it, I can donate it to mathematics research. Or anti-aging research. Or the same people Elon Musk donates his money to. They will use it to hire smart people with important talents that I lack, and I will be at least partially responsible for those people’s successes.

If I had an IQ of 70, I think I would still want to pursue my comparative advantage – even if that was ditch-digging, or whatever – and donate that money to important causes. It might not be very much money, but it would be some.

Our modern word “talent” comes from the Greek word talenton, a certain amount of precious metal sometimes used as a denomination of money. The etymology passes through a parable of Jesus’. A master calls three servants to him and gives the first five talents, the second two talents, and the third one talent. The first two servants invest the money and double it. The third literally buries it in a hole. The master comes back later and praises the first two servants, but sends the third servant to Hell (metaphor? what metaphor?).

Various people have come up with various interpretations, but the most popular says that God gives all of us different amounts of resources, and He will judge us based on how well we use these resources rather than on how many He gave us. It would be stupid to give your first servant five loads of silver, then your second servant two loads of silver, then immediately start chewing out the second servant for having less silver than the first one. And if both servants invested their silver wisely, it would be silly to chew out the second one for ending up with less profit when he started with less seed capital. The moral seems to be that if you take what God gives you and use it wisely, you’re fine.

The modern word “talent” comes from this parable. It implies “a thing God has given you which you can invest and give back”.

So if I were a ditch-digger, I think I would dig ditches, donate a portion of the small amount I made, and trust that I had done what I could with the talents I was given.


The Jews also talk about how God judges you for your gifts. Rabbi Zusya once said that when he died, he wasn’t worried that God would ask him “Why weren’t you Moses?” or “Why weren’t you Solomon?” But he did worry that God might ask “Why weren’t you Rabbi Zusya?”

And this is part of why it’s important for me to believe in innate ability, and especially differences in innate ability. If everything comes down to hard work and positive attitude, then God has every right to ask me “Why weren’t you Srinivasa Ramanujan?” or “Why weren’t you Elon Musk?”

If everyone is legitimately a different person with a different brain and different talents and abilities, then all God gets to ask me is whether or not I was Scott Alexander.

This seems like a gratifyingly low bar.

[more to come on this subject later]
