Solidity anti-patterns: Fun with inheritance DAG abuse
It’s hot in Tokyo, but the air conditioner is doing its best. I sit in a friend’s tiny studio apartment. Tired, overworked, behind on deadlines. It’s a feeling that most in the Ethereum space know well. I promised myself that this was the week I wouldn’t work, but that wasn’t happening. The 1st Underhanded Solidity Coding Contest is coming to a close, and after all I had an idea. Months earlier, during a contract review for a client, I had an idea of a new form of antipattern that I felt was subtle, devious, and hard enough to notice that it might even make a wave in Ethereum at some point. I’d seen production, open source contracts publicly released on Github where the pattern was present in full, and often not exploitable by what I can only imagine is chance. But I’m an academic by trade, and there wasn’t quite enough here for a paper. So what better place for this exploit than the USC!
So I sat down for a few hours of hacking. You can see my full entry here, and read about the winners of the contest here. I highly recommend that if you’re serious about smart contract security, you read all the entries carefully. For my part, over the coming weeks until Devcon, I will read, categorize, and process every single entry. The ones I feel are unique or representative, I’ll upload to Hack This Contract. Hopefully a new generation of auditors can learn from all the wonderful entries and our collective efforts.
The Solidity inheritance DAG
The key to understanding this post is fully understanding inheritance in Solidity. Solidity as a language supports multiple inheritance, which is an incredibly controversial feature in language design. The associated documentation has a nice summary of inheritance in Solidity. Specifically, Solidity uses something called C3 linearization. If you are a Solidity developer who’s never heard of “C3 linearization” and is wondering what that means to you, I highly suggest you do not pass go before making yourself fully familiar with what this means. Another important piece of our entry is the interactions of Solidity’s dynamic dispatch, explained in this same documentation, with the C3 linearization of our inheritance graph produced when our entry is compiled.
To understand what this means for our discussion, consider a hypothetical crowdsale for MDT, “the Merde Token”. Let’s say this token is intended to have a whitelist pool of preferred investors able to buy in the pre-sale (before the first block), along with a hard cap of the number of MDT tokens that can be distributed. Consider a base Crowdsale library class, which handles most of the functionality of the crowdsale. Consider then a “CappedCrowdsale is Crowdsale” class, that adds cap functionality, and a “WhitelistedCrowdsale is Crowdsale” class, that allows whitelisted investors to invest early in the crowdsale.
Our MDTCrowdsale class now has two choices given to it by multiple inheritance, to achieve its goal of combining a cap with a whitelist. It can either express its intent as “MDTCrowdsale is CappedCrowdsale, WhitelistedCrowdsale” or as “MDTCrowdsale is WhitelistedCrowdsale, CappedCrowdsale”. Can you spot the antipattern yet? (hint: C3 linearization)
The Contest Entry Explored
You can read, run, and explore my contest entry here. Unfortunately I am clearly bestowed with a complete inability to read and follow rules, and I miss the instructions of including a spoiler or full solution in my README. Derp. So without further ado, here is my spoiler.
What we have in this entry is an MDTCrowdsale laid out as described above, and specified as “MDTCrowdsale is CappedCrowdsale, WhitelistedCrowdsale”. The entire contract is 16 lines, and I encourage you to read it.
Here is the inheritance diagram for our crowdsale contract:
and here is its linearization:
To find out whether a user can buy into the crowdsale, the base crowdsale class Crowdsale.sol (from which both Whitelisted and Capped crowdsales inherit) calls the validPurchase method, which expresses valid purchase conditions as a boolean clause. Because of the dynamic dispatch in Solidity, this means that at runtime, the contract will check whether a buyer is eligible starting from the most specific / derived class.
So MDTCrowdsale is checked for a validPurchase method, which it does not have. Next, WhitelistedCrowdsale is checked: bingo, WhitelistedCrowdsale indeed implements the validPurchase method. The first thing this validPurchase does is to call super.validPurchase. Perhaps the developer of WhitelistedCrowdsale expects this to call the stated superclass Crowdsale’s validPurchase method, but because of C3 linearization the CappedCrowdsale method is called instead. Again super.validPurchase is called, this time invoking a call to the Crowdsale validPurchase method.
Confused yet? The overall result of calling validPurchase is this (the first call is on the bottom):
The arrows between super and the superclass method that is called represent the result of looking up the super class method over the inheritance DAG produced by C3 linearization. Notice that even though WhitelistedCrowdsale inherits from Crowdsale directly, this linearization means that its superclass is actually CappedCrowdsale. This causes a problem though: WhitelistedCrowdsale is developed, tested, and conceptualized to work with Crowdsale, not with CappedCrowdsale.
Let’s take a look at the full boolean expression that results from all the validPurchase calls above, on the right side of the diagram. It looks something like ((withinPeriod && nonZeroPurchase) && withinCap) || (whitelist[msg.sender] && !hasEnded()). See the issue here? Any whitelisted investor can suddenly violate the cap at will! The interaction between this strange inheritance structure and C3 linearization make our contract very easily vulnerable to the total destruction of its stated invariants by exposing a top-level disjunction that the author did not intend.
The contest version of the code has one more gotcha. Ostensibly, our clause above makes sure the crowdsale hasn’t ended if an investor is whitelisted. Dynamic dispatch means the most specific instance of hasEnded is used, in this case in CappedCrowdsale. This method returns true when the capReached variable is set, ending the crowdsale as soon as the cap is reached. But in the first transaction violating the cap, capReached will be false, and hasEnded won’t notice that the cap is being exceeded. This cap-busting transaction will however end the crowdsale and prevent all future buy-ins.
What’s worse than how confusing the above is to reason about? No amount of testing would likely catch this issue. You can write a test suite with full statement and decision coverage and cover the code in both Capped and Whitelisted crowdsale, testing these features independently like any good unit tester. Full coverage achieved, but without necessarily testing the case where an investor who is on the whitelist intentionally violates the cap.
As for deniability? This exploit is 100% deniable by a malicious developer, and requires introducing no additional complexity to the target contract that may raise red flags in the event of a public exploit. The developer can just claim to have screwed up the import order, and tested thoroughly but not thoroughly enough. There is absolutely no way to tell even intuitively whether this is true, and whether the exploit is deliberately or maliciously installed.
A relatively recent movement in Solidity library development advocates for inheritance as the cornerstone of reusable, secure software libraries, using these patterns in widely used core libraries that secure billions in assets and contracts. This heavy focus on inheritance makes our anti-pattern more likely to show up in practice, as newer smart contract developers are encouraged to create what are often amalgamations of libraries created independently by different teams, and whose interactions are unclear.
In this case, the fix is simple: simply swapping the import order of the Capped and Whitelisted crowdsale contracts in our source file solved the problem. Simply change “MDTCrowdsale is CappedCrowdsale, WhitelistedCrowdsale” to “MDTCrowdsale is WhitelistedCrowdsale, CappedCrowdsale”.
The C3 linearization now looks like this:
while the corresponding call to validPurchase now looks like:
Now, the clause appropriately enforces both the cap and the whitelist, making sure the cap is checked at the outermost level and always enforced regardless of the whitelist clause outcome.
An easy fix!
Fortunately, all the public production contracts I’ve examined on Github have done this right. Whether through intentional developer reasoning or luck, it’s hard to say. That being said, I still believe this pattern is a serious problem: even in working code, later changes that adds inheritance-based libraries developed separately may introduce unforeseen or surprising side effects. Naive programmers may not understand the importance of ordering their import statements according to their ideal vision of what the inheritance DAG for a contract should look like. Auditors may miss these subtleties as well, especially if the code both works and has been tested thoroughly.
It’s worth noting that shadowing / inheritance based antipatterns have been noted in Solidity before, including in this retrospective on the Underhanded contest and this issue on Github. That being said, I do not think their analyses fully cover this particular issue, in which the shadowing is both intentional and encouraged for reusable libraries offering extensible behavior. The stated solution (suggested on September 18, after the close of the contest) of making function references explicit may somewhat force programmers of contracts like our crowdsale to reason about inheritance order without relying on default linearization behavior, but even refactoring our example to refer to inherited methods explicitly may not have made the clause reordering issue in this example entirely clear.
Furthermore, previous comments on the issue, such as one which reflected “I assume this issue is about a warning in situations where functions are overwritten through multiple inheritance where there is no common base contract that also has the function”, do not entirely capture our situation in which there is a common base contract that also has the function and the issues are introduced by linearization order rather than simple shadowing.
So, I believe multiple overrides of the a function in complex inheritance hierarchies could potentially interact in tricky ways. Inheritance hierarchies aren’t generally something programmers pay deep attention too, and when they do it’s generally with an eye on functionality rather than security. In smart contracts, these inheritance patterns can shadow functions, reorder clauses in boolean expressions arbitrarily, or otherwise confuse programmers into writing exploitable contracts. Making this problem worse, testing is not sufficient to uncover such vulnerabilities, as in my example. And these vulnerabilities are fully deniable, because they’re tricky to reason about, tricky to test for, and simple to muck up.
Overall, I think it would be a great addition to the Solidity compiler to add warnings whenever classes on the same level in the inheritance hierarchy override the same method. I also think encouraging modular library development through inheritance has clear and severe pitfalls, especially when novice smart contract developers can compose these libraries arbitrarily and easily shadow behavior that, in their mind, was both clearly tested for and included in the contract. As a community, I also believe that we need to start treating inheritance as security critical in complex contracts, with a general awareness that there be dragons here.
The solution to this problem is not a simple one, and requires deep thinking on behalf of Ethereum’s language design and security community. I encourage you all to mull it over, and let me know what you think in the comments here! (or preferably on reddit when I inevitably submit this)
It is worth noting one solution proposed by Vitalik Buterin in his implementation of Viper, which is to remove inheritance as a language feature entirely and sidestep the debate between single and multiple inheritance. With the stated goal of making a contract’s behavior entirely contained within a single file, Viper takes a very different approach to reusable libraries by encouraging their development as separate intercommunicating contracts, not a single contract combining multiple functionalities with inheritance. There are certainly tradeoffs represented by this design decision, though it does completely avoid all the antipatterns leveraged in the creation of my entry.
Fundamentally, an open question remains: is it possible to encourage (dare I say force?) a developer to understand the full space of behaviors in a contract while also encouraging them to use reusable libraries that may themselves interact with each other, may be developed by different teams or individual developers, and may be integrated at different phases in a contract’s development lifecycle? The answer remains unclear.
I’d also like to take one last opportunity to thank everyone in the space who supported the Underhanded Solidity Contest. Nick Johnson, Christian Reitwiessner, Matthew Di Ferrante, Raine Revere, Reto Trinkler, Yudi Levi, Melonport, iExec, and of course the Ethereum Foundation, you are the real MVPs. Thank you. And to all the fabulous entrants to this contest: it’s a continued honor to play in our shared sandbox.
Keep hacking, keep breaking, and let’s make secure smart contracts happen for everyone.