Diaspora* version 0.7.0.0 released

August 19, 2017, 11:35 pm

August 27th marks five years since diaspora*, the open, privacy-oriented social network, was placed into the hands of its community by its founders. One year ago the community released diaspora* version 0.6, the result of a huge effort of refactoring the old code to make it perform better, as well as redesigning diaspora*'s interface and introducing new features. One year later, we are proud to announce the release of diaspora* version 0.7. Since the last major release, 28 contributors have added 28675 lines of code and removed 20019 lines, which marks the release of diaspora* version 0.7 as one of the biggest versions diaspora*’s community has ever released.

Our latest release contains some important changes, particularly ‘under the hood’.

It is now possible to mention people in comments as well as in posts – a long-awaited feature.
The markdown editor, with previews, is available on comments and conversations, bringing them into line with the publisher and making it a lot easier to add formatting, links and images to comments and conversations.
This markdown editor is now also available on the mobile version of diaspora*, for posts, comments and conversations!
It includes Federation v0.2.0, which is amazing enough that it got its own blog post.
And, last but certainly not least, this new release will include the first of the two steps towards a full account migration feature!

Version 0.2.0 of our federation protocol, created by Benjamin Neff (SuperTux88) with help from Senya (cmrd-senya), has started the process of including new functionality. It also provides underlying support for secure and reliable account migration.

Important reminder to podmins: This new federation protocol is incompatible with versions of diaspora* older than 0.6.3.0. If you are still running an earlier version, your server will no longer be able to fully communicate with servers running the latest software.

Senya has also been hard at work creating the first stages of the much-needed account migration feature! With the release of this version, it will be possible to fully export your account data, which will become importable in a future diaspora* release. We have also started implementing federation methods to enable pods to handle account migrations correctly. The next step will be to create the secure account import itself, which can be introduced once the majority of pods in our network have updated to version 0.7. These steps cannot be introduced in the same release because the network first needs to upgrade, so that when the first users start to import their archives, as many pods as possible will be able to understand the migration message.

Since last year's launch of version 0.6.0.0, we have achieved a pretty impressive list of changes!

Additions and enhancements since version 0.6.0.0:

Automatically pull new notifications every 5 minutes
Add a user setting for default post visibility
Use guid instead of id in permalinks and in single-post view
Links to streams of posts I have liked or commented on
Access to “My aspects” and “Followed tags” pages on mobile
Improve color themes and add a “Dark” color theme
Enable collapsing of notification threads in your mail client
OpenGraph video support
Improve error handling on mobile
Admin pages for mobile users
NodeInfo 2.0
Stop communication with pods that have been offline for an extended period of time
Support for an optional Content-Security-Policy header
Support for Liberapay donations
Community guidelines
Links to our Discourse forum

Here’s a quick round-up of the major changes coming your way in version 0.7:

Interface

Mentions in comments
Improve Mentions: display @ before mentions; simplify mentions in the publisher
Internationalization for color themes
Refactoring single-post view interactions
Update help pages
Simplified publisher preview
Add markdown editor for comments and conversations
Support cmd+enter to submit posts, comments and conversations

Account migration

Update the user data export archive format
Reset stuck exports and handle errors
Add support for receiving account migrations

Federation

Switch to new federation protocol
Fix order of comments across pods

Mobile

Always link comment count text on mobile
Include count in mobile post action links
Support direct links to comments on mobile
Improve responsive header in desktop version
Add markdown editor for posts, comments and conversations on mobile
Mark as "Mobile Web App Capable" on Android

Internals

Upgrade to jQuery 3 and Rails 5.1
Send public profiles publicly
Change sender for mails
Add some missing indexes and clean up the database if needed
Improve stream when ignoring a person who posts a lot of tagged posts

Update instructions are available as usual in the wiki. For those of you who have been testing the release candidate, run git checkout master before the update to get back to the stable release branch.

Generating a Useful Theory of Software Engineering [pdf]

August 19, 2017, 10:55 am

Ethereum Proof of Stake FAQ

August 19, 2017, 11:40 am

What is Proof of Stake

Proof of stake is a category of consensus algorithms for public blockchains, which Ethereum's upcoming Casper algorithm is a part of. It serves a similar function to the proof of work which underpins the security behind Bitcoin, the current version of Ethereum and many other blockchains, but has significant advantages in terms of security and energy efficiency.

In general, a proof of stake algorithm looks as follows. The blockchain keeps track of a set of validators, and anyone who holds the blockchain's base cryptocurrency (in Ethereum's case, ether) can become a validator by sending a special type of transaction that locks up their ether into a deposit. The process of creating and agreeing to new blocks is then done through a consensus algorithm that all current validators can participate in.

There are many kinds of consensus algorithms, and many ways to assign rewards to validators who participate in the consensus algorithm, so there are many "flavors" of proof of stake. From an algorithmic perspective, there are two major types: chain-based proof of stake and BFT-style proof of stake.

In chain-based proof of stake, the algorithm pseudo-randomly selects a validator during each time slot (eg. every period of 10 seconds might be a time slot) and assigns that validator the right to create a single block. This block must point to some previous block (normally the block at the end of the previously longest chain), so over time most blocks converge into a single constantly growing chain.
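The slot-by-slot selection described above can be sketched as follows. This is only an illustration: the function name, the seed string and the use of SHA-256 as the randomness source are assumptions made for the example, not how Casper or any particular chain actually derives its randomness.

```python
import hashlib


def select_validator(deposits, seed, slot):
    """Pick one validator for a slot, with probability proportional to deposit.

    deposits: dict mapping validator name -> positive integer deposit.
    seed, slot: hypothetical stand-ins for the chain's source of randomness.
    """
    total = sum(deposits.values())
    # Derive a deterministic pseudo-random point in [0, total) from seed and slot.
    digest = hashlib.sha256(f"{seed}:{slot}".encode()).digest()
    point = int.from_bytes(digest, "big") % total
    # Walk the validators in a fixed order until the point falls in one's range.
    for name in sorted(deposits):
        if point < deposits[name]:
            return name
        point -= deposits[name]
    raise AssertionError("unreachable when all deposits are positive")


if __name__ == "__main__":
    deposits = {"alice": 90, "bob": 10}
    picks = [select_validator(deposits, "example-seed", slot) for slot in range(1000)]
    # Alice holds 90% of the stake, so she should win roughly 90% of the slots.
    print(picks.count("alice"), picks.count("bob"))
```

Every node that knows the deposits, the seed and the slot number computes the same winner, which is what lets the other validators check that a block was created by the right party.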

In BFT-style proof of stake, validators are randomly assigned the right to propose blocks, but agreeing on which block is canonical is done through a multi-round process where every validator sends a "vote" for some specific block during each round, and at the end of the process all (honest and online) validators permanently agree on whether or not any given block is part of the chain. Note that blocks may still be chained together; the key difference is that consensus on a block can come within one block, and does not depend on the length or size of the chain after it.

What are the benefits of proof of stake as opposed to proof of work?

See A Proof of Stake Design Philosophy for a more long-form argument.

In short:

No need to consume large quantities of electricity in order to secure a blockchain (eg. it's estimated that both Bitcoin and Ethereum burn over $1 million worth of electricity and hardware costs per day as part of their consensus mechanism).
Because of the lack of high electricity consumption, there is not as much need to issue as many new coins in order to motivate participants to keep participating in the network. It may theoretically even be possible to have negative net issuance, where a portion of transaction fees is "burned" and so the supply goes down over time.
Proof of stake opens the door to a wider array of techniques that use game-theoretic mechanism design in order to better discourage centralized cartels from forming and, if they do form, from acting in ways that are harmful to the network (eg. selfish mining in proof of work).
Reduced centralization risks, as economies of scale are much less of an issue. $10 million of coins will get you exactly 10 times higher returns than $1 million of coins, without any additional disproportionate gains because at the higher level you can afford better mass-production equipment.
Ability to use economic penalties to make various forms of 51% attacks vastly more expensive to carry out than proof of work - to paraphrase Vlad Zamfir, "it's as though your ASIC farm burned down if you participated in a 51% attack".

How does proof of stake fit into traditional Byzantine fault tolerance research?

There are several fundamental results from Byzantine fault tolerance research that apply to all consensus algorithms, including traditional consensus algorithms like PBFT but also any proof of stake algorithm and, with the appropriate mathematical modeling, proof of work.

The key results include:

CAP theorem - "in the cases that a network partition takes place, you have to choose either consistency or availability, you cannot have both". The intuitive argument is simple: if the network splits in half, and in one half I send a transaction "send my 10 coins to A" and in the other I send a transaction "send my 10 coins to B", then either the system is unavailable, as one or both transactions will not be processed, or it becomes inconsistent, as one half of the network will see the first transaction completed and the other half will see the second transaction completed. Note that the CAP theorem has nothing to do with scalability; it applies to sharded and non-sharded systems equally.
FLP impossibility - in an asynchronous setting (ie. there are no guaranteed bounds on network latency even between correctly functioning nodes), it is not possible to create an algorithm which is guaranteed to reach consensus in any specific finite amount of time if even a single faulty/dishonest node is present. Note that this does NOT rule out "Las Vegas" algorithms that have some probability each round of achieving consensus and thus will achieve consensus within T seconds with probability exponentially approaching 1 as T grows; this is in fact the "escape hatch" that many successful consensus algorithms use.
Bounds on fault tolerance - from the DLS paper we have: (i) protocols running in a partially synchronous network model (ie. there is a bound on network latency but we do not know ahead of time what it is) can tolerate up to 1/3 arbitrary (ie. "Byzantine") faults, (ii) deterministic protocols in an asynchronous model (ie. no bounds on network latency) cannot tolerate faults (although their paper fails to mention that randomized algorithms can, with up to 1/3 fault tolerance), (iii) protocols in a synchronous model (ie. network latency is guaranteed to be less than a known d) can, surprisingly, tolerate up to 100% faulty nodes, although there are restrictions on what can happen when 1/2 or more of the nodes are faulty. Note that the "authenticated Byzantine" model is the one worth considering, not the "Byzantine" one; the "authenticated" part essentially means that we can use public key cryptography in our algorithms, which is in modern times very well-researched and very cheap.
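To make the "Las Vegas" escape hatch from the FLP item concrete, here is a small sketch of how quickly the probability of having reached consensus approaches 1. It assumes, purely for illustration, that every round succeeds independently with the same fixed probability; the text above only says that each round has "some probability" of success.

```python
def prob_consensus_within(rounds, p_round):
    """Probability that at least one of `rounds` independent rounds succeeds.

    p_round is a hypothetical per-round success probability.
    """
    if not 0 <= p_round <= 1:
        raise ValueError("p_round must be a probability")
    # Failing every single round has probability (1 - p_round) ** rounds.
    return 1 - (1 - p_round) ** rounds


if __name__ == "__main__":
    for rounds in (1, 5, 10, 20):
        print(rounds, prob_consensus_within(rounds, 0.5))
```

The failure probability shrinks geometrically with the number of rounds, yet it never reaches exactly zero for any finite bound, which is why such algorithms do not contradict FLP impossibility.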

Proof of work has been rigorously analyzed by Andrew Miller and others and fits into the picture as an algorithm reliant on a synchronous network model. We can model the network as being made up of a near-infinite number of nodes, with each node representing a very small unit of computing power and having a very small probability of being able to create a block in a given period. In this model, the protocol has 50% fault tolerance assuming zero network latency, ~46% (Ethereum) and ~49.5% (Bitcoin) fault tolerance under actually observed conditions, but goes down to 33% if network latency is equal to the block time, and reduces to zero as network latency approaches infinity.

Proof of stake consensus fits more directly into the Byzantine fault tolerant consensus mould, as all validators have known identities (stable Ethereum addresses) and the network keeps track of the total size of the validator set. There are two general lines of proof of stake research, one looking at synchronous network models and one looking at partially synchronous network models. "Chain-based" proof of stake algorithms almost always rely on synchronous network models, and their security can be formally proven within these models similarly to how security of proof of work algorithms can be proven. A line of research connecting traditional Byzantine fault tolerant consensus in partially synchronous networks to proof of stake also exists, but is more complex to explain; it will be covered in more detail in later sections.

Proof of work algorithms and chain-based proof of stake algorithms choose availability over consistency, but BFT-style consensus algorithms lean more toward consistency; Tendermint chooses consistency explicitly, and Casper uses a hybrid model that prefers availability but provides as much consistency as possible and makes both on-chain applications and clients aware of how strong the consistency guarantee is at any given time.

Note that Ittay Eyal and Emin Gun Sirer's selfish mining discovery, which places 25% and 33% bounds on the incentive compatibility of Bitcoin mining depending on the network model (ie. mining is only incentive compatible if collusions larger than 25% or 33% are impossible) has NOTHING to do with results from traditional consensus algorithm research, which does not touch incentive compatibility.

What is the "nothing at stake" problem and how can it be fixed?

In many early (all chain-based) proof of stake algorithms, including Peercoin, there are only rewards for producing blocks, and no penalties. This has the unfortunate consequence that, in the case that there are multiple competing chains, it is in a validator's interest to try to make blocks on top of every chain at once, just to be sure.

In proof of work, doing so would require splitting one's computing power in half, and so would not be lucrative.

The result is that if all actors are narrowly economically rational, then even if there are no attackers, a blockchain may never reach consensus. If there is an attacker, then the attacker need only overpower altruistic nodes (who would exclusively stake on the original chain), and not rational nodes (who would stake on both the original chain and the attacker's chain), in contrast to proof of work, where the attacker must overpower both altruists and rational nodes (or at least credibly threaten to: see P + epsilon attacks).
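The incentive difference between the two systems can be worked through with a toy expected-value calculation. The numbers and the model are assumptions made for illustration: there are two competing chains A and B, chain A ends up canonical with probability p_a, and a block only pays its reward if it sits on the winning chain.

```python
def expected_rewards_pos(p_a, reward):
    """Expected reward per strategy in a proof of stake chain with no penalties."""
    return {
        "stake on A only": p_a * reward,
        "stake on B only": (1 - p_a) * reward,
        # Staking on both costs nothing extra, so a block lands on the winner either way.
        "stake on both": reward,
    }


def expected_rewards_pow(p_a, reward):
    """Expected reward per strategy in proof of work with a fixed hash power budget."""
    return {
        "mine on A only": p_a * reward,
        "mine on B only": (1 - p_a) * reward,
        # Half the hash power on each chain halves the chance of a block on each.
        "split power": 0.5 * p_a * reward + 0.5 * (1 - p_a) * reward,
    }


if __name__ == "__main__":
    for name, table in (("PoS", expected_rewards_pos(0.9, 1.0)),
                        ("PoW", expected_rewards_pow(0.9, 1.0))):
        best = max(table, key=table.get)
        print(name, table, "best:", best)
```

Under these assumptions the best proof of stake strategy is to stake on both chains, while the best proof of work strategy is to commit all hash power to the chain that is more likely to win.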

Some argue that stakeholders have an incentive to act correctly and only stake on the longest chain in order to "preserve the value of their investment". However, this argument ignores that the incentive suffers from tragedy of the commons problems: each individual stakeholder might only have a 1% chance of being "pivotal" (ie. being in a situation where if they participate in an attack then it succeeds and if they do not participate it fails), and so the bribe needed to convince them personally to join an attack would be only 1% of the size of their deposit; hence, the required combined bribe would be only 0.5-1% of the total sum of all deposits. Additionally, this argument implies that any zero-chance-of-failure situation is not a stable equilibrium, as if the chance of failure is zero then everyone has a 0% chance of being pivotal.

This can be solved via two strategies. The first, described in broad terms under the name "Slasher" here and developed further by Iddo Bentov here, involves penalizing validators if they simultaneously create blocks on multiple chains, by means of including proof of misbehavior (ie. two conflicting signed block headers) into the blockchain at a later point in time, at which point the malfeasant validator's deposit is deducted appropriately. This changes the incentive structure, since staking on multiple chains at once now puts the validator's deposit at risk.

Note that for this algorithm to work, the validator set needs to be determined well ahead of time. Otherwise, if a validator has 1% of the stake, then if there are two branches A and B then 0.99% of the time the validator will be eligible to stake only on A and not on B, 0.99% of the time the validator will be eligible to stake on B and not on A, and only 0.01% of the time will the validator be eligible to stake on both. Hence, the validator can with 99% efficiency probabilistically double-stake: stake on A if possible, stake on B if possible, and only if the choice between both is open stake on the longer chain. This can only be avoided if the validator selection is the same for every block on both branches, which requires the validators to be selected at a time before the fork takes place.
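The percentages in this paragraph follow from treating eligibility on the two branches as independent events, each with probability equal to the validator's share of the stake. That independence is the assumption behind the sketch below; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def eligibility_split(stake_share):
    """Return (only A, only B, both) eligibility probabilities for one slot.

    Assumes eligibility on branch A and on branch B are independent events.
    """
    s = Fraction(stake_share)
    only_a = s * (1 - s)
    only_b = (1 - s) * s
    both = s * s
    return only_a, only_b, both


if __name__ == "__main__":
    only_a, only_b, both = eligibility_split(Fraction(1, 100))
    print(float(only_a), float(only_b), float(both))
    # Given eligibility on A, the chance of not also being eligible on B
    # (so no double-signing is needed to stake there) is the "efficiency".
    print(float(only_a / (only_a + both)))
```

With a 1% stake share the three cases come out at 0.99%, 0.99% and 0.01%, and the conditional probability in the last line is 99%, matching the figures in the text.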

This has its own flaws, including requiring nodes to be frequently online to get a secure view of the blockchain, and opening up medium-range validator collusion risks (ie. situations where, for example, 25 out of 30 consecutive validators get together and agree ahead of time to implement a 51% attack on the previous 19 blocks), but if these risks are deemed acceptable then it works well.

The second strategy is to simply punish validators for creating blocks on the wrong chain. That is, if there are two competing chains, A and B, then if a validator creates a block on B, they get a reward of +R on B, but the block header can be included into A (in Casper this is called a "dunkle") and on A the validator suffers a penalty of -F (possibly F = R). This changes the economic calculation: a block on the chain that loses now carries an explicit cost of F instead of costing nothing.

The intuition here is that we can replicate the economics of proof of work inside of proof of stake. In proof of work, there is also a penalty for creating a block on the wrong chain, but this penalty is implicit in the external environment: miners have to spend extra electricity and obtain or rent extra hardware. Here, we simply make the penalties explicit. This mechanism has the disadvantage that it imposes slightly more risk on validators (although the effect should be smoothed out over time), but has the advantage that it does not require validators to be known ahead of time.
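The effect of the explicit penalty can be checked with a simplified expected-value model. It assumes, for illustration only, that chain A wins with probability p_a and that every block created on the losing chain is always included in the winning chain as a dunkle and penalized by F.

```python
from fractions import Fraction


def expected_payoffs(p_a, reward, penalty):
    """Expected payoff per strategy when blocks on the losing chain cost `penalty`."""
    p_b = 1 - p_a
    return {
        # Reward if A wins, penalty on B if B wins.
        "A only": p_a * reward - p_b * penalty,
        # Reward if B wins, penalty on A if A wins.
        "B only": p_b * reward - p_a * penalty,
        # One block is always on the winner and one always on the loser.
        "both": reward - penalty,
    }


if __name__ == "__main__":
    payoffs = expected_payoffs(Fraction(9, 10), 1, 1)  # F = R
    for strategy, value in payoffs.items():
        print(strategy, value)
```

With F = R and a 90% chance that A wins, staking only on A has a positive expected payoff, staking on both nets zero, and staking only on B loses money, so the validator is pushed toward the chain it believes will win.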

That shows how chain-based algorithms solve nothing-at-stake. Now how do BFT-style proof of stake algorithms work?

BFT-style (partially synchronous) proof of stake algorithms allow validators to "vote" on blocks by sending one or more types of signed messages, and specify two kinds of rules:

Finality conditions - rules that determine when a given hash can be considered finalized.
Slashing conditions - rules that determine when a given validator can be deemed beyond reasonable doubt to have misbehaved (eg. voting for multiple conflicting blocks at the same time). If a validator triggers one of these rules, their entire deposit gets deleted.

To illustrate the different forms that slashing conditions can take, we will give two examples of slashing conditions (hereinafter, "2/3 of all validators" is shorthand for "2/3 of all validators weighted by deposited coins", and likewise for other fractions and percentages). In these examples, "PREPARE" and "COMMIT" should be understood as simply referring to two types of messages that validators can send.

If MESSAGES contains messages of the form ["COMMIT", HASH1, view] and ["COMMIT", HASH2, view] for the same view but differing HASH1 and HASH2 signed by the same validator, then that validator is slashed.
If MESSAGES contains a message of the form ["COMMIT", HASH, view1], then UNLESS either view1 = -1 or there also exist messages of the form ["PREPARE", HASH, view1, view2] for some specific view2, where view2 < view1, signed by 2/3 of all validators, then the validator that made the COMMIT is slashed.
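The two example conditions can be read almost directly as a checker over a set of signed messages. The sketch below is a minimal interpretation of the wording above, not Casper's actual implementation: the Message type, its field names and the plain-string signer are all hypothetical, and signature verification is left out.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Message(NamedTuple):
    kind: str                          # "PREPARE" or "COMMIT"
    block_hash: str
    view: int                          # view (COMMIT) or view1 (PREPARE)
    signer: str
    view_source: Optional[int] = None  # view2 in the text, PREPARE only


def has_two_thirds(signers, deposits):
    """True if the signers hold at least 2/3 of all deposited coins."""
    return 3 * sum(deposits[s] for s in signers) >= 2 * sum(deposits.values())


def slashed_validators(messages, deposits):
    slashed = set()
    # Condition 1: one signer commits to two different hashes in the same view.
    commits = defaultdict(set)  # (signer, view) -> set of committed hashes
    for m in messages:
        if m.kind == "COMMIT":
            commits[(m.signer, m.view)].add(m.block_hash)
    for (signer, _view), hashes in commits.items():
        if len(hashes) > 1:
            slashed.add(signer)
    # Condition 2: a COMMIT needs 2/3 of PREPAREs for the same hash and view
    # that all name one specific earlier view2, unless the commit's view is -1.
    prepares = defaultdict(set)  # (hash, view1, view2) -> set of signers
    for m in messages:
        if m.kind == "PREPARE" and m.view_source is not None and m.view_source < m.view:
            prepares[(m.block_hash, m.view, m.view_source)].add(m.signer)
    for m in messages:
        if m.kind != "COMMIT" or m.view == -1:
            continue
        justified = any(
            h == m.block_hash and v == m.view and has_two_thirds(signers, deposits)
            for (h, v, _source), signers in prepares.items()
        )
        if not justified:
            slashed.add(m.signer)
    return slashed
```

For example, with three equally weighted validators, a COMMIT for a hash in view 1 is safe once two of them have sent PREPARE messages for that hash with the same earlier view2, whereas a COMMIT with no such backing gets its signer slashed.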

There are two important desiderata for a suitable set of slashing conditions to have:

Accountable safety - if conflicting HASH1 and HASH2 (ie. HASH1 and HASH2 are different, and neither is a descendant of the other) are finalized, then at least 1/3 of all validators must have violated some slashing condition.
Plausible liveness - unless at least 1/3 of all validators have violated some slashing condition, there exists a set of messages that 2/3 of validators can produce that finalize some value.

If we have a set of slashing conditions that satisfies both properties, then we can incentivize participants to send messages, and start benefiting from economic finality.

What is "economic finality" in general?

Economic finality is the idea that once a block is finalized, or more generally once enough messages of certain types have been signed, then the only way that at any point in the future the canonical history will contain a conflicting block is if a large number of people are willing to burn very large amounts of money. If a node sees that this condition has been met for a given block, then they have a very economically strong assurance that that block will always be part of the canonical history that everyone agrees on.

There are two "flavors" of economic finality:

A block can be economically finalized if a sufficient number of validators have signed cryptoeconomic claims of the form "I agree to lose X in all histories where block B is not included". This gives clients assurance that either (i) B is part of the canonical chain, or (ii) validators lost a large amount of money in order to trick them into thinking that this is the case.
A block can be economically finalized if a sufficient number of validators have signed messages expressing support for block B, and there is a mathematical proof that if some B' != B is also finalized under the same definition then validators lose a large amount of money. If clients see this, and also validate the chain, and validity plus finality is a sufficient condition for precedence in the canonical fork choice rule, then they get an assurance that either (i) B is part of the canonical chain, or (ii) validators lost a large amount of money in making a conflicting chain that was also finalized.
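For the second flavor, the size of the "large amount of money" can be estimated with the standard quorum-overlap argument: if two conflicting blocks each gather signatures from a fraction t of all deposits, at least 2t - 1 of the deposits must have signed both. The sketch below assumes a 2/3 threshold (the fraction used in the slashing examples earlier) and assumes that every double-signer loses its entire deposit; the function names are illustrative.

```python
from fractions import Fraction


def min_double_signers(threshold):
    """Smallest fraction of deposits that must have signed both conflicting blocks."""
    overlap = 2 * Fraction(threshold) - 1
    return max(Fraction(0), overlap)


def min_burned(total_deposits, threshold=Fraction(2, 3)):
    """Lower bound on deposits destroyed if two conflicting blocks are finalized."""
    return min_double_signers(threshold) * total_deposits


if __name__ == "__main__":
    print(min_double_signers(Fraction(2, 3)))
    print(min_burned(9_000_000))
```

With a 2/3 threshold the overlap is 1/3 of all deposits, which is the same 1/3 that appears in the accountable safety property. A simple majority threshold of 1/2 would give no guaranteed overlap at all.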

The two approaches to finality inherit from the two solutions to the nothing at stake problem: finality by penalizing incorrectness, and finality by penalizing equivocation. The main benefit of the first approach is that it is more light-client friendly and is simpler to reason about, and the main benefits of the second approach are that (i) it's easier to see that honest validators will not be punished, and (ii) griefing factors are more favorable to honest validators.

Casper follows the second flavor, though it is possible that an on-chain mechanism will be added where validators can voluntarily opt-in to signing finality messages of the first flavor, thereby enabling much more efficient light clients.

So how does this relate to Byzantine fault tolerance theory?

Traditional Byzantine fault tolerance theory posits similar safety and liveness desiderata, except with some differences. First of all, traditional Byzantine fault tolerance theory simply requires that safety is achieved if 2/3 of validators are honest. This is a strictly easier model to work in; traditional fault tolerance tries to prove "if mechanism M has a safety failure, then at least 1/3 of nodes are faulty", whereas our model tries to prove "if mechanism M has a safety failure, then at least 1/3 of nodes are faulty, and you know which ones, even if you were offline at the time the failure took place". From a liveness perspective, our model is the easier one, as we do not demand a proof that the network will come to consensus, we just demand a proof that it does not get stuck.

Fortunately, we can show the additional accountability requirement is not a particularly difficult one; in fact, with the right "protocol armor", we can convert any traditional partially synchronous or asynchronous Byzantine fault-tolerant algorithm into an accountable algorithm. The proof of this basically boils down to the fact that faults can be exhaustively categorized into a few classes, and each one of these classes is either accountable (ie. if you commit that type of fault you can get caught, so we can make a slashing condition for it) or indistinguishable from latency (note that even the fault of sending messages too early is indistinguishable from latency, as one can model it by speeding up everyone's clocks and assigning the messages that weren't sent too early a higher latency).

What is "weak subjectivity"?

It is important to note that the mechanism of using deposits to ensure there is "something at stake" does lead to one change in the security model. Suppose that deposits are locked for four months, and can later be withdrawn. Suppose that an attempted 51% attack happens that reverts 10 days' worth of transactions. The blocks created by the attackers can simply be imported into the main chain as proof-of-malfeasance (or "dunkles") and the validators can be punished. However, suppose that such an attack happens after six months. Then, even though the blocks can certainly be re-imported, by that time the malfeasant validators will be able to withdraw their deposits on the main chain, and so they cannot be punished.

To solve this problem, we introduce a "revert limit" - a rule that nodes must simply refuse to revert further back in time than the deposit length (ie. in our example, four months). Note that this rule is different from every other consensus rule in the protocol, in that it means that nodes may come to different conclusions depending on when they saw certain messages. The time that a node saw a given message may be different between different nodes; hence we consider this rule "subjective" (alternatively, one well-versed in Byzantine fault tolerance theory may view it as a kind of synchrony assumption).
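A node-local version of the revert limit could look like the sketch below. The 120-day constant merely stands in for the "four months" of the running example, and the function name and the idea of tracking when the node first saw a block are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Stand-in for the deposit length in the example ("four months").
REVERT_LIMIT = timedelta(days=120)


def may_revert(block_first_seen_at, now, revert_limit=REVERT_LIMIT):
    """Return True if this node is still willing to revert the given block.

    block_first_seen_at is this node's own record of when it first saw the
    block, which is exactly what makes the rule subjective.
    """
    return now - block_first_seen_at <= revert_limit


if __name__ == "__main__":
    now = datetime(2017, 8, 19)
    print(may_revert(now - timedelta(days=10), now))   # recent block
    print(may_revert(now - timedelta(days=180), now))  # older than the limit
```

Two nodes that saw the same block at different times can get different answers from this check, which is the sense in which the rule depends on each node's own observations.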

However, the "subjectivity" here is very weak: in order for a node to get on the "wrong" chain, they must receive the original message four months later than they otherwise would have. This is only possible in two cases:

(1) When a node connects to the blockchain for the first time.
(2) If a node has been offline for more than four months.

We can solve (1) by making it the user's responsibility to authenticate the latest state out of band. They can do this by asking their friends, block explorers, businesses that they interact with, etc. for a recent block hash in the chain that they see as the canonical one. In practice, such a block hash may well simply come as part of the software they use to verify the blockchain; an attacker that can corrupt the checkpoint in the software can arguably just as easily corrupt the software itself, and no amount of pure cryptoeconomic verification can solve that problem.

Note that all of this is a problem only in the very limited case where a majority of previous stakeholders collude to attack the network and create an alternate chain; most of the time we expect there will only be one canonical chain to choose from.

Also, note that weak subjectivity assumptions exist in proof of work chains as well if the chain does any hard forks. Bitcoin has previously pulled off a hard fork with 2 months lead time through bitcoind 0.8.1, which fixed a database issue that made certain kinds of large blocks invalid and thereby allowed clients to process blocks that bitcoind 0.7 could not process, and so users had to download a new version of the software with 2 months' notice. This is itself a kind of weak subjectivity assumption, as users had to "log on" within that two-month period to download the update to stay on the correct chain.

Additionally, the social authentication can if needed even be automated in several ways. One is to bake it into natural user workflow: a BIP 70-style payment request could include a recent block hash, and the user's client software would make sure that they are on the same chain as the vendor before approving a payment (or for that matter, any on-chain interaction). The other is to use Jeff Coleman's universal hash time. If UHT is used, then a successful attack chain would need to be generated secretly at the same time as the legitimate chain was being built, requiring a majority of validators to secretly collude for that long.
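The payment-request idea can be sketched as a simple client-side check. Note that the "recent_block_hash" field is hypothetical: it illustrates what a "BIP 70-style" request extended in this way might carry and is not part of the actual BIP 70 message format. The local chain is modelled as a plain set of block hashes.

```python
def on_same_chain(local_block_hashes, vendor_block_hash):
    """True if the vendor's recent block is part of the chain this client follows."""
    return vendor_block_hash in local_block_hashes


def approve_payment(local_block_hashes, payment_request):
    """Refuse to pay unless the request names a block on the client's own chain.

    payment_request is a dict; "recent_block_hash" is an assumed, illustrative field.
    """
    vendor_hash = payment_request.get("recent_block_hash")
    if vendor_hash is None:
        return False
    return on_same_chain(local_block_hashes, vendor_hash)


if __name__ == "__main__":
    my_chain = {"0xaaa", "0xbbb", "0xccc"}
    print(approve_payment(my_chain, {"amount": 5, "recent_block_hash": "0xbbb"}))
    print(approve_payment(my_chain, {"amount": 5, "recent_block_hash": "0xfff"}))
```

A client that has been tricked onto an attacker's chain would fail this check the first time it interacts with an honest vendor, which turns ordinary payments into an automated form of the out-of-band authentication described earlier.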

Does weak subjectivity mean that a proof of stake chain must be "anchored" into a proof of work chain to be secure?

In short, no.

Elaborate?

Weak subjectivity by itself is a rather small addition to the security assumptions in a blockchain in practice, and definitely does not necessitate some proof-of-work-based outside source of truth to supplement it. To see why, consider the kind of situation where weak subjectivity by itself would compromise a blockchain's security. In such a world, powerful corporate or nation state actors would have the ability to somehow convince an entire community that block hash B was the block hash of block XXXYYY when most of them saw at the time and have stored in their own computers that the block hash of block XXXYYY was A, but for some reason such powerful actors would not have the ability to trick users into accepting a different location from where they download their client software.

Furthermore, the "anchoring" that advocates of such a scheme describe is not even all that secure. All anchoring proves is that a given block hash was produced at time T' < T; it does not prove that it was published at that time. Hence, a PoS chain anchored into a PoW chain could simply be attacked by a majority coalition that produces both chains in parallel, anchors both, publishes one, and then four months later publishes the other one.

One could get around this by embedding a fully-functional "light client" of the PoS chain into the PoW chain, which would reject the double-anchoring, but this would require the PoW chain to be feature-rich enough to be able to implement such a client - a property which most actually existing proof of work chains do not possess.

Can one economically penalize censorship in proof of stake?

Unlike reverts, censorship is much more difficult to prove. The blockchain itself cannot directly tell the difference between "user A tried to send transaction X but it was unfairly censored", "user A tried to send transaction X but it never got in because the transaction fee was insufficient" and "user A never tried to send transaction X at all". However, there are a number of techniques that can be used to mitigate censorship issues.

The first is censorship resistance by halting problem. In the weaker version of this scheme, the protocol is designed to be Turing-complete in such a way that a validator cannot even tell whether or not a given transaction will lead to an undesired action without spending a large amount of processing power executing the transaction, and thus opening itself up to denial-of-service attacks. This is what prevented the DAO soft fork.

In the stronger version of the scheme, transactions can trigger guaranteed effects at some point in the near to mid-term future. Hence, a user could send multiple transactions which interact with each other and with predicted third-party information to lead to some future event, but the validators cannot possibly tell that this is going to happen until the transactions are already included (and economically finalized) and it is far too late to stop them; even if all future transactions are excluded, the event that validators wish to halt would still take place. Note that in this scheme, validators could still try to prevent all transactions, or perhaps all transactions that do not come packaged with some formal proof that they do not lead to anything undesired, but this would entail forbidding a very wide class of transactions to the point of essentially breaking the entire system, which would cause validators to lose value as the price of the cryptocurrency in which their deposits are denominated would drop.

The second, described by Adam Back here, is to require transactions to be timelock-encrypted. Hence, validators will include the transactions without knowing the contents, and only later could the contents automatically be revealed, by which point once again it would be far too late to un-include the transactions. If validators were sufficiently malicious, however, they could simply only agree to include transactions that come with a cryptographic proof (eg. ZK-SNARK) of what the decrypted version is; this would force users to download new client software, but an adversary could conveniently provide such client software for easy download, and in a game-theoretic model users would have the incentive to play along.

Perhaps the best that can be said in a proof-of-stake context is that users could also install a software update that includes a hard fork that deletes the malicious validators, and that this is not much harder than installing a software update to make their transactions "censorship-friendly". Hence, all in all this scheme is also moderately effective, though it does come at the cost of slowing interaction with the blockchain down (note that the scheme must be mandatory to be effective; otherwise malicious validators could much more easily simply filter encrypted transactions without filtering the quicker unencrypted transactions).

A third alternative is to include censorship detection in the fork choice rule. The idea is simple. Nodes watch the network for transactions, and if they see a transaction that has a sufficiently high fee for a sufficient amount of time, then they assign a lower "score" to blockchains that do not include this transaction. If all nodes follow this strategy, then eventually a minority chain would automatically coalesce that includes the transactions, and all honest online nodes would follow it. The main weakness of such a scheme is that offline nodes would still follow the majority branch, and if the censorship is temporary and they log back on after the censorship ends then they would end up on a different branch from online nodes. Hence, this scheme should be viewed more as a tool to facilitate automated emergency coordination on a hard fork than something that would play an active role in day-to-day fork choice.
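
As a rough illustration of the scoring idea above, here is a minimal sketch. The names (`SeenTx`, `overdue_txs`, `adjusted_score`) and the fee, waiting-time and penalty parameters are assumptions made for this example; they do not come from any client or specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SeenTx:
    """A transaction this node has observed on the network (hypothetical type)."""
    tx_id: str
    fee: int
    first_seen: float  # local timestamp, in seconds


def overdue_txs(seen: Iterable[SeenTx], included: Set[str], now: float,
                min_fee: int, max_wait: float) -> List[str]:
    """Transactions with a sufficient fee, seen long enough ago, that the chain omits."""
    return [tx.tx_id for tx in seen
            if tx.fee >= min_fee
            and now - tx.first_seen >= max_wait
            and tx.tx_id not in included]


def adjusted_score(base_score: float, seen: Iterable[SeenTx], included: Set[str],
                   now: float, min_fee: int = 10, max_wait: float = 600.0,
                   penalty: float = 1.0) -> float:
    """Lower a chain's score by an assumed fixed penalty per omitted overdue transaction."""
    return base_score - penalty * len(overdue_txs(seen, included, now, min_fee, max_wait))


seen = [SeenTx("a", 50, 0.0), SeenTx("b", 1, 0.0), SeenTx("c", 50, 900.0)]
majority = adjusted_score(100.0, seen, {"c"}, now=1000.0)       # omits "a"
minority = adjusted_score(99.5, seen, {"a", "c"}, now=1000.0)   # includes "a"
print(majority, minority)
```

In this toy run the chain that omits the well-paid, long-pending transaction ends up scoring below the slightly shorter chain that includes it, which is the behaviour the paragraph describes. Because `first_seen` is local to each node, nodes that were offline would compute different scores, matching the weakness noted above.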

How does validator selection work, and what is stake grinding?

In any chain-based proof of stake algorithm, there is a need for some mechanism which randomly selects which validator out of the currently active validator set can make the next block. For example, if the currently active validator set consists of Alice with 40 ether, Bob with 30 ether, Charlie with 20 ether and David with 10 ether, then you want there to be a 40% chance that Alice will be the next block creator, 30% chance that Bob will be, etc (in practice, you want to randomly select not just one validator, but rather an infinite sequence of validators, so that if Alice doesn't show up there is someone who can replace her after some time, but this doesn't change the fundamental problem). In non-chain-based algorithms randomness is also often needed for different reasons.
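
The stake-proportional selection in the Alice/Bob/Charlie/David example can be sketched as follows. The function names are hypothetical, and Python's `random.Random` stands in for the randomness source purely for illustration; where that randomness comes from is exactly the hard part the rest of this section discusses.

```python
import random
from typing import Dict


def validator_at(stakes: Dict[str, int], point: int) -> str:
    """Map a point in [0, total stake) to the validator owning that slice."""
    total = sum(stakes.values())
    if not 0 <= point < total:
        raise ValueError("point must lie in [0, total stake)")
    cumulative = 0
    for name, stake in stakes.items():
        cumulative += stake
        if point < cumulative:
            return name
    raise AssertionError("unreachable: point is below the total stake")


def pick_validator(stakes: Dict[str, int], rng: random.Random) -> str:
    """Pick a validator with probability proportional to its stake."""
    return validator_at(stakes, rng.randrange(sum(stakes.values())))


stakes = {"Alice": 40, "Bob": 30, "Charlie": 20, "David": 10}
rng = random.Random(1)  # illustrative only; not a manipulation-resistant source
sequence = [pick_validator(stakes, rng) for _ in range(5)]
print(sequence)
```

Drawing repeatedly from the same generator gives the sequence of fallback validators mentioned above: if the first pick does not show up, the next one in the sequence takes over.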

"Stake grinding" is a class of attack where a validator performs some computation or takes some other step to try to bias the randomness in their own favor. For example:

1. In Peercoin, a validator could "grind" through many combinations of parameters and find favorable parameters that would increase the probability of their coins generating a valid block.
2. In one now-defunct implementation, the randomness for block N+1 was dependent on the signature of block N. This allowed a validator to repeatedly produce new signatures until they found one that allowed them to get the next block, thereby seizing control of the system forever.
3. In NXT, the randomness for block N+1 is dependent on the validator that creates block N. This allows a validator to manipulate the randomness by simply skipping an opportunity to create a block. This carries an opportunity cost equal to the block reward, but sometimes the new random seed would give the validator an above-average number of blocks over the next few dozen blocks. See here for a more detailed analysis.

(1) and (2) are easy to solve; the general approach is to require validators to deposit their coins well in advance, and not to use information that can be easily manipulated as source data for the randomness. There are several main strategies for solving problems like (3). The first is to use schemes based on secret sharing or deterministic threshold signatures and have validators collaboratively generate the random value. These schemes are robust against all manipulation unless a majority of validators collude (in some cases though, depending on the implementation, between 33-50% of validators can interfere in the operation, leading to the protocol having a 67% liveness assumption).

The second is to use cryptoeconomic schemes where validators commit to information (ie. publish sha3(x)) well in advance, and then must publish x in the block; x is then added into the randomness pool. There are two theoretical attack vectors against this:

Manipulate x at commitment time. This is impractical because the randomness result would take many actors' values into account, and if even one of them is honest then the output will be a uniform distribution. A uniform distribution XORed together with arbitrarily many arbitrarily biased distributions still gives a uniform distribution.
Selectively avoid publishing blocks. However, this attack costs one block reward of opportunity cost, and because the scheme prevents anyone from seeing any future validators except for the next, it almost never provides more than one block reward worth of revenue. The only exception is the case where, if a validator skips, the next validator in line AND the first child of that validator will both be the same validator; if these situations are a grave concern then we can punish skipping further via an explicit skipping penalty.
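
A minimal sketch of the commit-then-reveal flow described above, assuming 32-byte secrets that are XORed into the pool. The function names are made up for this example, and `hashlib.sha3_256` (standard SHA-3) is used only as a stand-in hash; it is not the same function as the Keccak-256 variant that Ethereum calls sha3.

```python
import hashlib

SECRET_LEN = 32  # assumed secret size for this sketch


def commit(x: bytes) -> bytes:
    """Commitment published well in advance: a hash of the secret x."""
    return hashlib.sha3_256(x).digest()


def reveal_is_valid(commitment: bytes, x: bytes) -> bool:
    """Check that the x published in the block matches the earlier commitment."""
    return commit(x) == commitment


def mix(pool: bytes, x: bytes) -> bytes:
    """Fold a revealed secret into the randomness pool by XOR."""
    if len(pool) != SECRET_LEN or len(x) != SECRET_LEN:
        raise ValueError("pool and secret must both be 32 bytes")
    return bytes(a ^ b for a, b in zip(pool, x))


x1 = bytes([1]) * SECRET_LEN
x2 = bytes([2]) * SECRET_LEN
c1, c2 = commit(x1), commit(x2)

pool = bytes(SECRET_LEN)
for commitment, secret in ((c1, x1), (c2, x2)):
    assert reveal_is_valid(commitment, secret)
    pool = mix(pool, secret)
print(pool.hex())
```

Because XOR is commutative, the final pool does not depend on the order of the reveals. The only choice left to a validator at reveal time is whether to publish at all, which is the second attack vector listed above.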

The third is to use Iddo Bentov's "majority beacon", which generates a random number by taking the bit-majority of the previous N random numbers generated through some other beacon (ie. the first bit of the result is 1 if the majority of the first bits in the source numbers is 1 and otherwise it's 0, the second bit of the result is 1 if the majority of the second bits in the source numbers is 1 and otherwise it's 0, etc). This gives a cost-of-exploitation of ~C * sqrt(N) where C is the cost of exploitation of the underlying beacons. Hence, all in all, many known solutions to stake grinding exist; the problem is more like differential cryptanalysis than the halting problem - an annoyance that proof of stake designers eventually understood and now know how to overcome, not a fundamental and inescapable flaw.
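
The bit-majority step can be sketched in a few lines. This is only the combining function as described in the paragraph above; the source numbers are assumed to come from some other beacon, and the function name and the `bits` parameter are choices made for this example.

```python
from typing import Sequence


def bit_majority(values: Sequence[int], bits: int) -> int:
    """Combine numbers bit by bit: an output bit is 1 only if a strict
    majority of the inputs have that bit set, otherwise it is 0."""
    result = 0
    for i in range(bits):
        ones = sum((v >> i) & 1 for v in values)
        if ones * 2 > len(values):
            result |= 1 << i
    return result


sources = [0b1100, 0b1010, 0b1001]  # made-up outputs of an underlying beacon
print(bin(bit_majority(sources, bits=4)))
```

With the three sample inputs, only the highest bit is set in a majority of them, so the result is `0b1000`. A single participant who controls one input can flip an output bit only when the other inputs are evenly split on that bit, which is the intuition behind the higher cost of exploitation.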

What would the equivalent of a 51% attack against Casper look like?

The most basic form of "51% attack" is a simple finality reversion: validators that already finalized block A then finalize some competing block A', thereby breaking the blockchain's finality guarantee. In this case, there now exist two incompatible finalized histories that full nodes would be willing to accept, creating a split of the blockchain, and so it is up to the community to coordinate out of band to focus on one of the branches and ignore the other(s).

This coordination could take place on social media, through private channels between block explorer providers, businesses and exchanges, various online discussion forums, and the like. The principle according to which the decision would be made is "whichever one was finalized first is the real one". Another alternative is to rely on "market consensus": both branches would briefly be traded on exchanges, until network effects rapidly make one branch much more valuable than the others. In this case, the "first finalized chain wins" principle would be a Schelling point for what the market would choose. It's very possible that a combination of both approaches will get used in practice.

Once there is consensus on which chain is real, users (ie. validators and light and full node operators) would be able to manually insert the winning block hash into their client software through a special option in the interface, and their nodes would then ignore all other chains. No matter which chain wins, there exists evidence that can immediately be used to destroy at least 1/3 of the validators' deposits.

Another kind of attack is liveness denial: instead of trying to revert blocks, a cartel of >=34% of validators could simply refuse to finalize any more blocks. In this case, blocks would never finalize. Casper uses a hybrid chain/BFT-style consensus, and so the blockchain would still grow, but it would have a much lower level of security. If no blocks are finalized for some long period of time (eg. 1 day), then there are several options:

1. The protocol can include an automatic feature to rotate the validator set. Blocks under the new validator set would finalize, but clients would get an indication that the new finalized blocks are in some sense suspect, as it's very possible that the old validator set will resume operating and finalize some other blocks. Clients could then manually override this warning once it's clear that the old validator set is not coming back online. There would be a protocol rule that under such an event all old validators that did not try to participate in the consensus process take a large penalty to their deposits.
2. A hard fork is used to add in new validators and delete the attackers' balances.

In case (2), the fork would once again be coordinated via social consensus and possibly via market consensus (ie. the branch with the old and new validator set briefly both being traded on exchanges). In the latter case, there is a strong argument that the market would want to choose the branch where "the good guys win", as such a chain has validators that have demonstrated their goodwill (or at least, their alignment with the interest of the users) and so is a more useful chain for application developers.

Note that there is a spectrum of response strategies here between social coordination and in-protocol automation, and it is generally considered desirable to push as far toward automated resolution as possible so as to minimize the risk of simultaneous 51% attacks and attacks on the social layer (and market consensus tools such as exchanges). One can imagine an implementation of (1) where nodes automatically accept a switch to a new validator set if they do not see a new block being committed for a long enough time, which would reduce the need for social coordination but at the cost of requiring those nodes that do not wish to rely on social coordination to remain constantly online. In either case, a solution can be designed where attackers take a large hit to their deposits.

A more insidious kind of attack is a censorship attack, where >= 34% of validators refuse to finalize blocks that contain certain kinds of transactions that they do not like, but otherwise the blockchain keeps going and blocks keep getting finalized. This could range from a mild censorship attack which only censors to interfere with a few specific applications (eg. selectively censoring transactions in something like Raiden or the lightning network is a fairly easy way for a cartel to steal money) to an attack that blocks all transactions.

There are two sub-cases. The first is where the attacker has 34-67% of the stake. Here, we can program validators to refuse to finalize or build on blocks that they subjectively believe are clearly censoring transactions, which turns this kind of attack into a more standard liveness attack. The more dangerous case is where the attacker has more than 67% of the stake. Here, the attacker can freely block any transactions they wish to block and refuse to build on any blocks that do contain such transactions.

There are two lines of defense. First, because Ethereum is Turing-complete it is naturally somewhat resistant to censorship as censoring transactions that have a certain effect is in some ways similar to solving the halting problem. Because there is a gas limit, it is not literally impossible, though the "easy" ways to do it do open up denial-of-service attack vulnerabilities.

This resistance is not perfect, and there are ways to improve it. The most interesting approach is to add in-protocol features where transactions can automatically schedule future events, as it would be extremely difficult to try to foresee what the result of executing scheduled events and the events resulting from those scheduled events would be ahead of time. Validators could then use obfuscated sequences of scheduled events to deposit their ether, and dilute the attacker to below 33%.

Second, one can introduce the notion of an "active fork choice rule", where part of the process for determining whether or not a given chain is valid is trying to interact with it and verifying that it is not trying to censor you. The most effective way to do this would be for nodes to repeatedly send a transaction to schedule depositing their ether and then cancel the deposit at the last moment. If nodes detect censorship, they could then follow through with the deposit, and so temporarily join the validator pool en masse, diluting the attacker to below 33%. If the validator cartel censors their attempts to deposit, then nodes running this "active fork choice rule" would not recognize the chain as valid; this would collapse the censorship attack into a liveness denial attack, at which point it can be resolved through the same means as other liveness denial attacks.

That sounds like a lot of reliance on out-of-band social coordination; is that not dangerous?

Attacks against Casper are extremely expensive; as we will see below, attacks against Casper cost as much as, if not more than, buying enough mining power in a proof of work chain to permanently 51% attack it over and over again to the point of uselessness. Hence, the recovery techniques described above will only be used in very extreme circumstances; in fact, advocates of proof of work also generally express willingness to use social coordination in similar circumstances by, for example, changing the proof of work algorithm. Hence, it is not even clear that the need for social coordination in proof of stake is larger than it is in proof of work.

In reality, we expect the amount of social coordination required to be near-zero, as attackers will realize that it is not to their benefit to burn such large amounts of money simply to take a blockchain offline for one or two days.

Doesn't MC => MR mean that all consensus algorithms with a given security level are equally efficient (or in other words, equally wasteful)?

This is an argument that many have raised, perhaps best explained by Paul Sztorc in this article. Essentially, if you create a way for people to earn $100, then people will be willing to spend anywhere up to $99.9 (including the cost of their own labor) in order to get it; marginal cost approaches marginal revenue. Hence, the theory goes, any algorithm with a given block reward will be equally "wasteful" in terms of the quantity of socially unproductive activity that is carried out in order to try to get the reward.

There are three flaws with this:

1. It's not enough to simply say that marginal cost approaches marginal revenue; one must also posit a plausible mechanism by which someone can actually expend that cost. For example, if tomorrow I announce that every day from then on I will give $100 to a randomly selected one of a given list of ten people (using my laptop's /dev/urandom as randomness), then there is simply no way for anyone to send $99 to try to get at that randomness. Either they are not in the list of ten, in which case they have no chance no matter what they do, or they are in the list of ten, in which case they don't have any reasonable way to manipulate my randomness so they're stuck with getting the expected-value $10 per day.
2. MC => MR does NOT imply total cost approaches total revenue. For example, suppose that there is an algorithm which pseudorandomly selects 1000 validators out of some very large set (each validator getting a reward of $1), you have 10% of the stake so on average you get 100, and at a cost of $1 you can force the randomness to reset (and you can repeat this an unlimited number of times). Due to the central limit theorem, the standard deviation of your reward is $10, and based on other known results in math the expected maximum of N random samples is slightly under M + S * sqrt(2 * log(N)) where M is the mean and S is the standard deviation. Hence the reward for making additional trials (ie. increasing N) drops off sharply, eg. with 0 re-trials your expected reward is $100, with one re-trial it's $105.5, with two it's $108.5, with three it's $110.3, with four it's $111.6, with five it's $112.6 and with six it's $113.5. Hence, after five retrials it stops being worth it. As a result, an economically motivated attacker with ten percent of stake will inefficiently spend $5 to get an additional revenue of $13, though the total revenue is $113. If the exploitable mechanisms only expose small opportunities, the economic loss will be small; it is decidedly NOT the case that a single drop of exploitability brings the entire flood of PoW-level economic waste rushing back in. This point will also be very relevant in our below discussion on capital lockup costs.
3. Proof of stake can be secured with much lower total rewards than proof of work.
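
The re-trial figures above can be roughly reproduced numerically. The sketch below assumes the reward is normally distributed with mean $100 and standard deviation $10, and it computes the expected maximum of N draws from the exact order-statistic integral rather than from the sqrt(2 * log(N)) bound quoted in the text. The function names and integration settings are choices made for this example.

```python
import math


def expected_max_std_normal(n: int, lo: float = -10.0, hi: float = 10.0,
                            steps: int = 20000) -> float:
    """E[max of n standard normal draws], by trapezoidal integration of
    x * n * pdf(x) * cdf(x)**(n - 1) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
        cdf = 0.5 * (1 + math.erf(x / math.sqrt(2)))
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * x * n * pdf * cdf ** (n - 1)
    return total * h


def expected_reward(retrials: int, mean: float = 100.0, sd: float = 10.0) -> float:
    """Expected best outcome when the attacker keeps the best of retrials + 1 draws."""
    return mean + sd * expected_max_std_normal(retrials + 1)


def worthwhile_retrials(cost_per_retrial: float = 1.0) -> int:
    """Number of re-trials whose marginal gain still exceeds their cost."""
    k = 0
    while expected_reward(k + 1) - expected_reward(k) > cost_per_retrial:
        k += 1
    return k


for k in range(7):
    print(k, round(expected_reward(k), 1))
print("worthwhile re-trials:", worthwhile_retrials())
```

Under these assumptions the values land within a few tenths of a dollar of the ones quoted in the text, and the marginal gain of a re-trial drops below its $1 cost after the fifth one, so total spending stays far below total revenue.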

What about capital lockup costs?

Locking up X ether in a deposit is not free; it entails a sacrifice of optionality for the ether holder. Right now, if I have 1000 ether, I can do whatever I want with it; if I lock it up in a deposit, then it's stuck there for months, and I do not have, for example, the insurance utility of the money being there to pay for sudden unexpected expenses. I also lose some freedom to change my token allocations away from ether within that timeframe; I could simulate selling ether by shorting an amount equivalent to the deposit on an exchange, but this itself carries costs including exchange fees and paying interest. Some might argue: isn't this capital lockup inefficiency really just a highly indirect way of achieving the exact same level of economic inefficiency as exists in proof of work? The answer is no, for both reasons (2) and (3) above.

Let us start with (3) first. Consider a model where proof of stake deposits are infinite-term, ASICs last forever, ASIC technology is fixed (ie. no Moore's law) and electricity costs are zero. Let's say the equilibrium interest rate is 5% per annum. In a proof of work blockchain, I can take $1000, convert it into a miner, and the miner will pay me $50 in rewards per year forever. In a proof of stake blockchain, I would buy $1000 of coins, deposit them (ie. losing them forever), and get $50 in rewards per year forever. So far, the situation looks completely symmetrical (technically, even here, in the proof of stake case my destruction of coins isn't fully socially destructive as it makes others' coins worth more, but we can leave that aside for the moment). The cost of a "Maginot-line" 51% attack (ie. buying up more hardware than the rest of the network) increases by $1000 in both cases.

Now, let's perform the following changes to our model in turn:

Moore's law exists, ASICs depreciate by 50% every 2.772 years (that's a continuously-compounded 25% per annum; picked to make the numbers simpler). If I want to retain the same "pay once, get money forever" behavior, I can do so: I would put $1000 into a fund, where $167 would go into an ASIC and the remaining $833 would go into investments at 5% interest; the $41.67 dividends per year would be just enough to keep renewing the ASIC hardware (assuming technological development is fully continuous, once again to make the math simpler). Rewards would go down to $8.33 per year; hence, 83.3% of miners will drop out until the system comes back into equilibrium with me earning $50 per year, and so the Maginot-line cost of an attack on PoW given the same rewards drops by a factor of 6.
Electricity plus maintenance makes up 1/3 of mining costs. We estimate the 1/3 from recent mining statistics: one of Bitfury's new data centers consumes 0.06 joules per gigahash, or 60 J/TH or 0.000017 kWh/TH, and if we assume the entire Bitcoin network has similar efficiencies we get 27.9 kWh per second given 1.67 million TH/s total Bitcoin hashpower. Electricity in China costs $0.11 per kWh, so that's about $3 per second, or $260,000 per day. Bitcoin block rewards plus fees are $600 per BTC * 13 BTC per block * 144 blocks per day = $1.12m per day. Thus electricity itself would make up 23% of costs, and we can back-of-the-envelope estimate maintenance at 10% to give a clean 1/3 ongoing costs, 2/3 fixed costs split. This means that out of your $1000 fund, only $111 would go into the ASIC, $55 would go into paying ongoing costs, and $833 would go into hardware investments; hence the Maginot-line cost of attack is 9x lower than in our original setting.
Deposits are temporary, not permanent. Sure, if I voluntarily keep staking forever, then this changes nothing. However, I regain some of the optionality that I had before; I could quit within a medium timeframe (say, 4 months) at any time. This means that I would be willing to put more than $1000 of ether in for the $50 per year gain; perhaps in equilibrium it would be something like $3000. Hence, the cost of the Maginot line attack on PoS increases by a factor of three, and so on net PoS gives 27x more security than PoW for the same cost.
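
The arithmetic in the three adjustments above can be checked with a short script. It only reproduces the numbers stated in the text (5% interest, 25% continuous depreciation, a 1/3 ongoing-cost share, a 3x PoS factor and the quoted Bitcoin statistics); the function names are invented for this example, and the model itself is the simplified one described above.

```python
import math


def pow_fund_split(fund: float = 1000.0, interest: float = 0.05,
                   depreciation: float = 0.25, ongoing_share: float = 1 / 3) -> dict:
    """Split a perpetual mining fund so that interest on the reserve
    replaces depreciating hardware: interest * reserve == depreciation * budget."""
    mining_budget = fund * interest / (interest + depreciation)
    reserve = fund - mining_budget
    return {
        "asic": mining_budget * (1 - ongoing_share),
        "ongoing": mining_budget * ongoing_share,
        "reserve": reserve,
        "dividends": reserve * interest,
    }


def electricity_share() -> float:
    """Electricity cost as a fraction of daily Bitcoin rewards, using the quoted figures."""
    kwh_per_th = 60 / 3.6e6                  # 60 J/TH converted to kWh/TH
    kwh_per_second = kwh_per_th * 1.67e6     # 1.67 million TH/s
    cost_per_day = kwh_per_second * 0.11 * 86400
    rewards_per_day = 600 * 13 * 144
    return cost_per_day / rewards_per_day


half_life_years = math.log(2) / 0.25         # ~2.772 years
step1 = pow_fund_split(ongoing_share=0.0)    # Moore's law only
step2 = pow_fund_split()                     # plus electricity and maintenance
pow_drop = 1000.0 / step2["asic"]            # ~9x cheaper Maginot-line attack
pos_vs_pow = pow_drop * 3                    # ~27x with the assumed 3x PoS factor
print(round(step1["asic"]), round(step2["asic"]), round(pow_drop), round(pos_vs_pow))
print(round(electricity_share(), 3))
```

The 27x figure is simply the product of the 9x drop on the PoW side and the assumed 3x gain on the PoS side, so it is only as reliable as those two estimates.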

The above included a large amount of simplified modeling; however, it serves to show how multiple factors stack up heavily in favor of PoS in such a way that PoS gets more bang for its buck in terms of security. The meta-argument for why this perhaps suspiciously multifactorial argument leans so heavily in favor of PoS is simple: in PoW, we are working directly with the laws of physics. In PoS, we are able to design the protocol in such a way that it has the precise properties that we want - in short, we can optimize the laws of physics in our favor. The "hidden trapdoor" that gives us (3) is the change in the security model, specifically the introduction of weak subjectivity.

Now, we can talk about the marginal/total distinction. In the case of capital lockup costs, this is very important. For example, consider a case where you have $100,000 of ether. You probably intend to hold a large portion of it for a long time; hence, locking up even $50,000 of the ether should be nearly free. Locking up $80,000 would be slightly more inconvenient, but $20,000 of breathing room still gives you a large space to maneuver. Locking up $90,000 is more problematic, $99,000 is very problematic, and locking up all $100,000 is absurd, as it means you would not even have a single bit of ether left to pay basic transaction fees. Hence, your marginal costs increase quickly. We can show the difference between this state of affairs and the state of affairs in proof of work as follows:

Hence, the total cost of proof of stake is potentially much lower than the marginal cost of depositing 1 more ETH into the system multiplied by the amount of ether currently deposited.

Note that this component of the argument unfortunately does not fully translate into reduction of the "safe level of issuance". It does help us because it shows that we can get substantial proof of stake participation even if we keep issuance very low; however, it also means that a large portion of the gains will simply be borne by validators as economic surplus.

Will exchanges in proof of stake pose a similar centralization risk to pools in proof of work?

From a centralization perspective, in both Bitcoin and Ethereum it's the case that roughly three pools are needed to coordinate on a 51% attack (4 in Bitcoin, 3 in Ethereum at the time of this writing). In PoS, if we assume 30% participation including all exchanges, then three exchanges would be enough to make a 51% attack; if participation goes up to 40% then the required number goes up to eight. However, exchanges will not be able to participate with all of their ether; the reason is that they need to accommodate withdrawals.

Additionally, pooling in PoS is discouraged because it has a much higher trust requirement - a proof of stake pool can pretend to be hacked, destroy its participants' deposits and claim a reward for it. On the other hand, the ability to earn interest on one's coins without oneself running a node, even if trust is required, is something that many may find attractive; all in all, the centralization balance is an empirical question for which the answer is unclear until the system is actually running for a substantial period of time. With sharding, we expect pooling incentives to reduce further, as (i) there is even less concern about variance, and (ii) in a sharded model, transaction verification load is proportional to the amount of capital that one puts in, and so there are no direct infrastructure savings from pooling.

A final point is that centralization is less harmful in proof of stake than in proof of work, as there are much cheaper ways to recover from successful 51% attacks; one does not need to switch to a new mining algorithm.

Can proof of stake be used in private/consortium chains?

Generally, yes; any proof of stake algorithm can be used as a consensus algorithm in private/consortium chain settings. The only change is that the way the validator set is selected would be different: it would start off as a set of trusted users that everyone agrees on, and then it would be up to the validator set to vote on adding in new validators.

↧

Wikipedia Vandal Early Detection: From User Behavior to User Embedding [pdf]

August 19, 2017, 9:38 am

↧

YAGNI, Cargo Cult and Overengineering

August 19, 2017, 11:46 pm

19 Aug 2017

It was April. The year was probably 2010. The cold, snowy winter was finally coming to an end and spring was almost in the air. I was preparing for my final exams. The review lectures were going on for the RDBMS course that I was enrolled in at my university.

Around the same time, I had started hearing and reading about the shiny, new technology that was going to change the way we use databases. The NoSQL movement was gaining momentum. I was reading blogs about how MongoDB was big time outperforming ancient, non-web-scale relational databases.

After the lecture, being the smart-ass that I am, I asked my professor:

Me: So, between RDBMS and NoSQL databases, which one do you think is the best?
Professor: Well, it depends.
Me: Depends on what?
Professor: Depends on what you are trying to achieve. Both have their pros and cons. You pick the right tool for the job.
Me: But MySQL can’t really scale.
Professor: How do you think we got this far? Send me an email and I’ll send you some papers and practical uses of MySQL handling very high load.

SQL was hard for my brain, especially the joins. I loved NoSQL. Simple key->value model without any joins! Stone-age RDBMS systems that were designed in the 1960s were simply not enough to keep up with modern demands. I had lost all interest in RDBMS and predicted they’d just die off in the next few years.

I never emailed my professor.

It’s 2012. We’re redesigning my employer’s flagship product. The first version was a monolith that used the boring MySQL. After spending too much time reading blogs and the Hacker News comments section, we convinced ourselves that we needed to go big and modern:

Break monolith into service-oriented architecture, aka, the SOA.
Replace MySQL with Cassandra (MySQL to Redis to Cassandra)

And we built it.

There was nothing wrong with the new system… except one major flaw. It was too complex for a small startup team to maintain. We had built a Formula One race car that makes frequent pit stops and requires very specialized maintenance, when we needed a Toyota Corolla that goes on for years and years on just an oil change.

Fast forward to 2017. It feels like almost all the software developers I interview these days have hands-on experience with microservices architecture, and many have even used it in production.

A grey San Francisco afternoon. I’m conducting an on-site interview. Master’s degree and 3 years of experience at a startup that looked like it didn’t make it. I asked him to tell me about the system he built.

Guy: We built the system using the microservices architecture. We had lots of small services which made our life really easy…
Me: Go on…
Guy: Data was written to the BI system through a Kafka cluster. Hadoop and MapReduce system was built to process data for analytics. The system was super scalable.

I pressed him to tell me about the drawbacks of microservices architecture and the problems it introduces. The guy tried to hand-wave his way through and was convinced, just like I was in 2010 about NoSQL databases, that there are absolutely no issues with microservices architecture.

I’m not saying microservices architecture is bad. It’s beneficial, but only to a few organizations in the world: organizations that have very complex systems that justify the operational burden, the overhead, it introduces. Martin Fowler, who helped popularize the term microservices, warns us of the “microservices premium”:

The fulcrum of whether or not to use microservices is the complexity of the system you’re contemplating. The microservices approach is all about handling a complex system, but in order to do so the approach introduces its own set of complexities. When you use microservices you have to work on automated deployment, monitoring, dealing with failure, eventual consistency, and other factors that a distributed system introduces. There are well-known ways to cope with all this, but it’s extra effort, and nobody I know in software development seems to have acres of free time.
So my primary guideline would be don’t even consider microservices unless you have a system that’s too complex to manage as a monolith. The majority of software systems should be built as a single monolithic application. Do pay attention to good modularity within that monolith, but don’t try to separate it into separate services.

Martin’s being generous. I believe that only a very few, very large organizations really benefit from microservices. Netflix is one such organization. Their system grew large and complex enough to justify switching to microservices.

In almost all cases, you can’t go wrong by building a monolith first. You break your architecture into service-oriented or… microservices only when the benefits outweigh the complexity.

The guy also mentioned Kafka, a system that handles 2 million writes a second at LinkedIn:

Me: How much data do you stream to Kafka roughly?
Guy: We could stream gigabytes of logs…
Me: How much data are you streaming right now?
Guy: Not a whole lot right now because we only have 3 customers. But the system could scale up to support millions and millions of users. Also, with both Kafka and Hadoop clusters, we get fool-proof fault-tolerance
Me: How big is the team and company?
Guy: Overall, I guess there were (less than 10) people in the company. Engineering was about 5.

At this point, I was tempted to ask if he had ever heard of YAGNI:

Yagni … stands for “You Aren’t Gonna Need It”. It is a mantra from Extreme Programming … It’s a statement that some capability we presume our software needs in the future should not be built now because “you aren’t gonna need it”.

People sometimes have honest intentions and don’t introduce new tools, libraries, or frameworks just for the sake of enhancing their resumes. Sometimes they simply anticipate enormous growth and try to do everything up front so they don’t have to do the work later.

The common reason why people build presumptive features is because they think it will be cheaper to build it now rather than build it later. But that cost comparison has to be made at least against the cost of delay, preferably factoring in the probability that you’re building an unnecessary feature, for which your odds are at least ⅔.
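
To make that comparison concrete, here is a minimal sketch of the arithmetic. Only the ⅔ figure comes from the quote above; the day counts, the 50% "more expensive later" penalty and the function names are assumptions made up for illustration, and the model ignores everything except build and upkeep cost.

```javascript
// Hypothetical figures: only the 2/3 probability comes from the quote above.
function expectedCostBuildNow(buildCost, carryCost) {
  // You pay to build and to maintain the feature whether or not it is ever needed.
  return buildCost + carryCost;
}

function expectedCostBuildLater(buildCostLater, probabilityNeeded) {
  // You only pay if the feature actually turns out to be needed.
  return probabilityNeeded * buildCostLater;
}

// If the odds of building an unnecessary feature are 2/3, the odds of needing it are 1/3.
const probabilityNeeded = 1 / 3;
const now = expectedCostBuildNow(10, 2); // assumed: 10 days to build, 2 days of upkeep
const later = expectedCostBuildLater(15, probabilityNeeded); // assumed: 50% pricier later

console.log({ now, later });
```

Even with a hefty penalty for building later, the expected cost of waiting comes out lower in this toy setup, because two times out of three the work is never done at all.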

People know complexity is bad. No one likes to see bugs filed on JIRA or get PagerDuty alerts at 3 a.m. that something is wrong with the Cassandra cluster. But why do software developers still do it? Why do they choose to build complex systems without proper investigation?

Are there other reasons for building complex features besides preparing for a hypothetical future?

I’m sure the majority of you are familiar with the term cargo cult software development: teams that slavishly and blindly follow the techniques of large companies they idolize, like Google, Apple or Amazon, in the hope that they’ll achieve similar success by emulating their idols.

They are just like the South Sea natives who built primitive runways, prayed and performed rituals in the hope that planes would come in and bring cargo: food and supplies. Richard Feynman warned graduates at the California Institute of Technology not to fall victim to cargo cult thinking:

In the South Seas there is a cargo cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas — he’s the controller — and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes landed.

It reminded me of the time when we took a perfectly good monolith and created service-oriented architecture (SOA). It didn’t work. No planes landed.

Whatever the reasons:

not wanting to be left out: the new JavaScript framework from this week will be the next hottest thing.
enhancing resume with buzzwords: a monolith won’t impress recruiters / interviewers.
imitating heroes: Facebook does it, Google does it, Twitter does it.
technology that could keep up with the projected 5000% YoY company growth: YAGNI.
latest tools to convince people that you’re a proper SF Bay Area tech company.

Cargo-cult engineering just doesn’t work. You are not Google. What works for them may not work for your much smaller company. Google actually needed MapReduce because they wanted to regenerate indexes for the entire World Wide Web, or something like that. They needed fault tolerance from thousands of commodity servers. They had 20 petabytes of data to process.

20 petabytes is just enormous. In terms of the number of disk drives, here’s what half of that, 10 petabytes, would look like:

[Image: 10 petabytes of disk drives]

To avoid falling into the cargo-cult trap, I have learned to do the following:

Focus on the problem first, not the solution. Don’t pick any tool until you have fully understood what you are trying to achieve or solve. Don’t give up on solving the actual problem and make it all about learning and using the shiny new tech.
Keep it simple. It’s an over-used term, but software developers still just don’t get it. Keep. It. Simple.
If you are leaning towards something that Twitter or Google uses, do your homework and understand the real reasons why they picked that technology.
When thinking of growth, understand that the chances of your startup growing to be the size of Facebook are slim to none. Even if your odds are good, is it really worth all this effort to set up a ‘world-class foundation’ now vs doing it later?
Weigh operational burden and complexity. Do you really need multi-region replication for fault tolerance in return for making your DevOps life 2x more difficult?
Be selfish. Do you want to be woken up in the middle of the night because something, somewhere stopped working? There is nothing wrong with learning the new JavaScript framework from last week. Create a new project and publish it on GitHub. Don’t build production systems with it just because you like it.
Think about people who’d have to live with your mess. Think about your legacy. Do people remember you as the guy who built rock-solid systems or as someone who left a crazy mess behind?
Share your ideas with experts and veterans and let them criticize. Identify people in other teams who you respect and who’d disagree freely.
Don’t jump to conclusions on the results of quick experiments. HelloWorld prototypes of anything are easy. Real life is very different from HelloWorld.

We discussed YAGNI in this post, but it is generally applied in the context of writing software: design patterns, frameworks, ORMs, and other things that are under one person’s control.

YAGNI is coding what you need, as you need, refactoring your way through.

Back to the guy I interviewed. It’s highly unlikely, even for a small startup, that a software developer in the trenches was allowed to pick Hadoop, Kafka and a microservices architecture. Cargo cult practices usually start with someone higher up the ranks: a tech leader who may be a very smart engineer but is very bad at making rational decisions, someone who probably spends way too much time reading blogs and tries very hard to keep up with the Amazon or Google way of building software.

2017 is almost over. NoSQL has matured. MongoDB is on its way out. DynamoDB is actually a very solid product and is maturing really well. RDBMS systems didn’t die. One can argue they are actually doing pretty well. StackOverflow is powered by just 4 Microsoft SQL Servers. Uber runs on MySQL.

You may have great reasons to use MapReduce or microservices. What matters is how you arrive at your decision. Did you fall victim to the hype cycle and pick the technology just to jump on the bandwagon? Or did you carefully analyze the problem and pick the best tool that gets the job done and that you and your team can deal with?

As the professor said: “Pick the right tool for the job.” I’ll also add: don’t build Formula One cars when you need a Corolla.

This article was written by Umer Mansoor. Please leave your comments below and like on Facebook or follow on Twitter to stay up-to-date.

↧

Lispers

August 19, 2017, 3:23 pm

Lisp - made with secret alien technology

“The greatest single programming language ever designed.”— Alan Kay, on Lisp

“Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot.”— Eric Raymond, "How to Become a Hacker"

“One of the most important and fascinating of all computer languages is Lisp (standing for "List Processing"), which was invented by John McCarthy around the time Algol was invented.”— Douglas Hofstadter, Gödel, Escher, Bach

“Within a couple weeks of learning Lisp I found programming in any other language unbearably constraining.”— Paul Graham, Road to Lisp

“Lisp is the most sophisticated programming language I know. It is literally decades ahead of the competition ... it is not possible (as far as I know) to actually use Lisp seriously before reaching the point of no return.”— Christian Lynbech, Road to Lisp

“Greenspun's Tenth Rule of Programming: any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp.”— Philip Greenspun

“We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp. Aren't you happy?”— Guy Steele, Java spec co-author, LL1 mailing list, 2003

“Lisp has jokingly been called "the most intelligent way to misuse a computer". I think that description is a great compliment because it transmits the full flavor of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.”— Edsger Dijkstra, CACM, 15:10

“Historically, languages designed for other people to use have been bad: Cobol, PL/I, Pascal, Ada, C++. The good languages have been those that were designed for their own creators: C, Perl, Smalltalk, Lisp.”— Paul Graham

“Lisp ... made me aware that software could be close to executable mathematics.”— L. Peter Deutsch

“Lisp is a programmable programming language.”— John Foderaro, CACM, September 1991

“Will write code that writes code that writes code that writes code for money.”— on comp.lang.lisp

“I object to doing things that computers can do.”— Olin Shivers

“Lisp is a language for doing what you've been told is impossible.”— Kent Pitman

“Anyone could learn Lisp in one day, except that if they already knew Fortran, it would take three days.”— Marvin Minsky

“Programming in Lisp is like playing with the primordial forces of the universe. It feels like lightning between your fingertips. No other language even feels close.”— Glenn Ehrlich, Road to Lisp

“Lisp is the red pill.”— John Fraser, on comp.lang.lisp

“The language God would have used to implement the Universe.”— Svein Ove Aas, Road to Lisp

“Lisp doesn't look any deader than usual to me.”— David Thornley, reply to a question older than most programming languages

“Don't worry about what anybody else is going to do. The best way to predict the future is to invent it.”— Alan Kay

↧

Back to the Future: The Decentralized Web

August 20, 2017, 12:03 am

The Web is a key space for civic debate and the current battleground for protecting freedom of expression. However, since its development, the Web has steadily evolved into an ecosystem of large, corporate-controlled mega-platforms which intermediate speech online. In many ways this has been a positive development; these platforms improved usability and enabled billions of people to publish and discover content without having to become experts on the Web’s intricate protocols.

But in other ways this development is alarming. Just a few large platforms drive most traffic to online news sources in the U.S., and thus have enormous influence over what sources of information the public consumes on a daily basis. The existence of these consolidated points of control is troubling for many reasons. A small number of stakeholders end up having outsized influence over the content the public can create and consume. This leads to problems ranging from censorship at the behest of national governments to more subtle, perhaps even unintentional, bias in the curation of content users see based on opaque, unaudited curation algorithms. The platforms that host our networked public sphere and inform us about the world are unelected, unaccountable, and often impossible to audit or oversee.

At the same time, there is growing excitement around the area of decentralized systems, which have grown in prominence over the past decade thanks to the popularity of the cryptocurrency Bitcoin. Bitcoin is a payment system that has no central points of control, and uses a novel peer-to-peer network protocol to agree on a distributed ledger of transactions, the blockchain. Bitcoin paints a picture of a world where untrusted networks of computers can coordinate to provide important infrastructure, like verifiable identity and distributed storage. Advocates of these decentralized systems propose related technology as the way forward to “re-decentralize” the Web, by shifting publishing and discovery out of the hands of a few corporations, and back into the hands of users. These types of code-based, structural interventions are appealing because in theory, they are less corruptible and more resistant to corporate or political regulation. Surprisingly, low-level, decentralized systems don’t necessarily translate into decreased market consolidation around user-facing mega-platforms.

In this report, we explore two important ways structurally decentralized systems could help address the risks of mega-platform consolidation: First, these systems can help users publish and discover content directly, without intermediaries, and thus without censorship. All of the systems we evaluate advertise censorship-resistance as a major benefit. Second, these systems could indirectly enable greater competition and user choice, by lowering the barrier to entry for new platforms. As it stands, it is difficult for users to switch between platforms (they must recreate all their data when moving to a new service) and most mega-platforms do not interoperate, so switching means leaving behind your social network. Some systems we evaluate directly address the issues of data portability and interoperability in an effort to support greater competition.

We offer case studies of the following decentralized publishing projects:

Freedom Box, a system for personal publishing
Diaspora, a federated social network
Mastodon, a federated Twitter-like service
Blockstack, a distributed system for online identity services
IPFS (Interplanetary File System), a distributed storage service with a proposed mechanism to incentivize resource sharing
Solid (Social Linked Data), a linked-data protocol that could act as a back-end for data sharing between social media networks
Appcoins, a digital currency framework that enables users to financially participate in ownership of platforms and protocols
Steemit, an online community that uses an appcoin to incentivize development and community participation in a social network

Considering these projects as a whole, we found a robust and fertile community of experimenters developing promising software. Many of the projects in this report are working on deeply exciting new ideas. Easy to use, peer-to-peer distributed storage systems change the landscape for content censorship and archiving. Appcoins may transform how new projects are launched online, making it possible to fund open-source development teams focused on developing shared protocols instead of independent companies. There is also a renewed interest in creating interoperable standards and protocols that can cross platforms.

However, we have reason to doubt that these decentralized systems alone will address the problems of exclusion and bias caused by today’s mega-platforms. For example, distributed, censorship-resistant storage does not help address problems related to bias in curation algorithms – content that doesn’t appear at the top of your feed might as well be invisible, even if it’s technically accessible. And though censorship-resistance and decentralization are noble goals that will undoubtedly appeal to tech-savvy and politically inclined users, most users are not ideologically motivated and have no interest in shouldering the additional cost and responsibility of running these complex systems directly. They will want to engage with the Web through friendlier, third-party publishing platforms, and these platforms will suffer from the same forces that drive consolidation today.

It’s important to remember that today’s mega-platforms are built on top of the Web’s already distributed and open protocols. The real issue to address is this natural tendency towards market consolidation. Underlying these concerns is the predominant business model for platforms on the Web – user-targeted advertising. Advertising based business models encourage the consolidation and the hoarding of user views and data, driving platforms to become ever larger.

The challenges decentralized systems as a whole face are as follows:

User and developer adoption. Technical feasibility alone does not guarantee the sort of widespread adoption necessary to build a useful social network. Some of the more mature tools developed in this space have faced serious difficulties in attracting a permanent user base, and the problems those platforms suffer from may hinder the growth of new systems as well. Social networks, in particular, are difficult to bootstrap due to network effects. We generally join social networks because our friends are already there. Systems like Steemit and Diaspora are currently incompatible with existing social networks, planning to supersede existing communities like Reddit and Facebook, rather than integrate with them. Taking a competitive, rather than complementary, position in the market creates a difficult barrier to entry for new projects. Similarly, interoperable protocols require adoption at the developer level. Solid, which hopes to bridge between existing and novel social networks, faces a serious adoption challenge: Why should developers choose to switch to Solid’s new data model, and what's the incentive for Facebook to make their data interoperable without legal requirements forcing them to do so?

Security. Another major issue is security. A “decentralized” network generally means anyone can join, which implies these systems have to take strong precautions to enforce security, usually by pushing the responsibility of security to users in the form of managing public key cryptography. It is extremely difficult to develop software that is both cryptographically secure and easy to use. Most of these systems, like IPFS and Blockstack, do not yet have a good story for how users will manage their private cryptographic keys and gain a good mental model of complex security protocols. As companies like Signal, an encrypted messaging service, have recently demonstrated, this is not impossible to achieve, but it requires an intense focus on usability that we did not see in many of the tools we review in this report.

Monetization and incentives. Given that user data is so important for monetizing these platforms, there is little incentive for the mega-platforms to adopt interoperable protocols – they would rather own all the data. Similarly, content that is viewed and clicked on the most generates the most advertising revenue, so mega-platforms have an incentive to prioritize viral, attention-grabbing or feel-good content. Steemit offers a fascinating alternative model to prioritizing and monetizing content. However, it replaces opacity with a semi-transparent free market model that concentrates power in a few hands and, if not carefully crafted, might even incentivize more clickbait. Designing robust reward mechanisms for community-governed content is still an open problem, but if solved, this could be integral to placing curation control in the hands of a community.

Resisting market consolidation? Platforms benefit from economies of scale in multiple ways – it’s cheaper to acquire resources like storage and servers in bulk and as platforms become larger they become more useful as a social network and usually, more profitable. Even in decentralized systems like Bitcoin, there has been a natural market consolidation in the form of large mining pools. This type of consolidation into a few super-participants might be inevitable due to economies of scale. We are increasingly persuaded that this isn't necessarily a bad thing, and that a more realistic goal might be the development of a robust, competitive marketplace that offers a range of ground rules for online speech, rather than a return to a purely peer-to-peer architecture for communication online.

Recommendations: We advise investors–whether motivated by civic or fiscal concerns–both to watch this space closely and to advocate for the pre-conditions that we believe will enable a healthier marketplace for online publishing. A precondition for the success of these distributed platforms is a shift towards user-controlled data, the ownership of a user’s social graph and her intellectual property created online. It will be difficult for new platforms to develop without widespread support for efforts towards data portability and rights over data ownership. Data portability also enables new models for aggregation.

Small, thoughtfully curated news sources will be made more powerful by having access to the user data currently locked inside mega-platforms, but right now, federated clients that interoperate between different platforms are borderline illegal – fixing this may require adjusting overly broad regulations, like the Digital Millennium Copyright Act. We believe that these user-controlled data rights are essential to develop a more robust market and allow new efforts to emerge from existing communities. Though individual users might not directly care about or understand these rights, their adoption will free developers to create applications that leverage users’ existing data, so that they can provide compelling, interesting new experiences, even with a small user base.

In envisioning a marketplace for open speech platforms that support more generative and censorship-resistant discussions, we recommend focusing on supporting existing efforts that provide alternatives to Facebook’s opaque curation and ranking. These alternatives might look more like Reddit's sub-communities, with different rule sets to enable different types of conversation, overseen and administered by members of the community with a system for due process when contentious issues arise.

Funding developers directly to create a diverse ecosystem of publishing platforms and curation websites is another place to make a difference. In particular, foundations are in an excellent place to fund the development of user-friendly software to implement security practices that are common across many applications. An example of this is Let’s Encrypt, which makes using secure HTTP (HTTPS) easier for small website administrators. Most small platforms do not have the resources to directly hire experts in usability and security.

Another fascinating space to watch and explore is that of Appcoins. Recently there has been a dramatic upsurge in the adoption of appcoins as a mechanism for funding new projects and platforms. Appcoins potentially provide a way to circumvent the existing open-source or VC-funded software development models to create systems where users collectively own their data. Creating an alternative business model to advertising could end up pushing the markets to create entirely new, different types of applications than the ones we’ve seen so far, which mainly rely on user data and views. New funding models mean smaller projects could more easily bootstrap small, personalized communities. However, this space also has a lot of potential for scams, and it might be unreasonable to expect users to manage a financial stake in many different networks.

This report was written by Chelsea Barabas, Neha Narula, and Ethan Zuckerman with support from the Knight Foundation.

↧

How do I improve my storytelling?

August 19, 2017, 9:32 pm

Some people can tell the most boring story and keep an audience on the edges of their seats. On the other hand, when I try to tell people about the most interesting things I've experienced (and there are some pretty interesting ones), it tends to land with a thud. I seem to have no natural talent for storytelling, but I'm hoping it's something I can learn to improve.

This applies to both spoken and written communication, in social and business contexts. I think there must be some foundational techniques for structuring and telling stories to make them effective, but I haven't really been able to reverse engineer them.

Are there books, websites, or other materials I should look at to learn this? Or do you have any specific tips or guidance for me?

posted by primethyme to Writing & Language (12 answers total) 70 users marked this as a favorite

↧

Marko – An isomorphic UI framework similar to Vue

August 20, 2017, 12:33 am

simple.

If you know HTML, CSS, and JavaScript, you know Marko

fast.

Faster loads via streaming and a tiny (~10kb gzip) runtime

progressive.

From simple HTML templates to powerful UI components

trusted.

Marko is powering high-traffic websites like ebay.com

Change the data backing a view and Marko will automatically and efficiently update the resulting HTML

class {
    onCreate() {
        this.state = { count:0 };
    }
    increment() {
        this.state.count++;
    }
}
 
style {
    .count {
        color:#09c;
        font-size:3em;
    }
    .example-button {
        font-size:1em;
        padding:0.5em;
    }
}
 
<div.count>
    ${state.count}
</div>
<button.example-button on-click('increment')>
    Click me!
</button>

Write in a familiar HTML-like style or drop the angle brackets and use Marko's concise syntax

<!doctype html>
<html>
<head>
    <title>Hello Marko</title>
</head>
<body>
    <h1>My favorite colors</h1>
    <ul.colors>
        <li for(color in input.colors)>
            ${color}
        </li>
    </ul>
</body>
</html>

⇄

<!doctype html>

html
    head
        title -- Hello Marko
    body
        h1 -- My favorite colors
        ul.colors
            li for(color in input.colors)
                -- ${color}

Fast enough to power animations at 60FPS

The above animation is <div> tags. No SVG, no CSS transitions/animations. It's all powered by Marko which does a full re-render every frame.

Marko provides first-class support for the Atom editor with syntax highlighting, Autocomplete for both HTML and custom tags, Hyperclick to quickly jump to referenced files and methods, and Pretty printing to keep your code readable.

Community plugins also provide syntax highlighting for Sublime, VS Code, Webstorm & others!

View all editor plugins

Remarkoable performance across the board

Color picker (ops/s, higher is better)

Framework   Node.js (server)   Desktop browsers (average)   Mobile browsers (average)
Inferno     1,792              4,542                        653
Marko       3,262              6,713                        1,796
Preact      2,841              2,615                        301
React       212                1,974                        358
Vue         824                2,940                        398

The color picker benchmark

This benchmark measures the time it takes to cycle through 133 colors. The selected color index changes every cycle. When the selected color index changes three things happen:

The new selected color is highlighted
The old selected color is unhighlighted
The selected color's name is shown at the end

This benchmark measures how well a large render tree is optimized when only a few nodes actually need to be updated.
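
As a rough illustration of why this is a good stress test, the sketch below models the ideal outcome of one cycle: out of 133 list items plus one label, only three nodes need touching. The function name, node ids and color names are hypothetical, and this is not Marko's actual update code.

```javascript
// Hypothetical model of one benchmark cycle; not Marko's real diffing logic.
function updatesForSelection(colors, oldIndex, newIndex) {
  if (oldIndex === newIndex) return []; // nothing changed, nothing to update
  return [
    { node: `item-${newIndex}`, change: "highlight" },
    { node: `item-${oldIndex}`, change: "unhighlight" },
    { node: "selected-name", change: `text = ${colors[newIndex]}` },
  ];
}

// 133 placeholder colors, matching the size of the benchmark's list.
const colors = Array.from({ length: 133 }, (_, i) => `color-${i}`);
const updates = updatesForSelection(colors, 4, 5);

console.log(`${updates.length} of ${colors.length + 1} nodes touched`);
```

A framework that re-renders the whole tree still has to arrive at these same three changes, so the benchmark effectively measures how cheaply it can rule out the other 131 nodes.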

View the code on GitHub

↧

Aptible is hiring a web security evangelist

August 20, 2017, 1:04 am

Do you like thinking, writing, and teaching others about Internet security? Have you worked as a software engineer?

Aptible is hiring an expert in web security and software development to help make the Internet safe for sensitive data.

It's true: software is eating the world, powered by the Internet. A growing wave of developers all over the world are using web-scale technology to build and deploy health care, fintech, banking, payments, IoT, blockchain, machine learning, and other B2B applications.

Security and compliance are mandatory from day one for these teams of developers, but many of them struggle. When their teams are small, they can’t afford to hire full-time for security or compliance. When they scale, they find that existing security and compliance management tools suck.

At Aptible, we make people-centered security products that help developer teams build security into their architecture and their organization’s culture:

Enclave is a container orchestration platform built for developers that automates security best practices and controls needed for deploying and scaling Dockerized apps in regulated industries.

Gridiron is like the missing QuickBooks for security management. It helps developers design and run security management programs that meet and exceed requirements like HIPAA, SOC 2, and ISO 27001. Customers use it to build trust with their own customers and partners, and prepare for certifications.

This role involves working on both product and growth/marketing.

Please be sure to include a cover letter and samples of previous work that demonstrate why you will be a good fit for this role.

↧

TreeSheets: Free Form Data Organizer

August 20, 2017, 12:45 am

The ultimate replacement for spreadsheets, mind mappers, outliners, PIMs, text editors and small databases.

Suitable for any kind of data organization, such as Todo lists, calendars, project management, brainstorming, organizing ideas, planning, requirements gathering, presentation of information, etc.

It's like a spreadsheet, immediately familiar, but much more suitable for complex data because it's hierarchical.
It's like a mind mapper, but more organized and compact.
It's like an outliner, but in more than one dimension.
It's like a text editor, but with structure.

Have a quick look at what the application looks like on the screenshots page, see how easy it is to use in the tutorial, then give it a download (above).

TreeSheets is exceptionally small & fast, so can sit in your system tray at all times: with several documents loaded representing the equivalent of almost 100 pages of text, it uses only 5MB of memory on Windows 7 (!)

TreeSheets is free & open source. Enjoy!

Visit this Google group for discussion, bug reports and news updates / releases. Grab the source code from GitHub. Google Moderator can be used for feature suggestions. Please donate if you enjoy using this software. Contact the author (Wouter van Oortmerssen) personally.

↧

What Happens to Creativity as We Age?

August 19, 2017, 9:08 am

Presented with these two scenarios, most adults did indeed explain the events by talking about a single block, or about Sally’s traits — they gave the obvious explanation.

Then we added a twist. Another group of participants saw the same scenarios, but this time they saw an additional set of facts that made the unusual explanation more likely than the more obvious one. Would the participants go with the obvious explanation, or try something new?

When it came to explaining the physical machine, the pattern was straightforward. The preschoolers were most likely to come up with the creative, unusual explanation. The school-age children were somewhat less creative. And there was a dramatic drop at adolescence. Both the teenagers and the adults were the most likely to stick with the obvious explanation even when it didn’t fit the data.

But there was a different pattern when it came to the social problems. Once again the preschoolers were more likely to give the creative explanation than were the 6-year-olds or adults. Now, however, the teenagers were the most creative group of all. They were more likely to choose the unusual explanation than were either the 6-year-olds or the adults.

Why does creativity generally tend to decline as we age? One reason may be that as we grow older, we know more. That’s mostly an advantage, of course. But it also may lead us to ignore evidence that contradicts what we already think. We become too set in our ways to change.

Relatedly, the explanation may have to do with a tension between two kinds of thinking: what computer scientists call exploration and exploitation. When we face a new problem, we adults usually exploit the knowledge about the world we have acquired so far. We try to quickly find a pretty good solution that is close to the solutions we already have. On the other hand, exploration — trying something new — may lead us to a more unusual idea, a less obvious solution, a new piece of knowledge. But it may also mean that we waste time considering crazy possibilities that will never work, something both preschoolers and teenagers have been known to do.

This idea suggests a solution to the evolutionary paradox that is human childhood and adolescence. We humans have an exceptionally long childhood and prolonged adolescence. Why make human children so helpless for so long, and make human adults invest so much time and effort into caring for them?

The answer: Childhood and adolescence may, at least in part, be designed to resolve the tension between exploration and exploitation. Those periods of our life give us time to explore before we have to face the stern and earnest realities of grown-up life. Teenagers may no longer care all that much about how the physical world works. But they care a lot about exploring all the ways that the social world can be organized. And that may help each new generation change the world.

↧

Defending Internet Freedom through Decentralization: Back to the Future? [pdf]

August 19, 2017, 1:50 pm

Download PDF

↧

YANFF – Yet Another Network Function Framework, from Intel

August 20, 2017, 12:56 am

What it is

YANFF is a set of libraries for creating and deploying cloud-native Network Functions (NFs). It simplifies the creation of network functions without sacrificing performance. Network functions are written in Go using high-level abstractions like flows and packets.

YANFF is an open-source, BSD-licensed project that runs mostly in Linux userland. The most recent patches and enhancements provided by the community are available in the master branch.

Getting YANFF

Use the go get command to download YANFF. You must first set your GOPATH:

    export GOPATH=/my/local/directory
    go get -v -d github.com/intel-go/yanff

Go will download the sources into $GOPATH/src. It will try to build YANFF and fail with a message:

    can't load package: package github.com/intel-go/yanff: no buildable Go source files in /localdisk/work/rscohn1/ws/yanff-test/src/github.com/intel-go/yanff

Ignore the message for now. You need to install some dependencies before you can build.

Working with a GitHub fork

If you are working on a fork, then the go get command will not put yanff in $GOPATH/src/github.com/intel-go. However, imports will continue to reference github.com/intel-go. This is a feature of Go and not a problem in the way yanff is written. See the Stack Overflow article for a discussion. A simple way to resolve the problem is to use a symlink. If you are rscohn2 on GitHub, and you forked yanff into your personal account, then do this:

    cd $GOPATH/src/github.com
    mkdir intel-go
    cd intel-go
    ln -s ../rscohn2/yanff .

Setting up the build and run environment

DPDK

YANFF uses DPDK, so you must set up your system to build and run DPDK. See System Requirements in the DPDK Getting Started Guide for Linux for more information.

After building a DPDK driver with the make command, you must register network cards to work with the DPDK driver, load necessary kernel modules, and bind cards to the modules. See Compiling the DPDK Target from Source and How to get best performance with NICs on Intel platforms in the DPDK Getting Started Guide for Linux for more information.

The kernel module, which is required for DPDK user-mode drivers, is built but not installed into the kernel directory. You can load it using the full path to the module file: $GOPATH/src/github.com/intel-go/yanff/test/dpdk/dpdk-17.02/x86_64-native-linuxapp-gcc/kmod/igb_uio.ko

Go

Use Go version 1.8 or higher. To check the version of Go, do:

    go version

Installing YANFF dependencies

    $GOPATH/src/github.com/intel-go/yanff/scripts/get-depends.sh

Environment variables

    export PATH="$PATH:$GOPATH/bin"

Building YANFF

    cd $GOPATH/src/github.com/intel-go/yanff
    make -j8

Running YANFF

Documentation

Use:

    make doc

to generate full documentation. Alternatively, you can do:

    godoc -http=:6060

and browse the following URLs:

Tests

Invoking make in the top-level directory builds the testing framework and examples. YANFF distributed tests are packaged inside of Docker container images. There are also single node unit tests in some packages that you can run using the command:

    make testing

Docker images

To create Docker images on the local default target (either the default UNIX socket in /var/run/docker.sock or whatever is defined in the DOCKER_HOST variable), use the make images command.

To deploy Docker images for use in distributed testing, use the make deploy command. This command requires two environment variables:

YANFF_HOSTS="hostname1 hostname2 ... hostnameN" - a list of all hostnames for deployed test Docker images
DOCKER_PORT=2375 - the port number to connect to Docker daemons running on hosts in the YANFF_HOSTS variable

To delete generated images in the default Docker target, use the make clean-images command.

Running tests

After the Docker images are deployed on all test hosts, you can run distributed network tests. The test framework is located in the test/main directory and accepts a JSON file with a test specification. There are predefined configs for performance and stability tests in the same directory. To run these tests, change hostname1 and hostname2 to the hosts from the YANFF_HOSTS list in these JSON files.

Cleaning up

To clean all generated binaries, use the make clean command. To delete all deployed images listed in YANFF_HOSTS, use the make cleanall command.

Changing the DPDK sources

If you use the make command from YANFF directories, the DPDK driver is downloaded and patched automatically. To use different sources, apply the patch from test/dpdk/DPDK_17_02.patch. The patch makes it impossible to use indirect mbufs due to data races, but it improves YANFF performance. YANFF doesn't use indirect mbufs.

Contributing

If you want to contribute to YANFF, check our Contributing guide. We also recommend checking the 'janitorial' bugs in our list of open issues; these bugs can be solved without an extensive knowledge of YANFF. We would love to help you start contributing.

You can reach the YANFF development team via our mailing list.

↧

Retguard: OpenBSD/Clang

August 19, 2017, 1:43 pm

List:       openbsd-tech
Subject:    RETGUARD
From:       Theo de Raadt <deraadt () openbsd ! org>
Date:       2017-08-19 19:57:05
Message-ID: 21482.1503172625 () cvs ! openbsd ! org

This year I went to BSDCAN in Ottawa.  I spent much of it in the
'hallway track', and had an extended conversation with various people
regarding our existing security mitigations and hopes for new ones in
the future.  I spoke a lot with Todd Mortimer.  Apparently I told him
that I felt return-address protection was impossible, so a few weeks
later he sent a clang diff to address that issue...

The first diff is for amd64 and i386 only -- in theory RISC
architectures can follow this approach soon.

The mechanism is like a userland 'stackghost' in the function prologue
and epilogue.  The preamble XORs the return address at the top of the
stack with the stack pointer value itself.  This perturbs the return
address by introducing bits from ASLR.  The function epilogue undoes
the transform immediately before the RET instruction.  ROP attack
methods are impacted because existing gadgets are transformed to
consist of "<gadget artifacts> <mangle ret address> RET".  That pivots
the return sequence off the ROP chain in a highly unpredictable and
inconvenient fashion.
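
As a rough illustration of the transform described above, here is a
minimal Python sketch (it is not part of the diff).  The function name
retguard_xor and every address in it are hypothetical, chosen only to
show two things: the XOR undoes itself when the prologue and epilogue
see the same stack pointer, and an address planted by an attacker,
which never went through the prologue, comes out mangled.

```python
MASK64 = (1 << 64) - 1


def retguard_xor(ret_slot: int, stack_pointer: int) -> int:
    """Model `xorq %rsp,(%rsp)`: XOR the stored return address with
    the stack pointer that points at it."""
    return (ret_slot ^ stack_pointer) & MASK64


# Hypothetical values, for illustration only.
rsp_at_entry = 0x00007F7FFFFDE3A8    # stack pointer on entry (varies with ASLR)
return_address = 0x0000000000401A2C  # address pushed by the caller's CALL

# Prologue: the slot now holds a value mixed with stack-pointer bits.
mangled = retguard_xor(return_address, rsp_at_entry)

# Epilogue with the same stack pointer: the transform cancels out.
restored = retguard_xor(mangled, rsp_at_entry)

# A ROP chain jumps into the middle of a function, so the prologue XOR
# never ran on the attacker's slot.  The epilogue XOR still runs and
# turns the chosen gadget address into something else.
attacker_address = 0x0000000000402F00
rop_rsp = 0x00007F7FFFFDE4C0
hijacked_target = retguard_xor(attacker_address, rop_rsp)

print(hex(mangled), hex(restored), hex(hijacked_target))
```

To land on a chosen gadget, the attacker in this model would have to
know the exact stack pointer value at the moment of the RET, which is
the unpredictability the paragraph above refers to.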

The compiler diff handles this for all the C code, but the assembly
functions have to be done by hand.  I did this work first for amd64,
and more recently for i386.  I've fixed most of the functions and only
a handful of complex ones remain.

For those who know about polymorphism and pop/jmp or JOP, we believe
once standard-RET is solved those concerns become easier to address
separately in the future.  In any case a substantial reduction of
gadgets is powerful.

For those worried about introducing worse polymorphism with these
"xor; ret" epilogues themselves, the nested gadgets for 64bit and 32bit
variations are +1 "xor %esp,(%rsp); ret", +2 "and $0x24,%al; ret" and
+3 "and $0xc3,%al; int3".  Not bad.

Over the last two weeks, we have received help and advice to ensure
debuggers (gdb, egdb, ddb, lldb) can still handle these transformed
callframes.  Also in the kernel, we discovered we must use a smaller
XOR, because otherwise userland addresses are generated, and we cannot
rely on SMEP as it is a really new feature of the architecture.  There
were also issues with pthreads and dlsym, which led to a series of
uplifts around __builtin_return_address and DWARF CFI.
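
One plausible reading of the kernel remark, sketched in Python under
stated assumptions: on amd64 both kernel text and kernel stacks sit in
the upper half of the address space, so a full 64-bit XOR cancels the
high bits and leaves a value that looks like a userland address.
XORing only the low 32 bits (the XOR32mr case in the diff) keeps the
upper half intact.  The helper names and both addresses are
hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def xor_full(slot: int, sp: int) -> int:
    """Userland variant: XOR all 64 bits of the return slot."""
    return (slot ^ sp) & MASK64


def xor_low_half(slot: int, sp: int) -> int:
    """Kernel variant: XOR only the low 32 bits; the upper half is untouched."""
    return slot ^ (sp & MASK32)


def looks_like_userland(addr: int) -> bool:
    """Lower-half canonical amd64 addresses have the top 17 bits clear."""
    return addr < (1 << 47)


# Hypothetical kernel-space values, for illustration only.
kernel_return = 0xFFFFFFFF811D3C40
kernel_rsp = 0xFFFF800022A4BE58

full = xor_full(kernel_return, kernel_rsp)
low_half = xor_low_half(kernel_return, kernel_rsp)

print(hex(full), looks_like_userland(full))          # high bits cancel
print(hex(low_half), looks_like_userland(low_half))  # stays in the upper half
```

The narrower XOR mixes in fewer bits, which is the trade-off for
keeping a mangled return address from pointing into userland on
machines without SMEP.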

Application of this diff doesn't require anything special; a system
can simply be built twice.  Or take a shortcut by building & installing
gnu/usr.bin/clang first, then doing a full build.

We are at the point where userland and base are fully working without
regressions, and the remaining impacts are in a few larger ports which
directly access the return address (for a variety of reasons).

So work needs to continue with handling the RET-addr swizzle in those
ports, and then we can move forward.
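
For readers curious how debuggers can still unwind through the
transformed frames: the diff below describes the return address to the
unwinder with raw cfi_escape byte strings.  The following Python
sketch decodes the two amd64 strings, copied from the diff, with a toy
evaluator that supports only the DWARF opcodes they use.  The
evaluator, its function names, the addresses, and the assumption that
the CFA equals the entry stack pointer plus 8 are mine, not the
patch's.

```python
import struct

MASK64 = (1 << 64) - 1


def eval_dwarf_expr(expr: bytes, cfa: int, memory: dict) -> int:
    """Evaluate the small subset of DWARF ops used by the RETGUARD escapes."""
    stack = [cfa]  # the CFA is pushed before evaluation starts
    i = 0
    while i < len(expr):
        op = expr[i]
        i += 1
        if op == 0x09:    # DW_OP_const1s: signed 1-byte constant
            stack.append(struct.unpack_from("b", expr, i)[0] & MASK64)
            i += 1
        elif op == 0x0C:  # DW_OP_const4u: unsigned 4-byte constant
            stack.append(struct.unpack_from("<I", expr, i)[0])
            i += 4
        elif op == 0x22:  # DW_OP_plus
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK64)
        elif op == 0x12:  # DW_OP_dup
            stack.append(stack[-1])
        elif op == 0x06:  # DW_OP_deref: load the word at the popped address
            stack.append(memory[stack.pop()])
        elif op == 0x16:  # DW_OP_swap
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == 0x1A:  # DW_OP_and
            b, a = stack.pop(), stack.pop()
            stack.append(a & b)
        elif op == 0x27:  # DW_OP_xor
            b, a = stack.pop(), stack.pop()
            stack.append(a ^ b)
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    return stack[-1]


def recover_return_address(escape: bytes, entry_rsp: int, memory: dict) -> int:
    # Layout: DW_CFA_val_expression (0x16), register, length, expression.
    # Register and length are ULEB128; both fit in one byte here.
    assert escape[0] == 0x16
    length = escape[2]
    expr = escape[3:3 + length]
    assert len(expr) == length
    cfa = entry_rsp + 8  # assumption: CFA is the stack pointer before the CALL
    return eval_dwarf_expr(expr, cfa, memory)


# Byte strings copied from the amd64 cases in the diff.
USERLAND_AMD64 = b"\x16\x10\x06\x09\xf8\x22\x12\x06\x27"
KERNEL_AMD64 = (b"\x16\x10\x0d\x09\xf8\x22\x12\x06"
                b"\x16\x0c\xff\xff\xff\xff\x1a\x27")

# Hypothetical frames: each slot holds the already-mangled return address.
user_rsp, user_ret = 0x00007F7FFFFDE3A8, 0x0000000000401A2C
user_ra = recover_return_address(
    USERLAND_AMD64, user_rsp, {user_rsp: user_ret ^ user_rsp})

kern_rsp, kern_ret = 0xFFFF800022A4BE58, 0xFFFFFFFF811D3C40
kernel_ra = recover_return_address(
    KERNEL_AMD64, kern_rsp, {kern_rsp: kern_ret ^ (kern_rsp & 0xFFFFFFFF)})

print(hex(user_ra), hex(kernel_ra))
```

In both cases the expression computes CFA - 8 (the address of the
return slot), loads the mangled value, and XORs it with the same
stack-pointer bits the prologue used, so the unwinder sees the
original return address.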

Index: gnu/llvm/lib/Target/X86/CMakeLists.txt
===================================================================
RCS file: /cvs/src/gnu/llvm/lib/Target/X86/CMakeLists.txt,v
retrieving revision 1.1.1.3
diff -u -p -u -r1.1.1.3 CMakeLists.txt
--- gnu/llvm/lib/Target/X86/CMakeLists.txt	24 Jan 2017 08:33:28 -0000	1.1.1.3
+++ gnu/llvm/lib/Target/X86/CMakeLists.txt	18 Aug 2017 21:15:04 -0000
@@ -45,6 +45,7 @@ set(sources
   X86MachineFunctionInfo.cpp
   X86OptimizeLEAs.cpp
   X86PadShortFunction.cpp
+  X86XorRetProtector.cpp
   X86RegisterInfo.cpp
   X86SelectionDAGInfo.cpp
   X86ShuffleDecodeConstantPool.cpp
Index: gnu/llvm/lib/Target/X86/X86.h
===================================================================
RCS file: /cvs/src/gnu/llvm/lib/Target/X86/X86.h,v
retrieving revision 1.1.1.3
diff -u -p -u -r1.1.1.3 X86.h
--- gnu/llvm/lib/Target/X86/X86.h	24 Jan 2017 08:33:27 -0000	1.1.1.3
+++ gnu/llvm/lib/Target/X86/X86.h	18 Aug 2017 21:15:04 -0000
@@ -50,6 +50,10 @@ FunctionPass *createX86IssueVZeroUpperPa
 /// This will prevent a stall when returning on the Atom.
 FunctionPass *createX86PadShortFunctions();
 
+/// Return a pass that adds xor instructions for return pointers
+/// on the stack
+FunctionPass *createX86XorRetProtectorPass(unsigned opt);
+
 /// Return a pass that selectively replaces certain instructions (like add,
 /// sub, inc, dec, some shifts, and some multiplies) by equivalent LEA
 /// instructions, in order to eliminate execution delays in some processors.
Index: gnu/llvm/lib/Target/X86/X86TargetMachine.cpp
===================================================================
RCS file: /cvs/src/gnu/llvm/lib/Target/X86/X86TargetMachine.cpp,v
retrieving revision 1.1.1.3
diff -u -p -u -r1.1.1.3 X86TargetMachine.cpp
--- gnu/llvm/lib/Target/X86/X86TargetMachine.cpp	24 Jan 2017 08:33:27 -0000	1.1.1.3
+++ gnu/llvm/lib/Target/X86/X86TargetMachine.cpp	18 Aug 2017 21:15:04 -0000
@@ -260,6 +260,12 @@ UseVZeroUpper("x86-use-vzeroupper", cl::
   cl::desc("Minimize AVX to SSE transition penalty"),
   cl::init(true));
 
+static cl::opt<unsigned>
+XorRetProtector("x86-ret-protector", cl::NotHidden,
+  cl::desc("XOR return pointers in function preambles and before RETs."
+           "Argument = 1 for userland (xor full values)"
+           "Argument = 2 for kernel (xor lower half bits)"),
+  cl::init(0));
 //===----------------------------------------------------------------------===//
 // X86 TTI query.
 //===----------------------------------------------------------------------===//
@@ -402,4 +408,6 @@ void X86PassConfig::addPreEmitPass() {
     addPass(createX86FixupLEAs());
     addPass(createX86EvexToVexInsts());
   }
+  if (XorRetProtector)
+    addPass(createX86XorRetProtectorPass(XorRetProtector));
 }
Index: gnu/llvm/lib/Target/X86/X86XorRetProtector.cpp
===================================================================
RCS file: gnu/llvm/lib/Target/X86/X86XorRetProtector.cpp
diff -N gnu/llvm/lib/Target/X86/X86XorRetProtector.cpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ gnu/llvm/lib/Target/X86/X86XorRetProtector.cpp	18 Aug 2017 21:15:04 -0000
@@ -0,0 +1,148 @@
+//===-------- X86XorRetProtector.cpp - xor return pointers -----------===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines a pass that will xor the return pointer in
+// each function preamble, and before any ret.
+//
+//===----------------------------------------------------------------------===//
+
+#include <algorithm>
+
+#include "X86.h"
+#include "X86InstrInfo.h"
+#include "X86Subtarget.h"
+#include "X86InstrBuilder.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Function.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Target/TargetInstrInfo.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "x86-ret-protector"
+
+namespace {
+    struct X86XorRetProtector : public MachineFunctionPass {
+        static char ID;
+        X86XorRetProtector(bool kernel) : MachineFunctionPass(ID)
+                          , STI(nullptr), TII(nullptr), isKernel(kernel) {}
+
+        bool runOnMachineFunction(MachineFunction &MF) override;
+
+        MachineFunctionProperties getRequiredProperties() const override {
+            return MachineFunctionProperties().set(
+                    MachineFunctionProperties::Property::NoVRegs);
+        }
+
+        StringRef getPassName() const override {
+            return "X86 XOR RET Instructions";
+        }
+
+        private:
+        void addXORInst(MachineBasicBlock &MBB, MachineInstr &MI);
+
+        const X86Subtarget *STI;
+        const TargetInstrInfo *TII;
+        bool is64bit;
+        bool isKernel;
+    };
+
+    char X86XorRetProtector::ID = 0;
+}
+
+FunctionPass *llvm::createX86XorRetProtectorPass(unsigned optval) {
+    return new X86XorRetProtector(optval == 2 ? true : false);
+}
+
+/// runOnMachineFunction - Loop over all of the basic blocks, inserting
+//  XORs before each function and each ret
+bool X86XorRetProtector::runOnMachineFunction(MachineFunction &MF) {
+    STI = &MF.getSubtarget<X86Subtarget>();
+    TII = STI->getInstrInfo();
+    is64bit = STI->is64Bit();
+
+    bool MadeChange = false;
+    for (auto &MBB : MF) {
+        for (auto &MI : MBB) {
+            if (MI.isReturn()) {
+                addXORInst(MBB, MI);
+                MadeChange = true;
+            }
+        }
+    }
+    if (MadeChange) {
+        for (auto &MBB : MF) {
+            if (!MBB.empty()) {
+                unsigned CFIIndex;
+                if (is64bit) {
+                    if (isKernel) {
+                        // cfi_escape exp RA len const -8 plus dup deref swap const4u 0xffffffff and xor
+                        MCCFIInstruction CFIInst = MCCFIInstruction::createEscape(nullptr,
+                                "\x16\x10\x0d\x09\xf8\x22\x12\x06\x16\x0c\xff\xff\xff\xff\x1a\x27");
+                        CFIIndex = MF.addFrameInst(CFIInst);
+                    } else { /* userland */
+                        // cfi_escape exp RA len const -8 plus dup deref xor
+                        MCCFIInstruction CFIInst = MCCFIInstruction::createEscape(nullptr,
+                                "\x16\x10\x06\x09\xf8\x22\x12\x06\x27");
+                        CFIIndex = MF.addFrameInst(CFIInst);
+                    }
+                } else { /* 32 bit */
+                    if (isKernel) {
+                        // cfi_escape exp RA len const -4 plus dup deref swap const2u 0xffff and  xor
+                        MCCFIInstruction CFIInst = MCCFIInstruction::createEscape(nullptr,
+                                "\x16\x08\x0b\x09\xfc\x22\x12\x06\x16\x0a\xff\xff\x1a\x27");
+                        CFIIndex = MF.addFrameInst(CFIInst);
+                    } else { /* userland */
+                        // cfi_escape exp RA len const -4 plus dup deref xor
+                        MCCFIInstruction CFIInst = MCCFIInstruction::createEscape(nullptr,
+                                "\x16\x08\x06\x09\xfc\x22\x12\x06\x27");
+                        CFIIndex = MF.addFrameInst(CFIInst);
+                    }
+                }
+                BuildMI(MBB, MBB.front(), MBB.front().getDebugLoc(),
+                        TII->get(TargetOpcode::CFI_INSTRUCTION))
+                    .addCFIIndex(CFIIndex);
+                addXORInst(MBB, MBB.front());
+                break;
+            }
+        }
+    }
+    return MadeChange;
+}
+
+/// addXORInst - Add an xor before the given MBBI
+void X86XorRetProtector::addXORInst(MachineBasicBlock &MBB, MachineInstr &MI) {
+    unsigned opcode, stackp, target;
+    if (is64bit) {
+        target = X86::RSP;
+        if (isKernel) {
+            opcode = X86::XOR32mr;
+            stackp = X86::ESP;
+        } else {
+            opcode = X86::XOR64mr;
+            stackp = X86::RSP;
+        }
+    } else { /* 32 bit */
+        target = X86::ESP;
+        if (isKernel) {
+            opcode = X86::XOR16mr;
+            stackp = X86::SP;
+        } else {
+            opcode = X86::XOR32mr;
+            stackp = X86::ESP;
+        }
+    }
+    addDirectMem(BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(opcode)), target)
+        .addReg(stackp);
+}
Index: gnu/llvm/tools/clang/include/clang/Driver/Options.td
===================================================================
RCS file: /cvs/src/gnu/llvm/tools/clang/include/clang/Driver/Options.td,v
retrieving revision 1.4
diff -u -p -u -r1.4 Options.td
--- gnu/llvm/tools/clang/include/clang/Driver/Options.td	24 Jan 2017 08:39:08 -0000	1.4
+++ gnu/llvm/tools/clang/include/clang/Driver/Options.td	18 Aug 2017 21:15:04 -0000
@@ -1207,6 +1207,10 @@ def fstack_protector_strong : Flag<["-"]
   HelpText<"Use a strong heuristic to apply stack protectors to functions">;
 def fstack_protector : Flag<["-"], "fstack-protector">, Group<f_Group>,
   HelpText<"Enable stack protectors for functions potentially vulnerable to stack smashing">;
+def fret_protector : Flag<["-"], "fret-protector">, Group<f_Group>,
+  HelpText<"Enable ret protection for all functions">;
+def fno_ret_protector : Flag<["-"], "fno-ret-protector">, Group<f_Group>,
+  HelpText<"Disable ret protection">;
 def fstandalone_debug : Flag<["-"], "fstandalone-debug">, Group<f_Group>, Flags<[CoreOption]>,
   HelpText<"Emit full debug info for all types used by the program">;
 def fno_standalone_debug : Flag<["-"], "fno-standalone-debug">, Group<f_Group>, Flags<[CoreOption]>,
Index: gnu/llvm/tools/clang/lib/Driver/Tools.cpp
===================================================================
RCS file: /cvs/src/gnu/llvm/tools/clang/lib/Driver/Tools.cpp,v
retrieving revision 1.14
diff -u -p -u -r1.14 Tools.cpp
--- gnu/llvm/tools/clang/lib/Driver/Tools.cpp	28 Jul 2017 15:31:54 -0000	1.14
+++ gnu/llvm/tools/clang/lib/Driver/Tools.cpp	18 Aug 2017 21:15:04 -0000
@@ -5507,6 +5507,24 @@ void Clang::ConstructJob(Compilation &C,
     CmdArgs.push_back(Args.MakeArgString(Twine(StackProtectorLevel)));
   }
+  // -ret-protector
+  if (Args.hasFlag(options::OPT_fret_protector, options::OPT_fno_ret_protector,
+			  true)) {
+    if (!Args.hasArg(options::OPT_pg)) {
+      CmdArgs.push_back(Args.MakeArgString("-D_RET_PROTECTOR"));
+      CmdArgs.push_back(Args.MakeArgString("-munwind-tables"));
+      CmdArgs.push_back(Args.MakeArgString("-mllvm"));
+      // Switch mode depending on kernel / nokernel
+      StringRef opt = "-x86-ret-protector=1";
+      StringRef ker = "kernel";
+      if (Arg *A = Args.getLastArg(options::OPT_mcmodel_EQ))
+        if (A->getValue() == ker)
+          opt = "-x86-ret-protector=2";
+
+      CmdArgs.push_back(Args.MakeArgString(Twine(opt)));
+    }
+  }
+
   // --param ssp-buffer-size=
   for (const Arg *A : Args.filtered(options::OPT__param)) {
     StringRef Str(A->getValue());
Index: gnu/usr.bin/clang/libLLVMX86CodeGen/Makefile
===================================================================
RCS file: /cvs/src/gnu/usr.bin/clang/libLLVMX86CodeGen/Makefile,v
retrieving revision 1.4
diff -u -p -u -r1.4 Makefile
--- gnu/usr.bin/clang/libLLVMX86CodeGen/Makefile	9 Jul 2017 15:28:35 -0000	1.4
+++ gnu/usr.bin/clang/libLLVMX86CodeGen/Makefile	18 Aug 2017 21:15:04 -0000
@@ -25,6 +25,7 @@ SRCS=	X86AsmPrinter.cpp \
 	X86InterleavedAccess.cpp \
 	X86MCInstLower.cpp \
 	X86MachineFunctionInfo.cpp \
+	X86OptimizeLEAs.cpp \
 	X86PadShortFunction.cpp \
 	X86RegisterInfo.cpp \
 	X86SelectionDAGInfo.cpp \
@@ -36,7 +37,7 @@ SRCS=	X86AsmPrinter.cpp \
 	X86VZeroUpper.cpp \
 	X86WinAllocaExpander.cpp \
 	X86WinEHState.cpp \
-	X86OptimizeLEAs.cpp
+	X86XorRetProtector.cpp
 .PATH:	${.CURDIR}/../../../llvm/lib/Target/X86
Index: lib/csu/amd64/md_init.h
===================================================================
RCS file: /cvs/src/lib/csu/amd64/md_init.h,v
retrieving revision 1.6
diff -u -p -u -r1.6 md_init.h
--- lib/csu/amd64/md_init.h	20 Mar 2016 02:32:39 -0000	1.6
+++ lib/csu/amd64/md_init.h	18 Aug 2017 14:39:40 -0000
@@ -50,6 +50,7 @@
 	"	.type " #entry_pt ",@function	\n" \
 	#entry_pt":				\n" \
 	"	.align 16			\n" \
+	"	xorq	%rsp,(%rsp) # RETGUARD	\n" \
 	"	subq	$8,%rsp			\n" \
 	"	.previous")
@@ -58,6 +59,7 @@
 	__asm (					\
 	".section "#sect",\"ax\",@progbits	\n" \
 	"	addq	$8,%rsp			\n" \
+	"	xorq	%rsp,(%rsp) # RETGUARD	\n" \
 	"	ret				\n" \
 	"	.previous")
@@ -114,11 +116,17 @@
 	"	.type	_dl_exit,@function		\n" \
 	"	.align	8				\n" \
 	"_dl_exit:					\n" \
+	"	.cfi_startproc				\n" \
+	"	xorq	%rsp,(%rsp) # RETGUARD		\n" \
+	"	.cfi_escape 0x16, 0x10, 0x06, 0x09, 0xf8, 0x22, 0x12, 0x06, 0x27\n" \
 	"	movl	$(1), %eax			\n" \
 	"	syscall					\n" \
 	"	jb	1f				\n" \
+	"	xorq	%rsp,(%rsp) # RETGUARD		\n" \
 	"	ret					\n" \
 	"1:						\n" \
 	"	neg	%rax				\n" \
+	"	xorq	%rsp,(%rsp) # RETGUARD		\n" \
 	"	ret					\n" \
+	"	.cfi_endproc				\n" \
 	"	.previous")
Index: lib/csu/i386/md_init.h
===================================================================
RCS file: /cvs/src/lib/csu/i386/md_init.h,v
retrieving revision 1.9
diff -u -p -u -r1.9 md_init.h
--- lib/csu/i386/md_init.h	11 Aug 2017 20:13:31 -0000	1.9
+++ lib/csu/i386/md_init.h	18 Aug 2017 14:39:13 -0000
@@ -50,6 +50,7 @@
 	"	.type " #entry_pt ",@function	\n" \
 	#entry_pt":				\n" \
 	"	.align 16			\n" \
+	"	xorl    %esp,(%esp) # RETGUARD	\n" \
 	"	pushl	%ebp			\n" \
 	"	movl	%esp,%ebp		\n" \
 	"	andl	$~15,%esp		\n" \
@@ -60,6 +61,7 @@
 	__asm (					\
 	".section "#sect",\"ax\",@progbits	\n" \
 	"	leave				\n" \
+	"	xorl	%esp,(%esp) # RETGUARD	\n" \
 	"	ret				\n" \
 	"	.previous")
@@ -122,7 +124,12 @@
 	"	.globl	_dl_exit		\n" \
 	"	.type	_dl_exit,@function	\n" \
 	"_dl_exit:				\n" \
+	"	.cfi_startproc			\n" \
+	"	.cfi_escape 0x16, 0x08, 0x06, 0x09, 0xfc, 0x22, 0x12, 0x06, 0x27 \n" \
+	"	xorl    %esp,(%esp) # RETGUARD	\n" \
 	"	mov	$1, %eax		\n" \
 	"	int	$0x80			\n" \
+	"	xorl	%esp,(%esp) # RETGUARD	\n" \
 	"	ret				\n" \
+	"	.cfi_endproc			\n" \
 	"	.previous")
Index: lib/libc/arch/amd64/DEFS.h
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/DEFS.h,v
retrieving revision 1.1
diff -u -p -u -r1.1 DEFS.h
--- lib/libc/arch/amd64/DEFS.h	14 Nov 2015 21:53:03 -0000	1.1
+++ lib/libc/arch/amd64/DEFS.h	18 Aug 2017 18:00:12 -0000
@@ -56,6 +56,6 @@
  *   END_STRONG(x)	Like DEF_STRONG() in C; for standard/reserved C names
  *   END_WEAK(x)	Like DEF_WEAK() in C; for non-ISO C names
  */
-#define	END_STRONG(x)	END(x); _HIDDEN_FALIAS(x,x); END(_HIDDEN(x))
+#define	END_STRONG(x)	END(x); _HIDDEN_FALIAS(x,x); _ASM_SIZE(_HIDDEN(x))
 #define	END_WEAK(x)	END_STRONG(x); .weak x
Index: lib/libc/arch/amd64/SYS.h
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/SYS.h,v
retrieving revision 1.20
diff -u -p -u -r1.20 SYS.h
--- lib/libc/arch/amd64/SYS.h	6 Sep 2016 18:33:35 -0000	1.20
+++ lib/libc/arch/amd64/SYS.h	18 Aug 2017 18:00:34 -0000
@@ -52,8 +52,8 @@
 #define	SYSCALL_END_HIDDEN(x)						\
 	END(_thread_sys_ ## x);						\
 	_HIDDEN_FALIAS(x,_thread_sys_##x);				\
-	END(_HIDDEN(x))
-#define	SYSCALL_END(x)		SYSCALL_END_HIDDEN(x); END(x)
+	_ASM_SIZE(_HIDDEN(x))
+#define	SYSCALL_END(x)		SYSCALL_END_HIDDEN(x); _ASM_SIZE(x)
 #define SET_ERRNO							\
@@ -66,9 +66,11 @@
 #define _SYSCALL_NOERROR(x,y)						\
 	SYSENTRY(x);							\
+	RETGUARD_START;							\
 	SYSTRAP(y)
 #define _SYSCALL_HIDDEN_NOERROR(x,y)					\
 	SYSENTRY_HIDDEN(x);						\
+	RETGUARD_START;							\
 	SYSTRAP(y)
 #define SYSCALL_NOERROR(x)						\
@@ -85,12 +87,15 @@
 /* return, handling errno for failed calls */
 #define _RSYSCALL_RET							\
 	jc	99f;							\
+	RETGUARD_END;							\
 	ret;								\
 	99: SET_ERRNO;							\
+	RETGUARD_END;							\
 	ret
 #define PSEUDO_NOERROR(x,y)						\
 	_SYSCALL_NOERROR(x,y);						\
+	RETGUARD_END;							\
 	ret;								\
 	SYSCALL_END(x)
Index: lib/libc/arch/amd64/gen/fabs.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/fabs.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 fabs.S
--- lib/libc/arch/amd64/gen/fabs.S	29 May 2015 08:50:12 -0000	1.7
+++ lib/libc/arch/amd64/gen/fabs.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
  */
 ENTRY(fabs)
+	RETGUARD_START
 	movsd	%xmm0, -8(%rsp)
 	fldl	-8(%rsp)
 	fabs
 	fstpl	-8(%rsp)
 	movsd	-8(%rsp),%xmm0
+	RETGUARD_END
 	ret
 END(fabs)
Index: lib/libc/arch/amd64/gen/flt_rounds.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/flt_rounds.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 flt_rounds.S
--- lib/libc/arch/amd64/gen/flt_rounds.S	19 Aug 2017 18:23:00 -0000	1.7
+++ lib/libc/arch/amd64/gen/flt_rounds.S	19 Aug 2017 18:29:07 -0000
@@ -16,6 +16,7 @@ _map:
 	.byte 0		/* round to zero */
 ENTRY(__flt_rounds)
+	RETGUARD_START
 	fnstcw -4(%rsp)
 	movl -4(%rsp),%eax
 	shrl $10,%eax
@@ -26,5 +27,6 @@ ENTRY(__flt_rounds)
 #else
 	movb _map(,%rax,1),%al
 #endif
+	RETGUARD_END
 	ret
 END_STRONG(__flt_rounds)
Index: lib/libc/arch/amd64/gen/fpgetmask.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/fpgetmask.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 fpgetmask.S
--- lib/libc/arch/amd64/gen/fpgetmask.S	29 May 2015 08:50:12 -0000	1.2
+++ lib/libc/arch/amd64/gen/fpgetmask.S	18 Aug 2017 02:28:21 -0000
@@ -20,10 +20,12 @@ ENTRY(_fpgetmask)
 #else
 ENTRY(fpgetmask)
 #endif
+	RETGUARD_START
 	fnstcw -4(%rsp)
 	movl -4(%rsp),%eax
 	notl %eax
 	andl $63,%eax
+	RETGUARD_END
 	ret
 #ifdef WEAK_ALIAS
 END(_fpgetmask)
Index: lib/libc/arch/amd64/gen/fpgetround.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/fpgetround.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 fpgetround.S
--- lib/libc/arch/amd64/gen/fpgetround.S	29 May 2015 08:50:12 -0000	1.2
+++ lib/libc/arch/amd64/gen/fpgetround.S	18 Aug 2017 02:28:21 -0000
@@ -19,10 +19,12 @@ ENTRY(_fpgetround)
 #else
 ENTRY(fpgetround)
 #endif
+	RETGUARD_START
 	fnstcw -4(%rsp)
 	movl -4(%rsp),%eax
 	rorl $10,%eax
 	andl $3,%eax
+	RETGUARD_END
 	ret
 #ifdef WEAK_ALIAS
 END(_fpgetround)
Index: lib/libc/arch/amd64/gen/fpgetsticky.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/fpgetsticky.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 fpgetsticky.S
--- lib/libc/arch/amd64/gen/fpgetsticky.S	29 May 2015 08:50:12 -0000	1.2
+++ lib/libc/arch/amd64/gen/fpgetsticky.S	18 Aug 2017 02:28:21 -0000
@@ -20,11 +20,13 @@ ENTRY(_fpgetsticky)
 #else
 ENTRY(fpgetsticky)
 #endif
+	RETGUARD_START
 	fnstsw	-4(%rsp)
 	stmxcsr	-8(%rsp)
 	movl	-4(%rsp),%eax
 	orl	-8(%rsp),%eax
 	andl	$63,%eax
+	RETGUARD_END
 	ret
 #ifdef WEAK_ALIAS
 END(_fpgetsticky)
Index: lib/libc/arch/amd64/gen/fpsetmask.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/fpsetmask.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 fpsetmask.S
--- lib/libc/arch/amd64/gen/fpsetmask.S	29 May 2015 08:50:12 -0000	1.2
+++ lib/libc/arch/amd64/gen/fpsetmask.S	18 Aug 2017 02:28:21 -0000
@@ -21,6 +21,7 @@ ENTRY(_fpsetmask)
 #else
 ENTRY(fpsetmask)
 #endif
+	RETGUARD_START
 	fnstcw	-4(%rsp)
 	stmxcsr	-8(%rsp)
 	andl	$63,%edi
@@ -39,6 +40,7 @@ ENTRY(fpsetmask)
 	fldcw	-4(%rsp)
 	ldmxcsr	-8(%rsp)
 	andl	$63,%eax
+	RETGUARD_END
 	ret
 #ifdef WEAK_ALIAS
 END(_fpsetmask)
Index: lib/libc/arch/amd64/gen/fpsetround.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/fpsetround.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 fpsetround.S
--- lib/libc/arch/amd64/gen/fpsetround.S	29 May 2015 08:50:12 -0000	1.2
+++ lib/libc/arch/amd64/gen/fpsetround.S	18 Aug 2017 02:28:21 -0000
@@ -22,6 +22,7 @@ ENTRY(_fpsetround)
 #else
 ENTRY(fpsetround)
 #endif
+	RETGUARD_START
 	fnstcw	-4(%rsp)
 	stmxcsr	-8(%rsp)
@@ -46,6 +47,7 @@ ENTRY(fpsetround)
 	ldmxcsr	-8(%rsp)
 	fldcw	-4(%rsp)
+	RETGUARD_END
 	ret
 #ifdef WEAK_ALIAS
 END(_fpsetround)
Index: lib/libc/arch/amd64/gen/fpsetsticky.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/fpsetsticky.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 fpsetsticky.S
--- lib/libc/arch/amd64/gen/fpsetsticky.S	29 May 2015 08:50:12 -0000	1.3
+++ lib/libc/arch/amd64/gen/fpsetsticky.S	18 Aug 2017 02:28:21 -0000
@@ -22,6 +22,8 @@ ENTRY(_fpsetsticky)
 #else
 ENTRY(fpsetsticky)
 #endif
+	RETGUARD_START
+
 	fnstenv	-28(%rsp)
 	stmxcsr	-32(%rsp)
@@ -43,6 +45,7 @@ ENTRY(fpsetsticky)
 	ldmxcsr	-32(%rsp)
 	fldenv	-28(%rsp)
+	RETGUARD_END
 	ret
 #ifdef WEAK_ALIAS
 END(_fpsetsticky)
Index: lib/libc/arch/amd64/gen/modf.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/modf.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 modf.S
--- lib/libc/arch/amd64/gen/modf.S	29 May 2015 08:50:12 -0000	1.5
+++ lib/libc/arch/amd64/gen/modf.S	18 Aug 2017 02:28:21 -0000
@@ -51,6 +51,7 @@
 /* With CHOP mode on, frndint behaves as TRUNC does.  Useful. */
 ENTRY(modf)
+	RETGUARD_START
 	/*
 	 * Set chop mode.
@@ -88,5 +89,6 @@ ENTRY(modf)
 	fstpl	-8(%rsp)
 	movsd   -8(%rsp),%xmm0
+	RETGUARD_END
 	ret
 END(modf)
Index: lib/libc/arch/amd64/gen/setjmp.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/setjmp.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 setjmp.S
--- lib/libc/arch/amd64/gen/setjmp.S	29 May 2016 22:39:21 -0000	1.7
+++ lib/libc/arch/amd64/gen/setjmp.S	18 Aug 2017 02:28:21 -0000
@@ -45,7 +45,7 @@
 	.globl	__jmpxor
 __jmpxor:
 	.zero 8*3		# (rbp, rsp, pc)
-	END(__jmpxor)
+	_ASM_SIZE(__jmpxor)
 	.type	__jmpxor,@object
 /*
Index: lib/libc/arch/amd64/net/htonl.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/net/htonl.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 htonl.S
--- lib/libc/arch/amd64/net/htonl.S	29 May 2015 09:25:28 -0000	1.2
+++ lib/libc/arch/amd64/net/htonl.S	18 Aug 2017 02:28:21 -0000
@@ -5,7 +5,9 @@
 #include <machine/asm.h>
 ENTRY(htonl)
+	RETGUARD_START
 	movl    %edi,%eax
 	bswapl  %eax
+	RETGUARD_END
 	ret
 END(htonl)
Index: lib/libc/arch/amd64/net/htons.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/net/htons.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 htons.S
--- lib/libc/arch/amd64/net/htons.S	29 May 2015 09:25:28 -0000	1.3
+++ lib/libc/arch/amd64/net/htons.S	18 Aug 2017 02:28:21 -0000
@@ -5,7 +5,9 @@
 #include <machine/asm.h>
 ENTRY(htons)
+	RETGUARD_START
 	movl    %edi,%eax
 	xchgb   %ah,%al
+	RETGUARD_END
 	ret
 END(htons)
Index: lib/libc/arch/amd64/net/ntohl.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/net/ntohl.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 ntohl.S
--- lib/libc/arch/amd64/net/ntohl.S	29 May 2015 09:25:28 -0000	1.3
+++ lib/libc/arch/amd64/net/ntohl.S	18 Aug 2017 02:28:21 -0000
@@ -5,7 +5,9 @@
 #include <machine/asm.h>
 ENTRY(ntohl)
+	RETGUARD_START
 	movl    %edi,%eax
 	bswapl  %eax
+	RETGUARD_END
 	ret
 END(ntohl)
Index: lib/libc/arch/amd64/net/ntohs.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/net/ntohs.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 ntohs.S
--- lib/libc/arch/amd64/net/ntohs.S	29 May 2015 09:25:28 -0000	1.3
+++ lib/libc/arch/amd64/net/ntohs.S	18 Aug 2017 02:28:21 -0000
@@ -5,7 +5,9 @@
 #include <machine/asm.h>
 ENTRY(ntohs)
+	RETGUARD_START
 	movl    %edi,%eax
 	xchgb   %ah,%al
+	RETGUARD_END
 	ret
 END(ntohs)
Index: lib/libc/arch/amd64/string/bcmp.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/bcmp.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 bcmp.S
--- lib/libc/arch/amd64/string/bcmp.S	14 Nov 2015 21:53:03 -0000	1.6
+++ lib/libc/arch/amd64/string/bcmp.S	18 Aug 2017 02:28:21 -0000
@@ -1,6 +1,7 @@
 #include "DEFS.h"
 ENTRY(bcmp)
+	RETGUARD_START
 	xorl	%eax,%eax		/* clear return value */
 	cld				/* set compare direction forward */
@@ -17,5 +18,6 @@ ENTRY(bcmp)
 	je	L2
 L1:	incl	%eax
-L2:	ret
+L2:	RETGUARD_END
+	ret
 END_WEAK(bcmp)
Index: lib/libc/arch/amd64/string/bzero.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/bzero.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 bzero.S
--- lib/libc/arch/amd64/string/bzero.S	14 Nov 2015 21:53:03 -0000	1.6
+++ lib/libc/arch/amd64/string/bzero.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include "DEFS.h"
 ENTRY(bzero)
+	RETGUARD_START
 	movq	%rsi,%rdx
 	cld				/* set fill direction forward */
@@ -37,5 +38,6 @@ L1:	movq	%rdx,%rcx		/* zero remainder by
 	rep
 	stosb
+	RETGUARD_END
 	ret
 END_WEAK(bzero)
Index: lib/libc/arch/amd64/string/ffs.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/ffs.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 ffs.S
--- lib/libc/arch/amd64/string/ffs.S	14 Nov 2015 21:53:03 -0000	1.4
+++ lib/libc/arch/amd64/string/ffs.S	18 Aug 2017 02:28:21 -0000
@@ -8,12 +8,15 @@
 #include "DEFS.h"
 ENTRY(ffs)
+	RETGUARD_START
 	bsfl	%edi,%eax
 	jz	L1	 		/* ZF is set if all bits are 0 */
 	incl	%eax			/* bits numbered from 1, not 0 */
+	RETGUARD_END
 	ret
 	_ALIGN_TEXT
 L1:	xorl	%eax,%eax		/* clear result */
+	RETGUARD_END
 	ret
 END_WEAK(ffs)
Index: lib/libc/arch/amd64/string/memchr.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/memchr.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 memchr.S
--- lib/libc/arch/amd64/string/memchr.S	14 Nov 2015 21:53:03 -0000	1.6
+++ lib/libc/arch/amd64/string/memchr.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include "DEFS.h"
 ENTRY(memchr)
+	RETGUARD_START
 	movb	%sil,%al		/* set character to search for */
 	movq	%rdx,%rcx		/* set length of search */
 	testq	%rcx,%rcx		/* test for len == 0 */
@@ -16,7 +17,9 @@ ENTRY(memchr)
 	scasb
 	jne	L1			/* scan failed, return null */
 	leaq	-1(%rdi),%rax		/* adjust result of scan */
+	RETGUARD_END
 	ret
 L1:	xorq	%rax,%rax
+	RETGUARD_END
 	ret
 END_STRONG(memchr)
Index: lib/libc/arch/amd64/string/memmove.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/memmove.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 memmove.S
--- lib/libc/arch/amd64/string/memmove.S	14 Nov 2015 21:53:03 -0000	1.6
+++ lib/libc/arch/amd64/string/memmove.S	18 Aug 2017 02:28:21 -0000
@@ -41,11 +41,14 @@
 	 */
 ENTRY(bcopy)
+	RETGUARD_START
 	xchgq	%rdi,%rsi
-	/* fall into memmove */
+	jmp	9f
+END_WEAK(bcopy)
 ENTRY(memmove)
-	movq	%rdi,%r11	/* save dest */
+	RETGUARD_START
+9:	movq	%rdi,%r11	/* save dest */
 	movq	%rdx,%rcx
 	movq	%rdi,%rax
 	subq	%rsi,%rax
@@ -66,6 +69,7 @@ ENTRY(memmove)
 	rep
 	movsb
 	movq	%r11,%rax
+	RETGUARD_END
 	ret
 1:
 	addq	%rcx,%rdi	/* copy backwards. */
@@ -84,7 +88,7 @@ ENTRY(memmove)
 	movsq
 	movq	%r11,%rax
 	cld
+	RETGUARD_END
 	ret
 // END(memcpy)
 END_STRONG(memmove)
-END_WEAK(bcopy)
Index: lib/libc/arch/amd64/string/memset.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/memset.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 memset.S
--- lib/libc/arch/amd64/string/memset.S	14 Nov 2015 21:53:03 -0000	1.6
+++ lib/libc/arch/amd64/string/memset.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include "DEFS.h"
 ENTRY(memset)
+	RETGUARD_START
 	movq	%rsi,%rax
 	andq	$0xff,%rax
 	movq	%rdx,%rcx
@@ -52,5 +53,6 @@ L1:	rep
 	stosb
 	movq	%r11,%rax
+	RETGUARD_END
 	ret
 END_STRONG(memset)
Index: lib/libc/arch/amd64/string/strchr.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/strchr.S,v
retrieving revision 1.8
diff -u -p -u -r1.8 strchr.S
--- lib/libc/arch/amd64/string/strchr.S	14 Nov 2015 21:53:03 -0000	1.8
+++ lib/libc/arch/amd64/string/strchr.S	18 Aug 2017 02:28:21 -0000
@@ -44,6 +44,7 @@ WEAK_ALIAS(index, strchr)
  */
 ENTRY(strchr)
+	RETGUARD_START
 	movabsq	$0x0101010101010101,%r8
 	movzbq	%sil,%rdx	/* value to search for (c) */
@@ -85,6 +86,7 @@ ENTRY(strchr)
 	bsf	%r11,%r11	/* 7, 15, 23 ... 63 */
 8:	shr	$3,%r11		/* 0, 1, 2 .. 7 */
 	lea	-8(%r11,%rdi),%rax
+	RETGUARD_END
 	ret
 /* End of string, check whether char is before NUL */
@@ -97,6 +99,7 @@ ENTRY(strchr)
 	cmp	%r11,%rax
 	jae	8b		/* return 'found' if same - searching for NUL */
 11:	xor	%eax,%eax	/* char not found */
+	RETGUARD_END
 	ret
 /* Source misaligned: read aligned word and make low bytes invalid */
Index: lib/libc/arch/amd64/string/strcmp.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/strcmp.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 strcmp.S
--- lib/libc/arch/amd64/string/strcmp.S	14 Nov 2015 21:53:03 -0000	1.7
+++ lib/libc/arch/amd64/string/strcmp.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 #include "DEFS.h"
 ENTRY(strcmp)
+	RETGUARD_START
 	/*
 	 * Align s1 to word boundary.
 	 * Consider unrolling loop?
@@ -68,5 +69,6 @@ ENTRY(strcmp)
 	movzbq	%al,%rax
 	movzbq	%dl,%rdx
 	subq	%rdx,%rax
+	RETGUARD_END
 	ret
 END_STRONG(strcmp)
Index: lib/libc/arch/amd64/string/strlen.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/strlen.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 strlen.S
--- lib/libc/arch/amd64/string/strlen.S	11 Dec 2015 00:05:46 -0000	1.7
+++ lib/libc/arch/amd64/string/strlen.S	18 Aug 2017 02:28:21 -0000
@@ -112,6 +112,7 @@
  */
 ENTRY(strlen)
+	RETGUARD_START
 	movabsq	$0x0101010101010101,%r8
 	test	$7,%dil
@@ -139,6 +140,7 @@ ENTRY(strlen)
 	bsf	%rdx,%rdx		/* 7, 15, 23 ... 63 */
 	shr	$3,%rdx			/* 0, 1, 2 ... 7 */
 	lea	-8(%rax,%rdx),%rax
+	RETGUARD_END
 	ret
 /* Misaligned, read aligned word and make low bytes non-zero */
Index: lib/libc/arch/amd64/string/strrchr.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/strrchr.S,v
retrieving revision 1.8
diff -u -p -u -r1.8 strrchr.S
--- lib/libc/arch/amd64/string/strrchr.S	14 Nov 2015 21:53:03 -0000	1.8
+++ lib/libc/arch/amd64/string/strrchr.S	18 Aug 2017 02:28:21 -0000
@@ -11,6 +11,7 @@
 WEAK_ALIAS(rindex, strrchr)
 ENTRY(strrchr)
+	RETGUARD_START
 	movzbq	%sil,%rcx
 	/* zero return value */
@@ -120,5 +121,6 @@ ENTRY(strrchr)
 	jne	.Lloop
 .Ldone:
+	RETGUARD_END
 	ret
 END_STRONG(strrchr)
Index: lib/libc/arch/amd64/sys/brk.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/sys/brk.S,v
retrieving revision 1.10
diff -u -p -u -r1.10 brk.S
--- lib/libc/arch/amd64/sys/brk.S	19 Aug 2017 18:24:06 -0000	1.10
+++ lib/libc/arch/amd64/sys/brk.S	19 Aug 2017 18:29:07 -0000
@@ -45,11 +45,12 @@
 	.data
 __minbrk:
 	.quad	_end
-	END(__minbrk)
+	_ASM_SIZE(__minbrk)
 	.type	__minbrk,@object
 	.weak	brk
 ENTRY(brk)
+	RETGUARD_START
 	cmpq	%rdi,__minbrk(%rip)
 	jb	1f
 	movq	__minbrk(%rip),%rdi
@@ -58,8 +59,10 @@ ENTRY(brk)
 	jc	1f
 	movq	%rdi,__curbrk(%rip)
 	xorl	%eax,%eax
+	RETGUARD_END
 	ret
 1:
 	SET_ERRNO
+	RETGUARD_END
 	ret
 END(brk)
Index: lib/libc/arch/amd64/sys/sbrk.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/sys/sbrk.S,v
retrieving revision 1.10
diff -u -p -u -r1.10 sbrk.S
--- lib/libc/arch/amd64/sys/sbrk.S	19 Aug 2017 18:24:06 -0000	1.10
+++ lib/libc/arch/amd64/sys/sbrk.S	19 Aug 2017 19:08:23 -0000
@@ -50,11 +50,12 @@
 	.data
 __curbrk:
 	.quad	_end
-	END(__curbrk)
+	_ASM_SIZE(__curbrk)
 	.type	__curbrk,@object
 	.weak	sbrk
 ENTRY(sbrk)
+	RETGUARD_START
 	movq	__curbrk(%rip),%rax
 	movslq	%edi,%rsi
 	movq	%rsi,%rdi
@@ -63,8 +64,10 @@ ENTRY(sbrk)
 	jc	1f
 	movq	__curbrk(%rip),%rax
 	addq	%rsi,__curbrk(%rip)
+	RETGUARD_END
 	ret
 1:
 	SET_ERRNO
+	RETGUARD_END
 	ret
 END(sbrk)
Index: lib/libc/arch/amd64/sys/sigpending.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/sys/sigpending.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 sigpending.S
--- lib/libc/arch/amd64/sys/sigpending.S	17 Jun 2015 03:04:50 -0000	1.3
+++ lib/libc/arch/amd64/sys/sigpending.S	18 Aug 2017 02:28:21 -0000
@@ -42,5 +42,6 @@
 SYSCALL(sigpending)
 	movl	%eax,(%rdi)		# store old mask
 	xorl	%eax,%eax
+	RETGUARD_END
 	ret
 SYSCALL_END(sigpending)
Index: lib/libc/arch/amd64/sys/sigprocmask.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/sys/sigprocmask.S,v
retrieving revision 1.9
diff -u -p -u -r1.9 sigprocmask.S
--- lib/libc/arch/amd64/sys/sigprocmask.S	7 May 2016 19:05:21 -0000	1.9
+++ lib/libc/arch/amd64/sys/sigprocmask.S	18 Aug 2017 02:28:21 -0000
@@ -40,6 +40,7 @@
 #include "SYS.h"
 SYSENTRY_HIDDEN(sigprocmask)
+	RETGUARD_START
 	testq	%rsi,%rsi		# check new sigset pointer
 	jnz	1f			# if not null, indirect
 	movl	$1,%edi			# SIG_BLOCK
@@ -52,8 +53,10 @@ SYSENTRY_HIDDEN(sigprocmask)
 	movl	%eax,(%rdx)		# store old mask
 3:
 	xorl	%eax,%eax
+	RETGUARD_END
 	ret
 1:
 	SET_ERRNO
+	RETGUARD_END
 	ret
 SYSCALL_END_HIDDEN(sigprocmask)
Index: lib/libc/arch/amd64/sys/sigsuspend.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/sys/sigsuspend.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 sigsuspend.S
--- lib/libc/arch/amd64/sys/sigsuspend.S	7 May 2016 19:05:21 -0000	1.7
+++ lib/libc/arch/amd64/sys/sigsuspend.S	18 Aug 2017 19:59:18 -0000
@@ -40,8 +40,10 @@
 #include "SYS.h"
 SYSENTRY_HIDDEN(sigsuspend)
+	RETGUARD_START
 	movl	(%rdi),%edi		# indirect to mask arg
 	SYSTRAP(sigsuspend)
 	SET_ERRNO
+	RETGUARD_END
 	ret
 SYSCALL_END_HIDDEN(sigsuspend)
Index: lib/libc/arch/i386/SYS.h
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/SYS.h,v
retrieving revision 1.26
diff -u -p -u -r1.26 SYS.h
--- lib/libc/arch/i386/SYS.h	1 Jun 2017 12:14:48 -0000	1.26
+++ lib/libc/arch/i386/SYS.h	18 Aug 2017 18:01:11 -0000
@@ -56,7 +56,7 @@
  *   END_STRONG(x)	Like DEF_STRONG() in C; for standard/reserved C names
  *   END_WEAK(x)	Like DEF_WEAK() in C; for non-ISO C names
  */
-#define	END_STRONG(x)	END(x); _HIDDEN_FALIAS(x,x); END(_HIDDEN(x))
+#define	END_STRONG(x)	END(x); _HIDDEN_FALIAS(x,x); _ASM_SIZE(_HIDDEN(x))
 #define	END_WEAK(x)	END_STRONG(x); .weak x
@@ -71,18 +71,19 @@
 /* Use both _thread_sys_{syscall} and [weak] {syscall}. */
 #define	SYSENTRY(x)					\
-			ENTRY(_thread_sys_##x);		\
-			WEAK_ALIAS(x, _thread_sys_##x)
+	ENTRY(_thread_sys_##x);				\
+	WEAK_ALIAS(x, _thread_sys_##x)
 #define	SYSENTRY_HIDDEN(x)				\
-			ENTRY(_thread_sys_ ## x)
-#define	__END_HIDDEN(x)	END(_thread_sys_ ## x);			\
-			_HIDDEN_FALIAS(x,_thread_sys_ ## x);	\
-			END(_HIDDEN(x))
-#define	__END(x)	__END_HIDDEN(x); END(x)
+	ENTRY(_thread_sys_ ## x)
+#define	__END_HIDDEN(x)	END(_thread_sys_ ## x);		\
+	_HIDDEN_FALIAS(x,_thread_sys_ ## x);		\
+	_ASM_SIZE(_HIDDEN(x))
+#define	__END(x)					\
+	__END_HIDDEN(x); _ASM_SIZE(x)
 #define	__DO_SYSCALL(x)					\
-			movl $(SYS_ ## x),%eax;		\
-			int $0x80
+	movl $(SYS_ ## x),%eax;				\
+	int $0x80
 #define SET_ERRNO()					\
 	movl	%eax,%gs:(TCB_OFFSET_ERRNO);		\
@@ -95,53 +96,58 @@
 /* perform a syscall */
 #define	_SYSCALL_NOERROR(x,y)				\
-		SYSENTRY(x);				\
-			__DO_SYSCALL(y);
+	SYSENTRY(x);					\
+	RETGUARD_START;					\
+	__DO_SYSCALL(y);
 #define	_SYSCALL_HIDDEN_NOERROR(x,y)			\
-		SYSENTRY_HIDDEN(x);			\
-			__DO_SYSCALL(y);
+	SYSENTRY_HIDDEN(x);				\
+	RETGUARD_START;					\
+	__DO_SYSCALL(y);
 #define	SYSCALL_NOERROR(x)				\
-		_SYSCALL_NOERROR(x,x)
+	_SYSCALL_NOERROR(x,x)
 /* perform a syscall, set errno */
 #define	_SYSCALL(x,y)					\
-			.text;				\
-			.align 2;			\
-		_SYSCALL_NOERROR(x,y)			\
-			HANDLE_ERRNO()
+	.text;						\
+	.align 2;					\
+	_SYSCALL_NOERROR(x,y)				\
+	HANDLE_ERRNO()
 #define	_SYSCALL_HIDDEN(x,y)				\
-			.text;				\
-			.align 2;			\
-		_SYSCALL_HIDDEN_NOERROR(x,y)		\
-			HANDLE_ERRNO()
+	.text;						\
+	.align 2;					\
+	_SYSCALL_HIDDEN_NOERROR(x,y)			\
+	HANDLE_ERRNO()
 #define	SYSCALL(x)					\
-		_SYSCALL(x,x)
+	_SYSCALL(x,x)
 #define	SYSCALL_HIDDEN(x)				\
-		_SYSCALL_HIDDEN(x,y)
+	_SYSCALL_HIDDEN(x,x)
 /* perform a syscall, return */
 #define	PSEUDO_NOERROR(x,y)				\
-		_SYSCALL_NOERROR(x,y);			\
-			ret;				\
-		__END(x)
+	_SYSCALL_NOERROR(x,y);				\
+	RETGUARD_END;					\
+	ret;						\
+	__END(x)
 /* perform a syscall, set errno, return */
 #define	PSEUDO(x,y)					\
-		_SYSCALL(x,y);				\
-			ret;				\
-		__END(x)
+	_SYSCALL(x,y);					\
+	RETGUARD_END;					\
+	ret;						\
+	__END(x)
 #define	PSEUDO_HIDDEN(x,y)				\
-		_SYSCALL_HIDDEN(x,y);			\
-			ret;				\
-		__END_HIDDEN(x)
+	_SYSCALL_HIDDEN(x,y);				\
+	RETGUARD_END;					\
+	ret;						\
+	__END_HIDDEN(x)
 /* perform a syscall with the same name, set errno, return */
 #define	RSYSCALL(x)					\
-			PSEUDO(x,x);
+	PSEUDO(x,x);
 #define	RSYSCALL_HIDDEN(x)				\
-			PSEUDO_HIDDEN(x,x)
+	PSEUDO_HIDDEN(x,x)
 #define	SYSCALL_END(x)	__END(x)
 #define	SYSCALL_END_HIDDEN(x)				\
-			__END_HIDDEN(x)
+	__END_HIDDEN(x)
Index: lib/libc/arch/i386/gen/divsi3.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/divsi3.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 divsi3.S
--- lib/libc/arch/i386/gen/divsi3.S	7 Aug 2005 11:30:38 -0000	1.5
+++ lib/libc/arch/i386/gen/divsi3.S	18 Aug 2017 02:28:21 -0000
@@ -34,7 +34,10 @@
 #include <machine/asm.h>
 ENTRY(__divsi3)
+	RETGUARD_START
 	movl 4(%esp),%eax
 	cltd
 	idivl 8(%esp)
+	RETGUARD_END
 	ret
+END(__divsi3)
Index: lib/libc/arch/i386/gen/fabs.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fabs.S,v
retrieving revision 1.9
diff -u -p -u -r1.9 fabs.S
--- lib/libc/arch/i386/gen/fabs.S	8 Jul 2011 22:28:33 -0000	1.9
+++ lib/libc/arch/i386/gen/fabs.S	18 Aug 2017 02:28:21 -0000
@@ -34,6 +34,9 @@
 #include <machine/asm.h>
 ENTRY(fabs)
+	RETGUARD_START
 	fldl	4(%esp)
 	fabs
+	RETGUARD_END
 	ret
+END(fabs)
Index: lib/libc/arch/i386/gen/fixdfsi.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fixdfsi.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 fixdfsi.S
--- lib/libc/arch/i386/gen/fixdfsi.S	7 Aug 2005 11:30:38 -0000	1.5
+++ lib/libc/arch/i386/gen/fixdfsi.S	18 Aug 2017 02:28:21 -0000
@@ -34,7 +34,10 @@
 #include <machine/asm.h>
 ENTRY(__fixdfsi)
+	RETGUARD_START
 	fldl	4(%esp)
 	fistpl	4(%esp)
 	movl	4(%esp),%eax
+	RETGUARD_END
 	ret
+END(__fixdfsi)
Index: lib/libc/arch/i386/gen/fixunsdfsi.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fixunsdfsi.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 fixunsdfsi.S
--- lib/libc/arch/i386/gen/fixunsdfsi.S	14 Nov 2014 07:31:13 -0000	1.7
+++ lib/libc/arch/i386/gen/fixunsdfsi.S	18 Aug 2017 02:28:21 -0000
@@ -34,6 +34,7 @@
 #include <machine/asm.h>
 ENTRY(__fixunsdfsi)
+	RETGUARD_START
 	fldl	4(%esp)		/* argument double to accum stack */
 	frndint			/* create integer */
 #ifdef __PIC__
@@ -50,6 +51,7 @@ ENTRY(__fixunsdfsi)
 	fistpl	4(%esp)
 	movl	4(%esp),%eax
+	RETGUARD_END
 	ret
 1:
@@ -64,6 +66,8 @@ ENTRY(__fixunsdfsi)
 	fistpl	4(%esp)		/* convert */
 	movl	4(%esp),%eax
 	orl	$0x80000000,%eax	/* restore bias */
+	RETGUARD_END
 	ret
+END(__fixunsdfsi)
 fbiggestsigned:	.double	2147483648.0
Index: lib/libc/arch/i386/gen/flt_rounds.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/flt_rounds.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 flt_rounds.S
--- lib/libc/arch/i386/gen/flt_rounds.S	19 Aug 2017 18:23:00 -0000	1.6
+++ lib/libc/arch/i386/gen/flt_rounds.S	19 Aug 2017 18:29:07 -0000
@@ -15,6 +15,7 @@ _map:
 	.byte 0		/* round to zero */
 ENTRY(__flt_rounds)
+	RETGUARD_START
 	subl $4,%esp
 	fnstcw (%esp)
 	movl (%esp),%eax
@@ -29,5 +30,6 @@ ENTRY(__flt_rounds)
 	movb _map(,%eax,1),%al
 #endif
 	addl $4,%esp
+	RETGUARD_END
 	ret
 END_STRONG(__flt_rounds);
Index: lib/libc/arch/i386/gen/fpgetmask.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fpgetmask.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 fpgetmask.S
--- lib/libc/arch/i386/gen/fpgetmask.S	7 Aug 2005 11:30:38 -0000	1.3
+++ lib/libc/arch/i386/gen/fpgetmask.S	18 Aug 2017 02:28:21 -0000
@@ -7,10 +7,13 @@
 #include <machine/asm.h>
 ENTRY(fpgetmask)
+	RETGUARD_START
 	subl $4,%esp
 	fnstcw (%esp)
 	movl (%esp),%eax
 	notl %eax
 	andl $63,%eax
 	addl $4,%esp
+	RETGUARD_END
 	ret
+END(fpgetmask)
Index: lib/libc/arch/i386/gen/fpgetround.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fpgetround.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 fpgetround.S
--- lib/libc/arch/i386/gen/fpgetround.S	21 Jun 2009 00:38:22 -0000	1.4
+++ lib/libc/arch/i386/gen/fpgetround.S	18 Aug 2017 02:28:21 -0000
@@ -7,10 +7,13 @@
 #include <machine/asm.h>
 ENTRY(fpgetround)
+	RETGUARD_START
 	subl $4,%esp
 	fnstcw (%esp)
 	movl (%esp),%eax
 	rorl $10,%eax
 	andl $3,%eax
 	addl $4,%esp
+	RETGUARD_END
 	ret
+END(fpgetround)
Index: lib/libc/arch/i386/gen/fpgetsticky.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fpgetsticky.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 fpgetsticky.S
--- lib/libc/arch/i386/gen/fpgetsticky.S	7 Aug 2005 11:30:38 -0000	1.3
+++ lib/libc/arch/i386/gen/fpgetsticky.S	18 Aug 2017 02:28:21 -0000
@@ -7,9 +7,12 @@
 #include <machine/asm.h>
 ENTRY(fpgetsticky)
+	RETGUARD_START
 	subl $4,%esp
 	fnstsw (%esp)
 	movl (%esp),%eax
 	andl $63,%eax
 	addl $4,%esp
+	RETGUARD_END
 	ret
+END(fpgetsticky)
Index: lib/libc/arch/i386/gen/fpsetmask.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fpsetmask.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 fpsetmask.S
--- lib/libc/arch/i386/gen/fpsetmask.S	7 Aug 2005 11:30:38 -0000	1.3
+++ lib/libc/arch/i386/gen/fpsetmask.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include <machine/asm.h>
 ENTRY(fpsetmask)
+	RETGUARD_START
 	subl $4,%esp
 	fnstcw (%esp)
@@ -24,4 +25,6 @@ ENTRY(fpsetmask)
 	fldcw (%esp)
 	addl $4,%esp
+	RETGUARD_END
 	ret
+END(fpsetmask)
Index: lib/libc/arch/i386/gen/fpsetround.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fpsetround.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 fpsetround.S
--- lib/libc/arch/i386/gen/fpsetround.S	7 Aug 2005 11:30:38 -0000	1.3
+++ lib/libc/arch/i386/gen/fpsetround.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include <machine/asm.h>
 ENTRY(fpsetround)
+	RETGUARD_START
 	subl $4,%esp
 	fnstcw (%esp)
@@ -25,4 +26,6 @@ ENTRY(fpsetround)
 	fldcw (%esp)
 	addl $4,%esp
+	RETGUARD_END
 	ret
+END(fpsetround)
Index: lib/libc/arch/i386/gen/fpsetsticky.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/fpsetsticky.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 fpsetsticky.S
--- lib/libc/arch/i386/gen/fpsetsticky.S	7 Aug 2005 11:30:38 -0000	1.3
+++ lib/libc/arch/i386/gen/fpsetsticky.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include <machine/asm.h>
 ENTRY(fpsetsticky)
+	RETGUARD_START
 	subl $28,%esp
 	fnstenv (%esp)
@@ -23,4 +24,6 @@ ENTRY(fpsetsticky)
 	fldenv (%esp)
 	addl $28,%esp
+	RETGUARD_END
 	ret
+END(fpsetsticky)
Index: lib/libc/arch/i386/gen/modf.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/modf.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 modf.S
--- lib/libc/arch/i386/gen/modf.S	8 Jul 2011 22:28:33 -0000	1.7
+++ lib/libc/arch/i386/gen/modf.S	18 Aug 2017 02:28:21 -0000
@@ -43,6 +43,7 @@
 /* With CHOP mode on, frndint behaves as TRUNC does.  Useful. */
 ENTRY(modf)
+	RETGUARD_START
 	pushl	%ebp
 	movl	%esp,%ebp
 	subl	$16,%esp
@@ -65,4 +66,6 @@ ENTRY(modf)
 	jmp	L1
 L1:
 	leave
+	RETGUARD_END
 	ret
+END(modf)
Index: lib/libc/arch/i386/gen/setjmp.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/setjmp.S,v
retrieving revision 1.11
diff -u -p -u -r1.11 setjmp.S
--- lib/libc/arch/i386/gen/setjmp.S	30 May 2016 02:11:21 -0000	1.11
+++ lib/libc/arch/i386/gen/setjmp.S	18 Aug 2017 18:03:48 -0000
@@ -39,7 +39,7 @@
 	.hidden	__jmpxor
 __jmpxor:
 	.zero	4*3		# (eip, esp, ebp)
-	END(__jmpxor)
+	_ASM_SIZE(__jmpxor)
 	.type	__jmpxor,@object
Index: lib/libc/arch/i386/gen/udivsi3.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/gen/udivsi3.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 udivsi3.S
--- lib/libc/arch/i386/gen/udivsi3.S	7 Aug 2005 11:30:38 -0000	1.5
+++ lib/libc/arch/i386/gen/udivsi3.S	18 Aug 2017 02:28:21 -0000
@@ -34,7 +34,10 @@
 #include <machine/asm.h>
 ENTRY(__udivsi3)
+	RETGUARD_START
 	movl 4(%esp),%eax
 	xorl %edx,%edx
 	divl 8(%esp)
+	RETGUARD_END
 	ret
+END(__udivsi3)
Index: lib/libc/arch/i386/net/htonl.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/net/htonl.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 htonl.S
--- lib/libc/arch/i386/net/htonl.S	28 Oct 2009 06:49:54 -0000	1.4
+++ lib/libc/arch/i386/net/htonl.S	18 Aug 2017 02:28:21 -0000
@@ -34,8 +34,11 @@
 /* netorder = htonl(hostorder) */
 ENTRY(htonl)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	rorw	$8,%ax
 	roll	$16,%eax
 	rorw	$8,%ax
+	RETGUARD_END
 	ret
+END(htonl)
Index: lib/libc/arch/i386/net/htons.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/net/htons.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 htons.S
--- lib/libc/arch/i386/net/htons.S	28 Oct 2009 06:49:54 -0000	1.4
+++ lib/libc/arch/i386/net/htons.S	18 Aug 2017 02:28:21 -0000
@@ -34,6 +34,9 @@
 /* netorder = htons(hostorder) */
 ENTRY(htons)
+	RETGUARD_START
 	movzwl	4(%esp),%eax
 	rorw	$8,%ax
+	RETGUARD_END
 	ret
+END(htons)
Index: lib/libc/arch/i386/net/ntohl.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/net/ntohl.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 ntohl.S
--- lib/libc/arch/i386/net/ntohl.S	28 Oct 2009 06:49:54 -0000	1.4
+++ lib/libc/arch/i386/net/ntohl.S	18 Aug 2017 02:28:21 -0000
@@ -34,8 +34,11 @@
 /* hostorder = ntohl(netorder) */
 ENTRY(ntohl)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	rorw	$8,%ax
 	roll	$16,%eax
 	rorw	$8,%ax
+	RETGUARD_END
 	ret
+END(ntohl)
Index: lib/libc/arch/i386/net/ntohs.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/net/ntohs.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 ntohs.S
--- lib/libc/arch/i386/net/ntohs.S	28 Oct 2009 06:49:54 -0000	1.4
+++ lib/libc/arch/i386/net/ntohs.S	18 Aug 2017 02:28:21 -0000
@@ -34,6 +34,9 @@
 /* hostorder = ntohs(netorder) */
 ENTRY(ntohs)
+	RETGUARD_START
 	movzwl	4(%esp),%eax
 	rorw	$8,%ax
+	RETGUARD_END
 	ret
+END(ntohs)
Index: lib/libc/arch/i386/stdlib/abs.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/stdlib/abs.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 abs.S
--- lib/libc/arch/i386/stdlib/abs.S	13 Sep 2015 16:27:59 -0000	1.6
+++ lib/libc/arch/i386/stdlib/abs.S	18 Aug 2017 02:28:21 -0000
@@ -33,9 +33,11 @@
 #include "SYS.h"
 ENTRY(abs)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	testl	%eax,%eax
 	jns	1f
 	negl	%eax
-1:	ret
+1:	RETGUARD_END
+	ret
 END_STRONG(abs)
Index: lib/libc/arch/i386/stdlib/div.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/stdlib/div.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 div.S
--- lib/libc/arch/i386/stdlib/div.S	13 Sep 2015 16:27:59 -0000	1.6
+++ lib/libc/arch/i386/stdlib/div.S	18 Aug 2017 02:28:21 -0000
@@ -6,11 +6,13 @@
 #include "SYS.h"
 ENTRY(div)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	movl	8(%esp),%ecx
 	cdq
 	idiv	%ecx
 	movl	%eax,4(%esp)
 	movl	%edx,8(%esp)
+	RETGUARD_END
 	ret
 END_STRONG(div)
Index: lib/libc/arch/i386/stdlib/labs.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/stdlib/labs.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 labs.S
--- lib/libc/arch/i386/stdlib/labs.S	13 Sep 2015 16:27:59 -0000	1.6
+++ lib/libc/arch/i386/stdlib/labs.S	18 Aug 2017 02:28:21 -0000
@@ -33,9 +33,11 @@
 #include "SYS.h"
 ENTRY(labs)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	testl	%eax,%eax
 	jns	1f
 	negl	%eax
-1:	ret
+1:	RETGUARD_END
+	ret
 END_STRONG(labs)
Index: lib/libc/arch/i386/stdlib/ldiv.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/stdlib/ldiv.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 ldiv.S
--- lib/libc/arch/i386/stdlib/ldiv.S	13 Sep 2015 16:27:59 -0000	1.6
+++ lib/libc/arch/i386/stdlib/ldiv.S	18 Aug 2017 02:28:21 -0000
@@ -6,11 +6,13 @@
 #include "SYS.h"
 ENTRY(ldiv)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	movl	8(%esp),%ecx
 	cdq
 	idiv	%ecx
 	movl	%eax,4(%esp)
 	movl	%edx,8(%esp)
+	RETGUARD_END
 	ret
 END_STRONG(ldiv)
Index: lib/libc/arch/i386/string/bcmp.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/bcmp.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 bcmp.S
--- lib/libc/arch/i386/string/bcmp.S	31 Aug 2015 02:53:56 -0000	1.4
+++ lib/libc/arch/i386/string/bcmp.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include "SYS.h"
 ENTRY(bcmp)
+	RETGUARD_START
 	pushl	%edi
 	pushl	%esi
 	movl	12(%esp),%edi
@@ -29,5 +30,6 @@ ENTRY(bcmp)
 L1:	incl	%eax
 L2:	popl	%esi
 	popl	%edi
+	RETGUARD_END
 	ret
 END_WEAK(bcmp)
Index: lib/libc/arch/i386/string/bzero.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/bzero.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 bzero.S
--- lib/libc/arch/i386/string/bzero.S	31 Aug 2015 02:53:56 -0000	1.5
+++ lib/libc/arch/i386/string/bzero.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include "SYS.h"
 ENTRY(bzero)
+	RETGUARD_START
 	pushl	%edi
 	movl	8(%esp),%edi
 	movl	12(%esp),%edx
@@ -40,5 +41,6 @@ L1:	movl	%edx,%ecx		/* zero remainder by
 	stosb
 	popl	%edi
+	RETGUARD_END
 	ret
 END_WEAK(bzero)
Index: lib/libc/arch/i386/string/ffs.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/ffs.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 ffs.S
--- lib/libc/arch/i386/string/ffs.S	19 Aug 2017 18:25:50 -0000	1.5
+++ lib/libc/arch/i386/string/ffs.S	19 Aug 2017 18:29:07 -0000
@@ -7,12 +7,15 @@
 #include "SYS.h"
 ENTRY(ffs)
+	RETGUARD_START
 	bsfl	4(%esp),%eax
 	jz	L1	 		/* ZF is set if all bits are 0 */
 	incl	%eax			/* bits numbered from 1, not 0 */
+	RETGUARD_END
 	ret
 	.align 2,0xcc
 L1:	xorl	%eax,%eax		/* clear result */
+	RETGUARD_END
 	ret
 END_WEAK(ffs)
Index: lib/libc/arch/i386/string/memchr.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/memchr.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 memchr.S
--- lib/libc/arch/i386/string/memchr.S	19 Aug 2017 18:25:50 -0000	1.5
+++ lib/libc/arch/i386/string/memchr.S	19 Aug 2017 18:29:07 -0000
@@ -7,6 +7,7 @@
 #include "SYS.h"
 ENTRY(memchr)
+	RETGUARD_START
 	pushl	%edi
 	movl	8(%esp),%edi		/* string address */
 	movl	12(%esp),%eax		/* set character to search for */
@@ -19,9 +20,11 @@ ENTRY(memchr)
 	jne	L1			/* scan failed, return null */
 	leal	-1(%edi),%eax		/* adjust result of scan */
 	popl	%edi
+	RETGUARD_END
 	ret
 	.align 2,0xcc
 L1:	xorl	%eax,%eax
 	popl	%edi
+	RETGUARD_END
 	ret
 END_STRONG(memchr)
Index: lib/libc/arch/i386/string/memcmp.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/memcmp.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 memcmp.S
--- lib/libc/arch/i386/string/memcmp.S	31 Aug 2015 02:53:56 -0000	1.5
+++ lib/libc/arch/i386/string/memcmp.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include "SYS.h"
 ENTRY(memcmp)
+	RETGUARD_START
 	pushl	%edi
 	pushl	%esi
 	movl	12(%esp),%edi
@@ -28,6 +29,7 @@ ENTRY(memcmp)
 	xorl	%eax,%eax		/* we match, return zero	*/
 	popl	%esi
 	popl	%edi
+	RETGUARD_END
 	ret
 L5:	movl	$4,%ecx			/* We know that one of the next	*/
@@ -40,5 +42,6 @@ L6:	movzbl  -1(%edi),%eax		/* Perform un
 	subl	%edx,%eax
 	popl	%esi
 	popl	%edi
+	RETGUARD_END
 	ret
 END_STRONG(memcmp)
Index: lib/libc/arch/i386/string/memmove.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/memmove.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 memmove.S
--- lib/libc/arch/i386/string/memmove.S	31 Aug 2015 02:53:56 -0000	1.6
+++ lib/libc/arch/i386/string/memmove.S	18 Aug 2017 02:28:21 -0000
@@ -40,17 +40,20 @@
  * into memmove(), which handles overlapping regions.
  */
 ENTRY(bcopy)
+	RETGUARD_START
 	pushl	%esi
 	pushl	%edi
 	movl	12(%esp),%esi
 	movl	16(%esp),%edi
 	jmp	docopy
+END_WEAK(bcopy)
 /*
  * memmove(caddr_t dst, caddr_t src, size_t len);
  * Copy len bytes, coping with overlapping space.
  */
 ENTRY(memmove)
+	RETGUARD_START
 	pushl	%esi
 	pushl	%edi
 	movl	12(%esp),%edi
@@ -66,6 +69,7 @@ docopy:
  * memcpy() doesn't worry about overlap and always copies forward
  */
 // ENTRY(memcpy)
+	RETGUARD_START
 	pushl	%esi
 	pushl	%edi
 	movl	12(%esp),%edi
@@ -82,6 +86,7 @@ docopyf:
 	movsb
 	popl	%edi
 	popl	%esi
+	RETGUARD_END
 	ret
 	_ALIGN_TEXT
@@ -103,6 +108,6 @@ docopyf:
 	popl	%edi
 	popl	%esi
 	cld
+	RETGUARD_END
 	ret
 END_STRONG(memmove)
-END_WEAK(bcopy)
Index: lib/libc/arch/i386/string/memset.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/memset.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 memset.S
--- lib/libc/arch/i386/string/memset.S	31 Aug 2015 02:53:56 -0000	1.5
+++ lib/libc/arch/i386/string/memset.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include "SYS.h"
 ENTRY(memset)
+	RETGUARD_START
 	pushl	%edi
 	pushl	%ebx
 	movl	12(%esp),%edi
@@ -52,5 +53,6 @@ L1:	rep
 	popl	%eax			/* pop address of buffer */
 	popl	%ebx
 	popl	%edi
+	RETGUARD_END
 	ret
 END_STRONG(memset)
Index: lib/libc/arch/i386/string/strcat.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/strcat.S,v
retrieving revision 1.9
diff -u -p -u -r1.9 strcat.S
--- lib/libc/arch/i386/string/strcat.S	31 Aug 2015 02:53:56 -0000	1.9
+++ lib/libc/arch/i386/string/strcat.S	18 Aug 2017 02:28:21 -0000
@@ -20,6 +20,7 @@
  */
 ENTRY(strcat)
+	RETGUARD_START
 	pushl	%edi			/* save edi */
 	movl	8(%esp),%edi		/* dst address */
 	movl	12(%esp),%edx		/* src address */
@@ -70,5 +71,6 @@ L1:	movb	(%edx),%al		/* unroll loop, but
 	jnz	L1
 L2:	popl	%eax			/* pop destination address */
 	popl	%edi			/* restore edi */
+	RETGUARD_END
 	ret
 END(strcat)
Index: lib/libc/arch/i386/string/strchr.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/strchr.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 strchr.S
--- lib/libc/arch/i386/string/strchr.S	31 Aug 2015 02:53:56 -0000	1.7
+++ lib/libc/arch/i386/string/strchr.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 WEAK_ALIAS(index, strchr)
 ENTRY(strchr)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	movb	8(%esp),%cl
 	.align 2,0x90
@@ -21,5 +22,6 @@ L1:
 	jnz	L1
 	xorl	%eax,%eax
 L2:
+	RETGUARD_END
 	ret
 END_STRONG(strchr)
Index: lib/libc/arch/i386/string/strcmp.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/strcmp.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 strcmp.S
--- lib/libc/arch/i386/string/strcmp.S	31 Aug 2015 02:53:56 -0000	1.4
+++ lib/libc/arch/i386/string/strcmp.S	18 Aug 2017 02:28:21 -0000
@@ -13,6 +13,7 @@
  */
 ENTRY(strcmp)
+	RETGUARD_START
 	movl	0x04(%esp),%eax
 	movl	0x08(%esp),%edx
 	jmp	L2			/* Jump into the loop! */
@@ -78,5 +79,6 @@ L2:	movb	(%eax),%cl
 L3:	movzbl	(%eax),%eax		/* unsigned comparison */
 	movzbl	(%edx),%edx
 	subl	%edx,%eax
+	RETGUARD_END
 	ret
 END_STRONG(strcmp)
Index: lib/libc/arch/i386/string/strcpy.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/strcpy.S,v
retrieving revision 1.9
diff -u -p -u -r1.9 strcpy.S
--- lib/libc/arch/i386/string/strcpy.S	31 Aug 2015 02:53:56 -0000	1.9
+++ lib/libc/arch/i386/string/strcpy.S	18 Aug 2017 02:28:21 -0000
@@ -20,6 +20,7 @@
  */
 ENTRY(strcpy)
+	RETGUARD_START
 	movl	4(%esp),%ecx		/* dst address */
 	movl	8(%esp),%edx		/* src address */
 	pushl	%ecx			/* push dst address */
@@ -60,5 +61,6 @@ L1:	movb	(%edx),%al		/* unroll loop, but
 	testb	%al,%al
 	jnz	L1
 L2:	popl	%eax			/* pop dst address */
+	RETGUARD_END
 	ret
 END(strcpy)
Index: lib/libc/arch/i386/string/strncmp.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/strncmp.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 strncmp.S
--- lib/libc/arch/i386/string/strncmp.S	19 Aug 2017 18:25:50 -0000	1.5
+++ lib/libc/arch/i386/string/strncmp.S	19 Aug 2017 18:29:07 -0000
@@ -13,6 +13,7 @@
  */
 ENTRY(strncmp)
+	RETGUARD_START
 	pushl	%ebx
 	movl	8(%esp),%eax
 	movl	12(%esp),%ecx
@@ -106,9 +107,11 @@ L3:	movzbl	(%eax),%eax		/* unsigned comp
 	movzbl	(%ecx),%ecx
 	subl	%ecx,%eax
 	popl	%ebx
+	RETGUARD_END
 	ret
 	.align 2,0xcc
 L4:	xorl	%eax,%eax
 	popl	%ebx
+	RETGUARD_END
 	ret
 END_STRONG(strncmp)
Index: lib/libc/arch/i386/string/strrchr.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/string/strrchr.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 strrchr.S
--- lib/libc/arch/i386/string/strrchr.S	31 Aug 2015 02:53:56 -0000	1.7
+++ lib/libc/arch/i386/string/strrchr.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 WEAK_ALIAS(rindex, strrchr)
 ENTRY(strrchr)
+	RETGUARD_START
 	pushl	%ebx
 	movl	8(%esp),%edx
 	movb	12(%esp),%cl
@@ -24,5 +25,6 @@ L2:
 	testb	%bl,%bl			/* null terminator??? */
 	jnz	L1
 	popl	%ebx
+	RETGUARD_END
 	ret
 END_STRONG(strrchr)
Index: lib/libc/arch/i386/sys/brk.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/sys/brk.S,v
retrieving revision 1.13
diff -u -p -u -r1.13 brk.S
--- lib/libc/arch/i386/sys/brk.S	19 Aug 2017 18:24:06 -0000	1.13
+++ lib/libc/arch/i386/sys/brk.S	19 Aug 2017 18:29:07 -0000
@@ -39,11 +39,12 @@
 	.data
 __minbrk:
 	.long	_end
-	END(__minbrk)
+	_ASM_SIZE(__minbrk)
 	.type	__minbrk,@object
 	.weak	brk
 ENTRY(brk)
+	RETGUARD_START
 #ifdef __PIC__
 	movl	4(%esp),%ecx
 	PIC_PROLOGUE
@@ -77,8 +78,10 @@ ENTRY(brk)
 	xorl	%eax,%eax
 	movl	%ecx,__curbrk
 #endif
+	RETGUARD_END
 	ret
 2:
 	SET_ERRNO()
+	RETGUARD_END
 	ret
 END(brk)
Index: lib/libc/arch/i386/sys/sbrk.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/sys/sbrk.S,v
retrieving revision 1.13
diff -u -p -u -r1.13 sbrk.S
--- lib/libc/arch/i386/sys/sbrk.S	19 Aug 2017 18:24:06 -0000	1.13
+++ lib/libc/arch/i386/sys/sbrk.S	19 Aug 2017 18:29:07 -0000
@@ -39,11 +39,12 @@
 	.data
 __curbrk:	.long	_end
-	END(__curbrk)
+	_ASM_SIZE(__curbrk)
 	.type	__curbrk,@object
 	.weak	sbrk
 ENTRY(sbrk)
+	RETGUARD_START
 #ifdef __PIC__
 	movl	4(%esp),%ecx
 	PIC_PROLOGUE
@@ -71,8 +72,10 @@ ENTRY(sbrk)
 	movl	__curbrk,%eax
 	addl	%ecx,__curbrk
 #endif
+	RETGUARD_END
 	ret
 2:
 	SET_ERRNO()
+	RETGUARD_END
 	ret
 END(sbrk)
Index: lib/libc/arch/i386/sys/sigpending.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/sys/sigpending.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 sigpending.S
--- lib/libc/arch/i386/sys/sigpending.S	5 Sep 2015 06:22:47 -0000	1.5
+++ lib/libc/arch/i386/sys/sigpending.S	18 Aug 2017 02:28:21 -0000
@@ -37,5 +37,6 @@ SYSCALL(sigpending)
 	movl	4(%esp),%ecx		# fetch pointer to...
 	movl	%eax,(%ecx)		# store old mask
 	xorl	%eax,%eax
+	RETGUARD_END
 	ret
 SYSCALL_END(sigpending)
Index: lib/libc/arch/i386/sys/sigprocmask.S
===================================================================
RCS file: /cvs/src/lib/libc/arch/i386/sys/sigprocmask.S,v
retrieving revision 1.12
diff -u -p -u -r1.12 sigprocmask.S
--- lib/libc/arch/i386/sys/sigprocmask.S	7 May 2016 19:05:21 -0000	1.12
+++ lib/libc/arch/i386/sys/sigprocmask.S	18 Aug 2017 02:28:21 -0000
@@ -34,6 +34,7 @@
 #include "SYS.h"
 SYSENTRY_HIDDEN(sigprocmask)
+//	RETGUARD_START
 	movl	8(%esp),%ecx		# fetch new sigset pointer
 	testl	%ecx,%ecx		# check new sigset pointer
 	jnz	1f			# if not null, indirect
@@ -51,8 +52,10 @@ SYSENTRY_HIDDEN(sigprocmask)
 	movl	%eax,(%ecx)		# store old mask
 out:
 	xorl	%eax,%eax
+//	RETGUARD_END
 	ret
 1:
 	SET_ERRNO()
+//	RETGUARD_END
 	ret
 SYSCALL_END_HIDDEN(sigprocmask)
Index: lib/libm/arch/amd64/abi.h
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/abi.h,v
retrieving revision 1.5
diff -u -p -u -r1.5 abi.h
--- lib/libm/arch/amd64/abi.h	12 Sep 2016 19:47:01 -0000	1.5
+++ lib/libm/arch/amd64/abi.h	18 Aug 2017 18:02:43 -0000
@@ -64,5 +64,5 @@
  *   END_STD(x)	Like DEF_STD() in C; for standard/reserved C names
  *   END_NONSTD(x)	Like DEF_NONSTD() in C; for non-ISO C names
  */
-#define	END_STD(x)	END(x); _HIDDEN_FALIAS(x,x); END(_HIDDEN(x))
+#define	END_STD(x)	END(x); _HIDDEN_FALIAS(x,x); _ASM_SIZE(_HIDDEN(x))
 #define	END_NONSTD(x)	END_STD(x); .weak x
Index: lib/libm/arch/amd64/e_acos.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_acos.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 e_acos.S
--- lib/libm/arch/amd64/e_acos.S	12 Sep 2016 19:47:01 -0000	1.5
+++ lib/libm/arch/amd64/e_acos.S	18 Aug 2017 02:28:21 -0000
@@ -10,6 +10,7 @@
 /* acos = atan (sqrt(1 - x^2) / x) */
 ENTRY(acos)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE		/* x */
 	fld	%st(0)
@@ -20,5 +21,6 @@ ENTRY(acos)
 	fxch	%st(1)
 	fpatan
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END(acos)
Index: lib/libm/arch/amd64/e_asin.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_asin.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_asin.S
--- lib/libm/arch/amd64/e_asin.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_asin.S	18 Aug 2017 02:28:21 -0000
@@ -10,6 +10,7 @@
 /* asin = atan (x / sqrt(1 - x^2)) */
 ENTRY(asin)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE		/* x */
 	fld	%st(0)
@@ -19,5 +20,6 @@ ENTRY(asin)
 	fsqrt				/* sqrt (1 - x^2) */
 	fpatan
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(asin)
Index: lib/libm/arch/amd64/e_atan2.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_atan2.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_atan2.S
--- lib/libm/arch/amd64/e_atan2.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_atan2.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(atan2)
+	RETGUARD_START
 	XMM_TWO_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	fldl	ARG_DOUBLE_TWO
 	fpatan
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(atan2)
Index: lib/libm/arch/amd64/e_atan2f.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_atan2f.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_atan2f.S
--- lib/libm/arch/amd64/e_atan2f.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_atan2f.S	18 Aug 2017 02:28:21 -0000
@@ -1,4 +1,4 @@
-/*	$OpenBSD: e_atan2f.S,v 1.4 2016/09/12 19:47:01 guenther Exp $ */
+/*	$OpenBSD: e_atan2f.S,v 1.4 2016/09/12 19:47:01 guenther Exp $ */
 /*
  * Written by J.T. Conklin <jtc@NetBSD.org>.
  * Public domain.
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(atan2f)
+	RETGUARD_START
 	XMM_TWO_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	flds	ARG_FLOAT_TWO
 	fpatan
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(atan2f)
Index: lib/libm/arch/amd64/e_exp.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_exp.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 e_exp.S
--- lib/libm/arch/amd64/e_exp.S	12 Sep 2016 19:47:01 -0000	1.6
+++ lib/libm/arch/amd64/e_exp.S	18 Aug 2017 02:28:21 -0000
@@ -42,6 +42,7 @@
 /* e^x = 2^(x * log2(e)) */
 ENTRY(exp)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	/*
 	 * If x is +-Inf, then the subtraction would give Inf-Inf = NaN.
@@ -82,6 +83,7 @@ ENTRY(exp)
 	fldcw	-8(%rsp)
 1:
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 x_Inf_or_NaN:
@@ -94,9 +96,11 @@ x_Inf_or_NaN:
 	cmpl	$0,-8(%rsp)
 	jne	x_not_minus_Inf
 	xorpd	%xmm0,%xmm0
+	RETGUARD_END
 	ret
 x_not_minus_Inf:
 	movsd	ARG_DOUBLE_ONE,%xmm0
+	RETGUARD_END
 	ret
 END_STD(exp)
Index: lib/libm/arch/amd64/e_fmod.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_fmod.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_fmod.S
--- lib/libm/arch/amd64/e_fmod.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_fmod.S	18 Aug 2017 02:28:21 -0000
@@ -10,6 +10,7 @@
 ENTRY(fmod)
+	RETGUARD_START
 	XMM_TWO_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_TWO
 	fldl	ARG_DOUBLE_ONE
@@ -19,5 +20,6 @@ ENTRY(fmod)
 	jc	1b
 	fstp	%st(1)
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END(fmod)
Index: lib/libm/arch/amd64/e_log.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_log.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_log.S
--- lib/libm/arch/amd64/e_log.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_log.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(log)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldln2
 	fldl	ARG_DOUBLE_ONE
 	fyl2x
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(log)
Index: lib/libm/arch/amd64/e_log10.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_log10.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_log10.S
--- lib/libm/arch/amd64/e_log10.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_log10.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(log10)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldlg2
 	fldl	ARG_DOUBLE_ONE
 	fyl2x
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END(log10)
Index: lib/libm/arch/amd64/e_remainder.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_remainder.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_remainder.S
--- lib/libm/arch/amd64/e_remainder.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_remainder.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 #include "abi.h"
 ENTRY(remainder)
+	RETGUARD_START
 	XMM_TWO_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_TWO
 	fldl	ARG_DOUBLE_ONE
@@ -18,5 +19,6 @@ ENTRY(remainder)
 	jc	1b
 	fstp	%st(1)
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(remainder)
Index: lib/libm/arch/amd64/e_remainderf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_remainderf.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_remainderf.S
--- lib/libm/arch/amd64/e_remainderf.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_remainderf.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 #include "abi.h"
 ENTRY(remainderf)
+	RETGUARD_START
 	XMM_TWO_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_TWO
 	flds	ARG_FLOAT_ONE
@@ -18,5 +19,6 @@ ENTRY(remainderf)
 	jc	1b
 	fstp	%st(1)
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(remainderf)
Index: lib/libm/arch/amd64/e_scalb.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_scalb.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 e_scalb.S
--- lib/libm/arch/amd64/e_scalb.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/e_scalb.S	18 Aug 2017 02:28:21 -0000
@@ -9,11 +9,13 @@
 #include "abi.h"
 ENTRY(scalb)
+	RETGUARD_START
 	XMM_TWO_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_TWO
 	fldl	ARG_DOUBLE_ONE
 	fscale
 	fstp	%st(1)			/* bug fix for fp stack overflow */
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_NONSTD(scalb)
Index: lib/libm/arch/amd64/e_sqrt.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_sqrt.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 e_sqrt.S
--- lib/libm/arch/amd64/e_sqrt.S	12 Sep 2016 19:47:01 -0000	1.5
+++ lib/libm/arch/amd64/e_sqrt.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,8 @@
 #include "abi.h"
 ENTRY(sqrt)
+	RETGUARD_START
 	sqrtsd	%xmm0,%xmm0
+	RETGUARD_END
 	ret
 END_STD(sqrt)
Index: lib/libm/arch/amd64/e_sqrtf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_sqrtf.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 e_sqrtf.S
--- lib/libm/arch/amd64/e_sqrtf.S	12 Sep 2016 19:47:01 -0000	1.5
+++ lib/libm/arch/amd64/e_sqrtf.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,8 @@
 #include "abi.h"
 ENTRY(sqrtf)
+	RETGUARD_START
 	sqrtss	%xmm0,%xmm0
+	RETGUARD_END
 	ret
 END_STD(sqrtf)
Index: lib/libm/arch/amd64/e_sqrtl.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/e_sqrtl.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 e_sqrtl.S
--- lib/libm/arch/amd64/e_sqrtl.S	12 Sep 2016 19:47:01 -0000	1.2
+++ lib/libm/arch/amd64/e_sqrtl.S	18 Aug 2017 02:28:21 -0000
@@ -8,7 +8,9 @@
 #include "abi.h"
 ENTRY(sqrtl)
+	RETGUARD_START
 	fldt	8(%rsp)
 	fsqrt
+	RETGUARD_END
 	ret
 END_STD(sqrtl)
Index: lib/libm/arch/amd64/s_atan.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_atan.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_atan.S
--- lib/libm/arch/amd64/s_atan.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_atan.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(atan)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	fld1
 	fpatan
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END(atan)
Index: lib/libm/arch/amd64/s_atanf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_atanf.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_atanf.S
--- lib/libm/arch/amd64/s_atanf.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_atanf.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(atanf)
+	RETGUARD_START
 	XMM_ONE_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	fld1
 	fpatan
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(atanf)
Index: lib/libm/arch/amd64/s_ceil.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_ceil.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 s_ceil.S
--- lib/libm/arch/amd64/s_ceil.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/s_ceil.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 #include "abi.h"
 ENTRY(ceil)
+	RETGUARD_START
 	fstcw	-12(%rsp)
 	movw	-12(%rsp),%dx
 	orw	$0x0800,%dx
@@ -21,5 +22,6 @@ ENTRY(ceil)
 	fldcw	-12(%rsp)
 	fstpl	-8(%rsp)
 	movsd	-8(%rsp),%xmm0
+	RETGUARD_END
 	ret
 END_STD(ceil)
Index: lib/libm/arch/amd64/s_ceilf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_ceilf.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 s_ceilf.S
--- lib/libm/arch/amd64/s_ceilf.S	12 Sep 2016 19:47:01 -0000	1.5
+++ lib/libm/arch/amd64/s_ceilf.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include <machine/asm.h>
 ENTRY(ceilf)
+	RETGUARD_START
 	fstcw	-8(%rsp)
 	movw	-8(%rsp),%dx
 	orw	$0x0800,%dx
@@ -19,5 +20,6 @@ ENTRY(ceilf)
 	fldcw	-8(%rsp)
 	fstps	-4(%rsp)
 	movss	-4(%rsp),%xmm0
+	RETGUARD_END
 	ret
 END(ceilf)
Index: lib/libm/arch/amd64/s_copysign.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_copysign.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 s_copysign.S
--- lib/libm/arch/amd64/s_copysign.S	22 Dec 2016 16:11:26 -0000	1.6
+++ lib/libm/arch/amd64/s_copysign.S	18 Aug 2017 02:28:21 -0000
@@ -14,10 +14,12 @@
 	.quad	0x7fffffffffffffff
 ENTRY(copysign)
+	RETGUARD_START
 	movq	.Lpos(%rip),%xmm2
 	movq	.Lneg(%rip),%xmm3
 	pand	%xmm2,%xmm1
 	pand	%xmm3,%xmm0
 	por	%xmm1,%xmm0
+	RETGUARD_END
 	ret
 END_STD(copysign)
Index: lib/libm/arch/amd64/s_copysignf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_copysignf.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 s_copysignf.S
--- lib/libm/arch/amd64/s_copysignf.S	22 Dec 2016 16:11:26 -0000	1.6
+++ lib/libm/arch/amd64/s_copysignf.S	18 Aug 2017 02:28:21 -0000
@@ -14,10 +14,12 @@
 	.long 0x80000000
 ENTRY(copysignf)
+	RETGUARD_START
 	movss    .Lpos(%rip),%xmm2
 	movss    .Lneg(%rip),%xmm3
 	pand    %xmm2,%xmm1
 	pand    %xmm3,%xmm0
 	por     %xmm1,%xmm0
+	RETGUARD_END
 	ret
 END_STD(copysignf)
Index: lib/libm/arch/amd64/s_cos.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_cos.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_cos.S
--- lib/libm/arch/amd64/s_cos.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_cos.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 #include "abi.h"
 ENTRY(cos)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	fcos
@@ -16,6 +17,7 @@ ENTRY(cos)
 	andw	$0x400,%ax
 	jnz	1f
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret	
 1:	fldpi
 	fadd	%st(0)
@@ -27,5 +29,6 @@ ENTRY(cos)
 	fstp	%st(1)
 	fcos
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(cos)
Index: lib/libm/arch/amd64/s_cosf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_cosf.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_cosf.S
--- lib/libm/arch/amd64/s_cosf.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_cosf.S	18 Aug 2017 02:28:21 -0000
@@ -10,9 +10,11 @@
 /* A float's domain isn't large enough to require argument reduction. */
 ENTRY(cosf)
+	RETGUARD_START
 	XMM_ONE_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	fcos
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret	
 END_STD(cosf)
Index: lib/libm/arch/amd64/s_floor.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_floor.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 s_floor.S
--- lib/libm/arch/amd64/s_floor.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/s_floor.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,7 @@
 #include "abi.h"
 ENTRY(floor)
+	RETGUARD_START
 	movsd	%xmm0, -8(%rsp)
 	fstcw	-12(%rsp)
 	movw	-12(%rsp),%dx
@@ -20,5 +21,6 @@ ENTRY(floor)
 	fldcw	-12(%rsp)
 	fstpl	-8(%rsp)
 	movsd	-8(%rsp),%xmm0
+	RETGUARD_END
 	ret
 END_STD(floor)
Index: lib/libm/arch/amd64/s_floorf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_floorf.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 s_floorf.S
--- lib/libm/arch/amd64/s_floorf.S	12 Sep 2016 19:47:01 -0000	1.5
+++ lib/libm/arch/amd64/s_floorf.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,7 @@
 #include "abi.h"
 ENTRY(floorf)
+	RETGUARD_START
 	movss	%xmm0, -4(%rsp)
 	fstcw	-8(%rsp)
 	movw	-8(%rsp),%dx
@@ -20,5 +21,6 @@ ENTRY(floorf)
 	fldcw	-8(%rsp)
 	fstps	-4(%rsp)
 	movss	-4(%rsp),%xmm0
+	RETGUARD_END
 	ret
 END_STD(floorf)
Index: lib/libm/arch/amd64/s_ilogb.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_ilogb.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 s_ilogb.S
--- lib/libm/arch/amd64/s_ilogb.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/s_ilogb.S	18 Aug 2017 02:28:21 -0000
@@ -8,11 +8,13 @@
 #include "abi.h"
 ENTRY(ilogb)
+	RETGUARD_START
 	movsd	%xmm0,-8(%rsp)
 	fldl	-8(%rsp)
 	fxtract
 	fstp	%st
 	fistpl	-8(%rsp)
 	movl	-8(%rsp),%eax
+	RETGUARD_END
 	ret
 END_STD(ilogb)
Index: lib/libm/arch/amd64/s_ilogbf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_ilogbf.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 s_ilogbf.S
--- lib/libm/arch/amd64/s_ilogbf.S	12 Sep 2016 19:47:01 -0000	1.4
+++ lib/libm/arch/amd64/s_ilogbf.S	18 Aug 2017 02:28:21 -0000
@@ -8,11 +8,13 @@
 #include "abi.h"
 ENTRY(ilogbf)
+	RETGUARD_START
 	movss	%xmm0,-4(%rsp)
 	flds	-4(%rsp)
 	fxtract
 	fstp	%st
 	fistpl	-4(%rsp)
 	movl	-4(%rsp),%eax
+	RETGUARD_END
 	ret
 END_STD(ilogbf)
Index: lib/libm/arch/amd64/s_llrint.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_llrint.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 s_llrint.S
--- lib/libm/arch/amd64/s_llrint.S	12 Sep 2016 19:47:01 -0000	1.2
+++ lib/libm/arch/amd64/s_llrint.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,8 @@
 #include <machine/asm.h>
 ENTRY(llrint)
+	RETGUARD_START
 	cvtsd2si %xmm0, %rax
+	RETGUARD_END
 	ret
 END(llrint)
Index: lib/libm/arch/amd64/s_llrintf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_llrintf.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 s_llrintf.S
--- lib/libm/arch/amd64/s_llrintf.S	12 Sep 2016 19:47:01 -0000	1.2
+++ lib/libm/arch/amd64/s_llrintf.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,8 @@
 #include "abi.h"
 ENTRY(llrintf)
+	RETGUARD_START
 	cvtss2si %xmm0, %rax
+	RETGUARD_END
 	ret
 END_STD(llrintf)
Index: lib/libm/arch/amd64/s_log1p.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_log1p.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 s_log1p.S
--- lib/libm/arch/amd64/s_log1p.S	19 Aug 2017 18:27:19 -0000	1.5
+++ lib/libm/arch/amd64/s_log1p.S	19 Aug 2017 18:29:09 -0000
@@ -40,6 +40,7 @@
  */
 ENTRY(log1p)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	fabs
@@ -62,6 +63,7 @@ use_fyl2x:
 	faddp
 	fyl2x
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 	.align	4,0xcc
@@ -70,5 +72,6 @@ use_fyl2xp1:
 	fldl	ARG_DOUBLE_ONE
 	fyl2xp1
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(log1p)
Index: lib/libm/arch/amd64/s_log1pf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_log1pf.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 s_log1pf.S
--- lib/libm/arch/amd64/s_log1pf.S	19 Aug 2017 18:27:19 -0000	1.5
+++ lib/libm/arch/amd64/s_log1pf.S	19 Aug 2017 18:29:09 -0000
@@ -40,6 +40,7 @@
  */
 ENTRY(log1pf)
+	RETGUARD_START
 	XMM_ONE_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	fabs
@@ -62,6 +63,7 @@ use_fyl2x:
 	faddp
 	fyl2x
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 	.align	4,0xcc
@@ -70,5 +72,6 @@ use_fyl2xp1:
 	flds	ARG_FLOAT_ONE
 	fyl2xp1
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(log1pf)
Index: lib/libm/arch/amd64/s_logb.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_logb.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_logb.S
--- lib/libm/arch/amd64/s_logb.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_logb.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(logb)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	fxtract
 	fstp	%st
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(logb)
Index: lib/libm/arch/amd64/s_logbf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_logbf.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_logbf.S
--- lib/libm/arch/amd64/s_logbf.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_logbf.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(logbf)
+	RETGUARD_START
 	XMM_ONE_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	fxtract
 	fstp	%st
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(logbf)
Index: lib/libm/arch/amd64/s_lrint.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_lrint.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 s_lrint.S
--- lib/libm/arch/amd64/s_lrint.S	12 Sep 2016 19:47:01 -0000	1.2
+++ lib/libm/arch/amd64/s_lrint.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,8 @@
 #include <machine/asm.h>
 ENTRY(lrint)
+	RETGUARD_START
 	cvtsd2si %xmm0, %rax
+	RETGUARD_END
 	ret
 END(lrint)
Index: lib/libm/arch/amd64/s_lrintf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_lrintf.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 s_lrintf.S
--- lib/libm/arch/amd64/s_lrintf.S	12 Sep 2016 19:47:01 -0000	1.2
+++ lib/libm/arch/amd64/s_lrintf.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,8 @@
 #include <machine/asm.h>
 ENTRY(lrintf)
+	RETGUARD_START
 	cvtss2si %xmm0, %rax
+	RETGUARD_END
 	ret
 END(lrintf)
Index: lib/libm/arch/amd64/s_rint.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_rint.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_rint.S
--- lib/libm/arch/amd64/s_rint.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_rint.S	18 Aug 2017 02:28:21 -0000
@@ -9,9 +9,11 @@
 #include "abi.h"
 ENTRY(rint)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	frndint
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(rint)
Index: lib/libm/arch/amd64/s_rintf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_rintf.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_rintf.S
--- lib/libm/arch/amd64/s_rintf.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_rintf.S	18 Aug 2017 02:28:21 -0000
@@ -9,9 +9,11 @@
 #include "abi.h"
 ENTRY(rintf)
+	RETGUARD_START
 	XMM_ONE_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	frndint
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(rintf)
Index: lib/libm/arch/amd64/s_scalbnf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_scalbnf.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 s_scalbnf.S
--- lib/libm/arch/amd64/s_scalbnf.S	12 Sep 2016 19:47:01 -0000	1.5
+++ lib/libm/arch/amd64/s_scalbnf.S	18 Aug 2017 02:28:21 -0000
@@ -12,6 +12,7 @@
 	ldexpf = scalbnf
 ENTRY(scalbnf)
+	RETGUARD_START
 	movss	%xmm0,-8(%rsp)
 	movl	%edi,-4(%rsp)
 	fildl	-4(%rsp)
@@ -20,5 +21,6 @@ ENTRY(scalbnf)
 	fstp	%st(1)		/* bug fix for fp stack overflow */
 	fstps	-8(%rsp)
 	movss	-8(%rsp),%xmm0
+	RETGUARD_END
 	ret
 END_STD(scalbnf)
Index: lib/libm/arch/amd64/s_significand.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_significand.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_significand.S
--- lib/libm/arch/amd64/s_significand.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_significand.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(significand)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	fxtract
 	fstp	%st(1)
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END(significand)
Index: lib/libm/arch/amd64/s_significandf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_significandf.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_significandf.S
--- lib/libm/arch/amd64/s_significandf.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_significandf.S	18 Aug 2017 02:28:21 -0000
@@ -9,10 +9,12 @@
 #include "abi.h"
 ENTRY(significandf)
+	RETGUARD_START
 	XMM_ONE_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	fxtract
 	fstp	%st(1)
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END(significandf)
Index: lib/libm/arch/amd64/s_sin.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_sin.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_sin.S
--- lib/libm/arch/amd64/s_sin.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_sin.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 #include "abi.h"
 ENTRY(sin)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	fsin
@@ -16,6 +17,7 @@ ENTRY(sin)
 	andw	$0x400,%ax
 	jnz	1f
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 1:	fldpi
 	fadd	%st(0)
@@ -27,5 +29,6 @@ ENTRY(sin)
 	fstp	%st(1)
 	fsin
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(sin)
Index: lib/libm/arch/amd64/s_sinf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_sinf.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_sinf.S
--- lib/libm/arch/amd64/s_sinf.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_sinf.S	18 Aug 2017 02:28:21 -0000
@@ -10,9 +10,11 @@
 /* A float's domain isn't large enough to require argument reduction. */
 ENTRY(sinf)
+	RETGUARD_START
 	XMM_ONE_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	fsin
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END_STD(sinf)
Index: lib/libm/arch/amd64/s_tan.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_tan.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_tan.S
--- lib/libm/arch/amd64/s_tan.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_tan.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 #include "abi.h"
 ENTRY(tan)
+	RETGUARD_START
 	XMM_ONE_ARG_DOUBLE_PROLOGUE
 	fldl	ARG_DOUBLE_ONE
 	fptan
@@ -17,6 +18,7 @@ ENTRY(tan)
 	jnz	1f
 	fstp	%st(0)
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 1:	fldpi
 	fadd	%st(0)
@@ -29,5 +31,6 @@ ENTRY(tan)
 	fptan
 	fstp	%st(0)
 	XMM_DOUBLE_EPILOGUE
+	RETGUARD_END
 	ret
 END(tan)
Index: lib/libm/arch/amd64/s_tanf.S
===================================================================
RCS file: /cvs/src/lib/libm/arch/amd64/s_tanf.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 s_tanf.S
--- lib/libm/arch/amd64/s_tanf.S	12 Sep 2016 19:47:01 -0000	1.3
+++ lib/libm/arch/amd64/s_tanf.S	18 Aug 2017 02:28:21 -0000
@@ -10,10 +10,12 @@
 /* A float's domain isn't large enough to require argument reduction. */
 ENTRY(tanf)
+	RETGUARD_START
 	XMM_ONE_ARG_FLOAT_PROLOGUE
 	flds	ARG_FLOAT_ONE
 	fptan
 	fstp	%st(0)
 	XMM_FLOAT_EPILOGUE
+	RETGUARD_END
 	ret
 END(tanf)
Index: lib/libm/arch/i387/DEFS.h
===================================================================
RCS file: /cvs/src/lib/libm/arch/i387/DEFS.h,v
retrieving revision 1.1
diff -u -p -u -r1.1 DEFS.h
--- lib/libm/arch/i387/DEFS.h	12 Sep 2016 19:47:02 -0000	1.1
+++ lib/libm/arch/i387/DEFS.h	18 Aug 2017 18:02:51 -0000
@@ -25,5 +25,5 @@
  *   END_STD(x)	Like DEF_STD() in C; for standard/reserved C names
  *   END_NONSTD(x)	Like DEF_NONSTD() in C; for non-ISO C names
  */
-#define	END_STD(x)	END(x); _HIDDEN_FALIAS(x,x); END(_HIDDEN(x))
+#define	END_STD(x)	END(x); _HIDDEN_FALIAS(x,x); _ASM_SIZE(_HIDDEN(x))
 #define	END_NONSTD(x)	END_STD(x); .weak x
Index: libexec/ld.so/amd64/ldasm.S
===================================================================
RCS file: /cvs/src/libexec/ld.so/amd64/ldasm.S,v
retrieving revision 1.27
diff -u -p -u -r1.27 ldasm.S
--- libexec/ld.so/amd64/ldasm.S	15 Aug 2017 00:26:02 -0000	1.27
+++ libexec/ld.so/amd64/ldasm.S	18 Aug 2017 02:28:21 -0000
@@ -75,11 +75,15 @@ _dl_start:
 	.type	__CONCAT(_dl_,n), @function	;\
 	.align	16,0xcc				;\
 __CONCAT(_dl_,n):				;\
+	.cfi_startproc				;\
+	RETGUARD_START				;\
 	movl	$(__CONCAT(SYS_,c)), %eax	;\
 	movq	%rcx, %r10			;\
 	syscall					;\
 	jb	1f				;\
-	ret
+	RETGUARD_END				;\
+	ret					;\
+	.cfi_endproc
 DL_SYSCALL(open)
 DL_SYSCALL(fstat)
@@ -106,6 +110,7 @@ DL_SYSCALL(thrkill)
 1:
 	/* error: result = -errno; - handled here. */
 	neg	%rax
+	RETGUARD_END
 	ret
@@ -114,6 +119,8 @@ DL_SYSCALL(thrkill)
 	.type _dl_bind_start,@function
 _dl_bind_start:
 	.cfi_startproc
+	.cfi_escape 0x16, 0x10, 0x06, 0x09, 0xf8, 0x22, 0x12, 0x06, 0x27
+	xorq	%rsp,16(%rsp)		# RETGUARD_START, sort of
 	.cfi_adjust_cfa_offset	16
 	pushfq				# save registers
 	.cfi_adjust_cfa_offset	8
@@ -182,6 +189,7 @@ _dl_bind_start:
 	.cfi_adjust_cfa_offset	-8
 	/*.cfi_restore	%rflags */
+	xorq	%rsp,16(%rsp)		# RETGUARD, sort of
 	leaq	8(%rsp),%rsp		# Discard reloff, do not change eflags
 	.cfi_adjust_cfa_offset	-8
 	ret
Index: libexec/ld.so/i386/ldasm.S
===================================================================
RCS file: /cvs/src/libexec/ld.so/i386/ldasm.S,v
retrieving revision 1.31
diff -u -p -u -r1.31 ldasm.S
--- libexec/ld.so/i386/ldasm.S	16 Aug 2017 19:48:49 -0000	1.31
+++ libexec/ld.so/i386/ldasm.S	18 Aug 2017 02:28:21 -0000
@@ -92,9 +92,13 @@ _dl_start:
 	.global		__CONCAT(_dl_,n)			;\
 	.type		__CONCAT(_dl_,n),@function		;\
 __CONCAT(_dl_,n):						;\
+	.cfi_startproc						;\
+	RETGUARD_START						;\
 	__DO_SYSCALL(c)						;\
 	jb	.L_cerr						;\
-	ret
+	RETGUARD_END						;\
+	ret							;\
+	.cfi_endproc
 DL_SYSCALL(open)
 DL_SYSCALL(fstat)
@@ -121,6 +125,7 @@ DL_SYSCALL(thrkill)
 .L_cerr:
 	/* error: result = -errno; - handled here. */
 	neg	%eax
+	RETGUARD_END
 	ret
 	.align 16,0xcc
Index: sys/arch/amd64/amd64/acpi_wakecode.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/acpi_wakecode.S,v
retrieving revision 1.40
diff -u -p -u -r1.40 acpi_wakecode.S
--- sys/arch/amd64/amd64/acpi_wakecode.S	28 Jun 2017 07:16:58 -0000	1.40
+++ sys/arch/amd64/amd64/acpi_wakecode.S	18 Aug 2017 02:28:21 -0000
@@ -393,6 +393,7 @@ _ACPI_TRMP_OFFSET(.Lhibernate_resume_vec
 	/* Jump to the S3 resume vector */
 	ljmp	$(_ACPI_RM_CODE_SEG), $.Lacpi_s3_vector_real
+NEND(hibernate_resume_machdep)
 NENTRY(hibernate_drop_to_real_mode)
 	.code64
@@ -431,10 +432,12 @@ _ACPI_TRMP_OFFSET(.Lhibernate_resume_vec
 _ACPI_TRMP_OFFSET(.Lhib_hlt_real)
 	hlt
 	ljmp	$(_ACPI_RM_CODE_SEG), $.Lhib_hlt_real
+NEND(hibernate_drop_to_real_mode)
 	.code64
 	/* Switch to hibernate resume pagetable */
 NENTRY(hibernate_activate_resume_pt_machdep)
+	RETGUARD_START
 	/* Enable large pages */
 	movq	%cr4, %rax
 	orq	$(CR4_PSE), %rax
@@ -449,23 +452,31 @@ NENTRY(hibernate_activate_resume_pt_mach
 	jmp	1f
 1:	nop
+	RETGUARD_END
 	ret
+NEND(hibernate_activate_resume_pt_machdep)
 	/*
 	 * Switch to the private resume-time hibernate stack
 	 */
 NENTRY(hibernate_switch_stack_machdep)
+	xorl	$(HIBERNATE_STACK_PAGE + HIBERNATE_STACK_OFFSET),(%rsp)	# RETGUARD
 	movq	(%rsp), %rax
 	movq    %rax, HIBERNATE_STACK_PAGE + HIBERNATE_STACK_OFFSET
 	movq    $(HIBERNATE_STACK_PAGE + HIBERNATE_STACK_OFFSET), %rax
 	movq    %rax, %rsp
 	/* On our own stack from here onward */
+	RETGUARD_END
 	ret
+NEND(hibernate_switch_stack_machdep)
 NENTRY(hibernate_flush)
+	RETGUARD_START
 	invlpg	HIBERNATE_INFLATE_PAGE
+	RETGUARD_END
 	ret
+NEND(hibernate_flush)
 #endif /* HIBERNATE */
 	/*
@@ -662,6 +673,7 @@ _C_LABEL(acpi_tramp_data_end):
 	.code64
 NENTRY(acpi_savecpu)
 	movq	(%rsp), %rax
+	RETGUARD_START		# 2nd instruction
 	movq	%rax, .Lacpi_saved_ret
 	movq	%rbx, .Lacpi_saved_rbx
@@ -752,4 +764,6 @@ NENTRY(acpi_savecpu)
 	str	.Lacpi_saved_tr
 	movl	$1, %eax
+	RETGUARD_END
 	ret
+NEND(acpi_savecpu)
Index: sys/arch/amd64/amd64/aes_intel.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/aes_intel.S,v
retrieving revision 1.9
diff -u -p -u -r1.9 aes_intel.S
--- sys/arch/amd64/amd64/aes_intel.S	26 Mar 2013 15:47:01 -0000	1.9
+++ sys/arch/amd64/amd64/aes_intel.S	18 Aug 2017 02:28:21 -0000
@@ -106,6 +106,8 @@
 _key_expansion_128:
 _key_expansion_256a:
+	.cfi_startproc
+	RETGUARD_START
 	pshufd	$0b11111111,%xmm1,%xmm1
 	shufps	$0b00010000,%xmm0,%xmm4
 	pxor	%xmm4,%xmm0
@@ -114,9 +116,13 @@ _key_expansion_256a:
 	pxor	%xmm1,%xmm0
 	movaps	%xmm0,(%rcx)
 	add	$0x10,%rcx
+	RETGUARD_END
 	ret
+	.cfi_endproc
 _key_expansion_192a:
+	.cfi_startproc
+	RETGUARD_START
 	pshufd	$0b01010101,%xmm1,%xmm1
 	shufps	$0b00010000,%xmm0,%xmm4
 	pxor	%xmm4,%xmm0
@@ -137,9 +143,13 @@ _key_expansion_192a:
 	shufps	$0b01001110,%xmm2,%xmm1
 	movaps	%xmm1,16(%rcx)
 	add	$0x20,%rcx
+	RETGUARD_END
 	ret
+	.cfi_endproc
 _key_expansion_192b:
+	.cfi_startproc
+	RETGUARD_START
 	pshufd	$0b01010101,%xmm1,%xmm1
 	shufps	$0b00010000,%xmm0,%xmm4
 	pxor	%xmm4,%xmm0
@@ -155,9 +165,13 @@ _key_expansion_192b:
 	movaps	%xmm0,(%rcx)
 	add	$0x10,%rcx
+	RETGUARD_END
 	ret
+	.cfi_endproc
 _key_expansion_256b:
+	.cfi_startproc
+	RETGUARD_START
 	pshufd	$0b10101010,%xmm1,%xmm1
 	shufps	$0b00010000,%xmm2,%xmm4
 	pxor	%xmm4,%xmm2
@@ -166,12 +180,14 @@ _key_expansion_256b:
 	pxor	%xmm1,%xmm2
 	movaps	%xmm2,(%rcx)
 	add	$0x10,%rcx
+	RETGUARD_END
 	ret
-
+	.cfi_endproc
 /*
  * void aesni_set_key(struct aesni_session *ses, uint8_t *key, size_t len)
  */
 ENTRY(aesni_set_key)
+	RETGUARD_START
 	movups	(%rsi),%xmm0		# user key (first 16 bytes)
 	movaps	%xmm0,(%rdi)
 	lea	0x10(%rdi),%rcx		# key addr
@@ -267,17 +283,22 @@ ENTRY(aesni_set_key)
 	sub	$0x10,%rsi
 	cmp	%rcx,%rdi
 	jb	4b
+	RETGUARD_END
 	ret
+END(aesni_set_key)
 /*
  * void aesni_enc(struct aesni_session *ses, uint8_t *dst, uint8_t *src)
  */
 ENTRY(aesni_enc)
+	RETGUARD_START
 	movl	480(KEYP),KLEN		# key length
 	movups	(INP),STATE		# input
 	call	_aesni_enc1
 	movups	STATE,(OUTP)		# output
+	RETGUARD_END
 	ret
+END(aesni_enc)
 /*
  * _aesni_enc1:		internal ABI
@@ -292,6 +313,8 @@ ENTRY(aesni_enc)
  *	TKEYP (T1)
  */
 _aesni_enc1:
+	.cfi_startproc
+	RETGUARD_START
 	movaps	(KEYP),KEY		# key
 	mov	KEYP,TKEYP
 	pxor	KEY,STATE		# round 0
@@ -333,7 +356,9 @@ _aesni_enc1:
 	aesenc	KEY,STATE
 	movaps	0x70(TKEYP),KEY
 	aesenclast KEY,STATE
+	RETGUARD_END
 	ret
+	.cfi_endproc
 /*
  * _aesni_enc4:	internal ABI
@@ -354,6 +379,8 @@ _aesni_enc1:
  *	TKEYP (T1)
  */
 _aesni_enc4:
+	.cfi_startproc
+	RETGUARD_START
 	movaps	(KEYP),KEY		# key
 	mov	KEYP,TKEYP
 	pxor	KEY,STATE1		# round 0
@@ -440,18 +467,23 @@ _aesni_enc4:
 	aesenclast KEY,STATE2
 	aesenclast KEY,STATE3
 	aesenclast KEY,STATE4
+	RETGUARD_END
 	ret
+	.cfi_endproc
 /*
  * void aesni_dec(struct aesni_session *ses, uint8_t *dst, uint8_t *src)
  */
 ENTRY(aesni_dec)
+	RETGUARD_START
 	mov	480(KEYP),KLEN		# key length
 	add	$240,KEYP
 	movups	(INP),STATE		# input
 	call	_aesni_dec1
 	movups	STATE,(OUTP)		# output
+	RETGUARD_END
 	ret
+END(aesni_dec)
 /*
  * _aesni_dec1:		internal ABI
@@ -466,6 +498,8 @@ ENTRY(aesni_dec)
  *	TKEYP (T1)
  */
 _aesni_dec1:
+	.cfi_startproc
+	RETGUARD_START
 	movaps	(KEYP),KEY		# key
 	mov	KEYP,TKEYP
 	pxor	KEY,STATE		# round 0
@@ -507,7 +541,9 @@ _aesni_dec1:
 	aesdec	KEY,STATE
 	movaps	0x70(TKEYP),KEY
 	aesdeclast KEY,STATE
+	RETGUARD_END
 	ret
+	.cfi_endproc
 /*
  * _aesni_dec4:	internal ABI
@@ -528,6 +564,8 @@ _aesni_dec1:
  *	TKEYP (T1)
  */
 _aesni_dec4:
+	.cfi_startproc
+	RETGUARD_START
 	movaps	(KEYP),KEY		# key
 	mov	KEYP,TKEYP
 	pxor	KEY,STATE1		# round 0
@@ -614,7 +652,9 @@ _aesni_dec4:
 	aesdeclast KEY,STATE2
 	aesdeclast KEY,STATE3
 	aesdeclast KEY,STATE4
+	RETGUARD_END
 	ret
+	.cfi_endproc
 #if 0
 /*
@@ -622,6 +662,7 @@ _aesni_dec4:
  *     size_t len)
  */
 ENTRY(aesni_ecb_enc)
+	RETGUARD_START
 	test	LEN,LEN			# check length
 	jz	3f
 	mov	480(KEYP),KLEN
@@ -658,13 +699,16 @@ ENTRY(aesni_ecb_enc)
 	cmp	$16,LEN
 	jge	2b
 3:
+	RETGUARD_END
 	ret
+END(aesni_ecb_enc)
 /*
  * void aesni_ecb_dec(struct aesni_session *ses, uint8_t *dst, uint8_t *src,
  *     size_t len);
  */
 ENTRY(aesni_ecb_dec)
+	RETGUARD_START
 	test	LEN,LEN
 	jz	3f
 	mov	480(KEYP),KLEN
@@ -702,7 +746,9 @@ ENTRY(aesni_ecb_dec)
 	cmp	$16,LEN
 	jge	2b
 3:
+	RETGUARD_END
 	ret
+END(aesni_ecb_dec)
 #endif
 /*
@@ -710,6 +756,7 @@ ENTRY(aesni_ecb_dec)
  *     size_t len, uint8_t *iv)
  */
 ENTRY(aesni_cbc_enc)
+	RETGUARD_START
 	cmp	$16,LEN
 	jb	2f
 	mov	480(KEYP),KLEN
@@ -727,13 +774,16 @@ ENTRY(aesni_cbc_enc)
 	jge	1b
 	movups	STATE,(IVP)
 2:
+	RETGUARD_END
 	ret
+END(aesni_cbc_enc)
 /*
  * void aesni_cbc_dec(struct aesni_session *ses, uint8_t *dst, uint8_t *src,
  *     size_t len, uint8_t *iv)
  */
 ENTRY(aesni_cbc_dec)
+	RETGUARD_START
 	cmp	$16,LEN
 	jb	4f
 	mov	480(KEYP),KLEN
@@ -784,7 +834,9 @@ ENTRY(aesni_cbc_dec)
 3:
 	movups	IV,(IVP)
 4:
+	RETGUARD_END
 	ret
+END(aesni_cbc_dec)
 /*
  * _aesni_inc_init:	internal ABI
@@ -799,6 +851,8 @@ ENTRY(aesni_cbc_dec)
  *	BSWAP_MASK	== endian swapping mask
  */
 _aesni_inc_init:
+	.cfi_startproc
+	RETGUARD_START
 	movdqa	CTR,IV
 	pslldq	$8,IV
 	movdqu	.Lbswap_mask,BSWAP_MASK
@@ -806,7 +860,9 @@ _aesni_inc_init:
 	mov	$1,TCTR_LOW
 	movd	TCTR_LOW,INC
 	movd	CTR,TCTR_LOW
+	RETGUARD_END
 	ret
+	.cfi_endproc
 /*
  * _aesni_inc:		internal ABI
@@ -824,6 +880,8 @@ _aesni_inc_init:
  *	TCTR_LOW:	== lower dword of CTR
  */
 _aesni_inc:
+	.cfi_startproc
+	RETGUARD_START
 	paddq	INC,CTR
 	add	$1,TCTR_LOW
 	jnc	1f
@@ -833,13 +891,16 @@ _aesni_inc:
 1:
 	movaps	CTR,IV
 	pshufb	BSWAP_MASK,IV
+	RETGUARD_END
 	ret
+	.cfi_endproc
 /*
  * void aesni_ctr_enc(struct aesni_session *ses, uint8_t *dst, uint8_t *src,
  *     size_t len, uint8_t *icb)
  */
 ENTRY(aesni_ctr_enc)
+	RETGUARD_START
 	cmp	$16,LEN
 	jb	4f
 	mov	480(KEYP),KLEN
@@ -893,9 +954,13 @@ ENTRY(aesni_ctr_enc)
 3:
 	movq	IV,(IVP)
 4:
+	RETGUARD_END
 	ret
+END(aesni_ctr_enc)
 _aesni_gmac_gfmul:
+	.cfi_startproc
+	RETGUARD_START
 	movdqa	%xmm0,%xmm3
 	pclmulqdq $0x00,%xmm1,%xmm3	# xmm3 holds a0*b0
 	movdqa	%xmm0,%xmm4
@@ -959,12 +1024,15 @@ _aesni_gmac_gfmul:
 	pxor	%xmm8,%xmm2
 	pxor	%xmm2,%xmm3
 	pxor	%xmm3,%xmm6		# the result is in xmm6
+	RETGUARD_END
 	ret
+	.cfi_endproc
 /*
  * void aesni_gmac_update(GHASH_CTX *ghash, uint8_t *src, size_t len)
  */
 ENTRY(aesni_gmac_update)
+	RETGUARD_START
 	cmp	$16,%rdx
 	jb	2f
@@ -990,26 +1058,32 @@ ENTRY(aesni_gmac_update)
 	movdqu	%xmm6,16(%rdi)
 	movdqu	%xmm6,32(%rdi)
 2:
+	RETGUARD_END
 	ret
+END(aesni_gmac_update)
 /*
  * void aesni_gmac_final(struct aesni_sess *ses, uint8_t *tag,
  *     uint8_t *icb, uint8_t *hashstate)
  */
 ENTRY(aesni_gmac_final)
+	RETGUARD_START
 	movl	480(KEYP),KLEN		# key length
 	movdqu	(INP),STATE		# icb
 	call	_aesni_enc1
 	movdqu	(HSTATE),IN
 	pxor	IN,STATE
 	movdqu	STATE,(OUTP)		# output
+	RETGUARD_END
 	ret
+END(aesni_gmac_final)
 /*
  * void aesni_xts_enc(struct aesni_xts_ctx *xts, uint8_t *dst, uint8_t *src,
  *    size_t len, uint8_t *iv)
  */
 ENTRY(aesni_xts_enc)
+	RETGUARD_START
 	cmp	$16,%rcx
 	jb	2f
@@ -1031,13 +1105,16 @@ ENTRY(aesni_xts_enc)
 	cmp	$16,%rcx
 	jge	1b
 2:
+	RETGUARD_END
 	ret
+END(aesni_xts_enc)
 /*
  * void aesni_xts_dec(struct aesni_xts_ctx *xts, uint8_t *dst, uint8_t *src,
  *    size_t len, uint8_t *iv)
  */
 ENTRY(aesni_xts_dec)
+	RETGUARD_START
 	cmp	$16,%rcx
 	jb	2f
@@ -1060,7 +1137,9 @@ ENTRY(aesni_xts_dec)
 	cmp	$16,%rcx
 	jge	1b
 2:
+	RETGUARD_END
 	ret
+END(aesni_xts_dec)
 /*
  * Prepare tweak as E_k2(IV). IV is specified as LE representation of a
@@ -1070,6 +1149,8 @@ ENTRY(aesni_xts_dec)
  * xts is in %rdi, iv is in %r8 and we return the tweak in %xmm3.
  */
 _aesni_xts_tweak:
+	.cfi_startproc
+	RETGUARD_START
 	mov	(%r8),%r10
 	movd	%r10,%xmm0		# Last 64-bits of IV are always zero.
 	mov	KEYP,%r11
@@ -1078,12 +1159,16 @@ _aesni_xts_tweak:
 	call	_aesni_enc1
 	movdqa	%xmm0,%xmm3
 	mov	%r11,KEYP
+	RETGUARD_END
 	ret
+	.cfi_endproc
 /*
  * Exponentiate AES XTS tweak (in %xmm3).
  */
 _aesni_xts_tweak_exp:
+	.cfi_startproc
+	RETGUARD_START
 	pextrw	$7,%xmm3,%r10
 	pextrw	$3,%xmm3,%r11
 	psllq	$1,%xmm3		# Left shift.
@@ -1101,4 +1186,6 @@ _aesni_xts_tweak_exp:
 	xor	$0x87,%r11		# AES XTS alpha - GF(2^128).
 	pinsrw	$0,%r11,%xmm3
 2:
+	RETGUARD_END
 	ret
+	.cfi_endproc
Index: sys/arch/amd64/amd64/copy.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/copy.S,v
retrieving revision 1.8
diff -u -p -u -r1.8 copy.S
--- sys/arch/amd64/amd64/copy.S	12 May 2017 19:25:19 -0000	1.8
+++ sys/arch/amd64/amd64/copy.S	18 Aug 2017 02:28:21 -0000
@@ -69,6 +69,7 @@
  */
 ENTRY(kcopy)
+	RETGUARD_START
 	movq	CPUVAR(CURPCB),%rax
 	pushq	PCB_ONFAULT(%rax)
 	leaq	_C_LABEL(copy_fault)(%rip),%r11
@@ -93,6 +94,7 @@ ENTRY(kcopy)
 	movq	CPUVAR(CURPCB),%rdx
 	popq	PCB_ONFAULT(%rdx)
 	xorq	%rax,%rax
+	RETGUARD_END
 	ret
 1:	addq	%rcx,%rdi		# copy backward
@@ -114,9 +116,12 @@ ENTRY(kcopy)
 	movq	CPUVAR(CURPCB),%rdx
 	popq	PCB_ONFAULT(%rdx)
 	xorq	%rax,%rax
+	RETGUARD_END
 	ret
+END(kcopy)
 ENTRY(copyout)
+	RETGUARD_START
 	pushq	$0
 	xchgq	%rdi,%rsi
@@ -149,9 +154,12 @@ ENTRY(copyout)
 	SMAP_CLAC
 	popq	PCB_ONFAULT(%rdx)
 	xorl	%eax,%eax
+	RETGUARD_END
 	ret
+END(copyout)
 ENTRY(copyin)
+	RETGUARD_START
 	movq	CPUVAR(CURPCB),%rax
 	pushq	$0
 	leaq	_C_LABEL(copy_fault)(%rip),%r11
@@ -186,18 +194,24 @@ ENTRY(copyin)
 	movq	CPUVAR(CURPCB),%rdx
 	popq	PCB_ONFAULT(%rdx)
 	xorl	%eax,%eax
+	RETGUARD_END
 	ret
+END(copyin)
 NENTRY(copy_efault)
 	movq	$EFAULT,%rax
+NEND(copy_efault)
 NENTRY(copy_fault)
 	SMAP_CLAC
 	movq	CPUVAR(CURPCB),%rdx
 	popq	PCB_ONFAULT(%rdx)
+	RETGUARD_END
 	ret
+NEND(copy_fault)
 ENTRY(copyoutstr)
+	RETGUARD_START
 	xchgq	%rdi,%rsi
 	movq	%rdx,%r8
 	movq	%rcx,%r9
@@ -237,8 +251,10 @@ ENTRY(copyoutstr)
 	jae	_C_LABEL(copystr_efault)
 	movq	$ENAMETOOLONG,%rax
 	jmp	copystr_return
+END(copyoutstr)
 ENTRY(copyinstr)
+	RETGUARD_START
 	xchgq	%rdi,%rsi
 	movq	%rdx,%r8
 	movq	%rcx,%r9
@@ -279,9 +295,11 @@ ENTRY(copyinstr)
 	jae	_C_LABEL(copystr_efault)
 	movq	$ENAMETOOLONG,%rax
 	jmp	copystr_return
+END(copyinstr)
-ENTRY(copystr_efault)
+NENTRY(copystr_efault)
 	movl	$EFAULT,%eax
+NEND(copystr_efault)
 ENTRY(copystr_fault)
 copystr_return:
@@ -294,9 +312,12 @@ copystr_return:
 	subq	%rdx,%r8
 	movq	%r8,(%r9)
-8:	ret
+8:	RETGUARD_END
+	ret
+END(copystr_fault)
 ENTRY(copystr)
+	RETGUARD_START
 	xchgq	%rdi,%rsi
 	movq	%rdx,%r8
@@ -323,7 +344,9 @@ ENTRY(copystr)
 	subq	%rdx,%r8
 	movq	%r8,(%rcx)
-7:	ret
+7:	RETGUARD_END
+	ret
+END(copystr)
 .globl _C_LABEL(_stac)
 _C_LABEL(_stac):
Index: sys/arch/amd64/amd64/db_trace.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/db_trace.c,v
retrieving revision 1.34
diff -u -p -u -r1.34 db_trace.c
--- sys/arch/amd64/amd64/db_trace.c	14 Aug 2017 16:32:37 -0000	1.34
+++ sys/arch/amd64/amd64/db_trace.c	18 Aug 2017 05:34:10 -0000
@@ -73,6 +73,14 @@ struct db_variable * db_eregs = db_regs 
  */
 #define	INKERNEL(va)	(((vaddr_t)(va)) >= VM_MIN_KERNEL_ADDRESS)
+/* Kernel uses xor %esp,(%rsp) for RETGUARD/-fret-protector */
+#if defined(PROF) || defined(GPROF)
+# define GETPC(frame) (db_get_value((db_addr_t)&frame->f_retaddr, 8, FALSE))
+#else
+# define GETPC(frame) (db_get_value((db_addr_t)&frame->f_retaddr, 8, FALSE)\
+		^ (unsigned int)&(frame->f_retaddr))
+#endif
+
 #define	NONE		0
 #define	TRAP		1
 #define	SYSCALL		2
@@ -111,8 +119,7 @@ db_nextframe(struct callframe **fp, db_a
 	switch (is_trap) {
 	    case NONE:
-		*ip = (db_addr_t)
-			db_get_value((db_addr_t)&(*fp)->f_retaddr, 8, FALSE);
+		*ip = (db_addr_t)GETPC((*fp));
 		*fp = (struct callframe *)
 			db_get_value((db_addr_t)&(*fp)->f_frame, 8, FALSE);
 		break;
@@ -211,8 +218,7 @@ db_stack_trace_print(db_expr_t addr, boo
 		} else {
 			frame = (struct callframe *)addr;
 		}
-		callpc = (db_addr_t)
-			 db_get_value((db_addr_t)&frame->f_retaddr, 8, FALSE);
+		callpc = (db_addr_t)GETPC(frame);
 		frame = (struct callframe *)frame->f_frame;
 	}
@@ -286,9 +292,7 @@ db_stack_trace_print(db_expr_t addr, boo
 		if (lastframe == 0 && offset == 0 && !have_addr && !is_trap) {
 			/* Frame really belongs to next callpc */
 			lastframe = (struct callframe *)(ddb_regs.tf_rsp-8);
-			callpc = (db_addr_t)
-				 db_get_value((db_addr_t)&lastframe->f_retaddr,
-				    8, FALSE);
+			callpc = (db_addr_t)GETPC(lastframe);
 			continue;
 		}
@@ -350,7 +354,7 @@ db_save_stack_trace(struct db_stack_trac
 	frame = __builtin_frame_address(0);
-	callpc = db_get_value((db_addr_t)&frame->f_retaddr, 8, FALSE);
+	callpc = GETPC(frame);
 	frame = frame->f_frame;
 	lastframe = NULL;
@@ -372,7 +376,7 @@ db_save_stack_trace(struct db_stack_trac
 		if (is_trap == NONE) {
 			lastframe = frame;
-			callpc = frame->f_retaddr;
+			callpc = GETPC(frame);
 			frame = frame->f_frame;
 		} else {
 			if (is_trap == INTERRUPT) {
@@ -409,7 +413,7 @@ db_get_pc(struct trapframe *tf)
 {
 	struct callframe *cf = (struct callframe *)(tf->tf_rsp - sizeof(long));
-	return db_get_value((db_addr_t)&cf->f_retaddr, sizeof(long), 0);
+	return GETPC(cf);
 }
 vaddr_t
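A note on the db_trace.c change above: the GETPC() macro has to undo the guard before ddb can print a usable pc. The following is only a sketch of that idea in plain C, not code from the tree. struct model_frame, model_guard() and model_walk() are hypothetical names, and the sketch assumes the kernel flavour of the guard (low 32 bits of the slot address XORed into the stored return address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct callframe: saved frame pointer, then pc. */
struct model_frame {
	struct model_frame *f_frame;
	uint64_t f_retaddr;
};

/*
 * XOR a value with the low 32 bits of the address of its slot.
 * Applying it twice with the same slot gives the original value back.
 */
static uint64_t
model_guard(uint64_t value, const uint64_t *slot)
{
	return value ^ (uint64_t)(uint32_t)(uintptr_t)slot;
}

/* Walk the frame chain, decoding each return address like GETPC() does. */
static size_t
model_walk(const struct model_frame *frame, uint64_t *pcs, size_t max)
{
	size_t n = 0;

	while (frame != NULL && n < max) {
		pcs[n++] = model_guard(frame->f_retaddr, &frame->f_retaddr);
		frame = frame->f_frame;
	}
	return n;
}
```

Because the key is the address of the slot itself, every frame decodes with a different key, which is why each call site in the diff passes the frame pointer to GETPC() rather than a global value.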
Index: sys/arch/amd64/amd64/locore.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/locore.S,v
retrieving revision 1.87
diff -u -p -u -r1.87 locore.S
--- sys/arch/amd64/amd64/locore.S	6 Jul 2017 06:17:04 -0000	1.87
+++ sys/arch/amd64/amd64/locore.S	18 Aug 2017 02:28:21 -0000
@@ -282,6 +282,7 @@ NENTRY(lgdt)
 	pushq	$GSEL(GCODE_SEL, SEL_KPL)
 	pushq	%rax
 	lretq
+NEND(lgdt)
 ENTRY(setjmp)
 	/*
@@ -301,6 +302,7 @@ ENTRY(setjmp)
 	movq	%rdx,56(%rax)
 	xorl	%eax,%eax
 	ret
+END(setjmp)
 ENTRY(longjmp)
 	movq	%rdi,%rax
@@ -316,6 +318,7 @@ ENTRY(longjmp)
 	xorl	%eax,%eax
 	incl	%eax
 	ret
+END(longjmp)
 /*****************************************************************************/
@@ -324,6 +327,7 @@ ENTRY(longjmp)
  * Switch from "old" proc to "new".
  */
 ENTRY(cpu_switchto)
+	RETGUARD_START
 	pushq	%rbx
 	pushq	%rbp
 	pushq	%r12
@@ -362,7 +366,12 @@ ENTRY(cpu_switchto)
 	btrq	%rdi,PM_CPUS(%rcx)
 	/* Save stack pointers. */
-	movq	%rsp,PCB_RSP(%r13)
+	movq	%rsp,%rax
+	addq	$6*8,%rax
+	xorl	%eax,(%rax)		# RETGUARD
+	subq	$6*8,%rax
+	movq	%rax,PCB_RSP(%r13)
+
 	movq	%rbp,PCB_RBP(%r13)
 switch_exited:
@@ -391,7 +400,12 @@ restore_saved:
 	movq	P_ADDR(%r12),%r13
 	/* Restore stack pointers. */
-	movq	PCB_RSP(%r13),%rsp
+	movq	PCB_RSP(%r13),%rax
+	addq	$6*8,%rax
+	xorl	%eax,(%rax)		# RETGUARD
+	subq	$6*8,%rax
+	movq	%rax,%rsp
+
 	movq	PCB_RBP(%r13),%rbp
 	movq	CPUVAR(TSS),%rcx
@@ -439,34 +453,48 @@ switch_return:
 	popq	%r12
 	popq	%rbp
 	popq	%rbx
+	RETGUARD_END
 	ret
+END(cpu_switchto)
 ENTRY(cpu_idle_enter)
+	RETGUARD_START
 	movq	_C_LABEL(cpu_idle_enter_fcn),%rax
 	cmpq	$0,%rax
 	je	1f
+	RETGUARD_END
 	jmpq	*%rax
 1:
+	RETGUARD_END
 	ret
+END(cpu_idle_enter)
 ENTRY(cpu_idle_cycle)
+	RETGUARD_START
 	movq	_C_LABEL(cpu_idle_cycle_fcn),%rax
 	cmpq	$0,%rax
 	je	1f
 	call	*%rax
+	RETGUARD_END
 	ret
 1:
 	sti
 	hlt
+	RETGUARD_END
 	ret
+END(cpu_idle_cycle)
 ENTRY(cpu_idle_leave)
+	RETGUARD_START
 	movq	_C_LABEL(cpu_idle_leave_fcn),%rax
 	cmpq	$0,%rax
 	je	1f
+	RETGUARD_END
 	jmpq	*%rax
 1:
+	RETGUARD_END
 	ret
+END(cpu_idle_leave)
 	.globl	_C_LABEL(panic)
@@ -475,6 +502,7 @@ NENTRY(switch_pmcpu_set)
 	movabsq	$switch_active,%rdi
 	call	_C_LABEL(panic)
 	/* NOTREACHED */
+NEND(switch_pmcpu_set)
 	.section .rodata
 switch_active:
@@ -486,11 +514,16 @@ switch_active:
  * Update pcb, saving current processor state.
  */
 ENTRY(savectx)
+	RETGUARD_START
 	/* Save stack pointers. */
-	movq	%rsp,PCB_RSP(%rdi)
+	movq	%rsp,%rax
+	xorl	%eax,(%rax)		# undo RETGUARD
+	movq	%rax,PCB_RSP(%rdi)
 	movq	%rbp,PCB_RBP(%rdi)
+	RETGUARD_END
 	ret
+END(savectx)
 IDTVEC(syscall32)
 	sysret		/* go away please */
@@ -614,7 +647,7 @@ NENTRY(proc_trampoline)
 	call	*%r12
 	movq	CPUVAR(CURPROC),%r14
 	jmp	.Lsyscall_check_asts
-
+NEND(proc_trampoline)
 /*
  * Return via iretq, for real interrupts and signal returns
@@ -659,7 +692,7 @@ NENTRY(intr_fast_exit)
 	.globl	_C_LABEL(doreti_iret)
 _C_LABEL(doreti_iret):
 	iretq
-
+NEND(intr_fast_exit)
 #if !defined(GPROF) && defined(DDBPROF)
 .Lprobe_fixup:
@@ -692,6 +725,7 @@ _C_LABEL(doreti_iret):
 #endif /* !defined(GPROF) && defined(DDBPROF) */
 ENTRY(pagezero)
+	RETGUARD_START
 	movq    $-PAGE_SIZE,%rdx
 	subq    %rdx,%rdi
 	xorq    %rax,%rax
@@ -703,7 +737,9 @@ ENTRY(pagezero)
 	addq    $32,%rdx
 	jne     1b
 	sfence
+	RETGUARD_END
 	ret
+END(pagezero)
 #if NXEN > 0
 	/* Hypercall page needs to be page aligned */
Index: sys/arch/amd64/amd64/mutex.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/mutex.S,v
retrieving revision 1.13
diff -u -p -u -r1.13 mutex.S
--- sys/arch/amd64/amd64/mutex.S	29 Jun 2017 17:17:28 -0000	1.13
+++ sys/arch/amd64/amd64/mutex.S	18 Aug 2017 02:28:21 -0000
@@ -39,12 +39,16 @@
  * all the functions in the same place.
  */
 ENTRY(__mtx_init)
+	RETGUARD_START
 	movl	%esi, MTX_WANTIPL(%rdi)
 	movl	$0, MTX_OLDIPL(%rdi)
 	movq	$0, MTX_OWNER(%rdi)
+	RETGUARD_END
 	ret
+END(__mtx_init)
 ENTRY(__mtx_enter)
+	RETGUARD_START
 1:	movl	MTX_WANTIPL(%rdi), %eax
 	movq	CPUVAR(SELF), %rcx
 	movl	CPU_INFO_ILEVEL(%rcx), %edx	# oipl = cpl;
@@ -65,6 +69,7 @@ ENTRY(__mtx_enter)
 #ifdef DIAGNOSTIC
 	incl	CPU_INFO_MUTEX_LEVEL(%rcx)
 #endif
+	RETGUARD_END
 	ret
 	/* We failed to obtain the lock. splx, spin and retry. */
@@ -92,8 +97,10 @@ mtx_lockingself:
 	.asciz	"mtx_enter: locking against myself"
 	.text
 #endif
+END(__mtx_enter)
 ENTRY(__mtx_enter_try)
+	RETGUARD_START
 1:	movl	MTX_WANTIPL(%rdi), %eax
 	movq	CPUVAR(SELF), %rcx
 	movl	CPU_INFO_ILEVEL(%rcx), %edx	# oipl = cpl;
@@ -115,6 +122,7 @@ ENTRY(__mtx_enter_try)
 	incl	CPU_INFO_MUTEX_LEVEL(%rcx)
 #endif
 	movq	$1, %rax
+	RETGUARD_END
 	ret
 	/* We failed to obtain the lock. splx and return 0. */
@@ -128,6 +136,7 @@ ENTRY(__mtx_enter_try)
 	je	3f
 #endif
 	xorq	%rax, %rax
+	RETGUARD_END
 	ret
 #ifdef DIAGNOSTIC
@@ -139,9 +148,10 @@ mtx_lockingtry:
 	.asciz	"mtx_enter_try: locking against myself"
 	.text
 #endif
-
+END(__mtx_enter_try)
 ENTRY(__mtx_leave)
+	RETGUARD_START
 	movq	%rdi, %rax
 #ifdef DIAGNOSTIC
 	movq	CPUVAR(SELF), %rcx
@@ -157,6 +167,7 @@ ENTRY(__mtx_leave)
 	je	1f
 	call	_C_LABEL(spllower)
 1:
+	RETGUARD_END
 	ret
 #ifdef DIAGNOSTIC
@@ -168,3 +179,4 @@ mtx_leave_held:
 	.asciz	"mtx_leave: lock not held"
 	.text
 #endif
+END(__mtx_leave)
Index: sys/arch/amd64/amd64/spl.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/spl.S,v
retrieving revision 1.11
diff -u -p -u -r1.11 spl.S
--- sys/arch/amd64/amd64/spl.S	20 May 2016 14:37:53 -0000	1.11
+++ sys/arch/amd64/amd64/spl.S	18 Aug 2017 02:28:21 -0000
@@ -85,18 +85,24 @@
 	.globl	_C_LABEL(splhigh), _C_LABEL(splx)
 	.align 16, 0xcc
-_C_LABEL(splhigh):
+ENTRY(splhigh)
+	RETGUARD_START
 	movl	$IPL_HIGH,%eax
 	xchgl	%eax,CPUVAR(ILEVEL)
+	RETGUARD_END
 	ret
+END(splhigh)
 	.align 16, 0xcc
-_C_LABEL(splx):
+ENTRY(splx)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	movl	%eax,CPUVAR(ILEVEL)
 	testl	%eax,%eax
 	jnz	_C_LABEL(Xspllower)
+	RETGUARD_END
 	ret
+END(splx)
 #endif /* PROF || GPROF */
 #endif
@@ -115,10 +121,18 @@ _C_LABEL(splx):
  * the sending CPU will never see the that CPU accept the IPI
  */
 IDTVEC(spllower)
+	.cfi_startproc
+	RETGUARD_START
+
 	_PROF_PROLOGUE
 	pushq	%rbx
 	pushq	%r13
 	movl	%edi,%ebx
+
+	movq	%rsp,%rax
+	addq	$16,%rax
+	xorq	%rax,(%rax)
+
 	leaq	1f(%rip),%r13		# address to resume loop at
 1:	movl	%ebx,%eax		# get cpl
 	movq	CPUVAR(IUNMASK)(,%rax,8),%rax
@@ -130,11 +144,17 @@ IDTVEC(spllower)
 	movq	CPUVAR(ISOURCES)(,%rax,8),%rax
 	jmp	*IS_RECURSE(%rax)
 2:
+	movq	%rsp,%rax
+	addq	$2*8,%rax
+	xorq	%rax,(%rax)
+
 	movl	%ebx,CPUVAR(ILEVEL)
 	sti
 	popq	%r13
 	popq	%rbx
+	RETGUARD_END
 	ret
+	.cfi_endproc
 /*
  * Handle return from interrupt after device handler finishes.
Index: sys/arch/amd64/amd64/vector.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/vector.S,v
retrieving revision 1.49
diff -u -p -u -r1.49 vector.S
--- sys/arch/amd64/amd64/vector.S	29 Jun 2017 17:17:28 -0000	1.49
+++ sys/arch/amd64/amd64/vector.S	18 Aug 2017 02:28:21 -0000
@@ -236,6 +236,7 @@ NENTRY(resume_iret)
 	INTR_SAVE_GPRS
 	sti
 	jmp	calltrap
+NEND(resume_iret)
 /*
  * All traps go through here. Call the generic trap handler, and
@@ -296,7 +297,10 @@ calltrap:
 #endif /* DDB */
 	movl	%ebx,CPUVAR(ILEVEL)
 	jmp	2b
+#endif /* DIAGNOSTIC */
+NEND(alltraps)
+#ifdef DIAGNOSTIC
 	.section .rodata
 spl_lowered:
 	.asciz	"WARNING: SPL NOT LOWERED ON TRAP EXIT %x %x\n"
@@ -326,8 +330,8 @@ spl_lowered:
 /* XXX See comment in locore.s */
 #define	XINTR(name,num)		Xintr_##name##num
-	.globl _C_LABEL(x2apic_eoi)
-_C_LABEL(x2apic_eoi):
+NENTRY(x2apic_eoi)
+	RETGUARD_START
 	pushq   %rax
 	pushq   %rcx
 	pushq   %rdx
@@ -338,7 +342,9 @@ _C_LABEL(x2apic_eoi):
 	popq    %rdx
 	popq    %rcx
 	popq    %rax
+	RETGUARD_END
 	ret
+NEND(x2apic_eoi)
 #if NLAPIC > 0
 #ifdef MULTIPROCESSOR
Index: sys/arch/amd64/amd64/vmm_support.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/vmm_support.S,v
retrieving revision 1.9
diff -u -p -u -r1.9 vmm_support.S
--- sys/arch/amd64/amd64/vmm_support.S	30 May 2017 17:49:47 -0000	1.9
+++ sys/arch/amd64/amd64/vmm_support.S	18 Aug 2017 02:28:21 -0000
@@ -28,23 +28,12 @@
 #define VMX_FAIL_LAUNCH_INVALID_VMCS 2
 #define VMX_FAIL_LAUNCH_VALID_VMCS 3
-	.global _C_LABEL(vmxon)
-	.global _C_LABEL(vmxoff)
-	.global _C_LABEL(vmclear)
-	.global _C_LABEL(vmptrld)
-	.global _C_LABEL(vmptrst)
-	.global _C_LABEL(vmwrite)
-	.global _C_LABEL(vmread)
-	.global _C_LABEL(invvpid)
-	.global _C_LABEL(invept)
-	.global _C_LABEL(vmx_enter_guest)
-	.global _C_LABEL(vmm_dispatch_intr)
-	.global _C_LABEL(svm_enter_guest)
-
 	.text
 	.code64
 	.align 16,0xcc
-_C_LABEL(vmm_dispatch_intr):
+
+ENTRY(vmm_dispatch_intr)
+	RETGUARD_START
 	movq	%rsp, %r11	/* r11 = temporary register */
 	andq    $0xFFFFFFFFFFFFFFF0, %rsp
 	movw	%ss, %ax
@@ -55,87 +44,124 @@ _C_LABEL(vmm_dispatch_intr):
 	pushq   %rax
 	cli
 	callq	*%rdi
+	RETGUARD_END
 	ret
+END(vmm_dispatch_intr)
-_C_LABEL(vmxon):
+ENTRY(vmxon)
+	RETGUARD_START
 	vmxon	(%rdi)
 	jz	failed_on
 	jc	failed_on
 	xorq	%rax, %rax
+	RETGUARD_END
 	ret
 failed_on:
 	movq	$0x01, %rax
+	RETGUARD_END
 	ret
+END(vmxon)
-_C_LABEL(vmxoff):
+ENTRY(vmxoff)
+	RETGUARD_START
 	vmxoff
 	jz	failed_off
 	jc	failed_off
 	xorq	%rax, %rax
+	RETGUARD_END
 	ret
 failed_off:
 	movq	$0x01, %rax
+	RETGUARD_END
 	ret
+END(vmxoff)
-_C_LABEL(vmclear):
+ENTRY(vmclear)
+	RETGUARD_START
 	vmclear	(%rdi)
 	jz	failed_clear
 	jc	failed_clear
 	xorq	%rax, %rax
+	RETGUARD_END
 	ret
 failed_clear:
 	movq	$0x01, %rax
+	RETGUARD_END
 	ret
+END(vmclear)
-_C_LABEL(vmptrld):
+ENTRY(vmptrld)
+	RETGUARD_START
 	vmptrld	(%rdi)
 	jz	failed_ptrld
 	jc	failed_ptrld
 	xorq	%rax, %rax
+	RETGUARD_END
 	ret
 failed_ptrld:
 	movq	$0x01, %rax
+	RETGUARD_END
 	ret
+END(vmptrld)
-_C_LABEL(vmptrst):
+ENTRY(vmptrst)
+	RETGUARD_START
 	vmptrst	(%rdi)
 	jz	failed_ptrst
 	jc	failed_ptrst
 	xorq	%rax, %rax
+	RETGUARD_END
 	ret
 failed_ptrst:
 	movq	$0x01, %rax
+	RETGUARD_END
 	ret
+END(vmptrst)
-_C_LABEL(vmwrite):
+ENTRY(vmwrite)
+	RETGUARD_START
 	vmwrite	%rsi, %rdi
 	jz	failed_write
 	jc	failed_write
 	xorq	%rax, %rax
+	RETGUARD_END
 	ret
 failed_write:
 	movq	$0x01, %rax
+	RETGUARD_END
 	ret
+END(vmwrite)
-_C_LABEL(vmread):
+ENTRY(vmread)
+	RETGUARD_START
 	vmread	%rdi, (%rsi)
 	jz	failed_read
 	jc	failed_read
 	xorq	%rax, %rax
+	RETGUARD_END
 	ret
 failed_read:
 	movq	$0x01, %rax
+	RETGUARD_END
 	ret
+END(vmread)
-_C_LABEL(invvpid):
+ENTRY(invvpid)
+	RETGUARD_START
 	invvpid (%rsi), %rdi
+	RETGUARD_END
 	ret
+END(invvpid)
-_C_LABEL(invept):
+ENTRY(invept)
+	RETGUARD_START
 	invept (%rsi), %rdi
+	RETGUARD_END
 	ret
+END(invept)
-_C_LABEL(vmx_enter_guest):
+ENTRY(vmx_enter_guest)
+	RETGUARD_START
 	movq	%rdx, %r8	/* resume flag */
 	testq	%r8, %r8
 	jnz skip_init
@@ -385,9 +411,12 @@ restore_host:
 	popfq
 	movq	%rdi, %rax
+	RETGUARD_END
 	ret	
+END(vmx_enter_guest)
-_C_LABEL(svm_enter_guest):
+ENTRY(svm_enter_guest)
+	RETGUARD_START
 	clgi
 	movq	%rdi, %r8
 	pushfq
@@ -587,4 +616,6 @@ restore_host_svm:
 	movq	%rdi, %rax
 	stgi
+	RETGUARD_END
 	ret	
+END(svm_enter_guest)
Index: sys/arch/amd64/conf/ld.script
===================================================================
RCS file: /cvs/src/sys/arch/amd64/conf/ld.script,v
retrieving revision 1.7
diff -u -p -u -r1.7 ld.script
--- sys/arch/amd64/conf/ld.script	6 Jul 2017 06:21:56 -0000	1.7
+++ sys/arch/amd64/conf/ld.script	18 Aug 2017 02:28:21 -0000
@@ -29,14 +29,6 @@ PHDRS
 	openbsd_randomize PT_OPENBSD_RANDOMIZE;
 }
-/*
- * If we want the text/rodata/data sections aligned on 2M boundaries,
- * we could use the following instead.  Note, file size would increase
- * due to necessary padding.
- *
- *__ALIGN_SIZE = 0x200000;
- */
-__ALIGN_SIZE = 0x1000;
 __kernel_base = 0xffffffff80000000;
 __kernel_virt_base = __kernel_base + 0x1000000;
 __kernel_phys_base = 0x1000000;
@@ -56,7 +48,7 @@ SECTIONS
 	_etext = .;
 	/* Move rodata to the next page, so we can nuke X and W bit on them */
-	. = ALIGN(__ALIGN_SIZE);
+	. = ALIGN(0x1000);
 	__kernel_rodata_phys = (. - __kernel_virt_base) + 0x1000000;
 	.rodata : AT (__kernel_rodata_phys)
 	{
@@ -77,7 +69,7 @@ SECTIONS
 	_erodata = .;
 	/* Move data to the next page, so we can add W bit on them */
-	. = ALIGN(__ALIGN_SIZE);
+	. = ALIGN(0x1000);
 	__kernel_data_phys = (. - __kernel_virt_base) + 0x1000000;
 	.data : AT (__kernel_data_phys)
 	{
Index: sys/arch/amd64/include/asm.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/asm.h,v
retrieving revision 1.8
diff -u -p -u -r1.8 asm.h
--- sys/arch/amd64/include/asm.h	29 Jun 2017 17:36:16 -0000	1.8
+++ sys/arch/amd64/include/asm.h	18 Aug 2017 17:58:23 -0000
@@ -49,6 +49,18 @@
 # define _C_LABEL(x)	x
 #define	_ASM_LABEL(x)	x
+#ifdef _KERNEL	/* 32 bit */
+#define RETGUARD_CFI	.cfi_escape 0x16, 0x10, 0x0d, 0x09, 0xf8, 0x22, 0x12, \
+			0x06, 0x16, 0x0c, 0xff, 0xff, 0xff, 0xff, 0x1a, 0x27
+#define	RETGUARD_START	RETGUARD_CFI; xorl %esp,(%rsp)
+#define	RETGUARD_END	xorl %esp,(%rsp)
+#else
+#define RETGUARD_CFI	.cfi_escape 0x16, 0x10, 0x06, 0x09, 0xf8, 0x22, 0x12, \
+			0x06, 0x27
+#define	RETGUARD_START	RETGUARD_CFI; xorq %rsp,(%rsp)
+#define	RETGUARD_END	xorq %rsp,(%rsp)
+#endif
+
 #define CVAROFF(x,y)		(_C_LABEL(x)+y)(%rip)
 #ifdef __STDC__
@@ -92,10 +104,12 @@
 # define _PROF_PROLOGUE
 #endif
-#define	ENTRY(y)	_ENTRY(_C_LABEL(y)); _PROF_PROLOGUE
-#define	NENTRY(y)	_ENTRY(_C_LABEL(y))
+#define	ENTRY(y)	_ENTRY(_C_LABEL(y)); _PROF_PROLOGUE; .cfi_startproc
+#define	NENTRY(y)	_ENTRY(_C_LABEL(y)); .cfi_startproc
 #define	ASENTRY(y)	_ENTRY(_ASM_LABEL(y)); _PROF_PROLOGUE
-#define	END(y)		.size y, . - y
+#define	_ASM_SIZE(y)	.size y, . - y
+#define	END(y)		.cfi_endproc; _ASM_SIZE(y)
+#define	NEND(y)		.cfi_endproc
 #define	STRONG_ALIAS(alias,sym)						\
 	.global alias;							\
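
A note on the RETGUARD_CFI bytes above, since they are hard to read as raw hex. Going by the DWARF opcode numbering, 0x16 is DW_CFA_val_expression, 0x10 is the return-address column on amd64, and the third byte is the expression length. The rest computes "slot address XOR slot contents", with the slot address masked to 32 bits in the kernel variant. Below is a minimal sketch of an evaluator for just those opcodes, assuming a one-slot simulated memory. The names sim_mem, eval_cfi_expr and demo_unwind, and the addresses used, are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated memory: a single 8-byte return-address slot (hypothetical). */
struct sim_mem {
	uint64_t slot_addr;
	uint64_t slot_value;
};

static uint64_t
sim_deref(const struct sim_mem *m, uint64_t addr)
{
	return (addr == m->slot_addr) ? m->slot_value : 0;
}

/*
 * Evaluate the handful of DWARF opcodes used by RETGUARD_CFI.
 * As for any CFI expression, the CFA is pushed before evaluation.
 * No stack-depth checks: this is an illustration, not an unwinder.
 */
static uint64_t
eval_cfi_expr(const uint8_t *op, size_t len, uint64_t cfa,
    const struct sim_mem *m)
{
	uint64_t st[8], t;
	size_t sp = 0, i = 0;

	st[sp++] = cfa;
	while (i < len) {
		switch (op[i++]) {
		case 0x09:	/* DW_OP_const1s */
			st[sp++] = (uint64_t)(int64_t)(int8_t)op[i++];
			break;
		case 0x0c:	/* DW_OP_const4u, little endian */
			st[sp++] = (uint64_t)op[i] |
			    (uint64_t)op[i + 1] << 8 |
			    (uint64_t)op[i + 2] << 16 |
			    (uint64_t)op[i + 3] << 24;
			i += 4;
			break;
		case 0x06:	/* DW_OP_deref */
			st[sp - 1] = sim_deref(m, st[sp - 1]);
			break;
		case 0x12:	/* DW_OP_dup */
			st[sp] = st[sp - 1];
			sp++;
			break;
		case 0x16:	/* DW_OP_swap */
			t = st[sp - 1];
			st[sp - 1] = st[sp - 2];
			st[sp - 2] = t;
			break;
		case 0x1a:	/* DW_OP_and */
			sp--;
			st[sp - 1] &= st[sp];
			break;
		case 0x22:	/* DW_OP_plus */
			sp--;
			st[sp - 1] += st[sp];
			break;
		case 0x27:	/* DW_OP_xor */
			sp--;
			st[sp - 1] ^= st[sp];
			break;
		default:
			return 0;
		}
	}
	return st[sp - 1];
}

/* Expression bytes from the patch, without opcode, register and length. */
static const uint8_t expr_user[] = { 0x09, 0xf8, 0x22, 0x12, 0x06, 0x27 };
static const uint8_t expr_kern[] = { 0x09, 0xf8, 0x22, 0x12, 0x06, 0x16,
    0x0c, 0xff, 0xff, 0xff, 0xff, 0x1a, 0x27 };

/* Recover a made-up return address from a made-up stack slot. */
static uint64_t
demo_unwind(int kernel)
{
	const uint64_t cfa = 0xffff800022446680ULL;
	const uint64_t ret = 0xffffffff81001234ULL;
	struct sim_mem m;

	m.slot_addr = cfa - 8;
	/* What RETGUARD_START leaves in the slot. */
	m.slot_value = kernel ? ret ^ (uint32_t)m.slot_addr :
	    ret ^ m.slot_addr;
	return kernel ?
	    eval_cfi_expr(expr_kern, sizeof(expr_kern), cfa, &m) :
	    eval_cfi_expr(expr_user, sizeof(expr_user), cfa, &m);
}
```

Both variants should hand the unwinder the original return address, which is the point of emitting the escape right before the xor in RETGUARD_START.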
Index: sys/arch/amd64/include/cdefs.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cdefs.h,v
retrieving revision 1.3
diff -u -p -u -r1.3 cdefs.h
--- sys/arch/amd64/include/cdefs.h	28 Mar 2013 17:30:45 -0000	1.3
+++ sys/arch/amd64/include/cdefs.h	18 Aug 2017 02:39:39 -0000
@@ -18,4 +18,19 @@
 	__asm__(".section .gnu.warning." __STRING(sym)			\
 	    " ; .ascii \"" msg "\" ; .text")
+/*
+ * Fix __builtin_return_address() when compiled with -fxor-ret-protector.
+ */
+#if defined(_RET_PROTECTOR)
+# if defined(_KERNEL)
+#  define __builtin_return_address(d) ((void *) \
+        ((size_t)__builtin_return_address(d) ^ \
+         (unsigned int)__builtin_frame_address(d) + sizeof(void *)))
+# else
+#  define __builtin_return_address(d) ((void *) \
+        ((size_t)__builtin_return_address(d) ^ \
+         (size_t)__builtin_frame_address(d) + sizeof(void *)))
+# endif
+#endif
+
 #endif /* !_MACHINE_CDEFS_H_ */
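
For readers checking the cdefs.h macro: it relies on '+' binding tighter than '^', so the stored value is XORed with the address of the return slot (frame address plus one pointer). A small sketch of that arithmetic on plain integers follows, assuming amd64 sizes. retaddr_user and retaddr_kernel are hypothetical names, and the sample addresses in the usage note are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userland model: the full 64-bit slot address is mixed in.
 * The parentheses spell out the grouping the macro gets from
 * operator precedence.
 */
static uint64_t
retaddr_user(uint64_t stored, uint64_t frame)
{
	return stored ^ (frame + sizeof(uint64_t));
}

/*
 * Kernel model: only the low 32 bits of the frame address are used,
 * matching the (unsigned int) cast in the macro and the xorl in
 * RETGUARD_START.
 */
static uint64_t
retaddr_kernel(uint64_t stored, uint64_t frame)
{
	return stored ^ ((uint64_t)(uint32_t)frame + sizeof(uint64_t));
}
```

With a frame at 0xffff800022446670 the return slot sits at 0xffff800022446678, so XORing a stored value with that address (or with 0x22446678 in the kernel case) gives back the original return address.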
Index: sys/arch/amd64/stand/cdboot/srt0.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/cdboot/srt0.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 srt0.S
--- sys/arch/amd64/stand/cdboot/srt0.S	29 Oct 2012 13:54:56 -0000	1.3
+++ sys/arch/amd64/stand/cdboot/srt0.S	18 Aug 2017 02:28:21 -0000
@@ -204,6 +204,7 @@ ENTRY(debugchar)
 	movb	%al, (%ebx)
 	popl	%ebx
 	ret
+END(debugchar)
 	.code16
Index: sys/arch/amd64/stand/efiboot/eficall.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/efiboot/eficall.S,v
retrieving revision 1.1
diff -u -p -u -r1.1 eficall.S
--- sys/arch/amd64/stand/efiboot/eficall.S	2 Sep 2015 01:52:25 -0000	1.1
+++ sys/arch/amd64/stand/efiboot/eficall.S	18 Aug 2017 02:28:21 -0000
@@ -62,3 +62,4 @@ ENTRY(efi_call)
 	mov	%rbp, %rsp
 	pop	%rbp
 	retq
+END(efi_call)
Index: sys/arch/amd64/stand/libsa/gidt.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/libsa/gidt.S,v
retrieving revision 1.11
diff -u -p -u -r1.11 gidt.S
--- sys/arch/amd64/stand/libsa/gidt.S	27 Oct 2012 15:43:42 -0000	1.11
+++ sys/arch/amd64/stand/libsa/gidt.S	18 Aug 2017 02:28:21 -0000
@@ -160,6 +160,7 @@ ENTRY(_rtt)
 	/* Again... */
 	movl	$0, %esp	/* segment violation */
 	ret
+END(_rtt)
 #define IPROC(n)	X##n
 #define IEMU(n)		IPROC(emu##n)
@@ -462,5 +463,4 @@ ENTRY(bootbuf)
 	/* Jump to buffer */
 	ljmp	$0x0, $0x7c00
-
-	.end
+END(bootbuf)
Index: sys/arch/amd64/stand/libsa/pxe_call.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/libsa/pxe_call.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 pxe_call.S
--- sys/arch/amd64/stand/libsa/pxe_call.S	2 Jan 2006 00:26:29 -0000	1.4
+++ sys/arch/amd64/stand/libsa/pxe_call.S	18 Aug 2017 02:28:21 -0000
@@ -82,6 +82,7 @@ _C_LABEL(bangpxe_seg) = . - 2
 	popl	%ebx
 	popl	%ebp
 	ret
+END(pxecall_bangpxe)
 ENTRY(pxecall_pxenv)
 	.code32
@@ -125,6 +126,7 @@ _C_LABEL(pxenv_seg) = . - 2
 	popl	%ebx
 	popl	%ebp
 	ret
+END(pxecall_pxenv)
 /*
  * prot_to_real()
Index: sys/arch/amd64/stand/libsa/random_amd64.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/libsa/random_amd64.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 random_amd64.S
--- sys/arch/amd64/stand/libsa/random_amd64.S	12 Feb 2016 21:36:33 -0000	1.5
+++ sys/arch/amd64/stand/libsa/random_amd64.S	18 Aug 2017 02:28:21 -0000
@@ -104,3 +104,4 @@ usetsc:
 done:
 	popq	%rbx
 	retq
+END(mdrandom)
Index: sys/arch/amd64/stand/libsa/random_i386.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/libsa/random_i386.S,v
retrieving revision 1.10
diff -u -p -u -r1.10 random_i386.S
--- sys/arch/amd64/stand/libsa/random_i386.S	12 Feb 2016 21:36:33 -0000	1.10
+++ sys/arch/amd64/stand/libsa/random_i386.S	18 Aug 2017 02:28:21 -0000
@@ -104,3 +104,4 @@ usetsc:
 done:
 	popal
 	ret
+END(mdrandom)
Index: sys/arch/amd64/stand/pxeboot/srt0.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/pxeboot/srt0.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 srt0.S
--- sys/arch/amd64/stand/pxeboot/srt0.S	29 Oct 2012 14:18:11 -0000	1.3
+++ sys/arch/amd64/stand/pxeboot/srt0.S	18 Aug 2017 02:28:21 -0000
@@ -199,6 +199,7 @@ ENTRY(debugchar)
 	movb	%al, (%ebx)
 	popl	%ebx
 	ret
+END(debugchar)
 	.code16
Index: sys/arch/i386/conf/Makefile.i386
===================================================================
RCS file: /cvs/src/sys/arch/i386/conf/Makefile.i386,v
retrieving revision 1.117
diff -u -p -u -r1.117 Makefile.i386
--- sys/arch/i386/conf/Makefile.i386	12 Aug 2017 20:26:11 -0000	1.117
+++ sys/arch/i386/conf/Makefile.i386	18 Aug 2017 02:28:21 -0000
@@ -29,7 +29,7 @@ CWARNFLAGS=	-Werror -Wall -Wimplicit-fun
 		-Wframe-larger-than=2047
 CMACHFLAGS=
-CMACHFLAGS+=	-ffreestanding ${NOPIE_FLAGS}
+CMACHFLAGS+=	-mcmodel=kernel -ffreestanding ${NOPIE_FLAGS}
 SORTR=		sort -R
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+=	-fno-stack-protector
Index: sys/arch/i386/i386/acpi_wakecode.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/acpi_wakecode.S,v
retrieving revision 1.29
diff -u -p -u -r1.29 acpi_wakecode.S
--- sys/arch/i386/i386/acpi_wakecode.S	28 Jun 2017 08:51:36 -0000	1.29
+++ sys/arch/i386/i386/acpi_wakecode.S	18 Aug 2017 02:28:21 -0000
@@ -350,10 +350,12 @@ _ACPI_TRMP_LABEL(.Lhibernate_resume_vect
 	/* Jump to the S3 resume vector */
 	ljmp	$(_ACPI_RM_CODE_SEG), $.Lacpi_s3_vector_real
+NEND(hibernate_resume_machdep)
 	.code32
 	/* Switch to hibernate resume pagetable */
 NENTRY(hibernate_activate_resume_pt_machdep)
+	RETGUARD_START
 	/* Enable large pages */
 	movl	%cr4, %eax
 	orl	$(CR4_PSE), %eax
@@ -384,8 +386,9 @@ NENTRY(hibernate_activate_resume_pt_mach
 	jmp	1f
 1:	nop
+	RETGUARD_END
 	ret
-
+NEND(hibernate_activate_resume_pt_machdep)
 	/*
 	 * Switch to the private resume-time hibernate stack
 	 */
@@ -397,10 +400,14 @@ NENTRY(hibernate_switch_stack_machdep)
 	/* On our own stack from here onward */
 	ret
+NEND(hibernate_switch_stack_machdep)
 NENTRY(hibernate_flush)
+	RETGUARD_START
 	invlpg  HIBERNATE_INFLATE_PAGE
+	RETGUARD_END
 	ret
+NEND(hibernate_flush)
 #endif /* HIBERNATE */
 	/*
@@ -578,6 +585,7 @@ _C_LABEL(acpi_tramp_data_end):
 	.code32
 NENTRY(acpi_savecpu)
 	movl	(%esp), %eax
+	RETGUARD_START		# 2nd instruction
 	movl	%eax, .Lacpi_saved_ret
 	movw	%cs, .Lacpi_saved_cs
@@ -613,4 +621,6 @@ NENTRY(acpi_savecpu)
 	str	.Lacpi_saved_tr
 	movl	$1, %eax
+	RETGUARD_END
 	ret
+NEND(acpi_savecpu)
Index: sys/arch/i386/i386/apmcall.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/apmcall.S,v
retrieving revision 1.6
diff -u -p -u -r1.6 apmcall.S
--- sys/arch/i386/i386/apmcall.S	28 Nov 2013 19:30:46 -0000	1.6
+++ sys/arch/i386/i386/apmcall.S	18 Aug 2017 02:28:21 -0000
@@ -43,6 +43,7 @@ _C_LABEL(apm_cli):
  */
 	.text
 ENTRY(apmcall)
+	RETGUARD_START
 	pushl	%ebp
 	movl	%esp, %ebp
 	pushl	%ebx
@@ -104,6 +105,7 @@ ENTRY(apmcall)
 	popl	%esi
 	popl	%ebx
 	popl	%ebp
+	RETGUARD_END
 	ret
-
+END(apmcall)
 	.end
Index: sys/arch/i386/i386/db_trace.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/db_trace.c,v
retrieving revision 1.29
diff -u -p -u -r1.29 db_trace.c
--- sys/arch/i386/i386/db_trace.c	11 Aug 2017 20:50:15 -0000	1.29
+++ sys/arch/i386/i386/db_trace.c	18 Aug 2017 05:34:09 -0000
@@ -68,6 +68,14 @@ struct db_variable *db_eregs = db_regs +
  */
 #define	INKERNEL(va)	(((vaddr_t)(va)) >= VM_MIN_KERNEL_ADDRESS)
+/* Kernel uses xor %sp,(%esp) for RETGUARD/-fret-protector */
+#if defined(PROF) || defined(GPROF)
+# define GETPC(frame) (db_get_value((db_addr_t)&frame->f_retaddr, 4, FALSE))
+#else
+# define GETPC(frame) (db_get_value((db_addr_t)&frame->f_retaddr, 4, FALSE)\
+		^ (unsigned short)&(frame->f_retaddr))
+#endif
+
 #define	NONE		0
 #define	TRAP		1
 #define	SYSCALL		2
@@ -124,8 +132,7 @@ db_nextframe(struct callframe **fp, db_a
 	switch (is_trap) {
 	    case NONE:
-		*ip = (db_addr_t)
-			db_get_value((int) &(*fp)->f_retaddr, 4, FALSE);
+		*ip = (db_addr_t)GETPC((*fp));
 		*fp = (struct callframe *)
 			db_get_value((int) &(*fp)->f_frame, 4, FALSE);
 		break;
@@ -221,12 +228,10 @@ db_stack_trace_print(db_expr_t addr, boo
 			return;
 		}
 		frame = (struct callframe *)p->p_addr->u_pcb.pcb_ebp;
-		callpc = (db_addr_t)
-		    db_get_value((int)&frame->f_retaddr, 4, FALSE);
+		callpc = (db_addr_t)GETPC(frame);
 	} else {
 		frame = (struct callframe *)addr;
-		callpc = (db_addr_t)
-			 db_get_value((int)&frame->f_retaddr, 4, FALSE);
+		callpc = (db_addr_t)GETPC(frame);
 	}
 	lastframe = 0;
@@ -284,8 +289,7 @@ db_stack_trace_print(db_expr_t addr, boo
 		if (lastframe == 0 && offset == 0 && !have_addr && !is_trap) {
 			/* Frame really belongs to next callpc */
 			lastframe = (struct callframe *)(ddb_regs.tf_esp-4);
-			callpc = (db_addr_t)
-				 db_get_value((int)&lastframe->f_retaddr, 4, FALSE);
+			callpc = (db_addr_t)GETPC(lastframe);
 			continue;
 		}
@@ -331,7 +335,7 @@ db_save_stack_trace(struct db_stack_trac
 	unsigned int	i;
 	frame = __builtin_frame_address(0);
-	callpc = db_get_value((int)&frame->f_retaddr, 4, FALSE);
+	callpc = (db_addr_t)GETPC(frame);
 	lastframe = NULL;
 	for (i = 0; i < DB_STACK_TRACE_MAX && frame != NULL; i++) {
@@ -378,7 +382,7 @@ db_get_pc(struct trapframe *tf)
 	else
 		cf = (struct callframe *)(tf->tf_esp - sizeof(long));
-	return db_get_value((db_addr_t)&cf->f_retaddr, sizeof(long), 0);
+	return (db_addr_t)GETPC(cf);
 }
 vaddr_t
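
To make the GETPC change easier to follow, here is a rough model of the frame walk on a simulated 32-bit stack. The i386 kernel only xors the low 16 bits (xor %sp,(%esp)), so the walker undoes that with the low 16 bits of the slot address. Everything here (STACK_BASE, word_addr, read_word, getpc, walk, demo_pc and the sample PCs) is invented for the sketch, and ddb's real db_get_value() is replaced by an array lookup.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BASE	0xd0a01f00u	/* hypothetical kernel stack address */
#define NWORDS		16

/* Simulated address of word i on the stack. */
static uint32_t
word_addr(unsigned int i)
{
	return STACK_BASE + 4u * i;
}

/* Stand-in for db_get_value(): read one word at a simulated address. */
static uint32_t
read_word(const uint32_t *stack, uint32_t addr)
{
	return stack[(addr - STACK_BASE) / 4u];
}

/* Model of GETPC(): f_retaddr follows f_frame, undo the 16-bit xor. */
static uint32_t
getpc(const uint32_t *stack, uint32_t frame)
{
	uint32_t slot = frame + 4;

	return read_word(stack, slot) ^ (uint16_t)slot;
}

/* Follow saved frame pointers, collecting de-obfuscated return PCs. */
static unsigned int
walk(const uint32_t *stack, uint32_t frame, uint32_t *pcs, unsigned int max)
{
	unsigned int n = 0;

	while (frame != 0 && n < max) {
		pcs[n++] = getpc(stack, frame);
		frame = read_word(stack, frame);	/* saved %ebp */
	}
	return n;
}

/* Build a two-frame stack and return the i-th recovered PC (0 if none). */
static uint32_t
demo_pc(unsigned int i)
{
	uint32_t stack[NWORDS] = { 0 };
	uint32_t pcs[4] = { 0 };

	/* Inner frame at word 2, outer frame at word 6, then end of chain. */
	stack[2] = word_addr(6);
	stack[3] = 0xd0201234u ^ (uint16_t)word_addr(3);
	stack[6] = 0;
	stack[7] = 0xd0305678u ^ (uint16_t)word_addr(7);

	walk(stack, word_addr(2), pcs, 4);
	return (i < 4) ? pcs[i] : 0;
}
```

Without the xor in getpc() the walker would report PCs whose low 16 bits are scrambled, which is why every former db_get_value(&frame->f_retaddr, ...) call in the diff goes through GETPC.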
Index: sys/arch/i386/i386/in_cksum.s
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/in_cksum.s,v
retrieving revision 1.9
diff -u -p -u -r1.9 in_cksum.s
--- sys/arch/i386/i386/in_cksum.s	29 Jun 2017 17:17:28 -0000	1.9
+++ sys/arch/i386/i386/in_cksum.s	18 Aug 2017 02:28:21 -0000
@@ -117,6 +117,7 @@
 /* LINTSTUB: Func: int in4_cksum(struct mbuf *m, u_int8_t nxt, int off, int len) */
 ENTRY(in4_cksum)
+	RETGUARD_START
 	pushl	%ebp
 	pushl	%ebx
 	pushl	%esi
@@ -157,10 +158,12 @@ ENTRY(in4_cksum)
 	 * doesn't explode.
 	 */
 	jmp	.Lin4_entry
+END(in4_cksum)
 /* LINTSTUB: Func: int in_cksum(struct mbuf *m, int len) */
 ENTRY(in_cksum)
+	RETGUARD_START
 	pushl	%ebp
 	pushl	%ebx
 	pushl	%esi
@@ -352,6 +355,7 @@ ENTRY(in_cksum)
 	popl	%esi
 	popl	%ebx
 	popl	%ebp
+	RETGUARD_END
 	ret
 .Lout_of_mbufs:
@@ -359,6 +363,7 @@ ENTRY(in_cksum)
 	call	_C_LABEL(printf)
 	leal	4(%esp), %esp
 	jmp	.Lreturn
+END(in_cksum)
 	.section	.rodata
 cksum_ood:
Index: sys/arch/i386/i386/kvm86call.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/kvm86call.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 kvm86call.S
--- sys/arch/i386/i386/kvm86call.S	25 Apr 2015 21:31:24 -0000	1.7
+++ sys/arch/i386/i386/kvm86call.S	18 Aug 2017 02:28:21 -0000
@@ -152,7 +152,7 @@ ENTRY(kvm86_call)
 	popl	%eax
 	addl	$8,%esp
 	iret
-
+END(kvm86_call)
 /* void kvm86_ret(struct trapframe *, int) */
 ENTRY(kvm86_ret)
@@ -226,3 +226,4 @@ ENTRY(kvm86_ret)
 	popl	%esi
 	popl	%ebp
 	ret				/* back to kvm86_call()'s caller */
+END(kvm86_ret)
Index: sys/arch/i386/i386/locore.s
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
retrieving revision 1.178
diff -u -p -u -r1.178 locore.s
--- sys/arch/i386/i386/locore.s	6 Jul 2017 06:17:05 -0000	1.178
+++ sys/arch/i386/i386/locore.s	18 Aug 2017 03:18:16 -0000
@@ -246,6 +246,7 @@ NENTRY(proc_trampoline)
 	addl	$4,%esp
 	INTRFASTEXIT
 	/* NOTREACHED */
+NEND(proc_trampoline)
 	/* This must come before any use of the CODEPATCH macros */
        .section .codepatch,"a"
@@ -311,6 +312,7 @@ _C_LABEL(sigfillsiz):
  * Copy len bytes, abort on fault.
  */
 ENTRY(kcopy)
+	RETGUARD_START
 #ifdef DDB
 	pushl	%ebp
 	movl	%esp,%ebp
@@ -344,6 +346,7 @@ ENTRY(kcopy)
 #ifdef DDB
 	leave
 #endif
+	RETGUARD_END
 	ret
 	.align  4,0xcc
@@ -371,7 +374,9 @@ ENTRY(kcopy)
 #ifdef DDB
 	leave
 #endif
+	RETGUARD_END
 	ret
+END(kcopy)
 /*****************************************************************************/
@@ -385,6 +390,7 @@ ENTRY(kcopy)
  * Copy len bytes into the user's address space.
  */
 ENTRY(copyout)
+	RETGUARD_START
 #ifdef DDB
 	pushl	%ebp
 	movl	%esp,%ebp
@@ -432,13 +438,16 @@ ENTRY(copyout)
 #ifdef DDB
 	leave
 #endif
+	RETGUARD_END
 	ret
+END(copyout)
 /*
  * copyin(caddr_t from, caddr_t to, size_t len);
  * Copy len bytes from the user's address space.
  */
 ENTRY(copyin)
+	RETGUARD_START
 #ifdef DDB
 	pushl	%ebp
 	movl	%esp,%ebp
@@ -484,7 +493,9 @@ ENTRY(copyin)
 #ifdef DDB
 	leave
 #endif
+	RETGUARD_END
 	ret
+END(copyin)
 ENTRY(copy_fault)
 	SMAP_CLAC
@@ -496,7 +507,9 @@ ENTRY(copy_fault)
 #ifdef DDB
 	leave
 #endif
+	RETGUARD_END
 	ret
+END(copy_fault)
 /*
  * copyoutstr(caddr_t from, caddr_t to, size_t maxlen, size_t *lencopied);
@@ -506,6 +519,7 @@ ENTRY(copy_fault)
  * return 0 or EFAULT.
  */
 ENTRY(copyoutstr)
+	RETGUARD_START
 #ifdef DDB
 	pushl	%ebp
 	movl	%esp,%ebp
@@ -553,6 +567,7 @@ ENTRY(copyoutstr)
 	jae	_C_LABEL(copystr_fault)
 	movl	$ENAMETOOLONG,%eax
 	jmp	copystr_return
+END(copyoutstr)
 /*
  * copyinstr(caddr_t from, caddr_t to, size_t maxlen, size_t *lencopied);
@@ -562,6 +577,7 @@ ENTRY(copyoutstr)
  * return 0 or EFAULT.
  */
 ENTRY(copyinstr)
+	RETGUARD_START
 #ifdef DDB
 	pushl	%ebp
 	movl	%esp,%ebp
@@ -608,6 +624,7 @@ ENTRY(copyinstr)
 	jae	_C_LABEL(copystr_fault)
 	movl	$ENAMETOOLONG,%eax
 	jmp	copystr_return
+END(copyinstr)
 ENTRY(copystr_fault)
 	movl	$EFAULT,%eax
@@ -629,7 +646,9 @@ copystr_return:
 #ifdef DDB
 	leave
 #endif
+	RETGUARD_END
 	ret
+END(copystr_fault)
 /*
  * copystr(caddr_t from, caddr_t to, size_t maxlen, size_t *lencopied);
@@ -638,6 +657,7 @@ copystr_return:
  * string is too long, return ENAMETOOLONG; else return 0.
  */
 ENTRY(copystr)
+	RETGUARD_START
 #ifdef DDB
 	pushl	%ebp
 	movl	%esp,%ebp
@@ -678,7 +698,9 @@ ENTRY(copystr)
 #ifdef DDB
 	leave
 #endif
+	RETGUARD_END
 	ret
+END(copystr)
 /*****************************************************************************/
@@ -709,6 +731,7 @@ NENTRY(lgdt)
 	pushl	$GSEL(GCODE_SEL, SEL_KPL)
 	pushl	%eax
 	lret
+NEND(lgdt)
 ENTRY(setjmp)
 	movl	4(%esp),%eax
@@ -721,6 +744,7 @@ ENTRY(setjmp)
 	movl	%edx,20(%eax)		# save eip
 	xorl	%eax,%eax		# return (0);
 	ret
+END(setjmp)
 ENTRY(longjmp)
 	movl	4(%esp),%eax
@@ -734,6 +758,7 @@ ENTRY(longjmp)
 	xorl	%eax,%eax		# return (1);
 	incl	%eax
 	ret
+END(longjmp)
 /*****************************************************************************/
@@ -817,33 +842,46 @@ switch_exited:
 	popl	%esi
 	popl	%ebx
 	ret
+END(cpu_switchto)
 ENTRY(cpu_idle_enter)
+	RETGUARD_START
 	movl	_C_LABEL(cpu_idle_enter_fcn),%eax
 	cmpl	$0,%eax
 	je	1f
+	RETGUARD_END
 	jmpl	*%eax
 1:
+	RETGUARD_END
 	ret
+END(cpu_idle_enter)
 ENTRY(cpu_idle_cycle)
+	RETGUARD_START
 	movl	_C_LABEL(cpu_idle_cycle_fcn),%eax
 	cmpl	$0,%eax
 	je	1f
 	call	*%eax
+	RETGUARD_END
 	ret
 1:
 	sti
 	hlt
+	RETGUARD_END
 	ret
+END(cpu_idle_cycle)
 ENTRY(cpu_idle_leave)
+	RETGUARD_START
 	movl	_C_LABEL(cpu_idle_leave_fcn),%eax
 	cmpl	$0,%eax
 	je	1f
+	RETGUARD_END
 	jmpl	*%eax
 1:
+	RETGUARD_END
 	ret
+END(cpu_idle_leave)
 /*
  * savectx(struct pcb *pcb);
@@ -861,6 +899,7 @@ ENTRY(savectx)
 	movl	%ecx,PCB_FLAGS(%edx)
 	ret
+END(savectx)
 /*****************************************************************************/
@@ -991,22 +1030,27 @@ IDTVEC(align)
  */
 NENTRY(resume_iret)
 	ZTRAP(T_PROTFLT)
+NEND(resume_iret)
 NENTRY(resume_pop_ds)
 	pushl	%es
 	movl	$GSEL(GDATA_SEL, SEL_KPL),%eax
 	movw	%ax,%es
+NEND(resume_pop_ds)
 NENTRY(resume_pop_es)
 	pushl	%gs
 	xorl	%eax,%eax	/* $GSEL(GNULL_SEL, SEL_KPL) == 0 */
 	movw	%ax,%gs
+NEND(resume_pop_es)
 NENTRY(resume_pop_gs)
 	pushl	%fs
 	movl	$GSEL(GCPU_SEL, SEL_KPL),%eax
 	movw	%ax,%fs
+NEND(resume_pop_gs)
 NENTRY(resume_pop_fs)
 	movl	$T_PROTFLT,TF_TRAPNO(%esp)
 	sti
 	jmp	calltrap
+NEND(resume_pop_fs)
 /*
  * All traps go through here. Call the generic trap handler, and
@@ -1083,7 +1127,10 @@ calltrap:
 #endif /* DDB */
 	movl	%ebx,CPL
 	jmp	2b
+#endif /* DIAGNOSTIC */
+NEND(alltraps)
+#ifdef DIAGNOSTIC
 	.section .rodata
 spl_lowered:
 	.asciz	"WARNING: SPL NOT LOWERED ON TRAP EXIT\n"
@@ -1148,6 +1195,7 @@ IDTVEC(syscall)
  */
 ENTRY(bzero)
+	RETGUARD_START
 	pushl	%edi
 	movl	8(%esp),%edi
 	movl	12(%esp),%edx
@@ -1207,10 +1255,13 @@ ENTRY(bzero)
 	stosb
 	popl	%edi
+	RETGUARD_END
 	ret
+END(bzero)
 #if !defined(SMALL_KERNEL)
 ENTRY(sse2_pagezero)
+	RETGUARD_START
 	pushl	%ebx
 	movl	8(%esp),%ecx
 	movl	%ecx,%eax
@@ -1223,9 +1274,12 @@ ENTRY(sse2_pagezero)
 	jne	1b
 	sfence
 	popl	%ebx
+	RETGUARD_END
 	ret
+END(sse2_pagezero)
 ENTRY(i686_pagezero)
+	RETGUARD_START
 	pushl	%edi
 	pushl	%ebx
@@ -1241,6 +1295,7 @@ ENTRY(i686_pagezero)
 	popl	%ebx
 	popl	%edi
+	RETGUARD_END
 	ret
 	.align  4,0x90
@@ -1271,13 +1326,16 @@ ENTRY(i686_pagezero)
 	popl	%ebx
 	popl	%edi
+	RETGUARD_END
 	ret
+END(i686_pagezero)
 #endif
 /*
  * int cpu_paenable(void *);
  */
 ENTRY(cpu_paenable)
+	RETGUARD_START
 	movl	$-1, %eax
 	testl	$CPUID_PAE, _C_LABEL(cpu_feature)
 	jz	1f
@@ -1312,7 +1370,9 @@ ENTRY(cpu_paenable)
 	popl	%edi
 	popl	%esi
 1:
+	RETGUARD_END
 	ret
+END(cpu_paenable)
 #if NLAPIC > 0
 #include <i386/i386/apicvec.s>
Index: sys/arch/i386/i386/mutex.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/mutex.S,v
retrieving revision 1.12
diff -u -p -u -r1.12 mutex.S
--- sys/arch/i386/i386/mutex.S	29 Jun 2017 17:17:28 -0000	1.12
+++ sys/arch/i386/i386/mutex.S	18 Aug 2017 02:28:21 -0000
@@ -31,6 +31,7 @@
  * all the functions in the same place.
  */
 ENTRY(__mtx_init)
+	RETGUARD_START
 	pushl	%ebp
 	movl	%esp, %ebp
 	movl	8(%esp), %eax
@@ -41,11 +42,14 @@ ENTRY(__mtx_init)
 	movl	%edx, MTX_LOCK(%eax)
 	movl	%edx, MTX_OWNER(%eax)
 	leave
+	RETGUARD_END
 	ret
+END(__mtx_init)
 #define SOFF	8
 ENTRY(__mtx_enter)
+	RETGUARD_START
 	pushl	%ebp
 	movl	%esp, %ebp	
 1:	movl	SOFF(%ebp), %ecx
@@ -69,6 +73,7 @@ ENTRY(__mtx_enter)
 	movl	%eax, MTX_OWNER(%ecx)
 	movl	%edx, MTX_OLDIPL(%ecx)
 	leave
+	RETGUARD_END
 	ret
 	/* We failed to obtain the lock. splx, spin and retry. */
@@ -90,7 +95,10 @@ ENTRY(__mtx_enter)
 #ifdef DIAGNOSTIC
 5:	pushl	$mtx_lockingself
 	call	_C_LABEL(panic)
+#endif
+END(__mtx_enter)
+#ifdef DIAGNOSTIC
 	.section .rodata
 mtx_lockingself:
 	.asciz	"mtx_enter: locking against myself"
@@ -98,6 +106,7 @@ mtx_lockingself:
 #endif
 ENTRY(__mtx_enter_try)
+	RETGUARD_START
 	pushl	%ebp
 	movl	%esp, %ebp	
 1:	movl	SOFF(%ebp), %ecx
@@ -122,6 +131,7 @@ ENTRY(__mtx_enter_try)
 	movl	%edx, MTX_OLDIPL(%ecx)
 	movl	$1, %eax
 	leave
+	RETGUARD_END
 	ret
 	/* We failed to obtain the lock. splx and return zero. */
@@ -136,12 +146,16 @@ ENTRY(__mtx_enter_try)
 #endif
 	xorl	%eax, %eax
 	leave
+	RETGUARD_END
 	ret
 #ifdef DIAGNOSTIC
 4:	pushl	$mtx_lockingtry
 	call	_C_LABEL(panic)
+#endif
+END(__mtx_enter_try)
+#ifdef DIAGNOSTIC
 	.section .rodata
 mtx_lockingtry:
 	.asciz	"mtx_enter_try: locking against myself"
@@ -150,6 +164,7 @@ mtx_lockingtry:
 ENTRY(__mtx_leave)
+	RETGUARD_START
 	pushl	%ebp
 	movl	%esp, %ebp
 	movl	SOFF(%ebp), %ecx
@@ -166,12 +181,16 @@ ENTRY(__mtx_leave)
 	movl	%eax, MTX_LOCK(%ecx)
 	call	_C_LABEL(splx)
 	leave
+	RETGUARD_END
 	ret
 #ifdef DIAGNOSTIC
 1:	pushl	$mtx_leave_held
 	call	_C_LABEL(panic)
+#endif
+END(__mtx_leave)
+#ifdef DIAGNOSTIC
 	.section .rodata
 mtx_leave_held:
 	.asciz	"mtx_leave: lock not held"
Index: sys/arch/i386/i386/vmm_support.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/vmm_support.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 vmm_support.S
--- sys/arch/i386/i386/vmm_support.S	6 Jul 2017 04:32:30 -0000	1.3
+++ sys/arch/i386/i386/vmm_support.S	18 Aug 2017 02:28:21 -0000
@@ -28,19 +28,9 @@
 #define VMX_FAIL_LAUNCH_VALID_VMCS 3
 	.text
-	.global _C_LABEL(vmxon)
-	.global _C_LABEL(vmxoff)
-	.global _C_LABEL(vmclear)
-	.global _C_LABEL(vmptrld)
-	.global _C_LABEL(vmptrst)
-	.global _C_LABEL(vmwrite)
-        .global _C_LABEL(vmread)
-        .global _C_LABEL(invvpid)
-        .global _C_LABEL(invept)
-        .global _C_LABEL(vmx_enter_guest)
-	.global _C_LABEL(vmm_dispatch_intr)
-_C_LABEL(vmm_dispatch_intr):
+ENTRY(vmm_dispatch_intr)
+	RETGUARD_START
 	movl	%esp, %eax
 	andl	$0xFFFFFFF0, %esp
 	pushl	%ss
@@ -51,74 +41,101 @@ _C_LABEL(vmm_dispatch_intr):
 	movl	4(%eax), %eax
 	calll	*%eax
 	addl	$0x8, %esp
+	RETGUARD_END
 	ret
+END(vmm_dispatch_intr)
-_C_LABEL(vmxon):
+ENTRY(vmxon)
+	RETGUARD_START
 	movl	4(%esp), %eax
 	vmxon	(%eax)
 	jz	failed_on
 	jc	failed_on
 	xorl	%eax, %eax
+	RETGUARD_END
 	ret
 failed_on:
 	movl	$0x01, %eax
+	RETGUARD_END
 	ret
+END(vmxon)
-_C_LABEL(vmxoff):
+ENTRY(vmxoff)
+	RETGUARD_START
 	vmxoff
 	jz	failed_off
 	jc	failed_off
 	xorl	%eax, %eax
+	RETGUARD_END
 	ret
 failed_off:
 	movl	$0x01, %eax
+	RETGUARD_END
 	ret
+END(vmxoff)
-_C_LABEL(vmclear):
+ENTRY(vmclear)
+	RETGUARD_START
 	movl	0x04(%esp), %eax
 	vmclear	(%eax)
 	jz	failed_clear
 	jc	failed_clear
 	xorl	%eax, %eax
+	RETGUARD_END
 	ret
 failed_clear:
 	movl	$0x01, %eax
+	RETGUARD_END
 	ret
+END(vmclear)
-_C_LABEL(vmptrld):
+ENTRY(vmptrld)
+	RETGUARD_START
 	movl	4(%esp), %eax
 	vmptrld	(%eax)
 	jz	failed_ptrld
 	jc	failed_ptrld
 	xorl	%eax, %eax
+	RETGUARD_END
 	ret
 failed_ptrld:
 	movl	$0x01, %eax
+	RETGUARD_END
 	ret
+END(vmptrld)
-_C_LABEL(vmptrst):
+ENTRY(vmptrst)
+	RETGUARD_START
 	movl	0x04(%esp), %eax
 	vmptrst	(%eax)
 	jz	failed_ptrst
 	jc	failed_ptrst
 	xorl	%eax, %eax
+	RETGUARD_END
 	ret
 failed_ptrst:
 	movl	$0x01, %eax
+	RETGUARD_END
 	ret
+END(vmptrst)
-_C_LABEL(vmwrite):
+ENTRY(vmwrite)
+	RETGUARD_START
 	movl	0x04(%esp), %eax
 	vmwrite	0x08(%esp), %eax
 	jz	failed_write
 	jc	failed_write
 	xorl	%eax, %eax
+	RETGUARD_END
 	ret
 failed_write:
 	movl	$0x01, %eax
+	RETGUARD_END
 	ret
+END(vmwrite)
-_C_LABEL(vmread):
+ENTRY(vmread)
+	RETGUARD_START
 	pushl	%ebx
 	movl	0x08(%esp), %ebx
 	movl	0x0c(%esp), %eax
@@ -127,26 +144,35 @@ _C_LABEL(vmread):
 	jc	failed_read
 	popl	%ebx
 	xorl	%eax, %eax
+	RETGUARD_END
 	ret
 failed_read:
 	popl	%ebx
 	movl	$0x01, %eax
+	RETGUARD_END
 	ret
+END(vmread)
-_C_LABEL(invvpid):
+ENTRY(invvpid)
+	RETGUARD_START
 	pushl	%ebx
 	movl	0x08(%esp), %eax
 	movl	0x0c(%esp), %ebx	
 	invvpid (%ebx), %eax
 	popl	%ebx
+	RETGUARD_END
 	ret
+END(invvpid)
-_C_LABEL(invept):
+ENTRY(invept)
+	RETGUARD_START
 	movl	0x04(%esp), %eax
 	invept	0x08(%esp), %eax
+	RETGUARD_END
 	ret
+END(invept)
-_C_LABEL(vmx_enter_guest):
+ENTRY(vmx_enter_guest)
+	RETGUARD_START
 	pushl	%ebx
 	pushl	%ecx
 	pushl	%edx
@@ -288,3 +314,5 @@ restore_host:
 	xorl	%eax, %eax
+	RETGUARD_END
 	ret
+END(vmx_enter_guest)
Index: sys/arch/i386/include/asm.h
===================================================================
RCS file: /cvs/src/sys/arch/i386/include/asm.h,v
retrieving revision 1.15
diff -u -p -u -r1.15 asm.h
--- sys/arch/i386/include/asm.h	29 Jun 2017 17:36:16 -0000	1.15
+++ sys/arch/i386/include/asm.h	18 Aug 2017 17:58:22 -0000
@@ -61,6 +61,18 @@
 #define _C_LABEL(name)	name
 #define	_ASM_LABEL(x)	x
+#ifdef _KERNEL	/* 16 bit */
+#define RETGUARD_CFI	.cfi_escape 0x16, 0x08, 0x0b, 0x09, 0xfc, 0x22, 0x12,\
+			0x06, 0x16, 0x0a, 0xff, 0xff, 0x1a, 0x27
+#define RETGUARD_START	RETGUARD_CFI; xor %sp,(%esp)
+#define RETGUARD_END	xor %sp,(%esp)
+#else		/* 32 bit */
+#define RETGUARD_CFI	.cfi_escape 0x16, 0x08, 0x06, 0x09, 0xfc, 0x22, 0x12,\
+			0x06, 0x27
+#define RETGUARD_START	RETGUARD_CFI; xorl %esp,(%esp)
+#define RETGUARD_END	xorl %esp,(%esp)
+#endif
+
 #define CVAROFF(x, y)	_C_LABEL(x) + y
 #ifdef __STDC__
@@ -103,11 +115,13 @@
 # define _PROF_PROLOGUE
 #endif
-#define	ENTRY(y)	_ENTRY(_C_LABEL(y)); _PROF_PROLOGUE
-#define	NENTRY(y)	_ENTRY(_C_LABEL(y))
+#define	ENTRY(y)	_ENTRY(_C_LABEL(y)); _PROF_PROLOGUE; .cfi_startproc
+#define	NENTRY(y)	_ENTRY(_C_LABEL(y)); .cfi_startproc
 #define	ASENTRY(y)	_ENTRY(_ASM_LABEL(y)); _PROF_PROLOGUE
 #define	NASENTRY(y)	_ENTRY(_ASM_LABEL(y))
-#define	END(y)		.size y, . - y
+#define	_ASM_SIZE(y)	.size y, . - y
+#define	END(y)		.cfi_endproc; _ASM_SIZE(y)
+#define	NEND(y)		.cfi_endproc
 #define	ALTENTRY(name)	.globl _C_LABEL(name); _C_LABEL(name):
Index: sys/arch/i386/include/cdefs.h
===================================================================
RCS file: /cvs/src/sys/arch/i386/include/cdefs.h,v
retrieving revision 1.10
diff -u -p -u -r1.10 cdefs.h
--- sys/arch/i386/include/cdefs.h	28 Mar 2013 17:30:45 -0000	1.10
+++ sys/arch/i386/include/cdefs.h	18 Aug 2017 02:39:34 -0000
@@ -18,4 +18,23 @@
 	__asm__(".section .gnu.warning." __STRING(sym)	\
 	    " ; .ascii \"" msg "\" ; .text")
+/*
+ * Fix __builtin_return_address() when compiled with -fxor-ret-protector.
+ */
+#if defined(_RET_PROTECTOR)
+# if defined(_KERNEL)
+#  define __builtin_return_address(d) ((void *) \
+        ((size_t)__builtin_return_address(d) ^ \
+         (unsigned short)__builtin_frame_address(d) + sizeof(void *)))
+# else
+#  define __builtin_return_address(d) ((void *) \
+        ((size_t)__builtin_return_address(d) ^ \
+         (size_t)__builtin_frame_address(d) + sizeof(void *)))
+# endif
+#endif
+
+#if defined(_KERNEL)
+#define	_KERNEL_XORRET unsigned short
+#endif
+
 #endif /* !_MACHINE_CDEFS_H_ */
Index: sys/arch/i386/stand/cdboot/srt0.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/stand/cdboot/srt0.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 srt0.S
--- sys/arch/i386/stand/cdboot/srt0.S	31 Oct 2012 14:31:30 -0000	1.3
+++ sys/arch/i386/stand/cdboot/srt0.S	19 Aug 2017 02:33:33 -0000
@@ -204,6 +204,7 @@ ENTRY(debugchar)
 	movb	%al, (%ebx)
 	popl	%ebx
 	ret
+END(debugchar)
 	.code16
Index: sys/arch/i386/stand/libsa/debug_i386.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/stand/libsa/debug_i386.S,v
retrieving revision 1.12
diff -u -p -u -r1.12 debug_i386.S
--- sys/arch/i386/stand/libsa/debug_i386.S	9 Mar 2004 19:12:12 -0000	1.12
+++ sys/arch/i386/stand/libsa/debug_i386.S	19 Aug 2017 02:33:33 -0000
@@ -122,3 +122,4 @@ ENTRY(check_regs)
 	movl	$0x47374736, (%edi)
 #endif
 	ret
+END(check_regs)
Index: sys/arch/i386/stand/libsa/gidt.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/stand/libsa/gidt.S,v
retrieving revision 1.36
diff -u -p -u -r1.36 gidt.S
--- sys/arch/i386/stand/libsa/gidt.S	31 Oct 2012 13:55:58 -0000	1.36
+++ sys/arch/i386/stand/libsa/gidt.S	19 Aug 2017 02:33:33 -0000
@@ -161,6 +161,7 @@ ENTRY(_rtt)
 	/* Again... */
 	movl	$0, %esp	/* segment violation */
 	ret
+END(_rtt)
 #define IPROC(n)	X##n
 #define IEMU(n)		IPROC(emu##n)
@@ -465,5 +466,6 @@ ENTRY(bootbuf)
 	/* Jump to buffer */
 	ljmp	$0x0, $0x7c00
+END(bootbuf)
 	.end
Index: sys/arch/i386/stand/libsa/pxe_call.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/stand/libsa/pxe_call.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 pxe_call.S
--- sys/arch/i386/stand/libsa/pxe_call.S	2 Jan 2006 00:26:29 -0000	1.4
+++ sys/arch/i386/stand/libsa/pxe_call.S	19 Aug 2017 02:33:33 -0000
@@ -82,6 +82,7 @@ _C_LABEL(bangpxe_seg) = . - 2
 	popl	%ebx
 	popl	%ebp
 	ret
+END(pxecall_bangpxe)
 ENTRY(pxecall_pxenv)
 	.code32
@@ -125,6 +126,7 @@ _C_LABEL(pxenv_seg) = . - 2
 	popl	%ebx
 	popl	%ebp
 	ret
+END(pxecall_pxenv)
 /*
  * prot_to_real()
Index: sys/arch/i386/stand/libsa/random_i386.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/stand/libsa/random_i386.S,v
retrieving revision 1.10
diff -u -p -u -r1.10 random_i386.S
--- sys/arch/i386/stand/libsa/random_i386.S	12 Feb 2016 21:36:33 -0000	1.10
+++ sys/arch/i386/stand/libsa/random_i386.S	19 Aug 2017 02:33:33 -0000
@@ -104,3 +104,4 @@ usetsc:
 done:
 	popal
 	ret
+END(mdrandom)
Index: sys/arch/i386/stand/pxeboot/srt0.S
===================================================================
RCS file: /cvs/src/sys/arch/i386/stand/pxeboot/srt0.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 srt0.S
--- sys/arch/i386/stand/pxeboot/srt0.S	31 Oct 2012 14:31:30 -0000	1.3
+++ sys/arch/i386/stand/pxeboot/srt0.S	19 Aug 2017 02:33:33 -0000
@@ -199,6 +199,7 @@ ENTRY(debugchar)
 	movb	%al, (%ebx)
 	popl	%ebx
 	ret
+END(debugchar)
 	.code16
Index: sys/lib/libkern/arch/amd64/bcmp.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/bcmp.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 bcmp.S
--- sys/lib/libkern/arch/amd64/bcmp.S	29 Nov 2014 18:51:23 -0000	1.3
+++ sys/lib/libkern/arch/amd64/bcmp.S	18 Aug 2017 02:28:21 -0000
@@ -1,6 +1,7 @@
 #include <machine/asm.h>
 ENTRY(bcmp)
+	RETGUARD_START
 	xorl	%eax,%eax		/* clear return value */
 	movq	%rdx,%rcx		/* compare by words */
@@ -16,4 +17,6 @@ ENTRY(bcmp)
 	je	L2
 L1:	incl	%eax
-L2:	ret
+L2:	RETGUARD_END
+	ret
+END(bcmp)
Index: sys/lib/libkern/arch/amd64/bzero.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/bzero.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 bzero.S
--- sys/lib/libkern/arch/amd64/bzero.S	29 Nov 2014 18:51:23 -0000	1.4
+++ sys/lib/libkern/arch/amd64/bzero.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include <machine/asm.h>
 ENTRY(bzero)
+	RETGUARD_START
 	movq	%rsi,%rdx
 	xorq	%rax,%rax		/* set fill data to 0 */
@@ -36,4 +37,6 @@ L1:	movq	%rdx,%rcx		/* zero remainder by
 	rep
 	stosb
+	RETGUARD_END
 	ret
+END(bzero)
Index: sys/lib/libkern/arch/amd64/ffs.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/ffs.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 ffs.S
--- sys/lib/libkern/arch/amd64/ffs.S	24 Nov 2007 19:28:25 -0000	1.2
+++ sys/lib/libkern/arch/amd64/ffs.S	18 Aug 2017 02:28:21 -0000
@@ -7,11 +7,15 @@
 #include <machine/asm.h>
 ENTRY(ffs)
+	RETGUARD_START
 	bsfl	%edi,%eax
 	jz	L1	 		/* ZF is set if all bits are 0 */
 	incl	%eax			/* bits numbered from 1, not 0 */
+	RETGUARD_END
 	ret
 	_ALIGN_TEXT
 L1:	xorl	%eax,%eax		/* clear result */
+	RETGUARD_END
 	ret
+END(ffs)
Index: sys/lib/libkern/arch/amd64/htonl.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/htonl.S,v
retrieving revision 1.1
diff -u -p -u -r1.1 htonl.S
--- sys/lib/libkern/arch/amd64/htonl.S	25 Nov 2007 18:25:34 -0000	1.1
+++ sys/lib/libkern/arch/amd64/htonl.S	18 Aug 2017 02:28:21 -0000
@@ -44,6 +44,10 @@ _ENTRY(_C_LABEL(htonl))
 _ENTRY(_C_LABEL(ntohl))
 _ENTRY(_C_LABEL(bswap32))
 _PROF_PROLOGUE
+	.cfi_startproc
+	RETGUARD_START
 	movl	%edi,%eax
 	bswap	%eax
+	RETGUARD_END
 	ret
+END(_C_LABEL(htonl))
Index: sys/lib/libkern/arch/amd64/htons.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/htons.S,v
retrieving revision 1.1
diff -u -p -u -r1.1 htons.S
--- sys/lib/libkern/arch/amd64/htons.S	25 Nov 2007 18:25:34 -0000	1.1
+++ sys/lib/libkern/arch/amd64/htons.S	18 Aug 2017 02:28:21 -0000
@@ -44,6 +44,10 @@ _ENTRY(_C_LABEL(htons))
 _ENTRY(_C_LABEL(ntohs))
 _ENTRY(_C_LABEL(bswap16))
 _PROF_PROLOGUE
+	.cfi_startproc
+	RETGUARD_START
 	movl	%edi,%eax
 	xchgb	%ah,%al
+	RETGUARD_END
 	ret
+END(_C_LABEL(htons))
Index: sys/lib/libkern/arch/amd64/memchr.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/memchr.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 memchr.S
--- sys/lib/libkern/arch/amd64/memchr.S	29 Nov 2014 18:51:23 -0000	1.3
+++ sys/lib/libkern/arch/amd64/memchr.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include <machine/asm.h>
 ENTRY(memchr)
+	RETGUARD_START
 	movb	%sil,%al		/* set character to search for */
 	movq	%rdx,%rcx		/* set length of search */
 	testq	%rcx,%rcx		/* test for len == 0 */
@@ -15,6 +16,9 @@ ENTRY(memchr)
 	scasb
 	jne	L1			/* scan failed, return null */
 	leaq	-1(%rdi),%rax		/* adjust result of scan */
+	RETGUARD_END
 	ret
 L1:	xorq	%rax,%rax
+	RETGUARD_END
 	ret
+END(memchr)
Index: sys/lib/libkern/arch/amd64/memcmp.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/memcmp.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 memcmp.S
--- sys/lib/libkern/arch/amd64/memcmp.S	29 Nov 2014 18:51:23 -0000	1.3
+++ sys/lib/libkern/arch/amd64/memcmp.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include <machine/asm.h>
 ENTRY(memcmp)
+	RETGUARD_START
 	movq	%rdx,%rcx		/* compare by longs */
 	shrq	$3,%rcx
 	repe
@@ -20,6 +21,7 @@ ENTRY(memcmp)
 	jne	L6			/* do we match? */
 	xorl	%eax,%eax		/* we match, return zero	*/
+	RETGUARD_END
 	ret
 L5:	movl	$8,%ecx			/* We know that one of the next	*/
@@ -32,4 +34,7 @@ L6:	xorl	%eax,%eax		/* Perform unsigned 
 	xorl	%edx,%edx
 	movb	-1(%rsi),%dl
 	subl    %edx,%eax
+	RETGUARD_END
 	ret
+END(memcmp)
+
Index: sys/lib/libkern/arch/amd64/memmove.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/memmove.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 memmove.S
--- sys/lib/libkern/arch/amd64/memmove.S	29 Nov 2014 18:51:23 -0000	1.5
+++ sys/lib/libkern/arch/amd64/memmove.S	18 Aug 2017 02:28:21 -0000
@@ -41,10 +41,14 @@
 	 */
 ENTRY(bcopy)
+	RETGUARD_START
 	xchgq	%rdi,%rsi
-	/* fall into memmove */
+	jmp	9f	/* go do memmove */
+END(bcopy)
 ENTRY(memmove)
+	RETGUARD_START
+9:
 	movq	%rdi,%r11	/* save dest */
 	movq	%rdx,%rcx
 	movq	%rdi,%rax
@@ -52,8 +56,10 @@ ENTRY(memmove)
 	cmpq	%rcx,%rax	/* overlapping? */
 	jb	1f
 	jmp	2f		/* nope */
+END(memmove)
 ENTRY(memcpy)
+	RETGUARD_START
 	movq	%rdi,%r11	/* save dest */
 	movq	%rdx,%rcx
 2:
@@ -65,6 +71,7 @@ ENTRY(memcpy)
 	rep
 	movsb
 	movq	%r11,%rax
+	RETGUARD_END
 	ret
 1:
 	addq	%rcx,%rdi	/* copy backwards. */
@@ -83,4 +90,6 @@ ENTRY(memcpy)
 	movsq
 	movq	%r11,%rax
 	cld
+	RETGUARD_END
 	ret
+END(memcpy)
Index: sys/lib/libkern/arch/amd64/memset.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/memset.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 memset.S
--- sys/lib/libkern/arch/amd64/memset.S	29 Nov 2014 18:51:23 -0000	1.5
+++ sys/lib/libkern/arch/amd64/memset.S	18 Aug 2017 02:28:21 -0000
@@ -7,6 +7,7 @@
 #include <machine/asm.h>
 ENTRY(memset)
+	RETGUARD_START
 	movq	%rsi,%rax
 	andq	$0xff,%rax
 	movq	%rdx,%rcx
@@ -50,4 +51,6 @@ L1:	rep
 	stosb
 	movq	%r11,%rax
+	RETGUARD_END
 	ret
+END(memset)
Index: sys/lib/libkern/arch/amd64/scanc.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/scanc.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 scanc.S
--- sys/lib/libkern/arch/amd64/scanc.S	29 Nov 2014 18:51:23 -0000	1.3
+++ sys/lib/libkern/arch/amd64/scanc.S	18 Aug 2017 02:28:21 -0000
@@ -36,6 +36,7 @@
 #include <machine/asm.h>
 ENTRY(scanc)
+	RETGUARD_START
 	movq	%rdx,%r11
 	movb	%cl,%dl
 	movl	%edi,%ecx
@@ -51,4 +52,6 @@ ENTRY(scanc)
 	jnz	1b
 2:
 	movl	%ecx,%eax
+	RETGUARD_END
 	ret
+END(scanc)
Index: sys/lib/libkern/arch/amd64/skpc.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/skpc.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 skpc.S
--- sys/lib/libkern/arch/amd64/skpc.S	29 Nov 2014 18:51:23 -0000	1.3
+++ sys/lib/libkern/arch/amd64/skpc.S	18 Aug 2017 02:28:21 -0000
@@ -36,6 +36,7 @@
 #include <machine/asm.h>
 ENTRY(skpc)
+	RETGUARD_START
 	movl	%edi,%eax
 	movq	%rsi,%rcx
 	movq	%rdx,%rdi
@@ -45,4 +46,6 @@ ENTRY(skpc)
 	incq	%rcx
 1:
 	movl	%ecx,%eax
+	RETGUARD_END
 	ret
+END(skpc)
Index: sys/lib/libkern/arch/amd64/strchr.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/strchr.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 strchr.S
--- sys/lib/libkern/arch/amd64/strchr.S	9 Dec 2014 15:13:57 -0000	1.3
+++ sys/lib/libkern/arch/amd64/strchr.S	18 Aug 2017 02:28:21 -0000
@@ -44,6 +44,7 @@ STRONG_ALIAS(index, strchr)
  */
 ENTRY(strchr)
+	RETGUARD_START
 	movabsq	$0x0101010101010101,%r8
 	movzbq	%sil,%rdx	/* value to search for (c) */
@@ -85,6 +86,7 @@ ENTRY(strchr)
 	bsf	%r11,%r11	/* 7, 15, 23 ... 63 */
 8:	shr	$3,%r11		/* 0, 1, 2 .. 7 */
 	lea	-8(%r11,%rdi),%rax
+	RETGUARD_END
 	ret
 /* End of string, check whether char is before NUL */
@@ -97,6 +99,7 @@ ENTRY(strchr)
 	cmp	%r11,%rax
 	jae	8b		/* return 'found' if same - searching for NUL */
 11:	xor	%eax,%eax	/* char not found */
+	RETGUARD_END
 	ret
 /* Source misaligned: read aligned word and make low bytes invalid */
@@ -123,3 +126,4 @@ ENTRY(strchr)
 	sar	%cl,%r10	/* top bytes 0xff */
 	and	%r10,%rax	/* clear lsb from unwanted low bytes */
 	jmp	21b
+END(strchr)
Index: sys/lib/libkern/arch/amd64/strcmp.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/strcmp.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 strcmp.S
--- sys/lib/libkern/arch/amd64/strcmp.S	9 Dec 2014 15:13:57 -0000	1.3
+++ sys/lib/libkern/arch/amd64/strcmp.S	18 Aug 2017 02:28:21 -0000
@@ -9,6 +9,7 @@
 #include <machine/asm.h>
 ENTRY(strcmp)
+	RETGUARD_START
 	/*
 	 * Align s1 to word boundary.
 	 * Consider unrolling loop?
@@ -68,4 +69,6 @@ ENTRY(strcmp)
 	movzbq	%al,%rax
 	movzbq	%dl,%rdx
 	subq	%rdx,%rax
+	RETGUARD_END
 	ret
+END(strcmp)
Index: sys/lib/libkern/arch/amd64/strlen.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/strlen.S,v
retrieving revision 1.5
diff -u -p -u -r1.5 strlen.S
--- sys/lib/libkern/arch/amd64/strlen.S	20 Mar 2016 16:50:30 -0000	1.5
+++ sys/lib/libkern/arch/amd64/strlen.S	18 Aug 2017 02:28:21 -0000
@@ -112,6 +112,7 @@
  */
 ENTRY(strlen)
+	RETGUARD_START
 	movabsq	$0x0101010101010101,%r8
 	test	$7,%dil
@@ -139,6 +140,7 @@ ENTRY(strlen)
 	bsf	%rdx,%rdx		/* 7, 15, 23 ... 63 */
 	shr	$3,%rdx			/* 0, 1, 2 ... 7 */
 	lea	-8(%rax,%rdx),%rax
+	RETGUARD_END
 	ret
 /* Misaligned, read aligned word and make low bytes non-zero */
@@ -154,3 +156,4 @@ ENTRY(strlen)
 	dec	%rsi
 	or	%rsi,%rdx		/* low bytes now non-zero */
 	jmp	2b
+END(strlen)
Index: sys/lib/libkern/arch/amd64/strrchr.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/amd64/strrchr.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 strrchr.S
--- sys/lib/libkern/arch/amd64/strrchr.S	9 Dec 2014 15:13:57 -0000	1.3
+++ sys/lib/libkern/arch/amd64/strrchr.S	18 Aug 2017 02:28:21 -0000
@@ -11,6 +11,7 @@
 STRONG_ALIAS(rindex, strrchr)
 ENTRY(strrchr)
+	RETGUARD_START
 	movzbq	%sil,%rcx
 	/* zero return value */
@@ -120,4 +121,6 @@ ENTRY(strrchr)
 	jne	.Lloop
 .Ldone:
+	RETGUARD_END
 	ret
+END(strrchr)
Index: sys/lib/libkern/arch/i386/bcmp.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/bcmp.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 bcmp.S
--- sys/lib/libkern/arch/i386/bcmp.S	29 Nov 2014 18:51:23 -0000	1.3
+++ sys/lib/libkern/arch/i386/bcmp.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,7 @@
 #include <machine/asm.h>
 ENTRY(bcmp)
+	RETGUARD_START
 	pushl	%edi
 	pushl	%esi
 	movl	12(%esp),%edi
@@ -29,4 +30,6 @@ ENTRY(bcmp)
 L1:	incl	%eax
 L2:	popl	%esi
 	popl	%edi
+	RETGUARD_END
 	ret
+END(bcmp)
Index: sys/lib/libkern/arch/i386/ffs.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/ffs.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 ffs.S
--- sys/lib/libkern/arch/i386/ffs.S	27 Sep 1996 06:47:45 -0000	1.2
+++ sys/lib/libkern/arch/i386/ffs.S	18 Aug 2017 02:28:21 -0000
@@ -8,11 +8,15 @@
 #include <machine/asm.h>
 ENTRY(ffs)
+	RETGUARD_START
 	bsfl	4(%esp),%eax
 	jz	L1	 		/* ZF is set if all bits are 0 */
 	incl	%eax			/* bits numbered from 1, not 0 */
+	RETGUARD_END
 	ret
 	.align 2
 L1:	xorl	%eax,%eax		/* clear result */
+	RETGUARD_END
 	ret
+END(ffs)
Index: sys/lib/libkern/arch/i386/htonl.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/htonl.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 htonl.S
--- sys/lib/libkern/arch/i386/htonl.S	25 Nov 2007 18:25:35 -0000	1.4
+++ sys/lib/libkern/arch/i386/htonl.S	18 Aug 2017 02:28:21 -0000
@@ -38,10 +38,31 @@
 /* netorder = htonl(hostorder) AND hostorder = ntohl(netorder) */
 ENTRY(ntohl)
+	RETGUARD_START
+	movl	4(%esp),%eax
+	rorw	$8,%ax
+	roll	$16,%eax
+	rorw	$8,%ax
+	RETGUARD_END
+	ret
+END(ntohl)
+
 ENTRY(htonl)
+	RETGUARD_START
+	movl	4(%esp),%eax
+	rorw	$8,%ax
+	roll	$16,%eax
+	rorw	$8,%ax
+	RETGUARD_END
+	ret
+END(htonl)
+
 ENTRY(swap32)
+	RETGUARD_START
 	movl	4(%esp),%eax
 	rorw	$8,%ax
 	roll	$16,%eax
 	rorw	$8,%ax
+	RETGUARD_END
 	ret
+END(swap32)
Index: sys/lib/libkern/arch/i386/htons.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/htons.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 htons.S
--- sys/lib/libkern/arch/i386/htons.S	25 Nov 2007 18:25:35 -0000	1.4
+++ sys/lib/libkern/arch/i386/htons.S	18 Aug 2017 02:28:21 -0000
@@ -38,8 +38,25 @@
 /* netorder = htons(hostorder) AND hostorder = ntohs(netorder) */
 ENTRY(htons)
+	RETGUARD_START
+	movzwl	4(%esp),%eax
+	rorw	$8,%ax
+	RETGUARD_END
+	ret
+END(htons)
+
 ENTRY(ntohs)
+	RETGUARD_START
+	movzwl	4(%esp),%eax
+	rorw	$8,%ax
+	RETGUARD_END
+	ret
+END(ntohs)
+
 ENTRY(swap16)
+	RETGUARD_START
 	movzwl	4(%esp),%eax
 	rorw	$8,%ax
+	RETGUARD_END
 	ret
+END(swap16)
Index: sys/lib/libkern/arch/i386/memchr.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/memchr.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 memchr.S
--- sys/lib/libkern/arch/i386/memchr.S	29 Nov 2014 18:51:23 -0000	1.2
+++ sys/lib/libkern/arch/i386/memchr.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,7 @@
 #include <machine/asm.h>
 ENTRY(memchr)
+	RETGUARD_START
 	pushl	%edi
 	movl	8(%esp),%edi		/* string address */
 	movl	12(%esp),%eax		/* set character to search for */
@@ -19,8 +20,11 @@ ENTRY(memchr)
 	jne	L1			/* scan failed, return null */
 	leal	-1(%edi),%eax		/* adjust result of scan */
 	popl	%edi
+	RETGUARD_END
 	ret
 	.align 2,0x90
 L1:	xorl	%eax,%eax
 	popl	%edi
+	RETGUARD_END
 	ret
+END(memchr)
Index: sys/lib/libkern/arch/i386/memcmp.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/memcmp.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 memcmp.S
--- sys/lib/libkern/arch/i386/memcmp.S	29 Nov 2014 18:51:23 -0000	1.2
+++ sys/lib/libkern/arch/i386/memcmp.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,7 @@
 #include <machine/asm.h>
 ENTRY(memcmp)
+	RETGUARD_START
 	pushl	%edi
 	pushl	%esi
 	movl	12(%esp),%edi
@@ -28,6 +29,7 @@ ENTRY(memcmp)
 	xorl	%eax,%eax		/* we match, return zero	*/
 	popl	%esi
 	popl	%edi
+	RETGUARD_END
 	ret
 L5:	movl	$4,%ecx			/* We know that one of the next	*/
@@ -40,4 +42,6 @@ L6:	movzbl  -1(%edi),%eax		/* Perform un
 	subl	%edx,%eax
 	popl	%esi
 	popl	%edi
+	RETGUARD_END
 	ret
+END(memcmp)
Index: sys/lib/libkern/arch/i386/memmove.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/memmove.S,v
retrieving revision 1.7
diff -u -p -u -r1.7 memmove.S
--- sys/lib/libkern/arch/i386/memmove.S	29 Nov 2014 18:51:23 -0000	1.7
+++ sys/lib/libkern/arch/i386/memmove.S	18 Aug 2017 02:28:21 -0000
@@ -41,17 +41,20 @@
  * into memmove(), which handles overlapping regions.
  */
 ENTRY(bcopy)
+	RETGUARD_START
 	pushl	%esi
 	pushl	%edi
 	movl	12(%esp),%esi
 	movl	16(%esp),%edi
 	jmp	docopy
+END(bcopy)
 /*
  * memmove(caddr_t dst, caddr_t src, size_t len);
  * Copy len bytes, coping with overlapping space.
  */
 ENTRY(memmove)
+	RETGUARD_START
 	pushl	%esi
 	pushl	%edi
 	movl	12(%esp),%edi
@@ -63,10 +66,13 @@ docopy:
 	cmpl	%ecx,%eax		# overlapping?
 	jb	1f
 	jmp	docopyf			# nope
+END(memmove)
+
 /*
  * memcpy() doesn't worry about overlap and always copies forward
  */
 ENTRY(memcpy)
+	RETGUARD_START
 	pushl	%esi
 	pushl	%edi
 	movl	12(%esp),%edi
@@ -83,6 +89,7 @@ docopyf:
 	movsb
 	popl	%edi
 	popl	%esi
+	RETGUARD_END
 	ret
 	_ALIGN_TEXT
@@ -104,5 +111,6 @@ docopyf:
 	popl	%edi
 	popl	%esi
 	cld
+	RETGUARD_END
 	ret
-
+END(memcpy)
Index: sys/lib/libkern/arch/i386/memset.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/memset.S,v
retrieving revision 1.4
diff -u -p -u -r1.4 memset.S
--- sys/lib/libkern/arch/i386/memset.S	29 Nov 2014 18:51:23 -0000	1.4
+++ sys/lib/libkern/arch/i386/memset.S	18 Aug 2017 02:28:21 -0000
@@ -8,6 +8,7 @@
 #include <machine/asm.h>
 ENTRY(memset)
+	RETGUARD_START
 	pushl	%edi
 	pushl	%ebx
 	movl	12(%esp),%edi
@@ -51,4 +52,6 @@ L1:	rep
 	popl	%eax			/* pop address of buffer */
 	popl	%ebx
 	popl	%edi
+	RETGUARD_END
 	ret
+END(memset)
Index: sys/lib/libkern/arch/i386/scanc.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/scanc.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 scanc.S
--- sys/lib/libkern/arch/i386/scanc.S	29 Nov 2014 18:51:23 -0000	1.3
+++ sys/lib/libkern/arch/i386/scanc.S	18 Aug 2017 02:28:21 -0000
@@ -33,6 +33,7 @@
 #include "DEFS.h"
 ENTRY(scanc)
+	RETGUARD_START
 	movl	4(%esp),%ecx
 	testl	%ecx,%ecx
 	jz	3f
@@ -53,4 +54,6 @@ ENTRY(scanc)
 	popl	%esi
 3:
 	movl	%ecx,%eax
+	RETGUARD_END
 	ret
+END(scanc)
Index: sys/lib/libkern/arch/i386/skpc.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/skpc.S,v
retrieving revision 1.3
diff -u -p -u -r1.3 skpc.S
--- sys/lib/libkern/arch/i386/skpc.S	29 Nov 2014 18:51:23 -0000	1.3
+++ sys/lib/libkern/arch/i386/skpc.S	18 Aug 2017 02:28:21 -0000
@@ -33,6 +33,7 @@
 #include "DEFS.h"
 ENTRY(skpc)
+	RETGUARD_START
 	pushl	%edi
 	movl	16(%esp),%edi
 	movl	12(%esp),%ecx
@@ -44,4 +45,6 @@ ENTRY(skpc)
 1:
 	movl	%ecx,%eax
 	popl	%edi
+	RETGUARD_END
 	ret
+END(skpc)
Index: sys/lib/libkern/arch/i386/strcmp.S
===================================================================
RCS file: /cvs/src/sys/lib/libkern/arch/i386/strcmp.S,v
retrieving revision 1.2
diff -u -p -u -r1.2 strcmp.S
--- sys/lib/libkern/arch/i386/strcmp.S	27 Sep 1996 06:47:49 -0000	1.2
+++ sys/lib/libkern/arch/i386/strcmp.S	18 Aug 2017 02:28:21 -0000
@@ -14,6 +14,7 @@
  */
 ENTRY(strcmp)
+	RETGUARD_START
 	movl	0x04(%esp),%eax
 	movl	0x08(%esp),%edx
 	jmp	L2			/* Jump into the loop! */
@@ -79,4 +80,6 @@ L2:	movb	(%eax),%cl
 L3:	movzbl	(%eax),%eax		/* unsigned comparison */
 	movzbl	(%edx),%edx
 	subl	%edx,%eax
+	RETGUARD_END
 	ret
+END(strcmp)

↧

What's Overleaf? Five years on from flunking our YC interview

August 19, 2017, 6:11 pm

About five years ago John Lees-Miller and I took a trip to San Francisco that we would rather forget. We had made it through to the interview stage at YC with our collaborative writing platform, called WriteLaTeX.

TL;DR – Things didn’t go well in the interview, but we pressed on regardless and today have a successful, investor-backed business that now serves over two million users worldwide.

I promised to write Matthew Partridge of Errant Science a blog post, and it ended up being about the inception of WriteLaTeX (now Overleaf), with some hopefully useful advice for new founders just starting out. Here it is if you feel like reading more: https://clutter.errantscience.com/2017/07/31/reflecting-on-the-founding-growth-and-maturing-of-overleaf/

If you have similar / different stories of how you got started, feel free to post them in the comments here or on the blog post linked above. If you’re in London on the 25th September and are working in the science / research / publishing space, you should definitely come to our next #FuturePub event that evening.

We still have some speaking slots available if you fancy giving a lightning talk — just let me know 🙂

↧

Why PS4 downloads are so slow

August 19, 2017, 9:36 am

Game downloads on PS4 have a reputation for being very slow, with many people reporting downloads being an order of magnitude faster on Steam or Xbox. This had long been on my list of things to look into, but at a pretty low priority. After all, the PS4 operating system is based on a reasonably modern FreeBSD (9.0), so there should not be any crippling issues in the TCP stack. The implication is that the problem is something boring, like an inadequately dimensioned CDN.

But then I heard that people were successfully using local HTTP proxies as a workaround. It should be pretty rare for that to actually help with download speeds, which made this sound like a much more interesting problem.

This is going to be a long-winded technical post. If you're not interested in the details of the investigation but just want a recommendation on speeding up PS4 downloads, skip straight to the conclusions.

Background

Before running any experiments, it's good to have a mental model of how the thing we're testing works, and where the problems might be. If nothing else, it will guide the initial experiment design.

The speed of a steady-state TCP connection is basically defined by three numbers: the amount of data the client is willing to receive on a single round-trip (TCP receive window), the amount of data the server is willing to send on a single round-trip (TCP congestion window), and the round trip latency between the client and the server (RTT). To a first approximation, the connection speed will be:

    speed = min(rwin, cwin) / RTT
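As a rough worked example of this formula, here is a minimal Python sketch. The function name and the sample numbers (a 128 kB receive window, a 1 MB congestion window, a 30 ms round trip) are assumptions chosen for illustration, not measurements.

```python
def tcp_speed_bps(rwin_bytes, cwin_bytes, rtt_seconds):
    """First-order steady-state throughput, in bits per second."""
    # Whichever window is smaller limits how much data is in flight.
    window_bytes = min(rwin_bytes, cwin_bytes)
    return window_bytes * 8 / rtt_seconds


# Hypothetical values: 128 kB receive window, 1 MB congestion window, 30 ms RTT.
speed = tcp_speed_bps(128 * 1024, 1024 * 1024, 0.030)
print(f"{speed / 1e6:.1f} Mbps")
```

With these inputs the receive window is the smaller of the two, so it alone sets the speed; making the congestion window larger changes nothing.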

With this model, how could a proxy speed up the connection? Well, with a proxy the original connection will be split into two mostly independent parts; one connection between the client and the proxy, and another between the proxy and the server. The speed of the end-to-end connection will be determined by the slower of those two independent connections:

    speed_proxy_client = min(client rwin, proxy cwin) / client-proxy RTT
    speed_server_proxy = min(proxy rwin, server cwin) / proxy-server RTT
    speed = min(speed_proxy_client, speed_server_proxy)

With a local proxy the client-proxy RTT will be very low; that connection is almost guaranteed to be the faster one. The improvement will have to be from the server-proxy connection being somehow better than the direct client-server one. The RTT will not change, so there are just two options: either the client has a much smaller receive window than the proxy, or the client is somehow causing the server's congestion window to decrease. (E.g. the client is randomly dropping received packets, while the proxy isn't).
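The split-connection model above can be sketched in a few lines of Python. All of the window sizes and RTTs below are made-up illustrative values; the only point is that a proxy with a larger receive window than the client lifts the end-to-end speed even though the long-haul RTT is unchanged.

```python
def leg_speed(rwin, cwin, rtt):
    """Speed of one TCP connection: smaller window divided by the RTT."""
    return min(rwin, cwin) / rtt


def proxied_speed(client_rwin, proxy_cwin, client_proxy_rtt,
                  proxy_rwin, server_cwin, proxy_server_rtt):
    """End-to-end speed through a proxy: the slower of the two legs."""
    speed_proxy_client = leg_speed(client_rwin, proxy_cwin, client_proxy_rtt)
    speed_server_proxy = leg_speed(proxy_rwin, server_cwin, proxy_server_rtt)
    return min(speed_proxy_client, speed_server_proxy)


KB = 1024
# Hypothetical setup: client rwin 128 kB, proxy rwin 1 MB, all cwins 4 MB,
# 1 ms between client and proxy, 30 ms between proxy (or client) and server.
direct = leg_speed(128 * KB, 4096 * KB, 0.030)
proxied = proxied_speed(128 * KB, 4096 * KB, 0.001,
                        1024 * KB, 4096 * KB, 0.030)
print(f"direct:  {direct / 1e6:.2f} MB/s")
print(f"proxied: {proxied / 1e6:.2f} MB/s")
```

In this made-up case the speedup is simply the ratio of the proxy's receive window to the client's, because the local leg is far too fast to matter.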

Out of these two theories, the receive window one should be much more likely, so we should concentrate on it first. But that just replaces our original question with a new one: why would the client's receive window be so low that it becomes a noticeable bottleneck? There's a fairly limited number of causes for low receive windows that I've seen in the wild, and they don't really seem to fit here.

Maybe the client doesn't support the TCP window scaling option, while the proxy does. Without window scaling, the receive window will be limited to 64kB. But since we know Sony started with a TCP stack that supports window scaling, they would have had to go out of their way to disable it. Slow downloads, for no benefit.
Maybe the actual downloader application is very slow. The operating system is supposed to have a certain amount of buffer space available for each connection. If the network is delivering data to the OS faster than the application is reading it, the buffer will start to fill up, and the OS will reduce the receive window as a form of back-pressure. But this can't be the reason; if the application is the bottleneck, it'll be a bottleneck with or without the proxy.
The operating system is trying to dynamically scale the receive window to match the actual network conditions, but something is going wrong. This would be interesting, so it's what we're hoping to find.
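To make the window scaling point in the first theory concrete: the window field in the TCP header is 16 bits wide, and the window scale option (RFC 7323) lets each side left-shift that value by up to 14 bits. A small Python sketch, with function and constant names of my own choosing:

```python
MAX_WINDOW_FIELD = 0xFFFF  # the TCP header's window field is 16 bits wide
MAX_SHIFT = 14             # largest shift count permitted by RFC 7323


def effective_window(window_field, shift=0):
    """Receive window in bytes for a given header value and scale shift."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return window_field << shift


print(effective_window(MAX_WINDOW_FIELD))             # no window scaling
print(effective_window(MAX_WINDOW_FIELD, MAX_SHIFT))  # maximum scaling
```

Without the option the window tops out just under 64kB, which is where the limit mentioned above comes from.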

The initial theories are in place, let's get digging.

Experiment #1

For our first experiment, we'll start a PSN download on a baseline non-Slim PS4, firmware 4.73. The network connection of the PS4 is bridged through a Linux machine, where we can add latency to the network using tc netem. By varying the added latency, we should be able to find out two things: whether the receive window really is the bottleneck, and whether the receive window is being automatically scaled by the operating system.
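For anyone wanting to reproduce a setup like this, the netem delay is added with a `tc qdisc` command on the bridging machine. The Python helper below only builds that command line; the interface name `eth1` and the 50 ms delay are hypothetical, and the exact commands used for this experiment are not recorded here.

```python
def netem_delay_command(interface, delay_ms):
    """Build the argv for adding a fixed netem delay on an interface."""
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    return ["tc", "qdisc", "add", "dev", interface,
            "root", "netem", "delay", f"{delay_ms}ms"]


# "eth1" is a hypothetical name for the bridge port facing the PS4.
print(" ".join(netem_delay_command("eth1", 50)))
```

Running the resulting command needs root on the Linux bridge; varying the delay between runs would mean changing or deleting the qdisc rather than adding a second one.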

This is what the client-server RTTs (measured from a packet capture using TCP timestamps) look like for the experimental period. Each dot represents 10 seconds of time for a single connection, with the Y axis showing the minimum RTT seen for that connection in those 10 seconds.
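The bucketing described here is simple to sketch. The Python below assumes the RTT samples for one connection have already been extracted from the capture as `(timestamp, rtt)` pairs in seconds; that input format and the function name are my assumptions, and the sample values are invented.

```python
def min_rtt_per_bucket(samples, bucket_seconds=10):
    """Map each bucket start time to the minimum RTT seen in that bucket.

    samples: iterable of (timestamp_seconds, rtt_seconds) for one connection.
    """
    buckets = {}
    for timestamp, rtt in samples:
        start = int(timestamp // bucket_seconds) * bucket_seconds
        if start not in buckets or rtt < buckets[start]:
            buckets[start] = rtt
    return dict(sorted(buckets.items()))


# Invented samples: two fall in the first 10 s bucket, two in the second.
samples = [(0.5, 0.030), (3.0, 0.025), (12.0, 0.050), (19.9, 0.045)]
print(min_rtt_per_bucket(samples))
```

Taking the minimum rather than the mean filters out samples inflated by queueing, so each dot approximates the base path latency for that interval.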

The next graph shows the amount of data sent by the server in one round trip in red, and the receive windows advertised by the client in blue.

First, since the blue dots are staying constantly at about 128kB, the operating system doesn't appear to be doing any kind of receive window scaling based on the RTT. (So much for that theory). Though at the very right end of the graph the receive window shoots out to 650kB, so it isn't totally fixed either.

Second, is the receive window the bottleneck here? If so, the blue dots would be close to the red dots. This is the case until about 10:50. And then mysteriously the bottleneck moves to the server.

So we didn't find quite what we were looking for, but there are a couple of very interesting things that are correlated with events on the PS4.

The download was in the foreground for the whole duration of the test. But that doesn't mean it was the only thing running on the machine. The Netflix app was still running in the background, completely idle [1]. When the background app was closed at 11:00, the receive window increased dramatically. This suggests a second experiment, where different applications are opened / closed / left running in the background.

The time where the receive window stops being the bottleneck is very close to the PS4 entering rest mode. That looks like another thing worth investigating. Unfortunately, that's not true, and rest mode is a red herring here. [2]

Experiment #2

Below is a graph of the receive windows for a second download, annotated with the timing of various noteworthy events.

The differences in receive windows at different times are striking. And more important, the changes in the receive windows correspond very well to specific things I did on the PS4.

When the download was started, the game Styx: Shards of Darkness was running in the background (just idling in the title screen). The download was limited by a receive window of under 7kB. This is an incredibly low value; it's basically going to cause the downloads to take 100 times longer than they should. This was not a coincidence: whenever that game was running, the receive window would be that low.
Having an app running (e.g. Netflix, Spotify) limited the receive window to 128kB, for about a 5x reduction in potential download speed.
Moving apps, games, or the download window to the foreground or background didn't have any effect on the receive window.
Launching some other games (Horizon: Zero Dawn, Uncharted 4, Dreadnought) seemed to have the same effect as running an app.
Playing an online match in a networked game (Dreadnought) caused the receive window to be artificially limited to 7kB.
Playing around in a non-networked game (Horizon: Zero Dawn) had a very inconsistent effect on the receive window, with the effect seemingly depending on the intensity of gameplay. This looks like a genuine resource restriction (download process getting variable amounts of CPU), rather than an artificial limit.
I ran a speedtest at a time when downloads were limited to 7kB receive window. It got a decent receive window of over 400kB; the conclusion is that the artificial receive window limit appears to only apply to PSN downloads.
Putting the PS4 into rest mode had no effect.
Built-in features of the PS4 UI, like the web browser, do not count as apps.
When a game was started (causing the previously running game to be stopped automatically), the receive window could increase to 650kB for a very brief period of time. Basically it appears that the receive window gets unclamped when the old game stops, and then clamped again a few seconds later when the new game actually starts up.

I did a few more test runs, and all of them seemed to support the above findings. The only additional information from that testing is that the rest mode behavior was dependent on the PS4 settings. Originally I had it set up to suspend apps when in rest mode. If that setting was disabled, the apps would be closed when entering rest mode, and the downloads would proceed at full speed.

A 7kB receive window will be absolutely crippling for any user. A 128kB window might be ok for users who have CDN servers very close by, or who don't have a particularly fast internet connection. For example at my location, a 128kB receive window would cap the downloads at about 35Mbps to 75Mbps depending on which CDN the DNS RNG happens to give me. The lowest two speed tiers for my ISP are 50Mbps and 200Mbps. So either the 128kB would not be a noticeable problem (50Mbps) or it'd mean that downloads are artificially limited to 25% speed (200Mbps).
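The arithmetic behind those numbers can be checked with a short Python sketch. It assumes 1kB means 1024 bytes and treats the 35Mbps and 75Mbps caps as exact, so the RTTs it prints are back-of-the-envelope estimates rather than measured values.

```python
WINDOW_BITS = 128 * 1024 * 8  # a 128 kB receive window, expressed in bits


def implied_rtt_ms(cap_mbps):
    """RTT (in ms) at which a 128 kB window caps throughput at cap_mbps."""
    return WINDOW_BITS / (cap_mbps * 1e6) * 1000


def fraction_of_line_rate(cap_mbps, line_mbps):
    """Share of the subscribed line rate that the capped download can use."""
    return min(cap_mbps, line_mbps) / line_mbps


for cap in (35, 75):
    print(f"{cap} Mbps cap -> RTT of about {implied_rtt_ms(cap):.0f} ms")
print(fraction_of_line_rate(50, 50))   # 50 Mbps tier: the cap is not felt
print(fraction_of_line_rate(50, 200))  # 200 Mbps tier with a ~50 Mbps cap
```

The 25% figure corresponds to a cap of roughly 50Mbps on the 200Mbps tier, which sits inside the 35Mbps to 75Mbps range quoted above.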

Conclusions

If any applications are running, the PS4 appears to change the settings for PSN store downloads, artificially restricting their speed. Closing the other applications will remove the limit. There are a few important details:

Just leaving the other applications running in the background will not help. The exact same limit is applied whether the download progress bar is in the foreground or not.
Putting the PS4 into rest mode might or might not help, depending on your system settings.
The artificial limit applies only to the PSN store downloads. It does not affect e.g. the built-in speedtest. This is why the speedtest might report much higher speeds than the actual downloads, even though both are delivered from the same CDN servers.
Not all applications are equal; most of them will cause the connections to slow down by up to a factor of 5. Some games will cause a difference of about a factor of 100. Some games will start off with the factor of 5, and then migrate to the factor of 100 once you leave the start menu and start playing.
The above limits are artificial. In addition to that, actively playing a game can cause game downloads to slow down. This appears to be due to a genuine lack of CPU resources (with the game understandably having top priority).

So if you're seeing slow downloads, just closing all the running applications might be worth a shot. (But it's obviously not guaranteed to help. There are other causes for slow downloads as well, this will just remove one potential bottleneck). To close the running applications, you'll need to long-press the PS button on the controller, and then select "Close applications" from the menu.

The PS4 doesn't make it very obvious exactly what programs are running. For games, the interaction model is that opening a new game closes the previously running one. This is not how other apps work; they remain in the background indefinitely until you explicitly close them.

And it gets worse than that. If your PS4 is configured to suspend any running apps when put to rest mode, you can seemingly power on the machine into a clean state, and still have a hidden background app that's causing the OS to limit your PSN download speeds.

This might explain some of the superstitions about this on the Internet. There are people who swear that putting the machine to rest mode helps with speeds, others who say it does nothing. Or how after every firmware update people will report increased download speeds. Odds are that nothing actually changed in the firmware; it's just that those people had done their first full reboot in a while, and finally had a system without a background app running.

Speculation

Those were the facts as I see them. Unfortunately this raises some new questions, which can't be answered experimentally. With no facts, there's no option except to speculate wildly!

Q: Is this an intentional feature? If so, what is its purpose?

Yes, it must be intentional. The receive window changes very rapidly when applications or games are opened/closed, but not for any other reason. It's not any kind of subtle operating system level behavior; it's most likely the PS4 UI explicitly manipulating the socket receive buffers.
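For illustration, this is what clamping a socket's receive buffer looks like from user space with the standard BSD sockets API, here via Python's `socket` module. It is only a sketch of the general mechanism; how the PS4 software actually does it is unknown, and the 7kB value simply mirrors the limit observed earlier.

```python
import socket


def clamp_receive_buffer(sock, size_bytes):
    """Request a receive buffer size and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    # The kernel may round, double, or clamp the requested size.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    granted = clamp_receive_buffer(s, 7 * 1024)
    print(f"requested 7168 bytes, kernel reports {granted}")
```

Since the advertised receive window cannot exceed the free space in this buffer, shrinking the buffer is a blunt but effective way for an application to throttle a TCP download.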

But why? I think the idea here must be to not allow the network traffic of background downloads to take resources away from the foreground use of the PS4. For example if I'm playing an online shooter, it makes sense to harshly limit the background download speeds to make sure the game is getting ping times that are both low and predictable. So there's at least some point in that 7kB receive window limit in some circumstances.

It's harder to see what the point of the 128kB receive window limit for running any app is. A single game download from some random CDN isn't going to muscle out Netflix or YouTube... The only thing I can think of is that they're afraid that multiple simultaneous downloads, e.g. due to automatic updates, might cause problems for playing video. But even that seems like a stretch.

There's an alternate theory that this is due to some non-network resource constraints (e.g. CPU, memory, disk). I don't think that works. If the CPU or disk were the constraint, just having the appropriate priorities in place would automatically take care of this. If the download process gets starved of CPU or disk bandwidth due to a low priority, the receive buffer would fill up and the receive window would scale down dynamically, exactly when needed. And the amounts of RAM we're talking about here are minuscule on a machine with 8GB of RAM; less than a megabyte.

Q: Is this feature implemented well?

Oh dear God, no. It's hard to believe just how sloppy this implementation is.

The biggest problem is that the limits get applied based just on what games/applications are currently running. That's just insane; what matters should be which games/applications someone is currently using. Especially in a console UI, it's a totally reasonable expectation that the foreground application gets priority. If I've got the download progress bar in the foreground, the system had damn well better give that download priority. Not some application that was started a month ago, and hasn't been used since. Applying these limits in rest mode with suspended apps is beyond insane.

Second, these limits get applied per-connection. So if you've got a single download going, it'll get limited to 128kB of receive window. If you've got five downloads, they'll all get 128kB, for a total of 640kB. That means the efficiency of the "make sure downloads don't clog the network" policy depends purely on how many downloads are active. That's rubbish. This is all controlled on the application level, and the application knows how many downloads are active. If there really were an optimal static receive window X, it should just be split evenly across all the downloads.
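The arithmetic is easy to sketch. The 128kB figure is the one from the text (treated here as 128 * 1024 bytes); the function names and the idea of a single shared budget are illustrative assumptions, not a description of what the PS4 software actually does.

```python
KB = 1024  # assumption: 1 kB is treated as 1024 bytes


def aggregate_window_per_connection(limit_bytes: int, downloads: int) -> int:
    """Total receive window when every connection gets the full limit."""
    return limit_bytes * downloads


def split_window_evenly(total_budget_bytes: int, downloads: int) -> int:
    """Per-connection window when one fixed budget is shared by all downloads."""
    if downloads < 1:
        raise ValueError("need at least one active download")
    return total_budget_bytes // downloads
```

With five downloads, the per-connection policy hands out 640kB in total, while an even split of the same 128kB budget would give each download roughly 25.6kB and keep the total constant no matter how many downloads are active.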

Third, the core idea of applying a static receive window as a means of fighting bufferbloat is just fundamentally broken. Using the receive window as the rate limiting mechanism just means that the actual transfer rate will depend on the RTT (this is why a local proxy helps). For this kind of thing to work well, you can't have the rate limit depend on the RTT. You also can't just have somebody come up with a number once, and apply that limit to everyone. The limit needs to depend on the actual network conditions.
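To make the RTT dependence concrete: a connection can have at most one receive window of unacknowledged data in flight per round trip, so the transfer rate is capped at roughly window / RTT. The helper below is a hypothetical name, and the RTT values in the note that follows are made-up examples rather than measurements.

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on transfer rate for a fixed receive window.

    At most one full window can be in flight per round trip,
    so the ceiling is window / RTT.
    """
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes / rtt_s


def to_mbit_per_s(bytes_per_s: float) -> float:
    """Convert bytes per second to megabits per second."""
    return bytes_per_s * 8 / 1_000_000
```

With a 128kB window, a 100 ms round trip caps the download at about 1.3 MB/s (roughly 10.5 Mbit/s), while a 10 ms round trip to a nearby server or local proxy allows about 13 MB/s. Same limit, tenfold difference in speed.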

There are ways to detect how congested the downlink is in the client-side TCP stack. The proper fix would be to implement them, and adjust the receive window of low-priority background downloads if and only if congestion becomes an issue. That would actually be a pretty valuable feature for this kind of appliance. But I can kind of forgive this one; it's not an off the shelf feature, and maybe Sony doesn't employ any TCP kernel hackers.
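As a rough illustration of what "adjust only when congestion becomes an issue" could look like, here is a toy delay-based controller, loosely in the spirit of delay-based schemes such as LEDBAT. It is a sketch under assumptions: that the stack can supply RTT samples for the connection, and that RTT rising above its observed minimum indicates queueing. The class name, thresholds and step sizes are all invented for the example.

```python
class BackgroundWindowController:
    """Toy receive-window controller for a low-priority download.

    Hypothetical sketch: shrinks the window when the RTT rises well above
    the lowest RTT seen so far, and grows it back otherwise.
    """

    def __init__(self, min_window=8 * 1024, max_window=4 * 1024 * 1024,
                 queue_delay_target_s=0.025, grow_step=64 * 1024):
        self.min_window = min_window
        self.max_window = max_window
        self.queue_delay_target_s = queue_delay_target_s
        self.grow_step = grow_step
        self.base_rtt_s = None          # lowest RTT observed so far
        self.window = max_window        # start unrestricted

    def on_rtt_sample(self, rtt_s: float) -> int:
        """Feed one RTT sample; returns the window to advertise next."""
        if self.base_rtt_s is None or rtt_s < self.base_rtt_s:
            self.base_rtt_s = rtt_s
        queueing_delay_s = rtt_s - self.base_rtt_s
        if queueing_delay_s > self.queue_delay_target_s:
            # Queue is building up: back off multiplicatively.
            self.window = max(self.min_window, self.window // 2)
        else:
            # Link looks uncongested: recover additively.
            self.window = min(self.max_window, self.window + self.grow_step)
        return self.window
```

The point of the design is that the window stays wide open on an idle link and only tightens while the measured delay says a queue is forming, so the resulting rate tracks actual network conditions instead of a number somebody picked once.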

Fourth, whatever method is being used to decide on whether a game is network-latency sensitive is broken. It's absurd that a demo of a single-player game idling in the initial title screen would cause the download speeds to be totally crippled. This really should be limited to actual multiplayer titles, and ideally just to periods where someone is actually playing the game online. Just having the game running should not be enough.

Q: How can this still be a problem, 4 years after launch?

I have no idea. Sony must know that the PSN download speeds have been the butt of jokes for years. It's probably the biggest complaint people have with the system. So it's hard to believe that nobody was ever given the task of figuring out why it's slow. And this is not rocket science; anyone bothering to look into it would find these problems in a day.

But it seems equally impossible that they know of the cause, but decided not to apply any of the trivial fixes to it. (Hell, it wouldn't even need to be a proper technical fix. It could just be a piece of text saying that downloads will work faster with all other apps closed.)

So while it's possible to speculate in an informed manner about other things, this particular question will remain an open mystery. Big companies don't always get things done very efficiently, eh?

↧

Laverna: An open source Evernote alternative

August 19, 2017, 7:40 pm

≫ Next: Dplug for Developing VST Plugins on Linux

≪ Previous: Why PS4 downloads are so slow

Live markdown editing

Laverna has a simple and robust markdown editor, which can help you write faster than ever before with our live markdown preview.

Distraction free mode

The editor can be switched to distraction free mode where you can focus solely on writing notes.

↧

Dplug for Developing VST Plugins on Linux

August 20, 2017, 5:49 am

≫ Next: Eclipse Ceylon

≪ Previous: Laverna: An open source Evernote alternative

Developing VST Plugins for Linux is something that has always been of great interest to me personally. There never seemed to be a framework that was easily portable to Linux aside from JUCE. Why not use JUCE you ask? Don’t get me wrong, JUCE is an excellent framework full of a multitude of useful libraries, but the licensing fees are too high for an independent developer like me. Then there is WDL which I used for a while, but unfortunately, it has been orphaned so future developments are unlikely unless you do them yourself. WDL also doesn’t have support for Linux, and in my experience, it can be a nightmare to manage the projects. I could go on and on comparing JUCE, WDL, and Dplug but I will save that for a later post. In this post, I want to tell my story of why I switched to the new kid on the block: Dplug.

Begin long story…

About 6 or 7 months ago, I started learning a new language called D, just for fun. It was a language that had caught my interest from various blog entries and YouTube videos. One of the beautiful things about D is the package manager made for it called Dub. Just out of curiosity I searched the Dub package repository for “vst” and was shocked when “dplug” came up.

So what is Dplug? Dplug is an open source library written entirely in D that can build plugins for Windows and Mac in VST format and Audio Unit format. At first, this seemed too good to be true. I cloned the GitHub repository and followed the instructions in the README for building a sample plugin. I went to the distort example, opened a terminal, and typed in “dub”. Much to my surprise, the plugin built like it was magic. There wasn’t any frustration over using an IDE, no build errors, nothing. Surely something must have gone wrong, it’s never that easy. So I moved the resulting DLL over to my plugins folder and opened my host. Once again I was shocked to see the example plugin open and run beautifully.

At this point I was sold on Dplug, but there was just one thing that still bothered me. Just like WDL, there was no Linux support. When I asked the author, Guillaume Piolat, about the state of Linux support, he mentioned that he had once attempted to add it but ran into problems with X11 and with hosts not finding the plugins’ entry points (D and the libraries written in it have matured a lot since that time). Guillaume decided to appoint a programmer named Rikki to add support for X11 windowing. Again I was shocked. Here was a library for VST/AU plugins that was maintained extremely well and updated frequently.

Rikki is great when it comes to this sort of task. If you need any work done, especially in D or with windowing, I highly recommend him. http://cattermole.co.nz/

In just a short amount of time, Rikki finished up X11 windowing for Dplug and had a working standalone example. I could hardly contain my excitement to finally see my plugins run on Linux. Unfortunately, the plugins didn’t work right off the bat. This was due to the plugins not getting a handle on the window to draw to and also not receiving events from the X server. We handled this by having Dplug create its own window and then creating a separate thread to dispatch events and redraw the UI. After a few small changes, we had the plugins opening and properly handling X events.
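For readers who haven't seen the pattern, here is a minimal sketch of "own window plus a dedicated event thread". It is written in Python rather than D, it is not Dplug's actual code, and a plain in-process queue stands in for the X server connection. The class name and the event strings are hypothetical.

```python
import queue
import threading


class PluginWindow:
    """Stand-in for a plugin-owned window with its own event thread."""

    def __init__(self):
        self._events = queue.Queue()  # stand-in for the X server connection
        self.handled = []             # events seen, in order
        self.redraws = 0              # how many times the UI was redrawn
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, event):
        """Simulate the server delivering an event to this window."""
        self._events.put(event)

    def close(self):
        """Ask the event thread to stop and wait for it to finish."""
        self._events.put(None)
        self._thread.join()

    def _run(self):
        # The dedicated thread blocks on events so the host's threads never do.
        while True:
            event = self._events.get()
            if event is None:
                break
            self.handled.append(event)
            if event == "expose":
                self.redraws += 1  # a real plugin would repaint here
```

The plugin no longer depends on the host to pump events for it: the window and the loop that services it both belong to the plugin.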

Conclusion

Finally, there is an open source framework for creating VST plugins on Linux. There are no licensing fees to worry about and development is very active.

Dplug is exactly what I was looking for, but I’m not going to say that it is the ultimate library for creating audio plugins. The library is still actively being developed, so it doesn’t yet have support for formats like AAX or VST3. Also, even though D has a syntax similar to C and C++, it may take a while to adapt to using a different language. If this doesn’t scare you away, then I strongly encourage you to give Dplug a chance!

In the coming months, if there is enough interest, I may begin a series of tutorials about writing plugins with Dplug. If you are interested in joining our discussion about Dplug, we have a Discord server here https://discord.gg/V5xDwFP

The GitHub page for Dplug is here https://github.com/AuburnSounds/dplug

If you are interested in seeing what kind of plugins can be made with Dplug, then you should certainly check out Auburn Sounds. They use Dplug exclusively. https://www.auburnsounds.com/

As always feel free to contact me at ctrecordingslive@gmail.com

↧

Eclipse Ceylon

August 20, 2017, 3:55 am

≫ Next: Japan Is Selling Ice Cream That Doesn't Melt

≪ Previous: Dplug for Developing VST Plugins on Linux

Eclipse Ceylon is a modern statically-typed programming language for the Java, Android, and JavaScript virtual machines. The language features a unique and uncommonly elegant static type system, a flexible and very readable syntax, a powerful module architecture, a modular SDK, smooth interoperation with native Java and JavaScript, and with Maven and npm, excellent command-line tooling, and a full-featured IDE.

The Ceylon language is defined by a complete, but very readable specification.

Development of Ceylon ramped up in 2011, with the first major release 1.0 in 2013. Ceylon 1.1 was released in 2014, 1.2 in 2015, and 1.3 in 2016. Ceylon has a very active user community, with most interaction occurring in the various Gitter channels associated with the project.

The Ceylon project has already significantly advanced the state of the art in the Java/C# language family, and in the field of statically-typed languages in general. Innovations seen first in the Ceylon project are already being adopted in the type systems of other new programming languages. Ceylon was the first language to demonstrate practical applications of free-form union and intersection types, and alerted the programming language community to the importance of these constructs. Ceylon was also either the first language, or at least one of the first two languages, to feature flow-sensitive typing, a concept which has now been adopted by a number of other languages. Furthermore, Ceylon has the most sophisticated and cleanly-integrated module system of any programming language.
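For readers unfamiliar with the terms, the sketch below illustrates a union type and flow-sensitive typing. It uses Python type hints rather than Ceylon syntax, so it only approximates the idea: the narrowing is done by an external checker such as mypy, not by the language itself, and intersection types are not shown. The function name is made up for the example.

```python
from typing import Union


def describe(value: Union[int, str]) -> str:
    """Accepts either an int or a str (a union type)."""
    if isinstance(value, int):
        # Flow-sensitive typing: after this check a type checker
        # treats `value` as int, so arithmetic is allowed.
        return f"integer {value + 1}"
    # In this branch the checker has narrowed `value` to str,
    # so string methods are allowed without a cast.
    return "string " + value.upper()
```

The type of `value` depends on where you are in the control flow, which is what removes the need for explicit casts after a type test.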

↧