Protein superfamily

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). This common ancestry is usually inferred from structural alignment and mechanistic similarity, even when no sequence similarity is evident. Sequence homology can then be deduced even where low sequence similarity makes it not directly apparent. Superfamilies typically contain several protein families, with sequence similarity evident within each family. The term protein clan is commonly used for protease and glycosyl hydrolase superfamilies, following the MEROPS and CAZy classification systems.

The term protein fold refers to a similar concept based on structural comparison. In some schemes, such as SCOP and CATH, the fold is treated as a level above the superfamily (common ancestry is less strongly supported at the fold level than at the superfamily level), while other schemes treat the two terms as synonymous. The level above the fold is the fold class, which describes the rough topology of the protein (e.g. all-α, all-β, α+β, α/β).
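
The nesting of levels described above (fold class, then fold, then superfamily, then family) can be illustrated with a minimal sketch. It assumes a scheme in which the fold sits above the superfamily; the `Classification` type, the two helper functions, and all labels such as "fold-X" and "sf-1" are hypothetical placeholders and are not taken from SCOP, CATH, or any other database.

```python
from dataclasses import dataclass

# Levels ordered from broadest to most specific (assumed scheme: fold above superfamily).
LEVELS = ("fold_class", "fold", "superfamily", "family")


@dataclass(frozen=True)
class Classification:
    """Hypothetical record of where one protein sits in the hierarchy."""
    fold_class: str
    fold: str
    superfamily: str
    family: str


def deepest_shared_level(a, b):
    """Return the most specific level at which a and b agree, or None."""
    shared = None
    for level in LEVELS:
        if getattr(a, level) != getattr(b, level):
            break
        shared = level
    return shared


def homology_inferred(a, b):
    """Common ancestry is only taken as inferred from the superfamily level down."""
    return deepest_shared_level(a, b) in ("superfamily", "family")


# Placeholder proteins: all labels are invented for illustration.
p1 = Classification("all-β", "fold-X", "sf-1", "fam-a")
p2 = Classification("all-β", "fold-X", "sf-1", "fam-b")  # same superfamily, other family
p3 = Classification("all-β", "fold-X", "sf-2", "fam-c")  # same fold, other superfamily
p4 = Classification("all-α", "fold-Y", "sf-3", "fam-d")  # different fold class
```

Here `p1` and `p2` share a superfamily, so the sketch treats them as homologous even though they belong to different families, whereas `p1` and `p3` share only a fold, where common ancestry is less strongly supported.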