I. Some observations/facts about diets and insertion into diets: There are essentially three different cases that can occur during an insertion of an integer x into a diet d: (A) x is not adjacent to any other interval => Simply create a node [x,x] at the appropriate place in d (B) x is adjacent to exactly one interval [i,j] (ie, x=i-1 or x=j+1) => Replace [i,j] by [x,j] or [i,x] (C) x is adjacent to two intervals [i,j] and [i',j']. Assume wlog that i Replace one interval (the one that is located higher in d) by [i,j'] and delete the other. II. In the sequel I will use the following Haskell classes and types: class Ix a => Discrete a where pre, suc :: a -> a adjacent :: a -> a -> Bool adjacent x y = suc x==y instance Discrete Int where pre x = x-1 suc x = x+1 data Discrete a => Diet a = Empty | Node (Diet a) (a,a) (Diet a) deriving (Eq) {- for better debugging: showsDiscrete :: (Show a,Discrete a) => (a,a) -> ShowS showsDiscrete (i,j) = (" ["++) . shows i . (',':) . shows j . ("] "++) showsDiet :: (Show a,Discrete a) => Diet a -> ShowS showsDiet Empty = id showsDiet (Node l i r) = ('(':) . showsDiet l . showsDiscrete i . showsDiet r . (')':) instance (Show a,Discrete a) => Show (Diet a) where showsPrec _ d = showsDiet d -} III. Derivation of insert in several steps: (1) Formulation as two completely separate phases: insert & regroup (2) Fusion of insert & regroup (to eliminate duplicate path traversal) (3) Split regroup into two functions (to eliminate redundant tests) (4) Fusion of auxiliary functions (to eliminate duplicate path traversal) (1) First, insert is defined almost like for ordinary binary search trees: insert :: Discrete a => a -> Diet a -> Diet a insert x Empty = Node Empty (x,x) Empty insert x d@(Node l (i,j) r) = if xj then Node l (i,j) (insert x r) else d Now regroup has to find out whether the newly inserted interval [x,x] is adjacent to none, one, or two intervals, and it has to perform the appropriate reorganization. Important for the further development is the following CLAIM: If regroup does not take the last inserted value x as a parameter, its running time cannot be guaranteed to be bounded by the height of the diet. Actually, regroup follows (at least) the path that was taken by insert x to find the possibly adjacent intervals. Without guidance by x regroup had to branch at each node into two directions which would mean in the worst case a scan of the complete diet. regroup works as follows: if x is found without encountering an adjacent interval, the diet is left unchanged (case (A), last "else"). Once an adjacent adjacent interval (say, [i,j] with x+1=i) is found, it must be tested whether case (B) or (C) is present. Therefore, the (old) predecessor of [i,j] is sought (note: the current predecessor is [x,x]). Such a predecessor exists only if the right subtree of l is not empty (note: l is definitely not empty because it contains at least the interval [x,x]). If a predecessor [li,lj] exists, and if it is also adjacent to x, it can be fused with [i,j] to [li,j], and [li,lj] and [x,x] must be deleted from l. Otherwise, [x,x] (which is the maximum interval in l) must be deleted and [i,j] is just extended by one element to the left, ie, [x,j]. (The case for x>j is analogous.) regroup :: Discrete a => a -> Diet a -> Diet a regroup x Empty = Empty regroup x d@(Node l (i,j) r) = if x l /= Empty if adjacent x i then if right l /= Empty && adjacent lj x then Node (delPredMax l) (li,j) r else Node (delMax l) (x,j) r else Node (regroup x l) (i,j) r else if x>j then -- => r /= Empty if adjacent j x then if left r /= Empty && adjacent x ri then Node l (i,rj) (delSuccMin r) else Node l (i,x) (delMin r) else Node l (i,j) (regroup x r) else d where (li,lj) = predMax l (ri,rj) = succMin r The used auxiliary functions are: left (Node l _ _) = l right (Node _ _ r) = r delMin :: Discrete a => Diet a -> Diet a delMin (Node Empty _ r) = r delMin (Node l i r) = Node (delMin l) i r delSuccMin :: Discrete a => Diet a -> Diet a delSuccMin (Node (Node Empty _ Empty) _ r) = r delSuccMin (Node l i r) = Node (delSuccMin l) i r succMin :: Discrete a => Diet a -> (a,a) succMin (Node (Node Empty _ Empty) i _) = i succMin (Node l _ _) = succMin l delMax :: Discrete a => Diet a -> Diet a delMax (Node l _ Empty) = l delMax (Node l i r) = Node l i (delMax r) delPredMax :: Discrete a => Diet a -> Diet a delPredMax (Node l _ (Node Empty _ Empty)) = l delPredMax (Node l i r) = Node l i (delPredMax r) predMax :: Discrete a => Diet a -> (a,a) predMax (Node _ i (Node Empty _ Empty)) = i predMax (Node _ _ r) = predMax r Now insertion can be simply defined as: insertIt :: Discrete a => a -> Diet a -> Diet a insertIt x d = regroup x (insert x d) (2) It is striking in (1) that regroup initially follows the same path as taken by insert x. The actual work of regroup begins not before an adjacent interval has been found. It therefore seems to be reasonable to move the first/outer adjacency tests from regroup to insert. This makes the whole program more efficient, and it simplifies regroup a bit while insert is still easy to understand. insert x Empty = Node Empty (x,x) Empty insert x d@(Node l (i,j) r) = if xj then if adjacent j x then regroup x (Node l (i,x) r) else Node l (i,j) (insert x r) else d In contrast to the approach (1), regroup does not see a diet in which [x,x] has already been inserted. Instead, if x is adjacent to an interval, this is the current interval which has already been extended by insert. So the side of regrouping is determined by comparing x with i and j. regroup x d@(Node l (i,j) r) = if x==i then -- regroup left if l /= Empty then if adjacent lj x then Node (delMax l) (li,j) r else Node l (x,j) r else d else if x==j then if r /= Empty then if adjacent x ri then Node l (i,rj) (delMin r) else Node l (i,x) r else d else d where (li,lj) = findMax l (ri,rj) = findMin r The fact that [x,x] is not in the diet to be regrouped also simplifies the needed auxiliary functions. We need delMin and delMax from above and the two simple functions: findMin :: Discrete a => Diet a -> (a,a) findMin (Node Empty i _) = i findMin (Node l _ _) = findMin l findMax :: Discrete a => Diet a -> (a,a) findMax (Node _ i Empty) = i findMax (Node _ _ r) = findMax r (3) We see that in the above version of regroup x is only needed to find out the proper side for regrouping. Since we have this information already available in insert, we can fuse one step further by splitting regroup into two functions joinLeft and joinRight. This also allows to replace the second conditional by pattern matching. So instead of regroup we get the following two functions: joinLeft d@(Node Empty _ _) = d joinLeft d@(Node l (i,j) r) = if adjacent lj i then Node (delMax l) (li,j) r else Node l (i,j) r where (li,lj) = findMax l joinRight d@(Node _ _ Empty) = d joinRight d@(Node l (i,j) r) = if adjacent j ri then Node l (i,rj) (delMin r) else Node l (i,j) r where (ri,rj) = findMin r and insert has to be slightly changed to: insert x Empty = Node Empty (x,x) Empty insert x d@(Node l (i,j) r) = if xj then if adjacent j x then joinRight (Node l (i,x) r) else Node l (i,j) (insert x r) else d (4) A final source of inefficiency in (3) is that in case (C) the path to the second adjacent interval is traversed twice (by the auxiliary functions findMin and delMin, for instance). But these can be fused together into one auxiliary function splitMin that retrieves and deletes the predecessor interval (same for findMax, etc.) in one run. Thus, we finally obtain the following version: insert remains unchanged, and joining and splitting is defined as follows. splitMin :: Discrete a => Diet a -> (Diet a,(a,a)) splitMin (Node Empty i r) = (r,i) splitMin (Node l i r) = (Node l' i r,i') where (l',i') = splitMin l splitMax :: Discrete a => Diet a -> (Diet a,(a,a)) splitMax (Node l i Empty) = (l,i) splitMax (Node l i r) = (Node l i r',i') where (r',i') = splitMax r joinLeft d@(Node Empty _ _) = d joinLeft (Node l (i,j) r) = if adjacent lj i then Node l' (li,j) r else Node l (i,j) r where (l',(li,lj)) = splitMax l joinRight d@(Node _ _ Empty) = d joinRight (Node l (i,j) r) = if adjacent j ri then Node l (i,rj) r' else Node l (i,j) r where (r',(ri,rj)) = splitMin r In my opinion this last version looks quite elegant. It is reasonably concise and it is, in particular, the most efficient one. I don't believe there is a simpler solution that is as efficient.