Combining partitioned lists with C++

The task

Imagine you have a partitioned list like this:

[
    { "from": 0, "content": "blue", "to": 10 },
    { "from": 12, "content": "blue", "to": 13 },
    { "from": 14, "content": "blue", "to": 25 },
    { "from": 26, "content": "red", "to": 30 }
]

As a result, we want to merge adjacent partitions (they might have a gap between them) with the same content. That means, we expect this as a result:

[
    { "from": 0, "content": "blue", "to": 25 },
    { "from": 26, "content": "red", "to": 30 }
]

Oh, and we don't want to make any assumptions about the underlying container - this might be std::vector, std::unsorted_set, a plain C array… we simply don't know.

The task in C++

First, the structur of the array members:

template <typename t>
struct Partition {
    size_t from;
    std::optional<t> content;
    size_t to;
    template <typename u>
    bool operator==(const Partition<u>& other) const
    {
        return *content == *other.content;
    }
};

template <typename t>
std::ostream&
operator<<(std::ostream& stream, const Partition<t>& elem)
{
    return stream << "[" << elem.from << "|" << elem.to
                  << "]: " << elem.content.value_or("<undefined>");
}

Why the std::optional<t> in there? Well, we want to mark the content as invalid. We might refer to std::pair<t,bool> for this. Let's quote cppreference.com:

As opposed to other approaches, such as std::pair<T,bool>, optional handles expensive-to-construct objects well and is more readable, as the intent is expressed explicitly.

The second part struck with me - readable code lowers the WTFs/min value, which is always good.

Now, the invocation part:

int main()
{
    std::vector<Partition<std::string> > test = { { 0, "blue", 10 },
        { 12, "blue", 13 },
        { 14, "blue", 25 },
        { 26, "red", 30 } };
    const auto test_end = merge(test.begin(), test.end());
    std::for_each(test.begin(), test_end, [](const auto& element) {
        std::cout << element << std::endl;
    });
    return 0;
}

This should work similar to std::remove - we simply don't care what's after test_end.

The implementation

template <typename t>
t merge(t first, t last)
{
    t iter = first, next = first;
    for (++next; iter != last && next != last;) {
        if (*iter == *next) {
            iter->to = next->to;
            next->content = std::nullopt;
            ++next;
        } else
            iter = next++;
    }
    return std::remove_if(first, last,
        [](const auto& elem) { return !elem.content; });
}

This doesn't just look a lot like std::adjacent_find, it was also… "inspired" by the STL algorithm. Unfortunately, I couldn't pick the STL implementation, as I need to perform some additional steps in case I find two neighbors.

The central part are the iteration increments of iter and next - we have to carefully decide, when to increase which iterator. If we merge the two elements, we move next. Otherwise, we jump iter to next and increase next.

Complexity

  • One loop traversion from first to last for merging and invalidating
  • A second loop traversion for std::remove_if

⇒ this results in O(2n) = O(n) runtime.

Comments !