On our everyday job we keep collecting data with streams either with toList()
or groupingBy()
but actually we can even develop our own collector. In order to know how, we must understand
the methods provided by the Collector
interface.
public interface Collector<T, A, R> {
Supplier<A> supplier();
BiConsumer<A, T> accumulator();
BinaryOperator<A> combiner();
Function<A, R> finisher();
Set<Characteristics> characteristics();
}
<T>
is the type of input elements to the reduction operation.
<A>
is the mutable accumulation type of the reduction operation.
<R>
is the result type of the reduction operation.
At this point itβs obvious that a collector like toList()
implements the interface as Collector<T, List<T>, List<T>>
The supplier method has to return a Supplier
of an empty accumulator used during the collection process.
This empty accumulator will also represent the result of the collection process when performed against an empty stream.
The accumulator method returns the function that performs the reduction operation. Itβs internal state is changed in order to reflect the effect of the traversed element.
The finisher method returns a function in order to transform the accumulator object into the final result of the whole operation.
The combiner method defines how the accumulators resulting from the reduction of different subparts of the stream are combined when the subparts are processed in parallel.
Having a class Result
that encapsulates three result values: a, b and c; we want to reduce a collection of results
into a single combined result.
class Result {
private long a;
private long b;
private long c;
Result combine(Result result) {
return new Result(
this.a += result.a,
this.b += result.b,
this.c += result.c
);
}
}
At first we need to create a new collector class ResultCollector
that receives a Result
, combine two Result
instances into a new Result
and returns the final result which is also a new Result
.
class ResultCollector<T> implements Collector<Result, Result, Result>
The supplier method returns an empty result.
@Override
public Supplier<Result> supplier() {
return Result::new;
}
The accumulator method calls the Result.combine(result)
method in order to sum both result values.
@Override
public BiConsumer<Result, Result> accumulator() {
return Result::combine;
}
The combiner method does the same as the accumulator by receiving two partial results and sum both values.
@Override
public BinaryOperator<Result> combiner() {
return Result::combine;
}
The finisher method just returns the accumulator object.
@Override
public Function<Result, Result> finisher() {
return Function.identity();
}
The characteristics method indicates that the accumulator object is directly used as the final result of the reduction process.
@Override
public Set<Characteristics> characteristics() {
return EnumSet.of(IDENTITY_FINISH);
}
class ResultCollector<T> implements Collector<Result, Result, Result> {
@Override
public Supplier<Result> supplier() {
return Result::new;
}
@Override
public BiConsumer<Result, Result> accumulator() {
return Result::combine;
}
@Override
public BinaryOperator<Result> combiner() {
return Result::combine;
}
@Override
public Function<Result, Result> finisher() {
return Function.identity();
}
@Override
public Set<Characteristics> characteristics() {
return EnumSet.of(IDENTITY_FINISH);
}
}
Result result = IntStream.range(0, 1_000_000)
.mapToObj(i -> new Result(1, 2, 3))
.collect(new ResultCollector<>());
With our custom collector we can reduce millions of results into a single combined result:
Result{a:1000000,b:2000000,c:3000000}
.
IntStream.range(0,1_000_000)
.
mapToObj(i ->new
Result(1,2,3))
.
reduce(Result::combine);
Performing 10 times the same process using reduce:
Fastest was done in 14ms.
Performing 10 times the same process using custom collector:
Fastest was done in 7ms.
Answer: performance.