Hrorm Forums

Streams, 0.11.0


#1

Here is the new Streams stuff, first hack at it:

All tests pass.

I did start going down the path of eliminating the List<> methods altogether, but then backtracked. A lot of tests could use some style updates if that were the direction to go in. For example, many tests do a selection to a List, assert the count, and then do assertions against individual rows. That could change to a map/reduce where the per-row and count assertions are made in the same operation, or you could collect to a List and maintain the current style.

That, and this is already rocking the boat… All existing select methods were renamed to stream*(), and select*() default methods were added that collect to Lists, to keep API contract changes from disturbing things down in the tests or current implementations. I personally don’t like this; I think the streaming methods should continue to be called “select”, since that’s what we’re doing in the database.

I did want to abstract further, to demonstrate how this can ease the effort of separating the entity hydration and row selection concerns by generating a Stream of ResultSet first and then mapping that to BUILDER one method call up, but I ran out of time before I had to head home.

Again, remember- you’re the one that has to live with and maintain it… show no mercy!


#2

So I think some more improvement can be made here. Specifically, I don’t even want to create the PreparedStatement / ResultSet objects until the Stream starts getting consumed. Let me see if I can nail that down…


#3

Done, ReallyLazyTest:

@Test
public void testStreamLazyEnough() throws SQLException {
    Connection connection = helper.connect();
    KeylessDao<Keyless> dao = Keyless.DAO_BUILDER.buildDao(connection);

    connection.close();

    // This shouldn't create a PreparedStatement or ResultSet, and thus shouldn't blow up.
    Stream<Keyless> keylessStream = dao.streamAll();
}

@Test(expected = HrormException.class)
public void testLazyResourceInitialization() throws SQLException {
    Connection connection = helper.connect();
    KeylessDao<Keyless> dao = Keyless.DAO_BUILDER.buildDao(connection);

    connection.close();

    Stream<Keyless> keylessStream = dao.streamAll();

    // Attempting to consume is where we should throw.
    keylessStream.forEach(System.out::println);
}
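For what it’s worth, the laziness trick itself is plain Java: an intermediate operation like flatMap doesn’t run until a terminal operation pulls on the stream, so resource creation can be deferred into a factory. A minimal sketch (hypothetical names, not hrorm’s actual internals):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class LazyStreams {

    // Defers calling the (possibly failing) resource factory until a
    // terminal operation actually pulls on the stream.
    public static <T> Stream<T> lazy(Supplier<Stream<T>> factory) {
        return Stream.of(factory).flatMap(Supplier::get);
    }

    public static void main(String[] args) {
        AtomicBoolean opened = new AtomicBoolean(false);

        Stream<String> s = lazy(() -> {
            opened.set(true); // stands in for creating the PreparedStatement/ResultSet
            return Stream.of("a", "b");
        });

        // Building the stream did not touch the "resource" yet.
        System.out.println(opened.get()); // false

        long count = s.count(); // terminal op finally invokes the factory
        System.out.println(opened.get() + " " + count); // true 2
    }
}
```

Building the stream never invokes the factory; only a terminal operation like count() or forEach() does, which is exactly where the HrormException should surface.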

Now on “difficult to screw up”- here is a consideration to make. There’s nothing preventing someone from doing:

// ...all the filtering on the client side.
personDao.streamAll()
    .filter(Person::isAdult) // Age >= 18
    .filter(p -> Objects.equals("Jack", p.getName()))
    .collect(Collectors.toList());

Baaaad practice. Perhaps for small datasets it’s fine, and maybe it lets us do things hrorm currently cannot:

// ...all the filtering on the client side.
personDao.streamAll()
    .filter(person -> person.getPets().stream().anyMatch(Pet::isCat))
    .collect(Collectors.toList());

But on partitioned datastores this is a legitimate use case, and a pretty big plus, because it is often not efficient to ask the database to filter on anything other than the partition keys, unless all of the partition keys are queried against first.

// Some filtering client side
personDao.stream(partitionKeys)
    .filter(Person::isAdult);

This is a common use case on Spark/Cassandra.


#4

This is a bit less meaty, but some of your comments made me think about the bad naming of the various select methods.

This all started before hrorm was hrorm. When it was just a few classes in a program I was writing. The first thing I implemented was selecting by ID, returning 1 thing (or null). So, that got named select. The next thing was to get everything and that got named selectAll. I mostly did that second because it was the easiest thing to do. When I first wanted to specify something other than ID, I only needed one thing returned, so I wrote selectByColumns and that returned 1 thing (or null).

Now there are a lot more select methods, and the naming (that never really made sense, it was just three methods that I happened to use, and names I pulled out of the air because they matched what I was thinking about at the time) is all over the place. Here are all the Dao<ENTITY> methods for doing selections.

    ENTITY select(long id);
    ENTITY selectByColumns(ENTITY item, String... columnNames);
    List<ENTITY> selectAll();
    List<ENTITY> selectAll(Order order);
    List<ENTITY> select(Where where);
    List<ENTITY> select(Where where, Order order);
    List<ENTITY> selectMany(List<Long> ids);
    List<ENTITY> selectManyByColumns(ENTITY template, String... columnNames);
    List<ENTITY> selectManyByColumns(ENTITY template, Order order, String... columnNames);
    <T> T foldingSelect(T identity, BiFunction<T,ENTITY,T> accumulator, Where where);

What a mess.

So, I am thinking the correct thing is to make returning multiple records (a list) the default, and to give the rather less used case of selecting one record the funny name. It could just be select and selectOne with method overloading, rather than all the goofball names.

    ENTITY selectOne(long id);
    ENTITY selectOne(ENTITY item, String... columnNames);
    List<ENTITY> select();
    List<ENTITY> select(Where where);
    List<ENTITY> select(Order order);
    List<ENTITY> select(Where where, Order order);
    List<ENTITY> select(List<Long> ids);
    List<ENTITY> select(ENTITY template, String... columnNames);
    List<ENTITY> select(ENTITY template, Order order, String... columnNames);
    <T> T foldingSelect(T identity, BiFunction<T,ENTITY,T> accumulator, Where where);

I think that’s much cleaner, though perhaps all the overloading is annoying.

And of course, it’s another break in compatibility. But hey, I’m still on major version 0. Before version 1.0 this should be cleaned up, one way or another.


#5

A lot of libraries go in the direction you’re proposing. The proposal makes method names consistent with return behavior; if you were going in the opposite direction, I’d complain about how select() can return one or many, ask why you even distinguish between selectMany and selectAll if that’s what you wanted, etc.

Looking at it, this has come a looooong way. There’s nothing you really lack in the selection feature set that stands out as a gaping hole. The implementer has the ability to get things done. (Well, some people may insist you provide a .select(String sql) method, but I think that’s a bad idea, at least at this point in time, because you’d lose control over column naming in the query.)


#6

The thing that is still missing is referencing columns on joined or child tables.

I think the joined tables issue is more significant. I have never needed that in my own application, and I think most of your stuff is single-table, keyless, so I imagine it hasn’t come up for you.

It could be done from an interface level. Just add the name of the table as another argument when building a where clause. But that makes the implementation more complicated. Right now, where clauses are built without any consultation with the DAO that they are used in. That would have to change.

Even then, it’s only joins that can be supported that way. If you want to do child tables, you’re into having clauses, and again the SQL generation could start to get annoying.

I don’t like setting out to do this work until there’s someone using it to validate it.

Sadly, the thing that hrorm needs the most is more users. But then there is a chicken and egg problem. :man_shrugging:


#7

Well, here’s an idea for you- you could go in another direction to nab the functionality you’re going for. Where it may be difficult to implement complex joins and the query building language to back it, you could just stick with simple Daos and start going down the path of no-redemption:

public class LazyLoaderEntity {
    private long id;
    private String name;

    // Wraps lazyLoadedListDao + Where clause builder
    private Lazy<LazyLoadedEntity> children;
    // setters and getters
}

Free of reflection and proxies. Lazy could potentially implement many of the insert/update methods on Dao, or implement List and then changes to the “List” could trigger Dao updates/inserts accordingly.

lazyLoaderEntity.getChildren().stream()
    .filter(child -> child.getBirthMonth() == 10)
    .forEach(child -> {
        child.setAge(child.getAge()+1);
        lazyLoaderEntity.getChildren().update(child);
    });
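To make the idea concrete, here’s a minimal sketch of what a Lazy<T> could look like (all names hypothetical- the real thing would wrap a Dao plus a Where clause rather than a plain Supplier, and update() would delegate to the child Dao):

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Caches the result of the first load; the Supplier stands in for
// a child Dao + Where clause pair.
public class Lazy<T> {
    private final Supplier<List<T>> loader;
    private List<T> loaded; // null until first access

    public Lazy(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    public Stream<T> stream() {
        if (loaded == null) {
            loaded = loader.get(); // the query runs here, on first use only
        }
        return loaded.stream();
    }

    // In the real thing this would delegate to the child Dao's update.
    public void update(T item) {
        System.out.println("would call childDao.update(" + item + ")");
    }
}
```

Nothing loads at construction time, so building a parent entity never touches the child table; the first stream() call runs the query and caches the rows.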

In the builder:

Dao<LazyLoaderEntity> dao = new DaoBuilder<>("lazyParent", LazyLoaderEntity::new)
	.withPrimaryKey("id", "ID_SEQUENCE", LazyLoaderEntity::getId, LazyLoaderEntity::setId)
	.withLazyLoader(
		"lazyChildren",
		LazyLoaderEntity::getChildren,
		LazyLoaderEntity::setChildren,
		lazyLoaderEntity -> lazyLoadedEntityDao.lazy(
                    // Where clause builder for the children
                    Where.where("parent_id", Operator.EQUALS, lazyLoaderEntity.getId()),
                    // Reinforces relationships prior to insert/update.
                    (lazyLoadedEntity) -> lazyLoadedEntity.setParent(lazyLoaderEntity.getId()))
		)
	//.buildDao

The increased cognitive complexity in the library would exist almost entirely in Lazy<> given it is designed to consume the Dao interface. If Lazy implements List, you don’t have to have a specifically Lazy-typed field any more than you’d have to use a specifically ArrayList<> typed field - you could just set a generic List field like any other.

That wouldn’t even be “lazy” in the way you critiqued Streams. Well, I guess it could be, since nothing loads until it is asked for.

The alternative to a Lazy<> type would be a contract on the getter/setter for a resetting-Stream:

public class LazyLoaderEntity {
    private long id;
    private String name;
    private Supplier<Stream<LazyLoadedEntity>> children; // Wraps lazyLoadedListDao + Where clause, wraps List

    public List<LazyLoadedEntity> getChildren() {
        return children.get()
            .collect(Collectors.toList());
    }

    public void setChildren(Supplier<Stream<LazyLoadedEntity>> children) {
        this.children = children;
    }
}

Then children would be immutable, because there’s no way to ask LazyLoadedEntity’s Dao for inserts/updates, etc.


#8

I am not sure I understand what you proposed here, but I think it’s different from what I was thinking about.

I was imagining something like this:

class Foo {
  Long id;
  String name;
  Bar bar;
}

class Bar {
  Long id;
  BigDecimal value;
}

Dao<Foo> fooDao = // this is built from a builder with a JoinColumn
List<Foo> foos = fooDao.select(where("BAR","VALUE",LESS_THAN,new BigDecimal("1.234")));

The difference is that I am doing a select from the FOO table based on a column comparison on the BAR table.

Maybe that’s what you were talking about too, and I just don’t quite understand your example.


#9

Joins are one way of getting related data in an RDBMS. When they become troublesome- or impossible, as across different datastores- another approach is to avoid joins and do additional querying, aggregating client-side.

// One bar query per foo
Stream<Foo> fooStream = barDao.select(where("VALUE", LESS_THAN, new BigDecimal("1.234")))
    .flatMap(bar -> fooDao.select(where("BAR_ID", EQUALS, bar.getId())));

// Or, one foo query, one bar query
Stream<Foo> fooStream = fooDao.select(where("BAR_ID", IN,
    barDao.select(where("VALUE", LESS_THAN, new BigDecimal("1.234")))
        .map(Bar::getId)
        .collect(Collectors.toList())))); // Maybe make this easier by not requiring collect

It may at first glance seem wasteful or inefficient not to do a join- but joins are not always appropriate. Sometimes I don’t want something doing that extra work for me. What if I want to select 50,000 Foos with 5 or 6 Bars each on average? In 0.12.0 at least, you’re going to preload all of it into a List, each element of which has its own List fields.

What if I want the Foos of some top percentage aggregate of Bar? What if there’s more like 200-2000 Bars per Foo?


#10

Updated the branch to your latest 0.11.0.

I’m exploring non-Long PKs… There are too many generics. These interfaces need to be broken up. Look at this branch’s cornerstone method in SqlRunner.java:

public Stream<BUILDER> stream(String sql,
                              StatementPopulator statementPopulator,
                              Supplier<BUILDER> supplier,
                              List<? extends ChildrenDescriptor<ENTITY,?, BUILDER,?>> childrenDescriptors){
    return new ResultSetQuery(connection, sql, statementPopulator).stream() // Stream<ResultSet>
            .map(resultSet -> hydrate(resultSet, supplier)) // Stream<BUILDER>
            .map(builder -> populateChildren(builder, childrenDescriptors)); // Stream<BUILDER>
}

That’s 90% of hrorm’s functional logic right there: query + hydration + child population. The only generics needed there currently are ENTITY and BUILDER, and one of those can be moved completely out of that class and mapped later in the pipeline. If you get rid of select*ByColumns (template selection) in the future, you can get rid of the BUILDER (and PARENTBUILDER, CHILDBUILDER) generics entirely in most of the current codebase. And the CHILD types are already erased here…

Something about the Descriptor classes, especially ChildrenDescriptor, strikes me as being far too complex. The reason you have PARENT and CHILD generics like:

ChildrenDescriptor<PARENT,CHILD,PARENTBUILDER,CHILDBUILDER>

is because Column doesn’t track the field type. If it did, you wouldn’t need a ChildrenDescriptor with PARENT/CHILD generics at all; you’d just need a descriptor associated with the Column or a Column descendant. But Column already has two generic types to track- and I can’t see why you need the BUILDER generic there at all at this point, even if you need it elsewhere.

If you really can’t agree with the direction of this branch, which is to expose Stream to the implementer, I completely understand. At the very least, consider cherry-picking some of the internal streaming code, like ResultSetQuery. That’s the start of a complete separation between query execution and entity hydration, which should help you find direction in simplifying the interfaces that need it most at this point- the ones at the front door. Btw, StatementPopulator was a genius move, one that made this a very clean break.

Anything I can do to help, be it here or in other areas, let me know.


#11

I have so much to say. I’ll probably break this into parts.

First, I was trying to add more type information to columns, because I wanted to implement something that did “select distinct”. I’m not even sure how this should work at an interface level, so it is all a big experiment. Not necessarily something that will go into mainline ever.

If you’re curious, the branch is here:

One of the tests fails, for interesting reasons.

But it made me think about a lot of things having to do with types, so some of what you said hit home.

I think the one principle should always be: “What does this buy for clients of the library?” Much more important than internal cleanliness concerns.

So, there are a few things on the table that could help clients.

  1. Support for non-Long primary keys.
  2. Support for distinct queries.
  3. Better type-checking for where clauses.
  4. Better type-checking/cleaner interface for running SQL functions.

Those are the big ones that I see out there right now.

The first is just a straight-up feature. If you want it, we should do it. It’s another type parameter or interface or something, because right now, it’s just always a long. And a lot of things will have to know about it to make it work right, but that’s fine. It’s a feature that makes hrorm better.

The second is a feature too, and it’s potentially a good one. But I am not sure how the interface even looks yet. There’s a lot of distinct style queries people might want, and maybe types have little to do with it in the end.

The next two are closely related: they have to do with columns being defined as just strings. I do not see any way around that at all, without radical changes to hrorm. And I mean really radical. A much bigger change than adding some kind of stream support. Like hrorm becomes a library that code generates DAO objects for you. That could be really nice, but, no, that’s just a different solution to the problem and if I wanted to do that I would just start from scratch and name it something different.

On the plus side, the way where clauses and sql functions work right now is a bit loose from the type-system point of view, but that’s the worst you can say about it. I actually like how where clauses work. I think they are pretty lightweight and pack in a lot of functionality.

So, really, the non-long primary key is the least speculative, highest reward.

That has nothing to do with Streams/laziness though, from a working on hrorm or a client’s usage point-of-view. (That I can see.) So I would like to talk about that separately.

From a types point of view, the term “column” is pretty overloaded. In hrorm, we already have several different notions of column. One is the interface Column which really is something like DataElementInADaoDescriptor. Maybe Column is not the best word for that, and that can be changed.

Here are some of the concepts that we do have, one way or another. Not all of these map into hrorm classes, but they all exist somehow within hrorm.

  1. A SQL type (examples: INTEGER, VARCHAR, BOOLEAN)
  2. A Java SQL type (something that can be set onto a prepared statement or read from a result set) that supports a particular SQL type (examples Long, BigDecimal). This is very closely related to what hrorm now calls a GenericColumn, but they are subtly different, so it appears later.
  3. A Converter: something that translates between two types in both directions (generally between some arbitrary Java type and a Java SQL type) (examples, see hrorm Converters class)
  4. Java to SQL reader/writer. This is the GenericColumn. It sometimes mixes in a Converter.
  5. An Entity: an idealized model of something within a problem domain that can be persisted
  6. A Java model: A java class that represents an entity
  7. A table: A SQL structure that represents an entity
  8. A SQL column: a member of a SQL table that has a SQL type and a name
  9. A DAO Descriptor member: a SQL column combined with a Java type, where that Java type is either a Java SQL type or has an associated Converter. (This is what hrorm calls a Column right now, for better or worse.) So it always has a #4 in it, but it also has a name, a prefix (or maybe that’s contextual or something), and getters and setters. Those specify where to get and set things, so it includes an ENTITY class and a BUILDER class.

That could surely be refined further.
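Concept 3 is tiny when written out. A sketch of the shape (hrorm’s actual Converter interface and method names may differ), using the classic Boolean-to-VARCHAR “T”/“F” translation as the example instance:

```java
// A two-way translation between an arbitrary Java type (CLASS) and a
// Java SQL type (CODE). Hypothetical sketch of the concept, not hrorm's API.
public interface Converter<CLASS, CODE> {
    CODE from(CLASS item);  // Java value -> value the driver can set
    CLASS to(CODE code);    // value read from the ResultSet -> Java value

    // Example instance: Boolean <-> "T"/"F" strings in a VARCHAR column.
    Converter<Boolean, String> T_F_BOOLEAN = new Converter<>() {
        @Override public String from(Boolean item) { return item ? "T" : "F"; }
        @Override public Boolean to(String code) { return "T".equals(code); }
    };
}
```

The two directions are what distinguishes a Converter from a plain Function: the same object serves both statement population and result-set hydration.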

Your most fundamental point seems to be that concern number 9 is way down in the deepest bowels of hrorm, and that’s an annoyance, when those levels are really trying to work on concepts 1-4 at most. And relatedly, that some of the information you really need at those lowest levels has been abstracted away and lost.

My newest branch above simply adds some type information to column, but not enough, it turns out. And I agree that four type variables on a class is stretching it. Things like ChildDescriptor are hard to understand.

Well, this is already a wall of text, and I need to go put the laundry in the dryer.

If we want to support non-Long primary keys, let’s work on that. We can certainly start pulling apart the different types as we go down that road. I do think that will pay off for supporting other type-related things (like distinct selects).


#12

Two things can be true- you can be feature rich, and maintain a level of internal cleanliness. They’re not exclusive.

None of your generics have any bounds whatsoever. This is partly because of what you’re doing (supporting whatever the implementer throws at you), but you should limit it to two per class at the maximum, because that really is all you absolutely need (entity type and field type). BUILDER is interesting. I attempted to rip it out a while back, but given that a lot of queries could only be done through templates, it just didn’t work with the immutable use case.

If you force the implementer to provide a Converter<ENTITY, BUILDER>, you can eliminate that generic altogether everywhere else. The advantage your approach has is that the implementer only needs to provide half that equation- a Function<BUILDER, ENTITY>. For that feature alone- one I personally have never heard of anyone absolutely needing in the ORM layer (admittedly not an argument)- it’s too intrusive, in my opinion. It has an objectively identifiable weakness: because BUILDER and ENTITY are not type-bound (more specifically, not bound to each other in some way), you cannot reuse on ENTITY any of the method references given to us at Dao construction for BUILDER. Nor can you assume anything about the types of fields matching.

To sell my wares a little bit: if Streams ends up making it in, all an implementer would ever have to do to get that functionality would be to use the Builder class as ENTITY in hrorm, then .map(BuilderClass::build) on the way out.
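That trick in isolation, with a hypothetical immutable Person and its builder standing in for the hrorm-managed types- the mutable builder plays the ENTITY role and the immutable type only appears at the end of the pipeline:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BuilderAsEntity {

    // The immutable model the client actually wants.
    record Person(String name) {}

    // The mutable builder; this is what the Dao would hydrate as its ENTITY.
    static class PersonBuilder {
        private String name;
        void setName(String name) { this.name = name; }
        String getName() { return name; }
        Person build() { return new Person(name); }
    }

    public static void main(String[] args) {
        // Stand-in for dao.streamAll() on a Dao<PersonBuilder>.
        Stream<PersonBuilder> hydrated = Stream.of("Ada", "Grace").map(n -> {
            PersonBuilder b = new PersonBuilder();
            b.setName(n);
            return b;
        });

        // .map(PersonBuilder::build) on the way out yields the immutable type.
        List<Person> people = hydrated.map(PersonBuilder::build).collect(Collectors.toList());
        System.out.println(people);
    }
}
```

Only one type parameter ever reaches the library; the builder-to-entity conversion lives entirely in client code.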

I think re-thinking the generics is a prerequisite of #1; that’s why I brought them up. It’s not a criticism- I see it as an opportunity to clean things up all the way to the front door. But that “cleanup” may involve some “radical” changes, as you put it.

#2, 3, 4: Ah, I see. Column Types certainly would make distinct easier. This is closely tied to the generics issue as well. I think you could start by making everything a Column<ENTITY, TYPE>: first with ChildColumn<ENTITY, CHILD> extends Column<ENTITY, CHILD>, and PrimaryKey<ENTITY, PRIMARY> extends Column<ENTITY, PRIMARY>.

I completely agree. I think the extra type security goodies you’re talking about will largely be a bolt-on, and not something central to Where clauses in general. Maybe for distinct, because the implementer wants the type information in that case for sure.

There are a couple of approaches for that:

dao.selectDistinct(Column.define("age", Person::getAge), where)

(Yes, “Column” is extremely overloaded. Maybe come up with a few different names for various useful traits of Columns to help with some of that confusion?)

One implementation that comes to mind: you could reuse most of your current selection code, generate SQL selection criteria like distinct(COLUMN), hydrate a Person::new using only the Age column value (instead of all columns), then .map(Person::getAge) on the way out.
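The hydrate-then-map idea in miniature, with plain Java streams standing in for the database (hypothetical Person; Stream.distinct() stands in for the server-side distinct(COLUMN)):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DistinctSketch {

    static class Person {
        private long age;
        void setAge(long age) { this.age = age; }
        long getAge() { return age; }
    }

    // Stand-in for: select distinct(AGE) from PERSON, hydrating only that column.
    static List<Long> distinctAges(Stream<Long> ageColumnValues) {
        return ageColumnValues
                .map(v -> { Person p = new Person(); p.setAge(v); return p; }) // hydrate one column
                .map(Person::getAge)   // .map(Person::getAge) on the way out
                .distinct()            // the real thing would let SQL do this
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctAges(Stream.of(18L, 21L, 18L, 40L))); // [18, 21, 40]
    }
}
```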

Where can I help on this?


#13

To answer your final question, for the moment, I think we just need to think and talk some more. It’s not clear exactly what the way forward is.

One thing I was thinking about is that in a couple of places in hrorm there is an Envelope for a few reasons. Perhaps this should be expanded. Everywhere throughout hrorm where an object is passed, perhaps there should be an Envelope<ENTITY, BUILDER>. That way, wherever you are in the code, you know you have all the getters and setters.

It doesn’t reduce the number of type parameters, but it moves them around a bit, and perhaps it gives a place to say: here are all the things having to do with object population (hydration, as you say) and object destructuring (desiccation?). Then we can more fully expose the lower-level DB-oriented types to the SQL building and running classes.

Just an idea. This is all still muddy in my mind.


#14

I said I had more to say, and here’s one more thing.

I understand your concern with the ENTITY/BUILDER types. It basically doubles the number of type variables that flow through the parts of the system that know anything about entities. Believe me, I would not have done this if I could have thought of another way, but I couldn’t and still do not see another way.

Here’s the issue.

Suppose you have a type Foo that includes a type Bar. It could be a join or a child in hrorm terms; the issue is the same either way. Those types are both immutable and have builder objects. In order to create a Foo, what do you need? Not a BarBuilder, but a Bar. No one writes builder objects whose setters accept builder objects for components plus instructions on how to build them. So, before you can even hydrate the FooBuilder, you need a fully built Bar. :confused:
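A tiny illustration of the problem (hypothetical Foo/Bar, not hrorm code): the only setter a conventional FooBuilder exposes takes a built Bar, so the child must be fully constructed before the parent’s builder can be populated:

```java
public class ImmutableHydration {

    static final class Bar {
        final long id;
        Bar(long id) { this.id = id; }
    }

    static final class Foo {
        final long id;
        final Bar bar;
        Foo(long id, Bar bar) { this.id = id; this.bar = bar; }
    }

    static final class FooBuilder {
        private long id;
        private Bar bar; // a built Bar, not a BarBuilder

        FooBuilder id(long id) { this.id = id; return this; }
        FooBuilder bar(Bar bar) { this.bar = bar; return this; }
        Foo build() { return new Foo(id, bar); }
    }

    public static void main(String[] args) {
        // Hydration order is forced: the child first, then the parent.
        Bar bar = new Bar(7);
        Foo foo = new FooBuilder().id(1).bar(bar).build();
        System.out.println(foo.bar.id); // 7
    }
}
```

This is why both ENTITY and BUILDER have to flow through any code that hydrates nested objects: the child must be finished (ENTITY) before the parent’s BUILDER can accept it.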

You can get around that by having getters and setters on every builder object. But that basically means you have a fully mutable copy of your entire domain object graph. Lame.

So, anyway, that’s why hrorm works the way it does. I cannot think of another way to actually support immutable models.