Log of the #fcrepo channel on chat.freenode.net

Using timezone: Eastern Standard Time
* f4jenkins leaves00:50
* f4jenkins joins00:51
* mohamedar joins07:25
* mohamedar leaves08:14
* dhlamb joins08:44
* bseeger joins08:59
* bseeger leaves09:01
* bseeger joins
* dwilcox joins
* acoburn joins09:11
* dhlamb leaves09:15
* dhlamb_ joins
* whikloj joins09:27
* mohamedar joins09:29
* jrgriffiniii joins09:30
* dwilcox leaves09:32
* awoods joins09:39
* jrgriffiniii leaves
* jrgriffiniii joins09:45
* bseeger leaves10:09
* ajs6f joins10:14
* peichman joins10:28
* dwilcox joins10:30
* dwilcox leaves10:31
<ruebot>awoods: am i chairing today's call? or are you back for today?10:37
<awoods>ruebot: I am back, but would be happy to let you chair if you like.
ruebot: I am adding some agenda items as we speak.10:38
<ruebot>awoods: cool. it looks like i'm next in line for notes. so, i'm fine with either.10:39
<awoods>ruebot: if you take notes, I can facilitate.
<ruebot>awoods: i'll do notes. you're a better facilitator :-)10:40
<awoods>ruebot: thanks, and I doubt it.10:42
* dwilcox joins10:54
* bseeger joins11:00
<ajs6f> awoods: When I click to expand "SPI Specification", I see: JIRA Issues Macro: JIRA project does not exist or you do not have permission to view it.11:02
I'm here.
awoods: I give you permission to do that.11:03
<ajwagner> Just got out of my meeting and joined the call.
<ajs6f>Back on mute.11:12
I'm with barmintor in that I don't think we will ever have that, _unless_ we have alternate implementations.11:14
<escowles_>i put together a quick histogram of how many classes are in each package: https://gist.github.com/escowles/bf09523270abbd00fbd6 11:15
<ajs6f>I suspect that a lot of what's in http-commons is exception mappers and the like.
It all sounds fine to me.
<whikloj>ajs6f++ # lots of exception mappers11:16
<ajs6f>Lots of exception types. Nothing wrong with that.
<ajs6f>We deprecate each and every class in the old one.11:17
Updated pull: https://github.com/fcrepo4-labs/fcrepo4-client/pull/36 11:19
* travis-ci joins11:25
ualbertalib/fcrepo4-oaiprovider#35 (oai_dc - 6d78620 : Piyapong Charoenwattana): The build passed.
Change view : https://github.com/ualbertalib/fcrepo4-oaiprovider/compare/8d20d9d2a05e...6d78620b5e62
Build details : https://travis-ci.org/ualbertalib/fcrepo4-oaiprovider/builds/97480656
* travis-ci leaves
<ajs6f>Should it be based on an extant LDP client?11:26
If it's not, we screwed up.11:28
We're making software, so in some sense, it's all ephemeral. It's just symbols in the ether.11:32
Did escowles get dropped?11:33
<whikloj>oh I thought I did
<ajs6f>What's the advantage of waiting?11:37
<ruebot>if the timing works out, i can help out with the release again11:42
<bseeger>Star Wars and Fedora for Christmas? That'd just be too much. ;)11:43
<escowles_>afk -- need to step away for a sec11:45
<ajs6f>The technology behind Fedora is just as well-thought through and realistic as that behind Star Wars.11:46
<escowles_>i'm back
<ajs6f>This is our release strategy: https://www.youtube.com/watch?v=6jLbHNtpigg 11:49
<bseeger>I need to leave early. 'See' you all in a couple of weeks!11:52
* ruebot drops off11:57
* escowles_ has to run -- talk to y'all later12:01
<ajs6f>Rich, thick, even gooey.12:02
So we're all +1 to talking.12:03
* bseeger leaves
<ajs6f>acoburn: Seriously, if you had to reimpl tomorrow, for some reason, would you start with the kernel module?12:04
acoburn: Because we could stick Jena TDB and a filesystem under that in a day.
<acoburn>ajs6f: ideally, no
<ajs6f>acoburn: Because you don't trust the modules above it?12:05
<acoburn>ajs6f: no, because it's got too many JCR notions
<ajs6f>acoburn: So you would want a shared-nothing reimpl?
acoburn: Clean room?
<acoburn>ajs6f: it's not that — I'd actually _like_ to use the -kernel-api12:06
ajs6f: it's just that there's a lot of work to remove the JCR notions from it
ajs6f: so it's not something that could happen "tomorrow"
<ajs6f>acoburn: Okay. So what about this: https://wiki.duraspace.org/display/FF/High-level+Requirements?focusedCommentId=71991959#comment-71991959 12:07
<acoburn>ajs6f: well, there's that
<ajs6f>acoburn: Is there a persistent RDF index of reasonable quality for that library?
<acoburn>ajs6f: that I'm not sure about
ajs6f: it should be easy to ascertain, though12:08
<ajs6f>acoburn: I have no idea what the foreign function work looks like there.
acoburn: What is the library you were thinking of?
<acoburn>ajs6f: https://hackage.haskell.org/package/rdf4h 12:09
ajs6f: however, I think a departure from the JVM would be too big of a step at the moment for the community12:10
<ajs6f>acoburn: Yeah, that looks nice, but I don't see any sign of persistence.
acoburn: Not sure I agree there. But it's hard to say.
<acoburn>ajs6f: in terms of persistence, we could serialize to some sort of backend, no?12:12
<ajs6f>acoburn: Are you talking about a generic object store?
acoburn: I think we need proper indexes.
<acoburn>ajs6f: I had been thinking of HBase
<ajs6f>acoburn: Maybe. cbeer and I looked at that several years ago (at the beginning of Fedora 4).12:13
acoburn: After some experimenting, the opinion from us (and eddies and barmintor) was that we couldn't make HBase easy enough to use.
acoburn: There wasn't a strong ability to provide a deployable artifact.
<acoburn>ajs6f: everything you need to do with LDP (Membership/Containment) can be done with keyscans
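The keyscan idea above can be illustrated with a minimal sketch. It assumes an ordered key-value store and a made-up key layout of parent, a NUL separator, the word "contains", and the child; neither the `SortedKV` class nor the layout comes from Fedora or from any of the stores named in this log.

```python
from bisect import bisect_left


class SortedKV:
    """In-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []   # kept sorted so prefix scans are contiguous
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._data[key] = value

    def scan(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


SEP = "\x00"  # assumed separator; keeps "/rest/a" from matching "/rest/ab"


def containment_key(parent, child):
    return f"{parent}{SEP}contains{SEP}{child}"


def children(store, parent):
    """List the contained resources of parent with a single prefix scan."""
    prefix = f"{parent}{SEP}contains{SEP}"
    return [key[len(prefix):] for key, _ in store.scan(prefix)]


store = SortedKV()
for parent, child in [("/rest/a", "/rest/a/1"),
                      ("/rest/a", "/rest/a/2"),
                      ("/rest/ab", "/rest/ab/1")]:
    store.put(containment_key(parent, child), "")

print(children(store, "/rest/a"))  # ['/rest/a/1', '/rest/a/2']
```

Membership triples could be laid out the same way with a different middle segment, so that each kind of listing stays a single contiguous scan.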
<ajs6f>acoburn: Or something that resembled a deployable artifact.12:14
<acoburn>ajs6f: you're right that deployment is really hard
<ajs6f>acoburn: And frankly, very few people really need the scale of Hadoop. Lots of people think they do.
<acoburn>ajs6f: I would agree with that
<ajs6f>acoburn: Partly, this is why I'm always such a bugbear about not offering query across resources, or across the repo. This is exactly where that rubber meets the road: changing persistence.12:15
acoburn: What about your choice of clusterable k-v store?12:16
<acoburn>ajs6f: I've been using riak for years and really like it12:17
<ajs6f>acoburn: That clusters well and persists in a synchronous fashion?
<acoburn>ajs6f: it clusters very well12:18
ajs6f: in fact, it only runs in clustered mode
<ajs6f>acoburn: What about using Bitcask directly? I've never looked at that.
acoburn: Too low-level?
<acoburn>ajs6f: I don't think that would give you any of the MapReduce capabilities12:19
<ajs6f>acoburn: If we want the JVM, is there a JVM bytecode client for Riak?
<acoburn>ajs6f: or any of the consistent hashing/entropy checking stuff
ajs6f: yes, there is a JVM client12:20
<ajs6f>acoburn: Oh, that's something we would want. The m-r is nice, but that stuff is at the heart of Fedora's elevator pitch.
<ajs6f>acoburn: And then what do we do with bitstreams? Put them on a filesystem? Put them in Riak?
<acoburn>ajs6f: use jclouds12:21
ajs6f: that way you can use a filesystem backend or put them in any blob store
<ajs6f>acoburn: And the kernel we write introduces atomicity between them?
riak and jclouds, that is
<acoburn>ajs6f: we could use zookeeper
ajs6f: zk, for handling atomic operations12:22
<ajs6f>acoburn: I haven't seen that done. Is that how Hadoop does that kind of thing?12:23
acoburn: Cute: https://bookkeeper.apache.org/index.html
<acoburn>ajs6f: hbase relies on zk12:24
<ajs6f>acoburn: I know, I just thought that was mostly about config. I didn't realize that zk was governing xactions.
<acoburn>ajs6f: yes, zk provides primitives for all sorts of consensus- and quorum-based distributed operations
ajs6f: it's not easy to use, but it's also really solid12:25
<ajs6f>acoburn: That's why I'm looking at BookKeeper. Seems much higher-level and more intelligible for people like us who are interested in "preservation".
<acoburn>ajs6f: you're right, that may be a better way to model these operations12:26
<ajs6f>acoburn: BookKeeper is written over ZK.
<acoburn>ajs6f: right12:27
<ajs6f>acoburn: Also, we've talked about: https://curator.apache.org/
<acoburn>ajs6f: yes, I think there are now a number of these projects that abstract away much of zk's abstruseness, while keeping its essential properties as a way to manage distributed processes12:29
<ajs6f>acoburn: We're begging the question of whether a reimpl needs to cluster. If the point of reimpling is to understand the API and write it down, clustering is irrelevant at best and more likely to be distracting. If the point of reimpling is to get off the MODE train, then clustering matters.12:30
<acoburn>ajs6f: well said12:31
<ajs6f>acoburn: Which one do you care about more?
<acoburn>ajs6f: I care much more about the practical matters of scaling a repository than about the purity of the interface12:32
<ajs6f>acoburn: So that's the question about MODE, and you want to worry about clustering up front.12:33
<acoburn>ajs6f: I've been seeing some level of traffic over the last 6 mos related to MODE not scaling. I think that's what people will care about12:35
<ajs6f>acoburn: Well, I'm not here to carry any water for MODE. But scaling != clustering.12:36
<acoburn>ajs6f: if there's a reimpl that doesn't scale well, why would institutions want to give developer time to write that?
ajs6f: that's fair
<ajs6f>acoburn: I want to do something quick. I don't want to spend a month wading through cluster config just to store a triple.12:37
<acoburn>ajs6f: there's huge value in _any_ reimpl, because it opens the door to a lot of possibilities12:38
ajs6f: so if that means using a triplestore or a simple filesystem backend, that's great
<ajs6f>acoburn: Yes, that's why we're talking about this. But we have to make some choices at some point.12:39
<acoburn>ajs6f: presumably, the result would be a much cleaner java API, so some other impl is easier to write
<ajs6f>acoburn: I thought you don't care about the Java API as much as the HTTP API?12:40
<acoburn>ajs6f: I do care about the Java API — that's key for any JVM impl12:41
<ajs6f>acoburn: Are we even trying to standardize a Java API? I did not think we are.
<acoburn>ajs6f: it makes sense to me that, if there are multiple JVM impls, they would share code from fcrepo-kernel-api and the fcrepo-http-* modules12:43
<ajs6f>acoburn: I'm now confused. I thought you wanted a clean-room approach because of worries about contaminating the new impl with JCR semantics.
<acoburn>ajs6f: I'm not really concerned so much about that (I can see advantages to it, but I'm not so bent on that approach)12:44
ajs6f: the better approach, I think, is to simply remove the current JCR dependencies (and some of the underlying semantics) from the existing java APIs12:45
ajs6f: the fact that the kernel api has an "observers" package (which clearly comes from JCR semantics) doesn't actually bother me12:47
ajs6f: I see it as a curiosity, and if there is a move to reorganize the packages/classes, it may then be removed, but I see no reason to remove that simply because the terminology comes from JCR12:48
<ajs6f>acoburn: So what, then, is your objection to doing a reimpl by impling the kernel-api module?12:50
<acoburn>ajs6f: I don't have one, other than a general, but poorly-defined, desire to work within the existing codebase12:51
ajs6f: wait, I misunderstood your last comment12:52
<ajs6f>acoburn: That's not an objection, is it? That speaks in _favor_ of reimpling fcrepo-kernel-api, doesn't it?
<acoburn>ajs6f: yes, that's exactly what I mean: reimpling fcrepo-kernel-api12:53
<ajs6f>acoburn: {sigh} Okay, so we _are_ going to reimpl fcrepo-kernel?
<acoburn>ajs6f: yes, I think that's the idea12:54
<ajs6f>acoburn: Okay, glad we got that settled. And you want to bake clustering in early.12:55
<acoburn>ajs6f: I think it's easier to address clustering from the start, rather than as more of an afterthought (if the impl is to support clustering at all)12:56
<ajs6f>acoburn: I agree with that. The question is whether it _is_ to address clustering at all, or whether, once we reimpl and collect data about how to prune and grow the specification, we declare victory and run away.12:57
* dwilcox leaves
<ajs6f>acoburn: Is this a pure reference impl, or a tool for quotidian use?
<acoburn>ajs6f: I'm perfectly fine with either, but I suspect my institution would want me working on the latter12:58
<ajs6f>acoburn: That's legit.12:59
acoburn: So we reimpl fcrepo-kernel-api, using tech that clusters, and fcrepo-kernel-awesome is responsible for the atomicity of what occurs beneath it.13:00
<acoburn>ajs6f: yes, the impl would have to enforce atomicity
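One simple way an impl could keep a bitstream store and a triple index in step is a compensating write: store the bitstream, then the triples, and undo the bitstream if the triples fail. The sketch below shows only that simpler technique, not the ZooKeeper- or BookKeeper-based coordination discussed in this log, and it does not cover crashes between the two writes. `save_resource`, `DictIndex`, and `FailingIndex` are hypothetical names.

```python
class DictIndex:
    """Hypothetical triple index backed by a dict."""

    def __init__(self):
        self.rows = {}

    def put(self, path, statements):
        self.rows[path] = list(statements)


class FailingIndex:
    """Hypothetical index that always fails, to exercise the rollback path."""

    def put(self, path, statements):
        raise IOError("index unavailable")


def save_resource(blobs, index, path, content, statements):
    """Write the bitstream, then the triples; restore the bitstream on failure."""
    had_old = path in blobs
    old = blobs.get(path)
    blobs[path] = content
    try:
        index.put(path, statements)
    except Exception:
        # Compensate: put the blob store back the way it was, then re-raise.
        if had_old:
            blobs[path] = old
        else:
            del blobs[path]
        raise


blobs = {}
try:
    save_resource(blobs, FailingIndex(), "/rest/a", b"data", [])
except IOError:
    pass
print(blobs)  # {}
```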
<ajs6f>acoburn: So do we need a fine-grained store for the triples vs. a bitstream store, or is there a store that can do both?13:02
<acoburn>ajs6f: I've looked around at that, and they really do different kinds of things
ajs6f: most k-v stores don't handle files of arbitrary size (e.g. >10 GB)13:03
<ajs6f>acoburn: Yeah, I had that sense. So it's two stores and we have to bind them up.
<acoburn>ajs6f: plus, everyone seems to want control over that storage layer, which is why I thought jclouds might work13:04
<ajs6f>acoburn: I'm not against JClouds, just making sure we _know_ why we use it, if we do.13:05
acoburn: I know why we used MODE. Because we were in a hell-bent hurry.
<acoburn>ajs6f: I also looked into HDFS, but that's really fraught for this use case13:06
ajs6f: mostly the high latency
<ajs6f>acoburn: Isn't that using a garbage truck to empty your kitchen compost bin, for most people?
<ajwagner>(As someone that is trying to learn the codebase, I've enjoyed watching this conversation, and am interested in contributing both from a personal and institutional basis on a kernel reimpl w/ a focus on clustering. -- Stepping away for a bit.)
<ajs6f>As someone who contributed a lot to that codebase, I am very embarrassed that anyone is looking at it.
<acoburn>ajs6f: I know that hadoop gets _a lot_ of hype, but more and more tools are being built on top of it13:08
ajs6f: but my point is that hdfs is probably *not* best suited for a random-access blob store13:09
<ajs6f>acoburn: I'm not worried about the hype. I'm worried about foisting the management of a Hadoop cluster on someone who wants a turnkey Islandora solution.13:11
<acoburn>ajs6f: yes, that would be a real problem. OTOH, if there are multiple impls, one could presumably choose based on the needs of their institution13:12
<ajs6f>acoburn: Sure, but we need to get there.13:16
* bseeger joins
<ajs6f>acoburn: And my guess is that just as popular as the heavy metal clusterable solution will be the lightweight single-node solution (which is super easy to manage).13:17
<acoburn>ajs6f: yes, but won't MODE continue to work for single-node use?
<ajs6f>acoburn: Maybe. I'm not going to spend a lot of time worrying about it now.13:18
acoburn: If you wanted to use HDFS+HBase, you would use what? The Cloudera packaging?13:19
<acoburn>ajs6f: I haven't gotten that far in my thinking, but that's certainly a good option13:20
<ajs6f>acoburn: We tried that and it was so hard to dev over (so hard to just spin up a cluster against which to run tests, for example) that we gave it up. But maybe it's better now?13:21
<acoburn>ajs6f: I think they're all hard to deal with at some level13:22
<ajs6f>acoburn: But you want to try?
acoburn: I guess we pick one and go some distance with it.
<acoburn>ajs6f: I haven't ruled out other options: cassandra, riak13:24
<ajs6f>acoburn: We have to pick a square from which to start.
<acoburn>ajs6f: I completely agree, and honestly, I think starting with a hadoop-based system will be painful13:25
<ajs6f>acoburn: Okay, scratch Hadoop.13:26
acoburn: We looked at Cassandra too at that time, and it was no easier to use.
acoburn: But maybe it is better now. It's been several years.
<acoburn>ajs6f: it's much easier to deploy than HBase13:27
<ajs6f>acoburn: https://github.com/apache/incubator-rya
<acoburn>ajs6f: yes, I saw their announcement a few weeks ago
ajs6f: they're using accumulo
ajs6f: (I know virtually nothing about accumulo)13:28
<ajs6f>acoburn: Yeah. I've never fooled with that.
acoburn: It has a cool logo. It reminds me of the graphics from Tron.
acoburn: "Apache Accumulo is based on Google's http://research.google.com/archive/bigtable.html design and is built on top of Apache http://hadoop.apache.org/, http://zookeeper.apache.org/, and http://thrift.apache.org/. "
acoburn: All these things are like the same five ingredients for a salad with different dressings.13:29
<acoburn>ajs6f: yes, that's about all I know of it — it's another bigtable system
ajs6f: I actually think that, given the kinds of data retrieval queries that are needed, a bigtable system would be very well suited to this13:30
<ajs6f>acoburn: I can see that.
<acoburn>ajs6f: that is, well suited to fedora
ajs6f: you can put properties in a column family, server managed triples in its own column family, etc13:31
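A rough sketch of the column-family split described above, using plain dicts rather than any real bigtable client. The family names and the rule "predicates in the Fedora repository namespace are server managed" are assumptions made for illustration, as is the sample data.

```python
from collections import defaultdict

# Assumed marker for server-managed predicates.
FEDORA_NS = "http://fedora.info/definitions/v4/repository#"


def to_rows(triples):
    """Group (subject, predicate, object) triples into one row per subject,
    with user properties and server-managed triples in separate families."""
    rows = defaultdict(lambda: {"properties": {}, "server_managed": {}})
    for s, p, o in triples:
        family = "server_managed" if p.startswith(FEDORA_NS) else "properties"
        rows[s][family].setdefault(p, []).append(o)
    return dict(rows)


triples = [
    ("/rest/a", "http://purl.org/dc/elements/1.1/title", "A title"),
    ("/rest/a", FEDORA_NS + "created", "2015-12-17T12:00:00Z"),
]
rows = to_rows(triples)
print(sorted(rows["/rest/a"]["server_managed"]))
```

With this shape, a read that only needs user properties would touch one family and skip the other.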
<ajs6f>acoburn: Do you want to try just starting an instance of Rya? See how hard it is?
acoburn: You could use both Rya and backing Accumulo store in different ways.
<acoburn>ajs6f: I'm up for that. some "incubator" projects are more mature than others, and I wasn't clear how far along rya is13:32
<ajs6f>acoburn: https://github.com/apache/incubator-rya#user-content-direct-openrdf-api 13:33
acoburn: Sesame.
acoburn: Do we have any candidates for the bitstream store?13:34
What else?
<acoburn>ajs6f: :-)
ajs6f: infinispan?13:35
<ajs6f>acoburn: Sure. Other data grids?
acoburn: Interesting: https://github.com/apache/incubator-rya/blob/master/osgi/camel.rya/src/test/java/mvm/rya/camel/cbsail/CbSailIntegrationTest.java 13:36
<acoburn>ajs6f: that is interesting. seems they've considered OSGi bundling, too13:38
ajs6f: I've got to get some lunch — afk
<ajs6f>acoburn: k
* dwilcox joins13:42
* dwilcox leaves14:01
* dwilcox joins14:02
* bseeger leaves14:07
* dwilcox leaves14:15
* travis-ci joins14:21
ualbertalib/fcrepo4-oaiprovider#36 (oai_dc - 5eb5d54 : piyapongch): The build passed.
Change view : https://github.com/ualbertalib/fcrepo4-oaiprovider/compare/6d78620b5e62...5eb5d5426b66
Build details : https://travis-ci.org/ualbertalib/fcrepo4-oaiprovider/builds/97520464
* travis-ci leaves
* dwilcox joins14:24
* dwilcox leaves14:26
* bseeger joins14:52
<ruebot>awoods: we're on the normal tech call line for the book club call, right?14:59
* mksndz joins
* dwilcox joins15:08
* dwilcox leaves15:09
* dwilcox joins15:16
* dwilcox leaves15:19
* dwilcox joins15:41
* dwilcox leaves15:42
* mksndz leaves15:43
* dwilcox joins15:47
* dwilcox leaves15:48
* dhlamb_ leaves15:59
* bseeger leaves16:25
* bseeger joins16:31
* bseeger leaves16:33
* github-ff joins16:42
[fcrepo-java-client] awoods pushed 2 new commits to master: http://git.io/v05wa
fcrepo-java-client/master e83a4be Aaron Coburn: add OSGi support
fcrepo-java-client/master ee8876c Andrew Woods: Merge pull request #1 from acoburn/fcrepo-1858...
* github-ff leaves
* travis-ci joins16:45
fcrepo4-exts/fcrepo-java-client#2 (master - ee8876c : Andrew Woods): The build passed.
Change view : https://github.com/fcrepo4-exts/fcrepo-java-client/compare/19c07746526b...ee8876c9e1e8
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-java-client/builds/97549458
* travis-ci leaves
* travis-ci joins16:46
fcrepo4-exts/fcrepo-camel#240 (master - cade1ab : Andrew Woods): The build passed.
Change view : https://github.com/fcrepo4-exts/fcrepo-camel/compare/1695a1d51d35...cade1ab98f03
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-camel/builds/97549465
* travis-ci leaves
* ajs6f leaves16:49
* awead_ joins
* bseeger joins16:50
* awead leaves16:51
* bseeger leaves16:59
* ksclarke leaves
* acoburn leaves17:03
* jrgriffiniii leaves
* mohamedar leaves17:04
* ksclarke joins17:14
* peichman leaves
<awoods>escowles_ ping17:18
<escowles_>awoods: pong
<awoods>escowles_ have you had additional success with your performance testing and optimal hierarchy?17:19
<escowles_>i left it running the two-level 64K test, let me check on it17:20
looks like it got as far as making 851968 child nodes, and the server machine is now taking a while to respond...17:22
<awoods>escowles_ Are you saying 851968 child nodes were created across 64,000 containers at the top-level each with 64,000 child containers?17:23
* awead leaves17:24
<escowles_>awoods: no, 13 batches have finished processing, each with 64K children, for a total of 851968 nodes
<awoods>escowles: what does the structure look like?17:25
<escowles_>it's /rest/Small/[batch]/[child]17:26
<awoods>escowles_: and the node-type of the [batch] container is one of the new Modeshape types?17:30
<escowles_>yes, both /rest/Small and /rest/Small/[batch] are mode:unorderedSmallCollections
* mohamedar joins
* jgpawletko leaves17:31
<escowles_>looks like the test is still running, but it's slowed down so much it's still not done with batch 14
<awoods>escowles_: have you considered a test like: /rest/Small/0..64,000 followed by an iteration across the 64,000 containers adding a few new resources to each of those containers per pass?
<escowles_>awoods: i've been doing depth-first (creating a batch and filling it)17:32
awoods: but i could try doing breadth-first and creating all of the containers and then adding one to each container
<awoods>escowles_: I wonder if a breadth first approach would produce a different performance profile.17:33
escowles_: would it be easy to tweak your scripts?
<escowles_>awoods: it's worth trying -- i'll start a new test tomorrow and retry the best-performing configuration so far (4K children)
<awoods>escowles_ thanks17:34
<escowles_>yes, it should be pretty easy to do that -- and i've got an email from Ben asking about them, so I think i'll put them up on github to make it easier to keep track of them
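The two creation orders discussed above can be sketched as path generators. This is not the test script mentioned in the log; it only illustrates the ordering, using the /rest/Small/[batch]/[child] layout from the conversation and hypothetical function names.

```python
def depth_first(batches, children):
    """Create each batch container and fill it before starting the next."""
    for b in range(batches):
        yield f"/rest/Small/{b}"
        for c in range(children):
            yield f"/rest/Small/{b}/{c}"


def breadth_first(batches, children):
    """Create every batch container first, then add one child to each per pass."""
    for b in range(batches):
        yield f"/rest/Small/{b}"
    for c in range(children):
        for b in range(batches):
            yield f"/rest/Small/{b}/{c}"


print(list(depth_first(2, 2)))
# ['/rest/Small/0', '/rest/Small/0/0', '/rest/Small/0/1',
#  '/rest/Small/1', '/rest/Small/1/0', '/rest/Small/1/1']
print(list(breadth_first(2, 2)))
# ['/rest/Small/0', '/rest/Small/1', '/rest/Small/0/0',
#  '/rest/Small/1/0', '/rest/Small/0/1', '/rest/Small/1/1']

# 13 finished batches of 64K (65536) children each:
print(13 * 64 * 1024)  # 851968
```

Both orders produce the same set of paths, so any difference in timing would come from the order of writes rather than from the final structure.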
<awoods>escowles_ armintor? or wallberg?
...or pennell?
<escowles_>pennell: https://wiki.duraspace.org/display/FEDORA4x/Modeshape+Unordered+Collections?focusedCommentId=71992281#comment-71992281 17:35
awoods: i'm off to make dinner -- talk to you later
<awoods>escowles_ will you be able to make Monday's meeting? https://wiki.duraspace.org/display/FF/2015-12-21+Performance+-+Scale+Meeting
<escowles_>awoods: unfortunately not -- i've only got one meeting next week and i'm double-booked :(17:36
<escowles_>awoods: i'll get my scripts uploaded and update the wiki page before then, so people can see what i've been up to17:37
* jgpawletko joins17:57
* whikloj leaves
* jgpawletko leaves
* travis-ci joins17:58
ualbertalib/fcrepo4-oaiprovider#37 (oai_dc - c495b82 : piyapongch): The build passed.
Change view : https://github.com/ualbertalib/fcrepo4-oaiprovider/compare/5eb5d5426b66...c495b827c74b
Build details : https://travis-ci.org/ualbertalib/fcrepo4-oaiprovider/builds/97563629
* travis-ci leaves
* ksclarke leaves
* jgpawletko joins18:00
* mohamedar leaves
* mohamedar joins18:17
<ruebot>awoods: for the RC, when do you want to start? monday morning?18:23
* ksclarke joins18:47
<awoods>ruebot: The RC is basically just a tag and publishing to github.18:53
ruebot: we can do that Monday, sure.18:54
* jgpawletko leaves18:55
<ruebot>awoods: ah, ok. so no sonatype for RC then.
<awoods>ruebot: I don't think so.18:56
<ruebot>awoods: i'm pretty free monday/tuesday/wednesday, so whatever works best for you. and if escowles_ is game, we can just spread it across the three of us and get it done real quick.
<awoods>ruebot: let's plan on touching base on Monday. The main work is determining which projects to release.18:57
<ruebot>awoods: sounds good.
* mohamedar leaves19:06
* mohamedar joins20:43
* awead joins20:57
* mohamedar leaves21:04
* dhlamb joins21:37
* jgpawletko joins22:37
* dhlamb leaves23:14

Generated by Sualtam