Log of the #fcrepo channel on chat.freenode.net

Using timezone: Eastern Standard Time
* peichman leaves00:56
* ruebot_ joins04:45
* ruebot leaves04:50
* cmmills joins05:00
* chadmills leaves05:12
* ruebot_ leaves07:58
* ruebot joins
* dhlamb joins08:34
* jrgriffiniii joins08:50
* ksclarke joins09:01
* acoburn joins09:19
* acoburn leaves09:27
* ajs6f joins09:29
* peichman joins09:57
* acoburn joins10:02
* escowles leaves
* escowles joins10:03
<acoburn>ajs6f: Cassandra _does_ allow you to scale out storage — its architecture follows from dynamodb10:09
* 64MAAK02B joins
<ajs6f>acoburn: Sure. But that's not its purpose, and there are better ways to do it. How does Cassandra help me with a 3TB data file?
<acoburn>ajs6f: it's not good at blob storage, but for "properties", I think it's an excellent fit
ajs6f: that's where jclouds comes in10:10
<ajs6f>acoburn: "it's not good at blob storage" gives the game away.
acoburn: I think we are still looking for different things.
<acoburn>ajs6f: that sort of behavior can then be passed off to swift or riak cs or AWS S3
<ajs6f>acoburn: I don't want to store a giant bag of "properties". I want to store a giant graph.10:11
<acoburn>ajs6f: sure, but with a graph, you've always got locality of reference
ajs6f: are you wanting to run graph operations on the entire graph? SVD and the like?10:12
<ajs6f>acoburn: Yes. Why would that be a bad thing? Einstein told us that it's a basic feature of reality.
acoburn: I want locality of reference as a _positive quality_.
<acoburn>ajs6f: but by using a K-V system, you've got locality of reference baked in10:13
<ajs6f>acoburn: Cassandra is not a K-V system.
<acoburn>ajs6f: how is it not a K-V store?
<ajs6f>acoburn: The keys have special semantics for Cassandra (partition section, column family section, etc.)10:14
<ruebot>awoods: ready whenever you are. i just have two meetings this afternoon, and that's it for me.10:15
<acoburn>ajs6f: yes, it has wide columns, which differentiates it from something like mongodb, but lookups are by partition key (or secondary index)
* jgpawletko joins
<awoods>ruebot: are you available for a quick google-hangout at 11ET? https://plus.google.com/hangouts/_/event/c1glu6soq43r1rr6ou17qtobug8
<ruebot>awoods: totally.10:16
<awoods>ruebot: talk to you then
<ajs6f>acoburn: But partition key is also responsible for sorting, for example. The point is that these different pieces of data structure are not opaque to the system, not at all. Using them in different ways is going to have massively different effects.
sorry s/sorting/distribution/10:17
<acoburn>ajs6f: yes, but wouldn't a fedora resource path be an excellent partition key?10:18
<ajs6f>acoburn: NO. Consider the case in which some storage is more expensive than other storage.10:19
acoburn: I'm not saying that Cassandra wouldn't make a good choice. I'm saying that it would make a good choice for the purposes for which I'm trying to choose, and I think that maybe we are trying to choose for two different sets of purposes.10:20
urg s/it would make a good/it wouldn't make a good/
acoburn: If the intention is never to support services across per-resource graphs (and nothing in the Fedora API asks to be able to do that) then literally just distributing NTriples files across filesystems will be fine.10:22
acoburn: We should just pick a distributed filesystem, an RDF API to use internally, and roll.
<awoods>ajs6f: how would you characterize the differences in your set of purposes in this exercise from acoburn's?10:23
<ajs6f>awoods: I want to be able to offer services over the union graph, and I want to do it using the standard tool for that work: SPARQL, M-R, Spark, etc.10:24
awoods: The Fedora API is the least interesting thing in the mix.
<awoods>acoburn: do you have a different set of purposes?10:25
<acoburn>awoods: I believe my intention here is to make the fedora API fast, with a fault-tolerant, horizontally scalable architecture
<ajs6f>acoburn: What do you mean by "fast"? Lots of requests per second?10:26
<acoburn>ajs6f: high throughput, high concurrency, low latency
<ajs6f>acoburn : So lots of bytes per sec, lots of reqs per sec, low average latency per req?10:27
<acoburn>ajs6f: yes10:28
<ajs6f>acoburn: Okay, got it. Yeah, I'm not interested in any of those things, because I don't think they are the responsibility of the central repository process. But I can certainly see that a lot of other people besides you are going to be interested in those things.10:29
<acoburn>ajs6f: I think the sort of analytical capabilities you describe are a good fit for an external system; something that lives external to the fedora API10:30
* bseeger joins10:31
<ajs6f>acoburn: You could do it that way, sure. Or you could make them the center of the system and take advantage of techniques like caching-in-depth and queuing transactions at the boundaries of the system, which to me is a much more "webby" way of working.10:32
acoburn: Making the authoritative system of record responsible for answering end user requests with celerity and dispatch is not how the web scales.10:33
acoburn: It's mainframey
<awoods>ajs6f: would you put the current modeshape-based f4 impl into the mainframey category?10:35
<ajs6f>acoburn: But if you don't _want_ to do the kinds of union-graph-centric things that I want to do, and/or no one is _asking_ you to do those things, then you definitely don't want to invest in making them possible, cause it will be very expensive. I admit that frankly.
<acoburn>ajs6f: I'm not suggesting that one wouldn't use caching layers; ideally, user (web) requests would never actually touch fedora
<ajs6f>awoods: It's not a question of putting design into categories. It's a question of putting design priorities into categories.10:36
acoburn: I am very seriously tempted to try a design based on a pure distributed filesystem. Just NTriples files with hashing. At very large scale, it might be quite competitive.10:37
<acoburn>ajs6f: I think your union-graph ideas are important, and a perfect fit for hadoop/spark-like systems; as you know, my _only_ hesitation about that is that Hadoop is difficult to deploy
ajs6f: you think the I/O wouldn't be too high?10:38
<ajs6f>acoburn: Horribly, hellishly, Hadoopishly difficult.
<acoburn>ajs6f: I mean I/O latency
<ajs6f>acoburn: that's not Fedora's problem. Seriously, we are not going to solve the general problems of distributed filesystems. Even Fedo Raadmin isn't that smart.10:39
acoburn: I do think that at high enough scale, the latency will be swamped by the problems in other designs.10:40
acoburn: But maybe not. Anyway, you could do locking pretty easily.10:41
<acoburn>ajs6f: distributed locking, my favorite
<ajs6f>acoburn: awoods is getting ready to bake transactions into the public, published API.10:42
acoburn: How do you propose to impl that in a distributed environment without locks?10:43
<acoburn>ajs6f: I will continue to object to having transactions as part of the public API
ajs6f: you don't implement tx in a distributed context10:44
<ajs6f>acoburn: That's the kind of blank-faced obstructionism I expect from ajs6f, not a mature person like you.
<acoburn>ajs6f: ok, fine, you implement it in zookeeper
<ajs6f>Maybe if we go back in time and somehow prevent awoods from ever having been born…
acoburn: Isn't that just letting ZK manage the locks?10:45
<acoburn>ajs6f: and then all the people thinking about contributing to fedora run screaming in the other direction
<awoods>acoburn: you should make your txn objects more vocal... they seem to have been lost in the noise.
<acoburn>awoods: NO TRANSACTIONS; NO TRANSACTIONS
<awoods>acoburn: good to know. thanks.
<acoburn>ajs6f: I'd rather have zk manage distributed locks than trying to write it myself10:46
ajs6f: of course if you get a cross-datacenter partition, your entire system goes down while zk tries to maintain consistency10:47
<ajs6f>acoburn: Hm. Maybe. I think it does have to do with the granularity of locks. I just spent a lot of time with this question as part of Jena's new in-memory dataset structures. You _can_ reason about this stuff, and the most important thing is to understand what you are really locking. But this is all a side note. Neither Cassandra nor Hadoop give much help here either. I'm still willing to claim that a bunch of RDF files on a distributed filesy10:48
acoburn: _IF_ you don't want to operate on the union graph as part of your core services, and are willing to contract that out to asynch follower gear.10:50
<acoburn>ajs6f: your penultimate message was cut off
<ajs6f>acoburn: Hm. Maybe. I think it does have to do with the granularity of locks. I just spent a lot of time with this question as part of Jena's new in-memory dataset structures. You _can_ reason about this stuff, and the most important thing is to understand what you are really locking.10:51
acoburn: But this is all a side note. Neither Cassandra nor Hadoop give much help here either. I'm still willing to claim that a bunch of RDF files on a distributed filesystem is a perfectly viable backend.
acoburn: _IF_ you don't want to operate on the union graph as part of your core services, and are willing to contract that out to asynch follower gear.
<acoburn>ajs6f: using a distributed filesystem sounds like a viable approach (ceph, gluster, etc)
ajs6f: and a simple one10:52
ajs6f: I'm not sure how you'd handle containment, though
ajs6f: writing ldp:contains triples to every resource on the FS?10:53
<ajs6f>acoburn: You mean atomicity?
acoburn: Or just recordation?
<acoburn>ajs6f: both
<ajs6f>acoburn: As long as people are making requests via the Fedora API, atomicity comes from there. If they aren't, it's not Fedora's problem.
acoburn: As far as recording it, who said we can't use sidecare files?10:54
s/sidecare/sidecar/
<acoburn>ajs6f: if you're writing a lot of records to a particular node, you'd need exclusive (W) locks on that file
ajs6f: an append-only file might be useful in that context
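A minimal sketch of the exclusive-lock, append-only idea raised just above, assuming a single log file per container. Nothing in it comes from the channel: the class name `LockedAppend`, its methods, and the example triples are hypothetical. It uses the standard `java.nio` `FileChannel.lock()` call, which takes an exclusive lock on the whole file; such locks are held on behalf of the JVM process and are advisory on many platforms, so this only coordinates cooperating writers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class LockedAppend {

    /** Appends one record as a line while holding an exclusive lock on the log file. */
    static void append(Path log, String record) throws IOException {
        try (FileChannel channel = FileChannel.open(log,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileLock lock = channel.lock()) { // exclusive lock, released when the block exits
            ByteBuffer buffer = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            channel.force(true); // flush data and metadata before the lock is released
        }
    }

    /** Writes two hypothetical containment records to a temp file and reads them back. */
    static List<String> demo() {
        try {
            Path log = Files.createTempFile("containment", ".log");
            append(log, "<http://example.org/parent> <http://www.w3.org/ns/ldp#contains> <http://example.org/parent/child1> .");
            append(log, "<http://example.org/parent> <http://www.w3.org/ns/ldp#contains> <http://example.org/parent/child2> .");
            List<String> lines = Files.readAllLines(log);
            Files.delete(log);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because every write goes to the end of the file, a writer never needs to read or rewrite earlier records while it holds the lock, which keeps the critical section short.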
<ajs6f>acoburn: Yep. That can work quite well. In fact, it might help with versioning.10:55
<acoburn>ajs6f: you could periodically compact it to remove deleted records
<ajs6f>acoburn: https://github.com/afs/rdf-patch
acoburn: Oh, wait, he moved it. Fooey.
acoburn: https://afs.github.io/rdf-patch/10:56
<acoburn>ajs6f: that's interesting10:57
<ajs6f>acoburn: Yes, that could be a policy-driven asynch task.10:58
acoburn: Streamy. The state of a graph at any given time t = the integral over time from 0 to t of the record.
<acoburn>ajs6f: that's a significant (but not unwelcome) departure from the current model of storing resources in a _particular_ state.11:01
ajs6f: it would change the architectural thinking from "data stores" to "stream processors"11:02
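A rough sketch of the "state at time t is the replay of the record up to t" idea, together with the periodic compaction mentioned earlier. This is an illustration under assumptions, not anything proposed in the channel: `TripleLog`, `Change`, and the use of plain strings for triples and of a logical clock for time are all hypothetical, and the add/delete entries only loosely resemble a patch log.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TripleLog {

    /** One log entry: a triple that was added or deleted at a logical time. */
    static final class Change {
        final long time;
        final boolean add;
        final String triple;

        Change(long time, boolean add, String triple) {
            this.time = time;
            this.add = add;
            this.triple = triple;
        }
    }

    // Entries are assumed to be appended in non-decreasing time order.
    private final List<Change> entries = new ArrayList<>();

    void append(long time, boolean add, String triple) {
        entries.add(new Change(time, add, triple));
    }

    int size() {
        return entries.size();
    }

    /** Rebuilds the graph as of time t by replaying every entry with time <= t. */
    Set<String> stateAt(long t) {
        Set<String> graph = new LinkedHashSet<>();
        for (Change c : entries) {
            if (c.time > t) {
                break;
            }
            if (c.add) {
                graph.add(c.triple);
            } else {
                graph.remove(c.triple);
            }
        }
        return graph;
    }

    /** Compaction: keeps one "add" per surviving triple and drops all earlier history. */
    void compact() {
        if (entries.isEmpty()) {
            return;
        }
        long last = entries.get(entries.size() - 1).time;
        Set<String> live = stateAt(last);
        entries.clear();
        for (String triple : live) {
            entries.add(new Change(last, true, triple));
        }
    }

    public static void main(String[] args) {
        TripleLog log = new TripleLog();
        log.append(1, true, "<a> <p> <b> .");
        log.append(2, true, "<a> <p> <c> .");
        log.append(3, false, "<a> <p> <b> .");
        System.out.println(log.stateAt(2));
        System.out.println(log.stateAt(3));
    }
}
```

Note the trade-off in this sketch: compaction as written throws away the ability to ask for states before the compaction point, so a design that wants versioning would have to keep checkpoints or skip compaction for retained versions.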
<ajs6f>acoburn: Yes. The more I think about it, the more I like it. Ironically, I'm pretty sure that an old-school Fedora person like Thorny Staples would like it. He was always going on about how datastreams were really meant to be just that: streams.11:03
<acoburn>ajs6f: are you familiar with storm and/or kafka? they might be useful technologies in this context
<ajs6f>acoburn: It would change the thinking as you say, but only inside the running process. The persistence still has to write down bits.
acoburn: I've glanced at Kafka, but not enough to call myself familiar with it, and not at all at Storm. There are log-centric stream managers?11:04
* peichman leaves
<acoburn>ajs6f: kafka brands itself as a "distributed commit log". it really rethinks the architecture for message processing11:05
ajs6f: storm is often used with kafka to process events reported by the kafka broker11:06
ajs6f: storm lets you define arbitrary network topologies in software for how these message/stream processing tasks are handled
ajs6f: it's sort of like hadoop but for streaming data11:07
* peichman joins
<ajs6f>acoburn: So that's again buying a lot of expensive JVM-centric cluster gear. What are we getting in exchange for that cost?11:10
<acoburn>ajs6f: I think the main question is: to what degree does fedora use the CPU and how does it use I/O11:14
<ajs6f>acoburn: It barely uses the CPU.
acoburn: Well, that may change with the advent of API-X.
<acoburn>ajs6f: if it is principally about storing bytes, then all this stuff with spark, hadoop, etc is not useful, since we don't care about fault tolerant cpu processing11:15
ajs6f: and a distributed filesystem will solve most of it
<ajs6f>acoburn: That's kind of what I am getting at. And the Fedora API is definitely about storing bytes.11:16
<acoburn>ajs6f: API-X may choose to use its own complex gear, but that should be completely separate from fedora11:17
<ajs6f>acoburn: I've been pushing them in that direction.
<acoburn>ajs6f: yes, and I completely agree with the direction you're pushing them in11:18
<ajs6f>acoburn: Good. You get on your hands and knees behind them and I'll push them backwards over you.11:19
<ruebot>awoods: hangouts froze on me. one sec.11:23
<ajs6f>acoburn: So are you actually interested in trying a "bunch of files" approach?11:24
<acoburn>ajs6f: I'm not opposed to it; the main thing I'd want to think about is how well such an architecture would work when run in the cloud11:25
<ajs6f>acoburn: It's hard to say, because of low visibility.11:27
<acoburn>ajs6f: seems like this is a new service in AWS: https://aws.amazon.com/efs/11:31
ajs6f: I haven't used it, but it looks like a good fit for those people considering cloud deployments
<ajs6f>acoburn: Looks like NFS where you pay per bit, no?
<acoburn>ajs6f: yes
<ajs6f>acoburn: Okay by me. Working in a large bureaucracy as I do, my issues with this kind of service are almost never "what does it provide?" but "how can I pay for it?"11:32
<acoburn>ajs6f: of course, but there will be plenty of people who will be expecting fedora to run in the cloud11:34
<ajs6f>acoburn: Of whom perhaps thirty percent will have any real sense of what that phrase means.
acoburn: It would not be hard to extend this:11:35
https://afs.github.io/rdf-thrift/rdf-binary-thrift.html
to record change.
<acoburn>ajs6f: thrift++11:36
<ajs6f>acoburn: If we can crush the requirements down to "the POSIX filesystem API" we will be doing really well to be able to put Fedora anywhere you want. "The Cloud", your toaster's embedded microchip, whatever.11:37
<acoburn>ajs6f: cool, that also lessens the dependency on JVM-based infrastructure11:40
* bseeger leaves
<ajs6f>acoburn: Yes, and that is important if we are serious about standards vs. impl.11:43
<awoods>ruebot: looks like you froze again... we will catch up later.11:46
<ajs6f>acoburn: Do you want to try doing some design for this "files on a filesystem" thing?
acoburn: I feel like at some point we have to go in some direction.
<ruebot>awoods: sounds good. gotta use my laptop instead of desktop when i'm in the office for skype/hangout and the wifi is spotty.11:47
<acoburn>ajs6f: yes, that sounds good11:48
<ajs6f>acoburn: So we have the assumption of a more-or-less POSIX filesystem, eh?11:49
acoburn: And we are implementing fcrepo-kernel-api?
<acoburn>ajs6f: for both the properties _and_ the binaries?
ajs6f: yes, implementing fcrepo-kernel-api11:50
<ajs6f>acoburn: What, for both? The filesystem, for both?
<acoburn>ajs6f: I'd prefer to use jclouds for binaries — it gives people more flexibility
<ajs6f>acoburn: Mm. Hm. Can JClouds present block storage?11:51
acoburn: Meaning, is there apparatus to make JClouds look like a filesystem?
<acoburn>ajs6f: I know you can tell jclouds to use a filesystem11:52
<ajs6f>acoburn: Yeah, but I mean the other way around.
<acoburn>ajs6f: you _really_ don't want to do that11:53
<ajs6f>acoburn: Because of the latency? How is it going to be any much worse than using JClouds directly?
<acoburn>ajs6f: e.g. there are systems that make S3 look like a filesystem, and they are really problematic — these storage systems have different characteristics than filesystems11:54
ajs6f: mostly because there's no concept of a directory with these blob storage systems11:55
<ajs6f>acoburn: Why would we use JClouds for the binaries but put the triples on a local filesystem? Why wouldn't we put the triples in "the cloud" too? Or neither? I know the latency is there. That's why you have indexes.
acoburn: I have used S3-hiding filesystems without any problem, but more importantly ^^^. Why are we splitting the storage?
<awoods>all: on a separate note, we will be putting out a release candidate today/tomorrow. That will require all new development to go into a "dev" branch instead of "master".11:56
<ajs6f>awoods: Why not take off a branch for the release?
awoods: Or a "tag", if that's what I'm thinking of.11:57
<awoods>ajs6f: yes, we will do that as well.
<acoburn>awoods: what ajs6f said ^^^^
<ajs6f>awoods: So why wouldn't people be able to continue to work off master?
<awoods>ajs6f: I believe the maven release machinery merges back into master...11:58
<ajs6f>awoods: But that merge is just version numbers, right?
<awoods>ajs6f: as well as bugfixes that come up in testing the RC
<ajs6f>awoods: Which should be going into master directly anyway, no?11:59
awoods: Merge into release branch/tag, merge into master.
awoods: Isn't this why people do that Gitflow thing with separate master and dev branches all the time?12:00
<ruebot>awoods: https://www.dropbox.com/s/xu0gsldl6q5ccdz/fcrepo-release-canidate.txt -- some notes i took... if that is helpful
<ajs6f>acoburn: Are you against using JClouds for everything?12:02
<acoburn>ajs6f: not against it, but I'd be concerned that the latency would be too high12:03
ajs6f: I see binaries as completely different beasts from containers with properties12:04
<ajs6f>acoburn: Again, that's what indexes are for. Isn't this the same conversation we had to have years ago (I don't think you were involved then) about human-readable persistence vs. machine-useful persistence?
acoburn: Well, maybe not quite the same, but related.
<awoods>ajs6f: you are probably right. The maven release machinery bases its release on the current branch (RC in this case). It should be fine continuing to develop on master with the RC branch for the release, period. We will give it a shot this time.12:05
<ajs6f>awoods: You're taking my advice about Git? FIRST. BIG. MISTAKE.
<acoburn>ajs6f: so w/r/t indexes, are you thinking fedora would manage those indexes, or would they be external?12:06
<ajs6f>acoburn: I think most Fedora users see bits. The binaries sans descriptions are usually unintelligible, the description without binaries are just a shell of content.
acoburn: Like the way that the Roadrunner would run away from the Coyote so fast that his/her outline would hang in the air?12:07
acoburn: That's what the metadata without the bitstreams is.
<awoods>acoburn: These are the modules that will be in the 4.5.0 release. Sound right?
fcrepo4
fcrepo-module-auth-rbacl
fcrepo-module-auth-xacml
fcrepo-module-auth-webac
fcrepo-mint
fcrepo-transform
fcrepo-audit
<ajs6f>acoburn: I think we have a good solution to point to already— fcrepo-camel-conquers-the-world.
<awoods>fcrepo-webapp-plus
fcrepo4-vagrant
<acoburn>ajs6f: ok, just wanted to make sure we were talking about the same thing12:08
awoods: yes, that looks correct
<ajs6f>acoburn: Then Thunderbirds Are Go, and they are flying through JClouds.
<acoburn>ajs6f: the problem with jclouds is that the objects it operates on aren't real files — e.g. you can't append to them12:10
ajs6f: you basically have GET, PUT and DELETE operations
<ajs6f>acoburn: They are blobs? That's annoying, but not insurmountable. You can add new blobs, right?
acoburn: Your IDs are whatever you want, right?
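One way to read the "you can add new blobs, and IDs are whatever you want" point: appends can be emulated on a store that only supports whole-object writes by writing each record as a new blob whose key sorts after the earlier ones. The sketch below is only an illustration of that reading. It does not use the jclouds API; an in-memory `TreeMap` stands in for the blob store, and `BlobAppendLog`, its key layout, and the single in-process counter are all hypothetical (a real deployment with several writers would need some other way to order records).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BlobAppendLog {

    // Stand-in for a blob store: whole-object put/get keyed by name, listable in key order.
    private final TreeMap<String, byte[]> blobs = new TreeMap<>();

    // Single-writer assumption: one counter orders every record.
    private long nextSequence = 0;

    /** "Appends" by storing the record as a new blob under a zero-padded, increasing key. */
    String append(String resource, String record) {
        String key = String.format("%s/log/%020d", resource, nextSequence++);
        blobs.put(key, record.getBytes(StandardCharsets.UTF_8));
        return key;
    }

    /** Reads a resource's log by walking the keys that share its prefix, in sorted order. */
    List<String> read(String resource) {
        String prefix = resource + "/log/";
        List<String> records = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : blobs.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // keys with the prefix are contiguous, so the first miss ends the scan
            }
            records.add(new String(entry.getValue(), StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        BlobAppendLog log = new BlobAppendLog();
        log.append("/a", "first change to /a");
        log.append("/b", "first change to /b");
        log.append("/a", "second change to /a");
        System.out.println(log.read("/a"));
    }
}
```

The cost of this layout is one object per record, so reads depend on how cheaply the store can list keys by prefix.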
<ruebot>ajs6f, acoburn: one thing y'all have me thinking about is how we'll handle objects over 5GB w/r/t blobstorage. for example swift's object limit is 5GB, and anything over that is segmented.
<acoburn>ajs6f: yes, but if the choice is between using jclouds for everything or nothing, I'm inclined to not use it at all12:11
<ajs6f>ruebot: Everything has a limit. For example, I can listen to about fifteen seconds of talk about professional sports, after which I start to segment my attention.
<ruebot>ajs6f++
<ajs6f>acoburn: OOOOkay. So we start with the filesystem for everything and learn from that?12:12
<ruebot>ajs6f, acoburn: ...or i just threw out a red herring, and i should continue to sit here and watch y'all talk :-)
<ajs6f>acoburn: Actually, the other thing that is nice about the filesystem for everything is that it is a very natural reference implementation.
<acoburn>ruebot: same with S3 — you've got to segment the file into a bunch of pieces
ajs6f: yes, it does make the implementation really simple12:13
<ajs6f>acoburn: And understandable.
<ruebot>awoods: these were the two dependency matrices i was talking about -- http://camel.apache.org/karaf.html && http://karaf.apache.org/index/documentation/karaf-dependencies/karaf-deps-3.0.x.html
<ajs6f>acoburn: Okay, I'm going to the pool. bbl
* ajs6f leaves
<ruebot>acoburn: that makes sense, since swift is based on/mimics the s3 api12:14
<acoburn>awoods: w/r/t the release, we'll also want to have a corresponding release of fcrepo-karaf, but that can come later12:15
awoods: I am not sure how it stands right now with the various snapshot artifacts12:16
<awoods>acoburn: is fcrepo-karaf in shape for release?
<acoburn>awoods: probably, I'd just need to test it
awoods: someday I need to actually add pax-exam to that project12:17
<awoods>acoburn: that would be nice.
acoburn: I am not sure if whikloj is still working that issue12:18
<ruebot>awoods, acoburn: whikloj is gone for the holidays.12:19
<acoburn>awoods: maybe I'll find some time to work on that over the next few weeks
<ruebot>https://wiki.duraspace.org/display/FF/Component+Compatibility+Matrix
^^^ let me know if that makes sense for layout
then we just add version numbers that work with or are preferred for the fcrepo core release in the first row12:20
...and i can add historic versions that are pre 4.4.0 if y'all want too12:21
<awoods>ruebot: I wonder if we could collapse the modules that will always be released along with fcrepo4?
ruebot: ...which may be limited to just fcrepo-webapp-plus12:23
<ruebot>awoods: that makes sense. those could be listed in some text above it. fcrepo components include... something like that?12:24
<awoods>ruebot: for example, fcrepo-module-auth-xacml will probably not get a new release if its code does not change.12:25
ruebot: your matrix looks good (if we collapse fcrepo4 and fcrepo-webapp-plus). acoburn: in the future, do you see fcrepo-karaf being in lock-step with fcrepo4 releases?12:26
<acoburn>awoods: yes, that is the point of that project12:27
<awoods>acoburn: good. Do you see any other projects that would also be in lock-step with fcrepo4?12:28
<acoburn>awoods: what about oaiprovider and swordserver?12:30
<awoods>acoburn: those are "labs"
<acoburn>awoods: yes
<awoods>acoburn: so I would think it is up to whoever is taking responsibility for those projects to do a release... which may not be on the same schedule as fcrepo4.12:31
<acoburn>awoods: that makes sense to me
<ruebot>awoods: oh, so if fcrepo-module-auth-xacml didn't get a new release, its version number would just carry over to the next column. ...if you refresh the page. you'll see what i mean.12:34
<awoods>ruebot: looks correct13:05
* bseeger joins13:06
* bseeger leaves
<ruebot>awoods: cool. i'll start populating it.13:08
<awoods>ruebot++13:09
<acoburn>awoods: what do you think about a release for fcrepo-java-client?13:10
awoods: I could issue one today
awoods: I might need a little assistance, as I haven't run a formal release before, but it should be pretty easy
<awoods>acoburn: did you get the requisite number of +1's?13:11
<acoburn>awoods: I never heard back from dhlamb or mikeatuva
<awoods>acoburn: they probably do not mind
acoburn: I am around if you want to do a release
<acoburn>awoods: ok, thanks
<ruebot>dhlamb: you cool with acoburn doing a release for fcrepo-java-client?13:18
awoods: did you trash that spreadsheet? i just got a popup on the doc13:19
<awoods>ruebot: yes, I think using tabs is cleaner: https://docs.google.com/spreadsheets/d/1I_zTMxh2l2rf2wpafoTwhSTR5GZuEoaTcZmTKCI3xT4/edit?usp=sharing13:20
<ruebot>awoods++ agreed. that's how we do it in islandora land
<acoburn>awoods/ruebot: I'm at the point of no return with fcrepo-java-client. should I wait for dhlamb or just proceed?13:21
<awoods>acoburn: I do not see why dhlamb would have an issue here.
* dhlamb pokes his head in
<ruebot>acoburn: i pinged him in skype. i say proceed :-)
<dhlamb>go for it
<ruebot>ah, there he is.
<acoburn>dhlamb: thanks!13:22
<ruebot>awoods, acoburn: started adding version numbers to the matrix. we can tweak/update as needed.
awoods, acoburn: i can start adding 4.3.0 and prior too if you'd like.13:23
<acoburn>awoods: looks like I don't have write access to fcrepo4-exts/fcrepo-java-client (!)
<awoods>acoburn: try now13:26
<acoburn>awoods++
* github-ff joins
[fcrepo-java-client] acoburn pushed 2 new commits to master: http://git.io/vEYLb
fcrepo-java-client/master b11b4a9 Aaron Coburn: [maven-release-plugin] prepare release fcrepo-java-client-0.1.1
fcrepo-java-client/master 0093d6b Aaron Coburn: [maven-release-plugin] prepare for next development iteration
* github-ff leaves
* github-ff joins13:27
[fcrepo-java-client] acoburn tagged fcrepo-java-client-0.1.1 at cfd5da0: http://git.io/vEYtv
* github-ff leaves
* travis-ci joins13:29
fcrepo4-exts/fcrepo-java-client#3 (master - 0093d6b : Aaron Coburn): The build passed.
Change view : https://github.com/fcrepo4-exts/fcrepo-java-client/compare/ee8876c9e1e8...0093d6b8b889
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-java-client/builds/98170922
* travis-ci leaves
<ruebot>awoods, acoburn: now that you've done that, should i add 0.1.1 to the 4.5.0 column for fcrepo-java-client?
* travis-ci joins
fcrepo4-exts/fcrepo-java-client#4 (fcrepo-java-client-0.1.1 - b11b4a9 : Aaron Coburn): The build passed.
Change view : https://github.com/fcrepo4-exts/fcrepo-java-client/compare/fcrepo-java-client-0.1.1
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-java-client/builds/98171052
* travis-ci leaves
<ruebot>...just to test the matrix
<acoburn>ruebot: should be for both 4.4.0 and 4.5.013:30
* ajs6f joins
<ruebot>acoburn: updated. would we add the 0.1.0 release in there at all? or do we remain silent on that one?13:31
* jrgriffiniii leaves
* jrgriffiniii joins13:32
* bseeger joins13:47
<acoburn>awoods: seems like everything's all set w/r/t the fcrepo-java-client release13:57
awoods: overall it went very smoothly — I had to update my m2/settings.xml file a bit13:58
awoods: does it seem like there's anything else to do?
awoods: after the artifact makes its way to maven central, I'll issue a PR to fcrepo-camel to use the released version14:00
<ajs6f>acoburn: Just sit back and wait for the money to roll in.
* jrgriffiniii leaves14:05
* jrgriffiniii joins14:09
<ajs6f>acoburn: Are we in agreement to try the filesystem-based approach for now, with the idea that we can explore clustering by clustering the filesystem itself, and that in any case, the results should be either a useful reference implementation that will be much less vulnerable to the kind of disruption we are experiencing from MODE 5, or at least, should inform our specification activity?14:14
<acoburn>ajs6f: yes we are. I would note that at a certain scale (and provided a certain level of concurrency) the impl will likely be slow — but this will give us a simple impl that scales across machines in a sane manner14:18
<awoods>acoburn: do you want to publish javadocs on gh-pages?
<ajs6f>acoburn: Yes. This isn't about raw performance as much as it is about _predictable_ performance that will be good enough for many people.
* bseeger leaves14:19
<ajs6f>acoburn: We repeatedly found that it is hard to predict the behavior of Fedora/MODE in clustering.
<acoburn>ajs6f: cool, we're on the same page then
<ajs6f>acoburn: Then we can roar ahead. Roar.
acoburn: Do you want to try out the "streamy/loggy" style of RDF persistence? If we add checkpoints of some moderate sophistication, we have versioning there.14:20
<acoburn>awoods: what do you have to do to publish to gh-pages (I already pushed to that branch)
ajs6f: yes, I think that's a very good way to do it14:21
<ajs6f>acoburn: Okay, and for mapping URLs to paths. Do we just take the path segments as given (escaped, maybe) and tell people to respect the limitations of their own filesystem (<255 files/dir, etc.) or do we hash?14:22
<acoburn>awoods: it looks like that's something you need to do — I don't have access to the "settings" link in that repo
ajs6f: I'm inclined to hash — people have different notions of how many child nodes a given container should support14:24
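A small sketch of what hashing a resource path into a directory could look like, so that no single directory has to hold every child of a large container. The scheme is an assumption made for illustration (SHA-1, three two-character levels, then the full digest); the channel did not settle on any of these details, and `HashedPath` and `toDirectory` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedPath {

    /** Maps a repository path to a relative directory of the form "ab/cd/ef/<full hex digest>". */
    static String toDirectory(String resourcePath) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(resourcePath.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            String h = hex.toString();
            // Three fan-out levels of two hex characters each, then the digest itself.
            return h.substring(0, 2) + "/" + h.substring(2, 4) + "/" + h.substring(4, 6) + "/" + h;
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the digests every Java platform is required to provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toDirectory("/collection/item1"));
        System.out.println(toDirectory("/collection/item2"));
    }
}
```

A hashed layout like this spreads siblings across the tree, but it also discards the parent-child structure of the original path, so containment has to be recorded somewhere else.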
<awoods>acoburn: you pushed the javadocs... I can link them into the main docs.fcrepo.org
<acoburn>awoods: but shouldn't they be here: http://fcrepo4-exts.github.io/fcrepo-java-client/14:25
<awoods>acoburn: I am not sure offhand why they are not showing up.
<ajs6f>acoburn: Okay, but then we have to impl parent-child containment, right? I have no huge problem with that. We can stick it in a sidecar of "administrative" RDF.
<awoods>acoburn: http://fcrepo4-exts.github.io/fcrepo-java-client/site/0.1.1/fcrepo-java-client/index.html14:30
<ajs6f>acoburn: How about hash(path segments) is always a dir. It contains several RDF sources (in the style we have discussed) and optionally either a bitstream file or a dir of bitstream files (different versions). I think that covers the cases, right?
awoods: You have any object to doing this Fedora-on-the-Filesystem thing in fcrepo4-labs?
s/object/objection14:31
<awoods>acoburn: it looks like the markdown plugin is not configured correctly, to include the javadocs in the main index.html page. However, the javadocs are accessible here:
http://fcrepo4-exts.github.io/fcrepo-java-client/site/0.1.1/fcrepo-java-client/apidocs/index.html
<acoburn>awoods: oh I see, there's no index.html file in the gh-pages branch
<awoods>ajs6f: no objection at all. It seems like barmintor may have some ideas as well.
<ajs6f>awoods: I thought barmintor was working on Akubra + a triplestore?14:32
<awoods>ajs6f: really?
<ajs6f>awoods: That's what he said to me, the last time we spoke about it. Maybe he changed his mind.
awoods: Akubra really isn't that different from the JClouds API.14:33
<awoods>ajs6f: besides getting potentially better performance, scalability and management, it seems like a primary reason for an alt-impl is to prove out the REST-API.14:34
ajs6f: any alt-impl that helps us on that front is, well, helpful.
<ajs6f>awoods: "prove out" == "find places where it don't work right"
awoods: Or turns out to rely on MODE in secret.14:35
<awoods>ajs6f: yes. we want to know where our Mode impl has driven the API.
<ajs6f>awoods: It has driven it into the ground, if you want to cluster.
awoods: See what I did there?!14:36
<awoods>ajs6f: yea, that was pretty neat.
<ajs6f>awoods: Hurts, don't it.
https://github.com/fcrepo4/fcrepo-kernel-filesystem14:39
Shit, wrong account.
Okay, https://github.com/fcrepo4-labs/fcrepo-kernel-filesystem14:40
acoburn: Scala?14:46
<acoburn>ajs6f: you bet14:47
ajs6f: unless you want to check out clojure
<ajs6f>acoburn: Hm….
<acoburn>ajs6f: clojure would give us agents and refs for handling concurrency14:48
ajs6f: though scala has some similar actor-based primitives14:49
<ajs6f>acoburn: Yes, but based on the current HTTP layer, do we need that? We don't have much possible sharing going on between requests, at all. It's basically one request, one thread.14:50
acoburn: Unless you are thinking about multiple threads between the Kernel API and the filesystem?
* bseeger joins14:53
<acoburn>ajs6f: that's a good point, keeping this as one request == one thread is much simpler
<ajs6f>acoburn: This impl, for my money, is now _all about_ simplicity.14:54
<acoburn>ajs6f: so scala or clojure or java?14:55
<ajs6f>acoburn: I am inclined to Scala, because for those who know no language other than Java, Scala is going to be less scary than Clojure.
<acoburn>ajs6f: sounds good to me14:56
<awoods>ajs6f: are you expecting some community contribution?
<acoburn>awoods: bseeger knows scala14:57
<ajs6f>awoods: Not immediately. But on the off chance of real success, this work could evolve into the reference impl.
awoods: Also bseeger just volunteered to work on it
<acoburn>awoods: and dhlamb also knows scala14:58
<ruebot>...i kinda know it. i use it a bit on my research grant project with apache spark and warcbase.
* peichman leaves14:59
<ajs6f>awoods: And ruebot, also, just volunteered.
<acoburn>ruebot++
* ruebot regrets speaking up
:-D
<ajs6f>ruebot: are you not used to this by now? Welcome to Fedora.
<ruebot>ajs6f++
<bseeger>++ to scala, but what am I volunteered for?15:00
:)
<ajs6f>bseeger: Reimplementing Fedora. No, seriously.
<ruebot>http://ruebot.net/fda-pink.png -- end product of some scala, apache spark, and warcbase work... into gephi.
<awoods>https://wiki.duraspace.org/display/FF/2015-12-21+Performance+-+Scale+Meeting
<acoburn>bseeger: ajs6f and I will cheer from the sidelines15:01
<ajs6f>ruebot: That is a very big graph.
<bseeger>oh dear…
* ksclarke leaves
<ajwagner>(Coming out of vacation mode to call in now.)
<ajs6f>acoburn: Maven or SBT? Maybe Maven for the same reason as Scala over Clojure?15:02
* peichman joins15:03
<acoburn>ajs6f: yes (SBT is nice, but consistency and familiarity are probably more important)15:04
<ajs6f>acoburn: Right.
acoburn: Banana RDF with the Jena impl?15:06
<bseeger>bseeger's willing to help when she gets a fedora branded light saber15:08
<acoburn>ajs6f: I'd defer to you on that; I've _heard of_ Banana RDF, but I don't know anything about it
<ajs6f>acoburn: It's solid, it's under live development, and it does everything we need (including Sparql). I don't know of anything else that has all three characteristics.15:10
<acoburn>ajs6f: given its emphasis on immutability, it looks like a very, very nice library15:11
ajs6f: so yes, +1 on banana rdf
<ajs6f>acoburn: I'm happy to roll forward with it, especially because it can be used with Jena implementations of the types. That gives us a straightforward way to work with kernel-api.15:12
acoburn: What did you think of what I wrote ^^^ about directory structure?
<acoburn>ajs6f: yes, that makes a lot of sense to me15:13
<jrgriffiniii>Please forgive me, but where is the git repository for these original JMeter tests?15:14
<ajs6f>acoburn: Okay. In that case, I think we're all wrapped up here. ruebot and bseeger, ready to go ahead and impl this design? I'll be happy to review your PR. Or maybe we'll let dhlamb do that. He hasn't been pulling his weight on this effort.
<jrgriffiniii>(Nevermind, found it, sorry)15:15
<ruebot>jrgriffiniii: https://github.com/fcrepo4-archive/ff-jmeter-madness
<acoburn>ajs6f: so are URL paths going to be hashed? seems that not hashing them would be the simplest15:17
<ajs6f>acoburn: That's what I thought, but you argue above for hashing!
<acoburn>ajs6f: I know, and I think that hashing them is better, but it adds a certain complexity to the impl15:18
<ajs6f>acoburn: I'm not too worried about that level of complexity. And I am worried that without hashing, people will instantly blow out their directory systems.
acoburn: On the other hand, impling containment over filesystem containment _is_ really attractive.15:19
<acoburn>ajs6f: containment can be handled with a little bit of cleverness and a sorted file of path names15:20
<ajs6f>acoburn: Yeah, although that file becomes a bottleneck.15:21
<acoburn>ajs6f: that file could also easily be split up into a series of blocks
<ajs6f>acoburn: I would rather not use it. I would rather keep triples (pointers) in the individual resources.
acoburn: We don't need to query an index. Walking out the "tree" from whatever resource is actually the subject of the request should do fine.15:22
<acoburn>ajs6f: ok, let's go for that strategy — containment triples stored with the resource
<ajs6f>acoburn: locality++
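A minimal sketch of the strategy agreed above (containment pointers kept with each resource, no central index). It assumes a hypothetical in-memory dict standing in for the on-disk resources; `walk` and `contains` are illustrative names, not part of any Fedora API.

```python
from typing import Dict, Iterator, List


def walk(contains: Dict[str, List[str]], root: str) -> Iterator[str]:
    """Yield root and every descendant by following containment pointers.

    `contains` maps a resource path to the child paths recorded with that
    resource (a stand-in for containment triples stored alongside it).
    """
    stack = [root]
    while stack:
        path = stack.pop()
        yield path
        # Push children in reverse so they are visited in stored order.
        stack.extend(reversed(contains.get(path, [])))
```

Only the subtree under the requested resource is touched, so no shared index file has to be read or locked.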
acoburn: So hashing, then?
<acoburn>ajs6f: yes
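One way the hashing decision could look, as a sketch only: the choice of SHA-1, three levels, and two hex characters per level are assumptions for illustration, and `hashed_location` is a hypothetical helper, not something from the repository.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_location(url_path: str, depth: int = 3, width: int = 2) -> PurePosixPath:
    """Map a repository URL path to a fanned-out relative directory.

    The leading hex characters of the digest become nested directories,
    so no single directory ends up holding every resource.
    """
    digest = hashlib.sha1(url_path.encode("utf-8")).hexdigest()
    segments = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return PurePosixPath(*segments, digest)
```

The cost is the one noted in the discussion: a parent and its children no longer share a directory, so containment has to come from the stored triples rather than from the filesystem.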
<ajs6f>acoburn: We have no atomic swap operation. We will have to use locks on the filesystem.15:23
lockfiles
<acoburn>ajs6f: we'll need to be careful about deletions — perhaps a separate thread, as we'd need to set a tombstone and then follow containment pointers to clean up child resources15:24
ajs6f: lockfiles seem like the appropriate mechanism15:25
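A sketch of the lockfile idea, assuming a local POSIX-style filesystem where exclusive creation is reliable; the `lockfile` helper and the lock path are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def lockfile(path: str):
    """Hold an exclusive lock for the duration of the with-block.

    O_CREAT | O_EXCL makes the open fail with FileExistsError if another
    holder has already created the file.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the owner so a stale lock can be diagnosed by hand.
        os.write(fd, str(os.getpid()).encode("ascii"))
        yield
    finally:
        os.close(fd)
        os.unlink(path)
```

A caller that hits `FileExistsError` would have to retry or give up; a process that dies while holding the lock leaves the file behind, which this sketch does not handle.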
<ajs6f>acoburn: Unless we get funky and clean up the children _first_.
acoburn: But there's still locking.
acoburn: Maybe the locks are on sections of the hierarchy.15:26
<acoburn>ajs6f: cleaning up the children first actually makes the most sense to me — then you don't end up with orphaned nodes if the process shuts down mid-deletion15:27
<ajs6f>acoburn: What about a swap approach? Not links (which may or may not be supported) but using a move primitive?15:29
acoburn: Not rm, mv - //trash, then GC later.15:30
* jgpawletko leaves15:31
<acoburn>ajs6f: I like the swap approach, but say there are 100K descendent nodes (not all of which are direct children), if you mv the deleted node, what are the steps to deleting the descendants?15:33
<ajwagner>https://wiki.duraspace.org/display/FF/2016-01-11+Performance+-+Scale+Meeting15:35
^ Took longer as I had to make a new 2016 parent ;)
<ajs6f>acoburn: You mv, then you follow the triples on which we decided above. mv takes care of direct children, the rest is the others. It's not even close to atomic, but it's better than rm-in-place, I think.15:41
acoburn: You can mv and then still work out of the thing you mv'd, because the thread knows where it put it.
<ruebot>awoods: do i need a LICENSE for these JMeter configs? ...should I just throw an Apache License in the git repo for them?
<ajs6f>ruebot: Depends on how fast they run. In Virginia, you need a license for anything that can go more than 25 MPH.15:42
<ruebot>ajs6f++
* ajs6f is vomiting at the sound of his own joke.
<acoburn>ajs6f: wouldn't direct children be hashed to different locations? as opposed to living in the directory of the parent?15:43
* ruebot imagines mjgiarlo riding around on his moped holding out a voltmeter
<ajs6f>acoburn: Oh, we went with hashing. I have completely lost track now. Yes, in that case mv buys nothing. Forget about it.15:44
acoburn: Except, actually… it might buy you rollback.
<acoburn>ajs6f: yes, and rollback may be important, especially if the rm doesn't completely succeed15:45
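The mv-then-GC idea with rollback might be sketched as follows. It assumes each resource lives in its own directory and that the trash directory sits on the same filesystem (a rename is only atomic within one filesystem); `trash`, `restore`, and `collect_garbage` are hypothetical names.

```python
import os
import shutil
import uuid


def trash(resource_dir: str, trash_root: str) -> str:
    """Move a resource directory into the trash and return its new location."""
    os.makedirs(trash_root, exist_ok=True)
    dest = os.path.join(trash_root, uuid.uuid4().hex)
    os.rename(resource_dir, dest)
    return dest


def restore(trashed: str, resource_dir: str) -> None:
    """Roll back a delete by moving the directory to its original place."""
    os.rename(trashed, resource_dir)


def collect_garbage(trash_root: str) -> None:
    """Permanently remove everything that is still in the trash."""
    for name in os.listdir(trash_root):
        shutil.rmtree(os.path.join(trash_root, name))
```

Because the deleting thread keeps the returned location, it can still read the containment pointers out of the trashed copy while it works through the descendants.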
<ajs6f>acoburn: Let's just agree for now that operations that go directly against the filesystem should be abstracted out into a set of primitives. We can then demand that the supertype of those operations support some useful stuff like commit/abort.15:46
acoburn: Does that sound reasonable?
<acoburn>ajs6f: yes, that seems reasonable
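A sketch of what a supertype with commit/abort could look like. The class names are hypothetical, and the example leans on the filesystem providing an atomic rename, which the discussion above does not take for granted.

```python
import os
from abc import ABC, abstractmethod


class FsOperation(ABC):
    """Supertype for primitives that go directly against the filesystem."""

    @abstractmethod
    def commit(self) -> None:
        """Make the staged change visible."""

    @abstractmethod
    def abort(self) -> None:
        """Discard the staged change."""


class WriteFile(FsOperation):
    """Stage bytes next to the target; commit renames them into place."""

    def __init__(self, target: str, data: bytes) -> None:
        self.target = target
        self.staged = target + ".staged"
        with open(self.staged, "wb") as handle:
            handle.write(data)

    def commit(self) -> None:
        # os.replace overwrites the target in a single rename call.
        os.replace(self.staged, self.target)

    def abort(self) -> None:
        if os.path.exists(self.staged):
            os.remove(self.staged)
```

Higher layers would then only see `commit` and `abort`, whatever the primitive does underneath.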
<ajs6f>acoburn: Yeah, it all sounds so plausible, at first.15:47
acoburn: Do you want to spend time with a design page, or just start throwing some code up there and playing around?15:50
acoburn: This is all speculative, so I'm not sure how much time I want to spend documenting ether and vapor.
<acoburn>ajs6f: the moment we put anything on the wiki, it will be out of date or just plain wrong; I'd start by just throwing some code together15:51
<ajs6f>acoburn++
* bseeger leaves
<awoods>ruebot: dropping in a license is a safe idea: https://github.com/fcrepo4/fcrepo4/blob/master/LICENSE.txt15:56
* dhlamb leaves15:57
* jgpawletko joins15:59
* github-ff joins
[fcrepo-camel] acoburn opened pull request #101: update dependency versions (master...fcrepo-1861) http://git.io/vEOky
* github-ff leaves
* peichman leaves
* jgpawletko leaves16:05
* github-ff joins
[fcrepo-camel] awoods closed pull request #101: update dependency versions (master...fcrepo-1861) http://git.io/vEOky
* github-ff leaves
* peichman joins16:06
* ajs6f leaves
* bseeger joins
* travis-ci joins16:08
fcrepo4-exts/fcrepo-camel#242 (master - abc0ccd : Andrew Woods): The build passed.
Change view : https://github.com/fcrepo4-exts/fcrepo-camel/compare/cade1ab98f03...abc0ccd9e18f
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-camel/builds/98199602
* travis-ci leaves
* bseeger leaves
<acoburn>awoods: ping16:19
* bseeger joins
<awoods>acoburn16:20
<acoburn>awoods: for fcrepo-java-client, I've added a ./src/site directory, but how do I test that it generates a correct site?16:21
awoods: mvn site-deploy?
awoods: or just mvn site?16:22
awoods: nm, looks like it's the latter one
<awoods>acoburn: ;)
<acoburn>awoods: and for the index.html file that's manually maintained, no?16:23
awoods: meaning, I'd need to just add that file to the gh-pages branch?
<awoods>acoburn: unless you want to link in the fcrepo-java-client docs into docs.fcrepo.org16:24
acoburn: that is what the other modules are doing.
acoburn: which means updating index.html in gh-pages of fcrepo416:25
<acoburn>awoods: that works for me16:26
<awoods>acoburn: thanks
<acoburn>awoods: that seems simpler than having lots of separate index.html files floating around
<awoods>acoburn: agreed16:27
acoburn: feel free to clean up the existing index.html if you are so inclined: http://docs.fcrepo.org/
* ajs6f joins16:48
fcrepo-kernel-api still depends on jcr-20. I can't wait until that is gone. What's holding us back again?16:49
acoburn: FedoraResource::geVersion returns a javax.jcr.version.Version. How the heck are we going to impl that? We will have to construct special versions of Version!16:55
acoburn: Eh, I'm going home for the day. Maybe tomorrow.
* ajs6f leaves16:56
* acoburn leaves17:07
* jrgriffiniii leaves17:12
* peichman leaves17:19
* bseeger leaves17:33
<ruebot>awoods: so. uh. i think i just got containers working in jmeter.18:23
awoods: https://github.com/ruebot/fcrepo4-jmeter18:28
<awoods>ruebot: that is great. Was it fun?18:37
<ruebot>awoods: not until i a bunch of "INFO 18:25:17.629 (FedoraLdp) Ingest with path: /c1/99/34/19/c1993419-16e1-41c2-9e61-ce36d6179e44" flying across my terminal.18:39
i saw*
<awoods>ruebot: Thanks for doing that. I look forward to giving it a spin.18:40
<ruebot>awoods: i added a comment to today's notes like promised.
<awoods>ruebot: you are a man of your word.
<ruebot>:-)
* github-ff joins19:21
[fcrepo-module-auth-rbacl] awoods created rc-4.5.0 (+1 new commit): http://git.io/vE3vp
fcrepo-module-auth-rbacl/rc-4.5.0 b7dcae2 Andrew Woods: Prepare 4.5.0-RC
* github-ff leaves
* github-ff joins
[fcrepo-module-auth-webac] awoods created rc-4.5.0 (+1 new commit): http://git.io/vE3fv
fcrepo-module-auth-webac/rc-4.5.0 e407488 Andrew Woods: Prepare 4.5.0-RC
* github-ff leaves
* github-ff joins19:22
[fcrepo-mint] awoods created rc-4.5.0 (+1 new commit): http://git.io/vE3fJ
fcrepo-mint/rc-4.5.0 72b81e6 Andrew Woods: Prepare 4.5.0-RC
* github-ff leaves
* travis-ci joins
fcrepo4/fcrepo-module-auth-rbacl#41 (rc-4.5.0 - b7dcae2 : Andrew Woods): The build has errored.
Change view : https://github.com/fcrepo4/fcrepo-module-auth-rbacl/commit/b7dcae2a543e
Build details : https://travis-ci.org/fcrepo4/fcrepo-module-auth-rbacl/builds/98231100
* travis-ci leaves
* travis-ci joins
fcrepo4/fcrepo-module-auth-xacml#91 (rc-4.5.0 - d602312 : Andrew Woods): The build has errored.
Change view : https://github.com/fcrepo4/fcrepo-module-auth-xacml/commit/d6023124ebe1
Build details : https://travis-ci.org/fcrepo4/fcrepo-module-auth-xacml/builds/98231104
* travis-ci leaves
* travis-ci joins
fcrepo4-exts/fcrepo-transform#19 (rc-4.5.0 - 8a2ec8b : Andrew Woods): The build has errored.
Change view : https://github.com/fcrepo4-exts/fcrepo-transform/commit/8a2ec8ba301e
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-transform/builds/98231130
* travis-ci leaves
* travis-ci joins
fcrepo4-exts/fcrepo-mint#3 (rc-4.5.0 - 72b81e6 : Andrew Woods): The build has errored.
Change view : https://github.com/fcrepo4-exts/fcrepo-mint/commit/72b81e61c97f
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-mint/builds/98231128
* travis-ci leaves
* travis-ci joins19:23
fcrepo4-exts/fcrepo-audit#63 (rc-4.5.0 - 1d4361a : Andrew Woods): The build has errored.
Change view : https://github.com/fcrepo4-exts/fcrepo-audit/commit/1d4361aa96ff
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-audit/builds/98231155
* travis-ci leaves
* github-ff joins19:24
[fcrepo-webapp-plus] awoods tagged rc-4.5.0 at 9be1990: http://git.io/vE3fH
* github-ff leaves
* github-ff joins
[fcrepo-transform] awoods tagged rc-4.5.0 at 5fb8f22: http://git.io/vE3f7
* github-ff leaves
* travis-ci joins19:25
fcrepo4-exts/fcrepo-webapp-plus#105 (rc-4.5.0 - 1e77300 : Andrew Woods): The build has errored.
Change view : https://github.com/fcrepo4-exts/fcrepo-webapp-plus/compare/rc-4.5.0
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-webapp-plus/builds/98231530
* travis-ci leaves
* travis-ci joins
fcrepo4-exts/fcrepo-mint#4 (rc-4.5.0 - 72b81e6 : Andrew Woods): The build has errored.
Change view : https://github.com/fcrepo4-exts/fcrepo-mint/compare/rc-4.5.0
Build details : https://travis-ci.org/fcrepo4-exts/fcrepo-mint/builds/98231570
* travis-ci leaves
<64MAAK02B>Project fcrepo-module-auth-rbacl build #909: UNSTABLE in 7 min 49 sec: http://jenkins.fcrepo.org/job/fcrepo-module-auth-rbacl/909/19:29
* travis-ci joins19:37
fcrepo4/fcrepo4#4213 (rc-4.5.0 - de79c1e : Andrew Woods): The build passed.
Change view : https://github.com/fcrepo4/fcrepo4/commit/de79c1e09a76
Build details : https://travis-ci.org/fcrepo4/fcrepo4/builds/98231089
* travis-ci leaves
* ksclarke joins19:50
<64MAAK02B>Yippee, build fixed!19:51
Project fcrepo-module-auth-rbacl build #910: FIXED in 5 min 17 sec: http://jenkins.fcrepo.org/job/fcrepo-module-auth-rbacl/910/
Project fcrepo-camel-toolbox build #292: UNSTABLE in 10 min: http://jenkins.fcrepo.org/job/fcrepo-camel-toolbox/292/20:03
* dhlamb joins21:24
* github-ff joins21:55
[fcrepo4] acoburn opened pull request #963: update developer documentation page (gh-pages...fcrepo-1862-ghpages) http://git.io/vE3Vy
* github-ff leaves
* dhlamb leaves23:59