plumbing/format/idxfile: add new Index and MemoryIndex #896

erizocosmico · 2018-07-19T13:29:13Z

Closes #893

This PR adds the Index and EntryIter interfaces to idxfile and renames Idxfile to MemoryIndex as described in the design document.

The new MemoryIndex now works like the JGit index reader, it loads everything as byte slices. The only change I did that diverges from the JGit implementation is adding a mapping between the fanout index and the slice index to drastically reduce the memory usage.

In JGit they use an array of 256 byte arrays for the name, offset and crc32 tables. This is such an overkill when you don't have a gigantic packfile that uses all slots of the array. By doing this, we only use as much memory as we need.

According to the benchmarks, the speed remains the same decoding the idx file, and the memory consumption is slightly increased. Access to the entries should be way faster now due to how the accesses are done.

Benchmarks have been added for all operations on idxfiles.

Warning

This will leave the repo in a broken state (which is why it's being PR'd to a feature branch), since previously there was a packfile.Index that was writable and now there is not. This is a work in progress that will probably need some of the work being done by @jfontan right now to be fully working.

Everything inside idxfile package is expected to be working correctly, though.

Caveats

As mentioned in #893, some way of retrieving hashes by offset may be needed for the packfile decoder. After some discussion with @jfontan we agreed that it's better to wait until all the pieces are ready to see how they play together before adding such a change. That is only needed when the source is not seekable, so another solution might be to just build a map from offset to hash for those cases.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>

ajnavarro · 2018-07-24T07:44:12Z

If we are using the Decoder to fill a MemoryIndex struct, the Index interface is useless in my opinion.

We should use the Decoder inside the implementation, and depending on that implementation, use the decoder in different ways (loading everything at the beginning, loading each part on demand, and so on).

So, instead of doing that, we should refactor the decoder to be able to obtain the data, instead of sending a struct to fill it.

If that doesn't make sense, Index interface should be deleted and MemoryIndex should be Index. Doing that, the implementation of new Indexes will be way more difficult.

We can go even further; Instead of having a Decoder, we can have some decode helpers. Using the helpers (basically parsers) we can obtain the data on idx file in different ways, depending on our use case or index implementation.

WDYT?

erizocosmico · 2018-07-24T08:00:06Z

Yeah, right now Index is kind of useless. What is the use case of having other index implementations? Because if we can't think of any case for implementing other indexes maybe there is no point in having that interface right now and we could just have the MemoryIndex be the index.

If there is such a case and we keep the Index interface I think, as you say, we should refactor the decoder so it can be used by any other index instead of passing the MemoryIndex struct. What I don't know is what shape the new Decoder or helpers should have, because the decoding right now is very coupled with how the MemoryIndex does things (fanout and fanout mapping).

plumbing/format/idxfile: add new Index and MemoryIndex

009f106

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>

erizocosmico requested review from jfontan and ajnavarro July 19, 2018 13:29

jfontan approved these changes Jul 19, 2018

View reviewed changes

jfontan mentioned this pull request Jul 19, 2018

Feature/new packfile parser #898

Merged

erizocosmico mentioned this pull request Jul 20, 2018

plumbing: packfile, new Packfile representation #899

Merged

erizocosmico merged commit a8ff3e5 into src-d:perf/packfile-reads Jul 26, 2018

erizocosmico deleted the feature/new-index-decoder branch July 26, 2018 12:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

plumbing/format/idxfile: add new Index and MemoryIndex #896

plumbing/format/idxfile: add new Index and MemoryIndex #896

erizocosmico commented Jul 19, 2018

ajnavarro commented Jul 24, 2018

erizocosmico commented Jul 24, 2018

plumbing/format/idxfile: add new Index and MemoryIndex #896

plumbing/format/idxfile: add new Index and MemoryIndex #896

Conversation

erizocosmico commented Jul 19, 2018

Warning

Caveats

ajnavarro commented Jul 24, 2018

erizocosmico commented Jul 24, 2018