Faster dict #103

Open · wants to merge 6 commits into master

Conversation

alanjds commented Oct 6, 2018

Rebased PR google#259 over #85

Kudos @nairb774 for the original code and @trotterdylan for original guidance and comments.

nairb774 and others added 3 commits February 18, 2017 19:11
This improves the speed of the previous dict implementation by reducing the
number of atomic loads in the read path (at most one when the dict is read-only
- think globals) as well as the number of allocations needed in the write path.

Overall the performance is improved by about 30%.

Some of the major changes are as follows:
* The internal table layout was changed from []*dictEntry to []dictEntry,
  removing a memory indirection and hopefully improving the speed of slot
  probing in insertAbsentEntry and lookupEntry (illustrated in the sketch
  after this list).

* Many iteration operations that previously might have needed to grab a
  relatively expensive lock can now proceed without locking if the dict is in
  read-only mode.

* The sizeof(Dict) increased somewhat, as a few variables (used and fill) were
  moved from the dictTable to the Dict itself. The addition of the `write` and
  `misses` values to the Dict makes the overall memory usage of Dict generally
  larger. This is offset by the type change of dictTable and the reduction of
  additional pointers there. An empty dict weighs in at 304 bytes compared to
  176 previously. At 4 elements, both the old and new implementations use 304
  bytes of memory. From that point on, the new implementation actually uses
  less memory.
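
For illustration only, here is a minimal sketch of the layout change described
in the first bullet above; the type and field names are simplified stand-ins,
not the actual grumpy types.

    // dictEntry is a simplified stand-in for the real entry type.
    type dictEntry struct {
        hash  int
        key   interface{}
        value interface{}
    }

    // Before: each slot holds a pointer, so every probe chases a pointer and
    // every insert allocates a separate entry.
    type dictTableOld struct {
        slots []*dictEntry
    }

    // After: slots hold entries inline, removing one level of indirection and
    // one allocation per insert, and keeping probed slots contiguous in memory.
    type dictTableNew struct {
        slots []dictEntry
    }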

alanjds commented Oct 8, 2018

Looks like I messed up some calls during the merge...

alanjds requested a review from a team October 8, 2018 16:04
r21gh requested reviews from r21gh and funny-falcon October 8, 2018 16:46
funny-falcon commented:

@rezaghanbari I'm looking into it.

funny-falcon left a comment

I like the change from []*dictEntry to []dictEntry, and I bet this change has the most positive impact on performance, due to better cache locality and a lower allocation count.

I don't like the copy-on-write part. With the write path kept under a lock, the read path and iteration could easily be done lock-free without copy-on-write:

  • Never overwrite the hash and key of an already written dictEntry.
    Delete should be accomplished by writing deletedValue into value instead of deletedKey into key.
  • Writing should be done as three atomic stores: hash, then key, then value
    (a rough sketch of this write path follows below).
  • Reading an entry should be done as follows:
    key := atomic.Load(&entry.key)
    if key == nil {
        return emptyValue
    }
    for {
        val := atomic.Load(&entry.value)
        if val == &deletedValue {
            return emptyValue
        } else if val != nil {
            return val
        }
        // the writer has stored the key but not the value yet: yield and retry
        runtime.Gosched()
    }

The hash value should also be read after checking the key.

Remark: hash == 0 could be used instead of key == nil to mark an entry that has never been written. That order is preferred for GetEntry, because it will more often touch only the hash without needing to check the key. Then, if a key's hash is already equal to zero, it should be replaced with some non-zero constant.

Probably I'm mistaken, and this way GetEntry will be considerably slower because of more atomic operations. But this design should at least be tried and compared against.
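
For concreteness, a rough sketch of the write side of this scheme, assuming the
dict's write lock is held; the field names, the use of unsafe.Pointer, and the
deletedValue sentinel are illustrative assumptions, not the actual grumpy code.

    import (
        "sync/atomic"
        "unsafe"
    )

    type dictEntry struct {
        hash  int64          // 0 means "never written"
        key   unsafe.Pointer // nil while the slot is empty
        value unsafe.Pointer // nil until published, &deletedValue after a delete
    }

    var deletedValue int // sentinel: &deletedValue marks a deleted slot

    // storeEntry publishes an entry with three separate atomic stores, in the
    // order hash, key, value, so a reader that sees a non-nil key can spin
    // until the value appears (as in the read loop above). Callers must hold
    // the dict's write lock so only one writer touches the slot at a time.
    func storeEntry(e *dictEntry, hash int64, key, value unsafe.Pointer) {
        atomic.StoreInt64(&e.hash, hash)
        atomic.StorePointer(&e.key, key)
        atomic.StorePointer(&e.value, value)
    }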

There is another "issue" with dict: Python 3.7 adopts an order-preserving dict by storing entries in an array and storing indices in the hash part. This is the way array works in PHP 7, Hash is implemented in Ruby 2.5, Map in JavaScript (I suppose), and dict in PyPy (IIRC). While order preservation has not been declared part of the language specification yet, it is quite a nice property.
So, if a change of the dict implementation is desirable, then it is better to implement the order-preserving variant right now.
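
A hedged sketch of that two-array layout (entries kept in insertion order, hash
slots holding only indices); the names and the linear probing here are
illustrative, not taken from CPython, PyPy, or #111.

    type dictEntry struct {
        hash  int
        key   interface{}
        value interface{}
    }

    type orderedDict struct {
        entries []dictEntry // dense, in insertion order; iteration just walks this slice
        indices []int       // sparse hash table; each slot is an index into entries, or -1 when empty
    }

    // lookup assumes len(indices) is a power of two and always has an empty slot.
    func (d *orderedDict) lookup(hash int, key interface{}) (interface{}, bool) {
        mask := len(d.indices) - 1
        for i := hash & mask; ; i = (i + 1) & mask {
            idx := d.indices[i]
            if idx < 0 {
                return nil, false // hit an empty slot: key is absent
            }
            if e := &d.entries[idx]; e.hash == hash && e.key == key {
                return e.value, true
            }
        }
    }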


alanjds commented Oct 9, 2018

I agree with @funny-falcon about 3.7 order preservation. If a behavior of 3.X is 2.7-compatible, I will almost always be in favor of it. Python 3 is a goal.

It is ok to have many PRs refactoring dict, for benchmark comparison...

The current benchmarks suggest this PR is better than the existing implementation. @funny-falcon, do you think we should wait for challenger implementations before merging, or merge now and then compare with the challengers?


nairb774 commented Oct 9, 2018 via email

funny-falcon commented:

> do you think we should wait for challenger implementations before merging, or merge now and then compare with the challengers?

Doesn't matter. I'll try to find time this week to make an alternative variant, but I can't promise I will complete it.

> Most dicts are "readonly" or at least write rarely.

Yes. But some are not.

> I implemented a fully lock-free map, but it was slower. The implementation
> I followed was the one described by Cliff Click here:
> https://youtu.be/HJ-719EGIts. Go doesn't directly expose "light" enough
> atomic loads - at least that was the case at the time I wrote the initial
> patch.

Yes, I've said already that I could be mistaken, and "lock-free" could be remarkably slower.
We need to measure.

funny-falcon commented:

I've spent a day on it today... my first attempt was noticeably slower. Will try again next week.


funny-falcon commented Oct 21, 2018

I've done an ordered dictionary in #111 with []dictEntry instead of []*dictEntry.
It is not slower on lookup than the current dictionary implementation, though not faster either.
It is much faster on iteration, mainly because of a trick: it doesn't create a dictTable for an empty dictionary, which also improves creation of the StopIteration exception.

I haven't fixed the comments yet, which is why it is marked as WIP.
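
A minimal sketch, with assumed names, of the empty-dict trick mentioned above:
keep the table nil until the first insert, so iterating an empty dict (for
example while creating a StopIteration) never allocates or scans a table. The
dictEntry type is the same illustrative stand-in used in the sketches above.

    type Dict struct {
        table []dictEntry // nil until the first insert
    }

    func (d *Dict) iterate(visit func(key, value interface{})) {
        if d.table == nil {
            return // empty dict: no table was ever allocated, nothing to walk
        }
        for i := range d.table {
            if e := &d.table[i]; e.key != nil {
                visit(e.key, e.value)
            }
        }
    }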

funny-falcon commented:

#111 is ready for review


alanjds commented Oct 24, 2018

Today I was thinking about TimSort and wondering if a Dict can adapt itself to its usage patterns.

What about bookkeeping reads vs. writes and changing from copy-on-write (good for reads) to something else when the dict is perceived to be write-heavy?
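
Purely as a speculative sketch of that bookkeeping idea - the counters, the
threshold, and the writeHeavy flag are hypothetical and not part of either
implementation:

    import "sync/atomic"

    type adaptiveDict struct {
        reads      int64 // incremented on every lookup
        writes     int64 // incremented on every insert or delete
        writeHeavy int32 // 0 = copy-on-write mode, 1 = plain locked mode
    }

    func (d *adaptiveDict) noteRead() { atomic.AddInt64(&d.reads, 1) }

    func (d *adaptiveDict) noteWrite() {
        w := atomic.AddInt64(&d.writes, 1)
        r := atomic.LoadInt64(&d.reads)
        // Arbitrary example threshold: once writes dominate reads, stop paying
        // the copy-on-write cost and fall back to a plain locked table.
        if w > 64 && w > r {
            atomic.StoreInt32(&d.writeHeavy, 1)
        }
    }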

funny-falcon commented:

> What about bookkeeping reads vs. writes and changing from copy-on-write (good for reads)

I still don't believe copy-on-write gives a noticeable performance improvement.
How should I run the benchmarks properly to measure #111 against this one?


alanjds commented Oct 25, 2018

> How should I run the benchmarks properly to measure #111 against this one?

Can you point me somewhere on this, @nairb774?
(sorry for the poking 👉)

nairb774 commented:

I had used https://github.com/grumpyhome/grumpy/tree/master/grumpy-runtime-src/benchmarks for the primary benchmarks. The testing.B benchmarks added to the Go code (and referenced in the initial PR) were created as an early indicator of how real programs might behave. The proof is in real code - microbenchmarks allow isolating and measuring specific opportunities. Integrating more of https://github.com/python/performance wouldn't be a bad idea. I was using it to measure the performance difference between the available implementations.

All that being said, getting other, "real" sizable code chunks to measure is always useful.
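
For reference, this is the general shape of such a testing.B microbenchmark;
it measures a plain Go map rather than grumpy's Dict, so everything here is
just an illustrative stand-in. Run with `go test -bench=. -benchmem`.

    package dict_test

    import (
        "strconv"
        "testing"
    )

    func BenchmarkMapGet(b *testing.B) {
        // Populate a map once, outside the timed region.
        m := make(map[string]int, 1024)
        for i := 0; i < 1024; i++ {
            m["key"+strconv.Itoa(i)] = i
        }
        b.ResetTimer() // exclude setup from the measured time
        for i := 0; i < b.N; i++ {
            _ = m["key"+strconv.Itoa(i&1023)]
        }
    }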


alanjds commented Oct 27, 2018

> The testing.B benchmarks added to the Go code (and referenced in the initial PR) were created as an early indicator of how real programs might behave.

You mean this new bigDict test in your PR? https://github.com/grumpyhome/grumpy/pull/103/files#diff-568fd4ad4f28d648af0e9d4befa2acd9R583

	bigDict := newTestDict(
		"abc", "def",
		"ghi", "jkl",
		"mno", "pqr",
		"stu", "vwx",
		"yzA", "BCD",
		"EFG", "HIJ",
		"KLM", "OPQ",
		"RST", "UVW",
		"XYZ", "123",
		"456", "789",
		"10!", "@#$",
		"%^&", "*()")
	cases := []invokeTestCase{
		{args: wrapArgs(NewDict()), want: NewList().ToObject()},
		{args: wrapArgs(newTestDict("foo", None, 42, None)), want: newTestList(42, "foo").ToObject()},
		{args: wrapArgs(bigDict), want: newTestList("abc", "yzA", "KLM", "XYZ", "10!", "456", "stu", "%^&", "mno", "RST", "ghi", "EFG").ToObject()},
	}

If so, it should be added to the test suite anyway...


alanjds commented Oct 27, 2018

> Integrating more of https://github.com/python/performance wouldn't be a bad idea.

Agreed. That would be nice.
