Localization jumps #112

Closed
facontidavide opened this issue Oct 17, 2019 · 36 comments

@facontidavide

Hi,

I am trying slam_toolbox with a rosbag. I play it with the option --clock and setting use_sim_time to true.
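For reference, the playback setup looks roughly like this (the bag filename is just a placeholder):

        # enable simulated time, then replay the bag while publishing its recorded time on /clock
        rosparam set use_sim_time true
        rosbag play --clock my_recording.bag    # bag filename is a placeholder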

The creation of the serialized map was ok, I used the command

        roslaunch slam_toolbox online_sync.launch

as suggested in the README. I saved the serialized map as "test_map" and double-checked the .ros folder; the files are created correctly.

Things start getting more complicated in localization mode. I use

       roslaunch slam_toolbox localization.launch

and the following parameters:

      mode: localization
      map_file_name: test_map
      map_start_pose: [0.0, 0.0, 0.0]

But apparently the localization is not working.

(screenshot attached)

After a closer look at the data, it seems that the frame map->odom is published periodically, but it is always the same value.

Any clue?

@facontidavide
Author

Digging a little bit more:

the TF map->odom is updated by setTransformFromPoses,

but apparently if processor_type_ == PROCESS_LOCALIZATION then that method is never called.

That is weird. Isn't localization about calculating the map->odom transform continuously?
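(For context, a minimal sketch of the standard map->odom computation that a localization node broadcasts; this is the generic REP-105-style relation, not the slam_toolbox code itself:)

// Minimal sketch (not the slam_toolbox implementation) of how a map->odom
// transform is typically derived and broadcast: take the scan matcher's
// corrected map->base_link pose and the current odom->base_link transform,
// and publish their difference as map->odom on /tf.
#include <ros/ros.h>
#include <tf2_ros/transform_broadcaster.h>
#include <tf2/LinearMath/Transform.h>
#include <tf2_geometry_msgs/tf2_geometry_msgs.h>
#include <geometry_msgs/TransformStamped.h>

void publishMapToOdom(const tf2::Transform & map_to_base,
                      const tf2::Transform & odom_to_base,
                      const ros::Time & stamp,
                      tf2_ros::TransformBroadcaster & broadcaster)
{
  // map->odom = map->base * (odom->base)^-1
  const tf2::Transform map_to_odom = map_to_base * odom_to_base.inverse();

  geometry_msgs::TransformStamped msg;
  msg.header.stamp = stamp;
  msg.header.frame_id = "map";
  msg.child_frame_id = "odom";
  msg.transform = tf2::toMsg(map_to_odom);
  broadcaster.sendTransform(msg);
}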

@facontidavide facontidavide changed the title Localization with a rosbag Does Localization publish the map->odom transform? Oct 17, 2019
@SteveMacenski
Owner

SteveMacenski commented Oct 17, 2019

https://github.com/SteveMacenski/slam_toolbox/blob/melodic-devel/src/slam_toolbox_localization.cpp#L153

Looking here, you are correct; that was a mistake in my refactor, and I'm not surprised I didn't catch it since it was all tested on simulation and bag data. Though I do test on a physical robot before I release large changes, so I'm not sure how that was missed.

How this should read is:

  // if successfully processed, create odom to map transformation
  if(!processed)
  {
    delete range_scan;
    range_scan = nullptr;
  } else
  {
    setTransformFromPoses(range_scan->GetCorrectedPose(), karto_pose,
      scan->header.stamp, update_reprocessing_transform);
  }

and then the setTransformFromPoses call that currently exists goes away.

Agree?

@SteveMacenski
Owner

#113

Look good to you?

@SteveMacenski
Owner

Keep in mind that map_start_pose and the map file are only required to start localization with that given position and map; you can always do that afterwards with the service / rviz plugin. If you're not at 0,0,0 then it's obviously going to be off.
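(Heavily hedged side note: on melodic-devel the map and pose can also be loaded at runtime through the deserialization service. The service name and fields below are my assumption from the srv definitions I've seen; verify them with `rosservice info` on your installed version.)

      # Assumed service name and fields -- check `rosservice info /slam_toolbox/deserialize_map` first
      rosservice call /slam_toolbox/deserialize_map "filename: 'test_map'
      match_type: 3                                  # assumed to correspond to LOCALIZE_AT_POSE
      initial_pose: {x: 4.0, y: 1.0, theta: -0.7}"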

@facontidavide
Author

I will try it

@facontidavide
Author

It is much better. I think you should merge these changes.

Anyway, how can I achieve a more aggressive localization (matching to map)?

I know I should not expect "miracles", but it seems that the big corrective jumps in the localization happen only once or twice a minute.

Which parameters are the most relevant?

slam_toolbox-2019-10-17_21.00.42.mkv.tar.gz

Attached video

@SteveMacenski
Owner

I have never seen anything like that; I need to know more about your setup. What laser is it, is there enough CPU available, how did you generate the original map, how did you match against the original map, etc.?

@facontidavide
Author

I have a very recent i7 and the software is compiled in Release mode. Less than 50% of a single CPU is used (I have 8 CPUs).

The laser scan is published at 10 Hz.

I have another video where you can better see that it is "supposed" to localize easily, but it only does so after a very long time (at roughly 1:02 in the video).

slam_toolbox-2019-10-17_21.20.19.mkv.tar.gz

I don't know if it helps, but this is the update frequency of the map->odom transform:

(screenshot attached)

@SteveMacenski
Owner

SteveMacenski commented Oct 17, 2019

Is this after the patch we just made?

This is odd. If you can give me a mapping bag run and a bag run of this attempt at localization that is failing, I can take a look (and hopefully a video with those datasets of it not working on your end to compare)

@facontidavide facontidavide reopened this Oct 17, 2019
@facontidavide
Author

facontidavide commented Oct 17, 2019

Before the patch, map->odom was basically NEVER updated, so we can't even compare.

After the patch, map->odom is updated about once every 1.2 seconds and big jumps happen only once per minute. Let me prepare the rosbags.

Thank you VERY much for your help.

@facontidavide
Author

I forgot to mention something vital: I am using the melodic branch on my computer... which runs Kinetic!!!

@SteveMacenski
Owner

SteveMacenski commented Oct 17, 2019

map->odom is updated about once every 1.2 seconds

Well, that's just based on your settings / movement. It's not time-based; it's based on angular distance and linear distance traveled, like in AMCL.
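(Illustratively, the gating amounts to something like the sketch below; this is not the slam_toolbox source, and the parameter names are placeholders.)

// Sketch of movement-based update gating (not the slam_toolbox source; the
// parameter names are placeholders): a new scan is only processed once the
// robot has translated or rotated enough since the last processed scan, so
// map->odom updates track motion, not wall-clock time.
#include <cmath>

struct Pose2D { double x; double y; double theta; };

bool shouldProcessScan(const Pose2D & current, const Pose2D & last_processed,
                       double min_travel_distance, double min_travel_heading)
{
  const double dist = std::hypot(current.x - last_processed.x,
                                 current.y - last_processed.y);
  double dtheta = std::fabs(current.theta - last_processed.theta);
  dtheta = std::fmod(dtheta, 2.0 * M_PI);
  if (dtheta > M_PI) { dtheta = 2.0 * M_PI - dtheta; }  // wrap into [0, pi]
  return dist >= min_travel_distance || dtheta >= min_travel_heading;
}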

Sorry, to be clear, which version of this code is running? Both branches compile on both ROS versions. If you are running melodic, I'd actually be interested in whether the kinetic branch (which does not have the filename/position parameters, so you'd have to localize using the 2D pose estimate tool in rviz) works, or still shows the odd behavior. I haven't seen this behavior in any of my testing, but it could be hardware-specific, or the original localization quality was poor.

These videos... are they from the beginning of localization or after some moving around? I assume you give it a good, legit position when it starts up. If you're starting up with a poor match, then that might be part of this.

@facontidavide
Author

facontidavide commented Oct 17, 2019

  • I start with a very bad match and later use /initialpose (from rviz) to give it a better estimate. Question: isn't initialpose supposed to "reset" the previous history? I think it should.

  • I am using branch "112", because otherwise I don't see any localization at all.

@SteveMacenski
Owner

Question: isn't initialpose supposed to "reset" the previous history? I think it should.

Mhm, good thought. Certainly wouldn't hurt. It's a little involved to change that but I'll file a separate ticket
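(Roughly the kind of change meant here, as a sketch rather than the eventual implementation; the class and member names are placeholders, not the actual slam_toolbox API.)

// Sketch only: on /initialpose, drop the rolling scan buffer so constraints
// anchored to the previous pose estimate cannot fight the newly provided one.
#include <deque>
#include <mutex>
#include <geometry_msgs/PoseWithCovarianceStamped.h>
#include <sensor_msgs/LaserScan.h>

class LocalizationNode
{
public:
  void initialPoseCallback(
    const geometry_msgs::PoseWithCovarianceStamped::ConstPtr & msg)
  {
    std::lock_guard<std::mutex> lock(buffer_mutex_);
    scan_buffer_.clear();        // forget the old localization chain
    last_pose_msg_ = *msg;       // re-seed matching from the provided pose
    have_initial_pose_ = true;
  }

private:
  std::deque<sensor_msgs::LaserScan> scan_buffer_;  // rolling localization buffer
  std::mutex buffer_mutex_;
  geometry_msgs::PoseWithCovarianceStamped last_pose_msg_;
  bool have_initial_pose_ = false;
};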

@SteveMacenski
Owner

I'm going to rename this ticket to designate where the discussion has gone

@SteveMacenski SteveMacenski changed the title Does Localization publish the map->odom transform? Localization jumps Oct 17, 2019
@SteveMacenski
Owner

SteveMacenski commented Oct 17, 2019

I start with a very bad match and later use /initialpose (from rviz) to give it a better estimate.

What happens if you start with a good match, using the map_start_pose with the actual pose?

If that works fine, then it's probably just a matter of clearing the buffer, and I can do that in an afternoon.

@facontidavide
Author

Ok, attached there is the relevant data:

input_laser.bag.tar.gz

This first rosbag is the one I use to create the serialized map with online_sync.launch

Note that I have to change the name of the base frame to base_link and subscribe to /front_laser/scan

Afterward, I launch localization.launch using the serialized map.

  • As you suggested, I adjust the initial position to be quite good (small angular error). Localization "snaps" to the correct pose at time 00:26 of the video.

  • At time 00:40, I use RViz and /initialpose to introduce a tiny error in the pose of the robot. This error keeps growing instead of being compensated.

  • Finally, at time 01:25, the introduced localization error disappears.

In addition, here are the video and the recorded rosbag:

slam_toolbox-2019-10-17_22.57.31.mkv.tar.gz

https://drive.google.com/file/d/14MvnFYP5oLMBTmnPmuLSF9MVL5Djzaje/view?usp=sharing

@SteveMacenski
Owner

SteveMacenski commented Oct 17, 2019

Awesome, I'll take a look tomorrow or this weekend.

To summarize your thoughts: you think the issue is introduced because of the initialpose? You otherwise do not see any such drift? You mention no real issue from 0-26s and only an issue once that's introduced. Or do you also see this occur when initialpose is not used at all?

If it's all surrounding the initial pose, I'm inclined to agree with you, and it should be a medium-difficulty fix I can hammer out in a day.

@facontidavide
Author

facontidavide commented Oct 17, 2019

Once the robot is localized, there is no drift. But I would like to test how robust the localization is to disturbances; that is the reason why I was "annoying" the robot myself from RViz.

Let's put it this way: as a user, I expect that:

  • the initial pose given as a parameter and the one given on the topic /initialpose behave similarly.
  • I want to know which parameters I should tune to converge in less than 26 seconds; that is... a lot!

@SteveMacenski
Owner

SteveMacenski commented Oct 17, 2019

My guess is that it actually snaps back super quickly; it's that you're "abusing" the intent of the initialpose topic, so what you're seeing is an artifact of not resetting the entire state of localization on initialpose, because I expect it to be used few and far between (or not at all in a production environment). Clearly how you're using it makes me rethink that a bit, but I want to make sure I'm tackling the right problem.

26 seconds seems suspect if it does come back; that probably indicates to me that's just when the rolling buffer from when you published a pose expires. How fast are you moving? I think my default chain size is 10, so if you were moving slowly I could see those numbers working out.
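(Back-of-envelope, using only numbers from this thread and treating the actual gating thresholds as unknowns: map->odom was updating roughly every 1.2 s at about 0.7 m/s, i.e. about 0.85 m of travel per processed scan, so a chain of 10 scans spans roughly 12 s and 8 m of motion. That is the same order as the delays being reported, which is consistent with the buffer-expiry explanation.)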

@facontidavide
Author

Keep in mind that in the first 40 seconds of the video, I do NOT use /initialpose , not even once.
I used this parameter to initialize the robot pose:

      map_start_pose: [4.0, 1.0, -0.7]

The velocity of the robot is about 0.7 m/sec

@SteveMacenski
Owner

SteveMacenski commented Oct 18, 2019

I'll take a look this weekend; this represents an annoying issue that needs to be fixed. I've done similar testing in my environments and I don't see this problem.

What laser are you using? How good is your odometry? From the beginning of this video, you're very poorly localized. Is that the starting loading point, or were you moving around a bit before that?

Again, I'm trying to be really specific here because I'm not understanding exactly what you're doing, and some of the information, as I'm interpreting it, is contradictory: once this thing localizes to the map, do you see these jumps, or is it only when you initialize or artificially disturb it?

In all of my tests, and for the intended use of localization, I assume you know very well where you are at the start. The example being that the robot lives at a dock or cached its last pose before being turned off. If you do not start well localized, that is something very different from the industrial use case I'm supporting; I'd merge PRs to help, but it's not on my plate to add (that's a global scan matching problem). But once it's localized on startup from knowing the position, I've never seen jumps like you've shown, on 3 different industrial robots with industrial-grade lidars that I have access to.

There's probably something to resolve to fix what you're up to but I'm trying to understand very clearly what your test situation is prior to and during your experiments so I know where the issue(s) might be

When you artificially disturb the graph by using initialpose, it doesn't shock me at all that you'd see it take a while to get back to the real pose, given what we discussed above about needing to reset that state fully; that's on my list. Keep in mind this isn't a filter, this is a graph: when you do that, you're creating really bogus constraints that the optimizer is trying to fix up. Those "disturbances" aren't typical drift from something like odometry or wheel slippage; they are substantive corrections in the positioning, so the constraints then have a huge error term to deal with. You wouldn't "kick" the robot half a meter to the left while creating a SLAM map, and you shouldn't do it here either, given that this style of localization is modeled after how SLAM maps are built.

With that said, fully resetting the state will help a lot, but feeding bad global corrections (which are essentially taken as "the word of god") is going to mess up any graph-based solution for a finite time horizon.
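(To make the "huge error term" point concrete, here is a toy single-edge error computation, not karto/slam_toolbox code: an artificially injected pose makes the measured relative transform disagree wildly with the one implied by the node estimates, and a graph optimizer can only work that error off gradually across neighboring nodes.)

// Toy illustration (not karto/slam_toolbox code) of the error one graph edge
// contributes. A "kicked" pose estimate makes this residual huge.
#include <cmath>

struct Pose2D { double x; double y; double theta; };

// relative pose of b expressed in a's frame
Pose2D relativePose(const Pose2D & a, const Pose2D & b)
{
  const double c = std::cos(a.theta), s = std::sin(a.theta);
  const double dx = b.x - a.x, dy = b.y - a.y;
  return { c * dx + s * dy, -s * dx + c * dy, b.theta - a.theta };
}

// squared error between the measured edge (e.g. scan match / odometry) and
// the edge implied by the current node estimates
double edgeError(const Pose2D & node_i, const Pose2D & node_j,
                 const Pose2D & measured_ij)
{
  const Pose2D predicted = relativePose(node_i, node_j);
  const double ex = predicted.x - measured_ij.x;
  const double ey = predicted.y - measured_ij.y;
  double et = predicted.theta - measured_ij.theta;
  et = std::atan2(std::sin(et), std::cos(et));  // wrap to [-pi, pi]
  return ex * ex + ey * ey + et * et;
}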

@facontidavide
Author

@SteveMacenski , first of all, thank you very much for your time.
Since I don't want to waste it, let me be as clear as possible and answer your questions:

What laser are you using? How good is your odometry?

The laser is obtained from a 3D Lidar, transformed using pointcloud_to_laserscan. The quality of the distances looks very good to me (low noise, long range).
The odometry is obtained by fusing wheel odometry and an IMU, and it is pretty good. Every other localization and mapping package works very well with this data.

From the beginning of this video, you're very poorly localized.

Not in the last video and rosbags I sent to you. The position is almost correct and the rotation is off by maybe 10 degrees. Other localization packages deal with this amount of error with no problem.

Once this thing localizes to the map do you see these jumps

No. It takes an unreasonable amount of time in my opinion to eventually localize itself, but once it does, the localization is good.

If you do not start well localized, that is something very different from the industrial use case I'm supporting

I understand that you are not aiming at global localization, I really do. But I think it is weird that it takes so much time (26 seconds!) to localize when only a tiny initial error is present. I work on industrial robots too, and errors of this magnitude may happen.

As I said before, I probably need to understand whether I can influence this behavior by tuning some parameters.

When you artificially disturb the graph by using initialpose it doesn't shock me at all you'd see it take awhile to get back to the real pose

Thanks for the explanation; what you say makes perfect sense, of course. I am looking forward to issue #116 being addressed.

That said, I am very grateful for your work and your prompt replies.

Cheers

@SteveMacenski
Owner

SteveMacenski commented Oct 18, 2019

No worries. I think you're the first person other than me to really dive into localization in a serious way, so I appreciate the questions and your needs so we can make this work as well as possible for everyone. It's easy to miss other people's desires and needs when it's just me sitting in a dark dungeon alone making robots move.

3D lidar... OK, then I assume the range is >>20m, but I'm only seeing a relatively short range. The walls also look a little hairy for a 3D lidar, but I've never just taken the horizontal band and tried to do something with it in 2D mode. It probably isn't an issue, just an observation. Those units usually have pretty low horizontal resolution, though, so that might be a factor to consider here. What's the unit, if you don't mind sharing (or just give me the horizontal spec)? Odometry sounds good then. I see you made a map in the first place, so that's probably OK.

(screenshot attached: frame 1 from the last video)

This is frame 1 that I have from your last video. It looks pretty off, but my correction factor step should take care of more of the fine tuning of it. I'm guessing the issue with that particular localization isn't actually the package so much as that the region doesn't have a lot of features to grab onto (that section of the map has a large gap of missing stuff, there are already some copies of that rounded object up top, and there are some laser readings on the top right that don't match anywhere in the map). I.e., that starting location looks like the least ideal place in the entire map that I can see to try to match against. I think what you see there is the worst-case situation.

No. It takes an unreasonable amount of time in my opinion to eventually localize itself, but once it does, the localization is good.

OK, good, so we're at a common starting point; this is also my experience, so we can work from here. It looks like our problem is "how to localize faster" on initialization. Would you agree?

But I think that it is weird that it takes so much time /26 seconds!) to localize when a tiny initial error is present

Eh, on the scale of hours of operation 26s isn't much, but I get your concern; we can work on this point. One option is that, on initialization or initialpose, we could dynamically scale the buffer size: don't take the first N scans right away, but run with a buffer of size 1 for N cycles, then increase the maximum size to 2 for N cycles, etc., until the buffer reaches size N in steady state (see the sketch below). I don't know if that'll fix it, but it should speed it up.
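(A sketch of that idea with placeholder types and names, not the actual implementation:)

// Sketch of the "grow the buffer after a reset" idea described above
// (placeholder code, not the slam_toolbox implementation).
#include <deque>
#include <algorithm>
#include <cstddef>

template <typename Scan>
class GrowingScanBuffer
{
public:
  GrowingScanBuffer(std::size_t steady_state_size, std::size_t cycles_per_step)
  : steady_state_size_(steady_state_size), cycles_per_step_(cycles_per_step) {}

  void reset() { buffer_.clear(); cycles_since_reset_ = 0; }

  void add(const Scan & scan)
  {
    // capacity starts at 1 and grows by 1 every `cycles_per_step_` scans,
    // capping at the steady-state size
    const std::size_t capacity = std::min(
      steady_state_size_, 1 + cycles_since_reset_ / cycles_per_step_);
    buffer_.push_back(scan);
    while (buffer_.size() > capacity) {
      buffer_.pop_front();
    }
    ++cycles_since_reset_;
  }

  const std::deque<Scan> & scans() const { return buffer_; }

private:
  std::deque<Scan> buffer_;
  std::size_t steady_state_size_;
  std::size_t cycles_per_step_;
  std::size_t cycles_since_reset_ = 0;
};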

I probably need to understand if i can influence this behavior somehow tuning some parameters.

There are probably (definitely) parameters that can be tuned to make this faster. What they are is a long, long list, and it's a balancing act. All of the params in the Matcher Params section of the README are in play for that one. I'd first look at the _penalty and _offset parameters; those may give you more bang for your buck of tuning time.
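(For concreteness, the sort of entries meant in the params file; the names come from the Matcher Params section of the README, while the values below are only illustrative placeholders, not recommendations -- check your installed config for the real defaults.)

      # illustrative placeholders, not recommended values
      distance_variance_penalty: 0.5      # penalty for matches that move away from the odometric pose
      angle_variance_penalty: 1.0         # same, for rotation
      fine_search_angle_offset: 0.00349   # fine angular search window (rad)
      coarse_search_angle_offset: 0.349   # coarse angular search window (rad)
      coarse_angle_resolution: 0.0349     # step size of the coarse angular search (rad)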

Thanks for the explanation, of course it makes perfect sense what you say. I am looking forward for issue #116 to be addressed.

Glad I could make it clear -- a particle filter will act very differently from this. But once you know where you are and you don't "kick" the robot, you're only as likely to delocalize as you are in SLAM mode, which, given how I've modified and tuned the core library, is really, really unlikely. That's the huge benefit I derive from this work.

In line with your "reset buffer" service, I think you'd see faster "relocalization" than 26 seconds if the buffer size were smaller, so dynamically resizing that buffer would help.

@facontidavide
Author

The walls also look a little hairy

This is an outdoor map... so the hairy points are... bushes!

So it looks like our problem is "how to localize faster" on initialization. Would you agree?

Yes, exactly!

@SteveMacenski
Owner

Ahhhhhhhhh I see this is a parking lot

@facontidavide
Author

Now that I understand more about what is going on, I am tempted to close this issue... and open more focused ones in the future ;)

Nevertheless, I think you should keep in the back of your mind that there might be a way to speed up the initial localization, maybe by being a little more aggressive with pose estimation while the circular buffer is still empty (just saying, I don't know if it makes sense).

@SteveMacenski
Owner

OK, feel free!

@SteveMacenski SteveMacenski pinned this issue Oct 18, 2019
@SteveMacenski
Owner

I'm going to pin this because it was a good discussion that I think will yield value to others.

@SteveMacenski
Owner

SteveMacenski commented Oct 22, 2019

Quick follow-up: I haven't forgotten about this; I just haven't had the cycles yet to process your data and try to reproduce. My hope was this evening, but I got caught up in IROS planning.

My plan is to address it tomorrow and into Wednesday. I'll probably have an update by Wednesday evening.

If I'm blocking you, something helpful you could try is changing scan_buffer_size in localization mode from 10 to 2 to see if it converges faster (maybe, let's say... 20-50%?). You'll likely see other things going on, like more frequent loop closures, but that can all be worked out as implementation details. What that does is change the localization rolling buffer size to 2, so if one of the contributing factors is the memory of the disturbance, this should shorten that memory (it is also used as the loop closure candidate selector chain size, so I'll split that into 2 different parameters). See the snippet below.
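(i.e., in the localization params file, something like:)

      # the experiment described above, not a recommended default
      scan_buffer_size: 2    # default is 10; also currently used as the loop closure chain size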

@SteveMacenski
Owner

Assuming I downloaded all the right assets, my plan is to play with this on the 14-hour flight to ROSCon. Hopefully we can meet up and I'll have some thoughts.

@facontidavide
Author

Sorry, not attending ROSCon

@SteveMacenski
Owner

SteveMacenski commented Nov 4, 2019

@facontidavide I'm having a very hard time trying to get your localization bag to play well. I'm not sure if you have some non-standard configuration or changed some things in this package that you didn't mention. I've found some things in this ROS1 package in melodic that I've needed to change to get it to run, so I'm not sure if you made changes you hadn't mentioned.

Can you give me the specific directions you used to run the localization bag file to localize? I'm failing to do even basic things after filtering the bag down to just the base data (removing the maps, the map->odom TF, etc.).

This is quite a little puzzle

Edit: I think I'm finding something's off with my computer. I'll need to investigate this.

@SteveMacenski
Owner

Ah, I figured out why: you included /clock in the bag, and I was also adding the --clock flag. After I filtered out the TF frames that should be removed and the map topics, I'm getting something running.
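(Roughly the kind of filtering meant, for anyone reproducing this; the bag names are placeholders and the filter expression is only a sketch.)

      # drop the recorded map topics and any TF published from the "map" frame,
      # keeping just the raw sensor/odom data (bag names are placeholders)
      rosbag filter localization_run.bag filtered.bag \
        "topic not in ['/map', '/map_metadata'] and \
         not (topic == '/tf' and m.transforms[0].header.frame_id == 'map')"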

@SteveMacenski
Owner

OK @facontidavide, have some updates.

It looks to me like this probably isn't an issue requiring new code to solve (beyond clearing the buffer on initial pose). I found that when I changed the buffer size and the loop match size to 3, it converged well. In fact, the only case where I could reproduce the numbers you gave for convergence times of 22-26s was when starting the bag at the location shown below.

(image: the starting location of the bag, labeled on the map)

That particular location has a lot of problems. There are large chunks of missing area, and the bushes are shaky in the laser scan, so there are no good axes to pull from them. Also, you only really have reliable information on one side, and it represents <50% of all the information coming in; the rest comes from distorted sources. I would then expect it to take a while to converge, but I don't see the convergence relating to time or distance traveled so much as to getting out of that area. The 22 seconds I count from the start of motion to convergence is mostly just getting to an area where all the information is reliable.

For example, see below. In this case, I started the rosbag but only started the localization engine at t = 20 seconds, which is after we mostly leave the area of bad or missing features. The time at which you see convergence is exactly the same absolute time that starting from the beginning of the bag gives. Therefore the issue isn't related directly to time or buffer sizes, but to the fact that that area is just not a good place for positioning. This convergence time was 8 seconds, but again it relates more to having a few good sequential measurements without the poor information. You can also see I set the starting position very far from the actual position, far more than you had in the bag file, and we easily return to the true position (faster and more reliably than I would have thought, actually).

(animated GIF: localization started at t = 20 s into the bag)

Finally, because I was curious how extreme I could take this, I forwarded the bag to around 45 seconds to do the same thing, but this time made the initial positioning of the robot exceptionally poor. In this case, we converged in about 6 seconds. I would only have expected that to work with a particle filter, but I guess the default expansion of the search region actually makes this work well enough by itself.

(animated GIF: localization started at ~45 s with an intentionally poor initial pose)

The tl;dr: I'm updating a couple of defaults, but I think the example situation you gave was not representative of an expected production map because of the massive missing sections of navigable space. Because of the bad starting point, the incomplete map, and exceptionally poor data for most of the laser returns, it cannot accurately relocalize in that region. Once it leaves that region for an area that is sufficiently well mapped, it easily converges to a solution within a couple of updates, even in the dataset you provided. The worst case I found was still < 10 seconds to convergence, which is well in line with a particle filter.

@facontidavide
Author

facontidavide commented Nov 26, 2019

Thanks a lot for spending time investigating this!
